1. Run jobs on K8s/Yarn. The Zeta engine currently supports only local and
standalone modes when submitting jobs. Standalone mode suits CDC
synchronization scenarios with a large number of small tables. Real-time
CDC synchronization holds resources for a long time but may move only a
small amount of data, so sharing resources in standalone mode can
effectively improve resource utilization. In offline batch synchronization,
however, running each job in a separate process reduces interference
between jobs. The most effective way to achieve this today is to submit the
job to K8s or Yarn.
2. More connector support
3. Catalog support for more connectors. Catalog adaptation mainly helps
connectors obtain more accurate data structure information, which
facilitates automatic table creation downstream and implicit data type
conversion. However, only some connectors currently implement the
catalog-related interfaces, and more connectors need to implement them in
the future.
4. Design and adaptation of TypeConverter and DataTypeConverter. The goal
of TypeConverter is to let each connector describe more accurately the
conversion and inverse conversion between the database's own data types
and SeaTunnel data types. Development should be completed at the API level,
and all connectors need to be adapted and implemented in the future.
TypeConverter can help SeaTunnel better perform data model inference and
generate table creation statements during automatic table creation.
DataTypeConverter will work together with TypeConverter to help SeaTunnel
better achieve implicit conversion of data types between different
databases. For example, in the JDBC Oracle Sink scenario, when writing a
SeaTunnel String value, DataTypeConverter will combine with TypeConverter
to decide whether to use setString or blob, based on the length of the
field, the current field type, and other information.
5. Event notification mechanism. Currently, SeaTunnel lacks a mechanism for
passing events, such as task failure, task success, and the occurrence of
other specific events.
6. Table-level monitoring. Job monitoring information is currently at the
job level, yet the latest version of SeaTunnel already supports multi-table
synchronization, which synchronizes data from multiple tables in one job.
The goal of table-level monitoring is to let users see the synchronization
status of each table through monitoring information.
7. Dirty data collection. During a synchronization task, some data may fail
to be written to the target properly. The current approach is to fail the
job directly. We plan to support dirty data collection, which first stores
data that cannot be written as dirty data without affecting the normal
operation of the job.
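
To make item 4 more concrete, here is a rough sketch of how the two
converters could divide the work. It is written in Python only to keep it
short, and it is not SeaTunnel's API: the names Column, OracleTypeConverter,
choose_write_method and the 4000 cutoff are all assumptions made for
illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed cutoff for this sketch; the real limit depends on the database setup.
VARCHAR2_LIMIT = 4000


@dataclass(frozen=True)
class Column:
    """Hypothetical engine-neutral column description."""
    name: str
    data_type: str                 # "STRING" or "BYTES" in this sketch
    length: Optional[int] = None   # declared length, if known


class OracleTypeConverter:
    """Hypothetical TypeConverter: native Oracle type <-> neutral type."""

    def convert(self, name, native_type, length=None):
        # Native type -> neutral type (used for data model inference).
        if native_type in ("CHAR", "VARCHAR2", "NVARCHAR2", "CLOB"):
            return Column(name, "STRING", length)
        if native_type in ("RAW", "BLOB"):
            return Column(name, "BYTES", length)
        raise ValueError(f"unsupported native type: {native_type}")

    def reconvert(self, column):
        # Neutral type -> native type (used when generating CREATE TABLE).
        if column.data_type == "STRING":
            if column.length is not None and column.length <= VARCHAR2_LIMIT:
                return f"VARCHAR2({column.length})"
            return "CLOB"  # unknown or oversized length falls back to a LOB
        if column.data_type == "BYTES":
            return "BLOB"
        raise ValueError(f"unsupported neutral type: {column.data_type}")


def choose_write_method(column, target_native_type):
    """Hypothetical DataTypeConverter decision: pick the write path per field."""
    if target_native_type == "BLOB" or column.data_type == "BYTES":
        return "blob"       # value must be written as binary
    if column.data_type == "STRING":
        return "setString"  # character target, plain string write
    raise ValueError(f"no write path for {column.data_type}")
```

With this split, reconvert drives automatic table creation, while
choose_write_method uses the source column together with the target type to
pick the write path, which is the implicit conversion described in item 4.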

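For item 7, the intended behavior could look roughly like the sketch below.
It is only an illustration in Python, not an existing SeaTunnel interface:
write_with_dirty_collection, write_row and the max_dirty threshold are
hypothetical names, and the threshold itself is an extra assumption that is
not part of the plan above.

```python
def write_with_dirty_collection(rows, write_row, max_dirty=None):
    """Write rows one by one and collect rejected rows instead of failing.

    write_row is any callable that raises when a row cannot be written.
    max_dirty is an assumed optional cap: once more rows than this are
    rejected, the job fails as it does today.
    """
    written = 0
    dirty = []
    for row in rows:
        try:
            write_row(row)
            written += 1
        except Exception as exc:
            # Keep the rejected row and the reason so it can be stored first
            # and inspected later, while the job keeps running.
            dirty.append({"row": row, "error": str(exc)})
            if max_dirty is not None and len(dirty) > max_dirty:
                raise RuntimeError("too many dirty rows") from exc
    return written, dirty


def strict_sink(row):
    # Hypothetical sink that rejects rows with a missing name.
    if row["name"] is None:
        raise ValueError("name must not be null")


rows = [{"id": 1, "name": "a"}, {"id": 2, "name": None}, {"id": 3, "name": "c"}]
written, dirty = write_with_dirty_collection(rows, strict_sink)
```

Here the row with id 2 ends up in dirty together with its error message,
and the other two rows are still written.
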
Jia Fan <[email protected]> wrote on Wed, Mar 6, 2024 at 14:55:

> We need to provide a mature solution based on yarn or k8s.
>
> ________________________
>
> Jia Fan
>
>
>
> > On Mar 5, 2024, at 15:18, gaojun2048 <[email protected]> wrote:
> >
> > Hi, Community,
> >
> > The SeaTunnel community has made significant progress in 2023, with
> > SeaTunnel's features becoming increasingly powerful and the number of
> > users growing rapidly. Thank you to everyone in the community.
> >
> > As a professional tool for data synchronization, SeaTunnel still has a
> > lot of work to complete, such as run on k8s, run on yarn, more
> > connectors, more comprehensive support for automatic table creation, and
> > data type inference.
> >
> > Here, everyone can discuss SeaTunnel's 2024 roadmap, which will determine
> > the main goals and directions of the community in the future.
> >
> >
> > --
> >
> > Best Regards
> >
> > ------------
> >
> > Apache ID: gaojun2048
> >
> > Github ID: EricJoy2048
> >
> > Mail: [email protected]
>
>

-- 

Best Regards

------------

Apache ID: gaojun2048

Github ID: EricJoy2048

Mail: [email protected]
