1. Run jobs on K8s/Yarn

The Zeta engine currently supports only local and standalone modes when submitting jobs. Standalone mode suits CDC synchronization scenarios with a large number of small tables. Real-time CDC synchronization holds resources for a long time but may move only a small amount of data, so sharing resources in standalone mode can effectively improve resource utilization. In offline batch synchronization, however, running each job in a separate process reduces the mutual influence between jobs. The most effective way to do that now is to submit the job to k8s or yarn.

2. More connector support

3. Catalog adapts to more connectors

Catalog-related adaptations mainly help connectors obtain more accurate data structure information, which facilitates downstream automatic table creation and implicit data type conversion. Currently only some connectors implement interfaces equivalent to Catalog; more connectors need to implement them in the future.

4. Design and adaptation of TypeConverter and DataTypeConverter

The goal of TypeConverter is to let each connector describe more accurately the conversion, in both directions, between the database's own data types and SeaTunnel data types. Development at the API level should be complete; all connectors still need to be adapted and implemented in the future. TypeConverter can help SeaTunnel perform better data model inference and generate table creation statements during automatic table creation.

DataTypeConverter will work together with TypeConverter to help SeaTunnel better achieve implicit conversion of data types between different databases. For example, in the JDBC Oracle Sink scenario, when writing a SeaTunnel String type, DataTypeConverter will combine with TypeConverter to decide when to use setString and when to use blob, based on the length of the field, the current field type, and other information.

5. Event notification mechanism

Currently, SeaTunnel lacks a mechanism for passing events such as task failure, task success, and the occurrence of certain other events.

6. Table-level monitoring

The current job monitoring information is at the job level, while the latest version of SeaTunnel already supports multi-table synchronization, which synchronizes data from multiple tables in one job. The goal of table-level monitoring is to let users see the synchronization status of each table through monitoring information.

7. Dirty data collection

During synchronization tasks, some data may in some cases fail to be written to the target end. The current approach is to fail the job directly. We plan to support a dirty data collection function that first stores data that cannot be written as dirty data, without affecting the normal operation of the job.
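
To make the dirty data collection idea in item 7 more concrete, here is a minimal sketch in Python. It is only an illustration of the behavior described above, not SeaTunnel's API or a proposed design: `DirtyRecord`, `DirtyDataCollector`, `write_all`, and `demo_write_row` are hypothetical names, and the `max_dirty_rows` limit is an assumption that the roadmap does not mention.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterable, List


@dataclass
class DirtyRecord:
    """A row that could not be written, plus the reason it was rejected."""
    row: Dict[str, Any]
    error: str


@dataclass
class DirtyDataCollector:
    """Hypothetical collector. max_dirty_rows is an assumed safety limit."""
    max_dirty_rows: int = 100
    records: List[DirtyRecord] = field(default_factory=list)

    def collect(self, row: Dict[str, Any], exc: Exception) -> None:
        self.records.append(DirtyRecord(row, f"{type(exc).__name__}: {exc}"))
        # Assumption: past the limit, failing the job is safer than continuing.
        if len(self.records) > self.max_dirty_rows:
            raise RuntimeError("too many dirty rows, failing the job")


def write_all(
    rows: Iterable[Dict[str, Any]],
    write_row: Callable[[Dict[str, Any]], None],
    collector: DirtyDataCollector,
) -> int:
    """Write every row; set rejected rows aside instead of failing the job."""
    written = 0
    for row in rows:
        try:
            write_row(row)
            written += 1
        except (ValueError, TypeError) as exc:
            collector.collect(row, exc)
    return written


def demo_write_row(row: Dict[str, Any]) -> None:
    """Stand-in for a sink: rejects rows whose "id" is not an integer."""
    if not isinstance(row.get("id"), int):
        raise ValueError(f"id must be an integer, got {row.get('id')!r}")


rows = [{"id": 1}, {"id": "x"}, {"id": 3}]
collector = DirtyDataCollector()
written = write_all(rows, demo_write_row, collector)
```

In this sketch the two valid rows are written and the row with a non-integer id ends up in `collector.records` together with its error message, so the job keeps running. Where the dirty rows would actually be stored (a file, a table, a separate sink) is left open here, as it is in the roadmap.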

On Wed, Mar 6, 2024 at 14:55, Jia Fan <[email protected]> wrote:

> We need to provide a mature solution based on yarn or k8s.
>
> ________________________
>
> Jia Fan
>
> > On Mar 5, 2024, at 15:18, gaojun2048 <[email protected]> wrote:
> >
> > Hi, Community,
> >
> > The SeaTunnel community has made significant progress in 2023, with
> > SeaTunnel's features becoming increasingly powerful and the number of
> > users growing rapidly. Thank you to everyone in the community.
> >
> > As a professional tool for data synchronization, SeaTunnel still has a
> > lot of work to complete, such as run on k8s, run on yarn, more
> > connectors, more comprehensive support for automatic table creation,
> > and data type inference.
> >
> > Here, everyone can discuss SeaTunnel's 2024 roadmap, which will
> > determine the main goals and directions of the community in the future.
> >
> > --
> > Best Regards
> > ------------
> > Apache ID: gaojun2048
> > Github ID: EricJoy2048
> > Mail: [email protected]

--
Best Regards
------------
Apache ID: gaojun2048
Github ID: EricJoy2048
Mail: [email protected]
