CalvinKirs commented on code in PR #3619:
URL: https://github.com/apache/incubator-seatunnel/pull/3619#discussion_r1036624475
##########
README.md:
##########
@@ -19,49 +19,43 @@ been used in the production of nearly 100 companies.
 ## Why do we need SeaTunnel
-SeaTunnel will do its best to solve the problems that may be encountered in the synchronization of massive data:
+SeaTunnel focuses on data integration and data synchronization, and is mainly designed to solve common problems in the field of data integration:
-- Data loss and duplication
-- Task accumulation and delay
-- Low throughput
-- Long cycle to be applied in the production environment
-- Lack of application running status monitoring
+- Various data sources: There are hundreds of commonly-used data sources of which versions are incompatible. With the emergence of new technologies, more data sources are appearing. It is difficult for users to find a tool that can fully and quickly support these data sources.
+- Complex synchronization scenarios: Data synchronization needs to support various synchronization scenarios such as offline-full synchronization, offline-incremental synchronization, CDC, real-time synchronization, and full database synchronization.
+- High demand in resource: Existing data integration and data synchronization tools often require vast computing resources or JDBC connection resources to complete real-time synchronization of massive small tables. This has increased the burden on enterprises to a certain extent.
+- Lack of quality and monitoring: Data integration and synchronization processes often experience loss or duplication of data. The synchronization process lacks monitoring, and it is impossible to intuitively understand the real-situation of the data during the task process.
+- Complex technology stack: The technology components used by enterprises are different, and users need to develop corresponding synchronization programs for different components to complete data integration.
+- Difficulty in management and maintenance: Limited to different underlying technology components (Flink/Spark) , offline synchronization and real-time synchronization often have be developed and managed separately, which increases the difficulty of the management and maintainance.
-## SeaTunnel use scenarios
+## Features of SeaTunnel
-- Mass data synchronization
-- Mass data integration
-- ETL with massive data
-- Mass data aggregation
-- Multi-source data processing
+- Rich and extensible Connector: SeaTunnel provides a Connector API that does not depend on a specific execution engine. Connectors (Source, Transform, Sink) developed based on this API can run on many different engines, such as SeaTunnel Engine, Flink, Spark that are currently supported.
+- Connector plugin: The plugin design allows users to easily develop their own Connector and integrate it into the SeaTunnel project. Currently, SeaTunnel has supported more than 70 Connectors, and the number is surging. There is the list of the currently supported connectors: xxxxxxx, and the list of planned connectors: xxxxxxx.

Review Comment:
   We can link to the connector-status and connector-support plan here.
