Re:[DISCUSS] Decoupling connectors from compute engines

CalvinKirs Mon, 09 May 2022 05:57:57 -0700

Hi,
As a community contributor, I would like to point out a few problems the
community is currently facing:
1. A connector has to be written twice (once per engine), even though the
second implementation is mostly copied and differs very little.
2. Adapting to a new Flink or Spark version requires many code changes, and
every upgrade or version adaptation brings a large amount of testing and
modification.
3. When we want to improve peripheral functions, such as UI and monitoring,
we actually need a unified API.

I'm glad to see Zongwen bring up this discussion. I think it's a good idea.
My only concern is whether we can complete an abstract API design, given the
differences between the engines' own APIs. It's just a conservative concern;
either way, I support the design and want to take part in it.

BTW, we only support discussion in English.

Best wishes!
Calvin Kirs

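On 05/8/2022 15:20, jianju1024 <[email protected]> wrote:

Hi everyone,

I feel the same way: is it like Spark? Flink? Beam? It ends up resembling none of them!!
1. Why break ETL apart? What exactly are E, T, and L? They cannot really be separated in the first place.
2. On the topic of building on multiple engines: in several WeChat groups I am in, many people have discussed it for a long time without reaching a conclusion.
3. If the goal is to solve ETL problems, why get hung up on abstracting over two different engines?
4. Compared with DataX and FlinkX, it falls far short.

On May 7, 2022, at 17:36, Zongwen Li <[email protected]> wrote: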

The goal of Apache SeaTunnel is different from that of Apache Beam.
Apache SeaTunnel focuses on source and sink connectors, and develops
features in the field of data integration.
Apache Beam focuses on unifying all the functions of the compute engine,
including operators such as join, connect, and map, and it doesn't unify
streaming and batch sources.

This improvement proposal aims to solve the problems SeaTunnel currently
faces. If you have better ideas, please bring them up for discussion.

Best,
Zongwen Li

On Fri, Apr 29, 2022 at 16:14, leo65535 <[email protected]> wrote:

Hi @zongwen,

I don't think this is a good idea; it seems that we would become more and
more like Apache Beam.

Best,
Leo65535

At 2022-04-18 15:10:08, "李宗文" <[email protected]> wrote:
Hi All,
In the current implementation of SeaTunnel, the connector is coupled with
the computing engine. As a result, each connector has to be implemented
once per engine, and it is difficult to support multiple versions of an
engine.
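
To make the idea of decoupling concrete, here is a minimal sketch, in Python for brevity. Every class name in it is hypothetical and is not SeaTunnel's actual or proposed API. It assumes a connector is written once against an engine-independent Source/Sink interface, and each engine (or engine version) supplies only a thin adapter.

```python
from abc import ABC, abstractmethod
from typing import Iterable, Iterator, List


class Source(ABC):
    """Hypothetical engine-independent source: it only produces records."""

    @abstractmethod
    def read(self) -> Iterator[dict]:
        ...


class Sink(ABC):
    """Hypothetical engine-independent sink: it only consumes records."""

    @abstractmethod
    def write(self, record: dict) -> None:
        ...


class ListSource(Source):
    """Toy connector, written once, with no engine-specific code."""

    def __init__(self, rows: Iterable[dict]) -> None:
        self._rows = list(rows)

    def read(self) -> Iterator[dict]:
        return iter(self._rows)


class ListSink(Sink):
    """Toy connector that collects records in memory."""

    def __init__(self) -> None:
        self.rows: List[dict] = []

    def write(self, record: dict) -> None:
        self.rows.append(record)


class EngineAdapter(ABC):
    """Hypothetical translation layer: one adapter per engine or version."""

    @abstractmethod
    def run(self, source: Source, sink: Sink) -> None:
        ...


class BatchEngineAdapter(EngineAdapter):
    """Stand-in for a batch engine: reads everything, then writes."""

    def run(self, source: Source, sink: Sink) -> None:
        for record in list(source.read()):
            sink.write(record)


class StreamingEngineAdapter(EngineAdapter):
    """Stand-in for a streaming engine: forwards records one at a time."""

    def run(self, source: Source, sink: Sink) -> None:
        for record in source.read():
            sink.write(record)
```

In this sketch, supporting another engine or engine version means adding one adapter, while the existing connectors stay unchanged.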

The user questionnaire showed that users run multiple versions of the Spark
and Flink engines, and that they would also like SeaTunnel to support
Change Data Capture (CDC) connectors.

Based on the above problems and needs, I created an improvement proposal:
https://github.com/apache/incubator-seatunnel/issues/1608
Preliminary idea of Source and Sink API:
https://github.com/apache/incubator-seatunnel/issues/1701
https://github.com/apache/incubator-seatunnel/issues/1704

Please discuss away! Zongwen Li
