[GitHub] [incubator-seatunnel] TyrantLucifer commented on a diff in pull request #2625: [Improve][Doc] Add key features in connector documents

GitBox Sat, 03 Sep 2022 04:44:07 -0700


TyrantLucifer commented on code in PR #2625:
URL: https://github.com/apache/incubator-seatunnel/pull/2625#discussion_r962143087



##########
docs/en/concept/connector-v2-features.md:
##########
@@ -0,0 +1,46 @@
+# Intro to connector V2 features
+
+## Source Connector Features
+
+### exactly-once
+
+If the source sends each piece of data in the data source downstream only once, we consider the source connector to support exactly-once.
+
+In SeaTunnel, we can save the **Split** being read and its **offset** (the position of the read data in the split at that time,
+such as line number, byte size, or offset) as a **StateSnapshot** at checkpoint time. If the task restarts, we fetch the last **StateSnapshot**,
+locate the **Split** and **offset** that were read last, and continue sending data downstream from there.
+
+Examples: `File`, `Kafka`.
+
+### schema projection
+
+If the source connector supports reading only selected columns, or supports defining the format of the data it reads through `schema` params, we consider it to support schema projection.
+
+For example, `JDBCSource` can use SQL to define the columns to read, and `KafkaSource` can use `schema` params to define the read schema.
+
+### batch
+
+Batch job mode. The data read is bounded, and the job stops once all the data has been read.
+
+### stream
+
+Streaming job mode. The data read is unbounded, and the job never stops.
+
+### parallelism
+
+A parallelism source connector supports the `parallelism` config option; each unit of parallelism creates a task to read the data.

Review Comment:
   Agree with you.
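
The exactly-once paragraph in the hunk above describes saving each **Split** and its **offset** as a **StateSnapshot** and resuming from it after a restart. A minimal sketch of that idea follows. It is written in Python for brevity and is not the SeaTunnel API: `SketchSource`, `poll`, `snapshot_state` and `run_with_restart` are hypothetical names, and it assumes that downstream state is rolled back to the same checkpoint on failure.

```python
from dataclasses import dataclass, field


@dataclass
class StateSnapshot:
    """Hypothetical snapshot: split id -> offset of the next unread record."""
    offsets: dict = field(default_factory=dict)


class SketchSource:
    """Toy source that reads records from in-memory splits (illustration only)."""

    def __init__(self, splits, snapshot=None):
        self.splits = splits  # split id -> list of records
        if snapshot is not None:
            # Restart: resume every split at the offset saved in the snapshot.
            self.offsets = dict(snapshot.offsets)
        else:
            self.offsets = {split_id: 0 for split_id in splits}

    def poll(self, split_id):
        """Return the next record of a split, or None when it is exhausted."""
        offset = self.offsets.get(split_id, 0)
        records = self.splits[split_id]
        if offset >= len(records):
            return None
        self.offsets[split_id] = offset + 1
        return records[offset]

    def snapshot_state(self):
        """Copy the current offsets, as would happen at checkpoint time."""
        return StateSnapshot(dict(self.offsets))


def run_with_restart(splits):
    """Read a little, checkpoint, 'crash', then restore and finish reading."""
    source = SketchSource(splits)
    delivered = [source.poll("file-0"), source.poll("file-1")]

    snapshot = source.snapshot_state()
    committed = list(delivered)  # assumed downstream state at the checkpoint

    source.poll("file-0")  # read after the checkpoint, lost when the task fails

    # Restart: restore the source from the snapshot and roll downstream back.
    restored = SketchSource(splits, snapshot)
    delivered = committed
    for split_id in splits:
        while (record := restored.poll(split_id)) is not None:
            delivered.append(record)
    return delivered
```

Calling `run_with_restart({"file-0": ["a", "b", "c"], "file-1": ["x", "y"]})` delivers every record once even though one record was read again after the simulated failure, because both the offsets and the downstream state come from the same checkpoint.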



##########
docs/en/concept/connector-v2-features.md:
##########
@@ -1,7 +1,18 @@
-# Intro to connector V2 features
+# Intro To Connector V2 Features
+
+## Differences Between Connector V2 And Connector V1
+
+Connector V2 is a connector defined on top of the SeaTunnel Connector API. Unlike Connector V1, Connector V2 supports the following features.
+
+* **Multi Engine Support** The SeaTunnel Connector API is an engine-independent API. Connectors developed against this API can run on multiple engines. Currently, Flink and Spark are supported, and we will support other engines in the future.
+* **Multi Engine Version Support** Decoupling the connector from the engine through the translation layer solves the problem that most connectors need code changes in order to support a new version of the underlying engine.
+* **Unified Batch And Stream** Connector V2 can perform batch processing or stream processing. We do not need to develop separate connectors for batch and stream.
+* **Multiplexing JDBC/Log Connection** Connector V2 supports reusing JDBC resources and sharing database log parsing.

Review Comment:
   How about adding the design ideas behind the Connector V2 API?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
