EricJoy2048 commented on code in PR #2615:
URL: https://github.com/apache/incubator-seatunnel/pull/2615#discussion_r965551737

##########
docs/en/connector-v2/source/Iceberg.md:
##########
@@ -0,0 +1,140 @@
+# Apache Iceberg
+
+> Apache Iceberg source connector

Review Comment:
   From https://github.com/apache/incubator-seatunnel/pull/2625 you can see that we have redefined the document format. Please update this document to follow the latest format.



##########
docs/en/connector-v2/source/Iceberg.md:
##########
@@ -0,0 +1,140 @@
+# Apache Iceberg
+
+> Apache Iceberg source connector
+
+## Description
+
+Source connector for Apache Iceberg. It supports both batch mode and stream mode.
+
+## Options
+
+| name                     | type    | required | default value        |
+|--------------------------|---------|----------|----------------------|
+| catalog_name             | string  | yes      | -                    |
+| catalog_type             | string  | yes      | -                    |
+| uri                      | string  | no       | -                    |
+| warehouse                | string  | yes      | -                    |
+| namespace                | string  | yes      | -                    |
+| table                    | string  | yes      | -                    |
+| case_sensitive           | boolean | no       | false                |
+| start_snapshot_timestamp | long    | no       | -                    |
+| start_snapshot_id        | long    | no       | -                    |
+| end_snapshot_id          | long    | no       | -                    |
+| use_snapshot_id          | long    | no       | -                    |
+| use_snapshot_timestamp   | long    | no       | -                    |
+| stream_scan_strategy     | string  | no       | FROM_LATEST_SNAPSHOT |
+
+### catalog_name [string]
+
+User-specified catalog name.
+
+### catalog_type [string]
+
+The optional values are:
+- hive: The Hive metastore catalog.
+- hadoop: The Hadoop catalog.
+
+### uri [string]
+
+The Hive metastore's thrift URI.
+
+### warehouse [string]
+
+The location to store metadata files and data files.
+
+### namespace [string]
+
+The Iceberg database name in the backend catalog.
+
+### table [string]
+
+The Iceberg table name in the backend catalog.
+
+### case_sensitive [boolean]
+
+If data columns were selected via fields (Collection), controls whether the match to the schema is done with case sensitivity.
+
+### fields [array]
+
+Use projection to select the data columns and their order.
+
+### start_snapshot_id [long]
+
+Instructs this scan to look for changes starting from a particular snapshot (exclusive).
+
+### start_snapshot_timestamp [long]
+
+Instructs this scan to look for changes starting from the most recent snapshot for the table as of the given timestamp, in milliseconds since the Unix epoch.
+
+### end_snapshot_id [long]
+
+Instructs this scan to look for changes up to a particular snapshot (inclusive).
+
+### use_snapshot_id [long]
+
+Instructs this scan to use the given snapshot ID.
+
+### use_snapshot_timestamp [long]
+
+Instructs this scan to use the most recent snapshot as of the given timestamp, in milliseconds since the Unix epoch.
+
+### stream_scan_strategy [string]
+
+Starting strategy for stream mode execution. Defaults to `FROM_LATEST_SNAPSHOT` if no value is specified.
+The optional values are:
+- TABLE_SCAN_THEN_INCREMENTAL: Do a regular table scan, then switch to incremental mode.
+- FROM_LATEST_SNAPSHOT: Start incremental mode from the latest snapshot inclusive.
+- FROM_EARLIEST_SNAPSHOT: Start incremental mode from the earliest snapshot inclusive.
+- FROM_SNAPSHOT_ID: Start incremental mode from a snapshot with a specific id inclusive.
+- FROM_SNAPSHOT_TIMESTAMP: Start incremental mode from a snapshot with a specific timestamp inclusive.
+
+## Example
+
+simple
+
+```hocon
+source {
+  Iceberg {
+    catalog_name = "seatunnel"
+    catalog_type = "hadoop"
+    warehouse = "file:///tmp/seatunnel/iceberg/"

Review Comment:
   Do we need to change `file:///` to `hdfs://` when the `catalog_type` is `hadoop`?
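
For reference, a minimal sketch of what the comment above seems to be suggesting: with `catalog_type = "hadoop"`, the `warehouse` would usually point at a Hadoop-compatible file system rather than the local disk. The namenode host, port, and path below are illustrative placeholders, not values taken from the PR.

```hocon
source {
  Iceberg {
    catalog_name = "seatunnel"
    catalog_type = "hadoop"
    # Illustrative HDFS location; the namenode host, port, and directory
    # are placeholders, not values from the PR under review.
    warehouse = "hdfs://namenode:8020/tmp/seatunnel/iceberg/"
    namespace = "your_iceberg_database"
    table = "your_iceberg_table"
  }
}
```

A `file:///` warehouse is still valid for local testing, which is presumably why the PR's example uses it; the question raised here is whether the documented example should instead show the more typical HDFS deployment for the `hadoop` catalog.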

##########
docs/en/connector-v2/source/Iceberg.md:
##########
@@ -0,0 +1,140 @@
+
+## Example
+
+simple
+
+```hocon
+source {
+  Iceberg {
+    catalog_name = "seatunnel"
+    catalog_type = "hadoop"
+    warehouse = "file:///tmp/seatunnel/iceberg/"
+    namespace = "your_iceberg_database"
+    table = "your_iceberg_table"
+  }
+}
+```
+Or
+
+```hocon
+source {
+  Iceberg {
+    catalog_name = "seatunnel"
+    catalog_type = "hive"
+    uri = "thrift://localhost:9083"
+    warehouse = "file:///tmp/seatunnel/iceberg/"
+    namespace = "your_iceberg_database"
+    table = "your_iceberg_table"
+  }
+}
+```
+
+schema projection
+
+```hocon
+source {
+  Iceberg {
+    catalog_name = "seatunnel"
+    catalog_type = "hadoop"

Review Comment:
   If we use `hadoop` in the connector, we also need to document the Hadoop versions we can support.
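
Stepping back from the individual comments for a moment: none of the examples quoted in this diff exercise the snapshot options that the document describes (`start_snapshot_id`, `end_snapshot_id`, and related settings). The sketch below is only an illustration, assembled from the quoted option descriptions, of how an incremental read over a snapshot range might be configured; the snapshot ids are placeholders, not values from the PR.

```hocon
source {
  Iceberg {
    catalog_name = "seatunnel"
    catalog_type = "hive"
    uri = "thrift://localhost:9083"
    warehouse = "file:///tmp/seatunnel/iceberg/"
    namespace = "your_iceberg_database"
    table = "your_iceberg_table"

    # Per the quoted descriptions: read the changes after snapshot 100
    # (exclusive) up to snapshot 200 (inclusive). The ids here are
    # illustrative placeholders only.
    start_snapshot_id = 100
    end_snapshot_id = 200
  }
}
```

If the PR authors add such an example, the real ids would come from the table's snapshot history rather than fixed numbers like these.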

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]