EricJoy2048 commented on code in PR #2615:
URL: https://github.com/apache/incubator-seatunnel/pull/2615#discussion_r965551737

##########
docs/en/connector-v2/source/Iceberg.md:
##########
@@ -0,0 +1,140 @@
+# Apache Iceberg
+
+> Apache Iceberg source connector

Review Comment:
   From https://github.com/apache/incubator-seatunnel/pull/2625 you can see that we have redefined the document format. Please update this document to follow the latest format.



##########
docs/en/connector-v2/source/Iceberg.md:
##########
@@ -0,0 +1,140 @@
+# Apache Iceberg
+
+> Apache Iceberg source connector
+
+## Description
+
+Source connector for Apache Iceberg. It supports both batch mode and stream mode.
+
+## Options
+
+| name                     | type    | required | default value        |
+|--------------------------|---------|----------|----------------------|
+| catalog_name             | string  | yes      | -                    |
+| catalog_type             | string  | yes      | -                    |
+| uri                      | string  | no       | -                    |
+| warehouse                | string  | yes      | -                    |
+| namespace                | string  | yes      | -                    |
+| table                    | string  | yes      | -                    |
+| case_sensitive           | boolean | no       | false                |
+| start_snapshot_timestamp | long    | no       | -                    |
+| start_snapshot_id        | long    | no       | -                    |
+| end_snapshot_id          | long    | no       | -                    |
+| use_snapshot_id          | long    | no       | -                    |
+| use_snapshot_timestamp   | long    | no       | -                    |
+| stream_scan_strategy     | string  | no       | FROM_LATEST_SNAPSHOT |
+
+### catalog_name [string]
+
+User-specified catalog name.
+
+### catalog_type [string]
+
+The optional values are:
+- hive: The Hive metastore catalog.
+- hadoop: The Hadoop catalog.
+
+### uri [string]
+
+The Hive metastore's thrift URI.
+
+### warehouse [string]
+
+The location to store metadata files and data files.
+
+### namespace [string]
+
+The Iceberg database name in the backend catalog.
+
+### table [string]
+
+The Iceberg table name in the backend catalog.
+
+### case_sensitive [boolean]
+
+If data columns were selected via fields (Collection), controls whether the match to the schema is done with case sensitivity.
+
+### fields [array]
+
+Use projection to select the data columns and their order.
+
+### start_snapshot_id [long]
+
+Instructs this scan to look for changes starting from a particular snapshot (exclusive).
+
+### start_snapshot_timestamp [long]
+
+Instructs this scan to look for changes starting from the most recent snapshot for the table as of the given timestamp, in milliseconds since the Unix epoch.
+
+### end_snapshot_id [long]
+
+Instructs this scan to look for changes up to a particular snapshot (inclusive).
+
+### use_snapshot_id [long]
+
+Instructs this scan to use the given snapshot ID.
+
+### use_snapshot_timestamp [long]
+
+Instructs this scan to use the most recent snapshot as of the given timestamp, in milliseconds since the Unix epoch.
+
+### stream_scan_strategy [string]
+
+Starting strategy for stream mode execution. Defaults to `FROM_LATEST_SNAPSHOT` if no value is specified.
+The optional values are:
+- TABLE_SCAN_THEN_INCREMENTAL: Do a regular table scan, then switch to incremental mode.
+- FROM_LATEST_SNAPSHOT: Start incremental mode from the latest snapshot inclusive.
+- FROM_EARLIEST_SNAPSHOT: Start incremental mode from the earliest snapshot inclusive.
+- FROM_SNAPSHOT_ID: Start incremental mode from a snapshot with a specific id inclusive.
+- FROM_SNAPSHOT_TIMESTAMP: Start incremental mode from a snapshot with a specific timestamp inclusive.
+
+## Example
+
+simple
+
+```hocon
+source {
+  Iceberg {
+    catalog_name = "seatunnel"
+    catalog_type = "hadoop"
+    warehouse = "file:///tmp/seatunnel/iceberg/"

Review Comment:
   Do we need to change `file:///` to `hdfs://` when the `catalog_type` is `hadoop`?
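
For reference, a minimal sketch of what the comment above seems to be suggesting: with `catalog_type = "hadoop"`, the `warehouse` would usually point at a Hadoop-compatible file system rather than the local disk. The namenode host, port, and path below are illustrative placeholders, not values taken from the PR.

```hocon
source {
  Iceberg {
    catalog_name = "seatunnel"
    catalog_type = "hadoop"
    # Illustrative HDFS location; the namenode host, port, and directory
    # are placeholders, not values from the PR under review.
    warehouse = "hdfs://namenode:8020/tmp/seatunnel/iceberg/"
    namespace = "your_iceberg_database"
    table = "your_iceberg_table"
  }
}
```

A `file:///` warehouse is still valid for local testing, which is presumably why the PR's example uses it; the question raised here is whether the documented example should instead show the more typical HDFS deployment for the `hadoop` catalog.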

##########
docs/en/connector-v2/source/Iceberg.md:
##########
@@ -0,0 +1,140 @@
+
+## Example
+
+simple
+
+```hocon
+source {
+  Iceberg {
+    catalog_name = "seatunnel"
+    catalog_type = "hadoop"
+    warehouse = "file:///tmp/seatunnel/iceberg/"
+    namespace = "your_iceberg_database"
+    table = "your_iceberg_table"
+  }
+}
+```
+Or
+
+```hocon
+source {
+  Iceberg {
+    catalog_name = "seatunnel"
+    catalog_type = "hive"
+    uri = "thrift://localhost:9083"
+    warehouse = "file:///tmp/seatunnel/iceberg/"
+    namespace = "your_iceberg_database"
+    table = "your_iceberg_table"
+  }
+}
+```
+
+schema projection
+
+```hocon
+source {
+  Iceberg {
+    catalog_name = "seatunnel"
+    catalog_type = "hadoop"

Review Comment:
   If we use `hadoop` in the connector, we also need to document the Hadoop versions we can support.
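
Stepping back from the individual comments for a moment: none of the examples quoted in this diff exercise the snapshot options that the document describes (`start_snapshot_id`, `end_snapshot_id`, and related settings). The sketch below is only an illustration, assembled from the quoted option descriptions, of how an incremental read over a snapshot range might be configured; the snapshot ids are placeholders, not values from the PR.

```hocon
source {
  Iceberg {
    catalog_name = "seatunnel"
    catalog_type = "hive"
    uri = "thrift://localhost:9083"
    warehouse = "file:///tmp/seatunnel/iceberg/"
    namespace = "your_iceberg_database"
    table = "your_iceberg_table"

    # Per the quoted descriptions: read the changes after snapshot 100
    # (exclusive) up to snapshot 200 (inclusive). The ids here are
    # illustrative placeholders only.
    start_snapshot_id = 100
    end_snapshot_id = 200
  }
}
```

If the PR authors add such an example, the real ids would come from the table's snapshot history rather than fixed numbers like these.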

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]