xx789633 commented on code in PR #1640:
URL: https://github.com/apache/fluss/pull/1640#discussion_r2320733428
##########
website/docs/streaming-lakehouse/integrate-data-lakes/iceberg.md:
##########
@@ -0,0 +1,343 @@
+---
+title: Iceberg
+sidebar_position: 2
+---
+
+# Iceberg
+
+[Apache Iceberg](https://iceberg.apache.org/) is an open table format for huge analytic datasets. It provides ACID transactions, schema evolution, and efficient data organization for data lakes.
+To integrate Fluss with Iceberg, you must enable lakehouse storage and configure Iceberg as the lakehouse storage. For more details, see [Enable Lakehouse Storage](maintenance/tiered-storage/lakehouse-storage.md#enable-lakehouse-storage).
+
+## Introduction
+
+When a table is created or altered with the option `'table.datalake.enabled' = 'true'` and configured with Iceberg as the datalake format, Fluss will automatically create a corresponding Iceberg table with the same table path.
+The schema of the Iceberg table matches that of the Fluss table, except for the addition of three system columns at the end: `__bucket`, `__offset`, and `__timestamp`.
+These system columns help Fluss clients consume data from Iceberg in a streaming fashion, such as seeking by a specific bucket using an offset or timestamp.
+
+```sql title="Flink SQL"
+USE CATALOG fluss_catalog;
+
+CREATE TABLE fluss_order_with_lake (
+    `order_key` BIGINT,
+    `cust_key` INT NOT NULL,
+    `total_price` DECIMAL(15, 2),
+    `order_date` DATE,
+    `order_priority` STRING,
+    `clerk` STRING,
+    `ptime` AS PROCTIME(),
+    PRIMARY KEY (`order_key`) NOT ENFORCED
+) WITH (
+    'table.datalake.enabled' = 'true',
+    'table.datalake.freshness' = '30s'
+);
+```
+
+Then, the datalake tiering service continuously tiers data from Fluss to Iceberg. The parameter `table.datalake.freshness` controls the frequency that Fluss writes data to Iceberg tables. By default, the data freshness is 3 minutes.
+For primary key tables, changelogs are also generated in the Iceberg format, enabling stream-based consumption via Iceberg APIs.
+Primary key tables use merge-on-read (MOR) strategy for efficient updates and deletes.

Review Comment:
This is confusing to me. Does Iceberg provide a changelog?
