Re: [PR] [lake/iceberg] Add iceberg documents for lakehouse support in fluss [fluss]

via GitHub Wed, 03 Sep 2025 22:50:17 -0700


MehulBatra commented on code in PR #1640:
URL: https://github.com/apache/fluss/pull/1640#discussion_r2320904477



##########
website/docs/streaming-lakehouse/integrate-data-lakes/iceberg.md:
##########
@@ -0,0 +1,343 @@
+---
+title: Iceberg
+sidebar_position: 2
+---
+
+# Iceberg
+
+[Apache Iceberg](https://iceberg.apache.org/) is an open table format for huge analytic datasets. It provides ACID transactions, schema evolution, and efficient data organization for data lakes.
+To integrate Fluss with Iceberg, you must enable lakehouse storage and configure Iceberg as the lakehouse storage. For more details, see [Enable Lakehouse Storage](maintenance/tiered-storage/lakehouse-storage.md#enable-lakehouse-storage).
+
+## Introduction
+
+When a table is created or altered with the option `'table.datalake.enabled' = 'true'` and Iceberg is configured as the datalake format, Fluss automatically creates a corresponding Iceberg table with the same table path.
+The schema of the Iceberg table matches that of the Fluss table, except for three system columns appended at the end: `__bucket`, `__offset`, and `__timestamp`.  
+These system columns help Fluss clients consume data from Iceberg in a streaming fashion, for example by seeking within a specific bucket by offset or timestamp.
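+
+The same option can be applied to a table that already exists by altering it. The following is a minimal sketch. It assumes Fluss accepts Flink's `ALTER TABLE ... SET` syntax for this option, and `existing_orders` is a hypothetical table name:
+
+```sql title="Flink SQL"
+USE CATALOG fluss_catalog;
+
+-- Hypothetical existing table; assumes Iceberg is already configured
+-- as the cluster's datalake format.
+ALTER TABLE existing_orders SET ('table.datalake.enabled' = 'true');
+```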
+
+```sql title="Flink SQL"
+USE CATALOG fluss_catalog;
+
+CREATE TABLE fluss_order_with_lake (
+    `order_key` BIGINT,
+    `cust_key` INT NOT NULL,
+    `total_price` DECIMAL(15, 2),
+    `order_date` DATE,
+    `order_priority` STRING,
+    `clerk` STRING,
+    `ptime` AS PROCTIME(),
+    PRIMARY KEY (`order_key`) NOT ENFORCED
+) WITH (
+    'table.datalake.enabled' = 'true',
+    'table.datalake.freshness' = '30s'
+);
+```
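+
+After the tiering service described below has written data, the tiered table can be inspected from the Iceberg side, including its three system columns. The following is a sketch under stated assumptions:
+
+- It uses the Iceberg Flink connector with a Hadoop catalog.
+- The warehouse path `file:///tmp/iceberg` is hypothetical and must point to the warehouse Fluss tiers into.
+- The Fluss database is assumed to be the default `fluss`.
+
+```sql title="Flink SQL"
+-- Assumption: tiered Iceberg tables live in a Hadoop catalog at this hypothetical warehouse path.
+CREATE CATALOG iceberg_catalog WITH (
+    'type' = 'iceberg',
+    'catalog-type' = 'hadoop',
+    'warehouse' = 'file:///tmp/iceberg'
+);
+
+-- Batch mode, so that ORDER BY on a non-time column is allowed.
+SET 'execution.runtime-mode' = 'batch';
+
+-- Same table path as the Fluss table (database assumed to be `fluss`);
+-- the system columns follow the business columns.
+SELECT `order_key`, `total_price`, `__bucket`, `__offset`, `__timestamp`
+FROM iceberg_catalog.fluss.fluss_order_with_lake
+WHERE `__bucket` = 0
+ORDER BY `__offset`;
+```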
+
+Then, the datalake tiering service continuously tiers data from Fluss to Iceberg. The parameter `table.datalake.freshness` controls how frequently Fluss writes data to Iceberg tables. By default, the data freshness is 3 minutes.  
+For primary key tables, changelogs are also generated in the Iceberg format, enabling stream-based consumption via Iceberg APIs. Primary key tables use a merge-on-read (MOR) strategy for efficient updates and deletes.

Review Comment:
   My bad, corrected it with delete files.



