This is an automated email from the ASF dual-hosted git repository.

jark pushed a commit to branch release-0.8
in repository https://gitbox.apache.org/repos/asf/fluss.git
commit 13a16e8682aab5fc4686eb6d0817154b2e5e64b4
Author: MehulBatra <[email protected]>
AuthorDate: Fri Oct 31 13:22:02 2025 +0530

    remove unwanted emojis to make document standard (#1865)

    (cherry picked from commit 0b5ec3ba4b54a1889588ffd3ef1983ecd0695180)
---
 .../integrate-data-lakes/iceberg.md | 46 +++++++++++-----------
 .../integrate-data-lakes/paimon.md  |  2 +-
 2 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/website/docs/streaming-lakehouse/integrate-data-lakes/iceberg.md b/website/docs/streaming-lakehouse/integrate-data-lakes/iceberg.md
index a51c967c0..4e299ebe1 100644
--- a/website/docs/streaming-lakehouse/integrate-data-lakes/iceberg.md
+++ b/website/docs/streaming-lakehouse/integrate-data-lakes/iceberg.md
@@ -13,9 +13,9 @@ To integrate Fluss with Iceberg, you must enable lakehouse storage and configure
 > **NOTE**: Iceberg requires JDK11 or later. Please ensure that both your
 > Fluss deployment and the Flink cluster used for tiering services are running
 > on JDK11+.
 
-## ⚙️ Configure Iceberg as LakeHouse Storage
+## Configure Iceberg as LakeHouse Storage
 
-### 🔧 Configure Iceberg in Cluster Configurations
+### Configure Iceberg in Cluster Configurations
 
 To configure Iceberg as the lakehouse storage, you must configure the following configurations in `server.yaml`:
 
 ```yaml
 datalake.format: iceberg
 
 # Iceberg catalog configurations
 datalake.iceberg.type: hadoop
 datalake.iceberg.warehouse: /tmp/iceberg
 ```
@@ -27,13 +27,13 @@ datalake.iceberg.warehouse: /tmp/iceberg
 ```
 
-#### 🔧 Configuration Processing
+#### Configuration Processing
 
 Fluss processes Iceberg configurations by stripping the `datalake.iceberg.` prefix and uses the stripped configurations (without the prefix `datalake.iceberg.`) to initialize the Iceberg catalog. This approach enables passing custom configurations for Iceberg catalog initialization.
 
 Check out the [Iceberg Catalog Properties](https://iceberg.apache.org/docs/1.9.1/configuration/#catalog-properties) for more details on available catalog configurations.
-#### 📋 Supported Catalog Types
+#### Supported Catalog Types
 
 Fluss supports all Iceberg-compatible catalog types:
 
@@ -56,7 +56,7 @@ datalake.iceberg.catalog-impl: <your_iceberg_catalog_impl_class_name>
 datalake.iceberg.catalog-impl: org.apache.iceberg.snowflake.SnowflakeCatalog
 ```
 
-#### 🔧 Prerequisites
+#### Prerequisites
 
 ##### 1. Hadoop Dependencies Configuration
 
@@ -95,19 +95,19 @@ Fluss only bundles catalog implementations included in the `iceberg-core` module
 
 The Iceberg version that Fluss bundles is based on `1.9.1`. Please ensure the JARs you add are compatible with `Iceberg-1.9.1`.
 
-#### ⚠️ Important Notes
+#### Important Notes
 
 - Ensure all JAR files are compatible with Iceberg 1.9.1
 - If using an existing Hadoop environment, it's recommended to use the `HADOOP_CLASSPATH` environment variable
 - Configuration changes take effect after restarting the Fluss service
 
-### 🚀 Start Tiering Service to Iceberg
+### Start Tiering Service to Iceberg
 
 To tier Fluss's data to Iceberg, you must start the datalake tiering service. For guidance, you can refer to [Start The Datalake Tiering Service](maintenance/tiered-storage/lakehouse-storage.md#start-the-datalake-tiering-service). Although the example uses Paimon, the process is also applicable to Iceberg.
 
-#### 🔧 Prerequisites: Hadoop Dependencies
+#### Prerequisites: Hadoop Dependencies
 
-**⚠️ Important**: Iceberg has a strong dependency on Hadoop. You must ensure Hadoop-related classes are available in the classpath before starting the tiering service.
+**Important**: Iceberg has a strong dependency on Hadoop. You must ensure Hadoop-related classes are available in the classpath before starting the tiering service.
 ##### Option 1: Use Existing Hadoop Environment (Recommended)
 
@@ -144,7 +144,7 @@ export HADOOP_HOME=$(pwd)/hadoop-3.3.5
 export HADOOP_CLASSPATH=`$HADOOP_HOME/bin/hadoop classpath`
 ```
 
-#### 🔧 Prepare Required JARs
+#### Prepare Required JARs
 
 Follow the dependency management guidelines below for the [Prepare required jars](maintenance/tiered-storage/lakehouse-storage.md#prepare-required-jars) step:
 
@@ -176,7 +176,7 @@ iceberg-aws-bundle-1.9.1.jar
 failsafe-3.3.2.jar
 ```
 
-#### 🚀 Start Datalake Tiering Service
+#### Start Datalake Tiering Service
 
 When following the [Start Datalake Tiering Service](maintenance/tiered-storage/lakehouse-storage.md#start-datalake-tiering-service) guide, use Iceberg-specific configurations as parameters when starting the Flink tiering job:
 
@@ -188,7 +188,7 @@ When following the [Start Datalake Tiering Service](maintenance/tiered-storage/l
 --datalake.iceberg.warehouse /tmp/iceberg
 ```
 
-#### ⚠️ Important Notes
+#### Important Notes
 
 - Ensure all JAR files are compatible with Iceberg 1.9.1
 - Verify that all required dependencies are in the `${FLINK_HOME}/lib` directory
@@ -202,7 +202,7 @@ When a Fluss table is created or altered with the option `'table.datalake.enable
 
 The schema of the Iceberg table matches that of the Fluss table, except for the addition of three system columns at the end: `__bucket`, `__offset`, and `__timestamp`. These system columns help Fluss clients consume data from Iceberg in a streaming fashion, such as seeking by a specific bucket using an offset or timestamp.
-### 🔧 Basic Configuration
+### Basic Configuration
 
 Here is an example using Flink SQL to create a table with data lake enabled:
 
@@ -224,7 +224,7 @@ CREATE TABLE fluss_order_with_lake (
 );
 ```
 
-### ⚙️ Iceberg Table Properties
+### Iceberg Table Properties
 
 You can also specify Iceberg [table properties](https://iceberg.apache.org/docs/latest/configuration/#table-properties) when creating a datalake-enabled Fluss table by using the `iceberg.` prefix within the Fluss table properties clause.
 
@@ -249,7 +249,7 @@ CREATE TABLE fluss_order_with_lake (
 );
 ```
 
-### 🔑 Primary Key Tables
+### Primary Key Tables
 
 Primary key tables in Fluss are mapped to Iceberg tables with:
 
@@ -289,7 +289,7 @@ CREATE TABLE user_profiles (
 SORTED BY (__offset ASC);
 ```
 
-### 📝 Log Tables
+### Log Tables
 
 The table mapping for Fluss log tables varies depending on whether the bucket key is specified or not.
 
@@ -360,7 +360,7 @@ CREATE TABLE order_events (
 SORTED BY (__offset ASC);
 ```
 
-### 🗂️ Partitioned Tables
+### Partitioned Tables
 
 For Fluss partitioned tables, Iceberg first partitions by Fluss partition keys, then follows the above rules:
 
@@ -394,7 +394,7 @@ CREATE TABLE daily_sales (
 SORTED BY (__offset ASC);
 ```
 
-### 📊 System Columns
+### System Columns
 
 All Iceberg tables created by Fluss include three system columns:
 
@@ -406,7 +406,7 @@ All Iceberg tables created by Fluss include three system columns:
 
 ## Read Tables
 
-### 🐿️ Reading with Apache Flink
+### Reading with Apache Flink
 
 When a table has the configuration `table.datalake.enabled = 'true'`, its data exists in two layers:
 
@@ -444,7 +444,7 @@ Key behavior for data retention:
 - **Expired Fluss log data** (controlled by `table.log.ttl`) remains accessible via Iceberg if previously tiered
 - **Cleaned-up partitions** in partitioned tables (controlled by `table.auto-partition.num-retention`) remain accessible via Iceberg if previously tiered
 
-### 🔍 Reading with Other Engines
+### Reading with Other Engines
 
 Since data tiered to Iceberg from
Fluss is stored as standard Iceberg tables, you can use any Iceberg-compatible engine. Below is an example using [StarRocks](https://docs.starrocks.io/docs/data_source/catalog/iceberg/iceberg_catalog/):
 
@@ -504,7 +504,7 @@ When integrating with Iceberg, Fluss automatically converts between Fluss data t
 
 ## Maintenance and Optimization
 
-### 📦 Auto Compaction
+### Auto Compaction
 
 The table option `table.datalake.auto-compaction` (disabled by default) provides per-table control over automatic compaction. When enabled for a specific table, compaction is automatically triggered during write operations to that table by the tiering service.
 
@@ -528,7 +528,7 @@ CREATE TABLE example_table (
 
 - **Storage**: Optimizes storage usage by removing duplicate data
 - **Maintenance**: Automatically handles data organization
 
-### 📊 Snapshot Metadata
+### Snapshot Metadata
 
 Fluss adds specific metadata to Iceberg snapshots for traceability:
 
@@ -578,7 +578,7 @@ For partitioned tables, the metadata structure includes partition information:
 | `offset` | Offset within the partition's log | `3`, `1000` |
 
-## 🚫 Current Limitations
+## Current Limitations
 
 - **Complex Types**: Array, Map, and Row types are not supported
 - **Multiple bucket keys**: Not supported until Iceberg implements multi-argument partition transforms

diff --git a/website/docs/streaming-lakehouse/integrate-data-lakes/paimon.md b/website/docs/streaming-lakehouse/integrate-data-lakes/paimon.md
index a27d534af..6e1462435 100644
--- a/website/docs/streaming-lakehouse/integrate-data-lakes/paimon.md
+++ b/website/docs/streaming-lakehouse/integrate-data-lakes/paimon.md
@@ -176,7 +176,7 @@ The following table shows the mapping between [Fluss data types](table-design/da
 | BINARY | BINARY |
 | BYTES  | BYTES  |
 
-## 📊 Snapshot Metadata
+## Snapshot Metadata
 
 Fluss adds specific metadata to Paimon snapshots for traceability:
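The edit this patch applies by hand — dropping a leading pictograph from each heading and closing up the space it leaves — can be sketched mechanically. Below is a hypothetical helper (not part of the Fluss repository or this patch) that strips the emoji code points seen in these headings, such as U+2699 (⚙), U+1F527 (🔧), and U+1F4CA (📊), along with the U+FE0F variation selector that often follows them:

```python
import re

# Character class covering the pictograph blocks removed in this patch,
# plus U+FE0F (the emoji variation selector). These ranges are a rough
# assumption sufficient for the symbols appearing in the diff above.
EMOJI_RE = re.compile(r"[\u2600-\u27BF\U0001F300-\U0001FAFF\uFE0F]")

def strip_emojis(line: str) -> str:
    """Remove emoji characters and collapse the doubled space they leave."""
    cleaned = EMOJI_RE.sub("", line)
    return re.sub(r" {2,}", " ", cleaned)

print(strip_emojis("## ⚙️ Configure Iceberg as LakeHouse Storage"))
```

Running such a scan across `website/docs/` would surface the same set of headings this commit edits.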
