This is an automated email from the ASF dual-hosted git repository.
pvary pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/iceberg.git
The following commit(s) were added to refs/heads/main by this push:
new f7916f2778 Docker, Docs, Site: Add Flink quickstart (#15062)
f7916f2778 is described below
commit f7916f2778866cff748a6c6a4118dde50c03181e
Author: Robin Moffatt <[email protected]>
AuthorDate: Thu Feb 19 15:31:09 2026 +0000
Docker, Docs, Site: Add Flink quickstart (#15062)
---
docker/iceberg-flink-quickstart/README.md | 23 ++-
docker/iceberg-flink-quickstart/docker-compose.yml | 7 -
docker/iceberg-flink-quickstart/test.sql | 78 +++++++++
docs/docs/flink.md | 5 +-
.../assets/images/flink-quickstart.excalidraw.png | Bin 0 -> 333949 bytes
site/docs/flink-quickstart.md | 174 +++++++++++++++++++++
site/mkdocs-dev.yml | 1 +
site/nav.yml | 1 +
8 files changed, 276 insertions(+), 13 deletions(-)
diff --git a/docker/iceberg-flink-quickstart/README.md
b/docker/iceberg-flink-quickstart/README.md
index c7243cac31..b844f2891d 100644
--- a/docker/iceberg-flink-quickstart/README.md
+++ b/docker/iceberg-flink-quickstart/README.md
@@ -21,6 +21,8 @@
A pre-configured Apache Flink image with Apache Iceberg dependencies for
quickly getting started with Iceberg on Flink.
+See the [Flink quickstart documentation](https://iceberg.apache.org/flink-quickstart/) for details.
+
## Overview
This Docker image extends the official Apache Flink image to include:
@@ -60,18 +62,31 @@ docker build \
## Usage
-The easiest way to get started is using the quickstart docker-compose file from the repository root:
+See the [Flink quickstart documentation](https://iceberg.apache.org/flink-quickstart/) for details.
-```bash
+
+## Test Script
+
+A test script (`test.sql`) is provided to validate the Iceberg-Flink integration and future changes to the Docker image.
+
+Start up the Docker containers:
+
+```sh
docker compose -f docker/iceberg-flink-quickstart/docker-compose.yml up -d --build
```
-Then connect to Flink SQL client:
+Execute the test script directly from the host:
```bash
-docker exec -it jobmanager ./bin/sql-client.sh
+docker exec -i jobmanager ./bin/sql-client.sh < docker/iceberg-flink-quickstart/test.sql
```
+**Expected behavior:**
+- Exit code: 0 (success)
+- Creates: 1 catalog (`iceberg_catalog`), 1 database (`nyc`), 1 table (`taxis`)
+- Inserts: 4 records
+- Final state: Table `iceberg_catalog.nyc.taxis` contains 4 rows
+
To stop the stack:
```bash
diff --git a/docker/iceberg-flink-quickstart/docker-compose.yml b/docker/iceberg-flink-quickstart/docker-compose.yml
index 955d06a504..dd5d711f67 100644
--- a/docker/iceberg-flink-quickstart/docker-compose.yml
+++ b/docker/iceberg-flink-quickstart/docker-compose.yml
@@ -35,8 +35,6 @@ services:
condition: service_healthy
networks:
iceberg_net:
- ports:
- - "8081:8081"
command: jobmanager
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8081/overview"]
@@ -85,8 +83,6 @@ services:
condition: service_completed_successfully
networks:
iceberg_net:
- ports:
- - "8181:8181"
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8181/v1/config"]
interval: 5s
@@ -112,9 +108,6 @@ services:
iceberg_net:
aliases:
- warehouse.minio
- ports:
- - "9000:9000"
- - "9001:9001"
command: server /data --console-address ":9001"
healthcheck:
test: ["CMD", "mc", "ready", "local"]
diff --git a/docker/iceberg-flink-quickstart/test.sql b/docker/iceberg-flink-quickstart/test.sql
new file mode 100644
index 0000000000..a7de5a8958
--- /dev/null
+++ b/docker/iceberg-flink-quickstart/test.sql
@@ -0,0 +1,78 @@
+-- =============================================================================
+-- Iceberg Flink Quickstart Test Script
+-- =============================================================================
+--
+-- Prerequisites:
+-- docker compose -f docker/iceberg-flink-quickstart/docker-compose.yml up -d --build
+-- docker exec -it jobmanager ./bin/sql-client.sh
+--
+-- Then paste this script or run line by line
+--
+-- =============================================================================
+
+-- -----------------------------------------------------------------------------
+-- 1. Create the Iceberg REST catalog
+-- -----------------------------------------------------------------------------
+CREATE CATALOG iceberg_catalog WITH (
+ 'type' = 'iceberg',
+ 'catalog-impl' = 'org.apache.iceberg.rest.RESTCatalog',
+ 'uri' = 'http://iceberg-rest:8181',
+ 'warehouse' = 's3://warehouse/',
+ 'io-impl' = 'org.apache.iceberg.aws.s3.S3FileIO',
+ 's3.endpoint' = 'http://minio:9000',
+ 's3.access-key-id' = 'admin',
+ 's3.secret-access-key' = 'password',
+ 's3.path-style-access' = 'true'
+);
+
+-- -----------------------------------------------------------------------------
+-- 2. Create a database and table
+-- -----------------------------------------------------------------------------
+CREATE DATABASE IF NOT EXISTS iceberg_catalog.nyc;
+
+CREATE TABLE iceberg_catalog.nyc.taxis (
+ vendor_id BIGINT,
+ trip_id BIGINT,
+ trip_distance FLOAT,
+ fare_amount DOUBLE,
+ store_and_fwd_flag STRING
+);
+
+-- -----------------------------------------------------------------------------
+-- 3. Enable checkpointing (required for Iceberg commits)
+-- -----------------------------------------------------------------------------
+SET 'execution.checkpointing.interval' = '10s';
+
+-- -----------------------------------------------------------------------------
+-- 4. Insert data
+-- -----------------------------------------------------------------------------
+INSERT INTO iceberg_catalog.nyc.taxis
+VALUES
+ (1, 1000371, 1.8, 15.32, 'N'),
+ (2, 1000372, 2.5, 22.15, 'N'),
+ (2, 1000373, 0.9, 9.01, 'N'),
+ (1, 1000374, 8.4, 42.13, 'Y');
+
+-- -----------------------------------------------------------------------------
+-- 5. Query the data
+-- -----------------------------------------------------------------------------
+SET 'sql-client.execution.result-mode' = 'tableau';
+SELECT * FROM iceberg_catalog.nyc.taxis;
+
+-- -----------------------------------------------------------------------------
+-- 6. Inspect Iceberg metadata
+-- -----------------------------------------------------------------------------
+-- Snapshots
+SELECT * FROM iceberg_catalog.nyc.`taxis$snapshots`;
+
+-- Data files
+SELECT content, file_path, file_format, record_count
+FROM iceberg_catalog.nyc.`taxis$files`;
+
+-- History
+SELECT * FROM iceberg_catalog.nyc.`taxis$history`;
+
+-- -----------------------------------------------------------------------------
+-- 7. Cleanup (optional)
+-- -----------------------------------------------------------------------------
+DROP TABLE iceberg_catalog.nyc.taxis;
+DROP DATABASE iceberg_catalog.nyc;
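As a local illustration (not part of the patch), the documented end state of `test.sql` can be sketched in plain Python using the same sample values: four records in `iceberg_catalog.nyc.taxis`, written by two vendors.

```python
# Illustrative sketch only: models the rows that test.sql inserts into
# iceberg_catalog.nyc.taxis and checks the documented end state (4 records).
taxis_rows = [
    (1, 1000371, 1.8, 15.32, 'N'),
    (2, 1000372, 2.5, 22.15, 'N'),
    (2, 1000373, 0.9, 9.01, 'N'),
    (1, 1000374, 8.4, 42.13, 'Y'),
]

record_count = len(taxis_rows)                      # expected: 4
vendor_ids = sorted({row[0] for row in taxis_rows}) # distinct vendor_id values

print(record_count)  # 4
print(vendor_ids)    # [1, 2]
```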
diff --git a/docs/docs/flink.md b/docs/docs/flink.md
index fd0a1077ed..50bdc2c482 100644
--- a/docs/docs/flink.md
+++ b/docs/docs/flink.md
@@ -1,5 +1,5 @@
---
-title: "Flink Getting Started"
+title: "Getting Started"
---
<!--
- Licensed to the Apache Software Foundation (ASF) under one or more
@@ -18,7 +18,8 @@ title: "Flink Getting Started"
- limitations under the License.
-->
-# Flink
+!!! tip
+    For an overview of using Iceberg with Flink, see the [Flink Quickstart](/flink-quickstart).
Apache Iceberg supports both [Apache Flink](https://flink.apache.org/)'s DataStream API and Table API. See the [Multi-Engine Support](../../multi-engine-support.md#apache-flink) page for the integration of Apache Flink.
diff --git a/site/docs/assets/images/flink-quickstart.excalidraw.png b/site/docs/assets/images/flink-quickstart.excalidraw.png
new file mode 100644
index 0000000000..033b4901c0
Binary files /dev/null and b/site/docs/assets/images/flink-quickstart.excalidraw.png differ
diff --git a/site/docs/flink-quickstart.md b/site/docs/flink-quickstart.md
new file mode 100644
index 0000000000..657d13d806
--- /dev/null
+++ b/site/docs/flink-quickstart.md
@@ -0,0 +1,174 @@
+---
+title: "Flink and Iceberg Quickstart"
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements. See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License. You may obtain a copy of the License at
+ -
+ - http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+This guide will get you up and running with Apache Iceberg™ using Apache Flink™, including sample code to
+highlight some powerful features. You can learn more about Iceberg's Flink runtime by checking out the [Flink](docs/latest/flink.md) section.
+
+## Quickstart environment
+
+The fastest way to get started is to use Docker Compose with the [Iceberg Flink Quickstart](https://github.com/apache/iceberg/tree/main/docker/iceberg-flink-quickstart) image.
+
+To use this, you'll need to install the [Docker CLI](https://docs.docker.com/get-docker/).
+
+The quickstart includes:
+
+* A local Flink cluster (Job Manager and Task Manager)
+* Iceberg REST Catalog
+* MinIO (local S3 storage)
+
+
+
+Clone the Iceberg repository and start up the Docker containers:
+
+```sh
+git clone https://github.com/apache/iceberg.git
+cd iceberg
+docker compose -f docker/iceberg-flink-quickstart/docker-compose.yml up -d --build
+```
+
+Launch a Flink SQL client session:
+
+```sh
+docker exec -it jobmanager ./bin/sql-client.sh
+```
+
+## Creating an Iceberg Catalog in Flink
+
+Iceberg has several catalog back-ends that can be used to track tables, like JDBC, Hive MetaStore and Glue.
+In this guide we use a REST catalog, backed by S3.
+To learn more, check out the [Catalog](docs/latest/flink-configuration.md#catalog-configuration) page in the Flink section.
+
+First up, we need to define a Flink catalog.
+Tables within this catalog will be stored in the S3 blob store:
+
+```sql
+CREATE CATALOG iceberg_catalog WITH (
+ 'type' = 'iceberg',
+ 'catalog-impl' = 'org.apache.iceberg.rest.RESTCatalog',
+ 'uri' = 'http://iceberg-rest:8181',
+ 'warehouse' = 's3://warehouse/',
+ 'io-impl' = 'org.apache.iceberg.aws.s3.S3FileIO',
+ 's3.endpoint' = 'http://minio:9000',
+ 's3.access-key-id' = 'admin',
+ 's3.secret-access-key' = 'password',
+ 's3.path-style-access' = 'true'
+);
+```
+
+Create a database in the catalog:
+
+```sql
+CREATE DATABASE IF NOT EXISTS iceberg_catalog.nyc;
+```
+
+## Creating a Table
+
+To create your first Iceberg table in Flink, run a [`CREATE TABLE`](docs/latest/flink-ddl.md#create-table) command.
+Let's create a table using `iceberg_catalog.nyc.taxis` where `iceberg_catalog` is the catalog name, `nyc` is the database name, and `taxis` is the table name.
+
+```sql
+CREATE TABLE iceberg_catalog.nyc.taxis
+(
+ vendor_id BIGINT,
+ trip_id BIGINT,
+ trip_distance FLOAT,
+ fare_amount DOUBLE,
+ store_and_fwd_flag STRING
+);
+```
+
+Iceberg catalogs support the full range of Flink SQL DDL commands, including:
+
+* [`CREATE TABLE ... PARTITIONED BY`](docs/latest/flink-ddl.md#partitioned-by)
+* [`ALTER TABLE`](docs/latest/flink-ddl.md#alter-table)
+* [`DROP TABLE`](docs/latest/flink-ddl.md#drop-table)
+
+## Writing Data to a Table
+
+Once your table is created, you can insert records.
+
+Flink uses checkpoints to ensure data durability and exactly-once semantics.
+Without checkpointing, Iceberg data and metadata may not be fully committed to storage.
+
+```sql
+SET 'execution.checkpointing.interval' = '10s';
+```
+
+Then you can write some data:
+
+```sql
+INSERT INTO iceberg_catalog.nyc.taxis
+VALUES (1, 1000371, 1.8, 15.32, 'N'), (2, 1000372, 2.5, 22.15, 'N'), (2, 1000373, 0.9, 9.01, 'N'), (1, 1000374, 8.4, 42.13, 'Y');
+```
+
+## Reading Data from a Table
+
+To read a table, use the Iceberg table's name:
+
+```sql
+SELECT * FROM iceberg_catalog.nyc.taxis;
+```
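As a quick sanity check on the sample data (illustrative only, computed locally rather than in Flink), simple aggregates over the four inserted rows preview what `COUNT(*)` and `SUM(fare_amount)` over the table would return:

```python
# Illustrative only: aggregates over the four rows inserted earlier,
# previewing COUNT(*) and SUM(fare_amount) for iceberg_catalog.nyc.taxis.
rows = [
    (1, 1000371, 1.8, 15.32, 'N'),
    (2, 1000372, 2.5, 22.15, 'N'),
    (2, 1000373, 0.9, 9.01, 'N'),
    (1, 1000374, 8.4, 42.13, 'Y'),
]

count = len(rows)
total_fare = round(sum(r[3] for r in rows), 2)  # fare_amount is column 4

print(count, total_fare)  # 4 88.61
```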
+
+## Creating a Table with Inline Configuration
+
+Creating a Flink catalog as shown above, backed by an Iceberg REST Catalog, is one way to use Iceberg in Flink.
+Another way is to use the [Flink connector](docs/latest/flink-connector.md) and specify the catalog connection details directly in the table definition. This still connects to the same external Iceberg REST Catalog - the difference is just that you don't need a separate `CREATE CATALOG` statement.
+
+Create a table using inline configuration:
+
+!!! note
+    The Flink table definition here is registered in Flink's default in-memory catalog (`default_catalog`), but the connector properties tell Flink to store the Iceberg table and its data in the same REST Catalog and S3 storage as before.
+
+```sql
+CREATE TABLE taxis_inline_config (
+ vendor_id BIGINT,
+ trip_id BIGINT,
+ trip_distance FLOAT,
+ fare_amount DOUBLE,
+ store_and_fwd_flag STRING
+) WITH (
+ 'connector' = 'iceberg',
+ 'catalog-name' = 'foo', -- Required by Flink connector but value doesn't matter for inline config
+ 'catalog-type' = 'rest',
+ 'uri' = 'http://iceberg-rest:8181',
+ 'warehouse' = 's3://warehouse/',
+ 'io-impl' = 'org.apache.iceberg.aws.s3.S3FileIO',
+ 's3.endpoint' = 'http://minio:9000',
+ 's3.access-key-id' = 'admin',
+ 's3.secret-access-key' = 'password',
+ 's3.path-style-access' = 'true'
+);
+```
+
+## Shutting down the quickstart environment
+
+Once you've finished with the quickstart, shut down the Docker containers by running the following:
+
+```sh
+docker compose -f docker/iceberg-flink-quickstart/docker-compose.yml down
+```
+
+## Learn More
+
+!!! note
+    If you want to include Iceberg in your Flink installation, add the Iceberg Flink runtime to Flink's `jars` folder.
+    You can download the runtime from the [Releases](releases.md) page.
+
+Now that you're up and running with Iceberg and Flink, check out the [Iceberg Flink docs](docs/latest/flink.md) to learn more!
diff --git a/site/mkdocs-dev.yml b/site/mkdocs-dev.yml
index ef66a86501..c7fc0498f3 100644
--- a/site/mkdocs-dev.yml
+++ b/site/mkdocs-dev.yml
@@ -25,6 +25,7 @@ nav:
- Home: index.md
- Quickstart:
- Spark: spark-quickstart.md
+ - Flink: flink-quickstart.md
- Hive: hive-quickstart.md
- Docs:
- Java:
diff --git a/site/nav.yml b/site/nav.yml
index a4ad451e8e..cae97ad3d6 100644
--- a/site/nav.yml
+++ b/site/nav.yml
@@ -19,6 +19,7 @@ nav:
- Home: index.md
- Quickstart:
- Spark: spark-quickstart.md
+ - Flink: flink-quickstart.md
- Hive: hive-quickstart.md
- Docs:
- Java: