mxm commented on code in PR #15062:
URL: https://github.com/apache/iceberg/pull/15062#discussion_r2705088745


##########
docs/docs/flink.md:
##########
@@ -1,5 +1,5 @@
 ---
-title: "Flink Getting Started"
+title: "Getting Started"

Review Comment:
   Should we keep the Flink context in the title?



##########
site/docs/flink-quickstart.md:
##########
@@ -0,0 +1,183 @@
+---
+title: "Flink and Iceberg Quickstart"
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+This guide will get you up and running with Apache Iceberg™ using Apache Flink™, including sample code to
+highlight some powerful features. You can learn more about Iceberg's Flink runtime by checking out the [Flink](docs/latest/flink.md) section.
+
+## Docker Compose
+
+The fastest way to get started is to use a Docker Compose file.
+To use this, you'll need to install the [Docker CLI](https://docs.docker.com/get-docker/) as well as the [Docker Compose CLI](https://github.com/docker/compose-cli/blob/main/INSTALL.md).
+
+Once you have those, save these two files into a new folder:
+
+* [`docker-compose.yml`](https://raw.githubusercontent.com/apache/iceberg/refs/heads/main/flink/v2.0/quickstart/docker-compose.yml)
+
+    This contains:
+
+    * A local Flink cluster (Job Manager and Task Manager)
+    * Iceberg REST Catalog
+    * MinIO (local S3 storage)
+    * AWS CLI (to create the S3 bucket)
+
+* [`Dockerfile.flink`](https://raw.githubusercontent.com/apache/iceberg/refs/heads/main/flink/v2.0/quickstart/Dockerfile.flink) - base Flink image, plus some required JARs for S3 and Iceberg.
+
+Next, start up the docker containers with this command:
+
+```sh
+docker compose up -d
+```
+
+Launch a Flink SQL client session:
+
+```sh
+docker compose exec -it jobmanager ./bin/sql-client.sh
+```
+
+## Creating an Iceberg Catalog in Flink
+
+Iceberg has several catalog backends that can be used to track tables, such as JDBC, Hive Metastore, and Glue.
+In this guide, we use a REST catalog backed by S3.
+To learn more, check out the [Catalog](docs/latest/flink-configuration.md#catalog-configuration) page in the Flink section.
+
+First up, we need to define a Flink catalog.
+Tables within this catalog will be stored in S3 blob storage:
+
+```sql
+CREATE CATALOG iceberg_catalog WITH (
+  'type'                 = 'iceberg',
+  'catalog-impl'         = 'org.apache.iceberg.rest.RESTCatalog',
+  'uri'                  = 'http://iceberg-rest:8181',
+  'warehouse'            = 's3://warehouse/',
+  'io-impl'              = 'org.apache.iceberg.aws.s3.S3FileIO',
+  's3.endpoint'          = 'http://minio:9000',
+  's3.access-key-id'     = 'admin',
+  's3.secret-access-key' = 'password',
+  's3.path-style-access' = 'true'
+);
+```
+
+Then make this the active catalog in your Flink SQL session:
+
+```sql
+USE CATALOG iceberg_catalog;
+```
+
+Create a database in the catalog:
+
+```sql
+CREATE DATABASE IF NOT EXISTS nyc;
+```
+
+and set it as active:
+
+```sql
+USE nyc;
+```
+
+## Creating a Table
+
+To create your first Iceberg table in Flink, run a [`CREATE TABLE`](docs/latest/flink-ddl.md#create-table) command.
+Let's create a table using `iceberg_catalog.nyc.taxis`, where `iceberg_catalog` is the catalog name, `nyc` is the database name, and `taxis` is the table name.
+
+```sql
+CREATE TABLE iceberg_catalog.nyc.taxis

Review Comment:
   I'm curious: why are we fully qualifying the table name here when we set the default catalog and database above?
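
   E.g., with the `USE CATALOG` / `USE` statements above in effect, an untested sketch of the short form:

   ```sql
   -- iceberg_catalog and nyc are already the session defaults here
   CREATE TABLE taxis
   (
       vendor_id BIGINT,
       trip_id BIGINT,
       trip_distance FLOAT,
       fare_amount DOUBLE,
       store_and_fwd_flag STRING
   );
   ```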



##########
flink/quickstart/docker-compose.yml:
##########
@@ -0,0 +1,127 @@
+#  - Licensed to the Apache Software Foundation (ASF) under one or more
+#  - contributor license agreements.  See the NOTICE file distributed with
+#  - this work for additional information regarding copyright ownership.
+#  - The ASF licenses this file to You under the Apache License, Version 2.0
+#  - (the "License"); you may not use this file except in compliance with
+#  - the License.  You may obtain a copy of the License at
+#  -
+#  -   http://www.apache.org/licenses/LICENSE-2.0
+#  -
+#  - Unless required by applicable law or agreed to in writing, software
+#  - distributed under the License is distributed on an "AS IS" BASIS,
+#  - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  - See the License for the specific language governing permissions and
+#  - limitations under the License.
+services:
+  jobmanager:
+    build:
+      context: .
+      dockerfile: Dockerfile.flink
+    hostname: jobmanager
+    container_name: jobmanager
+    depends_on:

Review Comment:
   In my experience, Kubernetes is the more typical setup, even for local testing, e.g. via Minikube.
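
   A rough sketch of that flow (illustrative only; the manifest name is hypothetical and not part of this PR):

   ```sh
   # hypothetical: local Kubernetes instead of Docker Compose
   minikube start
   kubectl apply -f flink-iceberg-quickstart.yaml  # assumed manifest bundling Flink, REST catalog, MinIO
   ```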



##########
flink/quickstart/Dockerfile.flink:
##########
@@ -0,0 +1,64 @@
+#  - Licensed to the Apache Software Foundation (ASF) under one or more
+#  - contributor license agreements.  See the NOTICE file distributed with
+#  - this work for additional information regarding copyright ownership.
+#  - The ASF licenses this file to You under the Apache License, Version 2.0
+#  - (the "License"); you may not use this file except in compliance with
+#  - the License.  You may obtain a copy of the License at
+#  -
+#  -   http://www.apache.org/licenses/LICENSE-2.0
+#  -
+#  - Unless required by applicable law or agreed to in writing, software
+#  - distributed under the License is distributed on an "AS IS" BASIS,
+#  - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  - See the License for the specific language governing permissions and
+#  - limitations under the License.
+
+# Version variables for easier upgrades
+ARG FLINK_VERSION=2.0
+
+FROM apache/flink:${FLINK_VERSION}-java21
+SHELL ["/bin/bash", "-c"]
+
+# Redeclare ARG variables after FROM to make them available in subsequent layers
+ARG ICEBERG_VERSION=1.10.1
+ARG ICEBERG_FLINK_RUNTIME_VERSION=2.0
+ARG ICEBERG_AWS_BUNDLE_VERSION=1.9.2
+ARG HADOOP_VERSION=3.3.4
+ARG FLINK_S3_VERSION=2.1.1
+
+# Switch back to flink user
+USER flink
+
+WORKDIR /opt/flink
+
+RUN echo "HADOOP_VERSION=${HADOOP_VERSION}" && \
+    echo "ICEBERG_VERSION=${ICEBERG_VERSION}"
+
+RUN echo "-> Install JARs: Dependencies for Iceberg" && \
+    mkdir -p ./lib/iceberg && pushd $_ && \
+    curl -fO https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-flink-runtime-${ICEBERG_FLINK_RUNTIME_VERSION}/${ICEBERG_VERSION}/iceberg-flink-runtime-${ICEBERG_FLINK_RUNTIME_VERSION}-${ICEBERG_VERSION}.jar && \
+    curl -fO https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-aws-bundle/${ICEBERG_AWS_BUNDLE_VERSION}/iceberg-aws-bundle-${ICEBERG_AWS_BUNDLE_VERSION}.jar && \
+    popd
+
+RUN echo "-> Install JARs: Hadoop" && \
+    mkdir -p ./lib/hadoop && pushd $_ && \
+    curl https://repo1.maven.org/maven2/org/apache/commons/commons-configuration2/2.1.1/commons-configuration2-2.1.1.jar -O && \
+    curl https://repo1.maven.org/maven2/commons-logging/commons-logging/1.1.3/commons-logging-1.1.3.jar -O && \
+    curl https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-auth/${HADOOP_VERSION}/hadoop-auth-${HADOOP_VERSION}.jar -O && \
+    curl https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-common/${HADOOP_VERSION}/hadoop-common-${HADOOP_VERSION}.jar -O && \
+    curl https://repo1.maven.org/maven2/org/apache/hadoop/thirdparty/hadoop-shaded-guava/1.1.1/hadoop-shaded-guava-1.1.1.jar -O && \
+    curl https://repo1.maven.org/maven2/org/codehaus/woodstox/stax2-api/4.2.1/stax2-api-4.2.1.jar -O && \
+    curl https://repo1.maven.org/maven2/com/fasterxml/woodstox/woodstox-core/5.3.0/woodstox-core-5.3.0.jar -O && \
+    curl https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-hdfs-client/${HADOOP_VERSION}/hadoop-hdfs-client-${HADOOP_VERSION}.jar -O && \
+    curl https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-mapreduce-client-core/${HADOOP_VERSION}/hadoop-mapreduce-client-core-${HADOOP_VERSION}.jar -O && \
+    popd

Review Comment:
   We could use S3 without Hadoop.
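
   A sketch of what that could look like (assuming Iceberg's S3FileIO from `iceberg-aws-bundle` covers all table I/O and no remaining code path needs Hadoop classes):

   ```dockerfile
   # keep only the Iceberg layer; drop the hand-picked Hadoop JARs
   RUN echo "-> Install JARs: Dependencies for Iceberg" && \
       mkdir -p ./lib/iceberg && pushd $_ && \
       curl -fO https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-flink-runtime-${ICEBERG_FLINK_RUNTIME_VERSION}/${ICEBERG_VERSION}/iceberg-flink-runtime-${ICEBERG_FLINK_RUNTIME_VERSION}-${ICEBERG_VERSION}.jar && \
       curl -fO https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-aws-bundle/${ICEBERG_AWS_BUNDLE_VERSION}/iceberg-aws-bundle-${ICEBERG_AWS_BUNDLE_VERSION}.jar && \
       popd
   ```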



##########
flink/quickstart/Dockerfile.flink:
##########
@@ -0,0 +1,64 @@
+#  - Licensed to the Apache Software Foundation (ASF) under one or more
+#  - contributor license agreements.  See the NOTICE file distributed with
+#  - this work for additional information regarding copyright ownership.
+#  - The ASF licenses this file to You under the Apache License, Version 2.0
+#  - (the "License"); you may not use this file except in compliance with
+#  - the License.  You may obtain a copy of the License at
+#  -
+#  -   http://www.apache.org/licenses/LICENSE-2.0
+#  -
+#  - Unless required by applicable law or agreed to in writing, software
+#  - distributed under the License is distributed on an "AS IS" BASIS,
+#  - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  - See the License for the specific language governing permissions and
+#  - limitations under the License.
+
+# Version variables for easier upgrades
+ARG FLINK_VERSION=2.0
+
+FROM apache/flink:${FLINK_VERSION}-java21
+SHELL ["/bin/bash", "-c"]
+
+# Redeclare ARG variables after FROM to make them available in subsequent layers
+ARG ICEBERG_VERSION=1.10.1
+ARG ICEBERG_FLINK_RUNTIME_VERSION=2.0
+ARG ICEBERG_AWS_BUNDLE_VERSION=1.9.2
+ARG HADOOP_VERSION=3.3.4
+ARG FLINK_S3_VERSION=2.1.1
+
+# Switch back to flink user
+USER flink
+
+WORKDIR /opt/flink
+
+RUN echo "HADOOP_VERSION=${HADOOP_VERSION}" && \
+    echo "ICEBERG_VERSION=${ICEBERG_VERSION}"
+
+RUN echo "-> Install JARs: Dependencies for Iceberg" && \
+    mkdir -p ./lib/iceberg && pushd $_ && \
+    curl -fO https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-flink-runtime-${ICEBERG_FLINK_RUNTIME_VERSION}/${ICEBERG_VERSION}/iceberg-flink-runtime-${ICEBERG_FLINK_RUNTIME_VERSION}-${ICEBERG_VERSION}.jar && \
+    curl -fO https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-aws-bundle/${ICEBERG_AWS_BUNDLE_VERSION}/iceberg-aws-bundle-${ICEBERG_AWS_BUNDLE_VERSION}.jar && \
+    popd
+
+RUN echo "-> Install JARs: Hadoop" && \
+    mkdir -p ./lib/hadoop && pushd $_ && \
+    curl https://repo1.maven.org/maven2/org/apache/commons/commons-configuration2/2.1.1/commons-configuration2-2.1.1.jar -O && \
+    curl https://repo1.maven.org/maven2/commons-logging/commons-logging/1.1.3/commons-logging-1.1.3.jar -O && \
+    curl https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-auth/${HADOOP_VERSION}/hadoop-auth-${HADOOP_VERSION}.jar -O && \
+    curl https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-common/${HADOOP_VERSION}/hadoop-common-${HADOOP_VERSION}.jar -O && \
+    curl https://repo1.maven.org/maven2/org/apache/hadoop/thirdparty/hadoop-shaded-guava/1.1.1/hadoop-shaded-guava-1.1.1.jar -O && \
+    curl https://repo1.maven.org/maven2/org/codehaus/woodstox/stax2-api/4.2.1/stax2-api-4.2.1.jar -O && \
+    curl https://repo1.maven.org/maven2/com/fasterxml/woodstox/woodstox-core/5.3.0/woodstox-core-5.3.0.jar -O && \
+    curl https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-hdfs-client/${HADOOP_VERSION}/hadoop-hdfs-client-${HADOOP_VERSION}.jar -O && \
+    curl https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-mapreduce-client-core/${HADOOP_VERSION}/hadoop-mapreduce-client-core-${HADOOP_VERSION}.jar -O && \
+    popd

Review Comment:
   Do we have to use Hadoop in 2026? :) 
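
   If Flink itself needs S3 access (e.g. for checkpoints), one Hadoop-free sketch is enabling the Presto-based S3 filesystem that ships in the official image (exact JAR name/version assumed):

   ```dockerfile
   # enable Flink's bundled S3 filesystem via the plugin mechanism
   RUN mkdir -p ./plugins/s3-fs-presto && \
       cp ./opt/flink-s3-fs-presto-*.jar ./plugins/s3-fs-presto/
   ```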



##########
site/docs/flink-quickstart.md:
##########
@@ -0,0 +1,183 @@
+---
+title: "Flink and Iceberg Quickstart"
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+This guide will get you up and running with Apache Iceberg™ using Apache Flink™, including sample code to
+highlight some powerful features. You can learn more about Iceberg's Flink runtime by checking out the [Flink](docs/latest/flink.md) section.
+
+## Docker Compose
+
+The fastest way to get started is to use a Docker Compose file.
+To use this, you'll need to install the [Docker CLI](https://docs.docker.com/get-docker/) as well as the [Docker Compose CLI](https://github.com/docker/compose-cli/blob/main/INSTALL.md).
+
+Once you have those, save these two files into a new folder:
+
+* [`docker-compose.yml`](https://raw.githubusercontent.com/apache/iceberg/refs/heads/main/flink/v2.0/quickstart/docker-compose.yml)
+
+    This contains:
+
+    * A local Flink cluster (Job Manager and Task Manager)
+    * Iceberg REST Catalog
+    * MinIO (local S3 storage)
+    * AWS CLI (to create the S3 bucket)
+
+* [`Dockerfile.flink`](https://raw.githubusercontent.com/apache/iceberg/refs/heads/main/flink/v2.0/quickstart/Dockerfile.flink) - base Flink image, plus some required JARs for S3 and Iceberg.
+
+Next, start up the docker containers with this command:
+
+```sh
+docker compose up -d
+```
+
+Launch a Flink SQL client session:
+
+```sh
+docker compose exec -it jobmanager ./bin/sql-client.sh
+```
+
+## Creating an Iceberg Catalog in Flink
+
+Iceberg has several catalog backends that can be used to track tables, such as JDBC, Hive Metastore, and Glue.
+In this guide, we use a REST catalog backed by S3.
+To learn more, check out the [Catalog](docs/latest/flink-configuration.md#catalog-configuration) page in the Flink section.
+
+First up, we need to define a Flink catalog.
+Tables within this catalog will be stored in S3 blob storage:
+
+```sql
+CREATE CATALOG iceberg_catalog WITH (
+  'type'                 = 'iceberg',
+  'catalog-impl'         = 'org.apache.iceberg.rest.RESTCatalog',
+  'uri'                  = 'http://iceberg-rest:8181',
+  'warehouse'            = 's3://warehouse/',
+  'io-impl'              = 'org.apache.iceberg.aws.s3.S3FileIO',
+  's3.endpoint'          = 'http://minio:9000',
+  's3.access-key-id'     = 'admin',
+  's3.secret-access-key' = 'password',
+  's3.path-style-access' = 'true'
+);
+```
+
+Then make this the active catalog in your Flink SQL session:
+
+```sql
+USE CATALOG iceberg_catalog;
+```
+
+Create a database in the catalog:
+
+```sql
+CREATE DATABASE IF NOT EXISTS nyc;
+```
+
+and set it as active:
+
+```sql
+USE nyc;
+```

Review Comment:
   For brevity, and to avoid confusion, I would remove the switching of the default catalog/database and continue to use fully-qualified table names (like below).
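
   Untested sketch of the condensed flow, qualifying everything instead:

   ```sql
   CREATE DATABASE IF NOT EXISTS iceberg_catalog.nyc;

   CREATE TABLE iceberg_catalog.nyc.taxis
   (
       vendor_id BIGINT,
       trip_id BIGINT,
       trip_distance FLOAT,
       fare_amount DOUBLE,
       store_and_fwd_flag STRING
   );
   ```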



##########
site/docs/flink-quickstart.md:
##########
@@ -0,0 +1,183 @@
+---
+title: "Flink and Iceberg Quickstart"
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+This guide will get you up and running with Apache Iceberg™ using Apache Flink™, including sample code to
+highlight some powerful features. You can learn more about Iceberg's Flink runtime by checking out the [Flink](docs/latest/flink.md) section.
+
+## Docker Compose
+
+The fastest way to get started is to use a Docker Compose file.
+To use this, you'll need to install the [Docker CLI](https://docs.docker.com/get-docker/) as well as the [Docker Compose CLI](https://github.com/docker/compose-cli/blob/main/INSTALL.md).
+
+Once you have those, save these two files into a new folder:
+
+* [`docker-compose.yml`](https://raw.githubusercontent.com/apache/iceberg/refs/heads/main/flink/v2.0/quickstart/docker-compose.yml)
+
+    This contains:
+
+    * A local Flink cluster (Job Manager and Task Manager)
+    * Iceberg REST Catalog
+    * MinIO (local S3 storage)
+    * AWS CLI (to create the S3 bucket)
+
+* [`Dockerfile.flink`](https://raw.githubusercontent.com/apache/iceberg/refs/heads/main/flink/v2.0/quickstart/Dockerfile.flink) - base Flink image, plus some required JARs for S3 and Iceberg.
+
+Next, start up the docker containers with this command:
+
+```sh
+docker compose up -d
+```
+
+Launch a Flink SQL client session:
+
+```sh
+docker compose exec -it jobmanager ./bin/sql-client.sh
+```
+
+## Creating an Iceberg Catalog in Flink
+
+Iceberg has several catalog backends that can be used to track tables, such as JDBC, Hive Metastore, and Glue.
+In this guide, we use a REST catalog backed by S3.
+To learn more, check out the [Catalog](docs/latest/flink-configuration.md#catalog-configuration) page in the Flink section.
+
+First up, we need to define a Flink catalog.
+Tables within this catalog will be stored in S3 blob storage:
+
+```sql
+CREATE CATALOG iceberg_catalog WITH (
+  'type'                 = 'iceberg',
+  'catalog-impl'         = 'org.apache.iceberg.rest.RESTCatalog',
+  'uri'                  = 'http://iceberg-rest:8181',
+  'warehouse'            = 's3://warehouse/',
+  'io-impl'              = 'org.apache.iceberg.aws.s3.S3FileIO',
+  's3.endpoint'          = 'http://minio:9000',
+  's3.access-key-id'     = 'admin',
+  's3.secret-access-key' = 'password',
+  's3.path-style-access' = 'true'
+);
+```
+
+Then make this the active catalog in your Flink SQL session:
+
+```sql
+USE CATALOG iceberg_catalog;
+```
+
+Create a database in the catalog:
+
+```sql
+CREATE DATABASE IF NOT EXISTS nyc;
+```
+
+and set it as active:
+
+```sql
+USE nyc;
+```
+
+## Creating a Table
+
+To create your first Iceberg table in Flink, run a [`CREATE TABLE`](docs/latest/flink-ddl.md#create-table) command.
+Let's create a table using `iceberg_catalog.nyc.taxis`, where `iceberg_catalog` is the catalog name, `nyc` is the database name, and `taxis` is the table name.
+
+```sql
+CREATE TABLE iceberg_catalog.nyc.taxis
+(
+    vendor_id BIGINT,
+    trip_id BIGINT,
+    trip_distance FLOAT,
+    fare_amount DOUBLE,
+    store_and_fwd_flag STRING
+);
+```
+
+Iceberg catalogs support the full range of Flink SQL DDL commands, including:
+
+* [`CREATE TABLE ... PARTITIONED BY`](docs/latest/flink-ddl.md#partitioned-by)
+* [`ALTER TABLE`](docs/latest/flink-ddl.md#alter-table)
+* [`DROP TABLE`](docs/latest/flink-ddl.md#drop-table)
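+
+For example, a table property can be updated in place (a minimal sketch; the property value shown is illustrative):
+
+```sql
+ALTER TABLE iceberg_catalog.nyc.taxis SET ('write.format.default' = 'avro');
+```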
+
+## Writing Data to a Table
+
+Once your table is created, you can insert records.
+
+Flink uses checkpoints to ensure data durability and exactly-once semantics.
+Without checkpointing, Iceberg data and metadata may not be fully committed to storage.
+
+```sql
+SET 'execution.checkpointing.interval' = '10s';
+```
+
+Then you can write some data:
+
+```sql
+INSERT INTO iceberg_catalog.nyc.taxis
+VALUES (1, 1000371, 1.8, 15.32, 'N'), (2, 1000372, 2.5, 22.15, 'N'), (2, 1000373, 0.9, 9.01, 'N'), (1, 1000374, 8.4, 42.13, 'Y');
+```
+
+## Reading Data from a Table
+
+To read a table, use the Iceberg table's name:
+
+```sql
+SELECT * FROM iceberg_catalog.nyc.taxis;
+```
+
+## Creating a Table with Inline Catalog Configuration
+
+Creating a Flink catalog backed by an Iceberg catalog, as shown above, is one way to use Iceberg in Flink.
+Another way is to use the [Iceberg connector](docs/latest/flink-connector.md) and specify the Iceberg details as table properties:
+
+First, switch to the default catalog (otherwise the table would be created using the Iceberg details that we configured in the catalog definition above):
+
+```sql
+USE CATALOG default_catalog;
+```

Review Comment:
   I would prefer to avoid changing the default catalog; that would make these examples easier to read.
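
   E.g., the connector table could be created fully qualified in Flink's built-in catalog (untested sketch; property values mirror the catalog definition above, and the table name is hypothetical):

   ```sql
   CREATE TABLE default_catalog.default_database.taxis_from_connector
   (
       vendor_id BIGINT,
       trip_id BIGINT,
       trip_distance FLOAT,
       fare_amount DOUBLE,
       store_and_fwd_flag STRING
   ) WITH (
       'connector'            = 'iceberg',
       'catalog-name'         = 'iceberg_catalog',
       'catalog-impl'         = 'org.apache.iceberg.rest.RESTCatalog',
       'uri'                  = 'http://iceberg-rest:8181',
       'warehouse'            = 's3://warehouse/',
       'io-impl'              = 'org.apache.iceberg.aws.s3.S3FileIO',
       's3.endpoint'          = 'http://minio:9000',
       's3.access-key-id'     = 'admin',
       's3.secret-access-key' = 'password',
       's3.path-style-access' = 'true',
       'catalog-database'     = 'nyc',
       'catalog-table'        = 'taxis'
   );
   ```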



##########
flink/quickstart/overview.excalidraw.svg:
##########


Review Comment:
   Could we link this file from the docs page or remove it?
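
   E.g., an image reference near the top of `flink-quickstart.md` (path assumed; adjust to wherever the SVG lands):

   ```md
   ![Overview of the Flink and Iceberg quickstart stack](https://raw.githubusercontent.com/apache/iceberg/main/flink/quickstart/overview.excalidraw.svg)
   ```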



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

