mxm commented on code in PR #15062: URL: https://github.com/apache/iceberg/pull/15062#discussion_r2822175432
##########
site/docs/flink-quickstart.md:
##########
@@ -0,0 +1,174 @@
+---
+title: "Flink and Iceberg Quickstart"
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements. See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License. You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+This guide will get you up and running with Apache Iceberg™ using Apache Flink™, including sample code to
+highlight some powerful features. You can learn more about Iceberg's Flink runtime by checking out the [Flink](docs/latest/flink.md) section.
+
+## Quickstart environment
+
+The fastest way to get started is to use Docker Compose with the [Iceberg Flink Quickstart](https://github.com/apache/iceberg/tree/main/docker/iceberg-flink-quickstart) image.
+
+To use this, you'll need to install the [Docker CLI](https://docs.docker.com/get-docker/).
+
+The quickstart includes:
+
+* A local Flink cluster (Job Manager and Task Manager)
+* Iceberg REST Catalog
+* MinIO (local S3 storage)
+
+Clone the Iceberg repository and start up the Docker containers:
+
+```sh
+git clone https://github.com/apache/iceberg.git
+cd iceberg
+docker compose -f docker/iceberg-flink-quickstart/docker-compose.yml up -d --build
+```
+
+Launch a Flink SQL client session:
+
+```sh
+docker exec -it jobmanager ./bin/sql-client.sh
+```
+
+## Creating an Iceberg Catalog in Flink
+
+Iceberg has several catalog back-ends that can be used to track tables, like JDBC, Hive MetaStore and Glue.
+In this guide we use a REST catalog, backed by S3.
+To learn more, check out the [Catalog](docs/latest/flink-configuration.md#catalog-configuration) page in the Flink section.
+
+First up, we need to define a Flink catalog.
+Tables within this catalog will be stored on S3 blob store:
+
+```sql
+CREATE CATALOG iceberg_catalog WITH (
+  'type' = 'iceberg',
+  'catalog-impl' = 'org.apache.iceberg.rest.RESTCatalog',
+  'uri' = 'http://iceberg-rest:8181',
+  'warehouse' = 's3://warehouse/',
+  'io-impl' = 'org.apache.iceberg.aws.s3.S3FileIO',

Review Comment:
   Created an issue so we don't forget: https://github.com/apache/iceberg/issues/15352

##########
site/docs/flink-quickstart.md:
##########
@@ -0,0 +1,174 @@
+```sql
+CREATE CATALOG iceberg_catalog WITH (
+  'type' = 'iceberg',
+  'catalog-impl' = 'org.apache.iceberg.rest.RESTCatalog',
+  'uri' = 'http://iceberg-rest:8181',
+  'warehouse' = 's3://warehouse/',
+  'io-impl' = 'org.apache.iceberg.aws.s3.S3FileIO',

Review Comment:
   The only strange thing about this is that Flink has its own IO, which is usually already configured. I think it would be good to support Flink IO out of the box, so we don't need to configure Iceberg IO here. This isn't something for this PR, but something to think about.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
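As a hypothetical sketch of what the reviewer is suggesting: if Iceberg could delegate to Flink's already-configured filesystem IO by default, the quickstart's catalog definition could drop the `io-impl` property entirely. This is not a working configuration today; the ability to omit `io-impl` here is exactly the feature being proposed.

```sql
-- Hypothetical: assumes a future Iceberg release where Flink's own
-- FileSystem IO is picked up automatically, so no explicit 'io-impl'
-- needs to be configured for the S3-backed warehouse.
CREATE CATALOG iceberg_catalog WITH (
  'type' = 'iceberg',
  'catalog-impl' = 'org.apache.iceberg.rest.RESTCatalog',
  'uri' = 'http://iceberg-rest:8181',
  'warehouse' = 's3://warehouse/'
);
```

This would match how other Flink connectors behave, where storage access is inherited from the cluster's filesystem configuration rather than repeated per catalog.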
