SaketaChalamchala commented on code in PR #206:
URL: https://github.com/apache/ozone-site/pull/206#discussion_r2748065293

##########
docs/04-user-guide/03-integrations/09-flink.md:
##########
@@ -0,0 +1,196 @@
+---
+sidebar_label: Flink
+---
+
+# Apache Flink
+
+[Apache Flink](https://flink.apache.org/) is a powerful, open-source distributed processing framework designed for stateful computations over both bounded and unbounded data streams at any scale. It enables high-throughput, low-latency, and fault-tolerant processing while offering elastic scaling capabilities to handle millions of events per second across thousands of cores.
+
+Apache Flink can use Apache Ozone for reading and writing data, and for storing essential operational components like application state checkpoints and savepoints.
+
+## Quickstart
+
+This tutorial shows how to get started with connecting Apache Flink to Apache Ozone using the S3 Gateway, with Docker Compose.
+
+First, obtain Ozone's sample Docker Compose configuration and save it as `docker-compose.yaml`:
+
+```bash
+curl -O https://raw.githubusercontent.com/apache/ozone-docker/refs/heads/latest/docker-compose.yaml
+```
+
+Refer to the [Docker quick start page](../../02-quick-start/01-installation/01-docker.md) for details.
+
+## Assumptions
+
+- Flink accesses Ozone through S3 Gateway instead of ofs.
+- Ozone S3G listens on port 9878
+- Ozone S3G enables path style access.
+- Ozone S3G does not enable security, therefore any S3 access key and secret key is accepted.
+- Flink Docker image tag `flink:scala_2.12-java17`
+
+## Step 1 — Create `docker-compose-flink.yml` for Flink
+
+```yaml
+services:
+  jobmanager:
+    image: flink:scala_2.12-java17
+    command: jobmanager
+    ports:
+      - "8081:8081"
+    environment:
+      AWS_ACCESS_KEY_ID: ozone
+      AWS_SECRET_ACCESS_KEY: ozone
+      FLINK_PROPERTIES: |
+        jobmanager.rpc.address: jobmanager
+        fs.s3a.endpoint: http://s3g:9878
+        fs.s3a.path.style.access: true
+        fs.s3a.connection.ssl.enabled: false
+        fs.s3a.access.key: ozone
+        fs.s3a.secret.key: ozone
+
+  taskmanager:
+    image: flink:scala_2.12-java17
+    command: taskmanager
+    depends_on:
+      - jobmanager
+    environment:
+      AWS_ACCESS_KEY_ID: ozone
+      AWS_SECRET_ACCESS_KEY: ozone
+      FLINK_PROPERTIES: |
+        jobmanager.rpc.address: jobmanager
+        taskmanager.numberOfTaskSlots: 4
+        fs.s3a.endpoint: http://s3g:9878
+        fs.s3a.path.style.access: true
+        fs.s3a.connection.ssl.enabled: false
+        fs.s3a.access.key: ozone
+        fs.s3a.secret.key: ozone
+```
+
+## Step 2 — Start Flink and Ozone together
+
+With both `docker-compose.yaml` (for Ozone) and `docker-compose-flink.yml` (for Flink) in the same directory,
+you can start both services together, sharing the same network, using:
+
+```bash
+export COMPOSE_FILE=docker-compose.yaml:docker-compose-flink.yml
+docker compose up -d
+```
+
+Verify containers are running:
+
+```bash
+docker ps
+```
+
+## Step 3 — Create an Ozone bucket
+
+You need to connect to Ozone (for example, `s3g`) to create a OBS bucket:
+
+```bash
+docker compose exec -it s3g ozone sh bucket create s3v/bucket1 -l obs
+```
+
+## Step 4 — Copy the Flink S3 filesystem plugin
+
+The official Flink Docker image does not enable S3 by default.
+You must copy the plugin JAR into both JobManager and TaskManager.
+
+Copy into JobManager
+
+```bash
+docker compose exec -it jobmanager bash -lc \
+  "mkdir -p /opt/flink/plugins/s3-fs-hadoop && \\
+  cp /opt/flink/opt/flink-s3-fs-hadoop-*.jar /opt/flink/plugins/s3-fs-hadoop/"
+```
+
+Copy into TaskManager
+
+```bash
+docker compose exec -it taskmanager bash -lc \
+  "mkdir -p /opt/flink/plugins/s3-fs-hadoop && \\
+  cp /opt/flink/opt/flink-s3-fs-hadoop-*.jar /opt/flink/plugins/s3-fs-hadoop/"
+```
+
+Verify:
+
+```bash
+docker compose exec -it jobmanager ls /opt/flink/plugins/s3-fs-hadoop
+docker compose exec -it taskmanager ls /opt/flink/plugins/s3-fs-hadoop
+```
+
+## Step 5 — Restart Flink containers (required)
+
+Plugins are loaded only at startup.
+
+```bash
+docker compose restart jobmanager taskmanager
+```
+
+## Step 6 — Start Flink SQL client
+
+```bash
+docker compose exec -it jobmanager ./bin/sql-client.sh
+```
+
+You should now be in:
+
+```text
+Flink SQL>
+```
+
+## Step 7 — Create a table backed by Ozone S3

Review Comment:
```suggestion
### Step 6 — Create and Query a table backed by Ozone S3
```
