azagrebin commented on a change in pull request #12131: URL: https://github.com/apache/flink/pull/12131#discussion_r425777456
########## File path: docs/ops/deployment/docker.md ########## @@ -24,119 +24,490 @@ under the License. --> [Docker](https://www.docker.com) is a popular container runtime. -There are Docker images for Apache Flink available on Docker Hub which can be used to deploy a session cluster. -The Flink repository also contains tooling to create container images to deploy a job cluster. +There are Docker images for Apache Flink available [on Docker Hub](https://hub.docker.com/_/flink). +You can use the docker images to deploy a *Session* or *Job cluster* in a containerized environment, e.g., +[standalone Kubernetes](kubernetes.html) or [native Kubernetes](native_kubernetes.html). * This will be replaced by the TOC {:toc} -## Flink session cluster - -A Flink session cluster can be used to run multiple jobs. -Each job needs to be submitted to the cluster after it has been deployed. - -### Docker images +## Docker Hub Flink images The [Flink Docker repository](https://hub.docker.com/_/flink/) is hosted on Docker Hub and serves images of Flink version 1.2.1 and later. -Images for each supported combination of Hadoop and Scala are available, and tag aliases are provided for convenience. +### Image tags -Beginning with Flink 1.5, image tags that omit a Hadoop version (e.g. -`-hadoop28`) correspond to Hadoop-free releases of Flink that do not include a -bundled Hadoop distribution. +Images for each supported combination of Flink and Scala versions are available, and +[tag aliases](https://hub.docker.com/_/flink?tab=tags) are provided for convenience. 
-For example, the following aliases can be used: *(`1.5.y` indicates the latest -release of Flink 1.5)* +For example, you can use the following aliases: *(`1.11.y` indicates the latest release of Flink 1.11)* * `flink:latest` → `flink:<latest-flink>-scala_<latest-scala>` -* `flink:1.5` → `flink:1.5.y-scala_2.11` -* `flink:1.5-hadoop27` → `flink:1.5.y-hadoop27-scala_2.11` +* `flink:1.11` → `flink:1.11.y-scala_2.11` + +<span class="label label-info">Note</span> Prior to Flink 1.5, Hadoop dependencies were always bundled with Flink. +Certain tags therefore include the Hadoop version (e.g. `-hadoop28`). +Beginning with Flink 1.5, image tags that omit the Hadoop version correspond to Hadoop-free releases of Flink +that do not include a bundled Hadoop distribution. + +## How to run Flink image + +The Flink image contains a regular Flink distribution with its default configuration and a standard entry point script. +You can run its entry point in the following modes: +* *Flink Master* for [a Session cluster](#start-a-session-cluster) +* *Flink Master* for [a Single Job cluster](#start-a-single-job-cluster) +* *TaskManager* for any cluster + +This allows you to deploy a standalone cluster (Session or Single Job) in any containerized environment, for example: +* manually in a local docker setup, +* [in a Kubernetes cluster](kubernetes.html), +* [with Docker Compose](#flink-with-docker-compose), +* [with Docker swarm](#flink-with-docker-swarm). + +<span class="label label-info">Note</span> The [native Kubernetes deployment](native_kubernetes.html) also runs the same image by default +and deploys *TaskManagers* on demand so that you do not have to do it manually. + +The next sections describe how to start a single Flink docker container for various purposes. + +### Start a Session Cluster + +A *Flink Session cluster* can be used to run multiple jobs. Each job needs to be submitted to the cluster after it has been deployed.
+To deploy a *Flink Session cluster* with docker, you need to start a *Flink Master* container: + +```sh +docker run flink:{% if site.is_stable %}{{site.version}}-scala{{site.scala_version_suffix}}{% else %}latest{% endif %} jobmanager +``` + +and one or more *TaskManager* containers: + +```sh +docker run flink:{% if site.is_stable %}{{site.version}}-scala{{site.scala_version_suffix}}{% else %}latest{% endif %} taskmanager +``` + +### Start a Job Cluster + +A *Flink Job cluster* is a dedicated cluster which runs a single job. +In this case, you deploy the cluster together with the job in one step, so no extra job submission is needed. +Therefore, the *job artifacts* must already be available locally in the container. + +The *job artifacts* are included in the class path of Flink's JVM process within the container and consist of: +* your job jar, which you would normally submit to a *Session cluster*, and +* all other necessary dependencies or resources not included in the Flink distribution. + +To deploy a cluster for a single job with docker, you need to +* make the *job artifacts* available locally *in all containers* under `/opt/flink/usrlib`, +* start a *Flink Master* container in the *Job Cluster* mode, and +* start the required number of *TaskManager* containers.
+ +To make the **job artifacts available** locally in the container, you can + +* **either mount a volume** (or multiple volumes) with the artifacts to `/opt/flink/usrlib` when you start +the *Flink Master* and *TaskManagers*: + + ```sh + docker run \ + --mount type=bind,src=/host/path/to/job/artifacts1,target=/opt/flink/usrlib/artifacts1 \ + --mount type=bind,src=/host/path/to/job/artifacts2,target=/opt/flink/usrlib/artifacts2 \ + flink:{% if site.is_stable %}{{site.version}}-scala{{site.scala_version_suffix}}{% else %}latest{% endif %} standalone-job \ + --job-classname com.job.ClassName \ + --job-id <job id> \ + [--fromSavepoint /path/to/savepoint [--allowNonRestoredState]] \ + [job arguments] + + docker run \ + --mount type=bind,src=/host/path/to/job/artifacts1,target=/opt/flink/usrlib/artifacts1 \ + --mount type=bind,src=/host/path/to/job/artifacts2,target=/opt/flink/usrlib/artifacts2 \ + flink:{% if site.is_stable %}{{site.version}}-scala{{site.scala_version_suffix}}{% else %}latest{% endif %} taskmanager + ``` + +* **or extend the Flink image** by writing a custom `Dockerfile`, building it, and using it to start the *Flink Master* and *TaskManagers*: + + ```dockerfile + FROM flink + ADD /host/path/to/job/artifacts/1 /opt/flink/usrlib/artifacts/1 + ADD /host/path/to/job/artifacts/2 /opt/flink/usrlib/artifacts/2 + ``` + + ```sh + docker build -t flink_with_job_artifacts . + docker run \ + flink_with_job_artifacts standalone-job \ + --job-classname com.job.ClassName \ + --job-id <job id> \ + [--fromSavepoint /path/to/savepoint [--allowNonRestoredState]] \ + [job arguments] + + docker run flink_with_job_artifacts taskmanager + ``` + +The `standalone-job` argument starts a *Flink Master* container in the *Job Cluster* mode. + +#### Flink Master additional command line arguments + +You can provide the following additional command line arguments to the cluster entrypoint: + +* `--job-classname <job class name>`: Class name of the job to run.
+ + By default, Flink scans its class path for a JAR with a Main-Class or program-class manifest entry and chooses it as the job class. + Use this command line argument to manually set the job class. + This argument is required if no JAR, or more than one JAR, with such a manifest entry is available on the class path. + +* `--job-id <job id>` (optional): Manually set a Flink job ID for the job (default: 00000000000000000000000000000000) + +* `--fromSavepoint /path/to/savepoint` (optional): Restore from a savepoint + + In order to resume from a savepoint, you also need to pass the savepoint path. + Note that `/path/to/savepoint` needs to be accessible in all docker containers of the cluster + (e.g. by storing it on a DFS, mounting a volume, or adding it to the image). + +* `--allowNonRestoredState` (optional): Skip broken savepoint state + + You can specify this argument to skip savepoint state that cannot be restored. -**Note:** The Docker images are provided as a community project by individuals -on a best-effort basis. They are not official releases by the Apache Flink PMC. +If the main function of your job class accepts arguments, you can also pass them at the end of the `docker run` command. -## Flink job cluster +## Customize Flink image -A Flink job cluster is a dedicated cluster which runs a single job. -The job is part of the image and, thus, there is no extra job submission needed. +When you run the Flink containers, you may need to customize them. +The next sections describe common customizations. -### Docker images +### Configure options -The Flink job cluster image needs to contain the user code jars of the job for which the cluster is started. -Therefore, one needs to build a dedicated container image for every job. -The `flink-container` module contains a `build.sh` script which can be used to create such an image.
-Please see the [instructions](https://github.com/apache/flink/blob/{{ site.github_branch }}/flink-container/docker/README.md) for more details. +When you run the Flink image, you can also change its configuration options by setting the environment variable `FLINK_PROPERTIES`: + +```sh +FLINK_PROPERTIES="jobmanager.rpc.address: host +taskmanager.numberOfTaskSlots: 3 +blob.server.port: 6124 +" +docker run --env FLINK_PROPERTIES="${FLINK_PROPERTIES}" flink:{% if site.is_stable %}{{site.version}}-scala{{site.scala_version_suffix}}{% else %}latest{% endif %} <jobmanager|standalone-job|taskmanager> +``` + +The environment variable `FLINK_PROPERTIES` should contain a list of Flink cluster configuration options separated by newlines, +the same way as in `flink-conf.yaml`. + +### Provide custom configuration + +The configuration files (`flink-conf.yaml`, logging settings, hosts, etc.) are located in the `/opt/flink/conf` directory in the Flink image. +To provide custom Flink configuration files, you can + +* **either mount a volume** with the custom configuration files to the path `/opt/flink/conf` when you run the Flink image: + + ```sh + docker run \ + --mount type=bind,src=/host/path/to/custom/conf,target=/opt/flink/conf \ + flink:{% if site.is_stable %}{{site.version}}-scala{{site.scala_version_suffix}}{% else %}latest{% endif %} <jobmanager|standalone-job|taskmanager> + ``` + +* or add them to a **custom Flink image**, then build and run it: + + ```dockerfile + FROM flink + ADD /host/path/to/flink-conf.yaml /opt/flink/conf/flink-conf.yaml + ADD /host/path/to/log4j.properties /opt/flink/conf/log4j.properties + ``` + +<span class="label label-warning">Warning!</span> The mounted volume must contain all necessary configuration files. +The `flink-conf.yaml` file must have write permission so that the docker entry point script can modify it in certain cases.
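To illustrate the warning above, here is a minimal sketch of preparing a host directory for mounting at `/opt/flink/conf`. The file contents and paths are assumptions for illustration; in practice, start from the full set of default configuration files, e.g. copied out of the image with `docker cp`:

```sh
# Prepare a configuration directory on the host that will be mounted
# to /opt/flink/conf. It must hold ALL configuration files, and
# flink-conf.yaml must stay writable for the entry point script.
mkdir -p ./conf
cat > ./conf/flink-conf.yaml <<'EOF'
jobmanager.rpc.address: jobmanager
taskmanager.numberOfTaskSlots: 2
EOF
touch ./conf/log4j.properties          # logging configuration must be present too
chmod u+w ./conf/flink-conf.yaml       # keep it modifiable by the entry point script

# then run, for example:
#   docker run --mount type=bind,src=$(pwd)/conf,target=/opt/flink/conf \
#       flink:latest jobmanager
ls ./conf
```

Mounting only a single file instead of the whole directory would hide the remaining default files, which is exactly what the warning cautions against.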
+ +### Using plugins -## Using plugins As described in the [plugins]({{ site.baseurl }}/ops/plugins.html) documentation page: in order to use plugins they must be copied to the correct location in the Flink installation for them to work. -When running Flink from one of the provided Docker images by default no plugins have been activated. -The simplest way to enable plugins is to modify the provided official Flink docker images by adding -an additional layer. This does however assume you have a docker registry available where you can push images to and -that is accessible by your cluster. +If you want to enable plugins provided with Flink, you can pass the environment variable `ENABLE_BUILT_IN_PLUGINS` +when you run the Flink image. +The `ENABLE_BUILT_IN_PLUGINS` variable should contain a list of plugin jar file names separated by `;`. -As an example assume you want to enable the [S3]({{ site.baseurl }}/ops/filesystems/s3.html) plugins in your installation. + ```sh + docker run \ + --env ENABLE_BUILT_IN_PLUGINS="flink-plugin1.jar;flink-plugin2.jar" \ + flink:{% if site.is_stable %}{{site.version}}-scala{{site.scala_version_suffix}}{% else %}latest{% endif %} <jobmanager|standalone-job|taskmanager> + ``` -Create a Dockerfile with a content something like this: {% highlight dockerfile %} -# On which specific version of Flink is this based? -# Check https://hub.docker.com/_/flink?tab=tags for current options FROM flink:{{ site.version }}-scala_2.12 +Alternatively, you can use other [advanced methods](#advanced-customization).
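Concretely, the jar file names have to match files actually shipped in the Flink image, including their version suffix. The S3 plugin names below are hypothetical examples; check the jars available under `/opt/flink/opt` in your image:

```sh
# Semicolon-separated list of plugin jar files to enable. The file names
# below are assumptions for illustration; they must match real jars
# shipped under /opt/flink/opt of the image, including the version suffix.
ENABLE_BUILT_IN_PLUGINS="flink-s3-fs-presto-1.11.0.jar;flink-s3-fs-hadoop-1.11.0.jar"

# Quote the value when passing it, otherwise the shell treats ';' as a
# command separator:
#   docker run --env ENABLE_BUILT_IN_PLUGINS="${ENABLE_BUILT_IN_PLUGINS}" \
#       flink:latest <jobmanager|standalone-job|taskmanager>
printf '%s\n' "${ENABLE_BUILT_IN_PLUGINS}" | tr ';' '\n'
```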
-# Install Flink S3 FS Presto plugin -RUN mkdir /opt/flink/plugins/s3-fs-presto && cp /opt/flink/opt/flink-s3-fs-presto* /opt/flink/plugins/s3-fs-presto +### Advanced customization -# Install Flink S3 FS Hadoop plugin -RUN mkdir /opt/flink/plugins/s3-fs-hadoop && cp /opt/flink/opt/flink-s3-fs-hadoop* /opt/flink/plugins/s3-fs-hadoop -{% endhighlight %} +If you want to further customize the Flink image, for example, for the following purposes: -Then build and push that image to your registry -{% highlight bash %} -docker build -t docker.example.nl/flink:{{ site.version }}-scala_2.12-s3 . -docker push docker.example.nl/flink:{{ site.version }}-scala_2.12-s3 -{% endhighlight %} +* install custom software (e.g. python) +* enable (symlink) optional libraries or plugins from `/opt/flink/opt` into `/opt/flink/lib` or `/opt/flink/plugins` +* add other libraries to `/opt/flink/lib` (e.g. [hadoop](hadoop.html#adding-hadoop-to-lib)) +* add other plugins to `/opt/flink/plugins` +* override configuration files -Now you can reference this image in your cluster deployment and the installed plugins are available for use. +you can achieve this in several ways: -## Flink with Docker Compose +* **override the container entry point** with a custom script where you can run any bootstrap actions. +At the end you can call the standard `/docker-entrypoint.sh` script of the Flink image with its usual arguments. + + The following example creates a custom entry point script which enables more libraries and plugins. + The custom script, custom library and plugin are provided from a mounted volume. + Then it runs the standard entry point script of the Flink image: + + ```sh + # create custom_lib.jar + # create custom_plugin.jar + + echo " + ln -fs /opt/flink/opt/flink-queryable-state-runtime-*.jar /opt/flink/lib/. # enable an optional library + ln -fs /mnt/custom_lib.jar /opt/flink/lib/. 
# enable a custom library + + mkdir -p /opt/flink/plugins/flink-s3-fs-hadoop + ln -fs /opt/flink/opt/flink-s3-fs-hadoop-*.jar /opt/flink/plugins/flink-s3-fs-hadoop/. # enable an optional plugin + + mkdir -p /opt/flink/plugins/custom_plugin + ln -fs /mnt/custom_plugin.jar /opt/flink/plugins/custom_plugin/. # enable a custom plugin + + /docker-entrypoint.sh <jobmanager|standalone-job|taskmanager> + " > custom_entry_point_script.sh + + chmod 755 custom_entry_point_script.sh -[Docker Compose](https://docs.docker.com/compose/) is a convenient way to run a -group of Docker containers locally. + docker run \ + --mount type=bind,src=$(pwd),target=/mnt \ + flink:{% if site.is_stable %}{{site.version}}-scala{{site.scala_version_suffix}}{% else %}latest{% endif %} /mnt/custom_entry_point_script.sh + ``` -Example config files for a [session cluster](https://github.com/docker-flink/examples/blob/master/docker-compose.yml) and a [job cluster](https://github.com/apache/flink/blob/{{ site.github_branch }}/flink-container/docker/docker-compose.yml) -are available on GitHub. +* **extend the Flink image** by writing a custom `Dockerfile` and building a custom image: + + ```dockerfile + FROM flink + + RUN set -ex; apt-get update; apt-get -y install python + + ADD /host/path/to/flink-conf.yaml /container/local/path/to/custom/conf/flink-conf.yaml + ADD /host/path/to/log4j.properties /container/local/path/to/custom/conf/log4j.properties + + RUN ln -fs /opt/flink/opt/flink-queryable-state-runtime-*.jar /opt/flink/lib/. + + RUN mkdir -p /opt/flink/plugins/flink-s3-fs-hadoop + RUN ln -fs /opt/flink/opt/flink-s3-fs-hadoop-*.jar /opt/flink/plugins/flink-s3-fs-hadoop/. + + ENV VAR_NAME value + ``` + + ```sh + docker build -t custom_flink_image . + # optionally push to your docker image registry, if you have one, + # e.g.
to distribute the custom image to your cluster + docker push custom_flink_image + ``` + +{% top %} + +## Flink with Docker Compose + +[Docker Compose](https://docs.docker.com/compose/) is a way to run a group of Docker containers locally. +The next sections show examples of configuration files for running Flink. ### Usage +* Create the `yaml` files with the container configuration; see the examples for: + * [Session cluster](#session-cluster-with-docker-compose) + * [Single Job](#single-job-cluster-with-docker-compose) + + See also [the Flink docker image tags](#image-tags) and [how to customize the Flink docker image](#advanced-customization) + for use in the configuration files. + * Launch a cluster in the foreground - docker-compose up + ```sh + docker-compose up + ``` * Launch a cluster in the background - docker-compose up -d + ```sh + docker-compose up -d + ``` -* Scale the cluster up or down to *N* TaskManagers +* Scale the cluster up or down to *N TaskManagers* - docker-compose scale taskmanager=<N> + ```sh + docker-compose scale taskmanager=<N> + ``` -* Kill the cluster +* Access the *Flink Master* container - docker-compose kill + ```sh + docker exec -it $(docker ps --filter name=jobmanager --format={{.ID}}) /bin/sh + ``` -When the cluster is running, you can visit the web UI at [http://localhost:8081](http://localhost:8081). -You can also use the web UI to submit a job to a session cluster. +* Kill the cluster -To submit a job to a session cluster via the command line, you must copy the JAR to the JobManager -container and submit the job from there. + ```sh + docker-compose kill + ``` + +* Access Web UI + + When the cluster is running, you can visit the web UI at [http://localhost:8081](http://localhost:8081). + You can also use the web UI to submit a job to a *Session cluster*. + +* To submit a job to a *Session cluster* via the command line, you must copy the JAR to the *Flink Master* container and +submit the job from there.
For example: + + ```sh + JOB_CLASS_NAME="com.job.ClassName" + MASTER_CONTAINER=$(docker ps --filter name=jobmanager --format={{.ID}}) + docker cp path/to/jar "${MASTER_CONTAINER}":/job.jar + docker exec -t -i "${MASTER_CONTAINER}" flink run -d -c ${JOB_CLASS_NAME} /job.jar + ``` + +### Session Cluster with Docker Compose + +```yaml +version: "2.2" +services: + jobmanager: + image: flink:{% if site.is_stable %}{{site.version}}-scala{{site.scala_version_suffix}}{% else %}latest{% endif %} + ports: + - "8081:8081" + command: jobmanager + environment: + - | + FLINK_PROPERTIES= + jobmanager.rpc.address: jobmanager + taskmanager.numberOfTaskSlots: 2 Review comment: afaik, it only matters for active integrations atm, not for standalone, but it does not hurt and who knows the future. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org