azagrebin commented on a change in pull request #12131:
URL: https://github.com/apache/flink/pull/12131#discussion_r425096153



##########
File path: docs/ops/deployment/docker.md
##########
@@ -24,119 +24,476 @@ under the License.
 -->
 
 [Docker](https://www.docker.com) is a popular container runtime. 
-There are Docker images for Apache Flink available on Docker Hub which can be used to deploy a session cluster.
-The Flink repository also contains tooling to create container images to deploy a job cluster.
+There are Docker images for Apache Flink available [on Docker Hub](https://hub.docker.com/_/flink).
+You can use the Docker images to deploy a Session or Job cluster in a containerized environment, e.g.
+[standalone Kubernetes](kubernetes.html) or [native Kubernetes](native_kubernetes.html).
 
 * This will be replaced by the TOC
 {:toc}
 
-## Flink session cluster
-
-A Flink session cluster can be used to run multiple jobs. 
-Each job needs to be submitted to the cluster after it has been deployed. 
-
-### Docker images
+## Docker Hub Flink images
 
 The [Flink Docker repository](https://hub.docker.com/_/flink/) is hosted on
 Docker Hub and serves images of Flink version 1.2.1 and later.
 
-Images for each supported combination of Hadoop and Scala are available, and tag aliases are provided for convenience.
+### Image tags
 
-Beginning with Flink 1.5, image tags that omit a Hadoop version (e.g.
-`-hadoop28`) correspond to Hadoop-free releases of Flink that do not include a
-bundled Hadoop distribution.
+Images for each supported combination of Flink and Scala versions are available, and
+[tag aliases](https://hub.docker.com/_/flink?tab=tags) are provided for convenience.
 
-For example, the following aliases can be used: *(`1.5.y` indicates the latest
-release of Flink 1.5)*
+For example, you can use the following aliases: *(`1.11.y` indicates the latest release of Flink 1.11)*
 
 * `flink:latest` → `flink:<latest-flink>-scala_<latest-scala>`
-* `flink:1.5` → `flink:1.5.y-scala_2.11`
-* `flink:1.5-hadoop27` → `flink:1.5.y-hadoop27-scala_2.11`
+* `flink:1.11` → `flink:1.11.y-scala_2.11`
+
+<span class="label label-info">Note</span> Prio to Flink 1.5 version, Hadoop 
dependencies were always bundled with Flink.
+You can see that certain tags include the version of Hadoop, e.g. (e.g. 
`-hadoop28`).
+Beginning with Flink 1.5, image tags that omit the Hadoop version correspond 
to Hadoop-free releases of Flink
+that do not include a bundled Hadoop distribution.
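+
+For example, you can pull the image behind the `flink:1.11` alias shown above:
+
+```sh
+docker pull flink:1.11
+```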
+
+## How to run Flink image
+
+The Flink image contains a regular Flink distribution with its default configuration and a standard entry point script.
+You can run its standard entry point in the following modes:
+* Flink Master for [a Session cluster](#start-a-session-cluster)
+* Flink Master for [a Single Job cluster](#start-a-single-job-cluster)
+* TaskManager for any cluster
+
+This allows you to deploy a standalone cluster (Session or Single Job) in any containerized environment, for example:
+* manually in a local docker setup,
+* [in a Kubernetes cluster](kubernetes.html),
+* [with Docker Compose](#flink-with-docker-compose),
+* [with Docker swarm](#flink-with-docker-swarm).
+
+<span class="label label-info">Note</span> [The native 
Kubernetes](native_kubernetes.html) also runs the same image by default
+and deploys TaskManagers on demand so that you do not have to do it.
+
+The next chapters describe how to start a single Flink Docker container for various purposes.
+
+### Start a Session Cluster
+
+A Flink Session cluster can be used to run multiple jobs. Each job needs to be submitted to the cluster after it has been deployed.
+To deploy a Flink Session cluster with Docker, you need to start a Flink Master container:
+
+```sh
+docker run flink:{% if site.is_stable %}{{site.version}}-scala{{site.scala_version_suffix}}{% else %}latest{% endif %} jobmanager
+```
+
+and the required number of TaskManager containers:
+
+```sh
+docker run flink:{% if site.is_stable %}{{site.version}}-scala{{site.scala_version_suffix}}{% else %}latest{% endif %} taskmanager
+```
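+
+Note that the TaskManager containers need to reach the Flink Master. The following is a minimal local sketch, assuming illustrative container and network names: it wires the containers together with a user-defined Docker network and the `FLINK_PROPERTIES` variable described in [Configure options](#configure-options):
+
+```sh
+# create a network so that containers can resolve each other by name
+docker network create flink-network
+
+# start the Flink Master and publish the web UI port (8081 by default)
+docker run -d --name flink-master --network flink-network -p 8081:8081 \
+    --env FLINK_PROPERTIES="jobmanager.rpc.address: flink-master" \
+    flink:latest jobmanager
+
+# start a TaskManager that finds the master by its container name
+docker run -d --network flink-network \
+    --env FLINK_PROPERTIES="jobmanager.rpc.address: flink-master" \
+    flink:latest taskmanager
+```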
+
+### Start a Single Job Cluster
+
+A Flink Job cluster is a dedicated cluster which runs a single job.
+The job artifacts should already be available locally in the container and, thus, there is no extra job submission needed.
+To deploy a cluster for a single job with Docker, you need to
+* make the job artifacts available locally *in all containers* under `/opt/flink/usrlib`,
+* start a Flink Master container in the Single Job mode
+* and the required number of TaskManager containers.
+
+To make the **job artifacts available** locally in the container, you can
+
+* **either mount a volume** (or multiple volumes) with the artifacts to `/opt/flink/usrlib` when you start
+the Flink image as a Single Job Master and the required number of TaskManagers:
+
+    ```sh
+    docker run \
+        --mount type=bind,src=/host/path/to/job/artifacts1,target=/opt/flink/usrlib/artifacts/1 \
+        --mount type=bind,src=/host/path/to/job/artifacts2,target=/opt/flink/usrlib/artifacts/2 \
+        flink:{% if site.is_stable %}{{site.version}}-scala{{site.scala_version_suffix}}{% else %}latest{% endif %} standalone-job \
+        --job-classname com.job.ClassName \
+        --job-id <job id> \
+        [--fromSavepoint /path/to/savepoint [--allowNonRestoredState]] \
+        [job arguments]
+
+    docker run \
+        --mount type=bind,src=/host/path/to/job/artifacts1,target=/opt/flink/usrlib/artifacts/1 \
+        --mount type=bind,src=/host/path/to/job/artifacts2,target=/opt/flink/usrlib/artifacts/2 \
+        flink:{% if site.is_stable %}{{site.version}}-scala{{site.scala_version_suffix}}{% else %}latest{% endif %} taskmanager
+    ```
+
+* **or extend the Flink image** by writing a custom `Dockerfile`, build it and start the custom image as a Single Job Master
+and the required number of TaskManagers:
+
+    ```dockerfile
+    FROM flink
+    # note: ADD source paths are resolved relative to the docker build context
+    ADD path/to/job/artifacts/1 /opt/flink/usrlib/artifacts/1
+    ADD path/to/job/artifacts/2 /opt/flink/usrlib/artifacts/2
+    ```
+
+    ```sh
+    docker build -t flink_with_job_artifacts .
+    docker run \
+        flink_with_job_artifacts standalone-job \
+        --job-classname com.job.ClassName \
+        --job-id <job id> \
+        [--fromSavepoint /path/to/savepoint [--allowNonRestoredState]] \
+        [job arguments]
+
+    docker run flink_with_job_artifacts taskmanager
+    ```
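+
+As a rule of thumb, the volume-mount variant keeps the job artifacts on the host, so every container (the Flink Master and all TaskManagers) needs the same mounts, while the extended image is self-contained and can be pushed to a registry and reused as-is.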
+
+The `standalone-job` argument starts a Flink Master container in the Single Job mode.
+
+#### Flink Master additional command line arguments
+
+You can provide the following additional command line arguments to the cluster entrypoint:
+
+* `--job-classname <job class name>`: Class name of the job to run.
+
+  By default, Flink scans its class path for a JAR with a Main-Class or program-class manifest entry and chooses it as the job class.
+  Use this command line argument to manually set the job class.
+  This argument is required if none or more than one JAR with such a manifest entry is available on the class path.
+
+* `--job-id <job id>` (optional): Manually set a Flink job ID for the job (default: 00000000000000000000000000000000)
+
+* `--fromSavepoint /path/to/savepoint` (optional): Restore from a savepoint
 
-**Note:** The Docker images are provided as a community project by individuals
-on a best-effort basis. They are not official releases by the Apache Flink PMC.
+  In order to resume from a savepoint, you also need to pass the savepoint path.
+  Note that `/path/to/savepoint` needs to be accessible locally in the Docker container
+  (e.g. from a mounted volume, by adding it to the image, or by storing it on a DFS).
 
-## Flink job cluster
+* `--allowNonRestoredState` (optional): Skip broken savepoint state
 
-A Flink job cluster is a dedicated cluster which runs a single job. 
-The job is part of the image and, thus, there is no extra job submission needed.
+  Additionally, you can specify this argument to skip savepoint state that cannot be restored.
 
-### Docker images
+If the main function of the job's main class has arguments, you can also pass them at the end of the `docker run` command.
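+
+For example, putting it all together (the job class name, savepoint path, and the `--input` job argument are illustrative; the savepoint must be accessible inside the container, e.g. via a mounted volume):
+
+```sh
+docker run flink_with_job_artifacts standalone-job \
+    --job-classname com.job.ClassName \
+    --fromSavepoint /opt/flink/savepoints/my-savepoint \
+    --allowNonRestoredState \
+    --input /opt/flink/usrlib/artifacts/1/input.txt
+```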
 
-The Flink job cluster image needs to contain the user code jars of the job for which the cluster is started.
-Therefore, one needs to build a dedicated container image for every job.
-The `flink-container` module contains a `build.sh` script which can be used to create such an image.
-Please see the [instructions](https://github.com/apache/flink/blob/{{ site.github_branch }}/flink-container/docker/README.md) for more details.
+## Customize Flink image
+
+When you run the Flink containers, you often need to customize them.
+The next chapters describe some common customizations.
+
+### Configure options
+
+When you run the Flink image, you can also change its configuration options by setting the environment variable `FLINK_PROPERTIES`:
+
+```sh
+FLINK_PROPERTIES="jobmanager.rpc.address: host
+taskmanager.numberOfTaskSlots: 3
+blob.server.port: 6124
+"
+docker run --env FLINK_PROPERTIES="${FLINK_PROPERTIES}" flink:{% if site.is_stable %}{{site.version}}-scala{{site.scala_version_suffix}}{% else %}latest{% endif %} <jobmanager|standalone-job|taskmanager>
+```
+
+The environment variable `FLINK_PROPERTIES` should contain a list of Flink cluster configuration options separated by newlines,
+the same way as in `flink-conf.yaml`.
+The entry point script will override or add the listed options in `flink-conf.yaml` before starting the Flink process in the container.
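+
+To double-check the result, one option (the container name is illustrative; `/opt/flink` is the Flink home directory in the official image):
+
+```sh
+# inspect the effective configuration produced by the entry point
+docker exec flink-master cat /opt/flink/conf/flink-conf.yaml
+```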

Review comment:
       true, I actually forgot to remove this



