tysonjh commented on a change in pull request #13420: URL: https://github.com/apache/beam/pull/13420#discussion_r532768803
########## File path: website/www/site/content/en/documentation/runtime/environments.md ########## @@ -15,56 +15,168 @@ See the License for the specific language governing permissions and limitations under the License. --> -# Container environments +# Container Environments -The Beam SDK runtime environment is isolated from other runtime systems because the SDK runtime environment is [containerized](https://s.apache.org/beam-fn-api-container-contract) with [Docker](https://www.docker.com/). This means that any execution engine can run the Beam SDK. +The Beam SDK runtime environment is [containerized](https://www.docker.com/resources/what-container) with [Docker](https://www.docker.com/) to isolate it from other runtime systems. This means any execution engine can run the Beam SDK. To learn more about the container environment, read the Beam [SDK Harness container contract](https://s.apache.org/beam-fn-api-container-contract). -This page describes how to customize, build, and push Beam SDK container images. +Prebuilt SDK container images are released per supported language version during Beam releases and and pushed to [Docker Hub](https://hub.docker.com/search?q=apache%2Fbeam&type=image) Review comment: ```suggestion Prebuilt SDK container images are released per supported language during Beam releases and are pushed to [Docker Hub](https://hub.docker.com/search?q=apache%2Fbeam&type=image) ``` ########## File path: website/www/site/content/en/documentation/runtime/environments.md ########## @@ -32,9 +32,10 @@ Users may want to customize container images for many reasons, including: This guide describes how to create and use customized containers for the Beam SDK. ### Prerequisites -You will need to have [Docker installed](https://docs.docker.com/get-docker/). -In addition, you will need to have a container registry accessible by your execution engine or runner to host a custom container image. 
Options include [Docker Hub](https://hub.docker.com/) or a "self-hosted" repository, including cloud-specific container registries. +* You will need to have a version of the Beam SDK >= 2.21.0. Review comment: Is this true for all of Beam, or just for Dataflow? If it's only for Dataflow it should be updated to reflect that. ########## File path: website/www/site/content/en/documentation/runtime/environments.md ########## @@ -15,56 +15,168 @@ See the License for the specific language governing permissions and limitations under the License. --> -# Container environments +# Container Environments -The Beam SDK runtime environment is isolated from other runtime systems because the SDK runtime environment is [containerized](https://s.apache.org/beam-fn-api-container-contract) with [Docker](https://www.docker.com/). This means that any execution engine can run the Beam SDK. +The Beam SDK runtime environment is [containerized](https://www.docker.com/resources/what-container) with [Docker](https://www.docker.com/) to isolate it from other runtime systems. This means any execution engine can run the Beam SDK. To learn more about the container environment, read the Beam [SDK Harness container contract](https://s.apache.org/beam-fn-api-container-contract). -This page describes how to customize, build, and push Beam SDK container images. +Prebuilt SDK container images are released per supported language version during Beam releases and and pushed to [Docker Hub](https://hub.docker.com/search?q=apache%2Fbeam&type=image) -Before you begin, install [Docker](https://www.docker.com/) on your workstation. +## Custom Containers -## Customizing container images +Users may want to customize container images for many reasons, including: -You can add extra dependencies to container images so that you don't have to supply the dependencies to execution engines. 
+* pre-installing additional dependencies, +* launching third-party software +* further customizing the execution environment Review comment: ```suggestion * Pre-installing additional dependencies. * Launching third-party software. * Further customizing the execution environment. ``` ########## File path: website/www/site/content/en/documentation/runtime/environments.md ########## @@ -15,56 +15,168 @@ See the License for the specific language governing permissions and limitations under the License. --> -# Container environments +# Container Environments -The Beam SDK runtime environment is isolated from other runtime systems because the SDK runtime environment is [containerized](https://s.apache.org/beam-fn-api-container-contract) with [Docker](https://www.docker.com/). This means that any execution engine can run the Beam SDK. +The Beam SDK runtime environment is [containerized](https://www.docker.com/resources/what-container) with [Docker](https://www.docker.com/) to isolate it from other runtime systems. This means any execution engine can run the Beam SDK. To learn more about the container environment, read the Beam [SDK Harness container contract](https://s.apache.org/beam-fn-api-container-contract). -This page describes how to customize, build, and push Beam SDK container images. +Prebuilt SDK container images are released per supported language version during Beam releases and and pushed to [Docker Hub](https://hub.docker.com/search?q=apache%2Fbeam&type=image) -Before you begin, install [Docker](https://www.docker.com/) on your workstation. +## Custom Containers -## Customizing container images +Users may want to customize container images for many reasons, including: -You can add extra dependencies to container images so that you don't have to supply the dependencies to execution engines. 
+* pre-installing additional dependencies, +* launching third-party software +* further customizing the execution environment -To customize a container image, either: -* [Write a new](#writing-new-dockerfiles) [Dockerfile](https://docs.docker.com/engine/reference/builder/) on top of the original. -* [Modify](#modifying-dockerfiles) the [original Dockerfile](https://github.com/apache/beam/blob/master/sdks/python/container/Dockerfile) and reimage the container. + This guide describes how to create and use customized containers for the Beam SDK. -It's often easier to write a new Dockerfile. However, by modifying the original Dockerfile, you can customize anything (including the base OS). +### Prerequisites -### Writing new Dockerfiles on top of the original {#writing-new-dockerfiles} +* You will need to have a version of the Beam SDK >= 2.21.0. +* You will need to have [Docker installed](https://docs.docker.com/get-docker/). +* You will need to have a container registry accessible by your execution engine or runner to host a custom container image. Options include [Docker Hub](https://hub.docker.com/) or a "self-hosted" repository, including cloud-specific container registries. -1. Pull a [prebuilt SDK container image](https://hub.docker.com/search?q=apache%2Fbeam&type=image) for your [target](https://docs.docker.com/docker-hub/repos/#searching-for-repositories) language and version. The following example pulls the latest Python SDK: +> **NOTE**: On Nov 20, 2020, Docker Hub put [rate limits](https://www.docker.com/increase-rate-limits) into effect for anonymous and free authenticated use, which may impact larger pipelines that pull containers several times. + +### Building and pushing custom containers + +Beam builds prebuilt images from [Dockerfiles](https://docs.docker.com/engine/reference/builder/). 
Users can build customized containers in one of two ways: Review comment: Maybe rephrase to: Beam [SDK container images](https://hub.docker.com/search?q=apache%2Fbeam&type=image) are built from Dockerfiles checked into the GitHub repository and published to Docker Hub with every release. ########## File path: website/www/site/content/en/documentation/runtime/environments.md ########## @@ -15,56 +15,168 @@ See the License for the specific language governing permissions and limitations under the License. --> -# Container environments +# Container Environments -The Beam SDK runtime environment is isolated from other runtime systems because the SDK runtime environment is [containerized](https://s.apache.org/beam-fn-api-container-contract) with [Docker](https://www.docker.com/). This means that any execution engine can run the Beam SDK. +The Beam SDK runtime environment is [containerized](https://www.docker.com/resources/what-container) with [Docker](https://www.docker.com/) to isolate it from other runtime systems. This means any execution engine can run the Beam SDK. To learn more about the container environment, read the Beam [SDK Harness container contract](https://s.apache.org/beam-fn-api-container-contract). -This page describes how to customize, build, and push Beam SDK container images. +Prebuilt SDK container images are released per supported language version during Beam releases and and pushed to [Docker Hub](https://hub.docker.com/search?q=apache%2Fbeam&type=image) -Before you begin, install [Docker](https://www.docker.com/) on your workstation. +## Custom Containers -## Customizing container images +Users may want to customize container images for many reasons, including: -You can add extra dependencies to container images so that you don't have to supply the dependencies to execution engines. 
+* pre-installing additional dependencies, +* launching third-party software +* further customizing the execution environment -To customize a container image, either: -* [Write a new](#writing-new-dockerfiles) [Dockerfile](https://docs.docker.com/engine/reference/builder/) on top of the original. -* [Modify](#modifying-dockerfiles) the [original Dockerfile](https://github.com/apache/beam/blob/master/sdks/python/container/Dockerfile) and reimage the container. + This guide describes how to create and use customized containers for the Beam SDK. -It's often easier to write a new Dockerfile. However, by modifying the original Dockerfile, you can customize anything (including the base OS). +### Prerequisites -### Writing new Dockerfiles on top of the original {#writing-new-dockerfiles} +* You will need to have a version of the Beam SDK >= 2.21.0. +* You will need to have [Docker installed](https://docs.docker.com/get-docker/). +* You will need to have a container registry accessible by your execution engine or runner to host a custom container image. Options include [Docker Hub](https://hub.docker.com/) or a "self-hosted" repository, including cloud-specific container registries. -1. Pull a [prebuilt SDK container image](https://hub.docker.com/search?q=apache%2Fbeam&type=image) for your [target](https://docs.docker.com/docker-hub/repos/#searching-for-repositories) language and version. The following example pulls the latest Python SDK: +> **NOTE**: On Nov 20, 2020, Docker Hub put [rate limits](https://www.docker.com/increase-rate-limits) into effect for anonymous and free authenticated use, which may impact larger pipelines that pull containers several times. + +### Building and pushing custom containers + +Beam builds prebuilt images from [Dockerfiles](https://docs.docker.com/engine/reference/builder/). Users can build customized containers in one of two ways: + +1. **[Writing a new](#writing-new-dockerfiles) Dockerfile based on an existing prebuilt container**. 
This is sufficient for simple additions to the image, such as adding artifacts or environment variables. +2. **[Modifying](#modifying-dockerfiles) an existing Dockerfile in [Beam source](https://github.com/apache/beam)**. This method requires building from Beam source but allows for greater customization of the container (including replacement of artifacts or base OS/language versions). + +#### Writing new Dockerfiles on top of the original {#writing-new-dockerfiles} Review comment: This should match the heading mentioned above. ```suggestion #### Writing a new Dockerfile based on an existing published container image {#writing-new-dockerfiles} ``` ########## File path: website/www/site/content/en/documentation/runtime/environments.md ########## @@ -15,56 +15,168 @@ See the License for the specific language governing permissions and limitations under the License. --> -# Container environments +# Container Environments -The Beam SDK runtime environment is isolated from other runtime systems because the SDK runtime environment is [containerized](https://s.apache.org/beam-fn-api-container-contract) with [Docker](https://www.docker.com/). This means that any execution engine can run the Beam SDK. +The Beam SDK runtime environment is [containerized](https://www.docker.com/resources/what-container) with [Docker](https://www.docker.com/) to isolate it from other runtime systems. This means any execution engine can run the Beam SDK. To learn more about the container environment, read the Beam [SDK Harness container contract](https://s.apache.org/beam-fn-api-container-contract). -This page describes how to customize, build, and push Beam SDK container images. +Prebuilt SDK container images are released per supported language version during Beam releases and and pushed to [Docker Hub](https://hub.docker.com/search?q=apache%2Fbeam&type=image) -Before you begin, install [Docker](https://www.docker.com/) on your workstation. 
+## Custom Containers -## Customizing container images +Users may want to customize container images for many reasons, including: -You can add extra dependencies to container images so that you don't have to supply the dependencies to execution engines. +* pre-installing additional dependencies, +* launching third-party software +* further customizing the execution environment -To customize a container image, either: -* [Write a new](#writing-new-dockerfiles) [Dockerfile](https://docs.docker.com/engine/reference/builder/) on top of the original. -* [Modify](#modifying-dockerfiles) the [original Dockerfile](https://github.com/apache/beam/blob/master/sdks/python/container/Dockerfile) and reimage the container. + This guide describes how to create and use customized containers for the Beam SDK. -It's often easier to write a new Dockerfile. However, by modifying the original Dockerfile, you can customize anything (including the base OS). +### Prerequisites -### Writing new Dockerfiles on top of the original {#writing-new-dockerfiles} +* You will need to have a version of the Beam SDK >= 2.21.0. +* You will need to have [Docker installed](https://docs.docker.com/get-docker/). +* You will need to have a container registry accessible by your execution engine or runner to host a custom container image. Options include [Docker Hub](https://hub.docker.com/) or a "self-hosted" repository, including cloud-specific container registries. -1. Pull a [prebuilt SDK container image](https://hub.docker.com/search?q=apache%2Fbeam&type=image) for your [target](https://docs.docker.com/docker-hub/repos/#searching-for-repositories) language and version. The following example pulls the latest Python SDK: +> **NOTE**: On Nov 20, 2020, Docker Hub put [rate limits](https://www.docker.com/increase-rate-limits) into effect for anonymous and free authenticated use, which may impact larger pipelines that pull containers several times. 
+ +### Building and pushing custom containers + +Beam builds prebuilt images from [Dockerfiles](https://docs.docker.com/engine/reference/builder/). Users can build customized containers in one of two ways: + +1. **[Writing a new](#writing-new-dockerfiles) Dockerfile based on an existing prebuilt container**. This is sufficient for simple additions to the image, such as adding artifacts or environment variables. +2. **[Modifying](#modifying-dockerfiles) an existing Dockerfile in [Beam source](https://github.com/apache/beam)**. This method requires building from Beam source but allows for greater customization of the container (including replacement of artifacts or base OS/language versions). + +#### Writing new Dockerfiles on top of the original {#writing-new-dockerfiles} + +Steps: + +1. Create a new Dockerfile that designates a base image using the [FROM instruction](https://docs.docker.com/engine/reference/builder/#from) + +2. Once you have a created a custom Dockerfile, [build](https://docs.docker.com/engine/reference/commandline/build/) and [push](https://docs.docker.com/engine/reference/commandline/push/) the image using Docker: + +As an example, this `Dockerfile`: + +``` +FROM apache/beam_python3.7_sdk:2.25.0 + +ENV FOO=bar +COPY /src/path/to/file /dest/path/to/file/ ``` -docker pull apache/beam_python3.7_sdk + +uses the prebuilt Python 3.7 SDK container image [`beam_python3.7_sdk`](https://hub.docker.com/r/apache/beam_python3.7_sdk) tagged at (SDK version) `2.25.0`, and adds an additional environment variable and file to the image. + +``` +export BASE_IMAGE="apache/beam_python3.7_sdk:2.25.0" +export IMAGE_NAME="myremoterepo/mybeamsdk" +export TAG="latest" + +# Optional but recommended pull step to pull the base image into your local Docker daemon. Review comment: Why is this recommended? Should it be part of the steps in this section? 
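One possible way to frame the question above, as a sketch rather than a statement of what the docs should say: the explicit `docker pull` and the `--pull` option of `docker build` (which asks Docker to always attempt to pull the referenced base image) are two ways to make sure the base image is fetched before building. The image names are the placeholders from the diff; the `run` helper, the `DRY_RUN` variable, and both function names are hypothetical and exist only for this illustration.

```shell
#!/bin/sh
# Sketch only: image names are the placeholders used in the diff, not real repositories.
BASE_IMAGE="apache/beam_python3.7_sdk:2.25.0"
IMAGE_NAME="myremoterepo/mybeamsdk"
TAG="latest"

# Hypothetical helper: print each command instead of running it unless
# DRY_RUN=0 is set, so the sketch can be checked without a Docker daemon.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Variant A: explicit pull step, as written in the diff.
build_with_explicit_pull() {
  run docker pull "${BASE_IMAGE}"
  run docker build -f Dockerfile -t "${IMAGE_NAME}:${TAG}" .
  run docker push "${IMAGE_NAME}:${TAG}"
}

# Variant B: let `docker build --pull` attempt to fetch the base image itself.
build_with_pull_flag() {
  run docker build --pull -f Dockerfile -t "${IMAGE_NAME}:${TAG}" .
  run docker push "${IMAGE_NAME}:${TAG}"
}

build_with_explicit_pull
build_with_pull_flag
```

If the pull is meant to be required rather than optional, either variant could become a numbered step in the section instead of a comment inside the code block.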
########## File path: website/www/site/content/en/documentation/runtime/environments.md ########## @@ -15,56 +15,168 @@ See the License for the specific language governing permissions and limitations under the License. --> -# Container environments +# Container Environments -The Beam SDK runtime environment is isolated from other runtime systems because the SDK runtime environment is [containerized](https://s.apache.org/beam-fn-api-container-contract) with [Docker](https://www.docker.com/). This means that any execution engine can run the Beam SDK. +The Beam SDK runtime environment is [containerized](https://www.docker.com/resources/what-container) with [Docker](https://www.docker.com/) to isolate it from other runtime systems. This means any execution engine can run the Beam SDK. To learn more about the container environment, read the Beam [SDK Harness container contract](https://s.apache.org/beam-fn-api-container-contract). -This page describes how to customize, build, and push Beam SDK container images. +Prebuilt SDK container images are released per supported language version during Beam releases and and pushed to [Docker Hub](https://hub.docker.com/search?q=apache%2Fbeam&type=image) -Before you begin, install [Docker](https://www.docker.com/) on your workstation. +## Custom Containers -## Customizing container images +Users may want to customize container images for many reasons, including: -You can add extra dependencies to container images so that you don't have to supply the dependencies to execution engines. +* pre-installing additional dependencies, +* launching third-party software +* further customizing the execution environment -To customize a container image, either: -* [Write a new](#writing-new-dockerfiles) [Dockerfile](https://docs.docker.com/engine/reference/builder/) on top of the original. 
-* [Modify](#modifying-dockerfiles) the [original Dockerfile](https://github.com/apache/beam/blob/master/sdks/python/container/Dockerfile) and reimage the container. + This guide describes how to create and use customized containers for the Beam SDK. -It's often easier to write a new Dockerfile. However, by modifying the original Dockerfile, you can customize anything (including the base OS). +### Prerequisites -### Writing new Dockerfiles on top of the original {#writing-new-dockerfiles} +* You will need to have a version of the Beam SDK >= 2.21.0. +* You will need to have [Docker installed](https://docs.docker.com/get-docker/). +* You will need to have a container registry accessible by your execution engine or runner to host a custom container image. Options include [Docker Hub](https://hub.docker.com/) or a "self-hosted" repository, including cloud-specific container registries. -1. Pull a [prebuilt SDK container image](https://hub.docker.com/search?q=apache%2Fbeam&type=image) for your [target](https://docs.docker.com/docker-hub/repos/#searching-for-repositories) language and version. The following example pulls the latest Python SDK: +> **NOTE**: On Nov 20, 2020, Docker Hub put [rate limits](https://www.docker.com/increase-rate-limits) into effect for anonymous and free authenticated use, which may impact larger pipelines that pull containers several times. + +### Building and pushing custom containers + +Beam builds prebuilt images from [Dockerfiles](https://docs.docker.com/engine/reference/builder/). Users can build customized containers in one of two ways: + +1. **[Writing a new](#writing-new-dockerfiles) Dockerfile based on an existing prebuilt container**. This is sufficient for simple additions to the image, such as adding artifacts or environment variables. +2. **[Modifying](#modifying-dockerfiles) an existing Dockerfile in [Beam source](https://github.com/apache/beam)**. 
This method requires building from Beam source but allows for greater customization of the container (including replacement of artifacts or base OS/language versions). + +#### Writing new Dockerfiles on top of the original {#writing-new-dockerfiles} + +Steps: + +1. Create a new Dockerfile that designates a base image using the [FROM instruction](https://docs.docker.com/engine/reference/builder/#from) + +2. Once you have a created a custom Dockerfile, [build](https://docs.docker.com/engine/reference/commandline/build/) and [push](https://docs.docker.com/engine/reference/commandline/push/) the image using Docker: + +As an example, this `Dockerfile`: + +``` +FROM apache/beam_python3.7_sdk:2.25.0 + +ENV FOO=bar +COPY /src/path/to/file /dest/path/to/file/ ``` -docker pull apache/beam_python3.7_sdk + +uses the prebuilt Python 3.7 SDK container image [`beam_python3.7_sdk`](https://hub.docker.com/r/apache/beam_python3.7_sdk) tagged at (SDK version) `2.25.0`, and adds an additional environment variable and file to the image. + +``` +export BASE_IMAGE="apache/beam_python3.7_sdk:2.25.0" +export IMAGE_NAME="myremoterepo/mybeamsdk" +export TAG="latest" + +# Optional but recommended pull step to pull the base image into your local Docker daemon. +docker pull "${BASE_IMAGE}" +docker build -f Dockerfile -t "${IMAGE_NAME}:${TAG}" . +docker push "${IMAGE_NAME}:${TAG}" ``` -2. [Write a new Dockerfile](https://docs.docker.com/develop/develop-images/dockerfile_best-practices/) that [designates](https://docs.docker.com/engine/reference/builder/#from) the original as its [parent](https://docs.docker.com/glossary/?term=parent%20image). -3. [Build](#building-container-images) a child image. -### Modifying the original Dockerfile {#modifying-dockerfiles} +**NOTE**: After pushing a container image, you should verify the remote image ID and digest should match the local image ID and digest, output from `docker build` or `docker images`. 
+ +#### Modifying the original Dockerfile {#modifying-dockerfiles} in Beam source + +This method will require building image artifacts from Beam source - see the [Contribution guide](contribute/#development-setup) for additional instructions on setting up your development environment. + +1. Clone the `beam` repository. -1. Clone the `beam` repository: ``` git clone https://github.com/apache/beam.git ``` -2. Customize the [Dockerfile](https://github.com/apache/beam/blob/master/sdks/python/container/Dockerfile). If you're adding dependencies from [PyPI](https://pypi.org/), use [`base_image_requirements.txt`](https://github.com/apache/beam/blob/master/sdks/python/container/base_image_requirements.txt) instead. -3. [Reimage](#building-container-images) the container. -### Testing customized images +2. Customize the `Dockerfile` for a given language. This file is typically in the `sdks/<language>/container` directory (e.g. the [Dockerfile for Python](https://github.com/apache/beam/blob/master/sdks/python/container/Dockerfile).. If you're adding dependencies from [PyPI](https://pypi.org/), use [`base_image_requirements.txt`](https://github.com/apache/beam/blob/master/sdks/python/container/base_image_requirements.txt) instead. Review comment: ```suggestion 2. Customize the `Dockerfile` for a given language. This file is typically in the `sdks/<language>/container` directory (e.g. the [Dockerfile for Python](https://github.com/apache/beam/blob/master/sdks/python/container/Dockerfile)). If you're adding dependencies from [PyPI](https://pypi.org/), use [`base_image_requirements.txt`](https://github.com/apache/beam/blob/master/sdks/python/container/base_image_requirements.txt) instead. ``` ########## File path: website/www/site/content/en/documentation/runtime/environments.md ########## @@ -15,56 +15,168 @@ See the License for the specific language governing permissions and limitations under the License. 
--> -# Container environments +# Container Environments -The Beam SDK runtime environment is isolated from other runtime systems because the SDK runtime environment is [containerized](https://s.apache.org/beam-fn-api-container-contract) with [Docker](https://www.docker.com/). This means that any execution engine can run the Beam SDK. +The Beam SDK runtime environment is [containerized](https://www.docker.com/resources/what-container) with [Docker](https://www.docker.com/) to isolate it from other runtime systems. This means any execution engine can run the Beam SDK. To learn more about the container environment, read the Beam [SDK Harness container contract](https://s.apache.org/beam-fn-api-container-contract). -This page describes how to customize, build, and push Beam SDK container images. +Prebuilt SDK container images are released per supported language version during Beam releases and and pushed to [Docker Hub](https://hub.docker.com/search?q=apache%2Fbeam&type=image) -Before you begin, install [Docker](https://www.docker.com/) on your workstation. +## Custom Containers -## Customizing container images +Users may want to customize container images for many reasons, including: -You can add extra dependencies to container images so that you don't have to supply the dependencies to execution engines. +* pre-installing additional dependencies, +* launching third-party software +* further customizing the execution environment -To customize a container image, either: -* [Write a new](#writing-new-dockerfiles) [Dockerfile](https://docs.docker.com/engine/reference/builder/) on top of the original. -* [Modify](#modifying-dockerfiles) the [original Dockerfile](https://github.com/apache/beam/blob/master/sdks/python/container/Dockerfile) and reimage the container. + This guide describes how to create and use customized containers for the Beam SDK. -It's often easier to write a new Dockerfile. 
However, by modifying the original Dockerfile, you can customize anything (including the base OS). +### Prerequisites -### Writing new Dockerfiles on top of the original {#writing-new-dockerfiles} +* You will need to have a version of the Beam SDK >= 2.21.0. +* You will need to have [Docker installed](https://docs.docker.com/get-docker/). +* You will need to have a container registry accessible by your execution engine or runner to host a custom container image. Options include [Docker Hub](https://hub.docker.com/) or a "self-hosted" repository, including cloud-specific container registries. -1. Pull a [prebuilt SDK container image](https://hub.docker.com/search?q=apache%2Fbeam&type=image) for your [target](https://docs.docker.com/docker-hub/repos/#searching-for-repositories) language and version. The following example pulls the latest Python SDK: +> **NOTE**: On Nov 20, 2020, Docker Hub put [rate limits](https://www.docker.com/increase-rate-limits) into effect for anonymous and free authenticated use, which may impact larger pipelines that pull containers several times. + +### Building and pushing custom containers + +Beam builds prebuilt images from [Dockerfiles](https://docs.docker.com/engine/reference/builder/). Users can build customized containers in one of two ways: + +1. **[Writing a new](#writing-new-dockerfiles) Dockerfile based on an existing prebuilt container**. This is sufficient for simple additions to the image, such as adding artifacts or environment variables. +2. **[Modifying](#modifying-dockerfiles) an existing Dockerfile in [Beam source](https://github.com/apache/beam)**. This method requires building from Beam source but allows for greater customization of the container (including replacement of artifacts or base OS/language versions). + +#### Writing new Dockerfiles on top of the original {#writing-new-dockerfiles} + +Steps: + +1. 
Create a new Dockerfile that designates a base image using the [FROM instruction](https://docs.docker.com/engine/reference/builder/#from) + +2. Once you have a created a custom Dockerfile, [build](https://docs.docker.com/engine/reference/commandline/build/) and [push](https://docs.docker.com/engine/reference/commandline/push/) the image using Docker: + +As an example, this `Dockerfile`: + +``` +FROM apache/beam_python3.7_sdk:2.25.0 + +ENV FOO=bar +COPY /src/path/to/file /dest/path/to/file/ ``` -docker pull apache/beam_python3.7_sdk + +uses the prebuilt Python 3.7 SDK container image [`beam_python3.7_sdk`](https://hub.docker.com/r/apache/beam_python3.7_sdk) tagged at (SDK version) `2.25.0`, and adds an additional environment variable and file to the image. + +``` +export BASE_IMAGE="apache/beam_python3.7_sdk:2.25.0" +export IMAGE_NAME="myremoterepo/mybeamsdk" +export TAG="latest" + +# Optional but recommended pull step to pull the base image into your local Docker daemon. +docker pull "${BASE_IMAGE}" +docker build -f Dockerfile -t "${IMAGE_NAME}:${TAG}" . +docker push "${IMAGE_NAME}:${TAG}" ``` -2. [Write a new Dockerfile](https://docs.docker.com/develop/develop-images/dockerfile_best-practices/) that [designates](https://docs.docker.com/engine/reference/builder/#from) the original as its [parent](https://docs.docker.com/glossary/?term=parent%20image). -3. [Build](#building-container-images) a child image. -### Modifying the original Dockerfile {#modifying-dockerfiles} +**NOTE**: After pushing a container image, you should verify the remote image ID and digest should match the local image ID and digest, output from `docker build` or `docker images`. + +#### Modifying the original Dockerfile {#modifying-dockerfiles} in Beam source + +This method will require building image artifacts from Beam source - see the [Contribution guide](contribute/#development-setup) for additional instructions on setting up your development environment. + +1. Clone the `beam` repository. 
-1. Clone the `beam` repository: ``` git clone https://github.com/apache/beam.git ``` -2. Customize the [Dockerfile](https://github.com/apache/beam/blob/master/sdks/python/container/Dockerfile). If you're adding dependencies from [PyPI](https://pypi.org/), use [`base_image_requirements.txt`](https://github.com/apache/beam/blob/master/sdks/python/container/base_image_requirements.txt) instead. -3. [Reimage](#building-container-images) the container. -### Testing customized images +2. Customize the `Dockerfile` for a given language. This file is typically in the `sdks/<language>/container` directory (e.g. the [Dockerfile for Python](https://github.com/apache/beam/blob/master/sdks/python/container/Dockerfile).. If you're adding dependencies from [PyPI](https://pypi.org/), use [`base_image_requirements.txt`](https://github.com/apache/beam/blob/master/sdks/python/container/base_image_requirements.txt) instead. + +3. Navigate to the root directory of the local copy of your Apache Beam. + +4. Run Gradle with the `docker` target. + + +``` +# The default repository of each SDK +./gradlew :sdks:java:container:java8:docker +./gradlew :sdks:java:container:java11:docker +./gradlew :sdks:go:container:docker +./gradlew :sdks:python:container:py36:docker +./gradlew :sdks:python:container:py37:docker +./gradlew :sdks:python:container:py38:docker + +# Shortcut for building all Python SDKs +./gradlew :sdks:python:container buildAll +``` + +To examine the containers that you built, run `docker images`: + +``` +$> docker images +REPOSITORY TAG IMAGE ID CREATED SIZE +apache/beam_java8_sdk latest ... 1 min ago ... +apache/beam_java11_sdk latest ... 1 min ago ... +apache/beam_python3.6_sdk latest ... 1 min ago ... +apache/beam_python3.7_sdk latest ... 1 min ago ... +apache/beam_python3.8_sdk latest ... 1 min ago ... +apache/beam_go_sdk latest ... 1 min ago ... 
+``` + +If you did not provide a custom repo/tag as additional parameters (see below), you can retag the image and [push](https://docs.docker.com/engine/reference/commandline/push/) the image using Docker to a remote repository. + +``` +export IMAGE_NAME="myrepo/mybeamsdk" +export TAG="latest" + +docker tag apache/beam_python3.6_sdk "${IMAGE_NAME}:${TAG}" +docker push "${IMAGE_NAME}:${TAG}" +``` + +**NOTE**: After pushing a container image, verify the remote image ID and digest matches the local image ID and digest output from `docker_images` + +##### Additional Build Parameters + +The docker Gradle task defines a default image repository and [tag](https://docs.docker.com/engine/reference/commandline/tag/) is the SDK version defined at [gradle.properties](https://github.com/apache/beam/blob/master/gradle.properties). The default repository is the Docker Hub `apache` namespace, and the default tag is the [SDK version](https://github.com/apache/beam/blob/master/gradle.properties) defined at gradle.properties. With these settings, the +`docker` command-line tool will implicitly try to push the container to the Docker Hub Apache repository. + +You can specify a different repository or tag for built images by providing parameters to the build task. For example: + +``` +./gradlew :sdks:python:container:py36:docker -Pdocker-repository-root=example-repo -Pdocker-tag=2019-10-04 +``` -To test a customized image locally, run a pipeline with PortableRunner and set the `--environment_config` flag to the image path: +builds the Python 3.6 container and tags it as `example-repo/beam_python3.6_sdk:2019-10-04`. + +From 2.21.0, a `docker-pull-licenses` flag was introduced to add licenses/notices for third party dependencies to the docker images. For example: Review comment: ```suggestion From Beam 2.21.0 and later, a `docker-pull-licenses` flag was introduced to add licenses/notices for third party dependencies to the docker images. 
For example: ``` ########## File path: website/www/site/content/en/documentation/runtime/environments.md ########## @@ -15,56 +15,168 @@ See the License for the specific language governing permissions and limitations under the License. --> -# Container environments +# Container Environments -The Beam SDK runtime environment is isolated from other runtime systems because the SDK runtime environment is [containerized](https://s.apache.org/beam-fn-api-container-contract) with [Docker](https://www.docker.com/). This means that any execution engine can run the Beam SDK. +The Beam SDK runtime environment is [containerized](https://www.docker.com/resources/what-container) with [Docker](https://www.docker.com/) to isolate it from other runtime systems. This means any execution engine can run the Beam SDK. To learn more about the container environment, read the Beam [SDK Harness container contract](https://s.apache.org/beam-fn-api-container-contract). -This page describes how to customize, build, and push Beam SDK container images. +Prebuilt SDK container images are released per supported language version during Beam releases and and pushed to [Docker Hub](https://hub.docker.com/search?q=apache%2Fbeam&type=image) -Before you begin, install [Docker](https://www.docker.com/) on your workstation. +## Custom Containers -## Customizing container images +Users may want to customize container images for many reasons, including: -You can add extra dependencies to container images so that you don't have to supply the dependencies to execution engines. +* pre-installing additional dependencies, +* launching third-party software +* further customizing the execution environment -To customize a container image, either: -* [Write a new](#writing-new-dockerfiles) [Dockerfile](https://docs.docker.com/engine/reference/builder/) on top of the original. 
-* [Modify](#modifying-dockerfiles) the [original Dockerfile](https://github.com/apache/beam/blob/master/sdks/python/container/Dockerfile) and reimage the container. + This guide describes how to create and use customized containers for the Beam SDK. -It's often easier to write a new Dockerfile. However, by modifying the original Dockerfile, you can customize anything (including the base OS). +### Prerequisites -### Writing new Dockerfiles on top of the original {#writing-new-dockerfiles} +* You will need to have a version of the Beam SDK >= 2.21.0. +* You will need to have [Docker installed](https://docs.docker.com/get-docker/). +* You will need to have a container registry accessible by your execution engine or runner to host a custom container image. Options include [Docker Hub](https://hub.docker.com/) or a "self-hosted" repository, including cloud-specific container registries. -1. Pull a [prebuilt SDK container image](https://hub.docker.com/search?q=apache%2Fbeam&type=image) for your [target](https://docs.docker.com/docker-hub/repos/#searching-for-repositories) language and version. The following example pulls the latest Python SDK: +> **NOTE**: On Nov 20, 2020, Docker Hub put [rate limits](https://www.docker.com/increase-rate-limits) into effect for anonymous and free authenticated use, which may impact larger pipelines that pull containers several times. + +### Building and pushing custom containers + +Beam builds prebuilt images from [Dockerfiles](https://docs.docker.com/engine/reference/builder/). Users can build customized containers in one of two ways: + +1. **[Writing a new](#writing-new-dockerfiles) Dockerfile based on an existing prebuilt container**. This is sufficient for simple additions to the image, such as adding artifacts or environment variables. +2. **[Modifying](#modifying-dockerfiles) an existing Dockerfile in [Beam source](https://github.com/apache/beam)**. 
This method requires building from Beam source but allows for greater customization of the container (including replacement of artifacts or base OS/language versions). + +#### Writing new Dockerfiles on top of the original {#writing-new-dockerfiles} + +Steps: + +1. Create a new Dockerfile that designates a base image using the [FROM instruction](https://docs.docker.com/engine/reference/builder/#from) + +2. Once you have a created a custom Dockerfile, [build](https://docs.docker.com/engine/reference/commandline/build/) and [push](https://docs.docker.com/engine/reference/commandline/push/) the image using Docker: + +As an example, this `Dockerfile`: + +``` +FROM apache/beam_python3.7_sdk:2.25.0 + +ENV FOO=bar +COPY /src/path/to/file /dest/path/to/file/ ``` -docker pull apache/beam_python3.7_sdk + +uses the prebuilt Python 3.7 SDK container image [`beam_python3.7_sdk`](https://hub.docker.com/r/apache/beam_python3.7_sdk) tagged at (SDK version) `2.25.0`, and adds an additional environment variable and file to the image. + +``` +export BASE_IMAGE="apache/beam_python3.7_sdk:2.25.0" +export IMAGE_NAME="myremoterepo/mybeamsdk" +export TAG="latest" + +# Optional but recommended pull step to pull the base image into your local Docker daemon. +docker pull "${BASE_IMAGE}" +docker build -f Dockerfile -t "${IMAGE_NAME}:${TAG}" . +docker push "${IMAGE_NAME}:${TAG}" ``` -2. [Write a new Dockerfile](https://docs.docker.com/develop/develop-images/dockerfile_best-practices/) that [designates](https://docs.docker.com/engine/reference/builder/#from) the original as its [parent](https://docs.docker.com/glossary/?term=parent%20image). -3. [Build](#building-container-images) a child image. -### Modifying the original Dockerfile {#modifying-dockerfiles} +**NOTE**: After pushing a container image, you should verify the remote image ID and digest should match the local image ID and digest, output from `docker build` or `docker images`. 
+ +#### Modifying the original Dockerfile {#modifying-dockerfiles} in Beam source + +This method will require building image artifacts from Beam source - see the [Contribution guide](contribute/#development-setup) for additional instructions on setting up your development environment. + +1. Clone the `beam` repository. -1. Clone the `beam` repository: ``` git clone https://github.com/apache/beam.git ``` -2. Customize the [Dockerfile](https://github.com/apache/beam/blob/master/sdks/python/container/Dockerfile). If you're adding dependencies from [PyPI](https://pypi.org/), use [`base_image_requirements.txt`](https://github.com/apache/beam/blob/master/sdks/python/container/base_image_requirements.txt) instead. -3. [Reimage](#building-container-images) the container. -### Testing customized images +2. Customize the `Dockerfile` for a given language. This file is typically in the `sdks/<language>/container` directory (e.g. the [Dockerfile for Python](https://github.com/apache/beam/blob/master/sdks/python/container/Dockerfile).. If you're adding dependencies from [PyPI](https://pypi.org/), use [`base_image_requirements.txt`](https://github.com/apache/beam/blob/master/sdks/python/container/base_image_requirements.txt) instead. + +3. Navigate to the root directory of the local copy of your Apache Beam. + +4. Run Gradle with the `docker` target. + + +``` +# The default repository of each SDK +./gradlew :sdks:java:container:java8:docker +./gradlew :sdks:java:container:java11:docker +./gradlew :sdks:go:container:docker +./gradlew :sdks:python:container:py36:docker +./gradlew :sdks:python:container:py37:docker +./gradlew :sdks:python:container:py38:docker + +# Shortcut for building all Python SDKs +./gradlew :sdks:python:container buildAll +``` + +To examine the containers that you built, run `docker images`: + +``` +$> docker images +REPOSITORY TAG IMAGE ID CREATED SIZE +apache/beam_java8_sdk latest ... 1 min ago ... +apache/beam_java11_sdk latest ... 1 min ago ... 
+apache/beam_python3.6_sdk latest ... 1 min ago ... +apache/beam_python3.7_sdk latest ... 1 min ago ... +apache/beam_python3.8_sdk latest ... 1 min ago ... +apache/beam_go_sdk latest ... 1 min ago ... +``` + +If you did not provide a custom repo/tag as additional parameters (see below), you can retag the image and [push](https://docs.docker.com/engine/reference/commandline/push/) the image using Docker to a remote repository. + +``` +export IMAGE_NAME="myrepo/mybeamsdk" +export TAG="latest" + +docker tag apache/beam_python3.6_sdk "${IMAGE_NAME}:${TAG}" +docker push "${IMAGE_NAME}:${TAG}" +``` + +**NOTE**: After pushing a container image, verify the remote image ID and digest matches the local image ID and digest output from `docker_images` + +##### Additional Build Parameters + +The docker Gradle task defines a default image repository and [tag](https://docs.docker.com/engine/reference/commandline/tag/) is the SDK version defined at [gradle.properties](https://github.com/apache/beam/blob/master/gradle.properties). The default repository is the Docker Hub `apache` namespace, and the default tag is the [SDK version](https://github.com/apache/beam/blob/master/gradle.properties) defined at gradle.properties. With these settings, the +`docker` command-line tool will implicitly try to push the container to the Docker Hub Apache repository. + +You can specify a different repository or tag for built images by providing parameters to the build task. For example: + +``` +./gradlew :sdks:python:container:py36:docker -Pdocker-repository-root=example-repo -Pdocker-tag=2019-10-04 +``` -To test a customized image locally, run a pipeline with PortableRunner and set the `--environment_config` flag to the image path: +builds the Python 3.6 container and tags it as `example-repo/beam_python3.6_sdk:2019-10-04`. + +From 2.21.0, a `docker-pull-licenses` flag was introduced to add licenses/notices for third party dependencies to the docker images. 
For example: + +``` +./gradlew :sdks:java:container:java8:docker -Pdocker-pull-licenses +``` +creates a Java 8 SDK image with appropriate licenses in `/opt/apache/beam/third_party_licenses/`. + +By default, no licenses/notices are added to the docker images. + + +## Using Container Images in Pipelines Review comment: Use sentence case for titles throughout. Lifted from: https://developers.google.com/style/capitalization#capitalization-in-titles-and-headings ```suggestion ## Using custom container images in pipelines ``` ########## File path: website/www/site/content/en/documentation/runtime/environments.md ########## @@ -15,56 +15,168 @@ See the License for the specific language governing permissions and limitations under the License. --> -# Container environments +# Container Environments -The Beam SDK runtime environment is isolated from other runtime systems because the SDK runtime environment is [containerized](https://s.apache.org/beam-fn-api-container-contract) with [Docker](https://www.docker.com/). This means that any execution engine can run the Beam SDK. +The Beam SDK runtime environment is [containerized](https://www.docker.com/resources/what-container) with [Docker](https://www.docker.com/) to isolate it from other runtime systems. This means any execution engine can run the Beam SDK. To learn more about the container environment, read the Beam [SDK Harness container contract](https://s.apache.org/beam-fn-api-container-contract). -This page describes how to customize, build, and push Beam SDK container images. +Prebuilt SDK container images are released per supported language version during Beam releases and and pushed to [Docker Hub](https://hub.docker.com/search?q=apache%2Fbeam&type=image) -Before you begin, install [Docker](https://www.docker.com/) on your workstation. 
+## Custom Containers -## Customizing container images +Users may want to customize container images for many reasons, including: -You can add extra dependencies to container images so that you don't have to supply the dependencies to execution engines. +* pre-installing additional dependencies, +* launching third-party software +* further customizing the execution environment -To customize a container image, either: -* [Write a new](#writing-new-dockerfiles) [Dockerfile](https://docs.docker.com/engine/reference/builder/) on top of the original. -* [Modify](#modifying-dockerfiles) the [original Dockerfile](https://github.com/apache/beam/blob/master/sdks/python/container/Dockerfile) and reimage the container. + This guide describes how to create and use customized containers for the Beam SDK. -It's often easier to write a new Dockerfile. However, by modifying the original Dockerfile, you can customize anything (including the base OS). +### Prerequisites -### Writing new Dockerfiles on top of the original {#writing-new-dockerfiles} +* You will need to have a version of the Beam SDK >= 2.21.0. +* You will need to have [Docker installed](https://docs.docker.com/get-docker/). +* You will need to have a container registry accessible by your execution engine or runner to host a custom container image. Options include [Docker Hub](https://hub.docker.com/) or a "self-hosted" repository, including cloud-specific container registries. Review comment: Could we also include a couple of other hosts in addition to Docker Hub, for example Google Container Registry and Amazon ECR? ########## File path: website/www/site/content/en/documentation/runtime/environments.md ########## @@ -15,56 +15,168 @@ See the License for the specific language governing permissions and limitations under the License. 
--> -# Container environments +# Container Environments -The Beam SDK runtime environment is isolated from other runtime systems because the SDK runtime environment is [containerized](https://s.apache.org/beam-fn-api-container-contract) with [Docker](https://www.docker.com/). This means that any execution engine can run the Beam SDK. +The Beam SDK runtime environment is [containerized](https://www.docker.com/resources/what-container) with [Docker](https://www.docker.com/) to isolate it from other runtime systems. This means any execution engine can run the Beam SDK. To learn more about the container environment, read the Beam [SDK Harness container contract](https://s.apache.org/beam-fn-api-container-contract). -This page describes how to customize, build, and push Beam SDK container images. +Prebuilt SDK container images are released per supported language version during Beam releases and and pushed to [Docker Hub](https://hub.docker.com/search?q=apache%2Fbeam&type=image) -Before you begin, install [Docker](https://www.docker.com/) on your workstation. +## Custom Containers -## Customizing container images +Users may want to customize container images for many reasons, including: Review comment: Jumping between 'Users' and 'You' can get confusing. I think picking 'you' is the right pronoun. ########## File path: website/www/site/content/en/documentation/runtime/environments.md ########## @@ -15,56 +15,168 @@ See the License for the specific language governing permissions and limitations under the License. --> -# Container environments +# Container Environments -The Beam SDK runtime environment is isolated from other runtime systems because the SDK runtime environment is [containerized](https://s.apache.org/beam-fn-api-container-contract) with [Docker](https://www.docker.com/). This means that any execution engine can run the Beam SDK. 
+The Beam SDK runtime environment is [containerized](https://www.docker.com/resources/what-container) with [Docker](https://www.docker.com/) to isolate it from other runtime systems. This means any execution engine can run the Beam SDK. To learn more about the container environment, read the Beam [SDK Harness container contract](https://s.apache.org/beam-fn-api-container-contract). -This page describes how to customize, build, and push Beam SDK container images. +Prebuilt SDK container images are released per supported language version during Beam releases and and pushed to [Docker Hub](https://hub.docker.com/search?q=apache%2Fbeam&type=image) -Before you begin, install [Docker](https://www.docker.com/) on your workstation. +## Custom Containers -## Customizing container images +Users may want to customize container images for many reasons, including: -You can add extra dependencies to container images so that you don't have to supply the dependencies to execution engines. +* pre-installing additional dependencies, +* launching third-party software +* further customizing the execution environment -To customize a container image, either: -* [Write a new](#writing-new-dockerfiles) [Dockerfile](https://docs.docker.com/engine/reference/builder/) on top of the original. -* [Modify](#modifying-dockerfiles) the [original Dockerfile](https://github.com/apache/beam/blob/master/sdks/python/container/Dockerfile) and reimage the container. + This guide describes how to create and use customized containers for the Beam SDK. -It's often easier to write a new Dockerfile. However, by modifying the original Dockerfile, you can customize anything (including the base OS). +### Prerequisites -### Writing new Dockerfiles on top of the original {#writing-new-dockerfiles} +* You will need to have a version of the Beam SDK >= 2.21.0. +* You will need to have [Docker installed](https://docs.docker.com/get-docker/). 
+* You will need to have a container registry accessible by your execution engine or runner to host a custom container image. Options include [Docker Hub](https://hub.docker.com/) or a "self-hosted" repository, including cloud-specific container registries. -1. Pull a [prebuilt SDK container image](https://hub.docker.com/search?q=apache%2Fbeam&type=image) for your [target](https://docs.docker.com/docker-hub/repos/#searching-for-repositories) language and version. The following example pulls the latest Python SDK: +> **NOTE**: On Nov 20, 2020, Docker Hub put [rate limits](https://www.docker.com/increase-rate-limits) into effect for anonymous and free authenticated use, which may impact larger pipelines that pull containers several times. + +### Building and pushing custom containers + +Beam builds prebuilt images from [Dockerfiles](https://docs.docker.com/engine/reference/builder/). Users can build customized containers in one of two ways: Review comment: ```suggestion Beam builds prebuilt images from [Dockerfiles](https://docs.docker.com/engine/reference/builder/). You can build customized containers in one of two ways: ``` ########## File path: website/www/site/content/en/documentation/runtime/environments.md ########## @@ -15,56 +15,168 @@ See the License for the specific language governing permissions and limitations under the License. --> -# Container environments +# Container Environments -The Beam SDK runtime environment is isolated from other runtime systems because the SDK runtime environment is [containerized](https://s.apache.org/beam-fn-api-container-contract) with [Docker](https://www.docker.com/). This means that any execution engine can run the Beam SDK. +The Beam SDK runtime environment is [containerized](https://www.docker.com/resources/what-container) with [Docker](https://www.docker.com/) to isolate it from other runtime systems. This means any execution engine can run the Beam SDK. 
To learn more about the container environment, read the Beam [SDK Harness container contract](https://s.apache.org/beam-fn-api-container-contract). -This page describes how to customize, build, and push Beam SDK container images. +Prebuilt SDK container images are released per supported language version during Beam releases and and pushed to [Docker Hub](https://hub.docker.com/search?q=apache%2Fbeam&type=image) -Before you begin, install [Docker](https://www.docker.com/) on your workstation. +## Custom Containers -## Customizing container images +Users may want to customize container images for many reasons, including: -You can add extra dependencies to container images so that you don't have to supply the dependencies to execution engines. +* pre-installing additional dependencies, +* launching third-party software +* further customizing the execution environment -To customize a container image, either: -* [Write a new](#writing-new-dockerfiles) [Dockerfile](https://docs.docker.com/engine/reference/builder/) on top of the original. -* [Modify](#modifying-dockerfiles) the [original Dockerfile](https://github.com/apache/beam/blob/master/sdks/python/container/Dockerfile) and reimage the container. + This guide describes how to create and use customized containers for the Beam SDK. -It's often easier to write a new Dockerfile. However, by modifying the original Dockerfile, you can customize anything (including the base OS). +### Prerequisites -### Writing new Dockerfiles on top of the original {#writing-new-dockerfiles} +* You will need to have a version of the Beam SDK >= 2.21.0. +* You will need to have [Docker installed](https://docs.docker.com/get-docker/). +* You will need to have a container registry accessible by your execution engine or runner to host a custom container image. Options include [Docker Hub](https://hub.docker.com/) or a "self-hosted" repository, including cloud-specific container registries. -1. 
Pull a [prebuilt SDK container image](https://hub.docker.com/search?q=apache%2Fbeam&type=image) for your [target](https://docs.docker.com/docker-hub/repos/#searching-for-repositories) language and version. The following example pulls the latest Python SDK: +> **NOTE**: On Nov 20, 2020, Docker Hub put [rate limits](https://www.docker.com/increase-rate-limits) into effect for anonymous and free authenticated use, which may impact larger pipelines that pull containers several times. + +### Building and pushing custom containers + +Beam builds prebuilt images from [Dockerfiles](https://docs.docker.com/engine/reference/builder/). Users can build customized containers in one of two ways: + +1. **[Writing a new](#writing-new-dockerfiles) Dockerfile based on an existing prebuilt container**. This is sufficient for simple additions to the image, such as adding artifacts or environment variables. Review comment: ```suggestion 1. **[Writing a new](#writing-new-dockerfiles) Dockerfile based on an existing prebuilt container image**. This is sufficient for simple additions to the image, such as adding artifacts or environment variables. ``` ########## File path: website/www/site/content/en/documentation/runtime/environments.md ########## @@ -15,56 +15,168 @@ See the License for the specific language governing permissions and limitations under the License. --> -# Container environments +# Container Environments -The Beam SDK runtime environment is isolated from other runtime systems because the SDK runtime environment is [containerized](https://s.apache.org/beam-fn-api-container-contract) with [Docker](https://www.docker.com/). This means that any execution engine can run the Beam SDK. +The Beam SDK runtime environment is [containerized](https://www.docker.com/resources/what-container) with [Docker](https://www.docker.com/) to isolate it from other runtime systems. This means any execution engine can run the Beam SDK. 
To learn more about the container environment, read the Beam [SDK Harness container contract](https://s.apache.org/beam-fn-api-container-contract). -This page describes how to customize, build, and push Beam SDK container images. +Prebuilt SDK container images are released per supported language version during Beam releases and and pushed to [Docker Hub](https://hub.docker.com/search?q=apache%2Fbeam&type=image) -Before you begin, install [Docker](https://www.docker.com/) on your workstation. +## Custom Containers -## Customizing container images +Users may want to customize container images for many reasons, including: -You can add extra dependencies to container images so that you don't have to supply the dependencies to execution engines. +* pre-installing additional dependencies, +* launching third-party software +* further customizing the execution environment -To customize a container image, either: -* [Write a new](#writing-new-dockerfiles) [Dockerfile](https://docs.docker.com/engine/reference/builder/) on top of the original. -* [Modify](#modifying-dockerfiles) the [original Dockerfile](https://github.com/apache/beam/blob/master/sdks/python/container/Dockerfile) and reimage the container. + This guide describes how to create and use customized containers for the Beam SDK. -It's often easier to write a new Dockerfile. However, by modifying the original Dockerfile, you can customize anything (including the base OS). +### Prerequisites -### Writing new Dockerfiles on top of the original {#writing-new-dockerfiles} +* You will need to have a version of the Beam SDK >= 2.21.0. +* You will need to have [Docker installed](https://docs.docker.com/get-docker/). +* You will need to have a container registry accessible by your execution engine or runner to host a custom container image. Options include [Docker Hub](https://hub.docker.com/) or a "self-hosted" repository, including cloud-specific container registries. -1. 
Pull a [prebuilt SDK container image](https://hub.docker.com/search?q=apache%2Fbeam&type=image) for your [target](https://docs.docker.com/docker-hub/repos/#searching-for-repositories) language and version. The following example pulls the latest Python SDK: +> **NOTE**: On Nov 20, 2020, Docker Hub put [rate limits](https://www.docker.com/increase-rate-limits) into effect for anonymous and free authenticated use, which may impact larger pipelines that pull containers several times. + +### Building and pushing custom containers + +Beam builds prebuilt images from [Dockerfiles](https://docs.docker.com/engine/reference/builder/). Users can build customized containers in one of two ways: + +1. **[Writing a new](#writing-new-dockerfiles) Dockerfile based on an existing prebuilt container**. This is sufficient for simple additions to the image, such as adding artifacts or environment variables. +2. **[Modifying](#modifying-dockerfiles) an existing Dockerfile in [Beam source](https://github.com/apache/beam)**. This method requires building from Beam source but allows for greater customization of the container (including replacement of artifacts or base OS/language versions). + +#### Writing new Dockerfiles on top of the original {#writing-new-dockerfiles} + +Steps: + +1. Create a new Dockerfile that designates a base image using the [FROM instruction](https://docs.docker.com/engine/reference/builder/#from) Review comment: ```suggestion 1. Create a new Dockerfile that designates a base image using the [FROM instruction](https://docs.docker.com/engine/reference/builder/#from). ``` ########## File path: website/www/site/content/en/documentation/runtime/environments.md ########## @@ -15,56 +15,168 @@ See the License for the specific language governing permissions and limitations under the License. 
--> -# Container environments +# Container Environments -The Beam SDK runtime environment is isolated from other runtime systems because the SDK runtime environment is [containerized](https://s.apache.org/beam-fn-api-container-contract) with [Docker](https://www.docker.com/). This means that any execution engine can run the Beam SDK. +The Beam SDK runtime environment is [containerized](https://www.docker.com/resources/what-container) with [Docker](https://www.docker.com/) to isolate it from other runtime systems. This means any execution engine can run the Beam SDK. To learn more about the container environment, read the Beam [SDK Harness container contract](https://s.apache.org/beam-fn-api-container-contract). -This page describes how to customize, build, and push Beam SDK container images. +Prebuilt SDK container images are released per supported language version during Beam releases and and pushed to [Docker Hub](https://hub.docker.com/search?q=apache%2Fbeam&type=image) -Before you begin, install [Docker](https://www.docker.com/) on your workstation. +## Custom Containers -## Customizing container images +Users may want to customize container images for many reasons, including: -You can add extra dependencies to container images so that you don't have to supply the dependencies to execution engines. +* pre-installing additional dependencies, +* launching third-party software +* further customizing the execution environment -To customize a container image, either: -* [Write a new](#writing-new-dockerfiles) [Dockerfile](https://docs.docker.com/engine/reference/builder/) on top of the original. -* [Modify](#modifying-dockerfiles) the [original Dockerfile](https://github.com/apache/beam/blob/master/sdks/python/container/Dockerfile) and reimage the container. + This guide describes how to create and use customized containers for the Beam SDK. -It's often easier to write a new Dockerfile. 
However, by modifying the original Dockerfile, you can customize anything (including the base OS). +### Prerequisites -### Writing new Dockerfiles on top of the original {#writing-new-dockerfiles} +* You will need to have a version of the Beam SDK >= 2.21.0. +* You will need to have [Docker installed](https://docs.docker.com/get-docker/). +* You will need to have a container registry accessible by your execution engine or runner to host a custom container image. Options include [Docker Hub](https://hub.docker.com/) or a "self-hosted" repository, including cloud-specific container registries. -1. Pull a [prebuilt SDK container image](https://hub.docker.com/search?q=apache%2Fbeam&type=image) for your [target](https://docs.docker.com/docker-hub/repos/#searching-for-repositories) language and version. The following example pulls the latest Python SDK: +> **NOTE**: On Nov 20, 2020, Docker Hub put [rate limits](https://www.docker.com/increase-rate-limits) into effect for anonymous and free authenticated use, which may impact larger pipelines that pull containers several times. + +### Building and pushing custom containers + +Beam builds prebuilt images from [Dockerfiles](https://docs.docker.com/engine/reference/builder/). Users can build customized containers in one of two ways: + +1. **[Writing a new](#writing-new-dockerfiles) Dockerfile based on an existing prebuilt container**. This is sufficient for simple additions to the image, such as adding artifacts or environment variables. +2. **[Modifying](#modifying-dockerfiles) an existing Dockerfile in [Beam source](https://github.com/apache/beam)**. This method requires building from Beam source but allows for greater customization of the container (including replacement of artifacts or base OS/language versions). + +#### Writing new Dockerfiles on top of the original {#writing-new-dockerfiles} + +Steps: + +1. 
Create a new Dockerfile that designates a base image using the [FROM instruction](https://docs.docker.com/engine/reference/builder/#from) + +2. Once you have a created a custom Dockerfile, [build](https://docs.docker.com/engine/reference/commandline/build/) and [push](https://docs.docker.com/engine/reference/commandline/push/) the image using Docker: Review comment: The previous step covers this first phrase. ```suggestion 2. [Build](https://docs.docker.com/engine/reference/commandline/build/) and [push](https://docs.docker.com/engine/reference/commandline/push/) the image using Docker: ``` ########## File path: website/www/site/content/en/documentation/runtime/environments.md ########## @@ -15,56 +15,168 @@ See the License for the specific language governing permissions and limitations under the License. --> -# Container environments +# Container Environments -The Beam SDK runtime environment is isolated from other runtime systems because the SDK runtime environment is [containerized](https://s.apache.org/beam-fn-api-container-contract) with [Docker](https://www.docker.com/). This means that any execution engine can run the Beam SDK. +The Beam SDK runtime environment is [containerized](https://www.docker.com/resources/what-container) with [Docker](https://www.docker.com/) to isolate it from other runtime systems. This means any execution engine can run the Beam SDK. To learn more about the container environment, read the Beam [SDK Harness container contract](https://s.apache.org/beam-fn-api-container-contract). -This page describes how to customize, build, and push Beam SDK container images. +Prebuilt SDK container images are released per supported language version during Beam releases and and pushed to [Docker Hub](https://hub.docker.com/search?q=apache%2Fbeam&type=image) -Before you begin, install [Docker](https://www.docker.com/) on your workstation. 
+## Custom Containers -## Customizing container images +Users may want to customize container images for many reasons, including: -You can add extra dependencies to container images so that you don't have to supply the dependencies to execution engines. +* pre-installing additional dependencies, +* launching third-party software +* further customizing the execution environment -To customize a container image, either: -* [Write a new](#writing-new-dockerfiles) [Dockerfile](https://docs.docker.com/engine/reference/builder/) on top of the original. -* [Modify](#modifying-dockerfiles) the [original Dockerfile](https://github.com/apache/beam/blob/master/sdks/python/container/Dockerfile) and reimage the container. + This guide describes how to create and use customized containers for the Beam SDK. -It's often easier to write a new Dockerfile. However, by modifying the original Dockerfile, you can customize anything (including the base OS). +### Prerequisites -### Writing new Dockerfiles on top of the original {#writing-new-dockerfiles} +* You will need to have a version of the Beam SDK >= 2.21.0. +* You will need to have [Docker installed](https://docs.docker.com/get-docker/). +* You will need to have a container registry accessible by your execution engine or runner to host a custom container image. Options include [Docker Hub](https://hub.docker.com/) or a "self-hosted" repository, including cloud-specific container registries. -1. Pull a [prebuilt SDK container image](https://hub.docker.com/search?q=apache%2Fbeam&type=image) for your [target](https://docs.docker.com/docker-hub/repos/#searching-for-repositories) language and version. The following example pulls the latest Python SDK: +> **NOTE**: On Nov 20, 2020, Docker Hub put [rate limits](https://www.docker.com/increase-rate-limits) into effect for anonymous and free authenticated use, which may impact larger pipelines that pull containers several times. 
+ +### Building and pushing custom containers + +Beam builds prebuilt images from [Dockerfiles](https://docs.docker.com/engine/reference/builder/). Users can build customized containers in one of two ways: + +1. **[Writing a new](#writing-new-dockerfiles) Dockerfile based on an existing prebuilt container**. This is sufficient for simple additions to the image, such as adding artifacts or environment variables. +2. **[Modifying](#modifying-dockerfiles) an existing Dockerfile in [Beam source](https://github.com/apache/beam)**. This method requires building from Beam source but allows for greater customization of the container (including replacement of artifacts or base OS/language versions). + +#### Writing new Dockerfiles on top of the original {#writing-new-dockerfiles} + +Steps: + +1. Create a new Dockerfile that designates a base image using the [FROM instruction](https://docs.docker.com/engine/reference/builder/#from) + +2. Once you have a created a custom Dockerfile, [build](https://docs.docker.com/engine/reference/commandline/build/) and [push](https://docs.docker.com/engine/reference/commandline/push/) the image using Docker: + +As an example, this `Dockerfile`: + +``` +FROM apache/beam_python3.7_sdk:2.25.0 + +ENV FOO=bar +COPY /src/path/to/file /dest/path/to/file/ ``` -docker pull apache/beam_python3.7_sdk + +uses the prebuilt Python 3.7 SDK container image [`beam_python3.7_sdk`](https://hub.docker.com/r/apache/beam_python3.7_sdk) tagged at (SDK version) `2.25.0`, and adds an additional environment variable and file to the image. + +``` +export BASE_IMAGE="apache/beam_python3.7_sdk:2.25.0" +export IMAGE_NAME="myremoterepo/mybeamsdk" +export TAG="latest" + +# Optional but recommended pull step to pull the base image into your local Docker daemon. +docker pull "${BASE_IMAGE}" +docker build -f Dockerfile -t "${IMAGE_NAME}:${TAG}" . +docker push "${IMAGE_NAME}:${TAG}" ``` -2. 
[Write a new Dockerfile](https://docs.docker.com/develop/develop-images/dockerfile_best-practices/) that [designates](https://docs.docker.com/engine/reference/builder/#from) the original as its [parent](https://docs.docker.com/glossary/?term=parent%20image). -3. [Build](#building-container-images) a child image. -### Modifying the original Dockerfile {#modifying-dockerfiles} +**NOTE**: After pushing a container image, you should verify the remote image ID and digest should match the local image ID and digest, output from `docker build` or `docker images`. + +#### Modifying the original Dockerfile {#modifying-dockerfiles} in Beam source Review comment: This should match the phrase from earlier. ```suggestion #### Modifying an existing Dockerfile {#modifying-dockerfiles} in Beam source ``` ########## File path: website/www/site/content/en/documentation/runtime/environments.md ########## @@ -15,56 +15,168 @@ See the License for the specific language governing permissions and limitations under the License. --> -# Container environments +# Container Environments -The Beam SDK runtime environment is isolated from other runtime systems because the SDK runtime environment is [containerized](https://s.apache.org/beam-fn-api-container-contract) with [Docker](https://www.docker.com/). This means that any execution engine can run the Beam SDK. +The Beam SDK runtime environment is [containerized](https://www.docker.com/resources/what-container) with [Docker](https://www.docker.com/) to isolate it from other runtime systems. This means any execution engine can run the Beam SDK. To learn more about the container environment, read the Beam [SDK Harness container contract](https://s.apache.org/beam-fn-api-container-contract). -This page describes how to customize, build, and push Beam SDK container images. 
+Prebuilt SDK container images are released per supported language version during Beam releases and and pushed to [Docker Hub](https://hub.docker.com/search?q=apache%2Fbeam&type=image) -Before you begin, install [Docker](https://www.docker.com/) on your workstation. +## Custom Containers -## Customizing container images +Users may want to customize container images for many reasons, including: -You can add extra dependencies to container images so that you don't have to supply the dependencies to execution engines. +* pre-installing additional dependencies, +* launching third-party software +* further customizing the execution environment -To customize a container image, either: -* [Write a new](#writing-new-dockerfiles) [Dockerfile](https://docs.docker.com/engine/reference/builder/) on top of the original. -* [Modify](#modifying-dockerfiles) the [original Dockerfile](https://github.com/apache/beam/blob/master/sdks/python/container/Dockerfile) and reimage the container. + This guide describes how to create and use customized containers for the Beam SDK. -It's often easier to write a new Dockerfile. However, by modifying the original Dockerfile, you can customize anything (including the base OS). +### Prerequisites -### Writing new Dockerfiles on top of the original {#writing-new-dockerfiles} +* You will need to have a version of the Beam SDK >= 2.21.0. +* You will need to have [Docker installed](https://docs.docker.com/get-docker/). +* You will need to have a container registry accessible by your execution engine or runner to host a custom container image. Options include [Docker Hub](https://hub.docker.com/) or a "self-hosted" repository, including cloud-specific container registries. -1. Pull a [prebuilt SDK container image](https://hub.docker.com/search?q=apache%2Fbeam&type=image) for your [target](https://docs.docker.com/docker-hub/repos/#searching-for-repositories) language and version. 
The following example pulls the latest Python SDK: +> **NOTE**: On Nov 20, 2020, Docker Hub put [rate limits](https://www.docker.com/increase-rate-limits) into effect for anonymous and free authenticated use, which may impact larger pipelines that pull containers several times. + +### Building and pushing custom containers + +Beam builds prebuilt images from [Dockerfiles](https://docs.docker.com/engine/reference/builder/). Users can build customized containers in one of two ways: + +1. **[Writing a new](#writing-new-dockerfiles) Dockerfile based on an existing prebuilt container**. This is sufficient for simple additions to the image, such as adding artifacts or environment variables. +2. **[Modifying](#modifying-dockerfiles) an existing Dockerfile in [Beam source](https://github.com/apache/beam)**. This method requires building from Beam source but allows for greater customization of the container (including replacement of artifacts or base OS/language versions). + +#### Writing new Dockerfiles on top of the original {#writing-new-dockerfiles} + +Steps: + +1. Create a new Dockerfile that designates a base image using the [FROM instruction](https://docs.docker.com/engine/reference/builder/#from) + +2. Once you have a created a custom Dockerfile, [build](https://docs.docker.com/engine/reference/commandline/build/) and [push](https://docs.docker.com/engine/reference/commandline/push/) the image using Docker: + +As an example, this `Dockerfile`: + +``` +FROM apache/beam_python3.7_sdk:2.25.0 + +ENV FOO=bar +COPY /src/path/to/file /dest/path/to/file/ ``` -docker pull apache/beam_python3.7_sdk + +uses the prebuilt Python 3.7 SDK container image [`beam_python3.7_sdk`](https://hub.docker.com/r/apache/beam_python3.7_sdk) tagged at (SDK version) `2.25.0`, and adds an additional environment variable and file to the image. Review comment: This seems like it would be better positioned under the bullet labelled `1`. 
########## File path: website/www/site/content/en/documentation/runtime/environments.md ########## @@ -15,56 +15,168 @@ See the License for the specific language governing permissions and limitations under the License. --> -# Container environments +# Container Environments -The Beam SDK runtime environment is isolated from other runtime systems because the SDK runtime environment is [containerized](https://s.apache.org/beam-fn-api-container-contract) with [Docker](https://www.docker.com/). This means that any execution engine can run the Beam SDK. +The Beam SDK runtime environment is [containerized](https://www.docker.com/resources/what-container) with [Docker](https://www.docker.com/) to isolate it from other runtime systems. This means any execution engine can run the Beam SDK. To learn more about the container environment, read the Beam [SDK Harness container contract](https://s.apache.org/beam-fn-api-container-contract). -This page describes how to customize, build, and push Beam SDK container images. +Prebuilt SDK container images are released per supported language version during Beam releases and and pushed to [Docker Hub](https://hub.docker.com/search?q=apache%2Fbeam&type=image) -Before you begin, install [Docker](https://www.docker.com/) on your workstation. +## Custom Containers -## Customizing container images +Users may want to customize container images for many reasons, including: -You can add extra dependencies to container images so that you don't have to supply the dependencies to execution engines. +* pre-installing additional dependencies, +* launching third-party software +* further customizing the execution environment -To customize a container image, either: -* [Write a new](#writing-new-dockerfiles) [Dockerfile](https://docs.docker.com/engine/reference/builder/) on top of the original. 
-* [Modify](#modifying-dockerfiles) the [original Dockerfile](https://github.com/apache/beam/blob/master/sdks/python/container/Dockerfile) and reimage the container. + This guide describes how to create and use customized containers for the Beam SDK. -It's often easier to write a new Dockerfile. However, by modifying the original Dockerfile, you can customize anything (including the base OS). +### Prerequisites -### Writing new Dockerfiles on top of the original {#writing-new-dockerfiles} +* You will need to have a version of the Beam SDK >= 2.21.0. +* You will need to have [Docker installed](https://docs.docker.com/get-docker/). +* You will need to have a container registry accessible by your execution engine or runner to host a custom container image. Options include [Docker Hub](https://hub.docker.com/) or a "self-hosted" repository, including cloud-specific container registries. -1. Pull a [prebuilt SDK container image](https://hub.docker.com/search?q=apache%2Fbeam&type=image) for your [target](https://docs.docker.com/docker-hub/repos/#searching-for-repositories) language and version. The following example pulls the latest Python SDK: +> **NOTE**: On Nov 20, 2020, Docker Hub put [rate limits](https://www.docker.com/increase-rate-limits) into effect for anonymous and free authenticated use, which may impact larger pipelines that pull containers several times. + +### Building and pushing custom containers + +Beam builds prebuilt images from [Dockerfiles](https://docs.docker.com/engine/reference/builder/). Users can build customized containers in one of two ways: + +1. **[Writing a new](#writing-new-dockerfiles) Dockerfile based on an existing prebuilt container**. This is sufficient for simple additions to the image, such as adding artifacts or environment variables. +2. **[Modifying](#modifying-dockerfiles) an existing Dockerfile in [Beam source](https://github.com/apache/beam)**. 
This method requires building from Beam source but allows for greater customization of the container (including replacement of artifacts or base OS/language versions). + +#### Writing new Dockerfiles on top of the original {#writing-new-dockerfiles} + +Steps: + +1. Create a new Dockerfile that designates a base image using the [FROM instruction](https://docs.docker.com/engine/reference/builder/#from) + +2. Once you have a created a custom Dockerfile, [build](https://docs.docker.com/engine/reference/commandline/build/) and [push](https://docs.docker.com/engine/reference/commandline/push/) the image using Docker: + +As an example, this `Dockerfile`: + +``` +FROM apache/beam_python3.7_sdk:2.25.0 + +ENV FOO=bar +COPY /src/path/to/file /dest/path/to/file/ ``` -docker pull apache/beam_python3.7_sdk + +uses the prebuilt Python 3.7 SDK container image [`beam_python3.7_sdk`](https://hub.docker.com/r/apache/beam_python3.7_sdk) tagged at (SDK version) `2.25.0`, and adds an additional environment variable and file to the image. + +``` +export BASE_IMAGE="apache/beam_python3.7_sdk:2.25.0" +export IMAGE_NAME="myremoterepo/mybeamsdk" +export TAG="latest" + +# Optional but recommended pull step to pull the base image into your local Docker daemon. +docker pull "${BASE_IMAGE}" +docker build -f Dockerfile -t "${IMAGE_NAME}:${TAG}" . +docker push "${IMAGE_NAME}:${TAG}" ``` -2. [Write a new Dockerfile](https://docs.docker.com/develop/develop-images/dockerfile_best-practices/) that [designates](https://docs.docker.com/engine/reference/builder/#from) the original as its [parent](https://docs.docker.com/glossary/?term=parent%20image). -3. [Build](#building-container-images) a child image. -### Modifying the original Dockerfile {#modifying-dockerfiles} +**NOTE**: After pushing a container image, you should verify the remote image ID and digest should match the local image ID and digest, output from `docker build` or `docker images`. 
+ +#### Modifying the original Dockerfile {#modifying-dockerfiles} in Beam source + +This method will require building image artifacts from Beam source - see the [Contribution guide](contribute/#development-setup) for additional instructions on setting up your development environment. Review comment: Suggestion lifted from: https://developers.google.com/style/clause-order ```suggestion This method will require building image artifacts from Beam source. For additional instructions on setting up your development environment, see the [Contribution guide](contribute/#development-setup). ``` ########## File path: website/www/site/content/en/documentation/runtime/environments.md ########## @@ -87,77 +202,28 @@ python -m apache_beam.examples.wordcount \ --output=path/to/write/counts \ --runner=PortableRunner \ --job_endpoint=localhost:8099 \ ---environment_config=path/to/container/image +--environment_config="${IMAGE}:${TAG}" {{< /highlight >}} -## Building container images - -To build Beam SDK container images: - -1. Navigate to the root directory of the local copy of your Apache Beam. -2. Run Gradle with the `docker` target. If you're [building a child image](#writing-new-dockerfiles), set the optional `--file` flag to the new Dockerfile. 
If you're [building an image from an original Dockerfile](#modifying-dockerfiles), ignore the `--file` flag: - -``` -# The default repository of each SDK -./gradlew [--file=path/to/new/Dockerfile] :sdks:java:container:java8:docker -./gradlew [--file=path/to/new/Dockerfile] :sdks:java:container:java11:docker -./gradlew [--file=path/to/new/Dockerfile] :sdks:go:container:docker -./gradlew [--file=path/to/new/Dockerfile] :sdks:python:container:py2:docker -./gradlew [--file=path/to/new/Dockerfile] :sdks:python:container:py35:docker -./gradlew [--file=path/to/new/Dockerfile] :sdks:python:container:py36:docker -./gradlew [--file=path/to/new/Dockerfile] :sdks:python:container:py37:docker - -# Shortcut for building all four Python SDKs -./gradlew [--file=path/to/new/Dockerfile] :sdks:python:container buildAll -``` - -From 2.21.0, `docker-pull-licenses` tag was introduced. Licenses/notices of third party dependencies will be added to the docker images when `docker-pull-licenses` was set. -For example, `./gradlew :sdks:java:container:java8:docker -Pdocker-pull-licenses`. The files are added to `/opt/apache/beam/third_party_licenses/`. -By default, no licenses/notices are added to the docker images. - -To examine the containers that you built, run `docker images` from anywhere in the command line. If you successfully built all of the container images, the command prints a table like the following: -``` -REPOSITORY TAG IMAGE ID CREATED SIZE -apache/beam_java8_sdk latest ... 2 weeks ago ... -apache/beam_java11_sdk latest ... 2 weeks ago ... -apache/beam_python2.7_sdk latest ... 2 weeks ago ... -apache/beam_python3.5_sdk latest ... 2 weeks ago ... -apache/beam_python3.6_sdk latest ... 2 weeks ago ... -apache/beam_python3.7_sdk latest ... 2 weeks ago ... -apache/beam_go_sdk latest ... 2 weeks ago ... 
-``` - -### Overriding default Docker targets - -The default [tag](https://docs.docker.com/engine/reference/commandline/tag/) is sdk_version defined at [gradle.properties](https://github.com/apache/beam/blob/master/gradle.properties) and the default repositories are in the Docker Hub `apache` namespace. -The `docker` command-line tool implicitly [pushes container images](#pushing-container-images) to this location. +{{< highlight class="runner-dataflow" >}} +export IMAGE="my-repo/beam_python_sdk_custom" +export TAG="X.Y.Z" -To tag a local image, set the `docker-tag` option when building the container. The following command tags a Python SDK image with a date. -``` -./gradlew :sdks:python:container:py36:docker -Pdocker-tag=2019-10-04 -``` - -To change the repository, set the `docker-repository-root` option to a new location. The following command sets the `docker-repository-root` -to a repository named `example-repo` on Docker Hub. -``` -./gradlew :sdks:python:container:py36:docker -Pdocker-repository-root=example-repo -``` +export GCS_PATH="gs://my-gcs-bucket" +export GCP_PROJECT="my-gcp-project" +export REGION="us-central1" -## Pushing container images - -After [building a container image](#building-container-images), you can store it in a remote Docker repository. - -The following steps push a Python3.6 SDK image to the [`docker-root-repository` value](#overriding-default-docker-targets). -Please log in to the destination repository as needed. - -Upload it to the remote repository: -``` -docker push example-repo/beam_python3.6_sdk -``` - -To download the image again, run `docker pull`: -``` -docker pull example-repo/beam_python3.6_sdk -``` +# Run a pipeline on Dataflow. +# This is a Python batch pipeline, so to run on Dataflow Runner V2 +# you must specify the experiment "use_runner_v2" Review comment: Please put this into a code block or reformat. 
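One possible shape for the block the comment above refers to, offered as a sketch rather than the PR's final text: the explanation moves into shell comments next to the flag it describes. The variable values come from the quoted diff. The `--output` and `--temp_location` paths are hypothetical, and `--worker_harness_container_image` is an assumption about the Dataflow container flag for SDK versions of this era, so check the pipeline options for your release. The command is only printed, not run.

```shell
# Values copied from the exports in the quoted diff (placeholders, not real resources).
IMAGE="my-repo/beam_python_sdk_custom"
TAG="X.Y.Z"
GCS_PATH="gs://my-gcs-bucket"
GCP_PROJECT="my-gcp-project"
REGION="us-central1"

# Per the quoted diff: this is a Python batch pipeline, so running it on
# Dataflow Runner V2 requires the "use_runner_v2" experiment.
# Assumption: --worker_harness_container_image is the container flag for the
# SDK version in use. The output and temp paths below are hypothetical.
CMD="python -m apache_beam.examples.wordcount \
--output=${GCS_PATH}/counts \
--runner=DataflowRunner \
--project=${GCP_PROJECT} \
--region=${REGION} \
--temp_location=${GCS_PATH}/tmp/ \
--experiment=use_runner_v2 \
--worker_harness_container_image=${IMAGE}:${TAG}"

# Dry run: print the assembled command instead of submitting a job.
echo "${CMD}"
```

Running the printed command instead of echoing it would submit the job, which needs valid GCP credentials and an image already pushed to a registry the workers can reach.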
########## File path: website/www/site/content/en/documentation/runtime/environments.md ########## @@ -87,77 +202,28 @@ python -m apache_beam.examples.wordcount \ --output=path/to/write/counts \ --runner=PortableRunner \ --job_endpoint=localhost:8099 \ ---environment_config=path/to/container/image +--environment_config="${IMAGE}:${TAG}" {{< /highlight >}} -## Building container images - -To build Beam SDK container images: - -1. Navigate to the root directory of the local copy of your Apache Beam. -2. Run Gradle with the `docker` target. If you're [building a child image](#writing-new-dockerfiles), set the optional `--file` flag to the new Dockerfile. If you're [building an image from an original Dockerfile](#modifying-dockerfiles), ignore the `--file` flag: - -``` -# The default repository of each SDK -./gradlew [--file=path/to/new/Dockerfile] :sdks:java:container:java8:docker -./gradlew [--file=path/to/new/Dockerfile] :sdks:java:container:java11:docker -./gradlew [--file=path/to/new/Dockerfile] :sdks:go:container:docker -./gradlew [--file=path/to/new/Dockerfile] :sdks:python:container:py2:docker -./gradlew [--file=path/to/new/Dockerfile] :sdks:python:container:py35:docker -./gradlew [--file=path/to/new/Dockerfile] :sdks:python:container:py36:docker -./gradlew [--file=path/to/new/Dockerfile] :sdks:python:container:py37:docker - -# Shortcut for building all four Python SDKs -./gradlew [--file=path/to/new/Dockerfile] :sdks:python:container buildAll -``` - -From 2.21.0, `docker-pull-licenses` tag was introduced. Licenses/notices of third party dependencies will be added to the docker images when `docker-pull-licenses` was set. -For example, `./gradlew :sdks:java:container:java8:docker -Pdocker-pull-licenses`. The files are added to `/opt/apache/beam/third_party_licenses/`. -By default, no licenses/notices are added to the docker images. - -To examine the containers that you built, run `docker images` from anywhere in the command line. 
If you successfully built all of the container images, the command prints a table like the following: -``` -REPOSITORY TAG IMAGE ID CREATED SIZE -apache/beam_java8_sdk latest ... 2 weeks ago ... -apache/beam_java11_sdk latest ... 2 weeks ago ... -apache/beam_python2.7_sdk latest ... 2 weeks ago ... -apache/beam_python3.5_sdk latest ... 2 weeks ago ... -apache/beam_python3.6_sdk latest ... 2 weeks ago ... -apache/beam_python3.7_sdk latest ... 2 weeks ago ... -apache/beam_go_sdk latest ... 2 weeks ago ... -``` - -### Overriding default Docker targets - -The default [tag](https://docs.docker.com/engine/reference/commandline/tag/) is sdk_version defined at [gradle.properties](https://github.com/apache/beam/blob/master/gradle.properties) and the default repositories are in the Docker Hub `apache` namespace. -The `docker` command-line tool implicitly [pushes container images](#pushing-container-images) to this location. +{{< highlight class="runner-dataflow" >}} +export IMAGE="my-repo/beam_python_sdk_custom" +export TAG="X.Y.Z" -To tag a local image, set the `docker-tag` option when building the container. The following command tags a Python SDK image with a date. -``` -./gradlew :sdks:python:container:py36:docker -Pdocker-tag=2019-10-04 -``` - -To change the repository, set the `docker-repository-root` option to a new location. The following command sets the `docker-repository-root` -to a repository named `example-repo` on Docker Hub. -``` -./gradlew :sdks:python:container:py36:docker -Pdocker-repository-root=example-repo -``` +export GCS_PATH="gs://my-gcs-bucket" +export GCP_PROJECT="my-gcp-project" +export REGION="us-central1" Review comment: Maybe add a comment about GCR here as well? ########## File path: website/www/site/content/en/documentation/runtime/environments.md ########## @@ -15,56 +15,168 @@ See the License for the specific language governing permissions and limitations under the License. 
--> -# Container environments +# Container Environments -The Beam SDK runtime environment is isolated from other runtime systems because the SDK runtime environment is [containerized](https://s.apache.org/beam-fn-api-container-contract) with [Docker](https://www.docker.com/). This means that any execution engine can run the Beam SDK. +The Beam SDK runtime environment is [containerized](https://www.docker.com/resources/what-container) with [Docker](https://www.docker.com/) to isolate it from other runtime systems. This means any execution engine can run the Beam SDK. To learn more about the container environment, read the Beam [SDK Harness container contract](https://s.apache.org/beam-fn-api-container-contract). -This page describes how to customize, build, and push Beam SDK container images. +Prebuilt SDK container images are released per supported language version during Beam releases and and pushed to [Docker Hub](https://hub.docker.com/search?q=apache%2Fbeam&type=image) -Before you begin, install [Docker](https://www.docker.com/) on your workstation. +## Custom Containers -## Customizing container images +Users may want to customize container images for many reasons, including: -You can add extra dependencies to container images so that you don't have to supply the dependencies to execution engines. +* pre-installing additional dependencies, +* launching third-party software +* further customizing the execution environment -To customize a container image, either: -* [Write a new](#writing-new-dockerfiles) [Dockerfile](https://docs.docker.com/engine/reference/builder/) on top of the original. -* [Modify](#modifying-dockerfiles) the [original Dockerfile](https://github.com/apache/beam/blob/master/sdks/python/container/Dockerfile) and reimage the container. + This guide describes how to create and use customized containers for the Beam SDK. -It's often easier to write a new Dockerfile. 
However, by modifying the original Dockerfile, you can customize anything (including the base OS). +### Prerequisites -### Writing new Dockerfiles on top of the original {#writing-new-dockerfiles} +* You will need to have a version of the Beam SDK >= 2.21.0. +* You will need to have [Docker installed](https://docs.docker.com/get-docker/). +* You will need to have a container registry accessible by your execution engine or runner to host a custom container image. Options include [Docker Hub](https://hub.docker.com/) or a "self-hosted" repository, including cloud-specific container registries. -1. Pull a [prebuilt SDK container image](https://hub.docker.com/search?q=apache%2Fbeam&type=image) for your [target](https://docs.docker.com/docker-hub/repos/#searching-for-repositories) language and version. The following example pulls the latest Python SDK: +> **NOTE**: On Nov 20, 2020, Docker Hub put [rate limits](https://www.docker.com/increase-rate-limits) into effect for anonymous and free authenticated use, which may impact larger pipelines that pull containers several times. + +### Building and pushing custom containers + +Beam builds prebuilt images from [Dockerfiles](https://docs.docker.com/engine/reference/builder/). Users can build customized containers in one of two ways: + +1. **[Writing a new](#writing-new-dockerfiles) Dockerfile based on an existing prebuilt container**. This is sufficient for simple additions to the image, such as adding artifacts or environment variables. +2. **[Modifying](#modifying-dockerfiles) an existing Dockerfile in [Beam source](https://github.com/apache/beam)**. This method requires building from Beam source but allows for greater customization of the container (including replacement of artifacts or base OS/language versions). + +#### Writing new Dockerfiles on top of the original {#writing-new-dockerfiles} + +Steps: + +1. 
Create a new Dockerfile that designates a base image using the [FROM instruction](https://docs.docker.com/engine/reference/builder/#from) + +2. Once you have a created a custom Dockerfile, [build](https://docs.docker.com/engine/reference/commandline/build/) and [push](https://docs.docker.com/engine/reference/commandline/push/) the image using Docker: + +As an example, this `Dockerfile`: + +``` +FROM apache/beam_python3.7_sdk:2.25.0 + +ENV FOO=bar +COPY /src/path/to/file /dest/path/to/file/ ``` -docker pull apache/beam_python3.7_sdk + +uses the prebuilt Python 3.7 SDK container image [`beam_python3.7_sdk`](https://hub.docker.com/r/apache/beam_python3.7_sdk) tagged at (SDK version) `2.25.0`, and adds an additional environment variable and file to the image. + +``` +export BASE_IMAGE="apache/beam_python3.7_sdk:2.25.0" +export IMAGE_NAME="myremoterepo/mybeamsdk" +export TAG="latest" + +# Optional but recommended pull step to pull the base image into your local Docker daemon. +docker pull "${BASE_IMAGE}" +docker build -f Dockerfile -t "${IMAGE_NAME}:${TAG}" . +docker push "${IMAGE_NAME}:${TAG}" ``` -2. [Write a new Dockerfile](https://docs.docker.com/develop/develop-images/dockerfile_best-practices/) that [designates](https://docs.docker.com/engine/reference/builder/#from) the original as its [parent](https://docs.docker.com/glossary/?term=parent%20image). -3. [Build](#building-container-images) a child image. -### Modifying the original Dockerfile {#modifying-dockerfiles} +**NOTE**: After pushing a container image, you should verify the remote image ID and digest should match the local image ID and digest, output from `docker build` or `docker images`. + +#### Modifying the original Dockerfile {#modifying-dockerfiles} in Beam source + +This method will require building image artifacts from Beam source - see the [Contribution guide](contribute/#development-setup) for additional instructions on setting up your development environment. + +1. Clone the `beam` repository. 
-1. Clone the `beam` repository: ``` git clone https://github.com/apache/beam.git ``` -2. Customize the [Dockerfile](https://github.com/apache/beam/blob/master/sdks/python/container/Dockerfile). If you're adding dependencies from [PyPI](https://pypi.org/), use [`base_image_requirements.txt`](https://github.com/apache/beam/blob/master/sdks/python/container/base_image_requirements.txt) instead. -3. [Reimage](#building-container-images) the container. -### Testing customized images +2. Customize the `Dockerfile` for a given language. This file is typically in the `sdks/<language>/container` directory (e.g. the [Dockerfile for Python](https://github.com/apache/beam/blob/master/sdks/python/container/Dockerfile).. If you're adding dependencies from [PyPI](https://pypi.org/), use [`base_image_requirements.txt`](https://github.com/apache/beam/blob/master/sdks/python/container/base_image_requirements.txt) instead. + +3. Navigate to the root directory of the local copy of your Apache Beam. + +4. Run Gradle with the `docker` target. + + +``` +# The default repository of each SDK +./gradlew :sdks:java:container:java8:docker +./gradlew :sdks:java:container:java11:docker +./gradlew :sdks:go:container:docker +./gradlew :sdks:python:container:py36:docker +./gradlew :sdks:python:container:py37:docker +./gradlew :sdks:python:container:py38:docker + +# Shortcut for building all Python SDKs +./gradlew :sdks:python:container buildAll +``` + +To examine the containers that you built, run `docker images`: + +``` +$> docker images +REPOSITORY TAG IMAGE ID CREATED SIZE +apache/beam_java8_sdk latest ... 1 min ago ... +apache/beam_java11_sdk latest ... 1 min ago ... +apache/beam_python3.6_sdk latest ... 1 min ago ... +apache/beam_python3.7_sdk latest ... 1 min ago ... +apache/beam_python3.8_sdk latest ... 1 min ago ... +apache/beam_go_sdk latest ... 1 min ago ... 
+```
+
+If you did not provide a custom repo/tag as additional parameters (see below), you can retag the image and [push](https://docs.docker.com/engine/reference/commandline/push/) it to a remote repository using Docker.
+
+```
+export IMAGE_NAME="myrepo/mybeamsdk"
+export TAG="latest"
+
+docker tag apache/beam_python3.6_sdk "${IMAGE_NAME}:${TAG}"
+docker push "${IMAGE_NAME}:${TAG}"
+```
+
+**NOTE**: After pushing a container image, verify that the remote image ID and digest match the local image ID and digest output by `docker images`.
+
+##### Additional Build Parameters
+
+The `docker` Gradle task defines a default image repository and [tag](https://docs.docker.com/engine/reference/commandline/tag/). The default repository is the Docker Hub `apache` namespace, and the default tag is the [SDK version](https://github.com/apache/beam/blob/master/gradle.properties) defined in gradle.properties. With these settings, the
+`docker` command-line tool will implicitly try to push the container to the Docker Hub Apache repository.
+
+You can specify a different repository or tag for built images by providing parameters to the build task. For example:
+
+```
+./gradlew :sdks:python:container:py36:docker -Pdocker-repository-root=example-repo -Pdocker-tag=2019-10-04
+```
-To test a customized image locally, run a pipeline with PortableRunner and set the `--environment_config` flag to the image path:
+builds the Python 3.6 container and tags it as `example-repo/beam_python3.6_sdk:2019-10-04`.
+
+From 2.21.0, a `docker-pull-licenses` flag was introduced to add licenses/notices for third party dependencies to the docker images. For example:
+
+```
+./gradlew :sdks:java:container:java8:docker -Pdocker-pull-licenses
+```
+ +By default, no licenses/notices are added to the docker images. + + +## Using Container Images in Pipelines + +The common method for providing a container image requires using the PortableRunner and setting the `--environment_config` flag to a given image path. +Other runners, such as Dataflow, support specifying containers with different flags. Review comment: This seems unfortunate. WDYT about Dataflow using this flag as well in the future? ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org