[ 
https://issues.apache.org/jira/browse/BEAM-8273?focusedWorklogId=344598&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-344598
 ]

ASF GitHub Bot logged work on BEAM-8273:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 15/Nov/19 21:29
            Start Date: 15/Nov/19 21:29
    Worklog Time Spent: 10m 
      Work Description: ibzib commented on pull request #10116: [BEAM-8273] Expand portability environment documentation
URL: https://github.com/apache/beam/pull/10116#discussion_r347018467
 
 

 ##########
 File path: website/src/roadmap/portability.md
 ##########
 @@ -170,18 +170,32 @@ Python streaming mode is not yet supported on Spark.
 The Beam Python SDK allows configuration of the SDK harness to accommodate varying cluster setups.
 
 - `environment_type` determines where user code will be executed.
-  - `LOOPBACK`: User code is executed within the same process that submitted the pipeline. This
-    option is useful for local testing. However, it is not suitable for a production environment,
-    as it requires a connection between the original Python process and the worker nodes, and
-    performs work on the machine the job originated from, not the worker nodes.
-  - `PROCESS`: User code is executed by processes that are automatically started by the runner on
-    each worker node.
+  `environment_config` configures the environment depending on the value of `environment_type`.
   - `DOCKER` (default): User code is executed within a container started on each worker node.
-    This requires docker to be installed on worker nodes. For more information, see
+    This requires docker to be installed on worker nodes.
+    - `environment_config`: URL for the Docker container image. Official Docker images
+    are available [here](https://hub.docker.com/u/apachebeam) and are used by default.
+    Alternatively, you can build your own image by following the instructions
     [here]({{ site.baseurl }}/documentation/runtime/environments/).
-- `environment_config` configures the environment depending on the value of `environment_type`.
-  - When `environment_type=DOCKER`: URL for the Docker container image.
-  - When `environment_type=PROCESS`: JSON of the form `{"os": "<OS>", "arch": "<ARCHITECTURE>",
+  - `PROCESS`: User code is executed by processes that are automatically started by the runner on
+    each worker node.
+    - `environment_config`: JSON of the form `{"os": "<OS>", "arch": "<ARCHITECTURE>",
     "command": "<process to execute>", "env":{"<Environment variables 1>": "<ENV_VAL>"} }`. All
     fields in the JSON are optional except `command`.
+      - For `command`, it is recommended to use the bootloader executable, which can be built from
+        source with `./gradlew :sdks:python:container:build` and copied from
+        `sdks/python/container/build/target/launcher/linux_amd64/boot` to worker machines.
+        Note that the Python bootloader assumes Python and the `apache_beam` module are installed
+        on each worker machine.
+  - `EXTERNAL`: User code will be dispatched to an external service. For example, one can start
+    an external service for Python workers by running
+    `docker run -p=50000:50000 apachebeam/python3.6_sdk --worker_pool`.
+    - `environment_config`: Address for the external service, e.g. `localhost:50000`.
+    - To access a Dockerized worker pool on Mac or Windows, set the `BEAM_WORKER_POOL_IN_DOCKER_VM`
+      environment variable: `export BEAM_WORKER_POOL_IN_DOCKER_VM=1`.
 
 Review comment:
   Clarified this point.
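
 For readers following the diff, here is a minimal sketch of how the
 `environment_config` values described above could be assembled in Python. It
 is illustrative only: the helper names `process_environment_config` and
 `environment_flags` are hypothetical, the `/opt/beam/boot` path is a made-up
 location for the copied bootloader, and passing the options as
 `--environment_type=...` / `--environment_config=...` flags is an assumption
 about how the pipeline is launched.

```python
import json


def process_environment_config(command, os_name=None, arch=None, env=None):
    """Build the JSON string used as environment_config for PROCESS.

    Only `command` is required; the other fields are omitted when unset,
    matching the "all fields are optional except command" rule in the docs.
    """
    config = {"command": command}
    if os_name is not None:
        config["os"] = os_name
    if arch is not None:
        config["arch"] = arch
    if env:
        config["env"] = dict(env)
    return json.dumps(config)


def environment_flags(environment_type, environment_config):
    """Return the two options as command-line style flags (assumed form)."""
    return [
        "--environment_type=" + environment_type,
        "--environment_config=" + environment_config,
    ]


# PROCESS: point `command` at the bootloader copied to each worker.
# "/opt/beam/boot" is a hypothetical destination path.
process_flags = environment_flags(
    "PROCESS",
    process_environment_config("/opt/beam/boot", os_name="linux", arch="amd64"),
)

# EXTERNAL: environment_config is just the worker pool address.
external_flags = environment_flags("EXTERNAL", "localhost:50000")
```

 Building the JSON with `json.dumps` rather than by hand avoids quoting
 mistakes when the value is later passed through a shell.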
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 344598)
    Time Spent: 2h  (was: 1h 50m)

> Improve worker script for environment_type=PROCESS
> --------------------------------------------------
>
>                 Key: BEAM-8273
>                 URL: https://issues.apache.org/jira/browse/BEAM-8273
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-py-harness
>            Reporter: Kyle Weaver
>            Assignee: Kyle Weaver
>            Priority: Major
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> When environment_type=PROCESS, environment_config specifies the command to 
> run the worker processes. Right now, it defaults to None and errors if not 
> set (`TypeError: expected string or buffer`).
> It might not be feasible to offer a one-size-fits-all executable for 
> providing as environment_config, but we could at least:
> a) make it easier to build one (right now I only see the executable being 
> built in a test script that depends on docker: 
> [https://github.com/apache/beam/blob/cbf8a900819c52940a0edd90f59bf6aec55c817a/sdks/python/test-suites/portable/py2/build.gradle#L146-L165])
> b) document the process
> c) link to the documentation when no environment_config is provided
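
Point c) above could look roughly like the following. This is a sketch of the
idea, not the change that was made in Beam: the function name
`parse_process_environment_config` is hypothetical, and the hint only points
at the documentation file named in the pull request rather than at a
published URL.

```python
import json

# Pointer to the docs touched by the pull request; a real implementation
# would presumably link to the rendered page instead (assumption).
DOCS_HINT = "see website/src/roadmap/portability.md for the expected format"


def parse_process_environment_config(environment_config):
    """Validate environment_config for environment_type=PROCESS.

    Raises ValueError with a pointer to the documentation instead of
    failing later with an opaque TypeError when the option is unset.
    """
    if environment_config is None:
        raise ValueError(
            "environment_type=PROCESS requires environment_config; " + DOCS_HINT)
    try:
        config = json.loads(environment_config)
    except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
        raise ValueError(
            "environment_config is not valid JSON; " + DOCS_HINT) from exc
    if not isinstance(config, dict) or not config.get("command"):
        raise ValueError(
            'environment_config must be a JSON object with a "command" field; '
            + DOCS_HINT)
    return config
```

Failing early with a message that names the missing option keeps the error
close to its cause, which is the usability problem the issue describes.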



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
