Thanks for summarizing the discussion.

A few comments inline below:


On Mon, Jul 15, 2019 at 5:28 PM Sam Bourne <[email protected]> wrote:

> Hello Beam devs,
>
> I’ve opened a PR (https://github.com/apache/beam/pull/8982) to support
> passing options/flags to the docker run command executed as part of the
> portable environment workflow. I’m in need of providing specific volumes
> and possibly other docker run options as I refine our custom container and
> workflow.
>
> There were requests to bring this up in the mailing list to discuss
> possible ways to achieve this. There’s an existing PR #8828
> <https://github.com/apache/beam/pull/8828> but we took quite different
> approaches. #8828 is limited to only mounting /tmp/ directories with no
> support for other docker run options/flags so wouldn’t solve my needs.
>
> I chose to expand upon the existing flag environment_config and provide
> the additional docker run options there. This requires the SDK parse these
> out when building the DockerPayload protobuf. It’s worth noting that what
> is provided to environment_config changes depending on the
> environment_type. e.g. if environment_type is docker, environment_config
> is currently expected to be the docker container name, but other
> environment types have completely different expectations, and each uses its
> own protobuf message type.
>
> The current method (using python SDK) looks like this:
>
> python -m mymodule —runner PortableRunner —job_endpoint localhost:8099 
> —environment_type DOCKER —environment_config MY_CONTAINER_NAME
>
> My PR expects other run options to be provided before the container name -
> similar to how you would start the container locally:
>
> python -m mymodule —runner PortableRunner —job_endpoint localhost:8099 
> —environment_type DOCKER —environment_config “-v 
> /Volumes/mnt/foo:/Volumes/mnt/foo -v /Volumes/mnt/bar:/Volumes/mnt/bar —user 
> sambvfx MY_CONTAINER_NAME”
>
> The PR’s feedback raises some questions that some of you may have opinions
> about. A hopefully faithful summary of them and my commentary below:
>
> Should we require the environment_config be a json encoded string that
> mirrors the protobuf?
>
> e.g.
>
> --environment_config '{"image_name": "MY_CONTAINER_NAME", "run_options": “-v 
> /Volumes/mnt/foo:/Volumes/mnt/foo -v /Volumes/mnt/bar:/Volumes/mnt/bar —user 
> sambvfx"}'
>
> I’m not a fan due to it not being backwards compatible and difficult to
> provide to CLI. Users don’t want to type json into the shell.
>
I agree, typing JSON on command line is really messy. But I think having
meaningful parts in the config will be easier to maintain and compare.
Can we give a config file which can be read, parsed and delivered as
options to the docker environment.
Something like "--environment_config '~/my_docker_config.json/yaml'"

I think passing a user provided command to start docker might have security
issues as users might load mount an otherwise non accessible drive or
access prohibited port etc.

> Should we not assume docker run ... is the only way to start the
> container?
>
> I think any other method would likely require further changes to the
> protobuf or a completely new one.
>
Yes I think that makes sense. However, if we add more parameters to the
docker startup then the dockerpayload protobuf can be updated to have
those.

> Should we provide different args for mounting volume(s) and map that to
> the appropriate docker command within the beam code?
>
> This requires a lot of docker specific code to be included within beam.
>
> Any input would be appreciated.
>
> Cheers,
> Sam
>

Reply via email to