Thanks for summarizing the discussion. A few comments inline below:
On Mon, Jul 15, 2019 at 5:28 PM Sam Bourne <[email protected]> wrote: > Hello Beam devs, > > I’ve opened a PR (https://github.com/apache/beam/pull/8982) to support > passing options/flags to the docker run command executed as part of the > portable environment workflow. I’m in need of providing specific volumes > and possibly other docker run options as I refine our custom container and > workflow. > > There were requests to bring this up in the mailing list to discuss > possible ways to achieve this. There’s an existing PR #8828 > <https://github.com/apache/beam/pull/8828> but we took quite different > approaches. #8828 is limited to only mounting /tmp/ directories with no > support for other docker run options/flags so wouldn’t solve my needs. > > I chose to expand upon the existing flag environment_config and provide > the additional docker run options there. This requires the SDK parse these > out when building the DockerPayload protobuf. It’s worth noting that what > is provided to environment_config changes depending on the > environment_type. e.g. if environment_type is docker, environment_config > is currently expected to be the docker container name, but other > environment types have completely different expectations, and each uses its > own protobuf message type. > > The current method (using python SDK) looks like this: > > python -m mymodule —runner PortableRunner —job_endpoint localhost:8099 > —environment_type DOCKER —environment_config MY_CONTAINER_NAME > > My PR expects other run options to be provided before the container name - > similar to how you would start the container locally: > > python -m mymodule —runner PortableRunner —job_endpoint localhost:8099 > —environment_type DOCKER —environment_config “-v > /Volumes/mnt/foo:/Volumes/mnt/foo -v /Volumes/mnt/bar:/Volumes/mnt/bar —user > sambvfx MY_CONTAINER_NAME” > > The PR’s feedback raises some questions that some of you may have opinions > about. A hopefully faithful summary of them and my commentary below: > > Should we require the environment_config be a json encoded string that > mirrors the protobuf? > > e.g. > > --environment_config '{"image_name": "MY_CONTAINER_NAME", "run_options": “-v > /Volumes/mnt/foo:/Volumes/mnt/foo -v /Volumes/mnt/bar:/Volumes/mnt/bar —user > sambvfx"}' > > I’m not a fan due to it not being backwards compatible and difficult to > provide to CLI. Users don’t want to type json into the shell. > I agree, typing JSON on command line is really messy. But I think having meaningful parts in the config will be easier to maintain and compare. Can we give a config file which can be read, parsed and delivered as options to the docker environment. Something like "--environment_config '~/my_docker_config.json/yaml'" I think passing a user provided command to start docker might have security issues as users might load mount an otherwise non accessible drive or access prohibited port etc. > Should we not assume docker run ... is the only way to start the > container? > > I think any other method would likely require further changes to the > protobuf or a completely new one. > Yes I think that makes sense. However, if we add more parameters to the docker startup then the dockerpayload protobuf can be updated to have those. > Should we provide different args for mounting volume(s) and map that to > the appropriate docker command within the beam code? > > This requires a lot of docker specific code to be included within beam. > > Any input would be appreciated. > > Cheers, > Sam >
