Makes sense to have an option to run the SDK harness in a non-dockerized environment.
I'm in the process of creating a Docker entry point for Flink's
JobServer [1]. I suppose you would also prefer to execute that one
standalone. We should make sure this is also an option.

[1] https://issues.apache.org/jira/browse/BEAM-4130

On 16.08.18 07:42, Thomas Weise wrote:
> Yes, that's the proposal. Everything that would otherwise be packaged
> into the Docker container would need to be pre-installed in the host
> environment. In the case of the Python SDK, that could simply mean a
> (frozen) virtual environment that was set up when the host was
> provisioned - the SDK harness process(es) will then just utilize that.
> Of course this flavor of SDK harness execution could also be useful in
> the local development environment, where right now someone who already
> has the Python environment needs to also install Docker and update a
> container to launch a Python SDK pipeline on the Flink runner.
>
> On Wed, Aug 15, 2018 at 12:40 PM Daniel Oliveira <danolive...@google.com> wrote:
>
>     I just want to clarify that I understand this correctly since I'm
>     not that familiar with the details behind all these execution
>     environments yet. Is the proposal to create a new JobBundleFactory
>     that, instead of using Docker to create the environment that the new
>     processes will execute in, would execute the new processes directly
>     in the host environment? So in practice if I ran a pipeline with
>     this JobBundleFactory the SDK Harness and Runner Harness would both
>     be executing directly on my machine and would depend on me having
>     the dependencies already present on my machine?
>
>     On Mon, Aug 13, 2018 at 5:50 PM Ankur Goenka <goe...@google.com> wrote:
>
>         Thanks for starting the discussion. I will be happy to help.
>         I agree, we should have a pluggable SDKHarness environment factory.
>         We can register multiple Environment factories using the service
>         registry and use the PipelineOption to pick the right one on a
>         per-job basis.
>
>         There are a couple of things that need to be set up before
>         launching the process:
>
>           * Setting up the environment as done in boot.go [4]
>           * Retrieving and putting the artifacts in the right location.
>
>         You can probably leverage the boot.go code to set up the environment.
>
>         Also, it will be useful to enumerate the pros and cons of the
>         different Environments to help users choose the right one.
>
>
>         On Mon, Aug 6, 2018 at 4:50 PM Thomas Weise <t...@apache.org> wrote:
>
>             Hi,
>
>             Currently the portable Flink runner only works with SDK
>             Docker containers for execution (DockerJobBundleFactory,
>             besides an in-process (embedded) factory option for testing
>             [1]). I'm considering adding another out-of-process
>             JobBundleFactory implementation that directly forks the
>             processes on the task manager host, eliminating the need for
>             Docker. This would work reasonably well in environments
>             where the dependencies (in this case Python) can easily be
>             tied into the host deployment (also within an
>             application-specific Kubernetes pod).
>
>             There was already some discussion about alternative
>             JobBundleFactory implementations in [2]. There is also a JIRA
>             to make the bundle factory pluggable [3], pending
>             availability of runner-level options.
>
>             For a "ProcessBundleFactory", in addition to the Python
>             dependencies the environment would also need to have the Go
>             boot executable [4] (or a substitute thereof) to perform the
>             harness initialization.
>
>             Is anyone else interested in this SDK execution option or
>             has already investigated an alternative implementation?
>
>             Thanks,
>             Thomas
>
>             [1] https://github.com/apache/beam/blob/7958a379b0a37a89edc3a6ae4b5bc82fda41fcd6/runners/flink/src/test/java/org/apache/beam/runners/flink/PortableExecutionTest.java#L83
>             [2] https://lists.apache.org/thread.html/d6b6fde764796de31996db9bb5f9de3e7aaf0ab29b99d0adb52ac508@%3Cdev.beam.apache.org%3E
>             [3] https://issues.apache.org/jira/browse/BEAM-4819
>             [4] https://github.com/apache/beam/blob/master/sdks/python/container/boot.go
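
To make the idea from the thread a bit more concrete, here is a rough sketch of a registry of environment factories, with one factory that forks the harness process directly on the host instead of starting a container. This is not Beam code: `ProcessEnvironment`, `register_factory` and `create_environment` are names made up for illustration, the `"process"` key stands in for whatever value a pipeline option would carry, and the forked command is a placeholder rather than the real boot executable.

```python
import subprocess
import sys
from typing import Callable, Dict, List


class ProcessEnvironment:
    """Hypothetical handle for a harness process forked on the host."""

    def __init__(self, command: List[str]) -> None:
        # Fork the command directly in the host environment; no container.
        # Everything it needs must already be installed on the host.
        self._proc = subprocess.Popen(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
        )

    def wait(self, timeout: float = 30.0) -> str:
        """Wait for the process to exit and return its combined output."""
        try:
            output, _ = self._proc.communicate(timeout=timeout)
        except subprocess.TimeoutExpired:
            # Do not leave a stray process behind if it hangs.
            self._proc.kill()
            self._proc.communicate()
            raise
        if self._proc.returncode != 0:
            raise RuntimeError(
                f"process exited with code {self._proc.returncode}: {output}"
            )
        return output


EnvironmentFactory = Callable[[List[str]], ProcessEnvironment]

# Registry keyed by the name a per-job option would select (assumption).
_FACTORIES: Dict[str, EnvironmentFactory] = {}


def register_factory(name: str, factory: EnvironmentFactory) -> None:
    """Make a factory available under the given name."""
    _FACTORIES[name] = factory


def create_environment(name: str, command: List[str]) -> ProcessEnvironment:
    """Look up the factory chosen for this job and start the environment."""
    try:
        factory = _FACTORIES[name]
    except KeyError:
        raise ValueError(f"no environment factory registered as {name!r}")
    return factory(command)


# The only factory in this sketch: fork the process on the host.
register_factory("process", ProcessEnvironment)


if __name__ == "__main__":
    # Placeholder command; a real setup would launch the boot executable.
    env = create_environment(
        "process", [sys.executable, "-c", "print('harness started')"]
    )
    print(env.wait().strip())
```

A Docker-based factory would be registered under another name in the same way, so the choice stays a per-job setting. Environment setup and artifact retrieval, the two steps mentioned above, are left out of the sketch.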