Makes sense to have an option to run the SDK harness in a non-dockerized
environment.
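
Roughly, the non-dockerized variant comes down to the following. This is a minimal Python sketch, not the actual runner code, which would live in Java. Instead of `docker run`, it forks the Python SDK harness with the interpreter of a pre-provisioned virtualenv and hands it the runner endpoints through environment variables. The entry point module and the variable names are my recollection of what boot.go sets up, so treat them as assumptions. The helper names, venv path and endpoint values are hypothetical, and a POSIX venv layout (`bin/python`) is assumed.

```python
import os
import subprocess
from pathlib import Path


def harness_command(venv_dir, worker_id, control_endpoint, logging_endpoint,
                    pipeline_options_json, semi_persistent_dir):
    """Build argv and environment for an SDK harness forked on the host.

    Assumption: the variable names below mirror what boot.go exports before
    it starts the Python worker; verify against the SDK version in use.
    """
    # Use the interpreter of the frozen virtualenv provisioned with the host.
    python = str(Path(venv_dir) / "bin" / "python")
    argv = [python, "-m", "apache_beam.runners.worker.sdk_worker_main"]

    env = dict(os.environ)
    env.update({
        "WORKER_ID": worker_id,
        # Assumption: endpoints are passed as text-format ApiServiceDescriptor.
        "CONTROL_API_SERVICE_DESCRIPTOR": 'url: "{}"'.format(control_endpoint),
        "LOGGING_API_SERVICE_DESCRIPTOR": 'url: "{}"'.format(logging_endpoint),
        "PIPELINE_OPTIONS": pipeline_options_json,
        "SEMI_PERSISTENT_DIRECTORY": semi_persistent_dir,
    })
    return argv, env


def start_harness(argv, env):
    # Fork the harness directly on the task manager host; no container.
    return subprocess.Popen(argv, env=env)
```

A host-process JobBundleFactory would then call `start_harness(argv, env)` once per worker and terminate the process when the environment is released. The frozen virtualenv takes the place of the container image.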

I'm in the process of creating a Docker entry point for Flink's
JobServer [1]. I suppose you would also prefer to execute that one
standalone. We should make sure this is also an option.

[1] https://issues.apache.org/jira/browse/BEAM-4130
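
Ankur's suggestion further down (register several environment factories and let a pipeline option pick one per job) could look roughly like this. It is only a Python sketch of the pattern. In the Flink runner it would be Java, and the registry would presumably go through a service loader. The option name `environment_type`, the keys "DOCKER"/"PROCESS" and all class and function names here are hypothetical.

```python
from typing import Callable, Dict


class EnvironmentFactory:
    """Hypothetical interface: creates the environment a harness runs in."""

    def create(self, job_id: str) -> str:
        raise NotImplementedError


class DockerEnvironmentFactory(EnvironmentFactory):
    def create(self, job_id: str) -> str:
        return f"docker container for {job_id}"


class ProcessEnvironmentFactory(EnvironmentFactory):
    def create(self, job_id: str) -> str:
        return f"host process for {job_id}"


# Stand-in for a service registry: factories register under a key.
_REGISTRY: Dict[str, Callable[[], EnvironmentFactory]] = {}


def register(key: str, ctor: Callable[[], EnvironmentFactory]) -> None:
    _REGISTRY[key.upper()] = ctor


def factory_for(options: Dict[str, str]) -> EnvironmentFactory:
    # Choose the factory per job from a pipeline option, defaulting to Docker.
    key = options.get("environment_type", "DOCKER").upper()
    try:
        return _REGISTRY[key]()
    except KeyError:
        raise ValueError(f"no environment factory registered for {key!r}") from None


register("DOCKER", DockerEnvironmentFactory)
register("PROCESS", ProcessEnvironmentFactory)
```

Keeping Docker as the default would leave existing pipelines unaffected while the process-based factory is opt-in per job.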

On 16.08.18 07:42, Thomas Weise wrote:
> Yes, that's the proposal. Everything that would otherwise be packaged
> into the Docker container would need to be pre-installed in the host
> environment. In the case of Python SDK, that could simply mean a
> (frozen) virtual environment that was set up when the host was
> provisioned - the SDK harness process(es) will then just utilize that.
> Of course this flavor of SDK harness execution could also be useful in
> the local development environment, where right now someone who already
> has the Python environment needs to also install Docker and update a
> container to launch a Python SDK pipeline on the Flink runner.
> 
> On Wed, Aug 15, 2018 at 12:40 PM Daniel Oliveira <danolive...@google.com> wrote:
> 
>     I just want to clarify that I understand this correctly, since I'm
>     not that familiar with the details behind all these execution
>     environments yet. Is the proposal to create a new JobBundleFactory
>     that, instead of using Docker to create the environment the new
>     processes execute in, would execute those processes directly in
>     the host environment? So in practice, if I ran a pipeline with this
>     JobBundleFactory, the SDK Harness and Runner Harness would both be
>     executing directly on my machine and would depend on me having the
>     dependencies already present on my machine?
> 
>     On Mon, Aug 13, 2018 at 5:50 PM Ankur Goenka <goe...@google.com> wrote:
> 
>         Thanks for starting the discussion. I will be happy to help.
>         I agree, we should have a pluggable SDK harness environment
>         factory. We can register multiple environment factories using a
>         service registry and use a PipelineOption to pick the right one
>         on a per-job basis.
> 
>         There are a couple of things that need to be set up before
>         launching the process:
>
>           * Setting up the environment, as done in boot.go [4].
>           * Retrieving the artifacts and putting them in the right location.
> 
>         You can probably leverage the boot.go code to set up the environment.
> 
>         Also, it will be useful to enumerate the pros and cons of the
>         different environments to help users choose the right one.
> 
> 
>         On Mon, Aug 6, 2018 at 4:50 PM Thomas Weise <t...@apache.org> wrote:
> 
>             Hi,
> 
>             Currently the portable Flink runner only works with SDK
>             Docker containers for execution (DockerJobBundleFactory,
>             besides an in-process (embedded) factory option for testing
>             [1]). I'm considering adding another out-of-process
>             JobBundleFactory implementation that directly forks the
>             processes on the task manager host, eliminating the need for
>             Docker. This would work reasonably well in environments
>             where the dependencies (in this case Python) can easily be
>             tied into the host deployment (also within an
>             application-specific Kubernetes pod).
> 
>             There was already some discussion about alternative
>             JobBundleFactory implementations in [2]. There is also a JIRA
>             to make the bundle factory pluggable [3], pending
>             availability of runner-level options.
> 
>             For a "ProcessBundleFactory", in addition to the Python
>             dependencies the environment would also need to have the Go
>             boot executable [4] (or a substitute thereof) to perform the
>             harness initialization.
> 
>             Is anyone else interested in this SDK execution option, or
>             has anyone already investigated an alternative implementation?
> 
>             Thanks,
>             Thomas
> 
>             [1] https://github.com/apache/beam/blob/7958a379b0a37a89edc3a6ae4b5bc82fda41fcd6/runners/flink/src/test/java/org/apache/beam/runners/flink/PortableExecutionTest.java#L83
>
>             [2] https://lists.apache.org/thread.html/d6b6fde764796de31996db9bb5f9de3e7aaf0ab29b99d0adb52ac508@%3Cdev.beam.apache.org%3E
>
>             [3] https://issues.apache.org/jira/browse/BEAM-4819
>
>             [4] https://github.com/apache/beam/blob/master/sdks/python/container/boot.go
> 
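
For the two setup steps Ankur lists above (preparing the environment as boot.go does, and putting the artifacts in the right location), a host-process factory would need to do something like the following before forking the harness. The sketch copies already-retrieved artifact files into a `staged` directory under the semi-persistent directory and returns their paths. The `<semi_persistent_dir>/staged` layout is my recollection of boot.go and should be treated as an assumption; the function name is hypothetical, and real retrieval would go through the artifact retrieval service rather than local file copies.

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def stage_artifacts(artifacts: Iterable[str], semi_persistent_dir: str) -> List[Path]:
    """Copy retrieved artifacts into <semi_persistent_dir>/staged.

    Assumption: the artifacts have already been fetched to local files;
    a real factory would stream them from the artifact retrieval service.
    """
    staged_dir = Path(semi_persistent_dir) / "staged"
    staged_dir.mkdir(parents=True, exist_ok=True)
    staged = []
    for src in map(Path, artifacts):
        # Keep the original file name so the harness finds what it expects.
        dst = staged_dir / src.name
        shutil.copyfile(src, dst)
        staged.append(dst)
    return staged
```

With a frozen virtualenv on the host, the boot.go step of installing the staged packages could be skipped or reduced to installing only the pipeline's own package.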
