I just want to check that I understand this correctly, since I'm not yet
familiar with the details of all these execution environments. Is the
proposal to create a new JobBundleFactory that, instead of using Docker to
create the environment in which the new processes run, executes those
processes directly in the host environment? So in practice, if I ran a
pipeline with this JobBundleFactory, the SDK Harness and Runner Harness
would both execute directly on my machine and would depend on the
dependencies already being present there?
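
To make the question concrete, here is a rough Python sketch of how I picture
the difference. It is only an illustration of my understanding, not Beam code:
the registry, the class names, the `command` method, and the image name
`hypothetical/sdk-image` are all made up for this example, and the real
factories would also have to do the setup that boot.go performs.

```python
import subprocess
import sys

# Hypothetical registry of environment factories, keyed by the name a
# pipeline option would carry. None of these names are Beam APIs.
ENVIRONMENT_FACTORIES = {}


def register(name):
    """Class decorator that records a factory under the given name."""
    def wrap(cls):
        ENVIRONMENT_FACTORIES[name] = cls
        return cls
    return wrap


@register("docker")
class DockerEnvironmentFactory:
    def command(self, harness_cmd, image="hypothetical/sdk-image"):
        # Dependencies come from the container image, not from the host.
        return ["docker", "run", image] + list(harness_cmd)


@register("process")
class ProcessEnvironmentFactory:
    def command(self, harness_cmd):
        # No container: the harness sees whatever is installed on the host.
        return list(harness_cmd)


def launch(environment_type, harness_cmd):
    """Pick a factory by name and run the harness command to completion."""
    factory = ENVIRONMENT_FACTORIES[environment_type]()
    return subprocess.run(
        factory.command(harness_cmd),
        capture_output=True,
        text=True,
        check=True,
    )


if __name__ == "__main__":
    # Stand-in for the SDK harness: a Python one-liner run by the host's
    # own interpreter, which is exactly the dependency I am asking about.
    result = launch("process", [sys.executable, "-c", "print('harness started')"])
    print(result.stdout.strip())
```

If that picture is right, the "process" path only works when the host already
has the interpreter and libraries the harness needs, whereas the "docker" path
carries them in the image.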

On Mon, Aug 13, 2018 at 5:50 PM Ankur Goenka <goe...@google.com> wrote:

> Thanks for starting the discussion. I will be happy to help.
> I agree, we should have a pluggable SDKHarness environment factory.
> We can register multiple environment factories using a service registry
> and use a PipelineOption to pick the right one on a per-job basis.
>
> There are a couple of things that need to be set up before launching
> the process.
>
>    - Setting up the environment as done in boot.go [4]
>    - Retrieving and putting the artifacts in the right location.
>
> You can probably leverage the boot.go code to set up the environment.
>
> Also, it will be useful to enumerate the pros and cons of the different
> environments to help users choose the right one.
>
>
> On Mon, Aug 6, 2018 at 4:50 PM Thomas Weise <t...@apache.org> wrote:
>
>> Hi,
>>
>> Currently the portable Flink runner only works with SDK Docker containers
>> for execution (DockerJobBundleFactory, besides an in-process (embedded)
>> factory option for testing [1]). I'm considering adding another out of
>> process JobBundleFactory implementation that directly forks the processes
>> on the task manager host, eliminating the need for Docker. This would work
>> reasonably well in environments where the dependencies (in this case
>> Python) can easily be tied into the host deployment (also within an
>> application specific Kubernetes pod).
>>
>> There was already some discussion about alternative JobBundleFactory
>> implementations in [2]. There is also a JIRA to make the bundle factory
>> pluggable [3], pending availability of runner-level options.
>>
>> For a "ProcessBundleFactory", in addition to the Python dependencies the
>> environment would also need to have the Go boot executable [4] (or a
>> substitute thereof) to perform the harness initialization.
>>
>> Is anyone else interested in this SDK execution option or has already
>> investigated an alternative implementation?
>>
>> Thanks,
>> Thomas
>>
>> [1]
>> https://github.com/apache/beam/blob/7958a379b0a37a89edc3a6ae4b5bc82fda41fcd6/runners/flink/src/test/java/org/apache/beam/runners/flink/PortableExecutionTest.java#L83
>>
>> [2]
>> https://lists.apache.org/thread.html/d6b6fde764796de31996db9bb5f9de3e7aaf0ab29b99d0adb52ac508@%3Cdev.beam.apache.org%3E
>>
>> [3] https://issues.apache.org/jira/browse/BEAM-4819
>>
>> [4]
>> https://github.com/apache/beam/blob/master/sdks/python/container/boot.go
>>
>>
