For reference, here is the corresponding JIRA issue for this thread: https://issues.apache.org/jira/browse/BEAM-5187

On 16.08.18 11:15, Maximilian Michels wrote:
Makes sense to have an option to run the SDK harness in a non-dockerized
environment.

I'm in the process of creating a Docker entry point for Flink's
JobServer[1]. I suppose you would also prefer to execute that one
standalone. We should make sure this is also an option.

[1] https://issues.apache.org/jira/browse/BEAM-4130

On 16.08.18 07:42, Thomas Weise wrote:
Yes, that's the proposal. Everything that would otherwise be packaged
into the Docker container would need to be pre-installed in the host
environment. In the case of the Python SDK, that could simply mean a
(frozen) virtual environment that was set up when the host was
provisioned - the SDK harness process(es) will then just utilize that.
Of course this flavor of SDK harness execution could also be useful in
the local development environment, where right now someone who already
has the Python environment needs to also install Docker and update a
container to launch a Python SDK pipeline on the Flink runner.

On Wed, Aug 15, 2018 at 12:40 PM Daniel Oliveira <danolive...@google.com
<mailto:danolive...@google.com>> wrote:

     I just want to clarify that I understand this correctly since I'm
     not that familiar with the details behind all these execution
     environments yet. Is the proposal to create a new JobBundleFactory
     that, instead of using Docker to create the environment for the new
     processes, executes them directly in the host environment? So in
     practice, if I ran a pipeline with this JobBundleFactory, the SDK
     Harness and Runner Harness would both execute directly on my machine
     and would depend on the dependencies already being present there?

     On Mon, Aug 13, 2018 at 5:50 PM Ankur Goenka <goe...@google.com
     <mailto:goe...@google.com>> wrote:

         Thanks for starting the discussion. I will be happy to help.
         I agree, we should have a pluggable SDK harness environment factory.
         We can register multiple environment factories using a service
         registry and use a PipelineOption to pick the right one on a
         per-job basis.
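
To illustrate the registry idea above, here is a minimal sketch. It is not Beam's API: the registry, the `environment_type`, `harness_image`, and `harness_command` option keys, and both factories are hypothetical names chosen for illustration, and a plain dict stands in for pipeline options.

```python
from typing import Callable, Dict

# Hypothetical registry mapping an environment type name to a factory.
# Each factory takes the job's options and returns the command to launch.
_ENVIRONMENT_FACTORIES: Dict[str, Callable[[dict], str]] = {}


def register_environment_factory(name):
    """Decorator that registers a factory under the given name."""
    def decorator(factory):
        _ENVIRONMENT_FACTORIES[name] = factory
        return factory
    return decorator


@register_environment_factory("docker")
def docker_environment(options):
    # Assumed option key; stands in for a container-based environment.
    return "docker run " + options["harness_image"]


@register_environment_factory("process")
def process_environment(options):
    # Assumed option key; stands in for a directly forked host process.
    return options["harness_command"]


def create_environment(options):
    """Pick the factory named by the (hypothetical) per-job option."""
    name = options.get("environment_type", "docker")
    if name not in _ENVIRONMENT_FACTORIES:
        raise ValueError("unknown environment type: " + name)
    return _ENVIRONMENT_FACTORIES[name](options)
```

With this shape, adding a new environment only requires registering another factory; the selection logic stays unchanged.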

         There are a couple of things which are required to be set up
         before launching the process.

           * Setting up the environment as done in boot.go [4]
           * Retrieving and putting the artifacts in the right location.

         You can probably leverage boot.go code to set up the environment.
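
As a rough sketch of the second step, the snippet below places artifacts into a staging directory before the harness starts. It makes simplifying assumptions: `stage_artifacts` is a hypothetical helper, and a local file copy stands in for however the artifacts are actually retrieved, which this thread does not specify.

```python
import shutil
import tempfile
from pathlib import Path


def stage_artifacts(artifact_paths, staging_dir):
    """Copy each artifact into staging_dir and return the staged paths.

    A local copy is an assumption made for illustration; real retrieval
    may go through a remote service instead.
    """
    staging = Path(staging_dir)
    staging.mkdir(parents=True, exist_ok=True)
    staged = []
    for source in artifact_paths:
        target = staging / Path(source).name
        shutil.copy2(source, target)
        staged.append(target)
    return staged


if __name__ == "__main__":
    # Small demonstration using throwaway files.
    with tempfile.TemporaryDirectory() as tmp:
        example = Path(tmp) / "example-dependency.txt"
        example.write_text("placeholder")
        for path in stage_artifacts([example], Path(tmp) / "staging"):
            print("staged", path.name)
```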

         Also, it will be useful to enumerate pros and cons of different
         Environments to help users choose the right one.


         On Mon, Aug 6, 2018 at 4:50 PM Thomas Weise <t...@apache.org
         <mailto:t...@apache.org>> wrote:

             Hi,

             Currently the portable Flink runner only works with SDK
             Docker containers for execution (DockerJobBundleFactory,
             besides an in-process (embedded) factory option for testing
             [1]). I'm considering adding another out-of-process
             JobBundleFactory implementation that directly forks the
             processes on the task manager host, eliminating the need for
             Docker. This would work reasonably well in environments
             where the dependencies (in this case Python) can easily be
             tied into the host deployment (also within an
             application-specific Kubernetes pod).

             There was already some discussion about alternative
             JobBundleFactory implementations in [2]. There is also a JIRA
             to make the bundle factory pluggable [3], pending
             availability of runner level options.

             For a "ProcessBundleFactory", in addition to the Python
             dependencies the environment would also need to have the Go
             boot executable [4] (or a substitute thereof) to perform the
             harness initialization.
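
As a rough illustration of forking the harness directly on the host, the sketch below starts a child process with a prepared environment and shuts it down afterwards. It is only a sketch under assumptions: `launch_harness`, `stop_harness`, and the `HARNESS_ID` variable are hypothetical names, and the Python interpreter stands in for the boot executable, whose actual arguments are not shown here.

```python
import os
import subprocess
import sys


def launch_harness(command, env_overrides, workdir=None):
    """Fork a harness process on the host and return its Popen handle."""
    env = dict(os.environ)      # inherit the pre-installed host environment
    env.update(env_overrides)   # add per-harness settings
    return subprocess.Popen(
        command,
        env=env,
        cwd=workdir,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )


def stop_harness(process, timeout=5.0):
    """Terminate the process if still running and return its exit code."""
    if process.poll() is None:
        process.terminate()
        try:
            process.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            process.kill()
            process.wait()
    return process.returncode


if __name__ == "__main__":
    # The interpreter is a stand-in for the real harness executable.
    child = launch_harness(
        [sys.executable, "-c", "import os; print(os.environ['HARNESS_ID'])"],
        {"HARNESS_ID": "worker-1"},
    )
    output, _ = child.communicate()
    print(output.decode().strip())
    stop_harness(child)
```

The caller owns the process lifecycle here, which is the main difference from the Docker case, where the container runtime handles cleanup.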

             Is anyone else interested in this SDK execution option or
             has already investigated an alternative implementation?

             Thanks,
             Thomas

             [1] https://github.com/apache/beam/blob/7958a379b0a37a89edc3a6ae4b5bc82fda41fcd6/runners/flink/src/test/java/org/apache/beam/runners/flink/PortableExecutionTest.java#L83

             [2] https://lists.apache.org/thread.html/d6b6fde764796de31996db9bb5f9de3e7aaf0ab29b99d0adb52ac508@%3Cdev.beam.apache.org%3E

             [3] https://issues.apache.org/jira/browse/BEAM-4819

             [4] https://github.com/apache/beam/blob/master/sdks/python/container/boot.go


--
Max
