For reference, here is the corresponding JIRA issue for this thread:
https://issues.apache.org/jira/browse/BEAM-5187
On 16.08.18 11:15, Maximilian Michels wrote:
Makes sense to have an option to run the SDK harness in a non-dockerized
environment.
I'm in the process of creating a Docker entry point for Flink's
JobServer[1]. I suppose you would also prefer to execute that one
standalone. We should make sure this is also an option.
[1] https://issues.apache.org/jira/browse/BEAM-4130
On 16.08.18 07:42, Thomas Weise wrote:
Yes, that's the proposal. Everything that would otherwise be packaged
into the Docker container would need to be pre-installed in the host
environment. In the case of the Python SDK, that could simply mean a
(frozen) virtual environment that was set up when the host was
provisioned - the SDK harness process(es) will then just utilize that.
Of course this flavor of SDK harness execution could also be useful in
the local development environment, where right now someone who already
has the Python environment needs to also install Docker and update a
container to launch a Python SDK pipeline on the Flink runner.
On Wed, Aug 15, 2018 at 12:40 PM Daniel Oliveira <danolive...@google.com> wrote:
I just want to clarify that I understand this correctly since I'm
not that familiar with the details behind all these execution
environments yet. Is the proposal to create a new JobBundleFactory
that, instead of using Docker to create the environment the new
processes execute in, executes the new processes directly in the
host environment? So in practice, if I
ran a pipeline with this JobBundleFactory the SDK Harness and Runner
Harness would both be executing directly on my machine and would
depend on me having the dependencies already present on my machine?
On Mon, Aug 13, 2018 at 5:50 PM Ankur Goenka <goe...@google.com> wrote:
Thanks for starting the discussion. I will be happy to help.
I agree, we should have a pluggable SDK harness environment factory.
We can register multiple environment factories using a service
registry and use a PipelineOption to pick the right one on a per-job
basis.
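As a rough illustration of the registry idea (not Beam's actual API; every class, function and option name below is hypothetical), a minimal Python sketch of registering several factories and picking one per job from an option could look like this:

```python
from typing import Callable, Dict


class EnvironmentFactory:
    """Hypothetical base class: knows how to start an SDK harness."""

    def describe(self) -> str:
        raise NotImplementedError


class DockerEnvironmentFactory(EnvironmentFactory):
    def describe(self) -> str:
        return "run the SDK harness inside a Docker container"


class ProcessEnvironmentFactory(EnvironmentFactory):
    def describe(self) -> str:
        return "fork the SDK harness directly on the host"


# Hypothetical registry: maps an option value to a factory constructor.
_REGISTRY: Dict[str, Callable[[], EnvironmentFactory]] = {}


def register(name: str, ctor: Callable[[], EnvironmentFactory]) -> None:
    """Make a factory available under the given name."""
    _REGISTRY[name] = ctor


def create_factory(options: Dict[str, str]) -> EnvironmentFactory:
    """Pick a factory per job from an assumed 'harness_environment' option."""
    name = options.get("harness_environment", "docker")
    try:
        return _REGISTRY[name]()
    except KeyError:
        raise ValueError(f"unknown harness environment: {name!r}") from None


register("docker", DockerEnvironmentFactory)
register("process", ProcessEnvironmentFactory)
```

In this sketch a job that sets the assumed option to "process" gets the host-process factory, and a job that sets nothing falls back to Docker. In the Java runner the same shape would presumably map onto a service-loader style registry.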
There are a couple of things that need to be set up before
launching the process.
* Setting up the environment as done in boot.go [4]
* Retrieving and putting the artifacts in the right location.
You can probably leverage the boot.go code to set up the environment.
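To make the two setup steps concrete, here is a small Python sketch. It does not reproduce what boot.go does; it only assumes that the artifacts have already been retrieved to local files and that the harness learns its staging location from an environment variable whose name is made up for this example.

```python
import os
import shutil
import tempfile
from typing import Dict, Iterable, List


def prepare_harness_dir(artifacts: Iterable[str], base_dir: str) -> str:
    """Copy already-retrieved artifact files into a staging dir under base_dir."""
    staging = os.path.join(base_dir, "staged")
    os.makedirs(staging, exist_ok=True)
    for path in artifacts:
        shutil.copy(path, staging)
    return staging


def harness_env(staging_dir: str) -> Dict[str, str]:
    """Build the child environment: the host's variables plus the staging dir."""
    env = dict(os.environ)
    env["HARNESS_STAGING_DIR"] = staging_dir  # hypothetical variable name
    return env


def demo() -> List[str]:
    """Stage one dummy artifact in a temp dir and list what was staged."""
    with tempfile.TemporaryDirectory() as base:
        src = os.path.join(base, "dep.txt")
        with open(src, "w") as f:
            f.write("placeholder")
        staging = prepare_harness_dir([src], base)
        return sorted(os.listdir(staging))
```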
Also, it will be useful to enumerate pros and cons of different
Environments to help users choose the right one.
On Mon, Aug 6, 2018 at 4:50 PM Thomas Weise <t...@apache.org> wrote:
Hi,
Currently the portable Flink runner only works with SDK
Docker containers for execution (DockerJobBundleFactory,
besides an in-process (embedded) factory option for testing
[1]). I'm considering adding another out-of-process
JobBundleFactory implementation that directly forks the
processes on the task manager host, eliminating the need for
Docker. This would work reasonably well in environments
where the dependencies (in this case Python) can easily be
tied into the host deployment (also within an application-specific
Kubernetes pod).
There was already some discussion about alternative
JobBundleFactory implementations in [2]. There is also a JIRA
to make the bundle factory pluggable [3], pending
availability of runner-level options.
For a "ProcessBundleFactory", in addition to the Python
dependencies, the environment would also need to have the Go
boot executable [4] (or a substitute thereof) to perform the
harness initialization.
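For illustration only, forking the harness on the host and cleaning it up again could be sketched in Python as below. The command passed in is a stand-in: in a real setup it would be the boot executable (or its substitute) from the pre-provisioned environment, and the function names are invented for this example.

```python
import subprocess
import sys
from typing import Dict, List, Optional


def fork_harness(
    command: List[str], env: Optional[Dict[str, str]] = None
) -> subprocess.Popen:
    """Start the harness as a child process of the current host process."""
    return subprocess.Popen(
        command,
        env=env,  # None means: inherit the host environment
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )


def stop_harness(proc: subprocess.Popen, timeout: float = 5.0) -> int:
    """Ask the child to terminate; kill it if it does not exit in time."""
    if proc.poll() is None:
        proc.terminate()
        try:
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()
    # Close the output pipe so no file descriptor is leaked.
    if proc.stdout is not None:
        proc.stdout.close()
    return proc.returncode


if __name__ == "__main__":
    # Stand-in command: the current interpreter sleeping, not a real harness.
    child = fork_harness([sys.executable, "-c", "import time; time.sleep(60)"])
    print("exit code:", stop_harness(child))
```

The point of the sketch is that the runner, not Docker, then owns the process lifecycle, so it has to terminate the child itself when the job ends or fails.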
Is anyone else interested in this SDK execution option or
has already investigated an alternative implementation?
Thanks,
Thomas
[1]
https://github.com/apache/beam/blob/7958a379b0a37a89edc3a6ae4b5bc82fda41fcd6/runners/flink/src/test/java/org/apache/beam/runners/flink/PortableExecutionTest.java#L83
[2]
https://lists.apache.org/thread.html/d6b6fde764796de31996db9bb5f9de3e7aaf0ab29b99d0adb52ac508@%3Cdev.beam.apache.org%3E
[3] https://issues.apache.org/jira/browse/BEAM-4819
[4]
https://github.com/apache/beam/blob/master/sdks/python/container/boot.go
--
Max