By "enum" in quotes, I meant the usual open URN-style pattern, not an actual enum. Sorry if that wasn't clear.
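
A minimal Python sketch of that open URN-style pattern, under stated assumptions: the URN strings (`example:env:...`), the `Environment` class, and the `register`, `pick_environment`, and `describe_launch` helpers are all hypothetical names made up for illustration and are not part of the Beam model or any SDK. It only shows the idea that an environment is identified by a free-form URN plus an opaque payload, that the SDK can list several environments it supports, and that a runner picks the first one it knows how to launch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Environment:
    """An environment described by an open URN plus a URN-specific payload."""
    urn: str
    payload: bytes = b""


# Hypothetical runner-side registry: URN -> function that interprets the payload.
_LAUNCHERS: Dict[str, Callable[[bytes], str]] = {}


def register(urn: str):
    """Decorator that registers a launcher for one URN (illustrative only)."""
    def wrap(fn: Callable[[bytes], str]) -> Callable[[bytes], str]:
        _LAUNCHERS[urn] = fn
        return fn
    return wrap


@register("example:env:docker:v1")
def _docker(payload: bytes) -> str:
    # The payload is assumed to carry a container image name.
    return "docker run " + payload.decode("utf-8")


@register("example:env:process:v1")
def _process(payload: bytes) -> str:
    # The payload is assumed to carry the path of a boot executable.
    return "exec " + payload.decode("utf-8")


def pick_environment(sdk_supported: List[Environment]) -> Environment:
    """Return the first SDK-supported environment this runner can launch."""
    for env in sdk_supported:
        if env.urn in _LAUNCHERS:
            return env
    raise ValueError("no supported environment among: "
                     + ", ".join(e.urn for e in sdk_supported))


def describe_launch(env: Environment) -> str:
    """Describe how the chosen environment would be started."""
    return _LAUNCHERS[env.urn](env.payload)
```

Unlike a closed enum, a new kind of environment (for example an "external" one started by the user) only needs a new URN and a launcher registration; a runner that does not recognize the URN simply skips it or rejects the job.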
On Tue, Aug 21, 2018 at 11:51 AM Lukasz Cwik <[email protected]> wrote:

> I would model the environment to be more free form than enums such that we
> have forward looking extensibility and would suggest to follow the same
> pattern we use on PTransforms (using an URN and a URN specific payload).
> Note that in this case we may want to support a list of supported
> environments (e.g. java, docker, python, ...).
>
> On Tue, Aug 21, 2018 at 10:37 AM Henning Rohde <[email protected]> wrote:
>
>> One thing to consider that we've talked about in the past. It might make
>> sense to extend the environment proto and have the SDK be explicit about
>> which kinds of environment it supports:
>>
>> https://github.com/apache/beam/blob/8c4f4babc0b0d55e7bddefa3f9f9ba65d21ef139/model/pipeline/src/main/proto/beam_runner_api.proto#L969
>>
>> This choice might impact what files are staged or what not. Some SDKs,
>> such as Go, provide a compiled binary and _need_ to know what the target
>> architecture is. Running on a mac with and without docker, say, requires a
>> different worker in each case. If we add an "enum", we can also easily add
>> the external idea where the SDK/user starts the SDK harnesses instead of
>> the runner. Each runner may not support all types of environments.
>>
>> Henning
>>
>> On Tue, Aug 21, 2018 at 2:52 AM Maximilian Michels <[email protected]> wrote:
>>
>>> For reference, here is the corresponding JIRA issue for this thread:
>>> https://issues.apache.org/jira/browse/BEAM-5187
>>>
>>> On 16.08.18 11:15, Maximilian Michels wrote:
>>> > Makes sense to have an option to run the SDK harness in a
>>> > non-dockerized environment.
>>> >
>>> > I'm in the process of creating a Docker entry point for Flink's
>>> > JobServer[1]. I suppose you would also prefer to execute that one
>>> > standalone. We should make sure this is also an option.
>>> >
>>> > [1] https://issues.apache.org/jira/browse/BEAM-4130
>>> >
>>> > On 16.08.18 07:42, Thomas Weise wrote:
>>> >> Yes, that's the proposal. Everything that would otherwise be packaged
>>> >> into the Docker container would need to be pre-installed in the host
>>> >> environment. In the case of Python SDK, that could simply mean a
>>> >> (frozen) virtual environment that was set up when the host was
>>> >> provisioned - the SDK harness process(es) will then just utilize that.
>>> >> Of course this flavor of SDK harness execution could also be useful in
>>> >> the local development environment, where right now someone who already
>>> >> has the Python environment needs to also install Docker and update a
>>> >> container to launch a Python SDK pipeline on the Flink runner.
>>> >>
>>> >> On Wed, Aug 15, 2018 at 12:40 PM Daniel Oliveira <[email protected]
>>> >> <mailto:[email protected]>> wrote:
>>> >>
>>> >>     I just want to clarify that I understand this correctly since I'm
>>> >>     not that familiar with the details behind all these execution
>>> >>     environments yet. Is the proposal to create a new JobBundleFactory
>>> >>     that instead of using Docker to create the environment that the new
>>> >>     processes will execute in, this JobBundleFactory would execute the
>>> >>     new processes directly in the host environment? So in practice if I
>>> >>     ran a pipeline with this JobBundleFactory the SDK Harness and Runner
>>> >>     Harness would both be executing directly on my machine and would
>>> >>     depend on me having the dependencies already present on my machine?
>>> >>
>>> >>     On Mon, Aug 13, 2018 at 5:50 PM Ankur Goenka <[email protected]
>>> >>     <mailto:[email protected]>> wrote:
>>> >>
>>> >>         Thanks for starting the discussion. I will be happy to help.
>>> >>         I agree, we should have pluggable SDKHarness environment Factory.
>>> >>         We can register multiple Environment factories using service
>>> >>         registry and use the PipelineOption to pick the right one on a
>>> >>         per job basis.
>>> >>
>>> >>         There are a couple of things which are required to set up before
>>> >>         launching the process.
>>> >>
>>> >>           * Setting up the environment as done in boot.go [4]
>>> >>           * Retrieving and putting the artifacts in the right location.
>>> >>
>>> >>         You can probably leverage boot.go code to set up the environment.
>>> >>
>>> >>         Also, it will be useful to enumerate pros and cons of different
>>> >>         Environments to help users choose the right one.
>>> >>
>>> >>
>>> >>         On Mon, Aug 6, 2018 at 4:50 PM Thomas Weise <[email protected]
>>> >>         <mailto:[email protected]>> wrote:
>>> >>
>>> >>             Hi,
>>> >>
>>> >>             Currently the portable Flink runner only works with SDK
>>> >>             Docker containers for execution (DockerJobBundleFactory,
>>> >>             besides an in-process (embedded) factory option for testing
>>> >>             [1]). I'm considering adding another out of process
>>> >>             JobBundleFactory implementation that directly forks the
>>> >>             processes on the task manager host, eliminating the need for
>>> >>             Docker. This would work reasonably well in environments
>>> >>             where the dependencies (in this case Python) can easily be
>>> >>             tied into the host deployment (also within an application
>>> >>             specific Kubernetes pod).
>>> >>
>>> >>             There was already some discussion about alternative
>>> >>             JobBundleFactory implementation in [2]. There is also a JIRA
>>> >>             to make the bundle factory pluggable [3], pending
>>> >>             availability of runner level options.
>>> >>
>>> >>             For a "ProcessBundleFactory", in addition to the Python
>>> >>             dependencies the environment would also need to have the Go
>>> >>             boot executable [4] (or a substitute thereof) to perform the
>>> >>             harness initialization.
>>> >>
>>> >>             Is anyone else interested in this SDK execution option or
>>> >>             has already investigated an alternative implementation?
>>> >>
>>> >>             Thanks,
>>> >>             Thomas
>>> >>
>>> >>             [1] https://github.com/apache/beam/blob/7958a379b0a37a89edc3a6ae4b5bc82fda41fcd6/runners/flink/src/test/java/org/apache/beam/runners/flink/PortableExecutionTest.java#L83
>>> >>
>>> >>             [2] https://lists.apache.org/thread.html/d6b6fde764796de31996db9bb5f9de3e7aaf0ab29b99d0adb52ac508@%3Cdev.beam.apache.org%3E
>>> >>
>>> >>             [3] https://issues.apache.org/jira/browse/BEAM-4819
>>> >>
>>> >>             [4] https://github.com/apache/beam/blob/master/sdks/python/container/boot.go
>>> >>
>>>
>>> --
>>> Max
>>>
>>
