Hi everyone,

I wanted to get your opinion on the Job-Server startup [1], which is
part of the portability story.

I've created a docker container to bring up Beam's Job Server, which is
the entry point for pipeline execution. Generally, this works fine when
the backend (Flink in this case) runs externally and the Job Server
connects to it.

For tests or pipeline development we may want the backend to run
embedded (inside the Job Server). This is rather problematic because
portability requires spinning up the SDK harness in a Docker container
as well, which would then happen at runtime inside the Job Server's
Docker container.

Since Docker inside Docker is not desirable, I'm thinking about other
options:

Option 1) Instead of a Docker container, we start a bundled Job-Server
binary (or jar) when we run the pipeline. The bundle also contains an
embedded variant of the backend. For Flink, this is basically the output
of `:beam-runners-flink_2.11-job-server:shadowJar` but it is started
during pipeline execution.
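
As a rough sketch of what option 1 could look like from the Python SDK
side: the pipeline run starts the jar as a child process and stops it
afterwards. The helper names and the jar path are made up for
illustration, and I'm assuming `java` is on the PATH. Any Job Server
arguments would have to be passed in by the caller.

```python
import contextlib
import subprocess


def job_server_command(jar_path, extra_args=()):
    """Build the command line for launching a bundled Job Server jar."""
    return ["java", "-jar", jar_path, *extra_args]


@contextlib.contextmanager
def local_job_server(jar_path, extra_args=()):
    """Run the Job Server as a child process for the duration of the block."""
    process = subprocess.Popen(job_server_command(jar_path, extra_args))
    try:
        yield process
    finally:
        # Stop the server once the pipeline has finished or failed.
        process.terminate()
        try:
            process.wait(timeout=10)
        except subprocess.TimeoutExpired:
            # The process ignored the polite request; force it down.
            process.kill()
            process.wait()
```

Usage would be something like `with local_job_server("path/to/job-server.jar"):`
around the pipeline submission, where the path is a placeholder for
wherever the shadow jar ends up.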

Option 2) In addition to the Job Server, we let the SDK spin up another
Docker container with the backend. This may be the most widely
applicable approach, since not all backends offer an embedded execution
mode.
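
A minimal sketch of how the SDK could start and stop such a backend
container through the Docker CLI. The function names, the image name and
the port mapping are placeholders I picked for illustration, not
anything that exists in Beam today; only standard `docker run` and
`docker stop` options are used.

```python
import subprocess


def backend_container_command(image, name, ports=()):
    """Build a `docker run` command for a detached backend container."""
    command = ["docker", "run", "--rm", "--detach", "--name", name]
    for host_port, container_port in ports:
        # Expose the backend's port so the Job Server can reach it.
        command += ["--publish", f"{host_port}:{container_port}"]
    command.append(image)
    return command


def start_backend_container(image, name, ports=()):
    """Start the container and return the container ID printed by Docker."""
    result = subprocess.run(
        backend_container_command(image, name, ports),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def stop_backend_container(name):
    """Stop the container; `--rm` at startup means it is removed as well."""
    subprocess.run(["docker", "stop", name], check=True)
```

The Job Server container would then connect to the published port the
same way it connects to an externally running backend.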


Keep in mind that this is only a problem for local/test execution but it
is an important aspect of Beam's usability.

What do you think? I'm leaning towards option 2. Maybe you have other
options in mind.

Cheers,
Max

[1] https://issues.apache.org/jira/browse/BEAM-4130
