Regarding the Kubernetes/Docker story: the current idea for that setup is to use one pod per job, holding the user/SDK containers plus the runner container, so that running (and scaling) a job goes with the grain of that ecosystem. The Beam code on each worker thus wouldn't do any container management. This is also essentially how Dataflow works. The process-based option assumes that the runner environment is what the SDK needs, which is generally not the case.
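To make the per-job pod idea a bit more concrete, here is a rough sketch of what such a pod could look like, written as a Python dict that serializes to a JSON manifest. This is only an illustration under assumptions, not an actual Beam or Dataflow manifest: the helper name per_job_pod, the container names, the label key, and the image names are all hypothetical.

```python
import json


def per_job_pod(job_id, runner_image, sdk_image):
    """Build a Pod manifest holding one runner container and one SDK container.

    All names here (pod name prefix, label key, container names) are
    hypothetical and chosen only for illustration.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": f"beam-job-{job_id}",
            "labels": {"beam-job": job_id},
        },
        "spec": {
            # A finished job should not be restarted automatically.
            "restartPolicy": "Never",
            "containers": [
                # The runner's worker process.
                {"name": "runner", "image": runner_image},
                # The SDK harness that executes the user code.
                {"name": "sdk-harness", "image": sdk_image},
            ],
        },
    }


if __name__ == "__main__":
    # Placeholder image names; substitute real ones for an actual deployment.
    pod = per_job_pod(
        "wordcount-1",
        "example.invalid/runner-worker:latest",
        "example.invalid/sdk-harness:latest",
    )
    print(json.dumps(pod, indent=2))
```

Containers in the same pod share a network namespace, so the runner and the SDK container can reach each other over localhost, and Kubernetes rather than Beam code handles starting and stopping them.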
Henning

On Sun, Oct 7, 2018 at 1:40 PM Alex Van Boxel <a...@vanboxel.be> wrote:

> Hey Max, I've built quite some experience with *Kubernetes* over the
> years. The problem you describe seems like a custom operator story. The
> thing is I don't know enough of the runner and bootstrapping story. After
> the summit I'm quite eager to dive into a Beam problem, so if you like to
> collaborate on that topic let me know.
>
>  _/
> _/ Alex Van Boxel
>
>
> On Fri, Oct 5, 2018 at 4:05 PM Maximilian Michels <m...@apache.org> wrote:
>
>> Hi,
>>
>> What do you think about collecting some of the feedback from the
>> community at Beam Summit last week? Here's what I've come across:
>>
>>
>> * The Kubernetes / Docker Story
>>
>> Multiple users reported that they would like a Beam-Kubernetes story.
>> What is the best way to deploy Beam with Kubernetes? Will there be
>> built-in support?
>>
>> Especially with regards to the portability, there are some unsolved
>> problems, e.g. how to start Beam containerized and bootstrap the SDK
>> Harness container from within a container? For local testing with the
>> JobServer we support that via mounting the Docker socket, but this will
>> be too fragile in production scenarios. Now that we have process-based
>> execution, we could just use that inside the main container.
>>
>> Deployment is a very important topic for users and we should try to
>> reduce complexity as much as possible.
>>
>> * External SDKs / Scio
>>
>> Users have asked why Scio is not part of the main repository. Generally,
>> I don't think that has to be the case, same for the Runners which are
>> not part of the main repo. However, it does raise the question, what
>> will be the future model for maintaining SDKs/IOs/Runners? How do we
>> ensure easy development and a consistent quality of internal/external
>> components?
>>
>> * Documenting Timers & State
>>
>> These two have excellent blog posts but are not part of the official
>> documentation. Since they are part of the model, it would be good to
>> eventually update the docs.
>>
>> * Better Debuggability of pipelines
>>
>> Even a simple WordCount in Beam leads to a quite complex Flink execution
>> graph (due to the involved I/O logic). How can we make pipelines
>> easier to understand? Will we provide a way to visualize the
>> architecture of high-level Beam pipelines? If so, do we provide a way to
>> gain insight into how it is mapped to the Runner execution model? Users
>> would like to have more insight.
>>
>> * Current Roadmap
>>
>> This was asked in the context of portability. By the end of the year we
>> should have at least the FlinkRunner in a ready state, with the rest
>> following up. There are a lot of other threads in Beam. The newsletter
>> is a great way to keep up with the project development.
>>
>>
>> Looking forward to any other points you might have.
>>
>> Best,
>> Max
>>
>