Re: Beam Summit community feedback

Henning Rohde Mon, 08 Oct 2018 10:43:40 -0700

Regarding the Kubernetes/Docker story: the current idea for that setup is
to use a per-job pod for the user/sdk containers + runner container, so
that running (and scaling) a job will go with the grain of that ecosystem.
The Beam code on each worker thus wouldn't do any container management.
This is also how Dataflow essentially works. The process-based option
assumes that the runner environment is what the SDK needs, which is
generally not the case.
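
As a rough illustration of the per-job pod idea (a minimal sketch under
assumptions, not how Beam or any runner actually builds its pods): one pod
per job holds a runner container plus an SDK harness container, and
Kubernetes, not Beam, manages the containers. The function name, pod name
prefix, label key, container names and image names below are all
hypothetical.

```python
import json


def per_job_pod_manifest(job_name, runner_image, sdk_image):
    """Build a Kubernetes Pod manifest (as a plain dict) for a single job.

    Hypothetical helper: the pod name prefix, label key and container
    names are illustrative assumptions, not Beam conventions.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": f"beam-job-{job_name}",
            "labels": {"beam-job": job_name},
        },
        "spec": {
            # A finished job should not be restarted by the kubelet.
            "restartPolicy": "Never",
            "containers": [
                # Runner container: executes the runner side of the job.
                {"name": "runner", "image": runner_image},
                # User/SDK container: brings the environment the SDK needs,
                # so it does not have to match the runner's environment.
                {"name": "sdk-harness", "image": sdk_image},
            ],
        },
    }


if __name__ == "__main__":
    # Image names are placeholders, not real published images.
    manifest = per_job_pod_manifest(
        "wordcount",
        "example.invalid/runner:latest",
        "example.invalid/sdk-harness:latest",
    )
    print(json.dumps(manifest, indent=2))
```

The resulting JSON could be handed to the cluster to create the pod; the
Beam code inside the containers never starts or stops containers itself.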


Henning

On Sun, Oct 7, 2018 at 1:40 PM Alex Van Boxel <a...@vanboxel.be> wrote:

> Hey Max, I've built up quite some experience with *Kubernetes* over the
> years. The problem you describe seems like a custom operator story. The
> thing is I don't know enough about the runner and bootstrapping story. After
> the summit I'm quite eager to dive into a Beam problem, so if you'd like to
> collaborate on that topic let me know.
>
>  _/
> _/ Alex Van Boxel
>
>
> On Fri, Oct 5, 2018 at 4:05 PM Maximilian Michels <m...@apache.org> wrote:
>
>> Hi,
>>
>> What do you think about collecting some of the feedback from the
>> community at Beam Summit last week? Here's what I've come across:
>>
>>
>> * The Kubernetes / Docker Story
>>
>> Multiple users reported that they would like a Beam-Kubernetes story.
>> What is the best way to deploy Beam with Kubernetes? Will there be
>> built-in support?
>>
>> Especially with regard to portability, there are some unsolved
>> problems, e.g. how to start Beam containerized and bootstrap the SDK
>> Harness container from within a container? For local testing with the
>> JobServer we support that via mounting the Docker socket, but this will
>> be too fragile in production scenarios. Now that we have process-based
>> execution, we could just use that inside the main container.
>>
>> Deployment is a very important topic for users and we should try to
>> reduce complexity as much as possible.
>>
>> * External SDKs / Scio
>>
>> Users have asked why Scio is not part of the main repository. Generally,
>> I don't think that has to be the case, same for the Runners which are
>> not part of the main repo. However, it does raise the question, what
>> will be the future model for maintaining SDKs/IOs/Runners? How do we
>> ensure easy development and a consistent quality of internal/external
>> components?
>>
>> * Documenting Timers & State
>>
>> These two have excellent blog posts but are not part of the official
>> documentation. Since they are part of the model, it would be good to
>> eventually update the docs.
>>
>> * Better Debuggability of pipelines
>>
>> Even a simple WordCount in Beam leads to a quite complex Flink execution
>> graph (due to the I/O logic involved). How can we make pipelines
>> easier to understand? Will we provide a way to visualize the
>> architecture of high-level Beam pipelines? If so, do we provide a way to
>> gain insight into how it is mapped to the Runner execution model? Users
>> would like to have more insight.
>>
>> * Current Roadmap
>>
>> This was asked in the context of portability. By the end of the year we
>> should have at least the FlinkRunner in a ready state, with the rest
>> following up. There are a lot of other threads in Beam. The newsletter
>> is a great way to keep up with the project development.
>>
>>
>> Looking forward to any other points you might have.
>>
>> Best,
>> Max
>>
>
