Re: Beam Summit community feedback

Matthias Baetens Tue, 16 Oct 2018 13:49:38 -0700

Hey Max,

Great stuff, thank you for sharing this.
In case anyone has feedback on the summit as a whole, please feel free to
fill out the survey <https://goo.gl/forms/Oka3kicBrFyUXEvp1> as well.


Thank you!
Best regards,
Matthias

On Tue, 9 Oct 2018 at 10:48 Maximilian Michels <m...@apache.org> wrote:

> Thanks for the pointer to the thread. I didn't know there had already
> been a discussion. It is possible to look at Kubernetes support solely
> from a Runner perspective, but we still have to provide the basic knobs
> in Beam to make deployment easy.
>
> The approach Henning described here and in the thread (Approach 2:
> https://lists.apache.org/thread.html/209ddf4d701c8c915e3b411e99773f491a6cd830807d636b470000e8@%3Cdev.beam.apache.org%3E)
> where the backend and the SDK harness are started concurrently with
> fixed endpoints would be the way to go. In the Proto we already have the
> "EXTERNAL" environment for that.
>
> On 08.10.18 20:18, Thomas Weise wrote:
> > Related thread:
> > https://lists.apache.org/thread.html/d6b6fde764796de31996db9bb5f9de3e7aaf0ab29b99d0adb52ac508@%3Cdev.beam.apache.org%3E
> >
> > Kubernetes is otherwise more of a runner deployment concern. There are
> > efforts in the Flink community underway to make deployment on Kubernetes
> > easier.
> >
> > Max: thanks for taking notes!
> >
> >
> > On Mon, Oct 8, 2018 at 10:43 AM Henning Rohde <hero...@google.com
> > <mailto:hero...@google.com>> wrote:
> >
> >     Regarding the Kubernetes/Docker story: the current idea for that
> >     setup is to use a per-job pod for the user/sdk containers + runner
> >     container, so that running (and scaling) a job will go with the
> >     grain of that ecosystem. The Beam code on each worker thus wouldn't
> >     do any container management. This is also how Dataflow essentially
> >     works. The process-based option assumes that the runner environment
> >     is what the SDK needs, which is generally not the case.
> >
> >     Henning
> >
> >     On Sun, Oct 7, 2018 at 1:40 PM Alex Van Boxel <a...@vanboxel.be
> >     <mailto:a...@vanboxel.be>> wrote:
> >
> >         Hey Max, I've built quite some experience with *Kubernetes* over
> >         the years. The problem you describe seems like a custom operator
> >         story. The thing is I don't know enough about the runner and
> >         bootstrapping story. After the summit I'm quite eager to dive
> >         into a Beam problem, so if you'd like to collaborate on that topic
> >         let me know.
> >
> >           _/
> >         _/ Alex Van Boxel
> >
> >
> >         On Fri, Oct 5, 2018 at 4:05 PM Maximilian Michels
> >         <m...@apache.org <mailto:m...@apache.org>> wrote:
> >
> >             Hi,
> >
> >             What do you think about collecting some of the feedback from
> >             the community at Beam Summit last week? Here's what I've come
> >             across:
> >
> >
> >             * The Kubernetes / Docker Story
> >
> >             Multiple users reported that they would like a Beam-Kubernetes
> >             story. What is the best way to deploy Beam with Kubernetes?
> >             Will there be built-in support?
> >
> >             Especially with regard to portability, there are some unsolved
> >             problems, e.g. how to start Beam containerized and bootstrap
> >             the SDK Harness container from within a container? For local
> >             testing with the JobServer we support that via mounting the
> >             Docker socket, but this will be too fragile in production
> >             scenarios. Now that we have process-based execution, we could
> >             just use that inside the main container.
> >
> >             Deployment is a very important topic for users and we should
> >             try to reduce complexity as much as possible.
> >
> >             * External SDKs / Scio
> >
> >             Users have asked why Scio is not part of the main repository.
> >             Generally, I don't think that has to be the case, same for the
> >             Runners which are not part of the main repo. However, it does
> >             raise the question, what will be the future model for
> >             maintaining SDKs/IOs/Runners? How do we ensure easy development
> >             and a consistent quality of internal/external components?
> >
> >             * Documenting Timers & State
> >
> >             These two have excellent blog posts but are not part of the
> >             official documentation. Since they are part of the model, it
> >             would be good to eventually update the docs.
> >
> >             * Better Debuggability of pipelines
> >
> >             Even a simple WordCount in Beam leads to a quite complex Flink
> >             execution graph (due to the involved I/O logic). How can we
> >             make pipelines easier to understand? Will we provide a way to
> >             visualize the architecture of high-level Beam pipelines? If so,
> >             do we provide a way to gain insight into how it is mapped to
> >             the Runner execution model? Users would like to have more
> >             insight.
> >
> >             * Current Roadmap
> >
> >             This was asked in the context of portability. By the end of
> >             the year we should have at least the FlinkRunner in a ready
> >             state, with the rest following up. There are a lot of other
> >             threads in Beam. The newsletter is a great way to keep up with
> >             the project development.
> >
> >
> >             Looking forward to any other points you might have.
> >
> >             Best,
> >             Max
> >
>