Hi Can,

We have added many more components like GreedyPipelineFuser, which make
building a portable Runner easier. The slides at [1] explain, at a very
high level, what is needed to add a new portable runner. Adding a
portable runner is still more complex than adding a native runner, but
with these components it should be easier than originally expected.
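
To make the fusion idea from this thread concrete, here is a rough
sketch in Java. It is not GreedyPipelineFuser or any Beam API: the
FusionSketch class, the Node type and the fuse method are hypothetical
names made up for illustration, and the sketch assumes the nodes arrive
in topological order. A node joins its producer's stage only when it has
a single input, needs no shuffle, and is that producer's only consumer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FusionSketch {
    /** Hypothetical stand-in for a transform node; not a Beam class. */
    static final class Node {
        final String name;
        final boolean needsShuffle; // true for a GroupByKey-like step
        final List<String> inputs;  // names of the producing nodes

        Node(String name, boolean needsShuffle, List<String> inputs) {
            this.name = name;
            this.needsShuffle = needsShuffle;
            this.inputs = inputs;
        }
    }

    /** Groups nodes (assumed to be in topological order) into fused stages. */
    static List<List<String>> fuse(List<Node> topoOrdered) {
        // Count how many consumers read each node's output.
        Map<String, Integer> consumers = new HashMap<>();
        for (Node n : topoOrdered) {
            for (String in : n.inputs) {
                consumers.merge(in, 1, Integer::sum);
            }
        }

        Map<String, List<String>> stageOf = new HashMap<>();
        List<List<String>> stages = new ArrayList<>();
        for (Node n : topoOrdered) {
            List<String> stage = null;
            // Fuse only element-wise steps with a single, unshared producer.
            if (!n.needsShuffle && n.inputs.size() == 1) {
                String producer = n.inputs.get(0);
                if (consumers.getOrDefault(producer, 0) == 1) {
                    stage = stageOf.get(producer);
                }
            }
            // Otherwise (or if the producer is unknown) start a new stage.
            if (stage == null) {
                stage = new ArrayList<>();
                stages.add(stage);
            }
            stage.add(n.name);
            stageOf.put(n.name, stage);
        }
        return stages;
    }

    public static void main(String[] args) {
        List<Node> pipeline = List.of(
            new Node("Read", false, List.of()),
            new Node("Parse", false, List.of("Read")),
            new Node("Filter", false, List.of("Parse")),
            new Node("GroupByKey", true, List.of("Filter")),
            new Node("Format", false, List.of("GroupByKey")),
            new Node("Write", false, List.of("Format")));
        System.out.println(fuse(pipeline));
    }
}
```

For the sample pipeline in main this prints
[[Read, Parse, Filter], [GroupByKey, Format, Write]], i.e. two stages
instead of six. A real fuser also has to deal with side inputs, flattens
and environment boundaries, which this sketch ignores.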

[1]
https://docs.google.com/presentation/d/1JRNUSpOC8qaA4uLDuyGsuuyf6Tk8Xi9LAukhgl-hT_w/edit?usp=sharing

Thanks,
Ankur

On Wed, Mar 20, 2019 at 7:19 AM Maximilian Michels <[email protected]> wrote:

> Documentation on portability is still a bit sparse although there are
> many design documents:
> https://beam.apache.org/contribute/design-documents/#portability
>
> The structure of portable Runners is not fundamentally different, but
> some of the operations are deferred to the SDK which runs code for all
> supported languages. The Runner needs to provide an integration with it.
>
> Eventually, the old Runners will become obsolete though that won't
> happen very soon. Performance should be slightly better on the old Runners.
>
> I think writing an old-style Runner now will give you enough experience
> to port it to the new language-portable style later on.
>
> Cheers,
> Max
>
> On 20.03.19 14:52, Can Gencer wrote:
> > I had a look at "GreedyPipelineFuser" and indeed this was exactly what
> > I was talking about.
> >
> > Is https://beam.apache.org/roadmap/portability/ still the best
> > information about the portable runners or is there a more in-depth guide
> > available anywhere?
> >
> > On Wed, Mar 20, 2019 at 2:29 PM Can Gencer <[email protected]
> > <mailto:[email protected]>> wrote:
> >
> >     Hi Max,
> >
> >     Thanks. When you say "old-style runner", do you mean that this
> >     style of runner will become obsolete and only the portable one
> >     will be supported? The documentation for portable runners wasn't
> >     quite complete, and the barrier to entry for writing an old-style
> >     runner seemed lower for us. Should the old-style runner also have
> >     better performance?
> >
> >     On Wed, Mar 20, 2019 at 1:36 PM Maximilian Michels <[email protected]
> >     <mailto:[email protected]>> wrote:
> >
> >         Hi Can,
> >
> >         Thanks for the update. Interesting question. Flink has an
> >         optimization
> >         built in called chaining which works together nicely with Beam.
> >         Essentially, operators which share the same partitioning get
> >         executed
> >         one after another inside a master operator. This saves resources.
> >
> >         Interestingly, Beam's Fuser for portable Runners does something
> >         similar.
> >         AFAIK there is no built-in solution for the old-style Runners. I
> >         think
> >         it would be possible to build something like this on top of the
> >         existing
> >         translation.
> >
> >         Cheers,
> >         Max
> >
> >         On 20.03.19 13:07, Can Gencer wrote:
> >          > Hi again,
> >          >
> >          > We've made some progress on the runner since writing this
> >         more than a
> >          > month ago, the repo is available here publicly:
> >          > https://github.com/hazelcast/hazelcast-jet-beam-runner
> >          >
> >          > Still very much a work in progress though. One of the issues
> >         I wanted to
> >          > raise is that currently we're translating each PTransform to
> >         a Jet
> >          > Vertex (could be considered analogous to a Flink operator or a
> >         vertex in
> >          > Tez). This is sub-optimal, since Beam creates lots of
> >         transforms for
> >          > computations that could be performed inside the same Vertex,
> >         such as
> >          > subsequent mapping transforms and many others. Ideally you
> >         only need
> >          > distinct vertices where the data is re-partitioned and/or
> >         shuffled. I'm
> >          > curious if Beam offers some way of translating the PTransform
> >         graph to a
> >          > more minimal set of transforms, i.e. some kind of planner or
> >         would this
> >          > have to be custom code? We've done a similar integration with
> >         Cascading
> >          > in the past and it offered a planner which given a set of
> >         rules would
> >          > partition the Cascading DAG into a minimal set of vertices
> >         for the same
> >          > DAG. Curious if Beam has any similar functionality?
> >          >
> >          >
> >          >
> >          > On Sat, Feb 16, 2019 at 4:50 AM Kenneth Knowles
> >         <[email protected] <mailto:[email protected]>
> >          > <mailto:[email protected] <mailto:[email protected]>>> wrote:
> >          >
> >          >     Elaborating on what Robert alluded to: when I wrote that
> >         runner
> >          >     author guide, portability was in its infancy. Now Beam
> >         Python can be
> >          >     run on Flink. So that guide is primarily focused on the
> >         "deserialize
> >          >     a Java DoFn and call its methods" approach. A decent
> >         amount of it is
> >          >     still really important to know, but is now the
> >         responsibility of the
> >          >     "SDK harness", aka language-specific coprocessor. For
> >         Python & Go &
> >          >     <insert new SDK language here> you really want to use the
> >          >     portability protos and the portable Flink runner is the
> >         best model.
> >          >
> >          >     Kenn
> >          >
> >          >
> >          >     On Fri, Feb 15, 2019 at 2:08 AM Robert Bradshaw
> >         <[email protected] <mailto:[email protected]>
> >          >     <mailto:[email protected]
> >         <mailto:[email protected]>>> wrote:
> >          >
> >          >         On Fri, Feb 15, 2019 at 7:36 AM Can Gencer
> >         <[email protected] <mailto:[email protected]>
> >          >         <mailto:[email protected]
> >         <mailto:[email protected]>>> wrote:
> >          >          >
> >          >          > We at Hazelcast are looking into writing a Beam
> >         runner for
> >          >         Hazelcast Jet
> >         (https://github.com/hazelcast/hazelcast-jet). I
> >          >         wanted to introduce myself as we'll likely have
> >         questions as we
> >          >         start development.
> >          >
> >          >         Welcome!
> >          >
> >          >         Hazelcast looks interesting, a Beam runner for it
> >         would be very
> >          >         cool.
> >          >
> >          >          > Some of the things I'm wondering about currently:
> >          >          >
> >          >          > * Currently there seems to be a guide available at
> >          > https://beam.apache.org/contribute/runner-guide/ , is this
> up to
> >          >         date? Is there anything specific to be aware of
> >         when starting
> >          >         with a new runner that's not covered here?
> >          >
> >          >         That looks like a pretty good starting point. At a
> >         quick glance, I
> >          >         don't see anything that looks out of date. Another
> >         resource that
> >          >         might
> >          >         be helpful is a talk from last year on writing an SDK
> >         (but as it
> >          >         mostly covers the runner-sdk interaction, it's also
> >         quite useful for
> >          >         understanding the runner side):
> >          >
> >
> https://docs.google.com/presentation/d/1Cso0XP9dmj77OD9Bd53C1M3W1sPJF0ZnA20gzb2BPhE/edit#slide=id.p
> >          >         And please feel free to ask any questions on this
> >         list as well; we'd
> >          >         be happy to help.
> >          >
> >          >          > * Should we be targeting the latest master which
> is at
> >          >         2.12-SNAPSHOT or a stable version?
> >          >
> >          >         I would target the latest master.
> >          >
> >          >          > * After a runner is developed, how is the
> maintenance
> >          >         typically handled, as the runners seem to be part of
> >         Beam codebase?
> >          >
> >          >         Either is possible. Several runner adapters are part
> >         of the Beam
> >          >         codebase, but for example the IBM Streams Beam runner
> >         is not. There
> >          >         are certainly pros and cons (certainly early on when
> >         the APIs
> >          >         themselves were under heavy development it was easier
> >         to keep things
> >          >         in sync in the same codebase, but things have mostly
> >         stabilized
> >          >         now).
> >          >         A runner only becomes part of the Beam codebase if
> >         there are members
> >          >         of the community committed to maintaining it (which
> >         could include
> >          >         you). Both approaches are fine.
> >          >
> >          >         - Robert
> >          >
> >
>
