Re: Hazelcast Jet Runner

Can Gencer Wed, 20 Mar 2019 06:30:05 -0700

Hi Max,

Thanks. When you mean "old-style runner"  is this meant that this style of
runners will be obsolete and only the portable one will be supported? The
documentation for portable runners wasn't quite complete and the barrier to
entry for writing an old style runner seemed easier for us and the old
style runner should have better performance?


On Wed, Mar 20, 2019 at 1:36 PM Maximilian Michels <[email protected]> wrote:

> Hi Can,
>
> Thanks for the update. Interesting question. Flink has an optimization
> built in called chaining which works together nicely with Beam.
> Essentially, operators which share the same partitioning get executed
> one after another inside a master operator. This saves resources.
>
> Interestingly, Beam's Fuser for portable Runners does something similar.
> AFAIK there is no built-in solution for the old-style Runners. I think
> it would be possible to build something like this on top of the existing
> translation.
>
> Cheers,
> Max
>
> On 20.03.19 13:07, Can Gencer wrote:
> > Hi again,
> >
> > We've made some progress on the runner since writing this more than a
> > month ago, the repo is available here publicly:
> > https://github.com/hazelcast/hazelcast-jet-beam-runner
> >
> > Still very much a work in progress though. One of the issues I wanted to
> > raise is that currently we're translating each PTransform to a Jet
> > Vertex (could be consider analogous to a Flink operator or a vertex in
> > Tez). This is sub-optimal, since Beam creates lots of transforms for
> > computations that could be performed inside the same Vertex, such as
> > subsequent mapping transforms and many others. Ideally you only need
> > distinct vertices where the data is re-partitioned and/or shuffled. I'm
> > curious if Beam offers some way of translating the PTransform graph to a
> > more minimal set of transforms, i.e. some kind of planner or would this
> > have to be custom code? We've done a similar integration with Cascading
> > in the past and it offered a planner which given a set of rules would
> > partition the Cascading DAG into a minimal set of vertices for the same
> > DAG. Curious if Beam has any similar functionality?
> >
> >
> >
> > On Sat, Feb 16, 2019 at 4:50 AM Kenneth Knowles <[email protected]
> > <mailto:[email protected]>> wrote:
> >
> >     Elaborating on what Robert alluded to: when I wrote that runner
> >     author guide, portability was in its infancy. Now Beam Python can be
> >     run on Flink. So that guide is primarily focused on the "deserialize
> >     a Java DoFn and call its methods" approach. A decent amount of it is
> >     still really important to know, but is now the responsibility of the
> >     "SDK harness", aka language-specific coprocessor. For Python & Go &
> >     <insert new SDK language here> you really want to use the
> >     portability protos and the portable Flink runner is the best model.
> >
> >     Kenn
> >
> >
> >     On Fri, Feb 15, 2019 at 2:08 AM Robert Bradshaw <[email protected]
> >     <mailto:[email protected]>> wrote:
> >
> >         On Fri, Feb 15, 2019 at 7:36 AM Can Gencer <[email protected]
> >         <mailto:[email protected]>> wrote:
> >          >
> >          > We at Hazelcast are looking into writing a Beam runner for
> >         Hazelcast Jet (https://github.com/hazelcast/hazelcast-jet). I
> >         wanted to introduce myself as we'll likely have questions as we
> >         start development.
> >
> >         Welcome!
> >
> >         Hazelcast looks interesting, a Beam runner for it would be very
> >         cool.
> >
> >          > Some of the things I'm wondering about currently:
> >          >
> >          > * Currently there seems to be a guide available at
> >         https://beam.apache.org/contribute/runner-guide/ , is this up to
> >         date? Is there anything in specific to be aware of when starting
> >         with a new runner that's not covered here?
> >
> >         That looks like a pretty good starting point. At a quick glance,
> I
> >         don't see anything that looks out of date. Another resource that
> >         might
> >         be helpful is a talk from last year on writing an SDK (but as it
> >         mostly covers the runner-sdk interaction, it's also quite useful
> for
> >         understanding the runner side:
> >
> https://docs.google.com/presentation/d/1Cso0XP9dmj77OD9Bd53C1M3W1sPJF0ZnA20gzb2BPhE/edit#slide=id.p
> >         And please feel free to ask any questions on this list as well;
> we'd
> >         be happy to help.
> >
> >          > * Should we be targeting the latest master which is at
> >         2.12-SNAPSHOT or a stable version?
> >
> >         I would target the latest master.
> >
> >          > * After a runner is developed, how is the maintenance
> >         typically handled, as the runners seems to be part of Beam
> codebase?
> >
> >         Either is possible. Several runner adapters are part of the Beam
> >         codebase, but for example the IMB Streams Beam runner is not.
> There
> >         are certainly pros and cons (certainly early on when the APIs
> >         themselves were under heavy development it was easier to keep
> things
> >         in sync in the same codebase, but things have mostly stabilized
> >         now).
> >         A runner only becomes part of the Beam codebase if there are
> members
> >         of the community committed to maintaining it (which could include
> >         you). Both approaches are fine.
> >
> >         - Robert
> >
>

Re: Hazelcast Jet Runner

Reply via email to