Hi Can,

Besides GreedyPipelineFuser, we have added many more components that make building a portable runner easier. The slides at [1] explain, at a very high level, what is needed to add a new portable runner. Adding a portable runner is still more complex than adding a native runner, but with these components it should be easier than originally expected.

[1] https://docs.google.com/presentation/d/1JRNUSpOC8qaA4uLDuyGsuuyf6Tk8Xi9LAukhgl-hT_w/edit?usp=sharing

Thanks,
Ankur

On Wed, Mar 20, 2019 at 7:19 AM Maximilian Michels <[email protected]> wrote:
> Documentation on portability is still a bit sparse although there are
> many design documents:
> https://beam.apache.org/contribute/design-documents/#portability
>
> The structure of portable Runners is not fundamentally different, but
> some of the operations are deferred to the SDK which runs code for all
> supported languages. The Runner needs to provide an integration with it.
>
> Eventually, the old Runners will become obsolete though that won't
> happen very soon. Performance should be slightly better on the old Runners.
>
> I think writing an old-style Runner now will give you enough experience
> to port it to the new language-portable style later on.
>
> Cheers,
> Max
>
> On 20.03.19 14:52, Can Gencer wrote:
> > I had a look at "GreedyPipelineFuser" and indeed this was what exactly I
> > was talking about.
> >
> > Is https://beam.apache.org/roadmap/portability/ still the best
> > information about the portable runners or is there a more in-depth guide
> > available anywhere?
> >
> > On Wed, Mar 20, 2019 at 2:29 PM Can Gencer <[email protected]> wrote:
> >
> > Hi Max,
> >
> > Thanks. When you mean "old-style runner" is this meant that this
> > style of runners will be obsolete and only the portable one will be
> > supported? The documentation for portable runners wasn't quite
> > complete and the barrier to entry for writing an old style runner
> > seemed easier for us and the old style runner should have better
> > performance?
> >
> > On Wed, Mar 20, 2019 at 1:36 PM Maximilian Michels <[email protected]> wrote:
> >
> > Hi Can,
> >
> > Thanks for the update. Interesting question. Flink has an optimization
> > built in called chaining which works together nicely with Beam.
> > Essentially, operators which share the same partitioning get executed
> > one after another inside a master operator. This saves resources.
> >
> > Interestingly, Beam's Fuser for portable Runners does something similar.
> > AFAIK there is no built-in solution for the old-style Runners. I think
> > it would be possible to build something like this on top of the
> > existing translation.
> >
> > Cheers,
> > Max
> >
> > On 20.03.19 13:07, Can Gencer wrote:
> > > Hi again,
> > >
> > > We've made some progress on the runner since writing this more than a
> > > month ago, the repo is available here publicly:
> > > https://github.com/hazelcast/hazelcast-jet-beam-runner
> > >
> > > Still very much a work in progress though. One of the issues I wanted to
> > > raise is that currently we're translating each PTransform to a Jet
> > > Vertex (could be considered analogous to a Flink operator or a vertex in
> > > Tez). This is sub-optimal, since Beam creates lots of transforms for
> > > computations that could be performed inside the same Vertex, such as
> > > subsequent mapping transforms and many others. Ideally you only need
> > > distinct vertices where the data is re-partitioned and/or shuffled. I'm
> > > curious if Beam offers some way of translating the PTransform graph to a
> > > more minimal set of transforms, i.e. some kind of planner or would this
> > > have to be custom code? We've done a similar integration with Cascading
> > > in the past and it offered a planner which given a set of rules would
> > > partition the Cascading DAG into a minimal set of vertices for the same
> > > DAG. Curious if Beam has any similar functionality?
> > >
> > > On Sat, Feb 16, 2019 at 4:50 AM Kenneth Knowles <[email protected]> wrote:
> > >
> > > Elaborating on what Robert alluded to: when I wrote that runner
> > > author guide, portability was in its infancy. Now Beam Python can be
> > > run on Flink. So that guide is primarily focused on the "deserialize
> > > a Java DoFn and call its methods" approach. A decent amount of it is
> > > still really important to know, but is now the responsibility of the
> > > "SDK harness", aka language-specific coprocessor. For Python & Go &
> > > <insert new SDK language here> you really want to use the
> > > portability protos and the portable Flink runner is the best model.
> > >
> > > Kenn
> > >
> > > On Fri, Feb 15, 2019 at 2:08 AM Robert Bradshaw <[email protected]> wrote:
> > >
> > > On Fri, Feb 15, 2019 at 7:36 AM Can Gencer <[email protected]> wrote:
> > > >
> > > > We at Hazelcast are looking into writing a Beam runner for
> > > > Hazelcast Jet (https://github.com/hazelcast/hazelcast-jet). I
> > > > wanted to introduce myself as we'll likely have questions as we
> > > > start development.
> > >
> > > Welcome!
> > >
> > > Hazelcast looks interesting, a Beam runner for it would be very
> > > cool.
> > >
> > > > Some of the things I'm wondering about currently:
> > > >
> > > > * Currently there seems to be a guide available at
> > > > https://beam.apache.org/contribute/runner-guide/ , is this up to
> > > > date? Is there anything in specific to be aware of when starting
> > > > with a new runner that's not covered here?
> > >
> > > That looks like a pretty good starting point.
> > > At a quick glance, I don't see anything that looks out of date.
> > > Another resource that might be helpful is a talk from last year on
> > > writing an SDK (but as it mostly covers the runner-sdk interaction,
> > > it's also quite useful for understanding the runner side):
> > >
> > > https://docs.google.com/presentation/d/1Cso0XP9dmj77OD9Bd53C1M3W1sPJF0ZnA20gzb2BPhE/edit#slide=id.p
> > >
> > > And please feel free to ask any questions on this list as well; we'd
> > > be happy to help.
> > >
> > > > * Should we be targeting the latest master which is at
> > > > 2.12-SNAPSHOT or a stable version?
> > >
> > > I would target the latest master.
> > >
> > > > * After a runner is developed, how is the maintenance
> > > > typically handled, as the runners seem to be part of Beam codebase?
> > >
> > > Either is possible. Several runner adapters are part of the Beam
> > > codebase, but for example the IBM Streams Beam runner is not. There
> > > are certainly pros and cons (certainly early on when the APIs
> > > themselves were under heavy development it was easier to keep things
> > > in sync in the same codebase, but things have mostly stabilized
> > > now).
> > > A runner only becomes part of the Beam codebase if there are members
> > > of the community committed to maintaining it (which could include
> > > you). Both approaches are fine.
> > >
> > > - Robert
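
For readers following the fusion question raised earlier in this thread (one vertex per PTransform versus one vertex per re-partitioning boundary), the idea can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not Beam's GreedyPipelineFuser and not Flink's chaining: the function name `fuse_linear_chains`, the toy transform names, and the rule "fuse a transform into its producer's stage only when it has exactly one input, that producer has exactly one consumer, and no shuffle is required" are all made up for this example.

```python
from collections import defaultdict


def fuse_linear_chains(transforms, edges, shuffles):
    """Group transforms into stages (one stage would become one vertex).

    transforms: transform names in topological order.
    edges: (producer, consumer) pairs.
    shuffles: names of transforms whose input must be re-partitioned
              (for example a group-by-key); these always start a new stage.
    """
    inputs = defaultdict(list)
    outputs = defaultdict(list)
    for producer, consumer in edges:
        inputs[consumer].append(producer)
        outputs[producer].append(consumer)

    stage_of = {}  # transform name -> index into stages
    stages = []    # each stage is a list of transform names
    for name in transforms:
        ups = inputs[name]
        # Hypothetical, deliberately conservative fusion rule (see lead-in).
        fusible = (
            name not in shuffles
            and len(ups) == 1
            and len(outputs[ups[0]]) == 1
        )
        if fusible:
            idx = stage_of[ups[0]]
            stages[idx].append(name)
        else:
            idx = len(stages)
            stages.append([name])
        stage_of[name] = idx
    return stages


# Toy linear pipeline with a single shuffle boundary.
pipeline = ["Read", "Parse", "Filter", "GroupByKey", "Sum", "Format", "Write"]
edges = list(zip(pipeline, pipeline[1:]))
stages = fuse_linear_chains(pipeline, edges, shuffles={"GroupByKey"})
print(stages)
```

With this toy input the seven transforms collapse into two stages, split at the shuffle. A real fuser has to handle more cases (side inputs, flattens, branching outputs, execution environments), which is what the portable runner components mentioned at the top of the thread take care of.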
