This is a really interesting and important discussion. Having multiple reference runners has its pros and cons; it is all about tradeoffs. From the end user's point of view, it can feel weird to deal with the tools and packaging of a different ecosystem, e.g. Python devs dealing with all the quirkiness of Java packaging or, vice versa, Java developers dealing with pip and friends. So having a reference runner per language would be more natural and would also help validate the portability concept. However, having multiple reference runners sounds harder from the maintenance point of view.
Most of the software in Beam's domain has traditionally been written in Java, so there is a BIG advantage in ready-to-use (and mature) libraries and reusable components (the reference runner may also profit from the libraries that Thomas and others in the community have developed for multiple runners). This is a big win but, more importantly, we can have more eyes looking at it and contributing improvements and fixes that will benefit the reference runner and others.

Having a reference runner per language would be nice, but if we must choose only one language I prefer it to be Java, just because we have a bigger community that can contribute to and improve it. We may work on making the distribution of such a runner easier or friendlier for users of different languages.

On Wed, Feb 13, 2019 at 3:47 AM Robert Bradshaw <[email protected]> wrote:
>
> I agree, it's useful for runners that are used for tests (including testing
> SDKs) to push into the dark corners of what's allowed by the spec. I think
> this can be added (where they don't already exist) to existing non-production
> runners. (Whether a direct runner should be considered production or not
> depends on who you ask...)
>
> On Wed, Feb 13, 2019 at 2:49 AM Daniel Oliveira <[email protected]>
> wrote:
>>
>> +1 to Kenn's point. Regardless of whether we go with a Python runner or a
>> Java runner, I think we should have at least one portable runner that isn't
>> a production runner for the reasons he outlined.
>>
>> As for the rest of the discussion, it sounds like people are generally
>> supportive of having the Python FnApiRunner as that runner, and using Flink
>> as a reference implementation for portability in Java.
>>
>> On Tue, Feb 12, 2019 at 4:37 PM Kenneth Knowles <[email protected]> wrote:
>>>
>>>
>>> On Tue, Feb 12, 2019 at 8:59 AM Thomas Weise <[email protected]> wrote:
>>>>
>>>> The Java ULR initially provided some value for the portability effort as
>>>> Max mentions.
>>>> It helped to develop the shared library for all Java runners and the job
>>>> server functionality.
>>>>
>>>> However, I think the same could have been accomplished by developing the
>>>> Flink runner instead of the Java ULR from the get go. This is also what
>>>> happened later last year when support for state, timers and metrics was
>>>> added to the portable Flink runner first and the ULR still does not
>>>> support those features [1].
>>>>
>>>> Since all (or most) Java based runners that are based on another ASF
>>>> project support embedded execution, I think it might make sense to
>>>> discontinue separate direct runners for Java and instead focus efforts on
>>>> making the runners that folks would also use in production better?
>>>
>>>
>>> Caveat: if people only test using embedded execution of a production
>>> runner, they are quite likely to depend on quirks of that runner, such as
>>> bundle size, fusion, whether shuffle is also checkpoint, etc. I think
>>> there's a lot of value in an antagonistic testing runner, which is
>>> something the Java DirectRunner tried to do with GBK random ordering,
>>> checking illegal mutations, checking encodability. These were all driven by
>>> real user needs and each caught a lot of user bugs. That said, I wouldn't
>>> want to maintain an extra runner, but would like to put these into a
>>> portable runner, whichever it is.
>>>
>>> Kenn
>>>
>>>>
>>>>
>>>> As for Python (and hopefully soon Go), it makes a lot of sense to have a
>>>> simple to use and stable runner that can be used for local development. At
>>>> the moment, the Py FnApiRunner seems the best candidate to serve as
>>>> reference for portability.
>>>>
>>>> On a related note, we should probably also consider making pure Java
>>>> pipeline execution via portability framework on a Java runner simpler and
>>>> more efficient. We already use embedded environment for testing.
>>>> If we also inline/embed the job server and this becomes readily
>>>> available and easy to use, it might improve chances of other runners
>>>> migrating to portability sooner.
>>>>
>>>> Thomas
>>>>
>>>> [1] https://s.apache.org/apache-beam-portability-support-table
>>>>
>>>>
>>>>
>>>> On Tue, Feb 12, 2019 at 3:34 AM Maximilian Michels <[email protected]> wrote:
>>>>>
>>>>> Do you consider job submission and artifact staging part of the
>>>>> ReferenceRunner? If so, these parts have been reused or served as a
>>>>> model for the portable FlinkRunner. So they had some value.
>>>>>
>>>>> A reference implementation helps Runner authors to understand and reuse
>>>>> the code. However, I agree that the Flink implementation is more helpful
>>>>> to Runners authors than a ReferenceRunner which was designed for single
>>>>> node testing.
>>>>>
>>>>> I think there are three parts which help to push forward portability:
>>>>>
>>>>> 1) Good library support for new portable Runners (Java)
>>>>> 2) A reference implementation of a distributed Runner (Flink)
>>>>> 3) An easy way for users to run/test portable Pipelines (Python via
>>>>> FnApiRunner)
>>>>>
>>>>> The main motivation for the portability layer is supporting additional
>>>>> language to Java. Most users will be using Python, so focusing on a good
>>>>> reference Runner in Python is key.
>>>>>
>>>>> -Max
>>>>>
>>>>> On 12.02.19 10:11, Robert Bradshaw wrote:
>>>>> > This is certainly an interesting question, and I definitely have my
>>>>> > opinions, but am curious as to what others think as well.
>>>>> >
>>>>> > One thing that I think wasn't as clear from the outset is distinguishing
>>>>> > between the development of runners/core-java and development of a Java
>>>>> > reference runner itself.
>>>>> > With the work on moving Flink to portability, it turned out that
>>>>> > work on the latter was not a
>>>>> > prerequisite for work on the former, and runners/core-java is the
>>>>> > artifact that other runners want to build on. I think that it is also
>>>>> > the case, as suggested, that a distributed runner's use of this shared
>>>>> > library is a better reference point (for other distributed runners) than
>>>>> > one using the direct runner (e.g. there is a much more obvious
>>>>> > delineation between the runner's responsibility and Beam code than in
>>>>> > the direct runner where the boundaries between orchestration, execution,
>>>>> > and other concerns are not as clear).
>>>>> >
>>>>> > As well as serving as a reference to runner implementers, the reference
>>>>> > runner can also be useful for prototyping (here I think Python holds an
>>>>> > advantage, but we're getting into subjective areas now), documenting (or
>>>>> > ideally augmenting the documentation of) the spec (here I'd say a
>>>>> > smaller advantage to Python, but neither runner is clean, straightforward,
>>>>> > and documented enough to serve this purpose well yet), and serving as a
>>>>> > lightweight universal local runner against which to develop (and,
>>>>> > possibly use long term in place of a direct runner) new SDKs (here
>>>>> > you'll get a wide variety of answers whether Python or Java is easier to
>>>>> > take on as a dependency for a third language, or we could just package
>>>>> > it up in a docker image and take docker as a dependency).
>>>>> >
>>>>> > Another more pragmatic note is that one thing that helped both the Flink
>>>>> > and FnApiRunner forwards is that they were driven forward by actual
>>>>> > usecases--Lyft has actual Python (necessitating portable) pipelines they
>>>>> > want to run on Flink, and the FnApiRunner is the direct runner for
>>>>> > Python.
>>>>> > The Java ULR (at least where it is now) sits in an awkward place
>>>>> > where its only role is to be a reference rather than be used, which (in
>>>>> > a world of limited resources) makes it harder to justify investment.
>>>>> >
>>>>> > - Robert
>>>>> >
>>>>> >
>>>>> >
>>>>> > On Tue, Feb 12, 2019 at 3:53 AM Kenneth Knowles <[email protected]>
>>>>> > wrote:
>>>>> >
>>>>> > Interesting silence here. You've got it right that the reason we
>>>>> > initially chose Java was because of the cross-runner sharing. The
>>>>> > reference runner could be the first target runner for any new
>>>>> > feature and then its work could be directly (or indirectly via
>>>>> > copy/paste/modify if it works better) used in other runners.
>>>>> > Examples:
>>>>> >
>>>>> > - The implementations of (pre-portability) state & timers in
>>>>> >   runners/core-java and prototyped in the Java DirectRunner made it a
>>>>> >   matter of a couple of days to implement on other runners, and they
>>>>> >   saw pretty quick adoption.
>>>>> > - Probably the same could be said for the first drafts of the
>>>>> >   runners, which re-used a bunch of runners/core-java and had each
>>>>> >   others' translation code as a reference.
>>>>> >
>>>>> > I'm interested if anyone would be willing to confirm if it is
>>>>> > because the FlinkRunner has forged ahead and the Dataflow worker is
>>>>> > open source. It makes sense that the code from a distributed runner
>>>>> > is an even better reference point if you are building another
>>>>> > distributed runner. From the look of it, the SamzaRunner had no
>>>>> > trouble getting started on portability.
>>>>> >
>>>>> > Kenn
>>>>> >
>>>>> > On Mon, Feb 11, 2019 at 6:04 PM Daniel Oliveira
>>>>> > <[email protected]> wrote:
>>>>> >
>>>>> > Yeah, the FnApiRunner is what I'm leaning towards too.
>>>>> > I wasn't sure how much demand there was for an actual reference
>>>>> > implementation in Java though, so I was hoping there were runner
>>>>> > authors that would want to chime in.
>>>>> >
>>>>> > On the other hand, the Flink runner could serve as a reference
>>>>> > implementation for portable features since it's further along,
>>>>> > so maybe it's not an issue regardless.
>>>>> >
>>>>> > On Mon, Feb 11, 2019 at 1:09 PM Sam Rohde <[email protected]>
>>>>> > wrote:
>>>>> >
>>>>> > Thanks for starting this thread. If I had to guess, I would
>>>>> > say there is more of a demand for Python as it's more widely
>>>>> > used for data scientists/analytics. Being pragmatic, the
>>>>> > FnApiRunner already has more feature work than the Java so
>>>>> > we should go with that.
>>>>> >
>>>>> > -Sam
>>>>> >
>>>>> > On Fri, Feb 8, 2019 at 10:07 AM Daniel Oliveira
>>>>> > <[email protected]>
>>>>> > wrote:
>>>>> >
>>>>> > Hello Beam dev community,
>>>>> >
>>>>> > For those who don't know me, I work for Google and I've
>>>>> > been working on the Java reference runner, which is a
>>>>> > portable, local Java runner (it's basically the direct
>>>>> > runner with the portability APIs implemented). Our goal
>>>>> > in working on this was to have a portable runner which
>>>>> > ran locally so it could be used by users for testing
>>>>> > portable pipelines, devs for testing new features with
>>>>> > portability, and for runner authors to provide a simple
>>>>> > reference implementation of a portable runner.
>>>>> >
>>>>> > Due to various circumstances though, progress on the
>>>>> > Java reference runner has been pretty slow, and a Python
>>>>> > runner which does pretty much the same things was made
>>>>> > to aid portability development in Python (called the
>>>>> > FnApiRunner).
>>>>> > This runner is currently further along in
>>>>> > feature work than the Java reference runner, so we've
>>>>> > been reevaluating if we should switch to investing in it
>>>>> > instead.
>>>>> >
>>>>> > My question to the community is: Which runner do you
>>>>> > think would be more valuable to the dev community and
>>>>> > Beam users? For those of you who are runner authors, do
>>>>> > you have a preference for what language you'd like to
>>>>> > see a reference implementation in?
>>>>> >
>>>>> > Thanks,
>>>>> > Daniel Oliveira
>>>>> >
