+1 (non-Googler). This is a big help for transparency and for future runners.
Best,
Kai

On Thu, Sep 13, 2018, 11:45 Xinyu Liu <xinyuliu...@gmail.com> wrote:

> Big +1 (non-googler).
>
> From Samza Runner's perspective, we are very happy to see dataflow worker code so we can learn and compete :).
>
> Thanks,
> Xinyu
>
> On Thu, Sep 13, 2018 at 11:34 AM Suneel Marthi <suneel.mar...@gmail.com> wrote:
>
>> +1 (non-googler)
>>
>> This is a great 👍 move
>>
>> Sent from my iPhone
>>
>> On Sep 13, 2018, at 2:25 PM, Tim Robertson <timrobertson...@gmail.com> wrote:
>>
>> +1 (non googler)
>> It sounds pragmatic, helps with transparency should issues arise and enables more people to fix.
>>
>> On Thu, Sep 13, 2018 at 8:15 PM Dan Halperin <dhalp...@apache.org> wrote:
>>
>>> From my perspective as a (non-Google) community member, huge +1.
>>>
>>> I don't see anything bad for the community about open sourcing more of the probably-most-used runner. While the DirectRunner is probably still the most referential implementation of Beam, can't hurt to see more working code. Other runners or runner implementors can refer to this code if they want, and ignore it if they don't.
>>>
>>> In terms of having more code and tests to support, well, that's par for the course. Will this change make the things that need to be done to support them more obvious? (E.g., "this PR is blocked because someone at Google on Dataflow team has to fix something" vs "this PR is blocked because the Apache Beam code in foo/bar/baz is failing, and anyone who can see the code can fix it"). The latter seems like a clear win for the community.
>>>
>>> (As long as the code donation is handled properly, but that's completely orthogonal and I have no reason to think it wouldn't be.)
>>>
>>> Thanks,
>>> Dan
>>>
>>> On Thu, Sep 13, 2018 at 11:06 AM Lukasz Cwik <lc...@google.com> wrote:
>>>
>>>> Yes, I'm specifically asking the community for opinions as to whether it should be accepted or not.
>>>>
>>>> On Thu, Sep 13, 2018 at 10:51 AM Raghu Angadi <rang...@google.com> wrote:
>>>>
>>>>> This is terrific!
>>>>>
>>>>> Is this thread asking the community for opinions on whether it should be accepted? Assuming the Google-side decision is made to contribute, big +1 from me to include it next to other runners.
>>>>>
>>>>> On Thu, Sep 13, 2018 at 10:38 AM Lukasz Cwik <lc...@google.com> wrote:
>>>>>
>>>>>> At Google we have been importing the Apache Beam code base and integrating it with the Google portion of the codebase that supports the Dataflow worker. This process is painful as we regularly are making breaking API changes to support libraries related to running portable pipelines (and sometimes in other places as well). This has sometimes made it difficult for PRs to make changes without either breaking something for Google or waiting for a Googler to make the change internally (e.g. dependency updates).
>>>>>>
>>>>>> This code is very similar to the other integrations that exist for runners such as Flink/Spark/Apex/Samza. It is an adaptation layer that sits on top of an execution engine. There is no super secret awesome stuff as this code was already publicly visible in the past when it was part of the Google Cloud Dataflow github repo[1].
>>>>>>
>>>>>> Process-wise, the code will need to get approval from Google to be donated and to go through the code donation process, but before we attempt to do that, I was wondering whether the community would object to adding this code to the master branch?
>>>>>>
>>>>>> The upside is that people can make breaking changes and fix them for all runners.
>>>>>> It will also help Googlers contribute more to the portability story as it will remove the burden of doing the code import (wasted time) and it will allow people to develop in master (can have the whole project loaded in a single IDE).
>>>>>>
>>>>>> The downsides are that this will represent more code and unit tests to support.
>>>>>>
>>>>>> 1: https://github.com/GoogleCloudPlatform/DataflowJavaSDK/tree/hotfix_v1.2/sdk/src/main/java/com/google/cloud/dataflow/sdk/runners/worker