From my perspective as a (non-Google) community member, huge +1.

I don't see anything bad for the community about open sourcing more of the
probably-most-used runner. While the DirectRunner is probably still the
closest thing to a reference implementation of Beam, it can't hurt to see
more working code. Other runners or runner implementors can refer to this
code if they want, and ignore it if they don't.

In terms of having more code and tests to support, well, that's par for the
course. Will this change make the things that need to be done to support
them more obvious? (E.g., "this PR is blocked because someone at Google on
Dataflow team has to fix something" vs "this PR is blocked because the
Apache Beam code in foo/bar/baz is failing, and anyone who can see the code
can fix it"). The latter seems like a clear win for the community.

(As long as the code donation is handled properly, but that's completely
orthogonal and I have no reason to think it wouldn't be.)

Thanks,
Dan

On Thu, Sep 13, 2018 at 11:06 AM Lukasz Cwik <[email protected]> wrote:

> Yes, I'm specifically asking the community for opinions as to whether it
> should be accepted or not.
>
> On Thu, Sep 13, 2018 at 10:51 AM Raghu Angadi <[email protected]> wrote:
>
>> This is terrific!
>>
>> Is this thread asking for opinions from the community about whether it
>> should be accepted? Assuming the Google-side decision is made to contribute,
>> big +1 from me to include it next to the other runners.
>>
>> On Thu, Sep 13, 2018 at 10:38 AM Lukasz Cwik <[email protected]> wrote:
>>
>>> At Google we have been importing the Apache Beam code base and
>>> integrating it with the Google portion of the codebase that supports the
>>> Dataflow worker. This process is painful as we regularly are making
>>> breaking API changes to support libraries related to running portable
>>> pipelines (and sometimes in other places as well). This has sometimes made
>>> it difficult for PRs to make changes without either breaking something for
>>> Google or waiting for a Googler to make the change internally (e.g.
>>> dependency updates).
>>>
>>> This code is very similar to the other integrations that exist for
>>> runners such as Flink/Spark/Apex/Samza. It is an adaptation layer that sits
>>> on top of an execution engine. There is no super secret awesome stuff as
>>> this code was already publicly visible in the past when it was part of the
>>> Google Cloud Dataflow github repo[1].
>>>
>>> Process-wise, the code will need approval from Google to be donated and
>>> will need to go through the code donation process, but before we attempt
>>> to do that, I was wondering whether the community would object to adding
>>> this code to the master branch?
>>>
>>> The upside is that people can make breaking changes and fix them for all
>>> runners. It will also help Googlers contribute more to the portability
>>> story, as it will remove the burden of doing the code import (wasted time)
>>> and will allow people to develop in master (with the whole project loaded
>>> in a single IDE).
>>>
>>> The downside is that this will represent more code and unit tests to
>>> support.
>>>
>>> 1:
>>> https://github.com/GoogleCloudPlatform/DataflowJavaSDK/tree/hotfix_v1.2/sdk/src/main/java/com/google/cloud/dataflow/sdk/runners/worker
>>>
>>
