Good points, Kenn. I think we mostly agree on what has been discussed in this
thread about the pros and cons of having runners in our repository, but this is
probably not the best moment to change any policy in that respect.

So if nobody objects, I think we can proceed. I am OOO this week, so I have less
time to continue with the code review, but I will be back next week to finish
the review and hopefully get this merged with Pulasthi (sorry for the delay).

> (don't wait for me on code review - if Ismaël said it is good, then it is
> good.)

Thanks for your confidence. The Twister2 runner looks good so far, but I will
confirm 100% next week :) In the meantime, if someone has some extra cycles to
take a look, extra feedback is always welcome.

On Mon, Mar 9, 2020 at 5:50 AM Kenneth Knowles <[email protected]> wrote:
>
> I haven't heard anyone suggest that we need a vote. I haven't heard anyone 
> object to this being merged to master. Some time ago, we mostly decided to 
> favor master instead of branches, because it is so much smoother for 
> contributors and users.
>
> So I am poking this thread one last time and otherwise I would consider it 
> consensus that once code review is done the runner is a part of Beam 
> (experimental!).
>
> (don't wait for me on code review - if Ismaël said it is good, then it is 
> good.)
>
> Kenn
>
> On Fri, Mar 6, 2020 at 7:47 AM Pulasthi Supun Wickramasinghe 
> <[email protected]> wrote:
>>
>> I understand that the discussion is on a more broad level than the Twister2 
>> runner. From my experience developing the runner the main advantage of being 
>> inside the beam project was the easy access to the wide range of tests and 
>> other core/utility code as Kyle pointed out. Unmerging runners that are not 
>> properly maintained and updated would be the most logical path to follow 
>> since the internals of the runners are only well understood by developers of 
>> that particular project. It would be unreasonable to expect the Beam 
>> community to maintain them. And since the runners do not alter the core
>> APIs, I assume they would be easy to unmerge if the need arises.
>>
>> Talking specifically about the Twister2 runner, we hope to continue
>> developing it to add streaming capability and to develop a portable runner
>> as well. The team behind Twister2 is working towards getting the project
>> into the Apache Incubator in the near future (hopefully submitting the
>> proposal in the next couple of months).
>>
>> Best Regards,
>> Pulasthi
>>
>>
>>
>> On Thu, Mar 5, 2020 at 6:56 PM Robert Bradshaw <[email protected]> wrote:
>>>
>>> I think we will get to a point where it makes sense for runners to
>>> live in their own repositories, with their own release cadence, but
>>> we're not at that point yet. One prerequisite is a stable API--we're
>>> closing in on that with the portability protos, but many (java)
>>> runners actually share the common runner core libraries and that is
>>> even less set in stone.
>>>
>>> On the other hand, taking responsibility for maintaining all runners
>>> is not a tenable or scalable position for the Beam project. If a
>>> runner is merged, it should be understood that it can be "un-merged"
>>> if it causes a maintenance burden. A completely separate
>>> project/repository makes this less messy.
>>>
>>> On Thu, Mar 5, 2020 at 10:01 AM Kenneth Knowles <[email protected]> wrote:
>>> >
>>> > I agree with both of you, mostly :-)
>>> >
>>> > The monorepo approach doesn't work/scale well for shipped libraries (name 
>>> > a Google library that silently just works and never causes any dependency 
>>> > problems) and the pain we feel has been constant and increasing, but I 
>>> > don't think we are at the breaking point.
>>> >
>>> > But Google's big monorepo [1] demonstrates similar benefits to what Kyle 
>>> > describes. In the early stages the benefit of not having to think too 
>>> > hard about build/test infra and share it everywhere is a big help, and it 
>>> > scales well. Eventually, shipping test utility libraries and compliance 
>>> > suites can be equivalent. And to your point - it is very helpful for 
>>> > users to know that they can use CassandraIO with the other Beam 
>>> > artifacts. This is why Google requires the whole big repo to depend on a 
>>> > single version of any externally-controlled artifact. But, yes, as a 
>>> > consequence it is preposterously difficult to stay up to date, since 
>>> > literally anything can block progress. You need a unified escalation 
>>> > chain for that policy to make sense. It is the definition of a healthy 
>>> > Apache project to *not* have that (PMC is different).
>>> >
>>> > Independent dependencies, independent git histories, and independent 
>>> > release cadence/process are all separate discussions.
>>> >
>>> > It is a broader question than this particular contribution, so let's 
>>> > merge this runner before changing our whole way of doing things :-)
>>> >
>>> > Kenn
>>> >
>>> > [1] 
>>> > https://cacm.acm.org/magazines/2016/7/204032-why-google-stores-billions-of-lines-of-code-in-a-single-repository/fulltext
>>> >  (really quite a balanced analysis)
>>> >
>>> > On Wed, Mar 4, 2020 at 11:51 AM Kyle Weaver <[email protected]> wrote:
>>> >>
>>> >> > Should runners, current and future, be in the same repository as Beam
>>> >> > core?
>>> >>
>>> >> In the distant past, runners lived in their own repositories, and then 
>>> >> were donated to Beam. But Beam's current uber-repo setup allows a lot of 
>>> >> convenience. For example, a ton of code (including core functionality 
>>> >> and tests) is shared directly between runners, which is useful for 
>>> >> keeping runners up to date and ensuring consistent behavior between them 
>>> >> (in other words, maintainable and reliable).
>>> >>
>>> >> Generally, it is up to the authors of a particular Beam related 
>>> >> project/subproject to decide whether to host their code in Beam or in a 
>>> >> different repo, and up to the community to decide whether to take on the 
>>> >> donation, as discussed in previous threads on the Twister2 runner. In 
>>> >> this case, it seems there is agreement between the Twister2 runner 
>>> >> authors and the community that the runner can be hosted in Beam proper.
>>> >>
>>> >> There are examples of successful independent Beam projects, such as 
>>> >> Spotify's Scio, but having an independent project with its own releases 
>>> >> requires a lot of dedicated resources, and the bar for entry for 
>>> >> extending Beam should not be that high. All that's required of 
>>> >> subproject authors is that they keep the subproject in step with Beam. 
>>> >> If they can't maintain it any longer, the subproject can be allowed to 
>>> >> bitrot without getting in anyone's way. On the other hand, I'm not sure 
>>> >> of the details with Cassandra, but in general, a subproject should not 
>>> >> have "the ability to block progress" just because it is contained in the 
>>> >> Beam uber-repo.
>>> >>
>>> >> tl;dr Having an uber repo generally seems to work for Beam. Exceptions 
>>> >> are few enough to be handled on a case-by-case basis.
>>> >>
>>> >> On Wed, Mar 4, 2020 at 11:12 AM Elliotte Rusty Harold 
>>> >> <[email protected]> wrote:
>>> >>>
>>> >>> Generic question without commenting on Twister2 specifically:
>>> >>>
>>> >>> Should runners, current and future, be in the same repository as Beam
>>> >>> core? Can or should they be completely separate products with their
>>> >>> own release cycles?
>>> >>>
>>> >>> Generally, loose coupling leads to more maintainable, reliable
>>> >>> projects. Specifically, Cassandra is holding back some other changes
>>> >>> in Beam and I really wish it didn't have the ability to block
>>> >>> progress. The more different runners we have in core, the worse this
>>> >>> problem is likely to become.
>>> >>>
>>> >>>
>>> >>> On Wed, Mar 4, 2020 at 2:03 PM Pulasthi Supun Wickramasinghe
>>> >>> <[email protected]> wrote:
>>> >>> >
>>> >>> > Hi
>>> >>> >
>>> >>> > I believe the pull request is pretty complete now with the help of
>>> >>> > Ismaël. Kenn, would you be able to take a look at it and suggest any
>>> >>> > changes if needed? The build checks and validation tests are
>>> >>> > passing at the moment. I will start working on the documentation
>>> >>> > that you mentioned in an earlier email separately.
>>> >>> >
>>> >>> > Best Regards,
>>> >>> > Pulasthi
>>> >>> >
>>> >>> >
>>> >>> >
>>> >>> >
>>> >>> >
>>> >>> > On Tue, Feb 18, 2020 at 1:45 PM Pulasthi Supun Wickramasinghe 
>>> >>> > <[email protected]> wrote:
>>> >>> >>
>>> >>> >> Hi All,
>>> >>> >>
>>> >>> >> I have created the initial pull request [1] to contribute the 
>>> >>> >> Twister2 Beam runner to the Apache Beam codebase. More information 
>>> >>> >> on Twister2 can be found here[2] and the Twister2 codebase is 
>>> >>> >> available here[3]. At the moment only batch mode is supported in the 
>>> >>> >> runner, but we are planning to add stream support and implement a 
>>> >>> >> portable runner for Twister2 in the near future.
>>> >>> >>
>>> >>> >> As Kenn pointed out in an earlier email it would be great to have 
>>> >>> >> inputs from the community regarding this contribution since it is a 
>>> >>> >> sizable one. I am sure there are many improvements that can be done 
>>> >>> >> in the contributed codebase with input from the community.
>>> >>> >>
>>> >>> >> [1] https://github.com/apache/beam/pull/10888
>>> >>> >> [2] https://twister2.org/
>>> >>> >> [3] https://github.com/DSC-SPIDAL/twister2
>>> >>> >>
>>> >>> >> Best Regards,
>>> >>> >> Pulasthi
>>> >>> >> --
>>> >>> >> Pulasthi S. Wickramasinghe
>>> >>> >> PhD Candidate  | Research Assistant
>>> >>> >> School of Informatics and Computing | Digital Science Center
>>> >>> >> Indiana University, Bloomington
>>> >>> >> cell: 224-386-9035
>>> >>>
>>> >>>
>>> >>>
>>> >>> --
>>> >>> Elliotte Rusty Harold
>>> >>> [email protected]
