One last thing, for any runner after this one... wouldn't it be a good
acceptance criteria to only accept portable implementations anymore?

 _/
_/ Alex Van Boxel


On Mon, Mar 9, 2020 at 10:42 PM Ismaël Mejía <[email protected]> wrote:

> Good points Kenn. I think we mostly agree on what has been discussed in
> this
> thread the pros/cons of having runners on our repository, but this is
> probably
> not the best moment in time to change any policy in that aspect.
>
> So if nobody objects I think we can proceed. I am OOO this week so with
> less
> time to continue with the code review, but I will be back to finish the
> review
> and hopefully finally get this merged with Pulasthi next week (sorry for
> the
> delay).
>
> > (don't wait for me on code review - if Ismaël said it is good, then it is
> > good.)
>
> Thanks for your confidence. Twister2 runners looks good so far, but I will
> confirm 100% next week :) In the meantime if someone has some extra cycles
> to
> take a look extra feedback is always welcome.
>
> On Mon, Mar 9, 2020 at 5:50 AM Kenneth Knowles <[email protected]> wrote:
> >
> > I haven't heard anyone suggest that we need a vote. I haven't heard
> anyone object to this being merged to master. Some time ago, we mostly
> decided to favor master instead of branches, because it is so much smoother
> for contributors and users.
> >
> > So I am poking this thread one last time and otherwise I would consider
> it consensus that once code review is done the runner is a part of Beam
> (experimental!).
> >
> > (don't wait for me on code review - if Ismaël said it is good, then it
> is good.)
> >
> > Kenn
> >
> > On Fri, Mar 6, 2020 at 7:47 AM Pulasthi Supun Wickramasinghe <
> [email protected]> wrote:
> >>
> >> I understand that the discussion is on a more broad level than the
> Twister2 runner. From my experience developing the runner the main
> advantage of being inside the beam project was the easy access to the wide
> range of tests and other core/utility code as Kyle pointed out. Unmerging
> runners that are not properly maintained and updated would be the most
> logical path to follow since the internals of the runners are only well
> understood by developers of that particular project. It would be
> unreasonable to expect the Beam community to maintain them. And since the
> runners do not alter the core API's I assume they would be easy to unmerge
> if the need arises.
> >>
> >> Talking specifically about Twister2 runner, we hope to continue
> developing the runner in the future to add both streaming capability and
> develop a portable runner as well. The team behind Twister2 is working
> towards the goal to get the project into Apache Incubator in the near
> future (Hopefully to submit the proposal in the next couple of months).
> >>
> >> Best Regards,
> >> Pulasthi
> >>
> >>
> >>
> >> On Thu, Mar 5, 2020 at 6:56 PM Robert Bradshaw <[email protected]>
> wrote:
> >>>
> >>> I think we will get to a point where it makes sense for runners to
> >>> live in their own repositories, with their own release cadence, but
> >>> we're not at that point yet. One prerequisite is a stable API--we're
> >>> closing in on that with the portability protos, but many (java)
> >>> runners actually share the common runner core libraries and that is
> >>> even less set in stone.
> >>>
> >>> On the other hand, taking responsibility for maintaining all runners
> >>> is not a tenable or scalable position for the Beam project. If a
> >>> runner is merged, it should be understood that it can be "un-merged"
> >>> if it causes a maintenance burden. A completely separate
> >>> project/repository makes this less messy.
> >>>
> >>> On Thu, Mar 5, 2020 at 10:01 AM Kenneth Knowles <[email protected]>
> wrote:
> >>> >
> >>> > I agree with both of you, mostly :-)
> >>> >
> >>> > The monorepo approach doesn't work/scale well for shipped libraries
> (name a Google library that silently just works and never causes any
> dependency problems) and the pain we feel has been constant and increasing,
> but I don't think we are at the breaking point.
> >>> >
> >>> > But Google's big monorepo [1] demonstrates similar benefits to what
> Kyle describes. In the early stages the benefit of not having to think too
> hard about build/test infra and share it everywhere is a big help, and it
> scales well. Eventually, shipping test utility libraries and compliance
> suites can be equivalent. And to your point - it is very helpful for users
> to know that they can use CassandraIO with the other Beam artifacts. This
> is why Google requires the whole big repo to depend on a single version of
> any externally-controlled artifact. But, yes, as a consequence it is
> preposterously difficult to stay up to date, since literally anything can
> block progress. You need a unified escalation chain for that policy to make
> sense. It is the definition of a healthy Apache project to *not* have that
> (PMC is different).
> >>> >
> >>> > Independent dependencies, independent git histories, and independent
> release cadence/process are all separate discussions.
> >>> >
> >>> > It is a broader question than this particular contribution, so let's
> merge this runner before changing our whole way of doing things :-)
> >>> >
> >>> > Kenn
> >>> >
> >>> > [1]
> https://cacm.acm.org/magazines/2016/7/204032-why-google-stores-billions-of-lines-of-code-in-a-single-repository/fulltext
> (really quite a balanced analysis)
> >>> >
> >>> > On Wed, Mar 4, 2020 at 11:51 AM Kyle Weaver <[email protected]>
> wrote:
> >>> >>
> >>> >> > Should runners, current and future, be in the same repository as
> Beam
> >>> >> > core?
> >>> >>
> >>> >> In the distant past, runners lived in their own repositories, and
> then were donated to Beam. But Beam's current uber-repo setup allows a lot
> of convenience. For example, a ton of code (including core functionality
> and tests) is shared directly between runners, which is useful for keeping
> runners up to date and ensuring consistent behavior between them (in other
> words, maintainable and reliable).
> >>> >>
> >>> >> Generally, it is up to the authors of a particular Beam related
> project/subproject to decide whether to host their code in Beam or in a
> different repo, and up to the community to decide whether to take on the
> donation, as discussed in previous threads on the Twister2 runner. In this
> case, it seems there is agreement between the Twister2 runner authors and
> the community that the runner can be hosted in Beam proper.
> >>> >>
> >>> >> There are examples of successful independent Beam projects, such as
> Spotify's Scio, but having an independent project with its own releases
> requires a lot of dedicated resources, and the bar for entry for extending
> Beam should not be that high. All that's required of subproject authors is
> that they keep the subproject in step with Beam. If they can't maintain it
> any longer, the subproject can be allowed to bitrot without getting in
> anyone's way. On the other hand, I'm not sure of the details with
> Cassandra, but in general, a subproject should not have "the ability to
> block progress" just because it is contained in the Beam uber-repo.
> >>> >>
> >>> >> tl;dr Having an uber repo generally seems to work for Beam.
> Exceptions are few enough to be handled on a case-by-case basis.
> >>> >>
> >>> >> On Wed, Mar 4, 2020 at 11:12 AM Elliotte Rusty Harold <
> [email protected]> wrote:
> >>> >>>
> >>> >>> Generic question without commenting on Twister2 specifically:
> >>> >>>
> >>> >>> Should runners, current and future, be in the same repository as
> Beam
> >>> >>> core? Can or should they be completely separate products with their
> >>> >>> own release cycles?
> >>> >>>
> >>> >>> Generally, loose coupling leads to more maintainable, reliable
> >>> >>> projects. Specifically, Cassandra is holding back some other
> changes
> >>> >>> in Beam and I really wish it didn't have the ability to block
> >>> >>> progress. The more different runners we have in core, the worse
> this
> >>> >>> problem is likely to become.
> >>> >>>
> >>> >>>
> >>> >>> On Wed, Mar 4, 2020 at 2:03 PM Pulasthi Supun Wickramasinghe
> >>> >>> <[email protected]> wrote:
> >>> >>> >
> >>> >>> > Hi
> >>> >>> >
> >>> >>> > I believe the pull request is pretty complete now with the help
> of Ismaël. Kenn, would you be able to take a look at it and suggest any
> changes if needed?. The build checks and validations tests are passing at
> the moment.  I will start working on the documentation that you mentioned
> in an earlier email separately.
> >>> >>> >
> >>> >>> > Best Regards,
> >>> >>> > Pulasthi
> >>> >>> >
> >>> >>> >
> >>> >>> >
> >>> >>> >
> >>> >>> >
> >>> >>> > On Tue, Feb 18, 2020 at 1:45 PM Pulasthi Supun Wickramasinghe <
> [email protected]> wrote:
> >>> >>> >>
> >>> >>> >> Hi All,
> >>> >>> >>
> >>> >>> >> I have created the initial pull request [1] to contribute the
> Twister2 Beam runner to the Apache Beam codebase. More information on
> Twister2 can be found here[2] and the Twister2 codebase is available
> here[3]. At the moment only batch mode is supported in the runner, but we
> are planning to add stream support and implement a portable runner for
> Twister2 in the near future.
> >>> >>> >>
> >>> >>> >> As Kenn pointed out in an earlier email it would be great to
> have inputs from the community regarding this contribution since it is a
> sizable one. I am sure there are many improvements that can be done in the
> contributed codebase with input from the community.
> >>> >>> >>
> >>> >>> >> [1] https://github.com/apache/beam/pull/10888
> >>> >>> >> [2] https://twister2.org/
> >>> >>> >> [3] https://github.com/DSC-SPIDAL/twister2
> >>> >>> >>
> >>> >>> >> Best Regards,
> >>> >>> >> Pulasthi
> >>> >>> >> --
> >>> >>> >> Pulasthi S. Wickramasinghe
> >>> >>> >> PhD Candidate  | Research Assistant
> >>> >>> >> School of Informatics and Computing | Digital Science Center
> >>> >>> >> Indiana University, Bloomington
> >>> >>> >> cell: 224-386-9035
> >>> >>> >
> >>> >>> >
> >>> >>> >
> >>> >>> > --
> >>> >>> > Pulasthi S. Wickramasinghe
> >>> >>> > PhD Candidate  | Research Assistant
> >>> >>> > School of Informatics and Computing | Digital Science Center
> >>> >>> > Indiana University, Bloomington
> >>> >>> > cell: 224-386-9035
> >>> >>>
> >>> >>>
> >>> >>>
> >>> >>> --
> >>> >>> Elliotte Rusty Harold
> >>> >>> [email protected]
> >>
> >>
> >>
> >> --
> >> Pulasthi S. Wickramasinghe
> >> PhD Candidate  | Research Assistant
> >> School of Informatics and Computing | Digital Science Center
> >> Indiana University, Bloomington
> >> cell: 224-386-9035
>

Reply via email to