+1

On Tue, Mar 10, 2020 at 12:59 AM Alex Van Boxel <[email protected]> wrote:
> One last thing, for any runner after this one... wouldn't it be a good
> acceptance criterion to only accept portable implementations anymore?
>
>  _/
> _/ Alex Van Boxel
>
>
> On Mon, Mar 9, 2020 at 10:42 PM Ismaël Mejía <[email protected]> wrote:
>
>> Good points Kenn. I think we mostly agree on what has been discussed in
>> this thread on the pros/cons of having runners in our repository, but this
>> is probably not the best moment in time to change any policy in that
>> aspect.
>>
>> So if nobody objects I think we can proceed. I am OOO this week so with
>> less time to continue with the code review, but I will be back to finish
>> the review and hopefully finally get this merged with Pulasthi next week
>> (sorry for the delay).
>>
>> > (don't wait for me on code review - if Ismaël said it is good, then it
>> > is good.)
>>
>> Thanks for your confidence. The Twister2 runner looks good so far, but I
>> will confirm 100% next week :) In the meantime, if someone has some extra
>> cycles to take a look, extra feedback is always welcome.
>>
>> On Mon, Mar 9, 2020 at 5:50 AM Kenneth Knowles <[email protected]> wrote:
>> >
>> > I haven't heard anyone suggest that we need a vote. I haven't heard
>> > anyone object to this being merged to master. Some time ago, we mostly
>> > decided to favor master instead of branches, because it is so much
>> > smoother for contributors and users.
>> >
>> > So I am poking this thread one last time and otherwise I would consider
>> > it consensus that once code review is done the runner is a part of Beam
>> > (experimental!).
>> >
>> > (don't wait for me on code review - if Ismaël said it is good, then it
>> > is good.)
>> >
>> > Kenn
>> >
>> > On Fri, Mar 6, 2020 at 7:47 AM Pulasthi Supun Wickramasinghe
>> > <[email protected]> wrote:
>> >>
>> >> I understand that the discussion is on a more broad level than the
>> >> Twister2 runner. From my experience developing the runner the main
>> >> advantage of being inside the Beam project was the easy access to the
>> >> wide range of tests and other core/utility code as Kyle pointed out.
>> >> Unmerging runners that are not properly maintained and updated would
>> >> be the most logical path to follow since the internals of the runners
>> >> are only well understood by developers of that particular project. It
>> >> would be unreasonable to expect the Beam community to maintain them.
>> >> And since the runners do not alter the core APIs I assume they would
>> >> be easy to unmerge if the need arises.
>> >>
>> >> Talking specifically about the Twister2 runner, we hope to continue
>> >> developing the runner in the future to add both streaming capability
>> >> and develop a portable runner as well. The team behind Twister2 is
>> >> working towards the goal to get the project into Apache Incubator in
>> >> the near future (Hopefully to submit the proposal in the next couple
>> >> of months).
>> >>
>> >> Best Regards,
>> >> Pulasthi
>> >>
>> >>
>> >> On Thu, Mar 5, 2020 at 6:56 PM Robert Bradshaw <[email protected]>
>> >> wrote:
>> >>>
>> >>> I think we will get to a point where it makes sense for runners to
>> >>> live in their own repositories, with their own release cadence, but
>> >>> we're not at that point yet. One prerequisite is a stable API--we're
>> >>> closing in on that with the portability protos, but many (java)
>> >>> runners actually share the common runner core libraries and that is
>> >>> even less set in stone.
>> >>>
>> >>> On the other hand, taking responsibility for maintaining all runners
>> >>> is not a tenable or scalable position for the Beam project. If a
>> >>> runner is merged, it should be understood that it can be "un-merged"
>> >>> if it causes a maintenance burden. A completely separate
>> >>> project/repository makes this less messy.
>> >>>
>> >>> On Thu, Mar 5, 2020 at 10:01 AM Kenneth Knowles <[email protected]>
>> >>> wrote:
>> >>> >
>> >>> > I agree with both of you, mostly :-)
>> >>> >
>> >>> > The monorepo approach doesn't work/scale well for shipped libraries
>> >>> > (name a Google library that silently just works and never causes
>> >>> > any dependency problems) and the pain we feel has been constant and
>> >>> > increasing, but I don't think we are at the breaking point.
>> >>> >
>> >>> > But Google's big monorepo [1] demonstrates similar benefits to what
>> >>> > Kyle describes. In the early stages the benefit of not having to
>> >>> > think too hard about build/test infra and share it everywhere is a
>> >>> > big help, and it scales well. Eventually, shipping test utility
>> >>> > libraries and compliance suites can be equivalent. And to your
>> >>> > point - it is very helpful for users to know that they can use
>> >>> > CassandraIO with the other Beam artifacts. This is why Google
>> >>> > requires the whole big repo to depend on a single version of any
>> >>> > externally-controlled artifact. But, yes, as a consequence it is
>> >>> > preposterously difficult to stay up to date, since literally
>> >>> > anything can block progress. You need a unified escalation chain
>> >>> > for that policy to make sense. It is the definition of a healthy
>> >>> > Apache project to *not* have that (PMC is different).
>> >>> >
>> >>> > Independent dependencies, independent git histories, and
>> >>> > independent release cadence/process are all separate discussions.
>> >>> >
>> >>> > It is a broader question than this particular contribution, so
>> >>> > let's merge this runner before changing our whole way of doing
>> >>> > things :-)
>> >>> >
>> >>> > Kenn
>> >>> >
>> >>> > [1] https://cacm.acm.org/magazines/2016/7/204032-why-google-stores-billions-of-lines-of-code-in-a-single-repository/fulltext
>> >>> > (really quite a balanced analysis)
>> >>> >
>> >>> > On Wed, Mar 4, 2020 at 11:51 AM Kyle Weaver <[email protected]>
>> >>> > wrote:
>> >>> >>
>> >>> >> > Should runners, current and future, be in the same repository as
>> >>> >> > Beam core?
>> >>> >>
>> >>> >> In the distant past, runners lived in their own repositories, and
>> >>> >> then were donated to Beam. But Beam's current uber-repo setup
>> >>> >> allows a lot of convenience. For example, a ton of code (including
>> >>> >> core functionality and tests) is shared directly between runners,
>> >>> >> which is useful for keeping runners up to date and ensuring
>> >>> >> consistent behavior between them (in other words, maintainable and
>> >>> >> reliable).
>> >>> >>
>> >>> >> Generally, it is up to the authors of a particular Beam related
>> >>> >> project/subproject to decide whether to host their code in Beam or
>> >>> >> in a different repo, and up to the community to decide whether to
>> >>> >> take on the donation, as discussed in previous threads on the
>> >>> >> Twister2 runner. In this case, it seems there is agreement between
>> >>> >> the Twister2 runner authors and the community that the runner can
>> >>> >> be hosted in Beam proper.
>> >>> >>
>> >>> >> There are examples of successful independent Beam projects, such
>> >>> >> as Spotify's Scio, but having an independent project with its own
>> >>> >> releases requires a lot of dedicated resources, and the bar for
>> >>> >> entry for extending Beam should not be that high. All that's
>> >>> >> required of subproject authors is that they keep the subproject in
>> >>> >> step with Beam. If they can't maintain it any longer, the
>> >>> >> subproject can be allowed to bitrot without getting in anyone's
>> >>> >> way. On the other hand, I'm not sure of the details with
>> >>> >> Cassandra, but in general, a subproject should not have "the
>> >>> >> ability to block progress" just because it is contained in the
>> >>> >> Beam uber-repo.
>> >>> >>
>> >>> >> tl;dr Having an uber repo generally seems to work for Beam.
>> >>> >> Exceptions are few enough to be handled on a case-by-case basis.
>> >>> >>
>> >>> >> On Wed, Mar 4, 2020 at 11:12 AM Elliotte Rusty Harold
>> >>> >> <[email protected]> wrote:
>> >>> >>>
>> >>> >>> Generic question without commenting on Twister2 specifically:
>> >>> >>>
>> >>> >>> Should runners, current and future, be in the same repository as
>> >>> >>> Beam core? Can or should they be completely separate products
>> >>> >>> with their own release cycles?
>> >>> >>>
>> >>> >>> Generally, loose coupling leads to more maintainable, reliable
>> >>> >>> projects. Specifically, Cassandra is holding back some other
>> >>> >>> changes in Beam and I really wish it didn't have the ability to
>> >>> >>> block progress. The more different runners we have in core, the
>> >>> >>> worse this problem is likely to become.
>> >>> >>>
>> >>> >>>
>> >>> >>> On Wed, Mar 4, 2020 at 2:03 PM Pulasthi Supun Wickramasinghe
>> >>> >>> <[email protected]> wrote:
>> >>> >>> >
>> >>> >>> > Hi
>> >>> >>> >
>> >>> >>> > I believe the pull request is pretty complete now with the help
>> >>> >>> > of Ismaël. Kenn, would you be able to take a look at it and
>> >>> >>> > suggest any changes if needed? The build checks and validation
>> >>> >>> > tests are passing at the moment. I will start working on the
>> >>> >>> > documentation that you mentioned in an earlier email
>> >>> >>> > separately.
>> >>> >>> >
>> >>> >>> > Best Regards,
>> >>> >>> > Pulasthi
>> >>> >>> >
>> >>> >>> >
>> >>> >>> > On Tue, Feb 18, 2020 at 1:45 PM Pulasthi Supun Wickramasinghe
>> >>> >>> > <[email protected]> wrote:
>> >>> >>> >>
>> >>> >>> >> Hi All,
>> >>> >>> >>
>> >>> >>> >> I have created the initial pull request [1] to contribute the
>> >>> >>> >> Twister2 Beam runner to the Apache Beam codebase. More
>> >>> >>> >> information on Twister2 can be found here [2] and the Twister2
>> >>> >>> >> codebase is available here [3]. At the moment only batch mode
>> >>> >>> >> is supported in the runner, but we are planning to add stream
>> >>> >>> >> support and implement a portable runner for Twister2 in the
>> >>> >>> >> near future.
>> >>> >>> >>
>> >>> >>> >> As Kenn pointed out in an earlier email it would be great to
>> >>> >>> >> have inputs from the community regarding this contribution
>> >>> >>> >> since it is a sizable one. I am sure there are many
>> >>> >>> >> improvements that can be done in the contributed codebase with
>> >>> >>> >> input from the community.
>> >>> >>> >>
>> >>> >>> >> [1] https://github.com/apache/beam/pull/10888
>> >>> >>> >> [2] https://twister2.org/
>> >>> >>> >> [3] https://github.com/DSC-SPIDAL/twister2
>> >>> >>> >>
>> >>> >>> >> Best Regards,
>> >>> >>> >> Pulasthi
>> >>> >>> >> --
>> >>> >>> >> Pulasthi S. Wickramasinghe
>> >>> >>> >> PhD Candidate | Research Assistant
>> >>> >>> >> School of Informatics and Computing | Digital Science Center
>> >>> >>> >> Indiana University, Bloomington
>> >>> >>> >> cell: 224-386-9035
>> >>> >>> >
>> >>> >>> >
>> >>> >>> > --
>> >>> >>> > Pulasthi S. Wickramasinghe
>> >>> >>> > PhD Candidate | Research Assistant
>> >>> >>> > School of Informatics and Computing | Digital Science Center
>> >>> >>> > Indiana University, Bloomington
>> >>> >>> > cell: 224-386-9035
>> >>> >>>
>> >>> >>>
>> >>> >>> --
>> >>> >>> Elliotte Rusty Harold
>> >>> >>> [email protected]
>> >>
>> >>
>> >> --
>> >> Pulasthi S. Wickramasinghe
>> >> PhD Candidate | Research Assistant
>> >> School of Informatics and Computing | Digital Science Center
>> >> Indiana University, Bloomington
>> >> cell: 224-386-9035
>> >
