I'm ambivalent about the placement of less-stable contributions, but I'd be very much in favor of being clear about how stable a given API is.
There are several axes of stability here, too:

- Is the API going to stay compatible?
- Is the implementation going to stay compatible via pipeline updates?
- Is the implementation considered well-tested enough for production use?

The current @Experimental annotation is a blanket "no" to all of these, and lack of the annotation is a blanket "yes" - we can probably do better. We also sometimes overload the annotation to mean "uses a feature that doesn't work in all runners" (e.g. splittable or stateful ParDo). We also sometimes use an @Experimental API inside an API that's not marked this way, and there's no good way to enforce that. Sometimes instability applies only to a particular aspect of an API. And so on. I'm not really sure what to do about this (introducing more granularity would also require some process for "graduating" along a given axis of stability), but thought it's worth pointing out anyway.
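To make that granularity concrete, one could imagine an annotation along the lines of the sketch below. This is purely illustrative - the annotation name, the Axis enum, and its values are hypothetical, not an existing or proposed Beam API.

import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

/**
 * Hypothetical, more granular alternative to a blanket @Experimental:
 * the author declares which stability guarantees are NOT yet provided.
 */
@Documented
@Retention(RetentionPolicy.CLASS)
@Target({ElementType.TYPE, ElementType.METHOD, ElementType.FIELD, ElementType.PACKAGE})
public @interface Unstable {
  /** The axes along which this API may still change. */
  Axis[] value();

  enum Axis {
    /** The public API signature may change incompatibly. */
    API_COMPATIBILITY,
    /** The implementation may change in ways that break pipeline update. */
    UPDATE_COMPATIBILITY,
    /** Not yet considered well-tested enough for production use. */
    PRODUCTION_READINESS,
    /** Relies on a feature (e.g. splittable or stateful ParDo) that not all runners support. */
    RUNNER_SUPPORT
  }
}

A transform could then be marked, say, @Unstable({Axis.UPDATE_COMPATIBILITY, Axis.RUNNER_SUPPORT}), and "graduating" along one axis would simply mean removing that value; enforcing that stable APIs don't expose unstable ones would still require separate tooling.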
On Tue, Feb 20, 2018 at 9:28 AM Kenneth Knowles <[email protected]> wrote:

> There was a suggestion of such a thing - a collection of less well-tested and vetted contributions. The one that has the spirit I think of is the Piggybank from Apache Pig. The most important thing is to make sure that committers and users both understand the meaning of it. The biggest problem is if users really rely on it and then have problems. Related to that is contributors focusing their effort here and never doing the work to reach the higher bar. But overall I am in favor of it, too.
>
> Kenn
>
> On Tue, Feb 20, 2018 at 7:54 AM, Reuven Lax <[email protected]> wrote:
>
>> On further thought I like the separate package even more. It allows us to easily isolate all those tests, and not block commits or releases on them.
>>
>> On Tue, Feb 20, 2018 at 7:45 AM, Reuven Lax <[email protected]> wrote:
>>
>>> Another idea: we could have a special package for these "unrefined" contributions. Once a contribution has had time to mature, it can be moved to the regular package structure.
>>>
>>> On Tue, Feb 20, 2018 at 7:41 AM, Jean-Baptiste Onofré <[email protected]> wrote:
>>>
>>>> I would add some @ToBeRefined or so 😁
>>>>
>>>> On 20 Feb 2018, at 16:35, Reuven Lax <[email protected]> wrote:
>>>>
>>>>> Hi JB,
>>>>>
>>>>> You're right, I was thinking more about changes to core when talking about the technical-excellence bar.
>>>>>
>>>>> I think there still needs to be some bar for new features and extensions, but I also think it can be much lower (as nobody is breaking anything by merging this). An example of where we still need a bar here is tests. If a new IO has a test that the reviewer thinks will be flaky, that flaky test will cause problems for _every_ Beam committer, and it's fair to ask for the test to be changed.
>>>>>
>>>>> Given that the bar is lower for new extensions, I think we need a good way of marking these things so that Beam users know they are not as mature as other parts of Beam. Traditionally we've used @Experimental, but @Experimental has been overloaded to mean other things as well. Maybe we need to introduce a new annotation?
>>>>>
>>>>> Reuven
>>>>>
>>>>> On Tue, Feb 20, 2018 at 5:48 AM, Jean-Baptiste Onofré <[email protected]> wrote:
>>>>>
>>>>>> Hi Reuven,
>>>>>>
>>>>>> I agree with all your points except maybe in terms of bar level, especially on new features (like extensions or IOs). While PRs on the core should be heavily reviewed, I'm more in favor of merging these PRs pretty fast, even if they're not perfect. It's not a technical topic, it's really a contribution and community topic.
>>>>>>
>>>>>> Thanks anyway, the dashboard is a good idea!
>>>>>>
>>>>>> Regards
>>>>>> JB
>>>>>>
>>>>>> On 19 Feb 2018, at 19:33, Reuven Lax <[email protected]> wrote:
>>>>>>
>>>>>>> There have been a number of threads on code reviews (most recently on a "State of the project" email). These threads have died out without much resolution, but I'm not sure that the concerns have gone away.
>>>>>>>
>>>>>>> First of all, I'm of the opinion that a code-review bar for Beam commits is critical to the success of the project. This is a system with many subtle semantics, which might not be obvious at first glance. Beam pipelines process user data, and the consequence of certain bugs might be corrupted user data and aggregations - something to avoid at all costs if we want Beam to be trusted. Finally, Beam pipelines often run at extremely high scale; while many of our committers have a strong intuition for what can go wrong when running at high scale, not everybody who wants to contribute will have this experience.
>>>>>>>
>>>>>>> However, we also cannot afford to let our policy get in the way of building a community. We *must* remain a friendly place to develop and contribute.
>>>>>>>
>>>>>>> When I look at the concerns people have had about code reviews (and I've been browsing most PRs this past year), I see a few common threads:
>>>>>>>
>>>>>>> *Review Latency*
>>>>>>>
>>>>>>> Latency on code reviews can be too high. At various times folks (most recently, Ahmet and I) have tried to regularly look for stale PRs and ping them, but latency still remains high.
>>>>>>>
>>>>>>> *Pedantic*
>>>>>>>
>>>>>>> Overly pedantic comments (change variable names, etc.) can be frustrating. The PR author can feel like they are being forced to make meaningless changes just so the reviewer will allow merging. Note that this is sometimes in the eye of the beholder - the reviewer may not think all these comments are pedantic.
>>>>>>>
>>>>>>> *Don't Do This*
>>>>>>>
>>>>>>> Sometimes a reviewer rejects an entire PR, saying that this should not be done. There are various reasons given: this won't scale, this will break backwards compatibility, this will break a specific runner, etc. The PR author may not always understand or agree with these reasons, and this can leave hurt feelings.
>>>>>>>
>>>>>>> I would like to open a discussion about ways of making our code-review policy more welcoming. I'll seed the discussion with a few ideas:
>>>>>>>
>>>>>>> *Code Review Dashboard and Automation*
>>>>>>>
>>>>>>> We should invest in adding a code-review dashboard to our site, tracking stale PRs by reviewer. Quick turnaround on code reviews is essential to building community, so all Beam committers should consider reviewing code as important as their own coding. Spark has built a PR dashboard (https://spark-prs.appspot.com/) which they’ve found better than GitHub’s dashboard; we could easily fork this dashboard.
>>>>>>> There are also tools that will automatically ping reviewers (mention-bot and “hey there” are two such tools). We can also make sure that new PRs are auto-assigned a reviewer (e.g. https://github.com/imsky/pull-review).
>>>>>>>
>>>>>>> *Code Review Response SLA*
>>>>>>>
>>>>>>> It would be great if we could agree on a response-time SLA for Beam code reviews. The response might be “I am unable to do the review until next week,” but even that is better than getting no response.
>>>>>>>
>>>>>>> *Guideline Document*
>>>>>>>
>>>>>>> I think we should have a guideline document explaining common reasons a reviewer might reject an approach in a PR, e.g. "This will cause scaling problems," "This will cause problems for XXX runner," "This is backwards incompatible." Reviewers can point to this doc as part of their comments, along with extra flavor, e.g. “as per the guideline doc, this will cause scaling problems, and here’s why.”
>>>>>>>
>>>>>>> *Guidelines on Comments*
>>>>>>>
>>>>>>> Not everyone agrees on which comments are pedantic, which makes it hard to have specific guidelines here. One general guideline we might adopt: if it will take less time for the reviewer to make the changes themselves, it's not an appropriate comment; the reviewer should fix those issues in a follow-on PR. This adds a bit more burden on reviewers, but IMO it is worth it to keep Beam a friendly environment. This is especially important for first-time contributors, who might otherwise lose interest. If the author is a seasoned Beam contributor, we can expect more of them.
>>>>>>>
>>>>>>> We should also make sure that these fixups serve as educational moments for the new contributor: “Thanks for the contribution! I’ll be making some changes during the merge so that the code stays consistent across the codebase - keep an eye on them.”
>>>>>>>
>>>>>>> Would love to hear more thoughts.
>>>>>>>
>>>>>>> Reuven
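As a rough illustration of the stale-PR dashboard idea above, a minimal sketch might query GitHub's REST API for Beam pull requests that have gone a week without activity. This is only a sketch, not part of any existing Beam tooling: it assumes an unauthenticated API call, Java 11's HttpClient, and a Jackson dependency for JSON parsing; a real dashboard would also need pagination, authentication, and grouping by assigned reviewer.

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.time.Instant;

/** Lists open apache/beam PRs with no updates for a week - the raw data a stale-PR dashboard would surface. */
public class StalePrReport {
  public static void main(String[] args) throws Exception {
    // Fetch the first page of open pull requests (unauthenticated, so rate-limited).
    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create("https://api.github.com/repos/apache/beam/pulls?state=open&per_page=100"))
        .header("Accept", "application/vnd.github+json")
        .build();
    String body = HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofString())
        .body();

    // Flag anything not touched in the last seven days.
    Instant cutoff = Instant.now().minus(Duration.ofDays(7));
    for (JsonNode pr : new ObjectMapper().readTree(body)) {
      Instant updated = Instant.parse(pr.get("updated_at").asText());
      if (updated.isBefore(cutoff)) {
        System.out.printf("PR #%d (%s) has been idle since %s%n",
            pr.get("number").asInt(), pr.get("title").asText(), updated);
      }
    }
  }
}

Once stale PRs can be listed programmatically like this, rendering them per reviewer on a dashboard or wiring them into an auto-ping bot is a comparatively small step.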
