On Mon, Aug 28, 2017, at 09:58, Allen Wittenauer wrote:
> 
> > On Aug 25, 2017, at 1:23 PM, Jason Lowe <jl...@oath.com> wrote:
> > 
> > Allen Wittenauer wrote:
> >  
> > > Doesn't this place an undue burden on the contributor with the first 
> > > incompatible patch to prove worthiness?  What happens if it is decided 
> > > that it's not good enough?
> > 
> > It is a burden for that first, "this can't go anywhere else but 4.x" 
> > change, but arguably that should not be a change done lightly anyway.  (Or 
> > any other backwards-incompatible change for that matter.)  If it's worth 
> > committing then I think it's perfectly reasonable to send out the dev 
> > announce that there's reason for trunk to diverge from 3.x, cut branch-3, 
> > and move on.  This is no different than Andrew's recent announcement that 
> > there's now a need for separating trunk and the 3.0 line based on what's 
> > about to go in.
> 
>       So, by this definition, as soon as a patch comes in to remove 
> deprecated bits, there will be no issue with a branch-3 getting created, 
> correct?

Jason wrote that making backwards-incompatible changes should not be
"done lightly."  By that definition, "a patch... to remove deprecated
bits" would not, on its own, cause us to create a whole branch for it.
It should be something where someone could reasonably make the case
that breaking backwards compatibility was worth it.
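
To make that concrete, here is a hypothetical sketch of the kind of
change in question (FooClient and its methods are made up, not real
Hadoop APIs):

    // Hypothetical class.  While the deprecated method still exists,
    // old callers keep compiling and running.
    public class FooClient {
        /** @deprecated use {@link #open(String)} instead. */
        @Deprecated
        public void openLegacy(String path) {
            open(path);
        }

        public void open(String path) {
            // real implementation would live here
        }
    }

Deleting openLegacy() is a source- and binary-incompatible change, so
it could only land in the next major line, and whoever proposes it
should be prepared to argue that the cleanup is worth that cost.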

> 
> >  Otherwise, if past trunk behavior is any indication, it ends up mostly 
> > enabling people to commit to just trunk, forgetting that the thing they are 
> > committing is perfectly valid for branch-3. 
> 
>       I'm not sure there was any "forgetting" involved.  We likely wouldn't 
> be talking about 3.x at all if it wasn't for the code diverging enough.
> 
> > > Given the number of committers that openly ignore discussions like this, 
> > > who is going to verify that incompatible changes don't get in?
> >  
> > The same entities that are verifying other bugs don't get in, i.e., the 
> > committers and the Hadoop QA bot running the tests.
> >  Yes, I know that means it's inevitable that compatibility breakages will 
> > happen, and we can and should improve the automation around compatibility 
> > testing when possible.
> 
>       The automation only goes so far.  At least while investigating Yetus 
> bugs, I've seen more than enough blatantly and purposefully ignored errors 
> and warnings that I'm not convinced it will be effective.  ("That javadoc 
> compile failure didn't come from my patch!"  Um, yes, yes it did.)  PR for 
> features has greatly trumped code correctness for a few years now.
> 
>       In any case, I'm thinking specifically of the folks who commit maybe 
> one or two patches a year.  They generally don't pay attention to *any* of 
> this stuff, and it doesn't seem like many people are actually paying 
> attention to what gets committed until it breaks their universe.
> 
> >  But I don't think there's a magic bullet for preventing all compatibility 
> > bugs from being introduced, just like there isn't one for preventing 
> > general bugs.  Does having a trunk branch that is separate from, but 
> > essentially similar to, branch-3 make this any better?
> 
>       Yes: it's been the process for over a decade now.  Unless some 
> outreach is done, it is almost guaranteed that someone will commit something 
> to trunk that they shouldn't, because they simply won't know (or care?) that 
> the process has changed.

This is no different than any other type of bug.  If someone commits
something that is buggy, we should revert it.  If there are too many of
these issues, then we need more review, more testing, or both.
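
The mechanics of backing a change out are cheap.  A sketch, where
abc1234 is a placeholder for the offending commit and the module path
is just an example:

    # Back out the bad commit on trunk.
    git checkout trunk
    git revert abc1234
    # Re-run the affected module's tests before pushing the revert.
    mvn test -pl hadoop-common-project/hadoop-common
    git push origin trunk

The expensive part is the review and testing that notices the breakage,
not the revert itself.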

> 
> > > Longer term:  what is the PMC doing to make sure we start doing major 
> > > releases in a timely fashion again?  In other words, is this really an 
> > > issue if we shoot for another major in (throws dart) 2 years?
> > 
> > If we're trying to do semantic versioning
> 
>       FWIW: Hadoop has *never* done semantic versioning. A large percentage 
> of our minors should really have been majors. 
> 
> > then we shouldn't have a regular cadence for major releases unless we have 
> > a regular cadence of changes that break compatibility.  
> 
>       But given that we don't follow semantic versioning....

In case someone new to the community is reading this thread, Hadoop does
have compatibility guidelines for major and minor releases, and has had
them for a very long time:
http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/Compatibility.html
Of course, there have been bugs in the past, and they have been
frustrating.
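
For those same readers: the guidelines lean on the audience and
stability annotations that ship in hadoop-common.  A sketch of how a
public, stable API is marked (MyService is a made-up class; the
annotations are the real ones):

    import org.apache.hadoop.classification.InterfaceAudience;
    import org.apache.hadoop.classification.InterfaceStability;

    // MyService is hypothetical.  Public + Stable means downstream
    // users may depend on it, and it can only change incompatibly in
    // a major release.
    @InterfaceAudience.Public
    @InterfaceStability.Stable
    public class MyService {
        public void start() {
            // implementation elided
        }
    }

Anything marked Private or Evolving carries correspondingly weaker
promises, which is what reviewers and tooling have to check against.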

> 
> > I'd hope that's not something we would strive towards.  I do agree that we 
> > should try to be better about shipping releases, major or minor, in a more 
> > timely manner, but I don't agree that we should cut 4.0 simply based on a 
> > duration since the last major release.
> 
>       ... the only thing we're really left with is (technically) time, either 
> in the form of a volunteer saying "hey, I've got time to cut a release" or 
> "my employer has a corporate goal based upon a feature in this release".  I 
> would *love* for the PMC to define a policy or guidelines that say the 
> community should strive for a major after x incompatible changes, a minor 
> after y changes, and a micro after z fixes.  Even if it doesn't have any 
> teeth, it would at least give people hope that their contributions won't be 
> lost in the dustbin of history, and it may actually push others to work on 
> getting a release out.  (Hadoop has made people committers based upon 
> features that have never gotten into a stable release.  Needless to say, 
> most of those people no longer contribute actively, if at all.)

I agree that Hadoop should have a more regular release cadence.  I think
the proposal outlined by Andrew will help with this.  We should have
fewer branches, so that we don't have to agonize over why a change is in
trunk but not in 3.x.  We should avoid making incompatible changes unless
there is a clear reason to do so -- especially to user-visible APIs.  We
need to push back more on last-minute feature merges when a release is
about to be made.

> 
>       No one really has any idea of when releases will happen, so we have 
> situations like we see with fsck: a completely untenable number of options 
> for things that shouldn't even be options.  It's incredibly user-unfriendly 
> and a great example of why Hadoop comes off as hostile to its own users.  
> But because no one really knows when the next incompatible release is going 
> to happen, we have all of this code contortion going on.
> 
>       It's also terrible to see projects like the MapReduce native code sit 
> in trunk for years and go from extremely useful to nearly irrelevant without 
> ever seeing the light of day.  (And there are plenty more examples in 3.x.)

The good news is, Allen, you can help with these backports.  For
example, the MapReduce native code can be backported without any
compatibility issues, since it is a new component.  It just needs a
committer to shepherd it through -- like you.
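
A sketch of what that backport could look like, assuming the work lives
as a series of commits on trunk ("branch-3" is the name proposed in
this thread, and the SHAs are placeholders):

    git checkout branch-3
    # Apply the trunk commits for the feature, oldest first.
    git cherry-pick FIRST_SHA^..LAST_SHA
    # Resolve any conflicts, then rebuild with the native profile and
    # run the tests before pushing.
    mvn clean install -Pnative

Since nothing existing changes, the compatibility review should be
easy.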

best,
Colin

> 
>       We need to do better.
> 
> Sidenote:
> 
>       It's probably worth mentioning that despite having lots of big-money 
> companies involved, no one appears to be paying anyone to work full-time on 
> quality or release management like they did in the past.  That has had a 
> huge impact on the open source community, and in particular on the release 
> cadence.
