Re: [DISCUSS] Branches and versions for Hadoop 3

Allen Wittenauer Mon, 28 Aug 2017 09:59:20 -0700

> On Aug 25, 2017, at 1:23 PM, Jason Lowe <[email protected]> wrote:
> 
> Allen Wittenauer wrote:
>  
> > Doesn't this place an undue burden on the contributor with the first 
> > incompatible patch to prove worthiness?  What happens if it is decided that 
> > it's not good enough?
> 
> It is a burden for that first, "this can't go anywhere else but 4.x" change, 
> but arguably that should not be a change done lightly anyway.  (Or any other 
> backwards-incompatible change for that matter.)  If it's worth committing 
> then I think it's perfectly reasonable to send out the dev announce that 
> there's reason for trunk to diverge from 3.x, cut branch-3, and move on.  
> This is no different than Andrew's recent announcement that there's now a 
> need for separating trunk and the 3.0 line based on what's about to go in.


        So, by this definition, as soon as a patch comes in to remove deprecated 
bits there will be no issue with a branch-3 getting created, correct?

>  Otherwise if past trunk behavior is any indication, it ends up mostly 
> enabling people to commit to just trunk, forgetting that the thing they are 
> committing is perfectly valid for branch-3. 

        I'm not sure there was any "forgetting" involved.  We likely wouldn't 
be talking about 3.x at all if it wasn't for the code diverging enough.

> > Given the number of committers that openly ignore discussions like this, 
> > who is going to verify that incompatible changes don't get in?
>  
> The same entities who are verifying other bugs don't get in, i.e.: the 
> committers and the Hadoop QA bot running the tests.
>  Yes, I know that means it's inevitable that compatibility breakages will 
> happen, and we can and should improve the automation around compatibility 
> testing when possible.

        The automation only goes so far.  At least while investigating Yetus 
bugs, I've seen more than enough blatantly and purposefully ignored errors and 
warnings that I'm not convinced it will be effective. ("That javadoc compile 
failure didn't come from my patch!"  Um, yes, yes it did.) PR for features has 
greatly trumped code correctness for a few years now.

        In any case, I'm specifically thinking of the folks that commit maybe 
one or two patches a year.  They generally don't pay attention to *any* of this 
stuff and it doesn't seem like many people are actually paying attention to 
what gets committed until it breaks their universe.

>  But I don't think there's a magic bullet for preventing all compatibility 
> bugs from being introduced, just like there isn't one for preventing general 
> bugs.  Does having a trunk branch separate but essentially similar to 
> branch-3 make this any better?

        Yes: it's been the process for over a decade now.  Unless there is some 
outreach done, it is almost a guarantee that someone will commit something to 
trunk they shouldn't because they simply won't know (or care?) that the process 
has changed.  

> > Longer term:  what is the PMC doing to make sure we start doing major 
> > releases in a timely fashion again?  In other words, is this really an 
> > issue if we shoot for another major in (throws dart) 2 years?
> 
> If we're trying to do semantic versioning

        FWIW: Hadoop has *never* done semantic versioning. A large percentage 
of our minors should really have been majors. 

> then we shouldn't have a regular cadence for major releases unless we have a 
> regular cadence of changes that break compatibility.  

        But given that we don't follow semantic versioning....

> I'd hope that's not something we would strive towards.  I do agree that we 
> should try to be better about shipping releases, major or minor, in a more 
> timely manner, but I don't agree that we should cut 4.0 simply based on a 
> duration since the last major release.

        ... the only thing we're really left with is (technically) time, either 
in the form of a volunteer saying "hey, I've got time to cut a release" or "my 
employer has a corporate goal based upon a feature in this release".   I would 
*love* for the PMC to define a policy or guidelines that say the community 
should strive for a major after x incompatible changes, a minor after y 
changes, a micro after z fixes.  Even if it doesn't have any teeth, it would at 
least give people hope that their contributions won't be lost in the dustbin of 
history and may actually push others to work on getting a release out.  (Hadoop 
has people made committers based upon features that have never gotten into a 
stable release.  Needless to say, most of those people no longer contribute 
actively if at all.)

        Because no one really has any idea of when releases happen, we have 
situations like we see with fsck:  a completely untenable number of options for 
things that shouldn't even be options.  It's incredibly user-unfriendly and a great 
example of why Hadoop comes off as hostile to its own users.  But because no 
one really knows when the next incompat release is going to happen, we have all 
of this code contortion going on.

        It's also terrible to see projects like the MapReduce native code sit 
in trunk for years and go from extremely useful to nearly irrelevant without 
ever seeing the light of day.  (And there are plenty more examples in 3.x.) 

        We need to do better.

Sidenote:

        It's probably worth mentioning that despite having lots of big moneyed 
companies involved, no one appears to be paying anyone dedicated to work on 
quality or release management like they did in the past. That's had a huge 
impact on the open source community and in particular the release cadence.  

