Between:

        * removing -finalize
        * breaking HDFS browsing
        * changing du’s output (in the 2.7 branch)
        * changing various names of metrics (either intentionally or otherwise)
        * changing the JDK release

        … and probably lots of other stuff in branch-2 I haven’t seen or don’t 
know about, our best course of action is to:

$ git rm hadoop-common-project/hadoop-common/src/site/markdown/Compatibility.md

        At least this way we as caretakers don’t come across as hypocrites.  
It’s pretty clear the direction has shown that we only care about API 
compatibility and the rest is ignored when it isn’t “convenient”.  [The next 
time someone tells you that Hadoop is hard to operate, I want you to think 
about this email.]  (1)

        Making 2.7 build with JDK7 led to the *exact* situation I figured it 
would:  now we have a precedent where we just say to the community “You know 
those guarantees?  Yeah, you might as well ignore them because we’re going to 
change the core component any damn time we feel like it.”

        We haven’t made a release branch off of trunk since branch-0.23.  If 
anyone thinks that’s healthy, there is some beach property in Alberta you might 
be interested in as well. Our release cycle came to a screeching halt after 
0.20 and we’ve never recovered.

        However, I offer an alternative.

        This same circular argument comes up all the time: (2)

        * There aren’t enough changes in trunk to make a new branch. 
        * We can’t upgrade/change component X because there is no plan to make 
a new major release.

        To quote Frozen:  Let It Go

        We’re probably at the point where there aren’t likely to be very many 
more earth-shattering changes to the Hadoop code base.  The community has 
decided instead to push these types of changes as separate projects via the 
Incubator to avoid the committer paralysis that this community suffers.

        Because of this, I don’t think the “enough changes” argument works 
anymore.  Instead, we need to pick a new metric to build a cadence that forces 
regular updates.  I’d offer that the “every two years” JDK EOL sets the perfect 
cadence, one that many other enterprise and OSS projects already follow, and 
gives us an opportunity to reflect in the version number that the critical 
component of our software has changed.

        This cadence allows people to plan appropriately and know what our 
roadmap and direction actually are.  Folks are more likely to build “real” 
solutions rather than make compromises that suffer in quality in the name of 
compatibility simply because they don’t know when their work will actually show 
up.  We’ll have a normal, regular opportunity to update dependencies (regardless 
of the state of HADOOP-11656).

        Now, if you’ll excuse me, I have more contributors’ patches to go 
through.

(1) FWIW, I made the decision not to worry about backward compatibility in the 
shell code rewrite when I realized that the jsvc log and pid file names were 
poorly chosen to allow for certain capabilities.  Did anyone actually touch 
them from outside the software?  Probably not.  But it is still effectively an 
interface, so off to trunk it went.

(2) … and that’s before we even get to the “Version numbers are cheap” 
arguments that were made during the Great Renames of 0.20 and 0.23.
