Re: Hadoop 3.x: what about shipping trunk as a 2.x release in 2015?

Colin P. McCabe Tue, 10 Mar 2015 11:57:45 -0700

Er, that should read "as Allen commented"  C.


On Tue, Mar 10, 2015 at 11:55 AM, Colin P. McCabe <cmcc...@apache.org> wrote:
> Hi Arun,
>
> Not all incompatible changes can be "fixed"; sometimes an
> incompatibility is a necessary part of a change.  For example, taking
> a really old library dependency with known security issues off the
> CLASSPATH will create incompatibilities, but it's also necessary.  A
> minimum JDK version bump also falls in that category.  There are also
> cases where we need to drop support for really obsolete and baroque
> features from the past.  For example, it would be nice if we could
> finally get rid of the code to read pre-transactional edit logs.  It's
> a substantial amount of code.  We could argue that we should just
> support legacy stuff forever, but code quality will suffer.
>
> These changes need to be made sooner or later, and a major version
> bump is an ideal place to make them.  I think that making these
> changes in a 2.x release is hostile to operators, as Alan commented.
> That's what we're trying to avoid by discussing Hadoop 3.x.
>
> Colin
>
> On Mon, Mar 9, 2015 at 3:54 PM, Arun Murthy <a...@hortonworks.com> wrote:
>> Colin,
>>
>>  Do you have a list of incompatible changes other than the shell-script
>> rewrite? If we do have others, we'd have to fix them anyway for the current
>> plan on hadoop-3.x, right? So, I don't see the difference?
>>
>> Arun
>>
>> ________________________________________
>> From: Colin P. McCabe <cmcc...@apache.org>
>> Sent: Monday, March 09, 2015 3:05 PM
>> To: hdfs-...@hadoop.apache.org
>> Cc: mapreduce-...@hadoop.apache.org; common-dev@hadoop.apache.org; 
>> yarn-...@hadoop.apache.org
>> Subject: Re: Hadoop 3.x: what about shipping trunk as a 2.x release in 2015?
>>
>> Java 7 will be end-of-lifed in April 2015.  I think it would be unwise
>> to plan a new Hadoop release against a version of Java that is almost
>> obsolete and (soon) no longer receiving security updates.  I think
>> people will be willing to roll out a new version of Java for Hadoop
>> 3.x.
>>
>> Similarly, the whole point of bumping the major version number is the
>> ability to make incompatible changes.  There are already a bunch of
>> incompatible changes in the trunk branch.  Are you proposing to revert
>> those?  Or push them into newly created feature branches?  This
>> doesn't seem like a good idea to me.
>>
>> I would be in favor of backporting targeted incompatible changes from
>> trunk to branch-2.  For example, we could consider pulling in Allen's
>> shell script rewrite.  But pulling in all of trunk seems like a bad
>> idea at this point, if we want a 2.x release.
>>
>> best,
>> Colin
>>
>> On Mon, Mar 9, 2015 at 2:15 PM, Steve Loughran <ste...@hortonworks.com> 
>> wrote:
>>>
>>> If 3.x is going to be Java 8 & not backwards compatible, I don't expect 
>>> anyone wanting to use this in production until some time deep into 2016.
>>>
>>> Issue: JDK 8 vs 7
>>>
>>> It will require Hadoop clusters to move up to Java 8. While there's dev 
>>> pull for this, there's ops pull against this: people are still in the 
>>> moving-off Java 6 phase due to that "it's working, don't update it" 
>>> philosophy. Java 8 is compelling to us coders, but that doesn't mean ops 
>>> want it.
>>>
>>> You can run JDK-8 code in a YARN cluster running on Hadoop 2.7 *today*; the
>>> main thing is setting up JAVA_HOME. That's something we could make easier
>>> somehow (maybe a minimum-Java-version field in resource requests that would
>>> let apps say Java 8, Java 9, ...). YARN could not only set up JVM paths but
>>> also fail fast if the requested Java version wasn't available.
>>>
>>> What we can't do in Hadoop core today is set javac.version=1.8 & use Java 8
>>> code. Downstream code can do that (Hive, etc.); they just need to accept that
>>> they don't get to play on JDK7 clusters if they embrace lambda expressions.
>>>
>>> So... we need to stay on Java 7 for some time due to ops pull; downstream
>>> apps get to choose what they want. We could enhance YARN to make JVM
>>> choice more declarative.
>>>
>>> Issue: Incompatible changes
>>>
>>> Without knowing what is proposed for "an incompatible classpath change", I
>>> can't say whether this is something that could be made optional. If it
>>> isn't, then it is a Python-3-class "rewrite your code" event, which
>>> is going to be particularly traumatic to things like Hive that already play
>>> complex classpath games. I'm currently against any mandatory change here,
>>> though I would love to see an optional one. And if it is optional, it ceases
>>> to be an incompatible change...
>>>
>>> Issue: Getting trunk out the door
>>>
>>> The main diff between branch-2 and trunk is currently the bash script changes.
>>> These don't break client apps. They may or may not break Bigtop and other
>>> downstream Hadoop stacks, but developers don't need to worry about this:
>>> no recompilation is necessary.
>>>
>>> Proposed: ship trunk as a 2.x release, compatible with JDK7 & Java code.
>>>
>>> It seems to me that I could go
>>>
>>>     git checkout trunk
>>>     mvn versions:set -DnewVersion=2.8.0-SNAPSHOT
>>>
>>> We'd then have a version of Hadoop-trunk we could ship later this year, 
>>> compatible at the JDK and API level with the existing java code & JDK7+ 
>>> clusters.
>>>
>>> A classpath fix that is optional/compatible can then go out on the 2.x
>>> line, saving the 3.x tag for something that really breaks things, forces
>>> all downstream apps to set up new Hadoop profiles, have separate modules &
>>> generally hate the Hadoop dev team.
>>>
>>> This lets us tick off the "recent trunk release" and "fixed shell scripts" 
>>> items, pushing out those benefits to people sooner rather than later, and 
>>> puts off the "Hello, we've just broken your code" event for another 12+ 
>>> months.
>>>
>>> Comments?
>>>
>>> -Steve
>>>
>>>
>>>
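Steve's idea that YARN could fail fast when an application's required Java version isn't available can be illustrated with a small launch-time shell check. This is a minimal sketch, not an existing YARN or Hadoop feature: the function names (`java_major_version`, `java_version_at_least`, `reported_java_version`, `require_java`) and the "minimum major version" argument are hypothetical. It assumes only that `$JAVA_HOME/bin/java -version` prints a line containing `version "..."` on stderr, as Oracle and OpenJDK builds do.

```shell
#!/usr/bin/env bash
# Hypothetical fail-fast check for a container launch script.
# Not an existing Hadoop/YARN feature; all names are illustrative only.

# Map a Java version string to its major version:
#   legacy scheme: "1.7.0_75" -> 7, "1.8.0_40" -> 8
#   newer scheme:  "9-ea" -> 9, "11.0.2" -> 11
java_major_version() {
  local v="$1"
  local first="${v%%.*}"          # text before the first dot
  if [ "$first" = "1" ]; then
    local rest="${v#1.}"          # drop the leading "1."
    echo "${rest%%.*}"            # "8.0_40" -> "8"
  else
    echo "${first%%[-+_]*}"       # strip suffixes such as "-ea"
  fi
}

# Succeed if version string $1 has a major version >= $2.
java_version_at_least() {
  local major
  major="$(java_major_version "$1")"
  [ "$major" -ge "$2" ] 2>/dev/null   # non-numeric input counts as "too old"
}

# Print the version string a JVM reports. `java -version` writes lines such as
#   java version "1.8.0_40"   or   openjdk version "11.0.2"
# to stderr; take the first line that matches.
reported_java_version() {
  "$1" -version 2>&1 | sed -n -E 's/.* version "([^"]+)".*/\1/p' | head -n 1
}

# Fail fast if JAVA_HOME ($1) has no java binary or is older than major version $2.
require_java() {
  local java_home="$1" min="$2" version
  if [ ! -x "$java_home/bin/java" ]; then
    echo "no executable java under $java_home" >&2
    return 1
  fi
  version="$(reported_java_version "$java_home/bin/java")"
  if ! java_version_at_least "$version" "$min"; then
    echo "Java $min+ required, but $java_home reports '${version:-unknown}'" >&2
    return 1
  fi
}
```

In a launch script this would run before starting the JVM, e.g. `require_java "$JAVA_HOME" 8 || exit 1`, so a container asking for Java 8 on a JDK7-only node fails immediately with a clear message instead of failing later with an `UnsupportedClassVersionError`.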
