Re: Looking to a Hadoop 3 release

2015-03-05 Thread Alejandro Abdelnur
IMO, if part of the community wants to take on the responsibility and work
that it takes to do a new major release, we should not discourage them from
doing that.

Having multiple major branches active is a standard practice.

This time around we are not replacing the guts as we did from Hadoop 1 to
Hadoop 2, but doing superficial surgery to address issues that were not considered (or
were too much to take on top of the guts transplant).

For the split brain concern, we did a great job of maintaining Hadoop 1 and
Hadoop 2 until Hadoop 1 faded away.

Based on that experience I would say that the coexistence of Hadoop 2 and
Hadoop 3 will be much less demanding/traumatic.

Also, to facilitate the coexistence we should limit Java language features
to Java 7 (even if the runtime is Java 8), once Java 7 is not used anymore
we can remove this limitation.

Thanks.


On Thu, Mar 5, 2015 at 11:40 AM, Vinod Kumar Vavilapalli 
vino...@hortonworks.com wrote:

 The 'resistance' is not so much about  a new major release, more so about
 the content and the roadmap of the release. Other than the two specific
 features raised (the need for breaking compat for them is something that I
 am debating), I haven't seen a roadmap of branch-3 about any more features
 that this community needs to discuss about. If all the difference between
 branch-2 and branch-3 is going to be JDK + a couple of incompat changes, it
 is a big problem in two dimensions (1) it's a burden keeping the branches
 in sync and avoiding the split-brain we experienced with 1.x, 2.x or worse
 branch-0.23, branch-2 and (2) very hard to ask people to not break more
 things in branch-3.

 We seem to have agreed upon a course of action for JDK7. And now we are
 taking a different direction for JDK8. Going by this new proposal, come
 2016, we will have to deal with JDK9 and 3 mainline incompatible hadoop
 releases.

 Regarding individual improvements like classpath isolation and the shell script
 stuff, Jason Lowe captured it perfectly on HADOOP-11656 - it should be
 possible for every major feature that we develop to be an opt-in, unless the
 change is so great that users can balance out the incompatibilities for the
 new stuff they are getting. Even with a ground-breaking change like
 YARN, we spent a bit of time to ensure compatibility (MAPREDUCE-5108) that
 has paid so many times over in return. Breaking compatibility shouldn't
 come across as too cheap a thing.

 Thanks,
 +Vinod

 On Mar 4, 2015, at 10:15 AM, Andrew Wang andrew.w...@cloudera.com wrote:

 Where does this resistance to a new major release stem from? As I've
 described from the beginning, this will look basically like a 2.x release,
 except for the inclusion of classpath isolation by default and target
 version JDK8. I've expressed my desire to maintain API and wire
 compatibility, and we can audit the set of incompatible changes in trunk to
 ensure this. My proposal for doing alpha and beta releases leading up to GA
 also gives downstreams a nice amount of time for testing and validation.




Re: Looking to a Hadoop 3 release

2015-03-05 Thread Vinod Kumar Vavilapalli

Moving to JDK8 involves a lot of things
 (1) Get Hadoop apps to be able to run on JDK8 and choose JDK8 language 
features. This is already possible with the decoupling of apps from the 
platform.
 (2) Get the platform to run on JDK8. This can be done so that we can run 
Hadoop on both JDK8 and JDK7 without any compatibility issues. This in itself 
is a huge move, what with potential GC behavior changes, native library compat 
etc.
 (3) Get the platform to use JDK8 language features. As much as I love the new 
stuff in JDK8, I'm willing to postpone usage of the language features in the 
platform till the time when JDK8 is already in full force.

So, how about we do (1) + (2) for now, get JDK8 going and then come around to 
make the decision of dropping support for JDK7? This is no different from what 
we did for the adoption of JDK7. For a bit of time (2/3 releases?), we were 
able to run on both JDK6 and JDK7, and we phased out JDK6 only when most of 
the community had stopped using it.

Thanks,
+Vinod

On Mar 2, 2015, at 8:08 PM, Andrew Wang andrew.w...@cloudera.com wrote:
 Given that we already agreed to put in JDK7 in 2.7, and that the
 classpath is a fairly minor irritant given some existing solutions (e.g. a
 new default classloader), how do you quantify the benefit for users?
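 (For reference, the opt-in job classloader alluded to here is configurable per job; a minimal sketch, with property names as I understand them from the MR job-classloader work - verify the exact names and defaults against your Hadoop release:)

```xml
<!-- mapred-site.xml (or per-job conf): run MR tasks with an isolated,
     job-first classloader so user jars win over Hadoop's copies. -->
<property>
  <name>mapreduce.job.classloader</name>
  <value>true</value>
</property>
<property>
  <!-- Classes that must still be loaded from the system classpath. -->
  <name>mapreduce.job.classloader.system.classes</name>
  <value>java.,javax.,org.apache.hadoop.</value>
</property>
```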
 
 I looked at our thread on this topic from last time, and we (meaning at
 least myself and Tucu) agreed to a one-time exception to the JDK7 bump in
 2.x for practical reasons. We waited for so long that we had some assurance
 JDK6 was on the outs. Multiple distros also already had bumped their min
 version to JDK7. This is not true this time around. Bumping the JDK version
 is hugely impactful on the end user, and my email on the earlier thread
 still reflects my thoughts on JDK compatibility:
 
 http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201406.mbox/%3CCAGB5D2a5fEDfBApQyER_zyhc8a4Xd_ea1wJSsxxkiAiDZO9%2BNg%40mail.gmail.com%3E
 
 .

 Right now, the incompatible changes would be JDK8, classpath isolation, and
 whatever is already in trunk. I can audit these existing trunk changes when
 branch-3 is cut.



Re: Looking to a Hadoop 3 release

2015-03-05 Thread Vinod Kumar Vavilapalli
The 'resistance' is not so much about  a new major release, more so about the 
content and the roadmap of the release. Other than the two specific features 
raised (the need for breaking compat for them is something that I am debating), 
I haven't seen a roadmap of branch-3 about any more features that this 
community needs to discuss about. If all the difference between branch-2 and 
branch-3 is going to be JDK + a couple of incompat changes, it is a big problem 
in two dimensions (1) it's a burden keeping the branches in sync and avoiding 
the split-brain we experienced with 1.x, 2.x or worse branch-0.23, branch-2 and 
(2) very hard to ask people to not break more things in branch-3.

We seem to have agreed upon a course of action for JDK7. And now we are taking 
a different direction for JDK8. Going by this new proposal, come 2016, we will 
have to deal with JDK9 and 3 mainline incompatible hadoop releases.

Regarding individual improvements like classpath isolation and the shell script 
stuff, Jason Lowe captured it perfectly on HADOOP-11656 - it should be possible 
for every major feature that we develop to be an opt-in, unless the change is so 
great that users can balance out the incompatibilities for the new stuff they 
are getting. Even with a ground-breaking change like YARN, we spent a bit 
of time to ensure compatibility (MAPREDUCE-5108) that has paid so many times 
over in return. Breaking compatibility shouldn't come across as too cheap a 
thing.

Thanks,
+Vinod

On Mar 4, 2015, at 10:15 AM, Andrew Wang andrew.w...@cloudera.com wrote:

Where does this resistance to a new major release stem from? As I've
described from the beginning, this will look basically like a 2.x release,
except for the inclusion of classpath isolation by default and target
version JDK8. I've expressed my desire to maintain API and wire
compatibility, and we can audit the set of incompatible changes in trunk to
ensure this. My proposal for doing alpha and beta releases leading up to GA
also gives downstreams a nice amount of time for testing and validation.



Re: 2.7 status

2015-03-05 Thread Vinod Kumar Vavilapalli

The 2.7 blocker JIRA count went down and is going back up again; we will need to 
converge.

Unless I see objections, I plan to cut a branch this weekend and selectively 
filter stuff in after that in the interest of convergence.

Thoughts welcome!

Thanks,
+Vinod

On Mar 1, 2015, at 11:58 AM, Arun Murthy a...@hortonworks.com wrote:

 Sounds good, thanks for the help Vinod!
 
 Arun
 
 
 From: Vinod Kumar Vavilapalli
 Sent: Sunday, March 01, 2015 11:43 AM
 To: Hadoop Common; Jason Lowe; Arun Murthy
 Subject: Re: 2.7 status
 
 Agreed. How about we roll an RC at the end of this week? As a Java 7+ release with 
 the features and patches that already got in?
 
 Here's a filter tracking blocker tickets - 
 https://issues.apache.org/jira/issues/?filter=12330598. Nine open now.
 
 +Arun
 Arun, I'd like to help get 2.7 out without further delay. Do you mind me 
 taking over release duties?
 
 Thanks,
 +Vinod
 
 From: Jason Lowe jl...@yahoo-inc.com.INVALID
 Sent: Friday, February 13, 2015 8:11 AM
 To: common-...@hadoop.apache.org
 Subject: Re: 2.7 status
 
 I'd like to see a 2.7 release sooner than later.  It has been almost 3 months 
 since Hadoop 2.6 was released, and there have already been 634 JIRAs 
 committed to 2.7.  That's a lot of changes waiting for an official release.
 https://issues.apache.org/jira/issues/?jql=project%20in%20%28hadoop%2Chdfs%2Cyarn%2Cmapreduce%29%20AND%20fixversion%3D2.7.0%20AND%20resolution%3DFixed
 Jason
 
  From: Sangjin Lee sj...@apache.org
 To: common-...@hadoop.apache.org common-...@hadoop.apache.org
 Sent: Tuesday, February 10, 2015 1:30 PM
 Subject: 2.7 status
 
 Folks,
 
 What is the current status of the 2.7 release? I know initially it started
 out as a java-7 only release, but looking at the JIRAs that is very much
 not the case.
 
 Do we have a certain timeframe for 2.7 or is it time to discuss it?
 
 Thanks,
 Sangjin
 



Re: Looking to a Hadoop 3 release

2015-03-05 Thread Jason Lowe
I'm OK with a 3.0.0 release as long as we are minimizing the pain of 
maintaining yet another release line and conscious of the incompatibilities 
going into that release line.
For the former, I would really rather not see a branch-3 cut so soon.  It's yet 
another line onto which to cherry-pick, and I don't see why we need to add this 
overhead at such an early phase.  We should only create branch-3 when there's 
an incompatible change that the community wants and it should _not_ go into the 
next major release (i.e.: it's for Hadoop 4.0).  We can develop 3.0 alphas and 
betas on trunk and release from trunk in the interim.  IMHO we need to stop 
treating trunk as a place to exile patches.

For the latter, I think as a community we need to evaluate the benefits of 
breaking compatibility against the costs of migrating.  Each time we break 
compatibility we create a hurdle for people to jump when they move to the new 
release, and we should make those hurdles worth their time.  For example, 
wire-compatibility has been mentioned as part of this.  Any feature that breaks 
wire compatibility better be absolutely amazing, as it creates a huge hurdle 
for people to jump.
To summarize:
+1 for a community-discussed roadmap of what we're breaking in Hadoop 3 and why it's worth it for users
-1 for creating branch-3 now; we can release from trunk until the next incompatibility for Hadoop 4 arrives
+1 for baking classpath isolation as opt-in on 2.x and eventually default-on in 3.0
Jason
  From: Andrew Wang andrew.w...@cloudera.com
 To: hdfs-...@hadoop.apache.org
Cc: common-...@hadoop.apache.org; mapreduce-dev@hadoop.apache.org; yarn-...@hadoop.apache.org
 Sent: Wednesday, March 4, 2015 12:15 PM
 Subject: Re: Looking to a Hadoop 3 release
   
Let's not dismiss this quite so handily.

Sean, Jason, and Stack replied on HADOOP-11656 pointing out that while we
could make classpath isolation opt-in via configuration, what we really
want longer term is to have it on by default (or just always on). Stack in
particular points out the practical difficulties in using an opt-in method
in 2.x from a downstream project perspective. It's not pretty.

The plan that both Sean and Jason propose (which I support) is to have an
opt-in solution in 2.x, bake it there, then turn it on by default
(incompatible) in a new major release. I think this lines up well with my
proposal of some alphas and betas leading up to a GA 3.x. I'm also willing
to help with 2.x release management if that would help with testing this
feature.

Even setting aside classpath isolation, a new major release is still
justified by JDK8. Somehow this is being ignored in the discussion. Allen,
historically the voice of the user in our community, just highlighted it as
a major compatibility issue, and myself and Tucu have also expressed our
very strong concerns about bumping this in a minor release. 2.7's bump is a
unique exception, but this is not something to be cited as precedent or
policy.

Where does this resistance to a new major release stem from? As I've
described from the beginning, this will look basically like a 2.x release,
except for the inclusion of classpath isolation by default and target
version JDK8. I've expressed my desire to maintain API and wire
compatibility, and we can audit the set of incompatible changes in trunk to
ensure this. My proposal for doing alpha and beta releases leading up to GA
also gives downstreams a nice amount of time for testing and validation.

Regards,
Andrew



On Tue, Mar 3, 2015 at 2:32 PM, Arun Murthy a...@hortonworks.com wrote:

 Awesome, looks like we can just do this in a compatible manner - nothing
 else on the list seems like it warrants a (premature) major release.

 Thanks Vinod.

 Arun

 
 From: Vinod Kumar Vavilapalli vino...@hortonworks.com
 Sent: Tuesday, March 03, 2015 2:30 PM
 To: common-...@hadoop.apache.org
 Cc: hdfs-...@hadoop.apache.org; mapreduce-dev@hadoop.apache.org;
 yarn-...@hadoop.apache.org
 Subject: Re: Looking to a Hadoop 3 release

 I started pitching in more on that JIRA.

 To add, I think we can and should strive for doing this in a compatible
 manner, whatever the approach. Marking and calling it incompatible before
 we see proposal/patch seems premature to me. Commented the same on JIRA:
 https://issues.apache.org/jira/browse/HADOOP-11656?focusedCommentId=14345875&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14345875
 .

 Thanks
 +Vinod

 On Mar 2, 2015, at 8:08 PM, Andrew Wang andrew.w...@cloudera.com wrote:

 Regarding classpath isolation, based on what I hear from our customers,
 it's still a big problem (even after the MR classloader work). The latest
 Jackson version bump was quite painful for our downstream projects, and the
 HDFS client still leaks a lot 

Re: Looking to a Hadoop 3 release

2015-03-05 Thread Steve Loughran


On 05/03/2015 13:05, Alejandro Abdelnur tuc...@gmail.com wrote:

IMO, if part of the community wants to take on the responsibility and work
that it takes to do a new major release, we should not discourage them from
doing that.

Having multiple major branches active is a standard practice.

Looking @ 2.x, the major work (HDFS HA, YARN) meant that it did take a long 
time to get out, and during that time 0.21 and 0.22 got released and ignored, 
while 0.23 was picked up and used in production.

The 2.0.4-alpha release was more of a trouble spot, as it got picked up widely 
enough to be used in products, and changes were made between that alpha & 2.2 
itself which raised compatibility issues.

For 3.x I'd propose


  1.  Have less longevity of 3.x alpha/beta artifacts
  2.  Make clear there are no guarantees of compatibility from alpha/beta 
releases to shipping. Best effort, but not to the extent that it gets in the 
way. More succinctly: we will care more about seamless migration from 2.2+ to 
3.x than from a 3.0-alpha to 3.3 production.
  3.  Anybody who ships code based on 3.x alpha/beta must recognise and accept 
policy (2): Hadoop's instability guarantee for the 3.x alpha/beta phase.

As well as backwards compatibility, we need to think about Forwards 
compatibility, with the goal being:

Any app written/shipped with the 3.x release binaries (JAR and native) will 
work against a 3.y Hadoop release, for all x, y in Natural where y >= x and 
is-release(x) and is-release(y)

That's important, as it means all server-side changes in 3.x which are expected 
to mandate client-side updates (protocols, HDFS erasure decoding, security 
features) must be considered complete and stable before we can say 
is-release(x). In an ideal world, we'll even get the semantics right with tests 
to show this.

Fixing classpath hell downstream is certainly one feature I am +1 on. But: it's 
only one of the features, and given there's not any design doc on that JIRA, it is 
way too immature to set a release schedule on. An alpha schedule with 
no-guarantees and a regular alpha roll could be viable, as new features go in 
and can then be used to experimentally try this stuff in branches of HBase 
(well volunteered, Stack!), etc. Of course instability guarantees will be 
transitive downstream.


This time around we are not replacing the guts as we did from Hadoop 1 to
Hadoop 2, but doing superficial surgery to address issues that were not considered (or
were too much to take on top of the guts transplant).

For the split brain concern, we did a great job of maintaining Hadoop 1 and
Hadoop 2 until Hadoop 1 faded away.

And a significant argument about 2.0.4-alpha to 2.2 protobuf/HDFS compatibility.


Based on that experience I would say that the coexistence of Hadoop 2 and
Hadoop 3 will be much less demanding/traumatic.

The re-layout of all the source trees was a major change there; assuming 
there's no refactoring or switch of build tools, picking things back will 
be tractable.


Also, to facilitate the coexistence we should limit Java language features
to Java 7 (even if the runtime is Java 8), once Java 7 is not used anymore
we can remove this limitation.

+1; setting javac.version will fix this
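As an illustration, this is a one-property change in a Maven build; a sketch only, assuming the `javac.version` property Steve mentions feeds the compiler plugin (the exact wiring in the Hadoop poms may differ):

```xml
<!-- Root pom: compile with -source/-target 1.7 even when building on JDK8,
     so branch-2 backports stay possible. Lift to 1.8 once JDK7 is dropped. -->
<properties>
  <javac.version>1.7</javac.version>
</properties>

<!-- The compiler plugin picks the language level up from that property. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-compiler-plugin</artifactId>
  <configuration>
    <source>${javac.version}</source>
    <target>${javac.version}</target>
  </configuration>
</plugin>
```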

What is nice about having java 8 as the base JVM is that it means you can be 
confident that all Hadoop 3 servers will be JDK8+, so downstream apps and libs 
can use all Java 8 features they want to.

There's one policy change to consider there, which is that possibly, just possibly, 
we could allow new modules in hadoop-tools to adopt Java 8 language features early, 
provided everyone recognised that a backport to branch-2 isn't going to happen.

-Steve


Re: Looking to a Hadoop 3 release

2015-03-05 Thread Steve Loughran
Sorry, Outlook dequoted Alejandro's comments.

Let me try again with his comments in italics, with proofreading of mine

On 05/03/2015 13:59, Steve Loughran ste...@hortonworks.com wrote:



On 05/03/2015 13:05, Alejandro Abdelnur tuc...@gmail.com wrote:

IMO, if part of the community wants to take on the responsibility and work
that it takes to do a new major release, we should not discourage them from
doing that.

Having multiple major branches active is a standard practice.

Looking @ 2.x, the major work (HDFS HA, YARN) meant that it did take a long 
time to get out, and during that time 0.21 and 0.22 got released and ignored, 
while 0.23 was picked up and used in production.

The 2.0.4-alpha release was more of a trouble spot, as it got picked up widely 
enough to be used in products, and changes were made between that alpha & 2.2 
itself which raised compatibility issues.

For 3.x I'd propose


  1.  Have less longevity of 3.x alpha/beta artifacts
  2.  Make clear there are no guarantees of compatibility from alpha/beta 
releases to shipping. Best effort, but not to the extent that it gets in the 
way. More succinctly: we will care more about seamless migration from 2.2+ to 
3.x than from a 3.0-alpha to 3.3 production.
  3.  Anybody who ships code based on 3.x alpha/beta must recognise and accept 
policy (2): Hadoop's instability guarantee for the 3.x alpha/beta phase.

As well as backwards compatibility, we need to think about Forwards 
compatibility, with the goal being:

Any app written/shipped with the 3.x release binaries (JAR and native) will 
work in and against a 3.y Hadoop cluster, for all x, y in Natural where y >= x 
and is-release(x) and is-release(y)

That's important, as it means all server-side changes in 3.x which are expected 
to mandate client-side updates (protocols, HDFS erasure decoding, security 
features) must be considered complete and stable before we can say 
is-release(x). In an ideal world, we'll even get the semantics right with tests 
to show this.

Fixing classpath hell downstream is certainly one feature I am +1 on. But: it's 
only one of the features, and given there's not any design doc on that JIRA, 
way too immature to set a release schedule on. An alpha schedule with 
no-guarantees and a regular alpha roll, could be viable, as new features go in 
and can then be used to experimentally try this stuff in branches of Hbase 
(well volunteered, Stack!), etc. Of course instability guarantees will be 
transitive downstream.


This time around we are not replacing the guts as we did from Hadoop 1 to
Hadoop 2, but doing superficial surgery to address issues that were not considered (or
were too much to take on top of the guts transplant).

For the split brain concern, we did a great job of maintaining Hadoop 1 and
Hadoop 2 until Hadoop 1 faded away.

And a significant argument about 2.0.4-alpha to 2.2 protobuf/HDFS compatibility.


Based on that experience I would say that the coexistence of Hadoop 2 and
Hadoop 3 will be much less demanding/traumatic.

The re-layout of all the source trees was a major change there; assuming 
there's no refactoring or switch of build tools, picking things back will 
be tractable.


Also, to facilitate the coexistence we should limit Java language features
to Java 7 (even if the runtime is Java 8), once Java 7 is not used anymore
we can remove this limitation.

+1; setting javac.version will fix this

What is nice about having java 8 as the base JVM is that it means you can be 
confident that all Hadoop 3 servers will be JDK8+, so downstream apps and libs 
can use all Java 8 features they want to.

There's one policy change to consider there, which is that possibly, just possibly, 
we could allow new modules in hadoop-tools to adopt Java 8 language features early, 
provided everyone recognised that a backport to branch-2 isn't going to happen.

-Steve



Re: Looking to a Hadoop 3 release

2015-03-05 Thread Siddharth Seth
I think it'll be useful to have a discussion about what else people would
like to see in Hadoop 3.x - especially if the change is potentially
incompatible. Also, what we expect the release schedule to be for major
releases and what triggers them - JVM version, major features, the need for
incompatible changes ? Assuming major versions will not be released every 6
months/1 year (adoption time, fairly disruptive for downstream projects,
and users) -  considering additional features/incompatible changes for 3.x
would be useful.

Some features that come to mind immediately would be
1) Enhancements to the RPC mechanics - specifically support for async RPC /
two-way communication. There's a lot of places where we re-use heartbeats
to send more information than would be possible if the RPC layer supported
these features. Some of this can be done in a manner compatible with the
existing RPC sub-system; others, like two-way communication, probably cannot.
After this, having HDFS/YARN actually make use of these changes. The other
consideration is adoption of an alternate system like gRPC, which would be
incompatible.
2) Simplification of configs - potentially separating client-side configs
and those used by daemons. This is another source of perpetual confusion
for users.
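The heartbeat-piggybacking pattern in point (1) can be sketched in miniature: without async/two-way RPC, server-to-worker commands must queue up and ride back on the next heartbeat response. A toy model only - the class and command names below are invented, not Hadoop APIs:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/**
 * Toy model of piggybacking server-to-worker commands on heartbeat
 * responses, the pattern used when the RPC layer cannot push messages.
 */
public class HeartbeatPiggyback {

    // Commands the server wants delivered; they can only leave when the
    // worker next heartbeats in, adding up to a heartbeat interval of latency.
    private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();

    public void enqueueCommand(String command) {
        pending.add(command);
    }

    /** Heartbeat handler: drain all queued commands into the response. */
    public List<String> heartbeat(String workerId) {
        List<String> response = new ArrayList<>();
        pending.drainTo(response);
        return response;
    }

    public static void main(String[] args) {
        HeartbeatPiggyback server = new HeartbeatPiggyback();
        server.enqueueCommand("KILL_CONTAINER_42");
        // Delivered only when the worker polls, not when the server decides.
        System.out.println(server.heartbeat("worker-1"));
    }
}
```

With true async or two-way RPC the server could push the command immediately instead of waiting out the heartbeat interval; that latency win is the motivation being discussed.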

Thanks
- Sid


On Thu, Mar 5, 2015 at 2:46 PM, Steve Loughran ste...@hortonworks.com
wrote:

 Sorry, Outlook dequoted Alejandro's comments.

 Let me try again with his comments in italics, with proofreading of mine

 On 05/03/2015 13:59, Steve Loughran ste...@hortonworks.com wrote:



 On 05/03/2015 13:05, Alejandro Abdelnur tuc...@gmail.com wrote:

 IMO, if part of the community wants to take on the responsibility and work
 that it takes to do a new major release, we should not discourage them from
 doing that.

 Having multiple major branches active is a standard practice.

 Looking @ 2.x, the major work (HDFS HA, YARN) meant that it did take a
 long time to get out, and during that time 0.21 and 0.22 got released and
 ignored, while 0.23 was picked up and used in production.

 The 2.0.4-alpha release was more of a trouble spot, as it got picked up
 widely enough to be used in products, and changes were made between that
 alpha & 2.2 itself which raised compatibility issues.

 For 3.x I'd propose


   1.  Have less longevity of 3.x alpha/beta artifacts
   2.  Make clear there are no guarantees of compatibility from alpha/beta
 releases to shipping. Best effort, but not to the extent that it gets in
 the way. More succinctly: we will care more about seamless migration from
 2.2+ to 3.x than from a 3.0-alpha to 3.3 production.
   3.  Anybody who ships code based on 3.x alpha/beta must recognise and
 accept policy (2): Hadoop's instability guarantee for the 3.x alpha/beta
 phase.

 As well as backwards compatibility, we need to think about Forwards
 compatibility, with the goal being:

 Any app written/shipped with the 3.x release binaries (JAR and native)
 will work in and against a 3.y Hadoop cluster, for all x, y in Natural
 where y >= x and is-release(x) and is-release(y)

 That's important, as it means all server-side changes in 3.x which are
 expected to mandate client-side updates (protocols, HDFS erasure
 decoding, security features) must be considered complete and stable before
 we can say is-release(x). In an ideal world, we'll even get the semantics
 right with tests to show this.

 Fixing classpath hell downstream is certainly one feature I am +1 on. But:
 it's only one of the features, and given there's not any design doc on that
 JIRA, way too immature to set a release schedule on. An alpha schedule with
 no-guarantees and a regular alpha roll, could be viable, as new features go
 in and can then be used to experimentally try this stuff in branches of
 Hbase (well volunteered, Stack!), etc. Of course instability guarantees
 will be transitive downstream.


 This time around we are not replacing the guts as we did from Hadoop 1 to
 Hadoop 2, but doing superficial surgery to address issues that were not considered (or
 were too much to take on top of the guts transplant).

 For the split brain concern, we did a great job of maintaining Hadoop 1 and
 Hadoop 2 until Hadoop 1 faded away.

 And a significant argument about 2.0.4-alpha to 2.2 protobuf/HDFS
 compatibility.


 Based on that experience I would say that the coexistence of Hadoop 2 and
 Hadoop 3 will be much less demanding/traumatic.

 The re-layout of all the source trees was a major change there; assuming
 there's no refactoring or switch of build tools, picking things back
 will be tractable.


 Also, to facilitate the coexistence we should limit Java language features
 to Java 7 (even if the runtime is Java 8), once Java 7 is not used anymore
 we can remove this limitation.

 +1; setting javac.version will fix this

 What is nice about having java 8 as the base JVM is that it means you can
 be confident that all Hadoop 3 servers will be 

Re: Reviving HADOOP-7435: Making Jenkins pre-commit build work with branches

2015-03-05 Thread Vinod Kumar Vavilapalli
Tx for the feedback! Let's continue on JIRA, but I'd definitely welcome as much 
help as is available.

Thanks,
+Vinod

On Mar 4, 2015, at 3:30 PM, Zhijie Shen zs...@hortonworks.com wrote:

 +1. It's really helpful for branch development. To continue Karthik's
 point, would it be good to make pre-commit testing against branch-2 the default
 too, like that against trunk?
 
 On 3/4/15, 1:47 PM, Sean Busbey bus...@cloudera.com wrote:
 
 +1
 
 If we can make things look like HBase support for precommit testing on
 branches (HBASE-12944), that would make it easier for new and occasional
 contributors who might end up working in other ecosystem projects. AFAICT,
 Jonathan's proposal for branch names in patch names does this.
 
 
 
 On Wed, Mar 4, 2015 at 3:41 PM, Karthik Kambatla ka...@cloudera.com
 wrote:
 
 Thanks for reviving this on email, Vinod. Newer folks like me might not
 be
 aware of this JIRA/effort.
 
 This would be wonderful to have so (1) we know the status of release
 branches (branch-2, etc.) and also (2) feature branches (YARN-2928).
 Jonathan's or Matt's proposal for including branch name looks
 reasonable to
 me.
 
 If no one has any objections, I think we can continue on JIRA and get this
 in.
 
 On Wed, Mar 4, 2015 at 1:20 PM, Vinod Kumar Vavilapalli 
 vino...@hortonworks.com wrote:
 
 Hi all,
 
 I'd like us to revive the effort at
 https://issues.apache.org/jira/browse/HADOOP-7435 to make precommit
 builds able to work with branches. Having Jenkins verify
 patches
 on branches is very useful even if there may be relaxed review
 oversight
 on
 the said-branch.
 
 Unless there are objections, I'd request help from Giri who already
 has a
 patch sitting there for more than a year. This may need us to
 collectively agree on some convention - the last comment says that the
 branch patch name should be in some format for this to work.
 
 Thanks,
 +Vinod
 
 
 
 
 --
 Karthik Kambatla
 Software Engineer, Cloudera Inc.
 
 http://five.sentenc.es
 
 
 
 
 -- 
 Sean
 



Re: Looking to a Hadoop 3 release

2015-03-05 Thread Alejandro Abdelnur
If classloader isolation is in place, then dependency versions can freely
be upgraded, as they won't pollute the apps' space (things get trickier if there is an
ON/OFF switch).
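To illustrate the isolation mechanism under discussion: a child-first (parent-last) classloader resolves classes from the application's own jars before delegating to the framework's loader, so the app's dependency versions win. A minimal sketch, illustrative only and not Hadoop's actual implementation:

```java
import java.net.URL;
import java.net.URLClassLoader;

/**
 * Child-first classloader sketch: try the application's own URLs before
 * delegating to the parent (framework) loader, inverting the usual
 * parent-first order so user dependency versions shadow the framework's.
 */
public class ChildFirstClassLoader extends URLClassLoader {

    public ChildFirstClassLoader(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve)
            throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                try {
                    // Look in the child (application) URLs first.
                    c = findClass(name);
                } catch (ClassNotFoundException e) {
                    // Not shipped by the app: fall back to the parent loader.
                    c = super.loadClass(name, false);
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    public static void main(String[] args) throws Exception {
        // With no child URLs, everything still resolves via the parent.
        ChildFirstClassLoader cl = new ChildFirstClassLoader(
                new URL[0], ChildFirstClassLoader.class.getClassLoader());
        System.out.println(cl.loadClass("java.lang.String").getName());
    }
}
```

A real implementation additionally keeps "system" classes (java.*, the framework's own public API) always on the parent, which is exactly where the ON/OFF switch and exclusion lists make things trickier.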

On Thu, Mar 5, 2015 at 9:21 PM, Allen Wittenauer a...@altiscale.com wrote:


  Is there going to be a general upgrade of dependencies?  I'm thinking of
  jetty & jackson in particular.

 On Mar 5, 2015, at 5:24 PM, Andrew Wang andrew.w...@cloudera.com wrote:

  I've taken the liberty of adding a Hadoop 3 section to the Roadmap wiki
  page. In addition to the two things I've been pushing, I also looked
  through Allen's list (thanks Allen for making this) and picked out the
  shell script rewrite and the removal of HFTP as big changes. This would
 be
  the place to propose features for inclusion in 3.x, I'd particularly
  appreciate help on the YARN/MR side.
 
  Based on what I'm hearing, let me modulate my proposal to the following:
 
  - We avoid cutting branch-3, and release off of trunk. The trunk-only
  changes don't look that scary, so I think this is fine. This does mean we
  need to be more rigorous before merging branches to trunk. I think
  Vinod/Giri's work on getting test-patch.sh runs on non-trunk branches
 would
  be very helpful in this regard.
  - We do not include anything to break wire compatibility unless (as Jason
  says) it's an unbelievably awesome feature.
  - No harm in rolling alphas from trunk, as it doesn't lock us to anything
  compatibility wise. Downstreams like releases.
 
  I'll take Steve's advice about not locking GA to a given date, but I also
  share his belief that we can alpha/beta/GA faster than it took for Hadoop
  2. Let's roll some intermediate releases, work on the roadmap items, and
  see how we're feeling in a few months.
 
  Best,
  Andrew
 
  On Thu, Mar 5, 2015 at 3:21 PM, Siddharth Seth ss...@apache.org wrote:
 
  I think it'll be useful to have a discussion about what else people
 would
  like to see in Hadoop 3.x - especially if the change is potentially
  incompatible. Also, what we expect the release schedule to be for major
  releases and what triggers them - JVM version, major features, the need
 for
  incompatible changes ? Assuming major versions will not be released
 every 6
  months/1 year (adoption time, fairly disruptive for downstream projects,
  and users) -  considering additional features/incompatible changes for
 3.x
  would be useful.
 
  Some features that come to mind immediately would be
  1) Enhancements to the RPC mechanics - specifically support for async RPC /
  two-way communication. There's a lot of places where we re-use heartbeats
  to send more information than would be possible if the RPC layer supported
  these features. Some of this can be done in a manner compatible with the
  existing RPC sub-system; others, like two-way communication, probably cannot.
  After this, having HDFS/YARN actually make use of these changes. The other
  consideration is adoption of an alternate system like gRPC, which would be
  incompatible.
  2) Simplification of configs - potentially separating client-side configs
  and those used by daemons. This is another source of perpetual confusion
  for users.
 
  Thanks
  - Sid
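
  To make point (1) concrete, here is a minimal sketch (all names
  hypothetical, not actual Hadoop or gRPC APIs) contrasting the current
  pattern - extra payload piggybacked on a periodic heartbeat response -
  with a direct asynchronous call that completes when the server replies:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

/**
 * Illustrative sketch only: contrasts piggybacking commands on a
 * heartbeat response with a genuinely asynchronous RPC call.
 */
public class RpcSketch {
    /** Heartbeat style: pending commands ride along on the response. */
    static final class HeartbeatResponse {
        final List<String> commands = new ArrayList<>();
    }

    static HeartbeatResponse heartbeat(List<String> pendingCommands) {
        HeartbeatResponse r = new HeartbeatResponse();
        r.commands.addAll(pendingCommands); // piggybacked payload
        pendingCommands.clear();            // drained until next heartbeat
        return r;
    }

    /** Async style: caller gets a future, no polling interval involved. */
    static CompletableFuture<String> asyncCall(String request) {
        return CompletableFuture.supplyAsync(() -> "ack:" + request);
    }

    public static void main(String[] args) throws Exception {
        List<String> pending = new ArrayList<>(List.of("LAUNCH_CONTAINER"));
        System.out.println("heartbeat carried: " + heartbeat(pending).commands);
        System.out.println("async reply: " + asyncCall("status").get());
    }
}
```

  The heartbeat variant only delivers as fast as the polling interval; the
  async variant delivers when the work is done, which is the latency win
  Sid is pointing at.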
 
 
  On Thu, Mar 5, 2015 at 2:46 PM, Steve Loughran ste...@hortonworks.com
  wrote:
 
  Sorry, outlook dequoted Alejandros's comments.
 
  Let me try again with his comments in italic and proofreading of mine
 
  On 05/03/2015 13:59, Steve Loughran ste...@hortonworks.com wrote:
 
 
 
  On 05/03/2015 13:05, Alejandro Abdelnur tuc...@gmail.com wrote:
 
  IMO, if part of the community wants to take on the responsibility and
  work
  that takes to do a new major release, we should not discourage them
 from
  doing that.
 
  Having multiple major branches active is a standard practice.
 
  Looking @ 2.x, the major work (HDFS HA, YARN) meant that it did take a
  long time to get out, and during that time 0.21 and 0.22 got released and
  ignored, while 0.23 was picked up and used in production.

  The 2.0.4-alpha release was more of a trouble spot, as it got picked up
  widely enough to be used in products, and changes were made between that
  alpha & 2.2 itself which raised compatibility issues.
 
  For 3.x I'd propose
 
 
   1.  Have less longevity of 3.x alpha/beta artifacts
   2.  Make clear there are no guarantees of compatibility from
 alpha/beta
  releases to shipping. Best effort, but not to the extent that it gets
 in
  the way. More succinctly: we will care more about seamless migration
 from
  2.2+ to 3.x than from a 3.0-alpha to 3.3 production.
   3.  Anybody who ships code based on 3.x alpha/beta must recognise and
  accept policy (2): Hadoop's instability guarantee for the 3.x alpha/beta
  phase
 
  As well as backwards compatibility, we need to think about forwards
  compatibility, with the goal being:

  Any app written/shipped with the 3.x release binaries (JAR and native)
  will work in and against a 3.y Hadoop cluster, for all x, y in Natural
  where y >= x and is-release(x) and is-release(y)

Re: Looking to a Hadoop 3 release

2015-03-05 Thread Yongjun Zhang
Thanks all.

There is an open issue, HDFS-6962 (ACLs inheritance conflicts with
umaskmode), whose incompatibility appears to make it unsuitable for 2.x;
it's targeted at 3.0. Please see:

https://issues.apache.org/jira/browse/HDFS-6962?focusedCommentId=14335418&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14335418

Best,

--Yongjun


On Wed, Mar 4, 2015 at 8:13 PM, Allen Wittenauer a...@altiscale.com wrote:


 One of the questions that keeps popping up is “what exactly is in trunk?”

 As some may recall, I had done some experiments creating the change log
 based upon JIRA.  While the interest level appeared to be approaching zero,
 I kept playing with it a bit and eventually also started playing with the
 release notes script (for various reasons I won’t bore you with.)

 In any case, I’ve started posting the results of these runs on one of my
 github repos if anyone was wanting a quick reference as to JIRA’s opinion
 on the matter:

 https://github.com/aw-altiscale/hadoop-release-metadata/tree/master/3.0.0





Re: Looking to a Hadoop 3 release

2015-03-05 Thread Chris Douglas
On Mon, Mar 2, 2015 at 11:04 PM, Konstantin Shvachko
shv.had...@gmail.com wrote:
 2. If Hadoop 3 and 2.x are meant to exist together, we run the risk of
 manifesting split-brain behavior again, as we had with hadoop-1, hadoop-2 and
 other versions. Even if that is somehow beneficial for commercial vendors,
 which I don't see how, for the community it proved to be very disruptive. It
 would be really good to avoid that this time.

Agreed; let's try to minimize backporting headaches. Pulling trunk >
branch-2 > branch-2.x is already tedious. Adding a branch-3,
branch-3.x would be obnoxious.

 3. Could we release Hadoop 3 directly from trunk, with a proper feature
 freeze in advance? Current trunk is in the best working condition I've seen
 in years - much better than when hadoop-2 was coming to life. It could
 make a good alpha.

+1 This sounds like a good approach. Marked as alpha, we can break
compatibility in minor versions. Stabilizing a beta can correspond
with cutting branch-3, since that will be winding down branch-2. This
shouldn't disrupt existing plans for branch-2.

However, this requires that committers not accumulate too much
compatibility debt in trunk. Undoing all that in branch-3 imposes a
burdensome tax. Scanning through Allen's diff: that doesn't appear to
be the case so far, but it recommends against developing features in
place on trunk. Just be considerate of users and developers who will
need to move from (and maintain) branch-2.

 I believe we can start planning 3.0 from trunk right after 2.7 is out.

If we're publishing a snapshot, we don't need too much planning. -C

 On Mon, Mar 2, 2015 at 3:19 PM, Andrew Wang andrew.w...@cloudera.com
 wrote:

 Hi devs,

 It's been a year and a half since 2.x went GA, and I think we're about due
 for a 3.x release.
 Notably, there are two incompatible changes I'd like to call out, that will
 have a tremendous positive impact for our users.

 First, classpath isolation being done at HADOOP-11656, which has been a
 long-standing request from many downstreams and Hadoop users.

 Second, bumping the source and target JDK version to JDK8 (related to
 HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two
 months from now). In the past, we've had issues with our dependencies
 discontinuing support for old JDKs, so this will future-proof us.

 Between the two, we'll also have quite an opportunity to clean up and
 upgrade our dependencies, another common user and developer request.

 I'd like to propose that we start rolling a monthly-ish series of
 3.0 alpha releases ASAP, with myself volunteering to take on the RM and
 other cat herding responsibilities. There are already quite a few changes
 slated for 3.0 besides the above (for instance the shell script rewrite) so
 there's already value in a 3.0 alpha, and the more time we give downstreams
 to integrate, the better.

 This opens up discussion about inclusion of other changes, but I'm hoping
 to freeze incompatible changes after maybe two alphas, do a beta (with no
 further incompat changes allowed), and then finally a 3.x GA. For those
 keeping track, that means a 3.x GA in about four months.

 I would also like to stress though that this is not intended to be a big
 bang release. For instance, it would be great if we could maintain wire
 compatibility between 2.x and 3.x, so rolling upgrades work. Keeping
 branch-2 and branch-3 similar also makes backports easier, since we're
 likely maintaining 2.x for a while yet.

 Please let me know any comments / concerns related to the above. If people
 are friendly to the idea, I'd like to cut a branch-3 and start working on
 the first alpha.

 Best,
 Andrew



Hadoop-Mapreduce-trunk-Java8 - Build # 123 - Still Failing

2015-03-05 Thread Apache Jenkins Server
See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/123/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 10611 lines...]
Tests run: 521, Failures: 2, Errors: 0, Skipped: 11

[INFO] 
[INFO] Reactor Summary:
[INFO] 
[INFO] hadoop-mapreduce-client ... SUCCESS [  2.031 s]
[INFO] hadoop-mapreduce-client-core .. SUCCESS [01:13 min]
[INFO] hadoop-mapreduce-client-common  SUCCESS [ 24.528 s]
[INFO] hadoop-mapreduce-client-shuffle ... SUCCESS [  4.009 s]
[INFO] hadoop-mapreduce-client-app ... SUCCESS [10:15 min]
[INFO] hadoop-mapreduce-client-hs  SUCCESS [06:06 min]
[INFO] hadoop-mapreduce-client-jobclient . FAILURE [  01:58 h]
[INFO] hadoop-mapreduce-client-hs-plugins  SKIPPED
[INFO] hadoop-mapreduce-client-nativetask  SKIPPED
[INFO] Apache Hadoop MapReduce Examples .. SKIPPED
[INFO] hadoop-mapreduce .. SKIPPED
[INFO] 
[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time: 02:16 h
[INFO] Finished at: 2015-03-05T15:24:38+00:00
[INFO] Final Memory: 34M/167M
[INFO] 
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-surefire-plugin:2.17:test (default-test) on 
project hadoop-mapreduce-client-jobclient: There was a timeout or other error 
in the fork - [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :hadoop-mapreduce-client-jobclient
Build step 'Execute shell' marked build as failure
[FINDBUGS] Skipping publisher since build result is FAILURE
Archiving artifacts
Sending artifact delta relative to Hadoop-Mapreduce-trunk-Java8 #28
Archived 1 artifacts
Archive block size is 32768
Received 0 blocks and 20322434 bytes
Compression is 0.0%
Took 5 min 39 sec
Recording test results
Updating YARN-3242
Updating HDFS-7434
Updating HDFS-7879
Updating HADOOP-11648
Updating MAPREDUCE-6267
Updating YARN-3249
Updating MAPREDUCE-6136
Updating HADOOP-11674
Updating HDFS-7746
Updating HDFS-7535
Updating YARN-3131
Updating YARN-3122
Updating YARN-3231
Updating HDFS-1522
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
2 tests failed.
FAILED:  
org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat.testSplitPlacementForCompressedFiles

Error Message:
expected:<2> but was:<1>

Stack Trace:
junit.framework.AssertionFailedError: expected:<2> but was:<1>
at junit.framework.Assert.fail(Assert.java:57)
at junit.framework.Assert.failNotEquals(Assert.java:329)
at junit.framework.Assert.assertEquals(Assert.java:78)
at junit.framework.Assert.assertEquals(Assert.java:234)
at junit.framework.Assert.assertEquals(Assert.java:241)
at junit.framework.TestCase.assertEquals(TestCase.java:409)
at 
org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat.testSplitPlacementForCompressedFiles(TestCombineFileInputFormat.java:911)


FAILED:  
org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat.testSplitPlacement

Error Message:
expected:<2> but was:<1>

Stack Trace:
junit.framework.AssertionFailedError: expected:<2> but was:<1>
at junit.framework.Assert.fail(Assert.java:57)
at junit.framework.Assert.failNotEquals(Assert.java:329)
at junit.framework.Assert.assertEquals(Assert.java:78)
at junit.framework.Assert.assertEquals(Assert.java:234)
at junit.framework.Assert.assertEquals(Assert.java:241)
at junit.framework.TestCase.assertEquals(TestCase.java:409)
at 
org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat.testSplitPlacement(TestCombineFileInputFormat.java:368)




Hadoop-Mapreduce-trunk - Build # 2073 - Failure

2015-03-05 Thread Apache Jenkins Server
See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2073/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 31198 lines...]
Tests run: 520, Failures: 0, Errors: 1, Skipped: 11

[INFO] 
[INFO] Reactor Summary:
[INFO] 
[INFO] hadoop-mapreduce-client ... SUCCESS [  2.659 s]
[INFO] hadoop-mapreduce-client-core .. SUCCESS [01:32 min]
[INFO] hadoop-mapreduce-client-common  SUCCESS [ 27.780 s]
[INFO] hadoop-mapreduce-client-shuffle ... SUCCESS [  4.589 s]
[INFO] hadoop-mapreduce-client-app ... SUCCESS [11:21 min]
[INFO] hadoop-mapreduce-client-hs  SUCCESS [05:37 min]
[INFO] hadoop-mapreduce-client-jobclient . FAILURE [  01:58 h]
[INFO] hadoop-mapreduce-client-hs-plugins  SKIPPED
[INFO] hadoop-mapreduce-client-nativetask  SKIPPED
[INFO] Apache Hadoop MapReduce Examples .. SKIPPED
[INFO] hadoop-mapreduce .. SKIPPED
[INFO] 
[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time: 02:17 h
[INFO] Finished at: 2015-03-05T15:44:04+00:00
[INFO] Final Memory: 34M/760M
[INFO] 
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-surefire-plugin:2.17:test (default-test) on 
project hadoop-mapreduce-client-jobclient: There was a timeout or other error 
in the fork - [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :hadoop-mapreduce-client-jobclient
Build step 'Execute shell' marked build as failure
[FINDBUGS] Skipping publisher since build result is FAILURE
Archiving artifacts
Sending artifact delta relative to Hadoop-Mapreduce-trunk #2072
Archived 1 artifacts
Archive block size is 32768
Received 0 blocks and 20312860 bytes
Compression is 0.0%
Took 5 min 11 sec
Recording test results
Updating YARN-3242
Updating HDFS-7434
Updating HDFS-7879
Updating HADOOP-11648
Updating MAPREDUCE-6267
Updating YARN-3249
Updating MAPREDUCE-6136
Updating HADOOP-11674
Updating HDFS-7746
Updating HDFS-7535
Updating YARN-3131
Updating YARN-3122
Updating YARN-3231
Updating HDFS-1522
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
1 tests failed.
REGRESSION:  
org.apache.hadoop.mapred.TestMRTimelineEventHandling.testMapreduceJobTimelineServiceEnabled

Error Message:
Job didn't finish in 30 seconds

Stack Trace:
java.io.IOException: Job didn't finish in 30 seconds
at 
org.apache.hadoop.mapred.UtilsForTests.runJobSucceed(UtilsForTests.java:622)
at 
org.apache.hadoop.mapred.TestMRTimelineEventHandling.testMapreduceJobTimelineServiceEnabled(TestMRTimelineEventHandling.java:155)




Re: Looking to a Hadoop 3 release

2015-03-05 Thread Andrew Wang
I've taken the liberty of adding a Hadoop 3 section to the Roadmap wiki
page. In addition to the two things I've been pushing, I also looked
through Allen's list (thanks Allen for making this) and picked out the
shell script rewrite and the removal of HFTP as big changes. This would be
the place to propose features for inclusion in 3.x, I'd particularly
appreciate help on the YARN/MR side.

Based on what I'm hearing, let me modulate my proposal to the following:

- We avoid cutting branch-3, and release off of trunk. The trunk-only
changes don't look that scary, so I think this is fine. This does mean we
need to be more rigorous before merging branches to trunk. I think
Vinod/Giri's work on getting test-patch.sh runs on non-trunk branches would
be very helpful in this regard.
- We do not include anything to break wire compatibility unless (as Jason
says) it's an unbelievably awesome feature.
- No harm in rolling alphas from trunk, as it doesn't lock us to anything
compatibility wise. Downstreams like releases.

I'll take Steve's advice about not locking GA to a given date, but I also
share his belief that we can alpha/beta/GA faster than it took for Hadoop
2. Let's roll some intermediate releases, work on the roadmap items, and
see how we're feeling in a few months.

Best,
Andrew

On Thu, Mar 5, 2015 at 3:21 PM, Siddharth Seth ss...@apache.org wrote:

 I think it'll be useful to have a discussion about what else people would
 like to see in Hadoop 3.x - especially if the change is potentially
 incompatible. Also, what we expect the release schedule to be for major
 releases and what triggers them - JVM version, major features, the need for
 incompatible changes ? Assuming major versions will not be released every 6
 months/1 year (adoption time, fairly disruptive for downstream projects,
 and users) -  considering additional features/incompatible changes for 3.x
 would be useful.

 Some features that come to mind immediately would be
 1) enhancements to the RPC mechanics - specifically support for async RPC /
 two-way communication. There are a lot of places where we re-use heartbeats
 to send more information than we would if the RPC layer supported these
 features. Some of this can be done in a compatible manner to the existing
 RPC sub-system. Others, like two-way communication, probably cannot.
 After this, having HDFS/YARN actually make use of these changes. The other
 consideration is adoption of an alternate system like gRPC, which would be
 incompatible.
 2) Simplification of configs - potentially separating client side configs
 and those used by daemons. This is another source of perpetual confusion
 for users.

 Thanks
 - Sid


 On Thu, Mar 5, 2015 at 2:46 PM, Steve Loughran ste...@hortonworks.com
 wrote:

  Sorry, outlook dequoted Alejandros's comments.
 
  Let me try again with his comments in italic and proofreading of mine
 
  On 05/03/2015 13:59, Steve Loughran ste...@hortonworks.com wrote:
 
 
 
  On 05/03/2015 13:05, Alejandro Abdelnur tuc...@gmail.com wrote:
 
  IMO, if part of the community wants to take on the responsibility and
 work
  that takes to do a new major release, we should not discourage them from
  doing that.
 
  Having multiple major branches active is a standard practice.
 
  Looking @ 2.x, the major work (HDFS HA, YARN) meant that it did take a
  long time to get out, and during that time 0.21 and 0.22 got released and
  ignored, while 0.23 was picked up and used in production.

  The 2.0.4-alpha release was more of a trouble spot, as it got picked up
  widely enough to be used in products, and changes were made between that
  alpha & 2.2 itself which raised compatibility issues.
 
  For 3.x I'd propose
 
 
1.  Have less longevity of 3.x alpha/beta artifacts
2.  Make clear there are no guarantees of compatibility from alpha/beta
  releases to shipping. Best effort, but not to the extent that it gets in
  the way. More succinctly: we will care more about seamless migration from
  2.2+ to 3.x than from a 3.0-alpha to 3.3 production.
   3.  Anybody who ships code based on 3.x alpha/beta must recognise and
  accept policy (2): Hadoop's instability guarantee for the 3.x alpha/beta
  phase
 
  As well as backwards compatibility, we need to think about forwards
  compatibility, with the goal being:
 
  Any app written/shipped with the 3.x release binaries (JAR and native)
  will work in and against a 3.y Hadoop cluster, for all x, y in Natural
  where y >= x and is-release(x) and is-release(y)
 
  That's important, as it means all server-side changes in 3.x which are
  expected to mandate client-side updates: protocols, HDFS erasure
  decoding, security features, must be considered complete and stable
 before
  we can say is-release(x). In an ideal world, we'll even get the semantics
  right with tests to show this.
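
  As an illustration of what such a test might assert (a hypothetical
  sketch, not a real Hadoop test; the predicate simply encodes the stated
  goal), the guarantee above could be written down as:

```java
/**
 * Illustrative encoding of the forward-compatibility goal: a client built
 * against release 3.x must work against a cluster at release 3.y whenever
 * y >= x and both x and y are GA releases (not alphas/betas).
 */
public class ForwardCompatGoal {
    static boolean mustWork(int x, boolean isReleaseX, int y, boolean isReleaseY) {
        return isReleaseX && isReleaseY && y >= x;
    }

    public static void main(String[] args) {
        // A 3.0 GA client against a 3.2 GA cluster: covered by the guarantee.
        System.out.println(mustWork(0, true, 2, true));
        // A 3.0-alpha client (not a release): no guarantee.
        System.out.println(mustWork(0, false, 2, true));
    }
}
```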
 
  Fixing classpath hell downstream is certainly one feature I am +1 on.
 But:
  it's 

Re: Looking to a Hadoop 3 release

2015-03-05 Thread Allen Wittenauer

Is there going to be a general upgrade of dependencies?  I'm thinking of jetty
& jackson in particular.

On Mar 5, 2015, at 5:24 PM, Andrew Wang andrew.w...@cloudera.com wrote:

 I've taken the liberty of adding a Hadoop 3 section to the Roadmap wiki
 page. In addition to the two things I've been pushing, I also looked
 through Allen's list (thanks Allen for making this) and picked out the
 shell script rewrite and the removal of HFTP as big changes. This would be
 the place to propose features for inclusion in 3.x, I'd particularly
 appreciate help on the YARN/MR side.
 
 Based on what I'm hearing, let me modulate my proposal to the following:
 
 - We avoid cutting branch-3, and release off of trunk. The trunk-only
 changes don't look that scary, so I think this is fine. This does mean we
 need to be more rigorous before merging branches to trunk. I think
 Vinod/Giri's work on getting test-patch.sh runs on non-trunk branches would
 be very helpful in this regard.
 - We do not include anything to break wire compatibility unless (as Jason
 says) it's an unbelievably awesome feature.
 - No harm in rolling alphas from trunk, as it doesn't lock us to anything
 compatibility wise. Downstreams like releases.
 
 I'll take Steve's advice about not locking GA to a given date, but I also
 share his belief that we can alpha/beta/GA faster than it took for Hadoop
 2. Let's roll some intermediate releases, work on the roadmap items, and
 see how we're feeling in a few months.
 
 Best,
 Andrew
 
 On Thu, Mar 5, 2015 at 3:21 PM, Siddharth Seth ss...@apache.org wrote:
 
 I think it'll be useful to have a discussion about what else people would
 like to see in Hadoop 3.x - especially if the change is potentially
 incompatible. Also, what we expect the release schedule to be for major
 releases and what triggers them - JVM version, major features, the need for
 incompatible changes ? Assuming major versions will not be released every 6
 months/1 year (adoption time, fairly disruptive for downstream projects,
 and users) -  considering additional features/incompatible changes for 3.x
 would be useful.
 
 Some features that come to mind immediately would be
 1) enhancements to the RPC mechanics - specifically support for async RPC /
 two-way communication. There are a lot of places where we re-use heartbeats
 to send more information than we would if the RPC layer supported these
 features. Some of this can be done in a compatible manner to the existing
 RPC sub-system. Others, like two-way communication, probably cannot.
 After this, having HDFS/YARN actually make use of these changes. The other
 consideration is adoption of an alternate system like gRPC, which would be
 incompatible.
 2) Simplification of configs - potentially separating client side configs
 and those used by daemons. This is another source of perpetual confusion
 for users.
 
 Thanks
 - Sid
 
 
 On Thu, Mar 5, 2015 at 2:46 PM, Steve Loughran ste...@hortonworks.com
 wrote:
 
 Sorry, outlook dequoted Alejandros's comments.
 
 Let me try again with his comments in italic and proofreading of mine
 
 On 05/03/2015 13:59, Steve Loughran ste...@hortonworks.com wrote:
 
 
 
 On 05/03/2015 13:05, Alejandro Abdelnur tuc...@gmail.com wrote:
 
 IMO, if part of the community wants to take on the responsibility and
 work
 that takes to do a new major release, we should not discourage them from
 doing that.
 
 Having multiple major branches active is a standard practice.
 
 Looking @ 2.x, the major work (HDFS HA, YARN) meant that it did take a
 long time to get out, and during that time 0.21 and 0.22 got released and
 ignored, while 0.23 was picked up and used in production.

 The 2.0.4-alpha release was more of a trouble spot, as it got picked up
 widely enough to be used in products, and changes were made between that
 alpha & 2.2 itself which raised compatibility issues.
 
 For 3.x I'd propose
 
 
  1.  Have less longevity of 3.x alpha/beta artifacts
  2.  Make clear there are no guarantees of compatibility from alpha/beta
 releases to shipping. Best effort, but not to the extent that it gets in
 the way. More succinctly: we will care more about seamless migration from
 2.2+ to 3.x than from a 3.0-alpha to 3.3 production.
  3.  Anybody who ships code based on 3.x alpha/beta must recognise and
 accept policy (2): Hadoop's instability guarantee for the 3.x alpha/beta
 phase
 
 As well as backwards compatibility, we need to think about forwards
 compatibility, with the goal being:
 
 Any app written/shipped with the 3.x release binaries (JAR and native)
 will work in and against a 3.y Hadoop cluster, for all x, y in Natural
 where y >= x and is-release(x) and is-release(y)
 
 That's important, as it means all server-side changes in 3.x which are
 expected to mandate client-side updates: protocols, HDFS erasure
 decoding, security features, must be considered complete and stable
 before
 we can say