subject:"Looking to a Hadoop 3 release"

Re: Looking to a Hadoop 3 release

2016-06-27 Thread Andrew Wang

A heads up that I think we're getting close on the blockers for the first
alpha. Looking at my list, I see two I'd like to get in still: YARN-5270
and HADOOP-13316. Will cut a branch and roll the release once those go in;
my test builds have looked good thus far.

My original plan was to do alphas and then a beta in Aug/Sep, but the
create-release and L&N changes delayed us by a few months, which also
pushes out the beta timeframe. Given that Nov/Dec is often a quiet period
of development, I think a realistic new beta date is sometime early next
year (Jan/Feb). FYI.

Thanks,
Andrew

Re: Looking to a Hadoop 3 release

2016-05-12 Thread Karthik Kambatla

I am with Vinod on avoiding merging mostly_complete_branches to trunk since
we are not shipping any release off it. If 3.x releases going off of trunk
is going to help with this, I am fine with that approach. We should still
make sure to keep trunk-incompat small and not include large features.

Re: Looking to a Hadoop 3 release

2016-04-23 Thread Chris Douglas

If we're not starting branch-3/trunk, what would distinguish it from
trunk/trunk-incompat? Is it the same mechanism with different labels?

That may be a reasonable strategy when we create branch-3, as a
release branch for beta. Releasing 3.x from trunk will help us figure
out which incompatibilities can be called out in an upgrade guide
(e.g., "new feature X is incompatible with uncommon configuration Y")
and which require code changes (e.g., "data loss upgrading a cluster
with feature X"). Given how long trunk has been unreleased, we need
more data from deployments to triage. How to manage transitions
between major versions will always be case-by-case; consensus on how
we'll address generic incompatible changes is not saving any work.

Once created, removing functionality from branch-3 (leaving it in
trunk) _because_ nobody volunteers cycles to address urgent
compatibility issues is fair. It's also more workable than asking that
features be committed to a branch that we have no plan to release,
even as alpha. -C

Re: Looking to a Hadoop 3 release

2016-04-22 Thread Vinod Kumar Vavilapalli

Tx for your replies, Andrew.

>> For exit criteria, how about we time box it? My plan was to do monthly
> alphas through the summer, leading up to beta in late August / early Sep.
> At that point we freeze and stabilize for GA in Nov/Dec.


Time-boxing is a reasonable exit-criterion.


> In this case, does trunk-incompat essentially become the new trunk? Or are
> we treating trunk-incompat as a feature branch, which periodically merges
> changes from trunk?


It’s the latter. Essentially:
 - trunk-incompat = trunk + only incompatible changes, periodically kept
   up to date with trunk
 - trunk is always ready to ship
 - and no compatible code gets left behind

The reason for this proposal is to address the tension between “there
is a lot of compatible code in trunk that we are not shipping” and “don’t ship
trunk, it has incompatibilities”. With this, no compatible code will be left
unshipped to users.

Obviously, we can forget about my proposal completely if everyone puts
all compatible code into branch-2 / branch-3 or whatever the main releasable
branch is. That hasn’t worked in practice; we saw it fail prominently during
0.21, and now again with 3.x.

There is another related issue: “my feature is nearly ready, so I’ll just
merge it into trunk, since we don’t release that anyway, but not into the
current releasable branch; I’m too lazy to fix the last few stability-related
issues”. With this, we will (or should) become more disciplined, take feature
stability on a branch seriously, and merge a feature branch only when it is
truly ready!

> For 3.x, my strawman was to release off trunk for the alphas, then branch a
> branch-3 for the beta and onwards.


Repeating the above: I’m proposing that we continue making GA 3.x releases
off of trunk too! This way, only incompatible changes are withheld from
users, by design. Eventually, trunk-incompat will be the latest 3.x GA plus
enough incompatible code to warrant a 4.x, 5.x, etc.

+Vinod

Re: Looking to a Hadoop 3 release

2016-04-22 Thread Allen Wittenauer


> On Apr 22, 2016, at 6:10 PM, Vinod Kumar Vavilapalli  
> wrote:
> 
> Nope.
> 
> I’m proposing making a new 3.x release (as has been discussed in this thread) 
> off today’s trunk (instead of creating a fresh branch-3) and create a new 
> trunk-incompat where incompatible changes that we don’t want in 3.x go.
> 
> This is mainly to avoid repeating the “we are not releasing 3.x off trunk” 
> issue when we start thinking about 4.x or any such major release in the 
> future.

The only difference between “we aren’t releasing 4.x off of trunk” and 
“we aren’t releasing 4.x off of trunk-incompat” is 10 characters.

Re: Looking to a Hadoop 3 release

2016-04-22 Thread Andrew Wang

Great comments Vinod, thanks for replying.

Since trunk is a superset of branch-2.8, I think the two efforts are mostly
aligned. The 2.8 blockers are likely also 3.0 blockers. For example, the
create-release and L&N JIRAs I mentioned are in this camp. The difference
between the two is the expectation as to the level of quality. Once we get
create-release and L&N settled, I think it's ready for an alpha. Yes, this
means we ship with some known issues, but right now there's no 3.0 artifact
for downstreams to compile and test against. Considering that we're
shipping incompatible changes, I want to give downstreams as much
opportunity to give feedback as possible.

> While welcoming the push for alphas, I think we should set some exit
> criteria. Otherwise, I can imagine us doing 3/4/5 alpha releases, and then
> getting restless about calling it beta or GA or whatever. Essentially,
> instead of today’s questions as to “why we aren’t doing a 3.x release”,
> we’d be fielding a “why is 3.x still considered alpha” question. This
> happened with 2.x alpha releases too and it wasn’t fun.

For exit criteria, how about we time box it? My plan was to do monthly
alphas through the summer, leading up to beta in late August / early Sep.
At that point we freeze and stabilize for GA in Nov/Dec.

I think we all have an interest in declaring beta/GA; no one wants eternal
alpha releases.

> On an unrelated note, offline I was pitching to a bunch of contributors
> another idea to deal with rotting trunk post 3.x: *Make 3.x releases off of
> trunk directly*.
>
> What this gains us is that
>  - Trunk is always nearly stable or nearly ready for releases
>  - We no longer have some code lying around in some branch (today’s trunk)
> that is not releasable because it gets mixed with other undesirable and
> incompatible changes.
>  - This needs to be coupled with more discipline on individual features -
> medium to large features are always worked upon in branches and get
> merged into trunk (and a nearing release!) when they are ready
>  - All incompatible changes go into some sort of a trunk-incompat branch
> and stay there till we accumulate enough of those to warrant another major
> release.
>

In this case, does trunk-incompat essentially become the new trunk? Or are
we treating trunk-incompat as a feature branch, which periodically merges
changes from trunk?

Linux has a "next" branch, separate from master, for integrating pending
feature branches. I think this is a good model, and it would be even better
if we published artifacts to assist with testing. However, that depends on
someone stepping up to be the maintainer of the integration branch.

I really like a more stringent policy around branch merges and new feature
development. That'd be great.

For 3.x, my strawman was to release off trunk for the alphas, then branch a
branch-3 for the beta and onwards.

Best,
Andrew

Re: Looking to a Hadoop 3 release

2016-04-22 Thread Vinod Kumar Vavilapalli

Nope.

I’m proposing making a new 3.x release (as has been discussed in this thread)
off today’s trunk (instead of creating a fresh branch-3) and creating a new
trunk-incompat branch where the incompatible changes that we don’t want in
3.x go.

This is mainly to avoid repeating the “we are not releasing 3.x off trunk” 
issue when we start thinking about 4.x or any such major release in the future.

We’ll do 2.8.x independently and later figure out if 2.9 is needed or not.

+Vinod

Re: Looking to a Hadoop 3 release

2016-04-22 Thread Allen Wittenauer


> On Apr 22, 2016, at 5:38 PM, Vinod Kumar Vavilapalli  
> wrote:
> 
> On an unrelated note, offline I was pitching to a bunch of contributors 
> another idea to deal with rotting trunk post 3.x: *Make 3.x releases off of 
> trunk directly*.
> 
> What this gains us is that
> - Trunk is always nearly stable or nearly ready for releases
> - We no longer have some code lying around in some branch (today’s trunk) 
> that is not releasable because it gets mixed with other undesirable and 
> incompatible changes.
> - This needs to be coupled with more discipline on individual features - 
> medium to large features are always worked upon in branches and get merged 
> into trunk (and a nearing release!) when they are ready
> - All incompatible changes go into some sort of a trunk-incompat branch and 
> stay there till we accumulate enough of those to warrant another major 
> release.
> 
> Thoughts?

Unless I’m missing something, all this proposal does is (using today’s
branch names) effectively rename trunk to trunk-incompat and branch-2 to trunk.
I’m unclear how moving “rotting trunk” to “rotting trunk-incompat” is really
progress.

Re: Looking to a Hadoop 3 release

2016-04-22 Thread Vinod Kumar Vavilapalli

>>>>>> issue tracking and patch committing, not to mention the tremendous
>>>>>> effort of release verification and voting.
>>>>>> I would like to propose that we wait for the 2.8 release to become stable
>>>>>> (maybe the 2nd release in the 2.8 branch, because the first release is
>>>>>> alpha due to discussion in another email thread), then we can move to 3.0
>>>>>> as the only alpha release. In the meantime, we can bring more significant
>>>>>> features (like ATS v2, etc.) to trunk and consolidate stable releases in
>>>>>> 2.6.x and 2.7.x. I believe that makes life easier. :)
>>>>>> Thoughts?
>>>>>> 
>>>>> 
>>>>> 2.8.0 is relatively close to shipping. I say relatively as I'm doing
>>>>> some work with ATS 1.5 downstream and I'd like to make sure all that
>>>>> works. There's also a large collection of S3 and swift patches needing
>>>>> attention from any reviewers with time and credentials.
>>>>> 
>>>>> 3.x is going to take multiple iterations to stabilise, and with more
>>>>> changes, more significant a rollout. I'd also like to do a complete
>>>>> update of all the dependencies before a final release, so we can have
>>>>> less pressure to upgrade for a while, and get Sean's classloader patch
>>>>> in so it's slightly less visible.
>>>>> 
>>>>> That means 3.0 is going to be an alpha release, not final.
>>>>> 
>>>>> One thing that could be shared is any build.xml automation of the
>>>>> release process, to at least take away most of the manual steps in the
>>>>> process, to have something more repeatable.
>>>>> 
>>>>> -steve
>>>>> 
>>>>> 
>>>>>> Thanks,
>>>>>> 
>>>>>> Junping
>>>>>> 
>>>>>> From: Yongjun Zhang 
>>>>>> Sent: Friday, February 19, 2016 8:05 PM
>>>>>> To: hdfs-dev@hadoop.apache.org
>>>>>> Cc: common-...@hadoop.apache.org; mapreduce-...@hadoop.apache.org;
>>>>>> yarn-...@hadoop.apache.org
>>>>>> Subject: Re: Looking to a Hadoop 3 release
>>>>>> 
>>>>>> Thanks Andrew for initiating the effort!
>>>>>> 
>>>>>> +1 on pushing 3.x with extended alpha cycle, and continuing the more
>>>>>> stable 2.x releases.
>>>>>> 
>>>>>> --Yongjun
>>>>>> 
>>>>>> On Thu, Feb 18, 2016 at 5:58 PM, Andrew Wang <andrew.w...@cloudera.com>
>>>>>> wrote:
>>>>>> 
>>>>>>> Hi Kai,
>>>>>>> 
>>>>>>> Sure, I'm open to it. It's a new major release, so we're allowed to
>>>>>>> make these kinds of big changes. The idea behind the extended alpha
>>>>>>> cycle is that downstreams can give us feedback. This way if we do
>>>>>>> anything too radical, we can address it in the next alpha and have
>>>>>>> downstreams re-test.
>>>>>>> 
>>>>>>> Best,
>>>>>>> Andrew
>>>>>>> 
>>>>>>> On Thu, Feb 18, 2016 at 5:23 PM, Zheng, Kai
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Thanks Andrew for driving this. I wonder if it's a good chance for
>>>>>>>> HADOOP-12579 (Deprecate and remove WriteableRPCEngine) to be in. Note
>>>>>>>> it's not an incompatible change, but it feels better to do it in the
>>>>>>>> major release.
>>>>>>>> 
>>>>>>>> Regards,
>>>>>>>> Kai
>>>>>>>> 
>>>>>>>> -Original Message-
>>>>>>>> From: Andrew Wang [mailto:andrew.w...@cloudera.com]
>>>>>>>> Sent: Friday, February 19, 2016 7:04 AM
>>>>>>>> To: hdfs-dev@hadoop.apache.org; Kihwal Lee 
>>>>>>>> Cc: mapreduce-...@hadoop.apache.org; common-...@hadoop.apache.org;
>>>>>>>> yarn-...@hadoop.apache.org
>>>>>>>> Subject: Re: Looking to a Hadoop 3 release
>>>>>>>> 

Re: Looking to a Hadoop 3 release

2016-04-22 Thread Vinod Kumar Vavilapalli

I kind of echo Junping’s comment too.

While 2.8 and 3.0 don’t need to be serialized in theory, in practice I’m
desperately looking for help on 2.8.0. We haven’t been converging on 2.8.0,
what with 50+ blocker / critical patches still unfinished. If postponing the
3.x alpha until after a 2.8.0 alpha means undivided attention from the
community, I’d strongly root for such a proposal.

Thanks
+Vinod

> On Feb 20, 2016, at 9:07 PM, Andrew Wang  wrote:
> 
> Hi Junping, thanks for the mail, inline:
> 
> On Sat, Feb 20, 2016 at 7:34 AM, Junping Du  wrote:
> 
>> Shall we consolidate effort for 2.8.0 and 3.0.0? It doesn't sound
>> reasonable to have two alpha releases to go in parallel. Is EC feature the
>> main motivation of releasing hadoop 3 here? If so, I don't understand why
>> this feature cannot land on 2.8.x or 2.9.x as an alpha feature.
>> 
> 
> EC is one motivation, there are others too (JDK8, shell scripts, jar
> bumps). I'm open to EC going into branch-2, but I haven't seen any
> backporting yet and it's a lot of code.
> 
> 
>> If we release 3.0 in a month like the plan proposed below, it means we will
>> have 4 active releases going in parallel - two alpha releases (2.8 and 3.0)
>> and two stable releases (2.6.x and 2.7.x). It brings a lot of challenges in
>> issue tracking and patch committing, not to mention the tremendous
>> effort of release verification and voting.
>> I would like to propose that we wait for the 2.8 release to become stable
>> (maybe the 2nd release in the 2.8 branch, because the first release is alpha
>> due to discussion in another email thread), then we can move to 3.0 as the
>> only alpha release. In the meantime, we can bring more significant features
>> (like ATS v2, etc.) to trunk and consolidate stable releases in 2.6.x and
>> 2.7.x. I believe that makes life easier. :)
>> Thoughts?
>> 
> Based on some earlier mails in this chain, I was planning to release off
> trunk. This way we avoid having to commit to yet-another-branch, and it
> makes tracking easier since trunk will always be a superset of the
> branch-2s. This does mean, though, that trunk needs to be stable, and we
> need to be more judicious with branch merges and quickly revert broken code.
> 
> Regarding RM/voting/validation efforts, Steve mentioned some scripts that
> he uses to automate Slider releases. This is something I'd like to bring
> over to Hadoop. Ideally, publishing an RC is push-button, and it comes with
> automated validation. I think this will help with the overhead. Also, since
> these will be early alphas, and there will be a lot of them, I'm not
> expecting anyone to do endurance runs on a large cluster before casting a
> +1.
> 
> Best,
> Andrew

Re: Looking to a Hadoop 3 release

2016-04-21 Thread Andrew Wang

Hi folks,

Very optimistically, we're still on track for a 3.0 alpha this month.
Here's a JIRA query for 3.0 and 2.8:

https://issues.apache.org/jira/issues/?jql=project%20in%20(HADOOP%2C%20HDFS%2C%20MAPREDUCE%2C%20YARN)%20AND%20%22Target%20Version%2Fs%22%20in%20(3.0.0%2C%202.8.0)%20AND%20statusCategory%20not%20in%20(Complete)%20ORDER%20BY%20priority
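
The percent-encoded `jql` parameter in that link can be decoded with Python's standard library to see the query in readable form. This is a minimal sketch that assumes the URL above is copied verbatim; `decode_jql` is a hypothetical helper name, and only `urllib.parse` is used.

```python
from urllib.parse import parse_qs, urlsplit

# The JIRA search link from the message, split across lines but otherwise verbatim.
URL = (
    "https://issues.apache.org/jira/issues/?jql=project%20in%20"
    "(HADOOP%2C%20HDFS%2C%20MAPREDUCE%2C%20YARN)%20AND%20"
    "%22Target%20Version%2Fs%22%20in%20(3.0.0%2C%202.8.0)%20AND%20"
    "statusCategory%20not%20in%20(Complete)%20ORDER%20BY%20priority"
)


def decode_jql(url: str) -> str:
    """Return the decoded value of the `jql` query parameter (hypothetical helper)."""
    query = urlsplit(url).query
    # parse_qs percent-decodes each value and groups values per key in a list.
    return parse_qs(query)["jql"][0]


if __name__ == "__main__":
    print(decode_jql(URL))
```

Running it prints `project in (HADOOP, HDFS, MAPREDUCE, YARN) AND "Target Version/s" in (3.0.0, 2.8.0) AND statusCategory not in (Complete) ORDER BY priority`, i.e. every unfinished issue in the four projects targeted at 3.0.0 or 2.8.0, ordered by priority.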

I think two of these are true alpha blockers: HADOOP-12892 and
HADOOP-12893. I'm trying to help push both of those forward.

For the rest, I think it's probably okay to delay until the next alpha,
since we're planning a few alphas leading up to beta. That said, if you are
the owner of a Blocker targeted at 3.0.0, I'd encourage reviving those
patches. The earlier the better for incompatible changes.

In all likelihood, this first release will slip into early May, but I'll be
disappointed if we don't have an RC out before ApacheCon.

Best,
Andrew

Re: Looking to a Hadoop 3 release

2016-02-22 Thread Colin P. McCabe

I think starting a 3.0 alpha soon would be a great idea.  As some
other people commented, this would come with no compatibility
guarantees, so that we can iron out any issues.

Colin

On Mon, Feb 22, 2016 at 1:26 PM, Zhe Zhang  wrote:
> Thanks Andrew for driving the effort!
>
> +1 (non-binding) on starting the 3.0 release process now with 3.0 as an
> alpha.
>
> I wanted to echo Andrew's point that backporting EC to branch-2 is a lot of
> work. Considering that no concrete backporting plan has been proposed, it
> seems quite uncertain whether / when it can be released in 2.9. I think we
> should rather concentrate our EC dev efforts on hardening key features under
> the follow-on umbrella HDFS-8031 and make it solid for a 3.0 release.
>
> Sincerely,
> Zhe
>
> On Mon, Feb 22, 2016 at 9:25 AM Colin P. McCabe  wrote:
>
>> +1 for a release of 3.0.  There are a lot of significant,
>> compatibility-breaking, but necessary changes in this release... we've
>> touched on some of them in this thread.
>>
>> +1 for a parallel release of 2.8 as well.  I think we are pretty close
>> to this, barring a dozen or so blockers.
>>
>> best,
>> Colin
>>
>> On Mon, Feb 22, 2016 at 2:56 AM, Steve Loughran 
>> wrote:
>> >
>> >> On 20 Feb 2016, at 15:34, Junping Du  wrote:
>> >>
>> >> Shall we consolidate effort for 2.8.0 and 3.0.0? It doesn't sound
>> >> reasonable to have two alpha releases to go in parallel. Is EC feature
>> >> the main motivation of releasing hadoop 3 here? If so, I don't understand
>> >> why this feature cannot land on 2.8.x or 2.9.x as an alpha feature.
>> >
>> >
>> >
>> >> If we release 3.0 in a month like plan proposed below, it means we will
>> have 4 active releases going in parallel - two alpha releases (2.8 and 3.0)
>> and two stable releases (2.6.x and 2.7.x). It brings a lot of challenges in
>> issues tracking and patch committing, not even mention the tremendous
>> effort of release verification and voting.
>> >> I would like to propose to wait 2.8 release become stable (may be 2nd
>> release in 2.8 branch cause first release is alpha due to discussion in
>> another email thread), then we can move to 3.0 as the only alpha release.
>> In the meantime, we can bring more significant features (like ATS v2, etc.)
>> to trunk and consolidate stable releases in 2.6.x and 2.7.x. I believe that
>> make life easier. :)
>> >> Thoughts?
>> >>
>> >
>> > 2.8.0 is relatively close to shipping. I say relatively as I'm doing
>> some work with ATS 1.5 downstream and I'd like to make sure all that works.
>> There's also a large collection of S3 and swift patches needing attention
>> from any reviewers with time and credentials.
>> >
>> > 3.x is going to take multiple iterations to stabilise, and with more
>> changes, more significant a rollout. I'd also like to do a complete update
>> of all the dependencies before a final release, so we can have less
>> pressure to upgrade for a while, and get Sean's classloader patch in so
>> it's slightly less visible.
>> >
>> > That means 3.0 is going to be an alpha release, not final.
>> >
>> > one thing that could be shared is any build.xml automation of the
>> release process, to at least take away most of the manual steps in the
>> process, to have something more repeatable.
>> >
>> > -steve
>> >
>> >
>> >> Thanks,
>> >>
>> >> Junping
>> >> 
>> >> From: Yongjun Zhang 
>> >> Sent: Friday, February 19, 2016 8:05 PM
>> >> To: hdfs-dev@hadoop.apache.org
>> >> Cc: common-...@hadoop.apache.org; mapreduce-...@hadoop.apache.org;
>> yarn-...@hadoop.apache.org
>> >> Subject: Re: Looking to a Hadoop 3 release
>> >>
>> >> Thanks Andrew for initiating the effort!
>> >>
>> >> +1 on pushing 3.x with extended alpha cycle, and continuing the more
>> stable
>> >> 2.x releases.
>> >>
>> >> --Yongjun
>> >>
>> >> On Thu, Feb 18, 2016 at 5:58 PM, Andrew Wang 
>> >> wrote:
>> >>
>> >>> Hi Kai,
>> >>>
>> >>> Sure, I'm open to it. It's a new major release, so we're allowed to
>> make
>> >>> these kinds of big changes. The idea behind the extended alpha cycle is
>> >>> that downstreams can give us feedback. This way

Re: Looking to a Hadoop 3 release

2016-02-22 Thread Zhe Zhang

Thanks Andrew for driving the effort!

+1 (non-binding) on starting the 3.0 release process now with 3.0 as an
alpha.

I wanted to echo Andrew's point that backporting EC to branch-2 is a lot of
work. Considering that no concrete backporting plan has been proposed, it
seems quite uncertain whether / when it can be released in 2.9. I think we
should instead concentrate our EC dev efforts on hardening key features under
the follow-on umbrella HDFS-8031 and make it solid for a 3.0 release.

Sincerely,
Zhe

On Mon, Feb 22, 2016 at 9:25 AM Colin P. McCabe  wrote:

> +1 for a release of 3.0.  There are a lot of significant,
> compatibility-breaking, but necessary changes in this release... we've
> touched on some of them in this thread.
>
> +1 for a parallel release of 2.8 as well.  I think we are pretty close
> to this, barring a dozen or so blockers.
>
> best,
> Colin
>
> On Mon, Feb 22, 2016 at 2:56 AM, Steve Loughran 
> wrote:
> >
> >> On 20 Feb 2016, at 15:34, Junping Du  wrote:
> >>
> >> Shall we consolidate effort for 2.8.0 and 3.0.0? It doesn't sound
> reasonable to have two alpha releases going in parallel. Is the EC feature the
> main motivation for releasing Hadoop 3 here? If so, I don't understand why
> this feature cannot land on 2.8.x or 2.9.x as an alpha feature.
> >
> >
> >
> >> If we release 3.0 in a month as in the plan proposed below, it means we will
> have 4 active releases going in parallel - two alpha releases (2.8 and 3.0)
> and two stable releases (2.6.x and 2.7.x). It brings a lot of challenges in
> issue tracking and patch committing, not to mention the tremendous
> effort of release verification and voting.
> >> I would like to propose waiting for the 2.8 release to become stable (maybe the 2nd
> release in the 2.8 branch, because the first release is alpha due to discussion in
> another email thread), then we can move to 3.0 as the only alpha release.
> In the meantime, we can bring more significant features (like ATS v2, etc.)
> to trunk and consolidate stable releases in 2.6.x and 2.7.x. I believe that
> makes life easier. :)
> >> Thoughts?
> >>
> >
> > 2.8.0 is relatively close to shipping. I say relatively as I'm doing
> some work with ATS 1.5 downstream and I'd like to make sure all that works.
> There's also a large collection of S3 and swift patches needing attention
> from any reviewers with time and credentials.
> >
> > 3.x is going to take multiple iterations to stabilise, and with more
> changes, more significant a rollout. I'd also like to do a complete update
> of all the dependencies before a final release, so we can have less
> pressure to upgrade for a while, and get Sean's classloader patch in so
> it's slightly less visible.
> >
> > That means 3.0 is going to be an alpha release, not final.
> >
> > one thing that could be shared is any build.xml automation of the
> release process, to at least take away most of the manual steps in the
> process, to have something more repeatable.
> >
> > -steve
> >
> >
> >> Thanks,
> >>
> >> Junping
> >> 
> >> From: Yongjun Zhang 
> >> Sent: Friday, February 19, 2016 8:05 PM
> >> To: hdfs-dev@hadoop.apache.org
> >> Cc: common-...@hadoop.apache.org; mapreduce-...@hadoop.apache.org;
> yarn-...@hadoop.apache.org
> >> Subject: Re: Looking to a Hadoop 3 release
> >>
> >> Thanks Andrew for initiating the effort!
> >>
> >> +1 on pushing 3.x with extended alpha cycle, and continuing the more
> stable
> >> 2.x releases.
> >>
> >> --Yongjun
> >>
> >> On Thu, Feb 18, 2016 at 5:58 PM, Andrew Wang 
> >> wrote:
> >>
> >>> Hi Kai,
> >>>
> >>> Sure, I'm open to it. It's a new major release, so we're allowed to
> make
> >>> these kinds of big changes. The idea behind the extended alpha cycle is
> >>> that downstreams can give us feedback. This way if we do anything too
> >>> radical, we can address it in the next alpha and have downstreams
> re-test.
> >>>
> >>> Best,
> >>> Andrew
> >>>
> >>> On Thu, Feb 18, 2016 at 5:23 PM, Zheng, Kai 
> wrote:
> >>>
> >>>> Thanks Andrew for driving this. I wonder if it's a good chance for
> >>>> HADOOP-12579 (Deprecate and remove WriteableRPCEngine) to go in. Note it's
> >>>> not an incompatible change, but it feels better to do it in the major release.

Re: Looking to a Hadoop 3 release

2016-02-22 Thread Colin P. McCabe

+1 for a release of 3.0.  There are a lot of significant,
compatibility-breaking, but necessary changes in this release... we've
touched on some of them in this thread.

+1 for a parallel release of 2.8 as well.  I think we are pretty close
to this, barring a dozen or so blockers.

best,
Colin

On Mon, Feb 22, 2016 at 2:56 AM, Steve Loughran  wrote:
>
>> On 20 Feb 2016, at 15:34, Junping Du  wrote:
>>
>> Shall we consolidate effort for 2.8.0 and 3.0.0? It doesn't sound
>> reasonable to have two alpha releases going in parallel. Is the EC feature the
>> main motivation for releasing Hadoop 3 here? If so, I don't understand why
>> this feature cannot land on 2.8.x or 2.9.x as an alpha feature.
>
>
>
>> If we release 3.0 in a month as in the plan proposed below, it means we will have
>> 4 active releases going in parallel - two alpha releases (2.8 and 3.0) and
>> two stable releases (2.6.x and 2.7.x). It brings a lot of challenges in
>> issue tracking and patch committing, not to mention the tremendous effort
>> of release verification and voting.
>> I would like to propose waiting for the 2.8 release to become stable (maybe the 2nd
>> release in the 2.8 branch, because the first release is alpha due to discussion in
>> another email thread), then we can move to 3.0 as the only alpha release. In
>> the meantime, we can bring more significant features (like ATS v2, etc.) to
>> trunk and consolidate stable releases in 2.6.x and 2.7.x. I believe that
>> makes life easier. :)
>> Thoughts?
>>
>
> 2.8.0 is relatively close to shipping. I say relatively as I'm doing some 
> work with ATS 1.5 downstream and I'd like to make sure all that works. 
> There's also a large collection of S3 and swift patches needing attention 
> from any reviewers with time and credentials.
>
> 3.x is going to take multiple iterations to stabilise, and with more changes, 
> more significant a rollout. I'd also like to do a complete update of all the 
> dependencies before a final release, so we can have less pressure to upgrade 
> for a while, and get Sean's classloader patch in so it's slightly less 
> visible.
>
> That means 3.0 is going to be an alpha release, not final.
>
> one thing that could be shared is any build.xml automation of the release 
> process, to at least take away most of the manual steps in the process, to 
> have something more repeatable.
>
> -steve
>
>
>> Thanks,
>>
>> Junping
>> ____________________
>> From: Yongjun Zhang 
>> Sent: Friday, February 19, 2016 8:05 PM
>> To: hdfs-dev@hadoop.apache.org
>> Cc: common-...@hadoop.apache.org; mapreduce-...@hadoop.apache.org; 
>> yarn-...@hadoop.apache.org
>> Subject: Re: Looking to a Hadoop 3 release
>>
>> Thanks Andrew for initiating the effort!
>>
>> +1 on pushing 3.x with extended alpha cycle, and continuing the more stable
>> 2.x releases.
>>
>> --Yongjun
>>
>> On Thu, Feb 18, 2016 at 5:58 PM, Andrew Wang 
>> wrote:
>>
>>> Hi Kai,
>>>
>>> Sure, I'm open to it. It's a new major release, so we're allowed to make
>>> these kinds of big changes. The idea behind the extended alpha cycle is
>>> that downstreams can give us feedback. This way if we do anything too
>>> radical, we can address it in the next alpha and have downstreams re-test.
>>>
>>> Best,
>>> Andrew
>>>
>>> On Thu, Feb 18, 2016 at 5:23 PM, Zheng, Kai  wrote:
>>>
>>>> Thanks Andrew for driving this. I wonder if it's a good chance for
>>>> HADOOP-12579 (Deprecate and remove WriteableRPCEngine) to go in. Note it's
>>>> not an incompatible change, but it feels better to do it in the major release.
>>>>
>>>> Regards,
>>>> Kai
>>>>
>>>> -Original Message-
>>>> From: Andrew Wang [mailto:andrew.w...@cloudera.com]
>>>> Sent: Friday, February 19, 2016 7:04 AM
>>>> To: hdfs-dev@hadoop.apache.org; Kihwal Lee 
>>>> Cc: mapreduce-...@hadoop.apache.org; common-...@hadoop.apache.org;
>>>> yarn-...@hadoop.apache.org
>>>> Subject: Re: Looking to a Hadoop 3 release
>>>>
>>>> Hi Kihwal,
>>>>
>>>> I think there's still value in continuing the 2.x releases. 3.x comes
>>> with
>>>> the incompatible bump to a JDK8 runtime, and also the fact that 3.x won't
>>>> be beta or GA for some number of months. In the meanwhile, it'd be good
>>> to

Re: Looking to a Hadoop 3 release

2016-02-22 Thread Steve Loughran


> On 20 Feb 2016, at 15:34, Junping Du  wrote:
> 
> Shall we consolidate effort for 2.8.0 and 3.0.0? It doesn't sound reasonable
> to have two alpha releases going in parallel. Is the EC feature the main
> motivation for releasing Hadoop 3 here? If so, I don't understand why this
> feature cannot land on 2.8.x or 2.9.x as an alpha feature.



> If we release 3.0 in a month as in the plan proposed below, it means we will have
> 4 active releases going in parallel - two alpha releases (2.8 and 3.0) and
> two stable releases (2.6.x and 2.7.x). It brings a lot of challenges in
> issue tracking and patch committing, not to mention the tremendous effort
> of release verification and voting.
> I would like to propose waiting for the 2.8 release to become stable (maybe the 2nd release
> in the 2.8 branch, because the first release is alpha due to discussion in another email
> thread), then we can move to 3.0 as the only alpha release. In the meantime,
> we can bring more significant features (like ATS v2, etc.) to trunk and
> consolidate stable releases in 2.6.x and 2.7.x. I believe that makes life
> easier. :)
> Thoughts?
> 

2.8.0 is relatively close to shipping. I say relatively as I'm doing some work 
with ATS 1.5 downstream and I'd like to make sure all that works. There's also 
a large collection of S3 and swift patches needing attention from any reviewers 
with time and credentials. 

3.x is going to take multiple iterations to stabilise, and with more changes, 
more significant a rollout. I'd also like to do a complete update of all the 
dependencies before a final release, so we can have less pressure to upgrade 
for a while, and get Sean's classloader patch in so it's slightly less visible.

That means 3.0 is going to be an alpha release, not final. 

one thing that could be shared is any build.xml automation of the release 
process, to at least take away most of the manual steps in the process, to have 
something more repeatable.

-steve


> Thanks,
> 
> Junping 
> 
> From: Yongjun Zhang 
> Sent: Friday, February 19, 2016 8:05 PM
> To: hdfs-dev@hadoop.apache.org
> Cc: common-...@hadoop.apache.org; mapreduce-...@hadoop.apache.org; 
> yarn-...@hadoop.apache.org
> Subject: Re: Looking to a Hadoop 3 release
> 
> Thanks Andrew for initiating the effort!
> 
> +1 on pushing 3.x with extended alpha cycle, and continuing the more stable
> 2.x releases.
> 
> --Yongjun
> 
> On Thu, Feb 18, 2016 at 5:58 PM, Andrew Wang 
> wrote:
> 
>> Hi Kai,
>> 
>> Sure, I'm open to it. It's a new major release, so we're allowed to make
>> these kinds of big changes. The idea behind the extended alpha cycle is
>> that downstreams can give us feedback. This way if we do anything too
>> radical, we can address it in the next alpha and have downstreams re-test.
>> 
>> Best,
>> Andrew
>> 
>> On Thu, Feb 18, 2016 at 5:23 PM, Zheng, Kai  wrote:
>> 
>>> Thanks Andrew for driving this. I wonder if it's a good chance for
>>> HADOOP-12579 (Deprecate and remove WriteableRPCEngine) to go in. Note it's
>>> not an incompatible change, but it feels better to do it in the major release.
>>> 
>>> Regards,
>>> Kai
>>> 
>>> -Original Message-
>>> From: Andrew Wang [mailto:andrew.w...@cloudera.com]
>>> Sent: Friday, February 19, 2016 7:04 AM
>>> To: hdfs-dev@hadoop.apache.org; Kihwal Lee 
>>> Cc: mapreduce-...@hadoop.apache.org; common-...@hadoop.apache.org;
>>> yarn-...@hadoop.apache.org
>>> Subject: Re: Looking to a Hadoop 3 release
>>> 
>>> Hi Kihwal,
>>> 
>>> I think there's still value in continuing the 2.x releases. 3.x comes
>> with
>>> the incompatible bump to a JDK8 runtime, and also the fact that 3.x won't
>>> be beta or GA for some number of months. In the meanwhile, it'd be good
>> to
>>> keep putting out regular, stable 2.x releases.
>>> 
>>> Best,
>>> Andrew
>>> 
>>> 
>>> On Thu, Feb 18, 2016 at 2:50 PM, Kihwal Lee >> 
>>> wrote:
>>> 
>>>> Moving Hadoop 3 forward sounds fine. If EC is one of the main
>>>> motivations, are we getting rid of branch-2.8?
>>>> 
>>>> Kihwal
>>>> 
>>>>  From: Andrew Wang 
>>>> To: "common-...@hadoop.apache.org" 
>>>> Cc: "yarn-...@hadoop.apache.org" ; "
>>>> mapreduce-...@hadoop.apache.org" ;
>>>> hdfs-dev 
>>>> Sent: Thursday, February 18, 2016 4:35 P

Re: Looking to a Hadoop 3 release

2016-02-20 Thread Andrew Wang

Hi Junping, thanks for the mail, inline:

On Sat, Feb 20, 2016 at 7:34 AM, Junping Du  wrote:

> Shall we consolidate effort for 2.8.0 and 3.0.0? It doesn't sound
> reasonable to have two alpha releases going in parallel. Is the EC feature the
> main motivation for releasing Hadoop 3 here? If so, I don't understand why
> this feature cannot land on 2.8.x or 2.9.x as an alpha feature.
>

EC is one motivation, there are others too (JDK8, shell scripts, jar
bumps). I'm open to EC going into branch-2, but I haven't seen any
backporting yet and it's a lot of code.


> If we release 3.0 in a month as in the plan proposed below, it means we will
> have 4 active releases going in parallel - two alpha releases (2.8 and 3.0)
> and two stable releases (2.6.x and 2.7.x). It brings a lot of challenges in
> issue tracking and patch committing, not to mention the tremendous
> effort of release verification and voting.
> I would like to propose waiting for the 2.8 release to become stable (maybe the 2nd
> release in the 2.8 branch, because the first release is alpha due to discussion in
> another email thread), then we can move to 3.0 as the only alpha release.
> In the meantime, we can bring more significant features (like ATS v2, etc.)
> to trunk and consolidate stable releases in 2.6.x and 2.7.x. I believe that
> makes life easier. :)
> Thoughts?
>

Based on some earlier mails in this chain, I was planning to release off
trunk. This way we avoid having to commit to yet-another-branch, and it makes
tracking easier since trunk will always be a superset of the branch-2's.
This does mean though that trunk needs to be stable, and we need to be more
judicious with branch merges, and quickly revert broken code.

Regarding RM/voting/validation efforts, Steve mentioned some scripts that
he uses to automate Slider releases. This is something I'd like to bring
over to Hadoop. Ideally, publishing an RC is push-button, and it comes with
automated validation. I think this will help with the overhead. Also, since
these will be early alphas, and there will be a lot of them, I'm not
expecting anyone to do endurance runs on a large cluster before casting a
+1.

Best,
Andrew
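Andrew's "push-button RC with automated validation" could start with something as small as checking every artifact against its published checksum. The sketch below is only an illustration, not Hadoop's or Slider's actual tooling: it assumes each artifact ships with a hypothetical `<name>.sha512` companion file in `sha512sum` style (`<digest>  <name>`), and the function names are made up for this example.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Iterable, List


def sha512_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-512 digest of a file, read in chunks so large tarballs need not fit in memory."""
    digest = hashlib.sha512()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_digest(checksum_text: str) -> str:
    """First token of sha512sum-style text ("<digest>  <name>"), or "" if the text is empty."""
    tokens = checksum_text.split()
    return tokens[0].lower() if tokens else ""


def failed_artifacts(artifacts: Iterable[Path]) -> List[Path]:
    """Return the artifacts whose hypothetical "<name>.sha512" file is missing or does not match."""
    failures = []
    for artifact in artifacts:
        checksum_file = artifact.with_name(artifact.name + ".sha512")
        if (not checksum_file.exists()
                or sha512_of_file(artifact) != expected_digest(checksum_file.read_text())):
            failures.append(artifact)
    return failures


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        rc = Path(tmp) / "example-rc.tar.gz"  # hypothetical artifact name
        rc.write_bytes(b"release candidate bytes")
        Path(str(rc) + ".sha512").write_text(
            hashlib.sha512(b"release candidate bytes").hexdigest() + "  " + rc.name + "\n")
        print(failed_artifacts([rc]))  # prints [] because the checksum matches
```

An empty result means every artifact verified; a real RC check would presumably also verify the GPG signatures, which this sketch leaves out.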

Re: Looking to a Hadoop 3 release

2016-02-20 Thread Junping Du

Shall we consolidate effort for 2.8.0 and 3.0.0? It doesn't sound reasonable
to have two alpha releases going in parallel. Is the EC feature the main motivation
for releasing Hadoop 3 here? If so, I don't understand why this feature cannot
land on 2.8.x or 2.9.x as an alpha feature.
If we release 3.0 in a month as in the plan proposed below, it means we will have 4
active releases going in parallel - two alpha releases (2.8 and 3.0) and two
stable releases (2.6.x and 2.7.x). It brings a lot of challenges in issue
tracking and patch committing, not to mention the tremendous effort of
release verification and voting.
I would like to propose waiting for the 2.8 release to become stable (maybe the 2nd release
in the 2.8 branch, because the first release is alpha due to discussion in another email
thread), then we can move to 3.0 as the only alpha release. In the meantime, we
can bring more significant features (like ATS v2, etc.) to trunk and
consolidate stable releases in 2.6.x and 2.7.x. I believe that makes life
easier. :)
Thoughts?

Thanks,

Junping 

From: Yongjun Zhang 
Sent: Friday, February 19, 2016 8:05 PM
To: hdfs-dev@hadoop.apache.org
Cc: common-...@hadoop.apache.org; mapreduce-...@hadoop.apache.org; 
yarn-...@hadoop.apache.org
Subject: Re: Looking to a Hadoop 3 release

Thanks Andrew for initiating the effort!

+1 on pushing 3.x with extended alpha cycle, and continuing the more stable
2.x releases.

--Yongjun

On Thu, Feb 18, 2016 at 5:58 PM, Andrew Wang 
wrote:

> Hi Kai,
>
> Sure, I'm open to it. It's a new major release, so we're allowed to make
> these kinds of big changes. The idea behind the extended alpha cycle is
> that downstreams can give us feedback. This way if we do anything too
> radical, we can address it in the next alpha and have downstreams re-test.
>
> Best,
> Andrew
>
> On Thu, Feb 18, 2016 at 5:23 PM, Zheng, Kai  wrote:
>
> > Thanks Andrew for driving this. I wonder if it's a good chance for
> > HADOOP-12579 (Deprecate and remove WriteableRPCEngine) to go in. Note it's
> > not an incompatible change, but it feels better to do it in the major release.
> >
> > Regards,
> > Kai
> >
> > -Original Message-
> > From: Andrew Wang [mailto:andrew.w...@cloudera.com]
> > Sent: Friday, February 19, 2016 7:04 AM
> > To: hdfs-dev@hadoop.apache.org; Kihwal Lee 
> > Cc: mapreduce-...@hadoop.apache.org; common-...@hadoop.apache.org;
> > yarn-...@hadoop.apache.org
> > Subject: Re: Looking to a Hadoop 3 release
> >
> > Hi Kihwal,
> >
> > I think there's still value in continuing the 2.x releases. 3.x comes
> with
> > the incompatible bump to a JDK8 runtime, and also the fact that 3.x won't
> > be beta or GA for some number of months. In the meanwhile, it'd be good
> to
> > keep putting out regular, stable 2.x releases.
> >
> > Best,
> > Andrew
> >
> >
> > On Thu, Feb 18, 2016 at 2:50 PM, Kihwal Lee  >
> > wrote:
> >
> > > Moving Hadoop 3 forward sounds fine. If EC is one of the main
> > > motivations, are we getting rid of branch-2.8?
> > >
> > > Kihwal
> > >
> > >   From: Andrew Wang 
> > >  To: "common-...@hadoop.apache.org" 
> > > Cc: "yarn-...@hadoop.apache.org" ; "
> > > mapreduce-...@hadoop.apache.org" ;
> > > hdfs-dev 
> > >  Sent: Thursday, February 18, 2016 4:35 PM
> > >  Subject: Re: Looking to a Hadoop 3 release
> > >
> > > Hi all,
> > >
> > > Reviving this thread. I've seen renewed interest in a trunk release
> > > since HDFS erasure coding has not yet made it to branch-2. Along with
> > > JDK8, the shell script rewrite, and many other improvements, I think
> > > it's time to revisit Hadoop 3.0 release plans.
> > >
> > > My overall plan is still the same as in my original email: a series of
> > > regular alpha releases leading up to beta and GA. Alpha releases make
> > > it easier for downstreams to integrate with our code, and making them
> > > regular means features can be included when they are ready.
> > >
> > > I know there are some incompatible changes waiting in the wings (e.g.
> > > HDFS-6984 making FileStatus a PB rather than Writable, some of
> > > HADOOP-9991 bumping dependency versions) that would be good to get in.
> > > If you have changes like this, please set the target version to 3.0.0
> > > and mark them "Incompatible". We can use this JIRA query to track:
> > >
> > >
> > > https://issues.apache.org/jira/issues/?jql=proje

Re: Looking to a Hadoop 3 release

2016-02-19 Thread Yongjun Zhang

Thanks Andrew for initiating the effort!

+1 on pushing 3.x with extended alpha cycle, and continuing the more stable
2.x releases.

--Yongjun

On Thu, Feb 18, 2016 at 5:58 PM, Andrew Wang 
wrote:

> Hi Kai,
>
> Sure, I'm open to it. It's a new major release, so we're allowed to make
> these kinds of big changes. The idea behind the extended alpha cycle is
> that downstreams can give us feedback. This way if we do anything too
> radical, we can address it in the next alpha and have downstreams re-test.
>
> Best,
> Andrew
>
> On Thu, Feb 18, 2016 at 5:23 PM, Zheng, Kai  wrote:
>
> > Thanks Andrew for driving this. I wonder if it's a good chance for
> > HADOOP-12579 (Deprecate and remove WriteableRPCEngine) to go in. Note it's
> > not an incompatible change, but it feels better to do it in the major release.
> >
> > Regards,
> > Kai
> >
> > -Original Message-
> > From: Andrew Wang [mailto:andrew.w...@cloudera.com]
> > Sent: Friday, February 19, 2016 7:04 AM
> > To: hdfs-dev@hadoop.apache.org; Kihwal Lee 
> > Cc: mapreduce-...@hadoop.apache.org; common-...@hadoop.apache.org;
> > yarn-...@hadoop.apache.org
> > Subject: Re: Looking to a Hadoop 3 release
> >
> > Hi Kihwal,
> >
> > I think there's still value in continuing the 2.x releases. 3.x comes
> with
> > the incompatible bump to a JDK8 runtime, and also the fact that 3.x won't
> > be beta or GA for some number of months. In the meanwhile, it'd be good
> to
> > keep putting out regular, stable 2.x releases.
> >
> > Best,
> > Andrew
> >
> >
> > On Thu, Feb 18, 2016 at 2:50 PM, Kihwal Lee  >
> > wrote:
> >
> > > Moving Hadoop 3 forward sounds fine. If EC is one of the main
> > > motivations, are we getting rid of branch-2.8?
> > >
> > > Kihwal
> > >
> > >   From: Andrew Wang 
> > >  To: "common-...@hadoop.apache.org" 
> > > Cc: "yarn-...@hadoop.apache.org" ; "
> > > mapreduce-...@hadoop.apache.org" ;
> > > hdfs-dev 
> > >  Sent: Thursday, February 18, 2016 4:35 PM
> > >  Subject: Re: Looking to a Hadoop 3 release
> > >
> > > Hi all,
> > >
> > > Reviving this thread. I've seen renewed interest in a trunk release
> > > since HDFS erasure coding has not yet made it to branch-2. Along with
> > > JDK8, the shell script rewrite, and many other improvements, I think
> > > it's time to revisit Hadoop 3.0 release plans.
> > >
> > > My overall plan is still the same as in my original email: a series of
> > > regular alpha releases leading up to beta and GA. Alpha releases make
> > > it easier for downstreams to integrate with our code, and making them
> > > regular means features can be included when they are ready.
> > >
> > > I know there are some incompatible changes waiting in the wings (e.g.
> > > HDFS-6984 making FileStatus a PB rather than Writable, some of
> > > HADOOP-9991 bumping dependency versions) that would be good to get in.
> > > If you have changes like this, please set the target version to 3.0.0
> > > and mark them "Incompatible". We can use this JIRA query to track:
> > >
> > >
> > > https://issues.apache.org/jira/issues/?jql=project%20in%20(HADOOP%2C%20HDFS%2C%20YARN%2C%20MAPREDUCE)%20and%20%22Target%20Version%2Fs%22%20%3D%20%223.0.0%22%20and%20resolution%3D%22unresolved%22%20and%20%22Hadoop%20Flags%22%3D%22Incompatible%20change%22%20order%20by%20priority
> > >
> > > There's some release-related stuff that needs to be sorted out
> > > (namely, the new CHANGES.txt and release note generation from Yetus),
> > > but I'd tentatively like to roll the first alpha a month out, so third
> > > week of March.
> > >
> > > Best,
> > > Andrew
> > >
> > > On Mon, Mar 9, 2015 at 7:23 PM, Raymie Stata 
> > wrote:
> > >
> > > > Avoiding the use of JDK8 language features (and, presumably, APIs)
> > > > means you've abandoned #1, i.e., you haven't (really) bumped the JDK
> > > > source version to JDK8.
> > > >
> > > > Also, note that releasing from trunk is a way of achieving #3, it's
> > > > not a way of abandoning it.
> > > >
> > > >
> > > >
> > > > On Mon, Mar 9, 2015 at 7:10 PM, Andrew Wang
> > > > 
> > > > wrote:
> > > >
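The JIRA tracking link in Andrew's mail is a percent-encoded search. Decoded, it reads as plain JQL: incompatible, unresolved HADOOP/HDFS/YARN/MAPREDUCE issues targeted at 3.0.0, ordered by priority. As a small sketch using only Python's standard `urllib.parse`, the link can be rebuilt from the readable query, which makes it easier to adapt (for example, to a later target version). The constant and function names are made up for illustration; leaving parentheses unescaped (`safe="()"`) only mirrors the style of the link in the thread, and escaping them as `%28`/`%29` would decode to the same query.

```python
from urllib.parse import quote, unquote

JIRA_SEARCH = "https://issues.apache.org/jira/issues/?jql="

# Readable form of the tracking query from the thread.
JQL = (
    'project in (HADOOP, HDFS, YARN, MAPREDUCE) '
    'and "Target Version/s" = "3.0.0" '
    'and resolution="unresolved" '
    'and "Hadoop Flags"="Incompatible change" '
    'order by priority'
)


def jira_search_url(jql: str) -> str:
    """Percent-encode a JQL string and append it to the JIRA issue-search URL."""
    return JIRA_SEARCH + quote(jql, safe="()")


if __name__ == "__main__":
    url = jira_search_url(JQL)
    print(url)
    # Decoding the query string gives back the readable JQL.
    print(unquote(url[len(JIRA_SEARCH):]) == JQL)
```

With `safe="()"`, the encoded output should match the link quoted in the thread character for character.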

Re: Looking to a Hadoop 3 release

2016-02-19 Thread Ravi Prakash

+1 for the plan to start cutting 3.x alpha releases. Thanks for the
initiative Andrew!

On Fri, Feb 19, 2016 at 6:19 AM, Steve Loughran 
wrote:

>
> > On 19 Feb 2016, at 11:27, Dmitry Sivachenko  wrote:
> >
> >
> >> On 19 Feb 2016, at 01:35, Andrew Wang  wrote:
> >>
> >> Hi all,
> >>
> >> Reviving this thread. I've seen renewed interest in a trunk release
> >> since HDFS erasure coding has not yet made it to branch-2. Along with
> >> JDK8, the shell script rewrite, and many other improvements, I think
> >> it's time to revisit Hadoop 3.0 release plans.
> >>
> >
>
> It's time to start ... I suspect it'll take a while to stabilise. I look
> forward to the new shell scripts already
>
> One thing I do want there is for all the alpha releases to make clear that
> there are no compatibility policies here; protocols may change and there is
> no requirement of the first 3.x release to be compatible with all the 3.0.x
> alphas. That's something we missed in the 2.0.x-alpha process, or at
> least did not repeat often enough.
>
> >
> > Hello,
> >
> > any chance IPv6 support (HADOOP-11890) will be finished before 3.0
> > comes out?
> >
> > Thanks!
> >
> >
>
> sounds like a good time for a status update on the FB work —and anything
> people can do to test it would be appreciated by all. That includes testing
> on ipv4 systems, and especially, IPv4/v6 systems with Kerberos turned on
> and both MIT and AD kerberos servers. At the same time, IPv6 support ought
> to be something that could be added in.
>
>
> I don't have any opinions on timescale, but
>
> +1 to anything related to classpath isolation
> +1 to a careful bump of versions of dependencies.
> +1 to fixing the outstanding Java 8 migration issues, especially the big
> Jersey patch that's just been updated.
> +1 to switching to JIRA-created release notes
>
> Having been doing the slider releases recently, it's clear to me that you
> can do a lot in automating the release process itself. All those steps in
> the release runbook can be turned into targets in a special ant release.xml
> build file, calling maven, gpg, etc.
>
> I think doing something like this for 3.0 will significantly benefit both
> the release phase here and future releases.
>
> This is the slider one:
> https://github.com/apache/incubator-slider/blob/develop/bin/release.xml
>
> It doesn't replace maven, instead it choreographs that along with all the
> other steps: signing and checksumming artifacts, publishing them, voting
>
> it includes
>  -refusing to release if the git repo is modified
>  -making the various git branch/tag/push operations
>  -issuing the various mvn versions:update commands
>  -signing
>  -publishing via asf SVN
>  -using GET calls to verify the artifacts made it
>  -generating the vote and vote result emails (it even counts the votes)
>
> I recommend this is included as part of the release process. It does make
> a difference; we can now cut new releases with no human intervention other
> than editing a properties file and running different targets as the process
> goes through its release and vote phases.
>
> -Steve
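The first step in Steve's list, refusing to release if the git repo is modified, is easy to reproduce outside Ant. Below is a minimal Python sketch of that guard, not the Slider `release.xml` logic itself: it shells out to `git status --porcelain` and stops if anything is reported. The function names are hypothetical, and the parser assumes porcelain v1 output, where each line is a two-character status code, a space, then the path.

```python
import subprocess
from typing import List


def dirty_paths(porcelain: str) -> List[str]:
    """Paths reported by `git status --porcelain` (format v1).

    Untracked files use the status code "??"; renames show as "old -> new".
    """
    return [line[3:] for line in porcelain.splitlines() if line.strip()]


def require_clean_worktree(repo: str = ".") -> None:
    """Stop a (hypothetical) release script if the working tree has any changes."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo, capture_output=True, text=True, check=True,
    )
    dirty = dirty_paths(result.stdout)
    if dirty:
        raise SystemExit(f"refusing to release; modified files: {dirty}")
```

Parsing is kept separate from the `git` call so the check can be tested without a repository; untracked files also block the release here, which is the stricter choice.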

Re: Looking to a Hadoop 3 release

2016-02-19 Thread Steve Loughran

> On 19 Feb 2016, at 11:27, Dmitry Sivachenko  wrote:
> 
> 
>> On 19 Feb 2016, at 01:35, Andrew Wang  wrote:
>> 
>> Hi all,
>> 
>> Reviving this thread. I've seen renewed interest in a trunk release since
>> HDFS erasure coding has not yet made it to branch-2. Along with JDK8, the
>> shell script rewrite, and many other improvements, I think it's time to
>> revisit Hadoop 3.0 release plans.
>> 
> 

It's time to start ... I suspect it'll take a while to stabilise. I look 
forward to the new shell scripts already

One thing I do want there is for all the alpha releases to make clear that 
there are no compatibility policies here; protocols may change and there is no 
requirement of the first 3.x release to be compatible with all the 3.0.x 
alphas. That's something we missed in the 2.0.x-alpha process, or at least
did not repeat often enough.

> 
> Hello,
> 
> any chance IPv6 support (HADOOP-11890) will be finished before 3.0 comes out?
> 
> Thanks!
> 
> 

sounds like a good time for a status update on the FB work —and anything people 
can do to test it would be appreciated by all. That includes testing on ipv4 
systems, and especially, IPv4/v6 systems with Kerberos turned on and both MIT 
and AD kerberos servers. At the same time, IPv6 support ought to be something 
that could be added in.

I don't have any opinions on timescale, but

+1 to anything related to classpath isolation
+1 to a careful bump of versions of dependencies.
+1 to fixing the outstanding Java 8 migration issues, especially the big Jersey 
patch that's just been updated.
+1 to switching to JIRA-created release notes

Having been doing the slider releases recently, it's clear to me that you can 
do a lot in automating the release process itself. All those steps in the 
release runbook can be turned into targets in a special ant release.xml build 
file, calling maven, gpg, etc.

I think doing something like this for 3.0 will significantly benefit both the
release phase here and future releases.

This is the slider one: 
https://github.com/apache/incubator-slider/blob/develop/bin/release.xml

It doesn't replace maven, instead it choreographs that along with all the other 
steps: signing and checksumming artifacts, publishing them, voting

it includes
 -refusing to release if the git repo is modified
 -making the various git branch/tag/push operations
 -issuing the various mvn versions:update commands
 -signing
 -publishing via asf SVN 
 -using GET calls to verify the artifacts made it
 -generating the vote and vote result emails (it even counts the votes)

I recommend this be included as part of the release process. It does make a 
difference; we can now cut new releases with no human intervention other than 
editing a properties file and running different targets as the process goes 
through its release and vote phases.
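A minimal Python sketch of two of the steps listed above: checksumming an artifact and counting release votes. It is not taken from Slider's release.xml; the function and class names are hypothetical, and the tally assumes the ASF majority-approval rule for release votes (at least three binding +1 votes, and more binding +1 than binding -1 votes).

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path
from typing import List


def sha512_checksum_line(path: Path) -> str:
    """Return "<hex digest>  <file name>", the line layout printed by sha512sum."""
    digest = hashlib.sha512()
    with path.open("rb") as f:
        # Stream in 1 MiB chunks so large tarballs need not fit in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return f"{digest.hexdigest()}  {path.name}"


@dataclass
class Vote:
    voter: str
    value: int     # +1, 0 or -1
    binding: bool  # True for PMC members (assumption: the caller knows this)


def release_vote_passes(votes: List[Vote]) -> bool:
    """Majority approval: >= 3 binding +1 votes and more binding +1 than binding -1."""
    plus = sum(1 for v in votes if v.binding and v.value > 0)
    minus = sum(1 for v in votes if v.binding and v.value < 0)
    return plus >= 3 and plus > minus


if __name__ == "__main__":
    tally = [
        Vote("alice", +1, True),
        Vote("bob", +1, True),
        Vote("carol", +1, True),
        Vote("dave", -1, True),
        Vote("erin", +1, False),  # non-binding: ignored by the tally
    ]
    print(release_vote_passes(tally))  # True: 3 binding +1 vs 1 binding -1
```

Non-binding votes are still worth listing in the result email; they just don't decide the outcome under this rule.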

-Steve

Re: Looking to a Hadoop 3 release

2016-02-18 Thread Akira AJISAKA


+1 for the 3.0 release plan and continuing 2.x releases.
I'm thinking we should consider stopping new 2.x minor releases after 
3.x reaches GA.


Thanks,
Akira

Re: Looking to a Hadoop 3 release

2016-02-18 Thread Andrew Wang

Hi Kai,

Sure, I'm open to it. It's a new major release, so we're allowed to make
these kinds of big changes. The idea behind the extended alpha cycle is
that downstreams can give us feedback. This way if we do anything too
radical, we can address it in the next alpha and have downstreams re-test.

Best,
Andrew

On Thu, Feb 18, 2016 at 5:23 PM, Zheng, Kai  wrote:

> Thanks Andrew for driving this. I wonder if it's a good chance to get
> HADOOP-12579 (Deprecate and remove WriteableRPCEngine) in. Note it's
> not an incompatible change, but it feels better to do it in the major release.
>
> Regards,
> Kai

Re: Looking to a Hadoop 3 release

2016-02-18 Thread Sangjin Lee

Another thing to throw in there is the dependency/classpath isolation
(HADOOP-11656). Some efforts have already been made by Sean, and it'd be
great to complete this to have a much better dependency isolation solution
for 3.x.

On Thu, Feb 18, 2016 at 5:33 PM, Gangumalla, Uma 
wrote:

> Yes, I think starting the 3.0 release with an alpha is a good idea, so it
> would have some time to reach beta or GA.
>
> +1 for the plan.
>
> For compatibility purposes, and since they are the current stable versions,
> we should continue 2.x releases anyway.
>
> Thanks Andrew for starting the thread.
>
> Regards,
> Uma

Re: Looking to a Hadoop 3 release

2016-02-18 Thread Gangumalla, Uma

Yes, I think starting the 3.0 release with an alpha is a good idea, so it
would have some time to reach beta or GA.

+1 for the plan.

For compatibility purposes, and since they are the current stable versions,
we should continue 2.x releases anyway.

Thanks Andrew for starting the thread.

Regards,
Uma

On 2/18/16, 3:04 PM, "Andrew Wang"  wrote:

>Hi Kihwal,
>
>I think there's still value in continuing the 2.x releases. 3.x comes with
>the incompatible bump to a JDK8 runtime, and also the fact that 3.x won't
>be beta or GA for some number of months. In the meanwhile, it'd be good to
>keep putting out regular, stable 2.x releases.
>
>Best,
>Andrew

RE: Looking to a Hadoop 3 release

2016-02-18 Thread Zheng, Kai

Thanks Andrew for driving this. I wonder if it's a good chance to get HADOOP-12579 
(Deprecate and remove WriteableRPCEngine) in. Note it's not an 
incompatible change, but it feels better to do it in the major release.

Regards,
Kai

-Original Message-
From: Andrew Wang [mailto:andrew.w...@cloudera.com] 
Sent: Friday, February 19, 2016 7:04 AM
To: hdfs-dev@hadoop.apache.org; Kihwal Lee 
Cc: mapreduce-...@hadoop.apache.org; common-...@hadoop.apache.org; 
yarn-...@hadoop.apache.org
Subject: Re: Looking to a Hadoop 3 release

Hi Kihwal,

I think there's still value in continuing the 2.x releases. 3.x comes with the 
incompatible bump to a JDK8 runtime, and also the fact that 3.x won't be beta 
or GA for some number of months. In the meanwhile, it'd be good to keep putting 
out regular, stable 2.x releases.

Best,
Andrew


Re: Looking to a Hadoop 3 release

2016-02-18 Thread Andrew Wang

Hi Kihwal,

I think there's still value in continuing the 2.x releases. 3.x comes with
the incompatible bump to a JDK8 runtime, and also the fact that 3.x won't
be beta or GA for some number of months. In the meanwhile, it'd be good to
keep putting out regular, stable 2.x releases.

Best,
Andrew


On Thu, Feb 18, 2016 at 2:50 PM, Kihwal Lee 
wrote:

> Moving Hadoop 3 forward sounds fine. If EC is one of the main motivations,
> are we getting rid of branch-2.8?
>
> Kihwal

Re: Looking to a Hadoop 3 release

2016-02-18 Thread Kihwal Lee

Moving Hadoop 3 forward sounds fine. If EC is one of the main motivations, are 
we getting rid of branch-2.8? 

Kihwal

From: Andrew Wang
To: "common-...@hadoop.apache.org"
Cc: "yarn-...@hadoop.apache.org"; "mapreduce-...@hadoop.apache.org"; hdfs-dev
Sent: Thursday, February 18, 2016 4:35 PM
Subject: Re: Looking to a Hadoop 3 release

Hi all,

Reviving this thread. I've seen renewed interest in a trunk release since
HDFS erasure coding has not yet made it to branch-2. Along with JDK8, the
shell script rewrite, and many other improvements, I think it's time to
revisit Hadoop 3.0 release plans.

My overall plan is still the same as in my original email: a series of
regular alpha releases leading up to beta and GA. Alpha releases make it
easier for downstreams to integrate with our code, and making them regular
means features can be included when they are ready.

I know there are some incompatible changes waiting in the wings
(e.g. HDFS-6984 making FileStatus a PB rather than Writable, some of
HADOOP-9991 bumping dependency versions) that would be good to get in. If
you have changes like this, please set the target version to 3.0.0 and mark
them "Incompatible". We can use this JIRA query to track:

https://issues.apache.org/jira/issues/?jql=project%20in%20(HADOOP%2C%20HDFS%2C%20YARN%2C%20MAPREDUCE)%20and%20%22Target%20Version%2Fs%22%20%3D%20%223.0.0%22%20and%20resolution%3D%22unresolved%22%20and%20%22Hadoop%20Flags%22%3D%22Incompatible%20change%22%20order%20by%20priority

There's some release-related stuff that needs to be sorted out (namely, the
new CHANGES.txt and release note generation from Yetus), but I'd
tentatively like to roll the first alpha a month out, so third week of
March.

Best,
Andrew

Re: Looking to a Hadoop 3 release

2016-02-18 Thread Andrew Wang

Hi all,

Reviving this thread. I've seen renewed interest in a trunk release since
HDFS erasure coding has not yet made it to branch-2. Along with JDK8, the
shell script rewrite, and many other improvements, I think it's time to
revisit Hadoop 3.0 release plans.

My overall plan is still the same as in my original email: a series of
regular alpha releases leading up to beta and GA. Alpha releases make it
easier for downstreams to integrate with our code, and making them regular
means features can be included when they are ready.

I know there are some incompatible changes waiting in the wings
(e.g. HDFS-6984 making FileStatus a PB rather than Writable, some of
HADOOP-9991 bumping dependency versions) that would be good to get in. If
you have changes like this, please set the target version to 3.0.0 and mark
them "Incompatible". We can use this JIRA query to track:

https://issues.apache.org/jira/issues/?jql=project%20in%20(HADOOP%2C%20HDFS%2C%20YARN%2C%20MAPREDUCE)%20and%20%22Target%20Version%2Fs%22%20%3D%20%223.0.0%22%20and%20resolution%3D%22unresolved%22%20and%20%22Hadoop%20Flags%22%3D%22Incompatible%20change%22%20order%20by%20priority
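Decoded, the link above is an ordinary JQL search. The following is a minimal sketch using only Python's standard library (the helper name jira_search_url is hypothetical) that rebuilds the same URL from readable JQL, which makes the query easier to read or adjust, for example to change the target version.

```python
from urllib.parse import quote, unquote

JIRA_SEARCH = "https://issues.apache.org/jira/issues/?jql="

# The query from the link, written out as readable JQL.
JQL = ('project in (HADOOP, HDFS, YARN, MAPREDUCE) and "Target Version/s" = "3.0.0" '
       'and resolution="unresolved" and "Hadoop Flags"="Incompatible change" '
       'order by priority')


def jira_search_url(jql: str) -> str:
    # Percent-encode everything except parentheses, matching the link above
    # (spaces -> %20, commas -> %2C, quotes -> %22, "/" -> %2F, "=" -> %3D).
    return JIRA_SEARCH + quote(jql, safe="()")


if __name__ == "__main__":
    print(jira_search_url(JQL))
    # Going the other way recovers the readable query from an existing link.
    print(unquote(jira_search_url(JQL)[len(JIRA_SEARCH):]))
```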

There's some release-related stuff that needs to be sorted out (namely, the
new CHANGES.txt and release note generation from Yetus), but I'd
tentatively like to roll the first alpha a month out, so third week of
March.

Best,
Andrew

On Mon, Mar 9, 2015 at 7:23 PM, Raymie Stata  wrote:

> Avoiding the use of JDK8 language features (and, presumably, APIs)
> means you've abandoned #1, i.e., you haven't (really) bumped the JDK
> source version to JDK8.
>
> Also, note that releasing from trunk is a way of achieving #3, it's
> not a way of abandoning it.

Re: Looking to a Hadoop 3 release

2015-03-09 Thread Andrew Wang

Hi Raymie,

Konst proposed just releasing off of trunk rather than cutting a branch-3,
and there was general agreement there. So, consider #3 abandoned. 1&2 can
be achieved at the same time; we just need to avoid using JDK8 language
features in trunk so things can be backported.
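A small, hypothetical example shows what "avoid using JDK8 language features" means in practice. The class and method names are illustrative and do not come from the Hadoop codebase. The anonymous `Comparator` compiles under both `javac -source 1.7` and `-source 1.8`, so a trunk patch written this way would cherry-pick unchanged to branch-2. The JDK8 form shown in the comment would not compile with `-source 1.7`.

Library APIs are a separate matter, as Raymie's "(and, presumably, APIs)" point suggests. Running `-source 1.7` on a JDK8 compiler does not reject calls to JDK8-only classes unless you also compile against a JDK7 bootclasspath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class Jdk7CompatibleSort {
    // Java 7 compatible: an anonymous Comparator instead of a lambda, so the
    // same source compiles with -source 1.7 (branch-2) and -source 1.8 (trunk).
    static final Comparator<String> BY_LENGTH_THEN_NAME = new Comparator<String>() {
        @Override
        public int compare(String a, String b) {
            // Integer.compare has existed since Java 7.
            int byLength = Integer.compare(a.length(), b.length());
            return byLength != 0 ? byLength : a.compareTo(b);
        }
    };

    // The JDK8-only equivalent would be:
    //   Comparator.comparingInt(String::length).thenComparing(Comparator.naturalOrder())
    // which needs both a JDK8 language feature (method reference) and JDK8 APIs.

    static List<String> sortedCopy(List<String> input) {
        // The diamond operator is fine: it is a Java 7 language feature.
        List<String> copy = new ArrayList<>(input);
        Collections.sort(copy, BY_LENGTH_THEN_NAME);
        return copy;
    }

    public static void main(String[] args) {
        // Prints [hdfs, yarn, common, mapreduce]
        System.out.println(sortedCopy(Arrays.asList("yarn", "hdfs", "mapreduce", "common")));
    }
}
```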

Best,
Andrew

On Mon, Mar 9, 2015 at 7:01 PM, Raymie Stata  wrote:

> In this (and the related threads), I see the following three requirements:
>
> 1. "Bump the source JDK version to JDK8" (i.e., drop JDK7 support).
>
> 2. "We'll still be releasing 2.x releases for a while, with similar
> feature sets as 3.x."
>
> 3. Avoid the "risk of split-brain behavior" by "minimize backporting
> headaches. Pulling trunk > branch-2 > branch-2.x is already tedious.
> Adding a branch-3, branch-3.x would be obnoxious."
>
> These three cannot be achieved at the same time.  Which do we abandon?
>
>
> On Mon, Mar 9, 2015 at 12:45 PM, sanjay Radia wrote:
> >
> >> On Mar 5, 2015, at 3:21 PM, Siddharth Seth  wrote:
> >>
> >> 2) Simplification of configs - potentially separating client side configs
> >> and those used by daemons. This is another source of perpetual confusion
> >> for users.
> > +1 on this.
> >
> > sanjay
>

Re: Looking to a Hadoop 3 release

2015-03-09 Thread sanjay Radia


> On Mar 5, 2015, at 3:21 PM, Siddharth Seth  wrote:
> 
> 2) Simplification of configs - potentially separating client side configs
> and those used by daemons. This is another source of perpetual confusion
> for users.
+1 on this.

sanjay

Re: Looking to a Hadoop 3 release

2015-03-09 Thread Vinod Kumar Vavilapalli

On Mar 6, 2015, at 5:20 PM, Chris Douglas  wrote:

> On Fri, Mar 6, 2015 at 4:32 PM, Vinod Kumar Vavilapalli
>  wrote:
>> I'd encourage everyone to post their wish list on the Roadmap wiki that 
>> *warrants* making incompatible changes forcing us to go 3.x.
> 
> This is a useful exercise, but not a prerequisite to releasing 3.0.0
> as an alpha off of trunk, right? Andrew summarized the operating
> assumptions for anyone working on it: rolling upgrades still work,
> wire compat is preserved, breaking changes may get rolled back when
> branch-3 is in beta (so be very conservative, notify others loudly).
> This applies to branches merged to trunk, also.

Not a prerequisite for alpha releases, yes. But it will be for a 'GA' release,
because after that we will be back to restricting incompatible changes on the
3.x line, and we will have to say no to features that need API breakage. If
others feel there are features that warrant incompatibility, we should hear
about them for inclusion in such a 3.x release. Until now, the operating
assumption was to not break anything as much as possible. If we are opening the
window on incompatibilities in 3.x, we might as well get everyone to think about
stuff that they want.

>> +1 to Jason's comments in general. We can keep rolling alphas that 
>> downstream can pick up, but I'd also like us to clarify the exit criterion 
>> for a GA release of 3.0 and its relation to the life of 2.x if we are going 
>> this route. This brings us back to the roadmap discussion, and a collective 
>> agreement about a logical step at a future point in time where we say we 
>> have enough incompatible features in 3.x that we can stop putting more of 
>> them and start stabilizing it.
> 
> We'll have this discussion again. We don't need to reach consensus on
> the roadmap, just that each artifact reflects the output of the
> project.

Agreed. I wasn't requesting us to reach a consensus on the roadmap. Just 
requesting others to put their wish list up.

>> Irrespective of that, here is my proposal in the interim:
>> - Run JDK7 + JDK8 first in a compatible manner like I mentioned before for 
>> at least two releases in branch-2: say 2.8 and 2.9 before we consider taking 
>> up the gauntlet on 3.0.
>> - Continue working on the classpath isolation effort and try making it as 
>> compatible as is possible for users to opt in and migrate easily.
> 
> +1 for 2.x, but again I don't understand the sequencing. -C

There isn't any sequencing. I was saying "Irrespective of that."

Thanks,
+Vinod

Re: Looking to a Hadoop 3 release

2015-03-07 Thread Eric Yang

> >> breaking compatibility against the costs of migrating.  Each time we
> >> break compatibility we create a hurdle for people to jump when they move to
> >> the new release, and we should make those hurdles worth their time.  For
> >> example, wire-compatibility has been mentioned as part of this.  Any
> >> feature that breaks wire compatibility better be absolutely amazing, as it
> >> creates a huge hurdle for people to jump.
> >> To summarize: +1 for a community-discussed roadmap of what we're
> >> breaking in Hadoop 3 and why it's worth it for users
> >> -1 for creating branch-3 now, we can release from trunk until the next
> >> incompatibility for Hadoop 4 arrives
> >> +1 for baking classpath isolation as opt-in on 2.x and eventually
> >> default on in 3.0
> >> Jason
> >>  From: Andrew Wang
> >> To: "hdfs-dev@hadoop.apache.org"
> >> Cc: "common-...@hadoop.apache.org"; "mapreduce-...@hadoop.apache.org";
> >> "yarn-...@hadoop.apache.org"
> >> Sent: Wednesday, March 4, 2015 12:15 PM
> >> Subject: Re: Looking to a Hadoop 3 release
> >>
> >> Let's not dismiss this quite so handily.
> >>
> >> Sean, Jason, and Stack replied on HADOOP-11656 pointing out that while we
> >> could make classpath isolation opt-in via configuration, what we really
> >> want longer term is to have it on by default (or just always on). Stack in
> >> particular points out the practical difficulties in using an opt-in method
> >> in 2.x from a downstream project perspective. It's not pretty.
> >>
> >> The plan that both Sean and Jason propose (which I support) is to have an
> >> opt-in solution in 2.x, bake it there, then turn it on by default
> >> (incompatible) in a new major release. I think this lines up well with my
> >> proposal of some alphas and betas leading up to a GA 3.x. I'm also willing
> >> to help with 2.x release management if that would help with testing this
> >> feature.
> >>
> >> Even setting aside classpath isolation, a new major release is still
> >> justified by JDK8. Somehow this is being ignored in the discussion. Allen,
> >> historically the voice of the user in our community, just highlighted it as
> >> a major compatibility issue, and Tucu and I have also expressed our
> >> very strong concerns about bumping this in a minor release. 2.7's bump is a
> >> unique exception, but this is not something to be cited as precedent or
> >> policy.
> >>
> >> Where does this resistance to a new major release stem from? As I've
> >> described from the beginning, this will look basically like a 2.x release,
> >> except for the inclusion of classpath isolation by default and target
> >> version JDK8. I've expressed my desire to maintain API and wire
> >> compatibility, and we can audit the set of incompatible changes in trunk to
> >> ensure this. My proposal for doing alpha and beta releases leading up to GA
> >> also gives downstreams a nice amount of time for testing and validation.
> >>
> >> Regards,
> >> Andrew
> >>
> >>
> >>
> >> On Tue, Mar 3, 2015 at 2:32 PM, Arun Murthy wrote:
> >>
> >>> Awesome, looks like we can just do this in a compatible manner - nothing
> >>> else on the list seems like it warrants a (premature) major release.
> >>>
> >>> Thanks Vinod.
> >>>
> >>> Arun
> >>>
> >>>
> >>> From: Vinod Kumar Vavilapalli
> >>> Sent: Tuesday, March 03, 2015 2:30 PM
> >>> To: common-...@hadoop.apache.org
> >>> Cc: hdfs-dev@hadoop.apache.org; mapreduce-...@hadoop.apache.org;
> >>> yarn-...@hadoop.apache.org
> >>> Subject: Re: Looking to a Hadoop 3 release
> >>>
> >>> I started pitching in more on that JIRA.
> >>>
> >>> To add, I think we can and should strive for doing this in a compatible
> >>> manner, whatever the approach. Marking and calling it incompatible before
> >>> we see a proposal/patch seems premature to me. Commented the same on JIRA:
> >>> https://issues.apache.org/jira/browse/HADOOP-11656?focusedCommentId=14345875&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14345875
> >>>
> >>> Thanks
> >>> +Vinod
> >>>
> >>> On Mar 2, 2015, at 8:08 PM, Andrew Wang <andrew.w...@cloudera.com> wrote:
> >>>
> >>> Regarding classpath isolation, based on what I hear from our customers,
> >>> it's still a big problem (even after the MR classloader work). The latest
> >>> Jackson version bump was quite painful for our downstream projects, and the
> >>> HDFS client still leaks a lot of dependencies. Would welcome more
> >>> discussion of this on HADOOP-11656; Steve, Colin, and Haohui have already
> >>> chimed in.
> >>>
> >>>
> >>
> >>
> >
>

Re: Looking to a Hadoop 3 release

2015-03-06 Thread Chris Douglas

On Fri, Mar 6, 2015 at 4:32 PM, Vinod Kumar Vavilapalli
 wrote:
> I'd encourage everyone to post their wish list on the Roadmap wiki that 
> *warrants* making incompatible changes forcing us to go 3.x.

This is a useful exercise, but not a prerequisite to releasing 3.0.0
as an alpha off of trunk, right? Andrew summarized the operating
assumptions for anyone working on it: rolling upgrades still work,
wire compat is preserved, breaking changes may get rolled back when
branch-3 is in beta (so be very conservative, notify others loudly).
This applies to branches merged to trunk, also.

> +1 to Jason's comments in general. We can keep rolling alphas that downstream 
> can pick up, but I'd also like us to clarify the exit criterion for a GA 
> release of 3.0 and its relation to the life of 2.x if we are going this 
> route. This brings us back to the roadmap discussion, and a collective 
> agreement about a logical step at a future point in time where we say we have 
> enough incompatible features in 3.x that we can stop putting more of them and 
> start stabilizing it.

We'll have this discussion again. We don't need to reach consensus on
the roadmap, just that each artifact reflects the output of the
project.

> Irrespective of that, here is my proposal in the interim:
>  - Run JDK7 + JDK8 first in a compatible manner like I mentioned before for 
> at least two releases in branch-2: say 2.8 and 2.9 before we consider taking 
> up the gauntlet on 3.0.
>  - Continue working on the classpath isolation effort and try making it as 
> compatible as is possible for users to opt in and migrate easily.

+1 for 2.x, but again I don't understand the sequencing. -C

> On Mar 5, 2015, at 1:44 PM, Jason Lowe  wrote:
>
>> I'm OK with a 3.0.0 release as long as we are minimizing the pain of 
>> maintaining yet another release line and conscious of the incompatibilities 
>> going into that release line.
>> For the former, I would really rather not see a branch-3 cut so soon.  It's 
>> yet another line onto which to cherry-pick, and I don't see why we need to 
>> add this overhead at such an early phase.  We should only create branch-3 
>> when there's an incompatible change that the community wants and it should 
>> _not_ go into the next major release (i.e.: it's for Hadoop 4.0).  We can 
>> develop 3.0 alphas and betas on trunk and release from trunk in the interim. 
>>  IMHO we need to stop treating trunk as a place to exile patches.
>>
>> For the latter, I think as a community we need to evaluate the benefits of 
>> breaking compatibility against the costs of migrating.  Each time we break 
>> compatibility we create a hurdle for people to jump when they move to the 
>> new release, and we should make those hurdles worth their time.  For 
>> example, wire-compatibility has been mentioned as part of this.  Any feature 
>> that breaks wire compatibility better be absolutely amazing, as it creates a 
>> huge hurdle for people to jump.
>> To summarize: +1 for a community-discussed roadmap of what we're breaking in 
>> Hadoop 3 and why it's worth it for users
>> -1 for creating branch-3 now, we can release from trunk until the next 
>> incompatibility for Hadoop 4 arrives
>> +1 for baking classpath isolation as opt-in on 2.x and eventually default on 
>> in 3.0
>> Jason
>>  From: Andrew Wang 
>> To: "hdfs-dev@hadoop.apache.org" 
>> Cc: "common-...@hadoop.apache.org"; "mapreduce-...@hadoop.apache.org";
>> "yarn-...@hadoop.apache.org"
>> Sent: Wednesday, March 4, 2015 12:15 PM
>> Subject: Re: Looking to a Hadoop 3 release
>>
>> Let's not dismiss this quite so handily.
>>
>> Sean, Jason, and Stack replied on HADOOP-11656 pointing out that while we
>> could make classpath isolation opt-in via configuration, what we really
>> want longer term is to have it on by default (or just always on). Stack in
>> particular points out the practical difficulties in using an opt-in method
>> in 2.x from a downstream project perspective. It's not pretty.
>>
>> The plan that both Sean and Jason propose (which I support) is to have an
>> opt-in solution in 2.x, bake it there, then turn it on by default
>> (incompatible) in a new major release. I think this lines up well with my
>> proposal of some alphas and betas leading up to a GA 3.x. I'm also willing
>> to help with 2.x release management if that would help with testing this
>> feature.
>>
>> Even setting aside classpath isolation, a new major release is still
>> justified by JDK8. Somehow this is being ignored in the

Re: Looking to a Hadoop 3 release

2015-03-06 Thread Andrew Wang

Hey Vinod,

I'm roughly okay with that plan. One question, though: why gate JDK8 on
2.8 and 2.9? Based on the status of HADOOP-11090, it sounds like branch-2
already runs okay on JDK8. Our past experience moving from JDK6 to JDK7 was
also very smooth except for JUnit test ordering.

As an additional datapoint, Cloudera has already validated CDH5 on JDK8 and
supports it as a runtime:

http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cdh_ig_req_supported_versions.html?scroll=concept_pdd_kzf_vp_unique_1

Best,
Andrew

On Fri, Mar 6, 2015 at 4:32 PM, Vinod Kumar Vavilapalli
<vino...@hortonworks.com> wrote:

> I'd encourage everyone to post their wish list on the Roadmap wiki that
> *warrants* making incompatible changes forcing us to go 3.x.
>
> +1 to Jason's comments in general. We can keep rolling alphas that
> downstream can pick up, but I'd also like us to clarify the exit criterion
> for a GA release of 3.0 and its relation to the life of 2.x if we are going
> this route. This brings us back to the roadmap discussion, and a collective
> agreement about a logical step at a future point in time where we say we
> have enough incompatible features in 3.x that we can stop putting more of
> them and start stabilizing it.
>
> Irrespective of that, here is my proposal in the interim:
>  - Run JDK7 + JDK8 first in a compatible manner like I mentioned before
> for at least two releases in branch-2: say 2.8 and 2.9 before we consider
> taking up the gauntlet on 3.0.
>  - Continue working on the classpath isolation effort and try making it as
> compatible as is possible for users to opt in and migrate easily.
>
> Thanks,
> +Vinod
>
> On Mar 5, 2015, at 1:44 PM, Jason Lowe wrote:
>
> > I'm OK with a 3.0.0 release as long as we are minimizing the pain of
> > maintaining yet another release line and conscious of the incompatibilities
> > going into that release line.
> > For the former, I would really rather not see a branch-3 cut so soon.
> > It's yet another line onto which to cherry-pick, and I don't see why we
> > need to add this overhead at such an early phase.  We should only create
> > branch-3 when there's an incompatible change that the community wants and
> > it should _not_ go into the next major release (i.e.: it's for Hadoop
> > 4.0).  We can develop 3.0 alphas and betas on trunk and release from trunk
> > in the interim.  IMHO we need to stop treating trunk as a place to exile
> > patches.
> >
> > For the latter, I think as a community we need to evaluate the benefits
> > of breaking compatibility against the costs of migrating.  Each time we
> > break compatibility we create a hurdle for people to jump when they move to
> > the new release, and we should make those hurdles worth their time.  For
> > example, wire-compatibility has been mentioned as part of this.  Any
> > feature that breaks wire compatibility better be absolutely amazing, as it
> > creates a huge hurdle for people to jump.
> > To summarize: +1 for a community-discussed roadmap of what we're breaking
> > in Hadoop 3 and why it's worth it for users
> > -1 for creating branch-3 now, we can release from trunk until the next
> > incompatibility for Hadoop 4 arrives
> > +1 for baking classpath isolation as opt-in on 2.x and eventually
> > default on in 3.0
> > Jason
> >  From: Andrew Wang
> > To: "hdfs-dev@hadoop.apache.org"
> > Cc: "common-...@hadoop.apache.org"; "mapreduce-...@hadoop.apache.org";
> > "yarn-...@hadoop.apache.org"
> > Sent: Wednesday, March 4, 2015 12:15 PM
> > Subject: Re: Looking to a Hadoop 3 release
> >
> > Let's not dismiss this quite so handily.
> >
> > Sean, Jason, and Stack replied on HADOOP-11656 pointing out that while we
> > could make classpath isolation opt-in via configuration, what we really
> > want longer term is to have it on by default (or just always on). Stack in
> > particular points out the practical difficulties in using an opt-in method
> > in 2.x from a downstream project perspective. It's not pretty.
> >
> > The plan that both Sean and Jason propose (which I support) is to have an
> > opt-in solution in 2.x, bake it there, then turn it on by default
> > (incompatible) in a new major release. I think this lines up well with my
> > proposal of some alphas and betas leading up to a GA 3.x. I'm also willing
> > to help with 2.x release management if that would help with testing this
> > feature.
> >
> > Even setting aside classpath isolation, a new major release is still
> > justified by JDK8. Somehow this is being ignored in the discussion. Allen,
> >

Re: Looking to a Hadoop 3 release

2015-03-06 Thread Vinod Kumar Vavilapalli

I'd encourage everyone to post their wish list on the Roadmap wiki that 
*warrants* making incompatible changes forcing us to go 3.x.

+1 to Jason's comments in general. We can keep rolling alphas that downstream 
can pick up, but I'd also like us to clarify the exit criterion for a GA 
release of 3.0 and its relation to the life of 2.x if we are going this route. 
This brings us back to the roadmap discussion, and a collective agreement about 
a logical step at a future point in time where we say we have enough 
incompatible features in 3.x that we can stop putting more of them and start 
stabilizing it.

Irrespective of that, here is my proposal in the interim:
 - Run JDK7 + JDK8 first in a compatible manner like I mentioned before for 
at least two releases in branch-2: say 2.8 and 2.9 before we consider taking up 
the gauntlet on 3.0.
 - Continue working on the classpath isolation effort and try making it as 
compatible as is possible for users to opt in and migrate easily.

Thanks,
+Vinod

On Mar 5, 2015, at 1:44 PM, Jason Lowe  wrote:

> I'm OK with a 3.0.0 release as long as we are minimizing the pain of 
> maintaining yet another release line and conscious of the incompatibilities 
> going into that release line.
> For the former, I would really rather not see a branch-3 cut so soon.  It's 
> yet another line onto which to cherry-pick, and I don't see why we need to 
> add this overhead at such an early phase.  We should only create branch-3 
> when there's an incompatible change that the community wants and it should 
> _not_ go into the next major release (i.e.: it's for Hadoop 4.0).  We can 
> develop 3.0 alphas and betas on trunk and release from trunk in the interim.  
> IMHO we need to stop treating trunk as a place to exile patches.
> 
> For the latter, I think as a community we need to evaluate the benefits of 
> breaking compatibility against the costs of migrating.  Each time we break 
> compatibility we create a hurdle for people to jump when they move to the new 
> release, and we should make those hurdles worth their time.  For example, 
> wire-compatibility has been mentioned as part of this.  Any feature that 
> breaks wire compatibility better be absolutely amazing, as it creates a huge 
> hurdle for people to jump.
> To summarize: +1 for a community-discussed roadmap of what we're breaking in 
> Hadoop 3 and why it's worth it for users
> -1 for creating branch-3 now, we can release from trunk until the next 
> incompatibility for Hadoop 4 arrives
> +1 for baking classpath isolation as opt-in on 2.x and eventually default on 
> in 3.0
> Jason
>  From: Andrew Wang 
> To: "hdfs-dev@hadoop.apache.org"  
> Cc: "common-...@hadoop.apache.org"; "mapreduce-...@hadoop.apache.org";
> "yarn-...@hadoop.apache.org"
> Sent: Wednesday, March 4, 2015 12:15 PM
> Subject: Re: Looking to a Hadoop 3 release
> 
> Let's not dismiss this quite so handily.
> 
> Sean, Jason, and Stack replied on HADOOP-11656 pointing out that while we
> could make classpath isolation opt-in via configuration, what we really
> want longer term is to have it on by default (or just always on). Stack in
> particular points out the practical difficulties in using an opt-in method
> in 2.x from a downstream project perspective. It's not pretty.
> 
> The plan that both Sean and Jason propose (which I support) is to have an
> opt-in solution in 2.x, bake it there, then turn it on by default
> (incompatible) in a new major release. I think this lines up well with my
> proposal of some alphas and betas leading up to a GA 3.x. I'm also willing
> to help with 2.x release management if that would help with testing this
> feature.
> 
> Even setting aside classpath isolation, a new major release is still
> justified by JDK8. Somehow this is being ignored in the discussion. Allen,
> historically the voice of the user in our community, just highlighted it as
> a major compatibility issue, and Tucu and I have also expressed our
> very strong concerns about bumping this in a minor release. 2.7's bump is a
> unique exception, but this is not something to be cited as precedent or
> policy.
> 
> Where does this resistance to a new major release stem from? As I've
> described from the beginning, this will look basically like a 2.x release,
> except for the inclusion of classpath isolation by default and target
> version JDK8. I've expressed my desire to maintain API and wire
> compatibility, and we can audit the set of incompatible changes in trunk to
> ensure this. My proposal for doing alpha and beta releases leading up to GA
> also gives downstreams a nice amount of time for testing and validation.
> 
> Regards,
>

Re: Looking to a Hadoop 3 release

2015-03-06 Thread Vinod Kumar Vavilapalli

Yes, these are the kind of enhancements that need to be proposed and discussed 
for inclusion!

Thanks,
+Vinod

On Mar 5, 2015, at 3:21 PM, Siddharth Seth  wrote:


> Some features that come to mind immediately would be:
> 1) enhancements to the RPC mechanics - specifically support for async RPC /
> two-way communication. There are a lot of places where we re-use heartbeats
> to send more information than we would if the RPC layer supported these
> features. Some of this can be done in a manner compatible with the
> existing RPC sub-system. Others, like two-way communication, probably
> cannot. After this, HDFS/YARN would need to actually make use of these
> changes. The other consideration is adoption of an alternate system like
> gRPC, which would be incompatible.
> 2) Simplification of configs - potentially separating client side configs
> and those used by daemons. This is another source of perpetual confusion
> for users.
> 
> Thanks
> - Sid
> 
> 
> On Thu, Mar 5, 2015 at 2:46 PM, Steve Loughran 
> wrote:
> 
>> Sorry, Outlook dequoted Alejandro's comments.
>> 
>> Let me try again with his comments in italic and proofreading of mine
>> 
>> On 05/03/2015 13:59, "Steve Loughran" <ste...@hortonworks.com> wrote:
>> 
>> 
>> 
>> On 05/03/2015 13:05, "Alejandro Abdelnur" <tuc...@gmail.com> wrote:
>> 
>> IMO, if part of the community wants to take on the responsibility and work
>> it takes to do a new major release, we should not discourage them from
>> doing that.
>> 
>> Having multiple major branches active is a standard practice.
>> 
>> Looking @ 2.x, the major work (HDFS HA, YARN) meant that it did take a
>> long time to get out, and during that time 0.21 and 0.22 got released and
>> ignored; 0.23 was picked up and used in production.
>> 
>> The 2.0.4-alpha release was more of a trouble spot as it got picked up
>> widely enough to be used in products, and changes were made between that
>> alpha & 2.2 itself which raised compatibility issues.
>> 
>> For 3.x I'd propose
>> 
>> 
>>  1.  Have less longevity of 3.x alpha/beta artifacts
>>  2.  Make clear there are no guarantees of compatibility from alpha/beta
>> releases to shipping. Best effort, but not to the extent that it gets in
>> the way. More succinctly: we will care more about seamless migration from
>> 2.2+ to 3.x than from a 3.0-alpha to 3.3 production.
>>  3.  Anybody who ships code based on 3.x alpha/beta to recognise and
>> accept policy (2). Hadoop's "instability guarantee" for the 3.x alpha/beta
>> phase
>> 
>> As well as backwards compatibility, we need to think about forwards
>> compatibility, with the goal being:
>> 
>> Any app written/shipped with the 3.x release binaries (JAR and native)
>> will work in and against a 3.y Hadoop cluster, for all x, y in Natural
>> where y>=x  and is-release(x) and is-release(y)
>> 
>> That's important, as it means all server-side changes in 3.x which are
>> expected to mandate client-side updates (protocols, HDFS erasure
>> decoding, security features) must be considered complete and stable before
>> we can say is-release(x). In an ideal world, we'll even get the semantics
>> right with tests to show this.
>> 
>> Fixing classpath hell downstream is certainly one feature I am +1 on. But:
>> it's only one of the features, and given there's not any design doc on that
>> JIRA, way too immature to set a release schedule on. An alpha schedule with
>> no-guarantees and a regular alpha roll, could be viable, as new features go
>> in and can then be used to experimentally try this stuff in branches of
>> HBase (well volunteered, Stack!), etc. Of course instability guarantees
>> will be transitive downstream.
>> 
>> 
>> This time around we are not replacing the guts as we did from Hadoop 1 to
>> Hadoop 2, but rather doing superficial surgery to address issues that were
>> not considered (or were too much to take on top of the guts transplant).
>> 
>> For the split brain concern, we did a great job maintaining Hadoop 1 and
>> Hadoop 2 until Hadoop 1 faded away.
>> 
>> And a significant argument about 2.0.4-alpha to 2.2 protobuf/HDFS
>> compatibility.
>> 
>> 
>> Based on that experience I would say that the coexistence of Hadoop 2 and
>> Hadoop 3 will be much less demanding/traumatic.
>> 
>> The re-layout of all the source trees was a major change there. Assuming
>> there's no refactoring or switch of build tools, picking things back
>> will be tractable.
>> 
>> 
>> Also, to facilitate the coexistence we should limit Java language features
>> to Java 7 (even if the runtime is Java 8); once Java 7 is no longer used,
>> we can remove this limitation.
>> 
>> +1; setting javac.version will fix this
>> 
>> What is nice about having java 8 as the base JVM is that it means you can
>> be confident that all Hadoop 3 servers will be JDK8+, so downstream apps
>> and libs can use all Java 8 features they want to.
>> 
>> There's one policy change to consider there which is possibly, just
>> possibly, we could allow new modules

Re: Looking to a Hadoop 3 release

2015-03-06 Thread Andrew Wang

Since these dependency bumps are very disruptive to downstreams, I want to
predicate upgrading our deps on having classpath isolation on. I think
that's what Tucu was getting at.
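Classpath isolation is what would let Hadoop upgrade Jetty or Jackson without breaking downstream apps. The sketch below illustrates the general idea only; it is not Hadoop's implementation. It is similar in spirit to the MR classloader work mentioned in this thread. The class name and the parent-first package list are assumptions.

A "child-first" class loader resolves classes from the application's own jars before delegating to the framework's loader. That way, each side can carry its own dependency versions.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical sketch of a child-first ("parent-last") class loader.
public class ChildFirstClassLoader extends URLClassLoader {
    // Classes under these prefixes always come from the parent, so the app and
    // the framework agree on shared types. This exact list is an assumption.
    private static final String[] PARENT_FIRST_PREFIXES = {
        "java.", "javax.", "org.apache.hadoop."
    };

    public ChildFirstClassLoader(URL[] appJars, ClassLoader parent) {
        super(appJars, parent);
    }

    static boolean isParentFirst(String name) {
        for (String prefix : PARENT_FIRST_PREFIXES) {
            if (name.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null && !isParentFirst(name)) {
                try {
                    // Child first: look in the application's own jars.
                    c = findClass(name);
                } catch (ClassNotFoundException e) {
                    // Not bundled by the app; fall back to the parent below.
                }
            }
            if (c == null) {
                // Parent-first classes, or classes the app does not bundle:
                // standard delegation to the parent loader.
                c = super.loadClass(name, false);
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }
}
```

Suppose an app passes a jar in `appJars` that bundles, say, a newer Jackson than the framework uses. Inside this loader, the app's copy would shadow the framework's copy, while `java.*` and Hadoop classes would still come from the parent. Alejandro's caveat about an ON/OFF switch applies here. With isolation off, both versions end up competing on a single classpath again.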

Best,
Andrew

On Fri, Mar 6, 2015 at 8:01 AM, Allen Wittenauer  wrote:

>
> Right, but that doesn't really answer the question….
>
> On Mar 5, 2015, at 10:23 PM, Alejandro Abdelnur  wrote:
>
> > If classloader isolation is in place, then dependency versions can freely
> > be upgraded as they won't pollute the apps' space (things get trickier if
> > there is an ON/OFF switch).
> >
> > On Thu, Mar 5, 2015 at 9:21 PM, Allen Wittenauer wrote:
> >
> >>
> >> Is there going to be a general upgrade of dependencies?  I'm thinking of
> >> jetty & jackson in particular.
> >>
> >> On Mar 5, 2015, at 5:24 PM, Andrew Wang wrote:
> >>
> >>> I've taken the liberty of adding a Hadoop 3 section to the Roadmap wiki
> >>> page. In addition to the two things I've been pushing, I also looked
> >>> through Allen's list (thanks Allen for making this) and picked out the
> >>> shell script rewrite and the removal of HFTP as big changes. This would
> >>> be the place to propose features for inclusion in 3.x; I'd particularly
> >>> appreciate help on the YARN/MR side.
> >>>
> >>> Based on what I'm hearing, let me modulate my proposal to the following:
> >>>
> >>> - We avoid cutting branch-3, and release off of trunk. The trunk-only
> >>> changes don't look that scary, so I think this is fine. This does mean we
> >>> need to be more rigorous before merging branches to trunk. I think
> >>> Vinod/Giri's work on getting test-patch.sh runs on non-trunk branches
> >>> would be very helpful in this regard.
> >>> - We do not include anything to break wire compatibility unless (as Jason
> >>> says) it's an unbelievably awesome feature.
> >>> - No harm in rolling alphas from trunk, as it doesn't lock us to anything
> >>> compatibility-wise. Downstreams like releases.
> >>>
> >>> I'll take Steve's advice about not locking GA to a given date, but I also
> >>> share his belief that we can alpha/beta/GA faster than it took for
> >>> Hadoop 2. Let's roll some intermediate releases, work on the roadmap
> >>> items, and see how we're feeling in a few months.
> >>>
> >>> Best,
> >>> Andrew
> >>>
> >>> On Thu, Mar 5, 2015 at 3:21 PM, Siddharth Seth wrote:
> >>>
> >>>> I think it'll be useful to have a discussion about what else people
> >>>> would like to see in Hadoop 3.x - especially if the change is
> >>>> potentially incompatible. Also, what we expect the release schedule to
> >>>> be for major releases and what triggers them - JVM version, major
> >>>> features, the need for incompatible changes? Assuming major versions
> >>>> will not be released every 6 months/1 year (adoption time, fairly
> >>>> disruptive for downstream projects, and users), considering additional
> >>>> features/incompatible changes for 3.x would be useful.
> >>>>
> >>>> Some features that come to mind immediately would be:
> >>>> 1) enhancements to the RPC mechanics - specifically support for async
> >>>> RPC / two-way communication. There are a lot of places where we re-use
> >>>> heartbeats to send more information than we would if the RPC layer
> >>>> supported these features. Some of this can be done in a manner
> >>>> compatible with the existing RPC sub-system. Others, like two-way
> >>>> communication, probably cannot. After this, HDFS/YARN would need to
> >>>> actually make use of these changes. The other consideration is adoption
> >>>> of an alternate system like gRPC, which would be incompatible.
> >>>> 2) Simplification of configs - potentially separating client side
> >>>> configs and those used by daemons. This is another source of perpetual
> >>>> confusion for users.
> >>>>
> >>>> Thanks
> >>>> - Sid
> >>>>
> >>>>
> >>>> On Thu, Mar 5, 2015 at 2:46 PM, Steve Loughran <ste...@hortonworks.com>
> >>>> wrote:
> >>>>
> >>>>> Sorry, Outlook dequoted Alejandro's comments.
> >>>>>
> >>>>> Let me try again with his comments in italic and proofreading of mine
> >>>>>
> >>>>> On 05/03/2015 13:59, "Steve Loughran" <ste...@hortonworks.com> wrote:
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 05/03/2015 13:05, "Alejandro Abdelnur" <tuc...@gmail.com> wrote:
> >>>>>
> >>>>> IMO, if part of the community wants to take on the responsibility and
> >>>>> work it takes to do a new major release, we should not discourage them
> >>>>> from doing that.
> >>>>>
> >>>>> Having multiple major branches active is a standard practice.
> >>>>>
> >>>>> Looking @ 2.x, the major work (HDFS HA, YARN) meant that it did take a
> >>>>> long time to get out, and during that time 0.21 and 0.22 got released
> >>>>> and ignored; 0.23 was picked up and used in production.
> >>>>>
> >>>>> The 2.0.4-alpha release was more of a trouble spot as it got picked up
> >>>>> widely enough to be used in products, a

Re: Looking to a Hadoop 3 release

2015-03-06 Thread Allen Wittenauer


Right, but that doesn't really answer the question….

On Mar 5, 2015, at 10:23 PM, Alejandro Abdelnur wrote:

> If classloader isolation is in place, then dependency versions can freely
> be upgraded, as they won't pollute the apps' space (things get trickier if
> there is an ON/OFF switch).
> 
> On Thu, Mar 5, 2015 at 9:21 PM, Allen Wittenauer wrote:
>
>>
>> Is there going to be a general upgrade of dependencies? I'm thinking of
>> Jetty & Jackson in particular.
>>
>> On Mar 5, 2015, at 5:24 PM, Andrew Wang wrote:
>>
>>> I've taken the liberty of adding a Hadoop 3 section to the Roadmap wiki
>>> page. In addition to the two things I've been pushing, I also looked
>>> through Allen's list (thanks Allen for making this) and picked out the
>>> shell script rewrite and the removal of HFTP as big changes. This would be
>>> the place to propose features for inclusion in 3.x; I'd particularly
>>> appreciate help on the YARN/MR side.
>>>
>>> Based on what I'm hearing, let me modulate my proposal to the following:
>>>
>>> - We avoid cutting branch-3, and release off of trunk. The trunk-only
>>> changes don't look that scary, so I think this is fine. This does mean we
>>> need to be more rigorous before merging branches to trunk. I think
>>> Vinod/Giri's work on getting test-patch.sh runs on non-trunk branches would
>>> be very helpful in this regard.
>>> - We do not include anything to break wire compatibility unless (as Jason
>>> says) it's an unbelievably awesome feature.
>>> - No harm in rolling alphas from trunk, as it doesn't lock us to anything
>>> compatibility-wise. Downstreams like releases.
>>>
>>> I'll take Steve's advice about not locking GA to a given date, but I also
>>> share his belief that we can alpha/beta/GA faster than it took for Hadoop
>>> 2. Let's roll some intermediate releases, work on the roadmap items, and
>>> see how we're feeling in a few months.
>>>
>>> Best,
>>> Andrew
>>>
>>> On Thu, Mar 5, 2015 at 3:21 PM, Siddharth Seth wrote:
>>>
>>>> I think it'll be useful to have a discussion about what else people would
>>>> like to see in Hadoop 3.x - especially if the change is potentially
>>>> incompatible. Also, what we expect the release schedule to be for major
>>>> releases and what triggers them - JVM version, major features, the need for
>>>> incompatible changes? Assuming major versions will not be released every 6
>>>> months/1 year (adoption time, fairly disruptive for downstream projects,
>>>> and users) - considering additional features/incompatible changes for 3.x
>>>> would be useful.
>>>>
>>>> Some features that come to mind immediately would be:
>>>> 1) Enhancements to the RPC mechanics - specifically support for async RPC /
>>>> two-way communication. There are a lot of places where we re-use heartbeats
>>>> to send more information than we would if the RPC layer supported these
>>>> features. Some of this can be done in a manner compatible with the
>>>> existing RPC subsystem. Others, like two-way communication, probably cannot.
>>>> After this, HDFS/YARN would need to actually make use of these changes. The
>>>> other consideration is adoption of an alternate system like gRPC, which
>>>> would be incompatible.
>>>> 2) Simplification of configs - potentially separating client-side configs
>>>> from those used by daemons. This is another source of perpetual confusion
>>>> for users.
>>>>
>>>> Thanks
>>>> - Sid
>>>>
>>>>
>>>> On Thu, Mar 5, 2015 at 2:46 PM, Steve Loughran
>>>> wrote:
>>>>
>>>>> Sorry, Outlook dequoted Alejandro's comments.
>>>>>
>>>>> Let me try again with his comments in italic and proofreading of mine.
>>>>>
>>>>> On 05/03/2015 13:59, "Steve Loughran" <ste...@hortonworks.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 05/03/2015 13:05, "Alejandro Abdelnur" <tuc...@gmail.com> wrote:
>>>>>
>>>>> IMO, if part of the community wants to take on the responsibility and
>>>>> work it takes to do a new major release, we should not discourage them
>>>>> from doing that.
>>>>>
>>>>> Having multiple major branches active is a standard practice.
>>>>>
>>>>> Looking @ 2.x, the major work (HDFS HA, YARN) meant that it did take a
>>>>> long time to get out, and during that time 0.21 and 0.22 got released
>>>>> and ignored; 0.23 was picked up and used in production.
>>>>>
>>>>> The 2.0.4-alpha release was more of a trouble spot as it got picked up
>>>>> widely enough to be used in products, and changes were made between
>>>>> that alpha & 2.2 itself which raised compatibility issues.
>>>>>
>>>>> For 3.x I'd propose
>>>>>
>>>>>
>>>>>   1.  Give 3.x alpha/beta artifacts a shorter lifespan.
>>>>>   2.  Make clear there are no guarantees of compatibility from alpha/beta
>>>>> releases to shipping. Best effort, but not to the extent that it gets
>>>>> in the way. More succinctly: we will care more about seamless migration
>>>>> from 2.2+ to 3.x than from a 3.0-alpha to 3.3 production.
>>>>>   3.

Re: Looking to a Hadoop 3 release

2015-03-05 Thread Alejandro Abdelnur

If classloader isolation is in place, then dependency versions can freely
be upgraded, as they won't pollute the apps' space (things get trickier if
there is an ON/OFF switch).
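Alejandro's point relies on user code being loaded through a classloader that does not delegate to the framework's own classpath. A minimal Java sketch of that idea follows, assuming a loader whose parent is the bootstrap loader; `IsolationDemo`, `isolatedLoader` and `canSee` are hypothetical illustration names, not Hadoop code (Hadoop 2 already ships a more elaborate `org.apache.hadoop.util.ApplicationClassLoader`, used by MapReduce when `mapreduce.job.classloader` is enabled).

```java
import java.net.URL;
import java.net.URLClassLoader;

/**
 * Hypothetical sketch of classloader isolation: a loader whose parent is the
 * bootstrap loader (null) sees core JDK classes plus the URLs it is given,
 * but nothing on the launching classpath (e.g. the framework's own
 * Jetty/Jackson versions).
 */
public final class IsolationDemo {

    /** Creates a loader isolated from the caller's application classpath. */
    static URLClassLoader isolatedLoader(URL[] appJars) {
        // A null parent delegates only to the bootstrap loader.
        return new URLClassLoader(appJars, null);
    }

    /** Returns true if the named class can be resolved through the loader. */
    static boolean canSee(ClassLoader loader, String className) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        try (URLClassLoader isolated = isolatedLoader(new URL[0])) {
            // Core JDK classes are still visible via the bootstrap loader.
            System.out.println(canSee(isolated, "java.lang.String")); // true
            // Classes on the launching classpath (stand-ins for the
            // framework's bundled dependencies) are not.
            System.out.println(canSee(isolated, IsolationDemo.class.getName())); // false
        }
    }
}
```

With the application's own JARs passed as `appJars`, the app would resolve its own dependency versions rather than whatever the framework bundles. A real implementation also has to decide which packages (for example the Hadoop API itself) must still be shared with the parent, which is where the trickier cases start.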

On Thu, Mar 5, 2015 at 9:21 PM, Allen Wittenauer wrote:

>
> Is there going to be a general upgrade of dependencies? I'm thinking of
> Jetty & Jackson in particular.
>
> On Mar 5, 2015, at 5:24 PM, Andrew Wang wrote:
>
> > I've taken the liberty of adding a Hadoop 3 section to the Roadmap wiki
> > page. In addition to the two things I've been pushing, I also looked
> > through Allen's list (thanks Allen for making this) and picked out the
> > shell script rewrite and the removal of HFTP as big changes. This would be
> > the place to propose features for inclusion in 3.x; I'd particularly
> > appreciate help on the YARN/MR side.
> >
> > Based on what I'm hearing, let me modulate my proposal to the following:
> >
> > - We avoid cutting branch-3, and release off of trunk. The trunk-only
> > changes don't look that scary, so I think this is fine. This does mean we
> > need to be more rigorous before merging branches to trunk. I think
> > Vinod/Giri's work on getting test-patch.sh runs on non-trunk branches would
> > be very helpful in this regard.
> > - We do not include anything to break wire compatibility unless (as Jason
> > says) it's an unbelievably awesome feature.
> > - No harm in rolling alphas from trunk, as it doesn't lock us to anything
> > compatibility-wise. Downstreams like releases.
> >
> > I'll take Steve's advice about not locking GA to a given date, but I also
> > share his belief that we can alpha/beta/GA faster than it took for Hadoop
> > 2. Let's roll some intermediate releases, work on the roadmap items, and
> > see how we're feeling in a few months.
> >
> > Best,
> > Andrew
> >
> > On Thu, Mar 5, 2015 at 3:21 PM, Siddharth Seth wrote:
> >
> >> I think it'll be useful to have a discussion about what else people would
> >> like to see in Hadoop 3.x - especially if the change is potentially
> >> incompatible. Also, what we expect the release schedule to be for major
> >> releases and what triggers them - JVM version, major features, the need for
> >> incompatible changes? Assuming major versions will not be released every 6
> >> months/1 year (adoption time, fairly disruptive for downstream projects,
> >> and users) - considering additional features/incompatible changes for 3.x
> >> would be useful.
> >>
> >> Some features that come to mind immediately would be:
> >> 1) Enhancements to the RPC mechanics - specifically support for async RPC /
> >> two-way communication. There are a lot of places where we re-use heartbeats
> >> to send more information than we would if the RPC layer supported these
> >> features. Some of this can be done in a manner compatible with the
> >> existing RPC subsystem. Others, like two-way communication, probably cannot.
> >> After this, HDFS/YARN would need to actually make use of these changes. The
> >> other consideration is adoption of an alternate system like gRPC, which
> >> would be incompatible.
> >> 2) Simplification of configs - potentially separating client-side configs
> >> from those used by daemons. This is another source of perpetual confusion
> >> for users.
> >>
> >> Thanks
> >> - Sid
> >>
> >>
> >> On Thu, Mar 5, 2015 at 2:46 PM, Steve Loughran
> >> wrote:
> >>
> >>> Sorry, Outlook dequoted Alejandro's comments.
> >>>
> >>> Let me try again with his comments in italic and proofreading of mine.
> >>>
> >>> On 05/03/2015 13:59, "Steve Loughran" <ste...@hortonworks.com> wrote:
> >>>
> >>>
> >>>
> >>> On 05/03/2015 13:05, "Alejandro Abdelnur" <tuc...@gmail.com> wrote:
> >>>
> >>> IMO, if part of the community wants to take on the responsibility and
> >>> work it takes to do a new major release, we should not discourage them
> >>> from doing that.
> >>>
> >>> Having multiple major branches active is a standard practice.
> >>>
> >>> Looking @ 2.x, the major work (HDFS HA, YARN) meant that it did take a
> >>> long time to get out, and during that time 0.21 and 0.22 got released
> >>> and ignored; 0.23 was picked up and used in production.
> >>>
> >>> The 2.0.4-alpha release was more of a trouble spot as it got picked up
> >>> widely enough to be used in products, and changes were made between
> >>> that alpha & 2.2 itself which raised compatibility issues.
> >>>
> >>> For 3.x I'd propose
> >>>
> >>>
> >>>   1.  Give 3.x alpha/beta artifacts a shorter lifespan.
> >>>   2.  Make clear there are no guarantees of compatibility from alpha/beta
> >>> releases to shipping. Best effort, but not to the extent that it gets
> >>> in the way. More succinctly: we will care more about seamless migration
> >>> from 2.2+ to 3.x than from a 3.0-alpha to 3.3 production.
> >>>   3.  Anybody who ships code based on 3.x alpha/beta must recognise and
> >>> accept policy (2): Hadoop's "instability guarantee" for the 3.x
> >>> alpha/beta phase.
> >>>
>

Re: Looking to a Hadoop 3 release

2015-03-05 Thread Allen Wittenauer


Is there going to be a general upgrade of dependencies? I'm thinking of Jetty
& Jackson in particular.

On Mar 5, 2015, at 5:24 PM, Andrew Wang wrote:

> I've taken the liberty of adding a Hadoop 3 section to the Roadmap wiki
> page. In addition to the two things I've been pushing, I also looked
> through Allen's list (thanks Allen for making this) and picked out the
> shell script rewrite and the removal of HFTP as big changes. This would be
> the place to propose features for inclusion in 3.x; I'd particularly
> appreciate help on the YARN/MR side.
>
> Based on what I'm hearing, let me modulate my proposal to the following:
>
> - We avoid cutting branch-3, and release off of trunk. The trunk-only
> changes don't look that scary, so I think this is fine. This does mean we
> need to be more rigorous before merging branches to trunk. I think
> Vinod/Giri's work on getting test-patch.sh runs on non-trunk branches would
> be very helpful in this regard.
> - We do not include anything to break wire compatibility unless (as Jason
> says) it's an unbelievably awesome feature.
> - No harm in rolling alphas from trunk, as it doesn't lock us to anything
> compatibility-wise. Downstreams like releases.
>
> I'll take Steve's advice about not locking GA to a given date, but I also
> share his belief that we can alpha/beta/GA faster than it took for Hadoop
> 2. Let's roll some intermediate releases, work on the roadmap items, and
> see how we're feeling in a few months.
>
> Best,
> Andrew
>
> On Thu, Mar 5, 2015 at 3:21 PM, Siddharth Seth wrote:
>
>> I think it'll be useful to have a discussion about what else people would
>> like to see in Hadoop 3.x - especially if the change is potentially
>> incompatible. Also, what we expect the release schedule to be for major
>> releases and what triggers them - JVM version, major features, the need for
>> incompatible changes? Assuming major versions will not be released every 6
>> months/1 year (adoption time, fairly disruptive for downstream projects,
>> and users) - considering additional features/incompatible changes for 3.x
>> would be useful.
>>
>> Some features that come to mind immediately would be:
>> 1) Enhancements to the RPC mechanics - specifically support for async RPC /
>> two-way communication. There are a lot of places where we re-use heartbeats
>> to send more information than we would if the RPC layer supported these
>> features. Some of this can be done in a manner compatible with the
>> existing RPC subsystem. Others, like two-way communication, probably cannot.
>> After this, HDFS/YARN would need to actually make use of these changes. The
>> other consideration is adoption of an alternate system like gRPC, which
>> would be incompatible.
>> 2) Simplification of configs - potentially separating client-side configs
>> from those used by daemons. This is another source of perpetual confusion
>> for users.
>>
>> Thanks
>> - Sid
>>
>>
>> On Thu, Mar 5, 2015 at 2:46 PM, Steve Loughran
>> wrote:
>>
>>> Sorry, Outlook dequoted Alejandro's comments.
>>>
>>> Let me try again with his comments in italic and proofreading of mine.
>>>
>>> On 05/03/2015 13:59, "Steve Loughran" <ste...@hortonworks.com> wrote:
>>>
>>>
>>>
>>> On 05/03/2015 13:05, "Alejandro Abdelnur" <tuc...@gmail.com> wrote:
>>>
>>> IMO, if part of the community wants to take on the responsibility and
>>> work it takes to do a new major release, we should not discourage them
>>> from doing that.
>>>
>>> Having multiple major branches active is a standard practice.
>>>
>>> Looking @ 2.x, the major work (HDFS HA, YARN) meant that it did take a
>>> long time to get out, and during that time 0.21 and 0.22 got released
>>> and ignored; 0.23 was picked up and used in production.
>>>
>>> The 2.0.4-alpha release was more of a trouble spot as it got picked up
>>> widely enough to be used in products, and changes were made between
>>> that alpha & 2.2 itself which raised compatibility issues.
>>>
>>> For 3.x I'd propose
>>>
>>>
>>>   1.  Give 3.x alpha/beta artifacts a shorter lifespan.
>>>   2.  Make clear there are no guarantees of compatibility from alpha/beta
>>> releases to shipping. Best effort, but not to the extent that it gets
>>> in the way. More succinctly: we will care more about seamless migration
>>> from 2.2+ to 3.x than from a 3.0-alpha to 3.3 production.
>>>   3.  Anybody who ships code based on 3.x alpha/beta must recognise and
>>> accept policy (2): Hadoop's "instability guarantee" for the 3.x
>>> alpha/beta phase.
>>>
>>> As well as backwards compatibility, we need to think about forwards
>>> compatibility, with the goal being:
>>>
>>> Any app written/shipped with the 3.x release binaries (JAR and native)
>>> will work in and against a 3.y Hadoop cluster, for all x, y in Natural
>>> where y >= x and is-release(x) and is-release(y).
>>>
>>> That's important, as it means all server-side changes in 3.x which are
>>> expected to mandate client-side up

Re: Looking to a Hadoop 3 release

2015-03-05 Thread Andrew Wang

I've taken the liberty of adding a Hadoop 3 section to the Roadmap wiki
page. In addition to the two things I've been pushing, I also looked
through Allen's list (thanks Allen for making this) and picked out the
shell script rewrite and the removal of HFTP as big changes. This would be
the place to propose features for inclusion in 3.x; I'd particularly
appreciate help on the YARN/MR side.

Based on what I'm hearing, let me modulate my proposal to the following:

- We avoid cutting branch-3, and release off of trunk. The trunk-only
changes don't look that scary, so I think this is fine. This does mean we
need to be more rigorous before merging branches to trunk. I think
Vinod/Giri's work on getting test-patch.sh runs on non-trunk branches would
be very helpful in this regard.
- We do not include anything to break wire compatibility unless (as Jason
says) it's an unbelievably awesome feature.
- No harm in rolling alphas from trunk, as it doesn't lock us to anything
compatibility-wise. Downstreams like releases.

I'll take Steve's advice about not locking GA to a given date, but I also
share his belief that we can alpha/beta/GA faster than it took for Hadoop
2. Let's roll some intermediate releases, work on the roadmap items, and
see how we're feeling in a few months.

Best,
Andrew

On Thu, Mar 5, 2015 at 3:21 PM, Siddharth Seth wrote:

> I think it'll be useful to have a discussion about what else people would
> like to see in Hadoop 3.x - especially if the change is potentially
> incompatible. Also, what we expect the release schedule to be for major
> releases and what triggers them - JVM version, major features, the need for
> incompatible changes? Assuming major versions will not be released every 6
> months/1 year (adoption time, fairly disruptive for downstream projects,
> and users) - considering additional features/incompatible changes for 3.x
> would be useful.
>
> Some features that come to mind immediately would be:
> 1) Enhancements to the RPC mechanics - specifically support for async RPC /
> two-way communication. There are a lot of places where we re-use heartbeats
> to send more information than we would if the RPC layer supported these
> features. Some of this can be done in a manner compatible with the
> existing RPC subsystem. Others, like two-way communication, probably cannot.
> After this, HDFS/YARN would need to actually make use of these changes. The
> other consideration is adoption of an alternate system like gRPC, which
> would be incompatible.
> 2) Simplification of configs - potentially separating client-side configs
> from those used by daemons. This is another source of perpetual confusion
> for users.
>
> Thanks
> - Sid
>
>
> On Thu, Mar 5, 2015 at 2:46 PM, Steve Loughran
> wrote:
>
> > Sorry, Outlook dequoted Alejandro's comments.
> >
> > Let me try again with his comments in italic and proofreading of mine.
> >
> > On 05/03/2015 13:59, "Steve Loughran" <ste...@hortonworks.com> wrote:
> >
> >
> >
> > On 05/03/2015 13:05, "Alejandro Abdelnur" <tuc...@gmail.com> wrote:
> >
> > IMO, if part of the community wants to take on the responsibility and
> > work it takes to do a new major release, we should not discourage them
> > from doing that.
> >
> > Having multiple major branches active is a standard practice.
> >
> > Looking @ 2.x, the major work (HDFS HA, YARN) meant that it did take a
> > long time to get out, and during that time 0.21 and 0.22 got released
> > and ignored; 0.23 was picked up and used in production.
> >
> > The 2.0.4-alpha release was more of a trouble spot as it got picked up
> > widely enough to be used in products, and changes were made between
> > that alpha & 2.2 itself which raised compatibility issues.
> >
> > For 3.x I'd propose
> >
> >
> >   1.  Give 3.x alpha/beta artifacts a shorter lifespan.
> >   2.  Make clear there are no guarantees of compatibility from alpha/beta
> > releases to shipping. Best effort, but not to the extent that it gets
> > in the way. More succinctly: we will care more about seamless migration
> > from 2.2+ to 3.x than from a 3.0-alpha to 3.3 production.
> >   3.  Anybody who ships code based on 3.x alpha/beta must recognise and
> > accept policy (2): Hadoop's "instability guarantee" for the 3.x
> > alpha/beta phase.
> >
> > As well as backwards compatibility, we need to think about forwards
> > compatibility, with the goal being:
> >
> > Any app written/shipped with the 3.x release binaries (JAR and native)
> > will work in and against a 3.y Hadoop cluster, for all x, y in Natural
> > where y >= x and is-release(x) and is-release(y).
> >
> > That's important, as it means all server-side changes in 3.x which are
> > expected to mandate client-side updates (protocols, HDFS erasure
> > decoding, security features) must be considered complete and stable
> > before we can say is-release(x). In an ideal world, we'll even get the
> > semantics right with tests to show this.
> >
> > Fixing classpath hell downstr

Re: Looking to a Hadoop 3 release

2015-03-05 Thread Siddharth Seth

I think it'll be useful to have a discussion about what else people would
like to see in Hadoop 3.x - especially if the change is potentially
incompatible. Also, what we expect the release schedule to be for major
releases and what triggers them - JVM version, major features, the need for
incompatible changes? Assuming major versions will not be released every 6
months/1 year (adoption time, fairly disruptive for downstream projects,
and users) - considering additional features/incompatible changes for 3.x
would be useful.

Some features that come to mind immediately would be:
1) Enhancements to the RPC mechanics - specifically support for async RPC /
two-way communication. There are a lot of places where we re-use heartbeats
to send more information than we would if the RPC layer supported these
features. Some of this can be done in a manner compatible with the
existing RPC subsystem. Others, like two-way communication, probably cannot.
After this, HDFS/YARN would need to actually make use of these changes. The
other consideration is adoption of an alternate system like gRPC, which
would be incompatible.
2) Simplification of configs - potentially separating client-side configs
from those used by daemons. This is another source of perpetual confusion
for users.
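To make the heartbeat point concrete, here is a toy Java model of the two delivery styles. All names (`CommandDelivery`, `HeartbeatPiggyback`, `TwoWayChannel`, the `KILL_CONTAINER` string) are hypothetical and are not Hadoop's RPC API. In the pull model a server-side command waits in a queue until the worker's next heartbeat reply carries it, so delivery can lag by up to one heartbeat interval; with two-way RPC the server could invoke a client callback directly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Consumer;

/** Toy model (hypothetical, not Hadoop's RPC layer) of two delivery styles. */
public final class CommandDelivery {

    /** Pull model: commands are queued until the worker's next heartbeat. */
    static final class HeartbeatPiggyback {
        private final ConcurrentLinkedQueue<String> pending = new ConcurrentLinkedQueue<>();

        void enqueue(String command) {
            pending.add(command);
        }

        /** Called on each worker heartbeat; the reply drains all queued commands. */
        List<String> heartbeat() {
            List<String> reply = new ArrayList<>();
            String c;
            while ((c = pending.poll()) != null) {
                reply.add(c);
            }
            return reply;
        }
    }

    /** Push model: with two-way RPC the server calls the client back directly. */
    static final class TwoWayChannel {
        private final Consumer<String> clientCallback;

        TwoWayChannel(Consumer<String> clientCallback) {
            this.clientCallback = clientCallback;
        }

        void send(String command) {
            clientCallback.accept(command); // delivered immediately, no heartbeat wait
        }
    }

    public static void main(String[] args) {
        HeartbeatPiggyback hb = new HeartbeatPiggyback();
        hb.enqueue("KILL_CONTAINER");
        // Nothing reaches the worker until it heartbeats.
        System.out.println(hb.heartbeat()); // [KILL_CONTAINER]

        List<String> received = new ArrayList<>();
        TwoWayChannel ch = new TwoWayChannel(received::add);
        ch.send("KILL_CONTAINER");
        System.out.println(received); // [KILL_CONTAINER]
    }
}
```

The pull model needs no new wire protocol, which is why piggybacking on heartbeats is attractive; the push model needs the RPC layer to support server-initiated calls, which is the part Sid expects to be incompatible.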

Thanks
- Sid


On Thu, Mar 5, 2015 at 2:46 PM, Steve Loughran 
wrote:

> Sorry, Outlook dequoted Alejandro's comments.
>
> Let me try again with his comments in italic and proofreading of mine.
>
> On 05/03/2015 13:59, "Steve Loughran" <ste...@hortonworks.com> wrote:
>
>
>
> On 05/03/2015 13:05, "Alejandro Abdelnur" <tuc...@gmail.com> wrote:
>
> IMO, if part of the community wants to take on the responsibility and work
> it takes to do a new major release, we should not discourage them from
> doing that.
>
> Having multiple major branches active is a standard practice.
>
> Looking @ 2.x, the major work (HDFS HA, YARN) meant that it did take a
> long time to get out, and during that time 0.21 and 0.22 got released and
> ignored; 0.23 was picked up and used in production.
>
> The 2.0.4-alpha release was more of a trouble spot as it got picked up
> widely enough to be used in products, and changes were made between that
> alpha & 2.2 itself which raised compatibility issues.
>
> For 3.x I'd propose
>
>
>   1.  Give 3.x alpha/beta artifacts a shorter lifespan.
>   2.  Make clear there are no guarantees of compatibility from alpha/beta
> releases to shipping. Best effort, but not to the extent that it gets in
> the way. More succinctly: we will care more about seamless migration from
> 2.2+ to 3.x than from a 3.0-alpha to 3.3 production.
>   3.  Anybody who ships code based on 3.x alpha/beta must recognise and
> accept policy (2): Hadoop's "instability guarantee" for the 3.x alpha/beta
> phase.
>
> As well as backwards compatibility, we need to think about forwards
> compatibility, with the goal being:
>
> Any app written/shipped with the 3.x release binaries (JAR and native)
> will work in and against a 3.y Hadoop cluster, for all x, y in Natural
> where y >= x and is-release(x) and is-release(y).
>
> That's important, as it means all server-side changes in 3.x which are
> expected to mandate client-side updates (protocols, HDFS erasure
> decoding, security features) must be considered complete and stable before
> we can say is-release(x). In an ideal world, we'll even get the semantics
> right with tests to show this.
>
> Fixing classpath hell downstream is certainly one feature I am +1 on. But:
> it's only one of the features, and given there's no design doc on that
> JIRA, it's way too immature to set a release schedule on. An alpha schedule
> with no guarantees and a regular alpha roll could be viable, as new features
> go in and can then be used to experimentally try this stuff in branches of
> HBase (well volunteered, Stack!), etc. Of course instability guarantees
> will be transitive downstream.
>
>
> This time around we are not replacing the guts as we did from Hadoop 1 to
> Hadoop 2, but doing superficial surgery to address issues that were not
> considered (or were too much to take on top of the guts transplant).
>
> For the split brain concern, we did a great job maintaining Hadoop 1 and
> Hadoop 2 until Hadoop 1 faded away.
>
> And a significant argument about 2.0.4-alpha to 2.2 protobuf/HDFS
> compatibility.
>
>
> Based on that experience I would say that the coexistence of Hadoop 2 and
> Hadoop 3 will be much less demanding/traumatic.
>
> The re-layout of all the source trees was a major change there; assuming
> there's no refactoring or switch of build tools, picking things back will
> be tractable.
>
>
> Also, to facilitate the coexistence we should limit Java language features
> to Java 7 (even if the runtime is Java 8); once Java 7 is no longer used,
> we can remove this limitation.
>
> +1; setting javac.version will fix this
>
> What is nice about having Java 8 as the base JVM is that it means you can
> be confident that all Had

Re: Looking to a Hadoop 3 release

2015-03-05 Thread Steve Loughran

Sorry, Outlook dequoted Alejandro's comments.

Let me try again with his comments in italic and proofreading of mine.

On 05/03/2015 13:59, "Steve Loughran" <ste...@hortonworks.com> wrote:



On 05/03/2015 13:05, "Alejandro Abdelnur" <tuc...@gmail.com> wrote:

IMO, if part of the community wants to take on the responsibility and work
it takes to do a new major release, we should not discourage them from
doing that.

Having multiple major branches active is a standard practice.

Looking @ 2.x, the major work (HDFS HA, YARN) meant that it did take a long
time to get out, and during that time 0.21 and 0.22 got released and ignored;
0.23 was picked up and used in production.

The 2.0.4-alpha release was more of a trouble spot as it got picked up widely
enough to be used in products, and changes were made between that alpha & 2.2
itself which raised compatibility issues.

For 3.x I'd propose


  1.  Give 3.x alpha/beta artifacts a shorter lifespan.
  2.  Make clear there are no guarantees of compatibility from alpha/beta
releases to shipping. Best effort, but not to the extent that it gets in the
way. More succinctly: we will care more about seamless migration from 2.2+ to
3.x than from a 3.0-alpha to 3.3 production.
  3.  Anybody who ships code based on 3.x alpha/beta must recognise and accept
policy (2): Hadoop's "instability guarantee" for the 3.x alpha/beta phase.

As well as backwards compatibility, we need to think about forwards
compatibility, with the goal being:

Any app written/shipped with the 3.x release binaries (JAR and native) will
work in and against a 3.y Hadoop cluster, for all x, y in Natural where y >= x
and is-release(x) and is-release(y).

That's important, as it means all server-side changes in 3.x which are expected
to mandate client-side updates (protocols, HDFS erasure decoding, security
features) must be considered complete and stable before we can say
is-release(x). In an ideal world, we'll even get the semantics right with tests
to show this.
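The forward-compatibility goal can be written down as a predicate that such tests could check. The Java sketch below is illustrative only: `ForwardCompat` is a hypothetical class, only the minor number `x`/`y` of a `3.x` version string is compared, and treating any version containing "alpha" or "beta" as failing is-release() is an assumption, not a stated Hadoop rule.

```java
/**
 * Hypothetical encoding of the proposed rule: an app built against 3.x is
 * expected to work against a 3.y cluster when y >= x and both are releases.
 */
public final class ForwardCompat {

    /** Parses the minor number from "3.<minor>[.<patch>][-qualifier]". */
    static int minor(String version) {
        String[] parts = version.split("-", 2)[0].split("\\.");
        if (parts.length < 2 || !parts[0].equals("3")) {
            throw new IllegalArgumentException("not a 3.x version: " + version);
        }
        return Integer.parseInt(parts[1]);
    }

    /** Assumption: alpha/beta builds carry no compatibility guarantee. */
    static boolean isRelease(String version) {
        String v = version.toLowerCase();
        return !v.contains("alpha") && !v.contains("beta");
    }

    /** True when the proposed policy promises the client works on the cluster. */
    static boolean guaranteed(String clientVersion, String clusterVersion) {
        return isRelease(clientVersion)
                && isRelease(clusterVersion)
                && minor(clusterVersion) >= minor(clientVersion);
    }

    public static void main(String[] args) {
        System.out.println(guaranteed("3.1.0", "3.3.0"));        // true: newer cluster
        System.out.println(guaranteed("3.3.0", "3.1.0"));        // false: older cluster
        System.out.println(guaranteed("3.0.0-alpha1", "3.3.0")); // false: alpha client
    }
}
```

A `false` result means the proposed policy makes no promise for that pairing, not that the combination is known to fail.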

Fixing classpath hell downstream is certainly one feature I am +1 on. But: it's
only one of the features, and given there's no design doc on that JIRA, it's
way too immature to set a release schedule on. An alpha schedule with no
guarantees and a regular alpha roll could be viable, as new features go in
and can then be used to experimentally try this stuff in branches of HBase
(well volunteered, Stack!), etc. Of course instability guarantees will be
transitive downstream.


This time around we are not replacing the guts as we did from Hadoop 1 to
Hadoop 2, but doing superficial surgery to address issues that were not
considered (or were too much to take on top of the guts transplant).

For the split brain concern, we did a great job maintaining Hadoop 1 and
Hadoop 2 until Hadoop 1 faded away.

And a significant argument about 2.0.4-alpha to 2.2 protobuf/HDFS compatibility.


Based on that experience I would say that the coexistence of Hadoop 2 and
Hadoop 3 will be much less demanding/traumatic.

The re-layout of all the source trees was a major change there; assuming
there's no refactoring or switch of build tools, picking things back will
be tractable.


Also, to facilitate the coexistence we should limit Java language features
to Java 7 (even if the runtime is Java 8); once Java 7 is no longer used,
we can remove this limitation.

+1; setting javac.version will fix this
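A sketch of what "setting javac.version" could look like in a Maven POM, assuming the property feeds the `maven-compiler-plugin` `source`/`target` settings; the wiring below is illustrative and not copied from Hadoop's build files.

```xml
<!-- Illustrative sketch: a build-wide property pinning the language level. -->
<properties>
  <!-- Java 7 language level, even when compiling on and running with Java 8 -->
  <javac.version>1.7</javac.version>
</properties>

<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-compiler-plugin</artifactId>
      <configuration>
        <!-- -source 1.7 rejects Java 8 syntax such as lambdas -->
        <source>${javac.version}</source>
        <target>${javac.version}</target>
      </configuration>
    </plugin>
  </plugins>
</build>
```

Note that `-source 1.7` only restricts language syntax: when compiling on a JDK 8 it does not stop code from calling Java 8-only library APIs. Catching those needs a Java 7 boot classpath or an API signature checker such as the animal-sniffer Maven plugin.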

What is nice about having Java 8 as the base JVM is that it means you can be
confident that all Hadoop 3 servers will be JDK8+, so downstream apps and libs
can use all the Java 8 features they want.

There's one policy change to consider there: possibly, just possibly, we
could allow new modules in hadoop-tools to adopt Java 8 language features
early, provided everyone recognised that "backport to branch-2" isn't going
to happen.

-Steve

Re: Looking to a Hadoop 3 release

2015-03-05 Thread Steve Loughran



On 05/03/2015 13:05, "Alejandro Abdelnur" 
mailto:tuc...@gmail.com>> wrote:

IMO, if part of the community wants to take on the responsibility and work
that takes to do a new major release, we should not discourage them from
doing that.

Having multiple major branches active is a standard practice.

Looking @ 2.x, the major work (HDFS HA, YARN) meant that it did take a long 
time to get out, and during that time 0.21, 0.22, got released and ignored; 
0.23 picked up and used in production.

The 2.04-alpha release was more of a troublespot as it got picked up widely 
enough to be used in products, and changes were made between that alpha & 2.2 
itself which raised compatibility issues.

For 3.x I'd propose


  1.  Have less longevity of 3.x alpha/beta artifacts
  2.  Make clear there are no guarantees of compatibility from alpha/beta 
releases to shipping. Best effort, but not to the extent that it gets in the 
way. More succinctly: we will care more about seamless migration from 2.2+ to 
3.x than from a 3.0-alpha to 3.3 production.
  3.  Anybody who ships code based on 3.x alpha/beta to recognise and accept 
policy (2). Hadoop's "instability guarantee" for the 3.x alpha/beta phase

As well as backwards compatibility, we need to think about Forwards 
compatibility, with the goal being:

Any app written/shipped with the 3.x release binaries (JAR and native) will 
work against a 3.y Hadoop release, for all x, y in Natural  where y>=x  and 
is-release(x) and is-release(y)

That's important, as it means all server-side changes in 3.x which are expected 
to mandate client-side updates (protocols, HDFS erasure decoding, security 
features) must be considered complete and stable before we can say 
is-release(x). In an ideal world, we'll even get the semantics right, with tests 
to show this.

Classpath isolation, which fixes classpath hell downstream, is certainly one 
feature on this roadmap I am +1 on. But it's only one of the features, and given 
there's not any design doc on that JIRA, it's way too immature to set a release 
schedule on. An alpha schedule with no guarantees and a regular alpha roll 
could be viable, as new features go in and can then be used to experimentally 
try this stuff in branches of HBase (well volunteered, Stack!), etc. Of course, 
instability guarantees will be transitive.


> This time around we are not replacing the guts as we did from Hadoop 1 to
> Hadoop 2, but rather doing superficial surgery to address issues that were not
> considered (or were too much to take on top of the guts transplant).

> For the split brain concern, we did a great job of maintaining Hadoop 1 and
> Hadoop 2 until Hadoop 1 faded away.

And there was a significant argument about 2.0.4-alpha to 2.2 protobuf/HDFS compatibility.


> Based on that experience I would say that the coexistence of Hadoop 2 and
> Hadoop 3 will be much less demanding/traumatic.

The re-layout of all the source trees was a major change there. Assuming 
there's no refactoring or switch of build tools, picking things back will 
be tractable.


> Also, to facilitate the coexistence we should limit Java language features
> to Java 7 (even if the runtime is Java 8); once Java 7 is not used anymore
> we can remove this limitation.

+1; setting javac.version will fix this
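A minimal sketch of what that could look like in a Maven POM, assuming a shared `javac.version` property that feeds the `source` and `target` settings of `maven-compiler-plugin`. Where exactly this lives in Hadoop's POMs is an assumption here, not something shown in this thread.

```xml
<!-- Assumed layout: one shared property pins the language level to Java 7,
     even when the build itself runs on a JDK8 toolchain. -->
<properties>
  <javac.version>1.7</javac.version>
</properties>

<build>
  <pluginManagement>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <configuration>
          <!-- Reject Java 8 syntax such as lambdas and default methods. -->
          <source>${javac.version}</source>
          <!-- Emit Java 7 class files (class file version 51). -->
          <target>${javac.version}</target>
        </configuration>
      </plugin>
    </plugins>
  </pluginManagement>
</build>
```

One caveat: `source`/`target` restrict syntax and bytecode version, but compiling on JDK8 still links against the JDK8 class library, so calls to Java 8-only APIs (for example `java.util.stream`) are not rejected unless the build also compiles against a Java 7 boot classpath or runs an API-signature check.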

What is nice about having Java 8 as the base JVM is that it means you can be 
confident that all Hadoop 3 servers will be JDK8+, so downstream apps and libs 
can use all Java 8 features they want to.

There's one policy change to consider there, which is that possibly, just possibly, 
we could allow new modules in hadoop-tools to adopt Java 8 language features early, 
provided everyone recognised that "backport to branch-2" isn't going to happen.

-Steve

Re: Looking to a Hadoop 3 release

2015-03-05 Thread Jason Lowe

I'm OK with a 3.0.0 release as long as we are minimizing the pain of 
maintaining yet another release line and conscious of the incompatibilities 
going into that release line.
For the former, I would really rather not see a branch-3 cut so soon.  It's yet 
another line onto which to cherry-pick, and I don't see why we need to add this 
overhead at such an early phase.  We should only create branch-3 when there's 
an incompatible change that the community wants and it should _not_ go into the 
next major release (i.e.: it's for Hadoop 4.0).  We can develop 3.0 alphas and 
betas on trunk and release from trunk in the interim.  IMHO we need to stop 
treating trunk as a place to exile patches.

For the latter, I think as a community we need to evaluate the benefits of 
breaking compatibility against the costs of migrating.  Each time we break 
compatibility we create a hurdle for people to jump when they move to the new 
release, and we should make those hurdles worth their time.  For example, 
wire-compatibility has been mentioned as part of this.  Any feature that breaks 
wire compatibility better be absolutely amazing, as it creates a huge hurdle 
for people to jump.
To summarize:
+1 for a community-discussed roadmap of what we're breaking in Hadoop 3 and 
why it's worth it for users
-1 for creating branch-3 now; we can release from trunk until the next 
incompatibility for Hadoop 4 arrives
+1 for baking classpath isolation as opt-in on 2.x and eventually default on in 
3.0

Jason
  From: Andrew Wang 
 To: "hdfs-dev@hadoop.apache.org"  
Cc: "common-...@hadoop.apache.org" ; 
"mapreduce-...@hadoop.apache.org" ; 
"yarn-...@hadoop.apache.org"  
 Sent: Wednesday, March 4, 2015 12:15 PM
 Subject: Re: Looking to a Hadoop 3 release

Let's not dismiss this quite so handily.

Sean, Jason, and Stack replied on HADOOP-11656 pointing out that while we
could make classpath isolation opt-in via configuration, what we really
want longer term is to have it on by default (or just always on). Stack in
particular points out the practical difficulties in using an opt-in method
in 2.x from a downstream project perspective. It's not pretty.

The plan that both Sean and Jason propose (which I support) is to have an
opt-in solution in 2.x, bake it there, then turn it on by default
(incompatible) in a new major release. I think this lines up well with my
proposal of some alphas and betas leading up to a GA 3.x. I'm also willing
to help with 2.x release management if that would help with testing this
feature.

Even setting aside classpath isolation, a new major release is still
justified by JDK8. Somehow this is being ignored in the discussion. Allen,
historically the voice of the user in our community, just highlighted it as
a major compatibility issue, and Tucu and I have also expressed our
very strong concerns about bumping this in a minor release. 2.7's bump is a
unique exception, but this is not something to be cited as precedent or
policy.

Where does this resistance to a new major release stem from? As I've
described from the beginning, this will look basically like a 2.x release,
except for the inclusion of classpath isolation by default and target
version JDK8. I've expressed my desire to maintain API and wire
compatibility, and we can audit the set of incompatible changes in trunk to
ensure this. My proposal for doing alpha and beta releases leading up to GA
also gives downstreams a nice amount of time for testing and validation.

Regards,
Andrew

On Tue, Mar 3, 2015 at 2:32 PM, Arun Murthy  wrote:

> Awesome, looks like we can just do this in a compatible manner - nothing
> else on the list seems like it warrants a (premature) major release.
>
> Thanks Vinod.
>
> Arun
>
> 
> From: Vinod Kumar Vavilapalli 
> Sent: Tuesday, March 03, 2015 2:30 PM
> To: common-...@hadoop.apache.org
> Cc: hdfs-dev@hadoop.apache.org; mapreduce-...@hadoop.apache.org;
> yarn-...@hadoop.apache.org
> Subject: Re: Looking to a Hadoop 3 release
>
> I started pitching in more on that JIRA.
>
> To add, I think we can and should strive for doing this in a compatible
> manner, whatever the approach. Marking and calling it incompatible before
> we see a proposal/patch seems premature to me. Commented the same on JIRA:
> https://issues.apache.org/jira/browse/HADOOP-11656?focusedCommentId=14345875&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14345875
>
> Thanks
> +Vinod
>
> On Mar 2, 2015, at 8:08 PM, Andrew Wang <andrew.w...@cloudera.com> wrote:
>
> Regarding classpath isolation, based on what I hear from our customers,
> it's still a big problem (even after the MR classloader work). The latest
> Jackson version bump was quite painful for our downstream projects, and the
> HDFS client still leaks a lot of dependencies. Would welcome more
> discussion of this on HADOOP-11656; Steve, Colin, and Haohui have already
> chimed in.
>
>

Re: Looking to a Hadoop 3 release

2015-03-05 Thread Alejandro Abdelnur

IMO, if part of the community wants to take on the responsibility and work
it takes to do a new major release, we should not discourage them from
doing that.

Having multiple major branches active is a standard practice.

This time around we are not replacing the guts as we did from Hadoop 1 to
Hadoop 2, but rather doing superficial surgery to address issues that were not
considered (or were too much to take on top of the guts transplant).

For the split brain concern, we did a great job of maintaining Hadoop 1 and
Hadoop 2 until Hadoop 1 faded away.

Based on that experience I would say that the coexistence of Hadoop 2 and
Hadoop 3 will be much less demanding/traumatic.

Also, to facilitate the coexistence we should limit Java language features
to Java 7 (even if the runtime is Java 8); once Java 7 is not used anymore
we can remove this limitation.

Thanks.

On Thu, Mar 5, 2015 at 11:40 AM, Vinod Kumar Vavilapalli <
vino...@hortonworks.com> wrote:

> The 'resistance' is not so much about a new major release, more so about
> the content and the roadmap of the release. Other than the two specific
> features raised (the need for breaking compat for them is something that I
> am debating), I haven't seen a roadmap of branch-3 with any more features
> that this community needs to discuss. If all the difference between
> branch-2 and branch-3 is going to be JDK + a couple of incompat changes, it
> is a big problem in two dimensions (1) it's a burden keeping the branches
> in sync and avoiding the split-brain we experienced with 1.x, 2.x or worse
> branch-0.23, branch-2 and (2) very hard to ask people to not break more
> things in branch-3.
>
> We seem to have agreed upon a course of action for JDK7. And now we are
> taking a different direction for JDK8. Going by this new proposal, come
> 2016, we will have to deal with JDK9 and 3 mainline incompatible hadoop
> releases.
>
> Regarding individual improvements like classpath isolation and the shell script
> stuff, Jason Lowe captured it perfectly on HADOOP-11656: it should be
> possible for every major feature that we develop to be opt-in, unless the
> change is so great that users can balance out the incompatibilities against the
> new stuff they are getting. Even with a groundbreaking change like
> YARN, we spent a bit of time to ensure compatibility (MAPREDUCE-5108), which
> has paid for itself many times over. Breaking compatibility shouldn't
> come across as too cheap a thing.
>
> Thanks,
> +Vinod
>
> On Mar 4, 2015, at 10:15 AM, Andrew Wang <andrew.w...@cloudera.com> wrote:
>
> Where does this resistance to a new major release stem from? As I've
> described from the beginning, this will look basically like a 2.x release,
> except for the inclusion of classpath isolation by default and target
> version JDK8. I've expressed my desire to maintain API and wire
> compatibility, and we can audit the set of incompatible changes in trunk to
> ensure this. My proposal for doing alpha and beta releases leading up to GA
> also gives downstreams a nice amount of time for testing and validation.
>
>

Re: Looking to a Hadoop 3 release

2015-03-05 Thread Vinod Kumar Vavilapalli

The 'resistance' is not so much about a new major release, more so about the 
content and the roadmap of the release. Other than the two specific features 
raised (the need for breaking compat for them is something that I am debating), 
I haven't seen a roadmap of branch-3 with any more features that this 
community needs to discuss. If all the difference between branch-2 and 
branch-3 is going to be JDK + a couple of incompat changes, it is a big problem 
in two dimensions (1) it's a burden keeping the branches in sync and avoiding 
the split-brain we experienced with 1.x, 2.x or worse branch-0.23, branch-2 and 
(2) very hard to ask people to not break more things in branch-3.

We seem to have agreed upon a course of action for JDK7. And now we are taking 
a different direction for JDK8. Going by this new proposal, come 2016, we will 
have to deal with JDK9 and 3 mainline incompatible hadoop releases.

Regarding individual improvements like classpath isolation and the shell script 
stuff, Jason Lowe captured it perfectly on HADOOP-11656: it should be possible 
for every major feature that we develop to be opt-in, unless the change is so 
great that users can balance out the incompatibilities against the new stuff they 
are getting. Even with a groundbreaking change like YARN, we spent a bit 
of time to ensure compatibility (MAPREDUCE-5108), which has paid for itself many 
times over. Breaking compatibility shouldn't come across as too cheap a 
thing.

Thanks,
+Vinod

On Mar 4, 2015, at 10:15 AM, Andrew Wang <andrew.w...@cloudera.com> wrote:

Where does this resistance to a new major release stem from? As I've
described from the beginning, this will look basically like a 2.x release,
except for the inclusion of classpath isolation by default and target
version JDK8. I've expressed my desire to maintain API and wire
compatibility, and we can audit the set of incompatible changes in trunk to
ensure this. My proposal for doing alpha and beta releases leading up to GA
also gives downstreams a nice amount of time for testing and validation.

Re: Looking to a Hadoop 3 release

2015-03-05 Thread Vinod Kumar Vavilapalli

Moving to JDK8 involves a lot of things:
 (1) Get Hadoop apps to be able to run on JDK8 and choose JDK8 language 
features. This is already possible with the decoupling of apps from the 
platform.
 (2) Get the platform to run on JDK8. This can be done so that we can run 
Hadoop on both JDK8 and JDK7 without any compatibility issues. This in itself 
is a huge move, what with potential GC behavior changes, native library compat 
etc.
 (3) Get the platform to use JDK8 language features. As much as I love the new 
stuff in JDK8, I'm willing to postpone usage of the language features in the 
platform till the time when JDK8 is already in full force.

So, how about we do (1) + (2) for now, get JDK8 going, and then come around to 
making the decision of dropping support for JDK7? This is no different from what 
we did for the adoption of JDK7. For a while (two or three releases?), we were 
able to run on both JDK6 and JDK7, and we are phasing out JDK6 only now that most 
of the community has stopped using it.

Thanks,
+Vinod

On Mar 2, 2015, at 8:08 PM, Andrew Wang  wrote:
>> Given that we already agreed to put in JDK7 in 2.7, and that the
>> classpath is a fairly minor irritant given some existing solutions (e.g. a
>> new default classloader), how do you quantify the benefit for users?
>> 
> I looked at our thread on this topic from last time, and we (meaning at
> least Tucu and me) agreed to a one-time exception to the JDK7 bump in
> 2.x for practical reasons. We waited for so long that we had some assurance
> JDK6 was on the outs. Multiple distros also already had bumped their min
> version to JDK7. This is not true this time around. Bumping the JDK version
> is hugely impactful on the end user, and my email on the earlier thread
> still reflects my thoughts on JDK compatibility:
>
> http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201406.mbox/%3CCAGB5D2a5fEDfBApQyER_zyhc8a4Xd_ea1wJSsxxkiAiDZO9%2BNg%40mail.gmail.com%3E
>

> Right now, the incompatible changes would be JDK8, classpath isolation, and
> whatever is already in trunk. I can audit these existing trunk changes when
> branch-3 is cut.

Re: Looking to a Hadoop 3 release

2015-03-05 Thread Chris Douglas

On Mon, Mar 2, 2015 at 11:04 PM, Konstantin Shvachko
 wrote:
> 2. If Hadoop 3 and 2.x are meant to exist together, we run the risk of
> manifesting split-brain behavior again, as we had with hadoop-1, hadoop-2 and
> other versions. Even if that is somehow beneficial for commercial vendors,
> which I don't see, for the community it has proven to be very disruptive. It
> would be really good to avoid it this time.

Agreed; let's try to minimize backporting headaches. Pulling trunk >
branch-2 > branch-2.x is already tedious. Adding a branch-3,
branch-3.x would be obnoxious.

> 3. Could we release Hadoop 3 directly from trunk? With a proper feature
> freeze in advance. Current trunk is in the best working condition I've seen
> in years - much better than when hadoop-2 was coming to life. It could
> make a good alpha.

+1 This sounds like a good approach. Marked as alpha, we can break
compatibility in minor versions. Stabilizing a beta can correspond
with cutting branch-3, since that will be winding down branch-2. This
shouldn't disrupt existing plans for branch-2.

However, this requires that committers not accumulate too much
compatibility debt in trunk. Undoing all that in branch-3 imposes a
burdensome tax. Scanning through Allen's diff: that doesn't appear to
be the case so far, but this argues against developing features "in
place" on trunk. Just be considerate of users and developers who will
need to move from (and maintain) branch-2.

> I believe we can start planning 3.0 from trunk right after 2.7 is out.

If we're publishing a snapshot, we don't need too much planning. -C

> On Mon, Mar 2, 2015 at 3:19 PM, Andrew Wang 
> wrote:
>
>> Hi devs,
>>
>> It's been a year and a half since 2.x went GA, and I think we're about due
>> for a 3.x release.
>> Notably, there are two incompatible changes I'd like to call out, that will
>> have a tremendous positive impact for our users.
>>
>> First, classpath isolation being done at HADOOP-11656, which has been a
>> long-standing request from many downstreams and Hadoop users.
>>
>> Second, bumping the source and target JDK version to JDK8 (related to
>> HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two
>> months from now). In the past, we've had issues with our dependencies
>> discontinuing support for old JDKs, so this will future-proof us.
>>
>> Between the two, we'll also have quite an opportunity to clean up and
>> upgrade our dependencies, another common user and developer request.
>>
>> I'd like to propose that we start rolling a monthly-ish series of
>> 3.0 alpha releases ASAP, with myself volunteering to take on the RM and
>> other cat herding responsibilities. There are already quite a few changes
>> slated for 3.0 besides the above (for instance the shell script rewrite) so
>> there's already value in a 3.0 alpha, and the more time we give downstreams
>> to integrate, the better.
>>
>> This opens up discussion about inclusion of other changes, but I'm hoping
>> to freeze incompatible changes after maybe two alphas, do a beta (with no
>> further incompat changes allowed), and then finally a 3.x GA. For those
>> keeping track, that means a 3.x GA in about four months.
>>
>> I would also like to stress though that this is not intended to be a big
>> bang release. For instance, it would be great if we could maintain wire
>> compatibility between 2.x and 3.x, so rolling upgrades work. Keeping
>> branch-2 and branch-3 similar also makes backports easier, since we're
>> likely maintaining 2.x for a while yet.
>>
>> Please let me know any comments / concerns related to the above. If people
>> are friendly to the idea, I'd like to cut a branch-3 and start working on
>> the first alpha.
>>
>> Best,
>> Andrew
>>

Re: Looking to a Hadoop 3 release

2015-03-05 Thread Yongjun Zhang

Thanks all.

There is an open issue, HDFS-6962 (ACLs inheritance conflicts with
umaskmode), whose incompatibility appears to make it unsuitable
for 2.x, so it's targeted for 3.0; please see:

https://issues.apache.org/jira/browse/HDFS-6962?focusedCommentId=14335418&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14335418

Best,

--Yongjun


On Wed, Mar 4, 2015 at 8:13 PM, Allen Wittenauer  wrote:

>
> One of the questions that keeps popping up is “what exactly is in trunk?”
>
> As some may recall, I had done some experiments creating the change log
> based upon JIRA.  While the interest level appeared to be approaching zero,
> I kept playing with it a bit and eventually also started playing with the
> release notes script (for various reasons I won’t bore you with.)
>
> In any case, I’ve started posting the results of these runs on one of my
> github repos if anyone was wanting a quick reference as to JIRA’s opinion
> on the matter:
>
> https://github.com/aw-altiscale/hadoop-release-metadata/tree/master/3.0.0
>
>
>

Re: Looking to a Hadoop 3 release

2015-03-04 Thread Allen Wittenauer


One of the questions that keeps popping up is “what exactly is in trunk?”

As some may recall, I had done some experiments creating the change log based 
upon JIRA.  While the interest level appeared to be approaching zero, I kept 
playing with it a bit and eventually also started playing with the release 
notes script (for various reasons I won’t bore you with.)

In any case, I’ve started posting the results of these runs on one of my github 
repos if anyone was wanting a quick reference as to JIRA’s opinion on the 
matter:

https://github.com/aw-altiscale/hadoop-release-metadata/tree/master/3.0.0

Re: Looking to a Hadoop 3 release

2015-03-04 Thread Karthik Kambatla

On Wed, Mar 4, 2015 at 10:46 AM, Stack  wrote:

> In general +1 on 3.0.0. It's time. If we start now, it might make it out by
> 2016. If we start now, downstreamers can start aligning themselves to land
> versions that suit at about the same time.
>
> While two big items have been called out as possible incompatible changes,
> and there is ongoing discussion as to whether they are or not*, is there
> any chance of getting a longer list of big differences between the
> branches? In particular I'd be interested in improvements that are 'off' by
> default that would be better defaulted 'on'.
>
> Thanks,
> St.Ack
>
> * Let me note that 'compatible' around these parts is a trampled concept
> seemingly open to interpretation, with a definition other than the one that
> prevails elsewhere in software. See Allen's list above, and in our
> downstream project, the recent HBASE-13149 "HBase server MR tools are
> broken on Hadoop 2.5+ Yarn", among others.  Let 3.x be incompatible with
> 2.x if only so we can leave behind all current notions of 'compatibility'
> and just start over (as per Allen).
>

Unfortunately, our compatibility policies are rather loose and allow for
changes that break downstream projects. Fixing the classpath issues would let
us tighten our policies and bring our "compatibility store" more in line with
the general expectations.




>
>
> On Mon, Mar 2, 2015 at 3:19 PM, Andrew Wang 
> wrote:
>
> > Hi devs,
> >
> > It's been a year and a half since 2.x went GA, and I think we're about due
> > for a 3.x release.
> > Notably, there are two incompatible changes I'd like to call out, that will
> > have a tremendous positive impact for our users.
> >
> > First, classpath isolation being done at HADOOP-11656, which has been a
> > long-standing request from many downstreams and Hadoop users.
> >
> > Second, bumping the source and target JDK version to JDK8 (related to
> > HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two
> > months from now). In the past, we've had issues with our dependencies
> > discontinuing support for old JDKs, so this will future-proof us.
> >
> > Between the two, we'll also have quite an opportunity to clean up and
> > upgrade our dependencies, another common user and developer request.
> >
> > I'd like to propose that we start rolling a monthly-ish series of
> > 3.0 alpha releases ASAP, with myself volunteering to take on the RM and
> > other cat herding responsibilities. There are already quite a few changes
> > slated for 3.0 besides the above (for instance the shell script rewrite) so
> > there's already value in a 3.0 alpha, and the more time we give downstreams
> > to integrate, the better.
> >
> > This opens up discussion about inclusion of other changes, but I'm hoping
> > to freeze incompatible changes after maybe two alphas, do a beta (with no
> > further incompat changes allowed), and then finally a 3.x GA. For those
> > keeping track, that means a 3.x GA in about four months.
> >
> > I would also like to stress though that this is not intended to be a big
> > bang release. For instance, it would be great if we could maintain wire
> > compatibility between 2.x and 3.x, so rolling upgrades work. Keeping
> > branch-2 and branch-3 similar also makes backports easier, since we're
> > likely maintaining 2.x for a while yet.
> >
> > Please let me know any comments / concerns related to the above. If people
> > are friendly to the idea, I'd like to cut a branch-3 and start working on
> > the first alpha.
> >
> > Best,
> > Andrew
> >
>



-- 
Karthik Kambatla
Software Engineer, Cloudera Inc.

http://five.sentenc.es

RE: Looking to a Hadoop 3 release

2015-03-04 Thread Zheng, Kai

Let me offer some comments on this, just sharing my thoughts. Thanks.

>> If we start now, it might make it out by 2016. If we start now, 
>> downstreamers can start aligning themselves to land versions that suit at 
>> about the same time.
This is not only for downstreamers to align with the long-term release, but 
maybe also for contributors like me to align our future efforts.

In addition to the JDK8 support and classpath isolation, might we consider more 
candidates? 
For example, what about HADOOP-9797, "Pluggable and compatible UGI change"?
https://issues.apache.org/jira/browse/HADOOP-9797

The benefits: 
1) allow multiple login sessions/contexts and authentication methods to be used 
in the same Java application/process without conflicts, providing good 
isolation by getting rid of globals and statics.
2) allow new authentication methods to be plugged into UGI in a modular, 
manageable and maintainable manner.

Another: we would also like to push out the first release of Apache Kerby, a 
strong, dedicated and clean Kerberos library in Java for both the client and 
KDC sides; by leveraging the library, we could update Hadoop-MiniKDC and 
perform more security tests.
https://issues.apache.org/jira/browse/DIRKRB-102

Hope this makes sense. Thanks.

Regards,
Kai

-----Original Message-----
From: saint@gmail.com [mailto:saint@gmail.com] On Behalf Of Stack
Sent: Thursday, March 05, 2015 2:47 AM
To: common-...@hadoop.apache.org
Cc: mapreduce-...@hadoop.apache.org; hdfs-dev@hadoop.apache.org; 
yarn-...@hadoop.apache.org
Subject: Re: Looking to a Hadoop 3 release

In general +1 on 3.0.0. It's time. If we start now, it might make it out by 
2016. If we start now, downstreamers can start aligning themselves to land 
versions that suit at about the same time.

While two big items have been called out as possible incompatible changes, and 
there is ongoing discussion as to whether they are or not*, is there any chance 
of getting a longer list of big differences between the branches? In particular 
I'd be interested in improvements that are 'off' by default that would be 
better defaulted 'on'.

Thanks,
St.Ack

* Let me note that 'compatible' around these parts is a trampled concept 
seemingly open to interpretation, with a definition other than the one that 
prevails elsewhere in software. See Allen's list above, and in our downstream 
project, the recent HBASE-13149 "HBase server MR tools are broken on Hadoop 2.5+ 
Yarn", among others.  Let 3.x be incompatible with 2.x if only so we can leave 
behind all current notions of 'compatibility' and just start over (as per Allen).

On Mon, Mar 2, 2015 at 3:19 PM, Andrew Wang 
wrote:

> Hi devs,
>
> It's been a year and a half since 2.x went GA, and I think we're about 
> due for a 3.x release.
> Notably, there are two incompatible changes I'd like to call out, that 
> will have a tremendous positive impact for our users.
>
> First, classpath isolation being done at HADOOP-11656, which has been 
> a long-standing request from many downstreams and Hadoop users.
>
> Second, bumping the source and target JDK version to JDK8 (related to 
> HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two 
> months from now). In the past, we've had issues with our dependencies 
> discontinuing support for old JDKs, so this will future-proof us.
>
> Between the two, we'll also have quite an opportunity to clean up and 
> upgrade our dependencies, another common user and developer request.
>
> I'd like to propose that we start rolling a monthly-ish series of
> 3.0 alpha releases ASAP, with myself volunteering to take on the RM 
> and other cat herding responsibilities. There are already quite a few 
> changes slated for 3.0 besides the above (for instance the shell 
> script rewrite) so there's already value in a 3.0 alpha, and the more 
> time we give downstreams to integrate, the better.
>
> This opens up discussion about inclusion of other changes, but I'm 
> hoping to freeze incompatible changes after maybe two alphas, do a 
> beta (with no further incompat changes allowed), and then finally a 
> 3.x GA. For those keeping track, that means a 3.x GA in about four months.
>
> I would also like to stress though that this is not intended to be a 
> big bang release. For instance, it would be great if we could maintain 
> wire compatibility between 2.x and 3.x, so rolling upgrades work.
> Keeping branch-2 and branch-3 similar also makes backports easier, since we're 
> likely maintaining 2.x for a while yet.
>
> Please let me know any comments / concerns related to the above. If 
> people are friendly to the idea, I'd like to cut a branch-3 and start 
> working on the first alpha.
>
> Best,
> Andrew
>

Re: Looking to a Hadoop 3 release

2015-03-04 Thread Stack

In general +1 on 3.0.0. It's time. If we start now, it might make it out by
2016. If we start now, downstreamers can start aligning themselves to land
versions that suit at about the same time.

While two big items have been called out as possible incompatible changes,
and there is ongoing discussion as to whether they are or not*, is there
any chance of getting a longer list of big differences between the
branches? In particular I'd be interested in improvements that are 'off' by
default that would be better defaulted 'on'.

Thanks,
St.Ack

* Let me note that 'compatible' around these parts is a trampled concept
seemingly open to interpretation, with a definition other than the one that
prevails elsewhere in software. See Allen's list above, and in our
downstream project, the recent HBASE-13149 "HBase server MR tools are
broken on Hadoop 2.5+ Yarn", among others.  Let 3.x be incompatible with
2.x if only so we can leave behind all current notions of 'compatibility'
and just start over (as per Allen).

On Mon, Mar 2, 2015 at 3:19 PM, Andrew Wang 
wrote:

> Hi devs,
>
> It's been a year and a half since 2.x went GA, and I think we're about due
> for a 3.x release.
> Notably, there are two incompatible changes I'd like to call out, that will
> have a tremendous positive impact for our users.
>
> First, classpath isolation being done at HADOOP-11656, which has been a
> long-standing request from many downstreams and Hadoop users.
>
> Second, bumping the source and target JDK version to JDK8 (related to
> HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two
> months from now). In the past, we've had issues with our dependencies
> discontinuing support for old JDKs, so this will future-proof us.
>
> Between the two, we'll also have quite an opportunity to clean up and
> upgrade our dependencies, another common user and developer request.
>
> I'd like to propose that we start rolling a monthly-ish series of
> 3.0 alpha releases ASAP, with myself volunteering to take on the RM and
> other cat herding responsibilities. There are already quite a few changes
> slated for 3.0 besides the above (for instance the shell script rewrite) so
> there's already value in a 3.0 alpha, and the more time we give downstreams
> to integrate, the better.
>
> This opens up discussion about inclusion of other changes, but I'm hoping
> to freeze incompatible changes after maybe two alphas, do a beta (with no
> further incompat changes allowed), and then finally a 3.x GA. For those
> keeping track, that means a 3.x GA in about four months.
>
> I would also like to stress though that this is not intended to be a big
> bang release. For instance, it would be great if we could maintain wire
> compatibility between 2.x and 3.x, so rolling upgrades work. Keeping
> branch-2 and branch-3 similar also makes backports easier, since we're
> likely maintaining 2.x for a while yet.
>
> Please let me know any comments / concerns related to the above. If people
> are friendly to the idea, I'd like to cut a branch-3 and start working on
> the first alpha.
>
> Best,
> Andrew
>

Re: Looking to a Hadoop 3 release

2015-03-04 Thread Andrew Wang

Let's not dismiss this quite so handily.

Sean, Jason, and Stack replied on HADOOP-11656 pointing out that while we
could make classpath isolation opt-in via configuration, what we really
want longer term is to have it on by default (or just always on). Stack in
particular points out the practical difficulties in using an opt-in method
in 2.x from a downstream project perspective. It's not pretty.

The plan that both Sean and Jason propose (which I support) is to have an
opt-in solution in 2.x, bake it there, then turn it on by default
(incompatible) in a new major release. I think this lines up well with my
proposal of some alphas and betas leading up to a GA 3.x. I'm also willing
to help with 2.x release management if that would help with testing this
feature.

Even setting aside classpath isolation, a new major release is still
justified by JDK8. Somehow this is being ignored in the discussion. Allen,
historically the voice of the user in our community, just highlighted it as
a major compatibility issue, and Tucu and I have also expressed our
very strong concerns about bumping this in a minor release. 2.7's bump is a
unique exception, but this is not something to be cited as precedent or
policy.

Where does this resistance to a new major release stem from? As I've
described from the beginning, this will look basically like a 2.x release,
except for the inclusion of classpath isolation by default and target
version JDK8. I've expressed my desire to maintain API and wire
compatibility, and we can audit the set of incompatible changes in trunk to
ensure this. My proposal for doing alpha and beta releases leading up to GA
also gives downstreams a nice amount of time for testing and validation.

Regards,
Andrew

On Tue, Mar 3, 2015 at 2:32 PM, Arun Murthy  wrote:

> Awesome, looks like we can just do this in a compatible manner - nothing
> else on the list seems like it warrants a (premature) major release.
>
> Thanks Vinod.
>
> Arun
>
> 
> From: Vinod Kumar Vavilapalli 
> Sent: Tuesday, March 03, 2015 2:30 PM
> To: common-...@hadoop.apache.org
> Cc: hdfs-dev@hadoop.apache.org; mapreduce-...@hadoop.apache.org;
> yarn-...@hadoop.apache.org
> Subject: Re: Looking to a Hadoop 3 release
>
> I started pitching in more on that JIRA.
>
> To add, I think we can and should strive for doing this in a compatible
> manner, whatever the approach. Marking and calling it incompatible before
> we see proposal/patch seems premature to me. Commented the same on JIRA:
> https://issues.apache.org/jira/browse/HADOOP-11656?focusedCommentId=14345875&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14345875.
>
> Thanks
> +Vinod
>
> On Mar 2, 2015, at 8:08 PM, Andrew Wang <andrew.w...@cloudera.com> wrote:
>
> Regarding classpath isolation, based on what I hear from our customers,
> it's still a big problem (even after the MR classloader work). The latest
> Jackson version bump was quite painful for our downstream projects, and the
> HDFS client still leaks a lot of dependencies. Would welcome more
> discussion of this on HADOOP-11656; Steve, Colin, and Haohui have already
> chimed in.
>
>

Re: Looking to a Hadoop 3 release

2015-03-03 Thread Arun Murthy

Awesome, looks like we can just do this in a compatible manner - nothing else 
on the list seems like it warrants a (premature) major release.

Thanks Vinod.

Arun


From: Vinod Kumar Vavilapalli 
Sent: Tuesday, March 03, 2015 2:30 PM
To: common-...@hadoop.apache.org
Cc: hdfs-dev@hadoop.apache.org; mapreduce-...@hadoop.apache.org; 
yarn-...@hadoop.apache.org
Subject: Re: Looking to a Hadoop 3 release

I started pitching in more on that JIRA.

To add, I think we can and should strive for doing this in a compatible manner, 
whatever the approach. Marking and calling it incompatible before we see 
proposal/patch seems premature to me. Commented the same on JIRA: 
https://issues.apache.org/jira/browse/HADOOP-11656?focusedCommentId=14345875&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14345875.

Thanks
+Vinod

On Mar 2, 2015, at 8:08 PM, Andrew Wang <andrew.w...@cloudera.com> wrote:

Regarding classpath isolation, based on what I hear from our customers,
it's still a big problem (even after the MR classloader work). The latest
Jackson version bump was quite painful for our downstream projects, and the
HDFS client still leaks a lot of dependencies. Would welcome more
discussion of this on HADOOP-11656; Steve, Colin, and Haohui have already
chimed in.

Re: Looking to a Hadoop 3 release

2015-03-03 Thread Vinod Kumar Vavilapalli


I started pitching in more on that JIRA.

To add, I think we can and should strive for doing this in a compatible manner, 
whatever the approach. Marking and calling it incompatible before we see 
proposal/patch seems premature to me. Commented the same on JIRA: 
https://issues.apache.org/jira/browse/HADOOP-11656?focusedCommentId=14345875&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14345875.

Thanks
+Vinod

On Mar 2, 2015, at 8:08 PM, Andrew Wang <andrew.w...@cloudera.com> wrote:

Regarding classpath isolation, based on what I hear from our customers,
it's still a big problem (even after the MR classloader work). The latest
Jackson version bump was quite painful for our downstream projects, and the
HDFS client still leaks a lot of dependencies. Would welcome more
discussion of this on HADOOP-11656; Steve, Colin, and Haohui have already
chimed in.

Re: Looking to a Hadoop 3 release

2015-03-03 Thread Allen Wittenauer

Between:

* removing -finalize
* breaking HDFS browsing
* changing du’s output (in the 2.7 branch)
* changing various names of metrics (either intentionally or otherwise)
* changing the JDK release

… and probably lots of other stuff in branch-2 I haven’t seen/know 
about, our best course of action is to:

$ git rm hadoop-common-project/hadoop-common/src/site/markdown/Compatibility.md

At least this way we as caretakers don’t come across as hypocrites.
It’s pretty clear the direction has shown we only care about API compatibility 
and the rest is ignored when it isn’t “convenient”.  [The next time someone 
tells you that Hadoop is hard to operate, I want you to think about this email.]
(1)

Making 2.7 build with JDK7 led to the *exact* situation I figured it 
would:  now we have a precedent where we just say to the community “You know 
those guarantees?  Yeah, you might as well ignore them because we’re going to 
change the core component any damn time we feel like it.”

We haven’t made a release branch off of trunk since branch-0.23.  If 
anyone thinks that’s healthy, there is some beach property in Alberta you might 
be interested in as well. Our release cycle came to a screeching halt after 
0.20 and we’ve never recovered.

However, I offer an alternative.

This same circular argument comes up all the time: (2)

* There aren’t enough changes in trunk to make a new branch. 
* We can’t upgrade/change component X because there is no plan to make 
a new major release.

To quote Frozen:  Let It Go

We’re probably at the point where there aren’t likely to be very many
more earth-shattering changes to the Hadoop code base.  The community has
decided instead to push these types of changes as separate projects via
the Incubator to avoid the committer paralysis that this community suffers from.

Because of this, I don’t think the “enough changes” argument works 
anymore.  Instead, we need to pick a new metric to build a cadence to force 
regular updates.  I’d offer that the “every two years” JDK EOL sets the perfect 
cadence, matched by many other enterprise and OSS software, and gives us an 
opportunity to reflect in the version number that the critical component of our 
software has changed.

This cadence allows for people to plan appropriately and know what our 
roadmap and direction actually is.  Folks are more likely to build “real” 
solutions rather than make compromises that suffer in quality in the name of 
compatibility simply because they don’t know when their work will actually show 
up. We’ll have a normal, regular opportunity to update dependencies (regardless 
of the state of HADOOP-11656).

Now, if you’ll excuse me, I have more contributors' patches to go
through.

(1) FWIW, I made the decision not to worry about backward compatibility in the 
shell code rewrite when I made the realization that the jsvc log and pid file 
names were poorly chosen to allow for certain capabilities.  Did anyone 
actually touch them from outside the software? Probably not.  But it is still 
effectively an interface, so off to trunk it went. 

(2) … and that’s before we even get to the “Version numbers are cheap” 
arguments that were made during the Great Renames of 0.20 and 0.23.

Re: Looking to a Hadoop 3 release

2015-03-03 Thread Andrew Wang

Hi Junping, thanks for your response,

I view branch-3 as essentially the same size as our recent 2.x releases,
with the exception of incompatible changes like classpath isolation and
JDK8 target version. These, while perhaps not revolutionary, are still
incompatible, and require a major version bump.

I don't see a forking of the community effort, since backports should flow
pretty easily from branch-3 to branch-2 the same way they currently can
flow from branch-2 to branch-2.6. It's just an extra git commit, not like
what we had to deal with in the branch-1 days with a custom backport.

Hopefully that addresses your concerns.

Thanks,
Andrew

On Tue, Mar 3, 2015 at 6:12 AM, Junping Du  wrote:

> Thanks all for good discussions here.
> +1 on supporting Java 8 ASAP. In addition, I agree that we should
> separate this effort from cutting Hadoop 3.
> IMO, Hadoop is still very cool today, and we should only consider Hadoop 3
> once we have a revolutionary feature (like YARN for 2.0) that deserves to
> break fundamental compatibility. Otherwise it may just cause more distractions
> for community effort.
> Just 2 cents.
>
> Thanks,
>
> Junping
> 
> From: Akira AJISAKA 
> Sent: Tuesday, March 03, 2015 12:04 PM
> To: common-...@hadoop.apache.org; mapreduce-...@hadoop.apache.org;
> hdfs-dev@hadoop.apache.org; yarn-...@hadoop.apache.org
> Subject: Re: Looking to a Hadoop 3 release
>
> Thanks Andrew for bringing this up.
> +1, mostly looks fine, but I think now is not the time to cut branch-3.
>
>  > classpath isolation
>
> IMHO, classpath isolation is a good thing to do.
> We should pay down the technical debt ASAP. I'm willing to help.
>
> I'm thinking we can cut branch-3 and release 3.0 alpha
> after HADOOP-11656 is fixed. That is, I'd like to mark
> this issue as a blocker for 3.0.
> I suspect that even if we cut branch-3 now, trunk and
> branch-3 would be the same for a while. That seems useless.
>
>  > JDK8
>
> As Steve suggested, JDK8 can be in both trunk and branch-2.
> +1 for moving to JDK8 ASAP.
>
>  > maintaining 2.x
>
> For users, there is currently little merit in upgrading to 3.x.
> More important thing is how long 2.x will be maintained.
> Therefore we should consider when to stop backporting
> new features to 2.x, and when to stop maintaining 2.x.
> I'd like to maintain 2.x as long as possible, at least
> one year after 3.x GA release.
>
> * Other issue
>
> What's the current status of HDFS symlink?
> If HADOOP-10019 requires some incompatible changes,
> I'd like to include them in 3.x.
>
> Regards,
> Akira
>
> On 3/2/15 15:19, Andrew Wang wrote:
> > Hi devs,
> >
> > It's been a year and a half since 2.x went GA, and I think we're about due
> > for a 3.x release.
> > Notably, there are two incompatible changes I'd like to call out, that will
> > have a tremendous positive impact for our users.
> >
> > First, classpath isolation being done at HADOOP-11656, which has been a
> > long-standing request from many downstreams and Hadoop users.
> >
> > Second, bumping the source and target JDK version to JDK8 (related to
> > HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two
> > months from now). In the past, we've had issues with our dependencies
> > discontinuing support for old JDKs, so this will future-proof us.
> >
> > Between the two, we'll also have quite an opportunity to clean up and
> > upgrade our dependencies, another common user and developer request.
> >
> > I'd like to propose that we start rolling a monthly-ish series of
> > 3.0 alpha releases ASAP, with myself volunteering to take on the RM and
> > other cat herding responsibilities. There are already quite a few changes
> > slated for 3.0 besides the above (for instance the shell script rewrite) so
> > there's already value in a 3.0 alpha, and the more time we give downstreams
> > to integrate, the better.
> >
> > This opens up discussion about inclusion of other changes, but I'm hoping
> > to freeze incompatible changes after maybe two alphas, do a beta (with no
> > further incompat changes allowed), and then finally a 3.x GA. For those
> > keeping track, that means a 3.x GA in about four months.
> >
> > I would also like to stress though that this is not intended to be a big
> > bang release. For instance, it would be great if we could maintain wire
> > compatibility between 2.x and 3.x, so rolling upgrades work. Keeping
> > branch-2 and branch-3 similar also makes backports easier, since we're
> > likely maintaining 2.x for a while yet.
> >
> > Please let me know any comments / concerns related to the above. If people
> > are friendly to the idea, I'd like to cut a branch-3 and start working on
> > the first alpha.
> >
> > Best,
> > Andrew
> >
>
>

Re: Looking to a Hadoop 3 release

2015-03-03 Thread Andrew Wang

Hi Akira, thanks for responding,

On Tue, Mar 3, 2015 at 4:04 AM, Akira AJISAKA 
wrote:

> Thanks Andrew for bringing this up.
> +1, mostly looks fine, but I think now is not the time to cut branch-3.
>
> > classpath isolation
>
> IMHO, classpath isolation is a good thing to do.
> We should pay down the technical debt ASAP. I'm willing to help.
>
> I'm thinking we can cut branch-3 and release 3.0 alpha
> after HADOOP-11656 is fixed. That is, I'd like to mark
> this issue as a blocker for 3.0.
> I suspect that even if we cut branch-3 now, trunk and
> branch-3 would be the same for a while. That seems useless.
>

I'm willing to wait a bit here, but I think even what we have now is worth
kicking the tires, and either the JDK8 target version or classpath
isolation would make it even more compelling.

If you're worried about backport overheads, Konst's proposal of releasing
directly from trunk might be appealing. Needs some more examination though.


> > JDK8
>
> As Steve suggested, JDK8 can be in both trunk and branch-2.
> +1 for moving to JDK8 ASAP.
>

We can make sure branch-2 runs well under JDK8, but I'm against doing a
target version bump to JDK8 like we're planning to do for JDK7 in a minor
release. As I described in my reply to Arun, that was a special
circumstance, and JDK target version bumps really are deserving of a new
major release.


> > maintaining 2.x
>
> For users, there is currently little merit in upgrading to 3.x.
> More important thing is how long 2.x will be maintained.
> Therefore we should consider when to stop backporting
> new features to 2.x, and when to stop maintaining 2.x.
> I'd like to maintain 2.x as long as possible, at least
> one year after 3.x GA release.
>

The value in releasing alphas right now is not so much for end users, but
for downstream projects which need time to integrate. I don't expect
end-users to really jump on 3.x until the downstreams have also rolled new
releases based on 3.x.

Determining when support for 2.x is over is done by the community. I
personally plan to keep backporting for a while after 3.x GA is released.
If backports to branch-2 tail off, it just takes one committer with the
interest to keep maintaining it. This has been a common thing in HBase for
instance, Lars H maintained 0.92 for a long time because he had the
interest.


> * Other issue
>
> What's the current status of HDFS symlink?
> If HADOOP-10019 requires some incompatible changes,
> I'd like to include them in 3.x.
>

There are still a lot of unresolved compatibility and security issues,
especially with cross-filesystem symlinks. We tabled this work before, and
frankly I'm not sure these issues will ever be satisfactorily resolved.
Even today, there are plenty of Unix apps that don't handle symlinks
correctly, and we still lack equivalents of more secure syscalls like
openat() in the first place.

Thanks,
Andrew

Re: Looking to a Hadoop 3 release

2015-03-03 Thread Andrew Wang

Hi Konst, thanks for taking a look. I think I essentially agree with your
points.

On Mon, Mar 2, 2015 at 11:04 PM, Konstantin Shvachko 
wrote:

> Andrew,
>
> Hadoop 3 seems in general like a good idea to me.
> 1. I did not understand whether you propose to release 3.0 instead of 2.7 or in
> addition?
> I think 2.7 is needed at least as a stabilization step for the 2.x line.
>

I agree with this; 2.7 is needed, and I think Vinod/Arun are working on it
now.

I expect branch-2 to be maintained for a while yet, separate from a
branch-3.

> 2. If Hadoop 3 and 2.x are meant to exist together, we run the risk of
> manifesting split-brain behavior again, as we had with hadoop-1, hadoop-2, and
> other versions. If that is somehow beneficial for commercial vendors (I
> don't see how), for the community it has proven very disruptive. It would
> be really good to avoid it this time.
>

My motivations here are purely what I've stated above. I remember the pain
of the branch-1 days as well, and this would be a far, far smaller
difference. JDK8 min version and classpath isolation are compelling, yet
incompatible, which is why I'm proposing Hadoop 3. Besides those two
features, it should be approximately the same "size" as our 2.x releases.

> 3. Could we release Hadoop 3 directly from trunk? With a proper feature
> freeze in advance. Current trunk is in the best working condition I've seen
> in years - much better than when hadoop-2 was coming to life. It could
> make a good alpha.
> I believe we can start planning 3.0 from trunk right after 2.7 is out.
>

I agree with this, and would be okay with this if our audit of trunk
reveals no incompatible changes we're uncomfortable releasing.

I'll note though that committing to multiple branches is way easier now
with git and cherry-pick, so that overhead is reduced. Rolling out an alpha
now is strictly a good thing for our downstreams, even if it means we need
to do extra commits.

Thanks,
Andrew

Re: Looking to a Hadoop 3 release

2015-03-03 Thread Junping Du

Thanks all for good discussions here.
+1 on supporting Java 8 ASAP. In addition, I agree that we should separate
this effort from cutting Hadoop 3.
IMO, Hadoop is still very cool today, and we should only consider Hadoop 3
once we have a revolutionary feature (like YARN for 2.0) that deserves to break
fundamental compatibility. Otherwise it may just cause more distractions for
community effort.
Just 2 cents.

Thanks,

Junping

From: Akira AJISAKA 
Sent: Tuesday, March 03, 2015 12:04 PM
To: common-...@hadoop.apache.org; mapreduce-...@hadoop.apache.org; 
hdfs-dev@hadoop.apache.org; yarn-...@hadoop.apache.org
Subject: Re: Looking to a Hadoop 3 release

Thanks Andrew for bringing this up.
+1, mostly looks fine, but I think now is not the time to cut branch-3.

 > classpath isolation

IMHO, classpath isolation is a good thing to do.
We should pay down the technical debt ASAP. I'm willing to help.

I'm thinking we can cut branch-3 and release 3.0 alpha
after HADOOP-11656 is fixed. That is, I'd like to mark
this issue as a blocker for 3.0.
I suspect that even if we cut branch-3 now, trunk and
branch-3 would be the same for a while. That seems useless.

 > JDK8

As Steve suggested, JDK8 can be in both trunk and branch-2.
+1 for moving to JDK8 ASAP.

 > maintaining 2.x

For users, there is currently little merit in upgrading to 3.x.
More important thing is how long 2.x will be maintained.
Therefore we should consider when to stop backporting
new features to 2.x, and when to stop maintaining 2.x.
I'd like to maintain 2.x as long as possible, at least
one year after 3.x GA release.

* Other issue

What's the current status of HDFS symlink?
If HADOOP-10019 requires some incompatible changes,
I'd like to include them in 3.x.

Regards,
Akira

On 3/2/15 15:19, Andrew Wang wrote:
> Hi devs,
>
> It's been a year and a half since 2.x went GA, and I think we're about due
> for a 3.x release.
> Notably, there are two incompatible changes I'd like to call out, that will
> have a tremendous positive impact for our users.
>
> First, classpath isolation being done at HADOOP-11656, which has been a
> long-standing request from many downstreams and Hadoop users.
>
> Second, bumping the source and target JDK version to JDK8 (related to
> HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two
> months from now). In the past, we've had issues with our dependencies
> discontinuing support for old JDKs, so this will future-proof us.
>
> Between the two, we'll also have quite an opportunity to clean up and
> upgrade our dependencies, another common user and developer request.
>
> I'd like to propose that we start rolling a monthly-ish series of
> 3.0 alpha releases ASAP, with myself volunteering to take on the RM and
> other cat herding responsibilities. There are already quite a few changes
> slated for 3.0 besides the above (for instance the shell script rewrite) so
> there's already value in a 3.0 alpha, and the more time we give downstreams
> to integrate, the better.
>
> This opens up discussion about inclusion of other changes, but I'm hoping
> to freeze incompatible changes after maybe two alphas, do a beta (with no
> further incompat changes allowed), and then finally a 3.x GA. For those
> keeping track, that means a 3.x GA in about four months.
>
> I would also like to stress though that this is not intended to be a big
> bang release. For instance, it would be great if we could maintain wire
> compatibility between 2.x and 3.x, so rolling upgrades work. Keeping
> branch-2 and branch-3 similar also makes backports easier, since we're
> likely maintaining 2.x for a while yet.
>
> Please let me know any comments / concerns related to the above. If people
> are friendly to the idea, I'd like to cut a branch-3 and start working on
> the first alpha.
>
> Best,
> Andrew
>

Re: Looking to a Hadoop 3 release

2015-03-03 Thread Akira AJISAKA

Thanks Andrew for bringing this up.
+1, mostly looks fine, but I think now is not the time to cut branch-3.

> classpath isolation

IMHO, classpath isolation is a good thing to do.
We should pay down the technical debt ASAP. I'm willing to help.

I'm thinking we can cut branch-3 and release 3.0 alpha
after HADOOP-11656 is fixed. That is, I'd like to mark
this issue as a blocker for 3.0.
I suspect that even if we cut branch-3 now, trunk and
branch-3 would be the same for a while. That seems useless.

> JDK8

As Steve suggested, JDK8 can be in both trunk and branch-2.
+1 for moving to JDK8 ASAP.

> maintaining 2.x

For users, there is currently little merit in upgrading to 3.x.
More important thing is how long 2.x will be maintained.
Therefore we should consider when to stop backporting
new features to 2.x, and when to stop maintaining 2.x.
I'd like to maintain 2.x as long as possible, at least
one year after 3.x GA release.

* Other issue

What's the current status of HDFS symlink?
If HADOOP-10019 requires some incompatible changes,
I'd like to include them in 3.x.

Regards,
Akira

On 3/2/15 15:19, Andrew Wang wrote:

Hi devs,

It's been a year and a half since 2.x went GA, and I think we're about due
for a 3.x release.
Notably, there are two incompatible changes I'd like to call out, that will
have a tremendous positive impact for our users.

First, classpath isolation being done at HADOOP-11656, which has been a
long-standing request from many downstreams and Hadoop users.

Second, bumping the source and target JDK version to JDK8 (related to
HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two
months from now). In the past, we've had issues with our dependencies
discontinuing support for old JDKs, so this will future-proof us.

Between the two, we'll also have quite an opportunity to clean up and
upgrade our dependencies, another common user and developer request.

I'd like to propose that we start rolling a monthly-ish series of
3.0 alpha releases ASAP, with myself volunteering to take on the RM and
other cat herding responsibilities. There are already quite a few changes
slated for 3.0 besides the above (for instance the shell script rewrite) so
there's already value in a 3.0 alpha, and the more time we give downstreams
to integrate, the better.

This opens up discussion about inclusion of other changes, but I'm hoping
to freeze incompatible changes after maybe two alphas, do a beta (with no
further incompat changes allowed), and then finally a 3.x GA. For those
keeping track, that means a 3.x GA in about four months.

I would also like to stress though that this is not intended to be a big
bang release. For instance, it would be great if we could maintain wire
compatibility between 2.x and 3.x, so rolling upgrades work. Keeping
branch-2 and branch-3 similar also makes backports easier, since we're
likely maintaining 2.x for a while yet.

Please let me know any comments / concerns related to the above. If people
are friendly to the idea, I'd like to cut a branch-3 and start working on
the first alpha.

Best,
Andrew

Re: Looking to a Hadoop 3 release

2015-03-02 Thread Konstantin Shvachko

Andrew,

Hadoop 3 seems in general like a good idea to me.
1. I did not understand whether you propose to release 3.0 instead of 2.7 or in
addition?
   I think 2.7 is needed at least as a stabilization step for the 2.x line.

2. If Hadoop 3 and 2.x are meant to exist together, we run the risk of
manifesting split-brain behavior again, as we had with hadoop-1, hadoop-2, and
other versions. If that is somehow beneficial for commercial vendors (I
don't see how), for the community it has proven very disruptive. It would
be really good to avoid it this time.

3. Could we release Hadoop 3 directly from trunk? With a proper feature
freeze in advance. Current trunk is in the best working condition I've seen
in years - much better than when hadoop-2 was coming to life. It could
make a good alpha.
I believe we can start planning 3.0 from trunk right after 2.7 is out.

Thanks,
--Konst

On Mon, Mar 2, 2015 at 3:19 PM, Andrew Wang 
wrote:

> Hi devs,
>
> It's been a year and a half since 2.x went GA, and I think we're about due
> for a 3.x release.
> Notably, there are two incompatible changes I'd like to call out, that will
> have a tremendous positive impact for our users.
>
> First, classpath isolation being done at HADOOP-11656, which has been a
> long-standing request from many downstreams and Hadoop users.
>
> Second, bumping the source and target JDK version to JDK8 (related to
> HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two
> months from now). In the past, we've had issues with our dependencies
> discontinuing support for old JDKs, so this will future-proof us.
>
> Between the two, we'll also have quite an opportunity to clean up and
> upgrade our dependencies, another common user and developer request.
>
> I'd like to propose that we start rolling a monthly-ish series of
> 3.0 alpha releases ASAP, with myself volunteering to take on the RM and
> other cat herding responsibilities. There are already quite a few changes
> slated for 3.0 besides the above (for instance the shell script rewrite) so
> there's already value in a 3.0 alpha, and the more time we give downstreams
> to integrate, the better.
>
> This opens up discussion about inclusion of other changes, but I'm hoping
> to freeze incompatible changes after maybe two alphas, do a beta (with no
> further incompat changes allowed), and then finally a 3.x GA. For those
> keeping track, that means a 3.x GA in about four months.
>
> I would also like to stress though that this is not intended to be a big
> bang release. For instance, it would be great if we could maintain wire
> compatibility between 2.x and 3.x, so rolling upgrades work. Keeping
> branch-2 and branch-3 similar also makes backports easier, since we're
> likely maintaining 2.x for a while yet.
>
> Please let me know any comments / concerns related to the above. If people
> are friendly to the idea, I'd like to cut a branch-3 and start working on
> the first alpha.
>
> Best,
> Andrew
>

Re: Looking to a Hadoop 3 release

2015-03-02 Thread Andrew Wang

Thanks as always for the feedback, everyone. Some inline comments on Arun's
email, as his were the most extensive:


> Given that we already agreed to put in JDK7 in 2.7, and that the
> classpath is a fairly minor irritant given some existing solutions (e.g. a
> new default classloader), how do you quantify the benefit for users?
>

I looked at our thread on this topic from last time, and we (meaning at
least Tucu and I) agreed to a one-time exception to the JDK7 bump in
2.x for practical reasons. We waited for so long that we had some assurance
JDK6 was on the outs. Multiple distros also already had bumped their min
version to JDK7. This is not true this time around. Bumping the JDK version
is hugely impactful on the end user, and my email on the earlier thread
still reflects my thoughts on JDK compatibility:

http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201406.mbox/%3CCAGB5D2a5fEDfBApQyER_zyhc8a4Xd_ea1wJSsxxkiAiDZO9%2BNg%40mail.gmail.com%3E

Regarding classpath isolation, based on what I hear from our customers,
it's still a big problem (even after the MR classloader work). The latest
Jackson version bump was quite painful for our downstream projects, and the
HDFS client still leaks a lot of dependencies. Would welcome more
discussion of this on HADOOP-11656; Steve, Colin, and Haohui have already
chimed in.

Having the freedom to upgrade our dependencies at will would also be a big
win for us as developers.

> We could just do JDK8 in hadoop-2.10 or some such, you are definitely
> welcome to run the RM role for that release.
>
> Furthermore, I'm really concerned that this will be used as an
> opportunity to further break compat in more egregious ways.
>
> Also, are you foreseeing more compat breaks? OTOH, if we all agree that
> we should absolutely prevent compat breakages such as the client-server
> wire protocol, I feel the point of a major release is kinda lost.
>
>
Right now, the incompatible changes would be JDK8, classpath isolation, and
whatever is already in trunk. I can audit these existing trunk changes when
branch-3 is cut.

I would like to keep this list as short as possible, to preserve wire
compat and rolling upgrade. As far as major releases go, this is not one to
be scared of. However, since it's incompatible, it still needs that major
version bump.

Best,
Andrew

P.S. Vinod, the shell script rewrite is incompatible. Allen intentionally
excluded it from branch-2 for this reason.

Re: Looking to a Hadoop 3 release

2015-03-02 Thread Steve Loughran

I'm +1 for a migrate to Java 8 as soon as possible.

That's branch-2 & trunk, as having them on the same language level makes 
cherry-picking stuff off trunk possible. That's particularly the case for Java 8
as it is the first major change to the language since Java 5.

w.r.t shipping trunk as 3.x, it's going to take longer than planned. Hopefully 
not as long as the 2.x release process, but you never know.   Which means I 
expect some more Hadoop 2 releases this year. We need to make the jump there 
too, get 2.7 out the door and include a roadmap there for when the Java 8+
only event happens across the codebase.


-Steve


ps. for anyone who wants a pure java8 build today, set -Djavac.version=1.8 on
the command line of a maven build. Last time I tried there were some (minor) bits
of YARN that wouldn't compile...




On 2 March 2015 at 18:31:00, Arun Murthy
(a...@hortonworks.com) wrote:

Andrew,

Thanks for bringing up this discussion.

I'm a little puzzled for I feel like we are rehashing the same discussion from 
last year - where we agreed on a different course of action w.r.t switch to 
JDK7.

IAC, breaking compatibility for hadoop-3 is a pretty big cost - particularly 
for users such as Yahoo/Twitter/eBay who have several clusters between which 
compatibility is paramount.

Now, breaking compatibility is perfectly fine over time where there is 
sufficient benefit e.g. HDFS HA or YARN in hadoop-2 (v/s hadoop-1).

However, I'm struggling to quantify the benefit of hadoop-3 for users for the 
cost of the breakage.

Given that we already agreed to put in JDK7 in 2.7, and that the classpath is a 
fairly minor irritant given some existing solutions (e.g. a new default 
classloader), how do you quantify the benefit for users?

We could just do JDK8 in hadoop-2.10 or some such, you are definitely welcome 
to run the RM role for that release.

Furthermore, I'm really concerned that this will be used as an opportunity to 
further break compat in more egregious ways.

Also, are you foreseeing more compat breaks? OTOH, if we all agree that we 
should absolutely prevent compat breakages such as the client-server wire 
protocol, I feel the point of a major release is kinda lost.

Overall, my biggest concern is the compatibility story vis-a-vis the benefit.

Thoughts?

thanks,
Arun


From: Andrew Wang 
Sent: Monday, March 02, 2015 3:19 PM
To: common-...@hadoop.apache.org; mapreduce-...@hadoop.apache.org; 
hdfs-dev@hadoop.apache.org; yarn-...@hadoop.apache.org
Subject: Looking to a Hadoop 3 release

Hi devs,

It's been a year and a half since 2.x went GA, and I think we're about due
for a 3.x release.
Notably, there are two incompatible changes I'd like to call out, that will
have a tremendous positive impact for our users.

First, classpath isolation being done at HADOOP-11656, which has been a
long-standing request from many downstreams and Hadoop users.

Second, bumping the source and target JDK version to JDK8 (related to
HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two
months from now). In the past, we've had issues with our dependencies
discontinuing support for old JDKs, so this will future-proof us.

Between the two, we'll also have quite an opportunity to clean up and
upgrade our dependencies, another common user and developer request.

I'd like to propose that we start rolling a monthly-ish series of
3.0 alpha releases ASAP, with myself volunteering to take on the RM and
other cat-herding responsibilities. There are already quite a few changes
slated for 3.0 besides the above (for instance the shell script rewrite), so
there's already value in a 3.0 alpha, and the more time we give downstreams
to integrate, the better.

This opens up discussion about inclusion of other changes, but I'm hoping
to freeze incompatible changes after maybe two alphas, do a beta (with no
further incompat changes allowed), and then finally a 3.x GA. For those
keeping track, that means a 3.x GA in about four months.

I would also like to stress though that this is not intended to be a big
bang release. For instance, it would be great if we could maintain wire
compatibility between 2.x and 3.x, so rolling upgrades work. Keeping
branch-2 and branch-3 similar also makes backports easier, since we're
likely maintaining 2.x for a while yet.

Please let me know any comments / concerns related to the above. If people
are friendly to the idea, I'd like to cut a branch-3 and start working on
the first alpha.

Best,
Andrew

Re: Looking to a Hadoop 3 release

2015-03-02 Thread Vinod Kumar Vavilapalli

Agreed. The difference between a 3.0 GA release and a parallel 2.x release line 
is just JDK8 + a different classpath (potentially isolated) - doesn't sound 
like a big enough delta warranting the license to break compat.

Thanks,
+Vinod

On Mar 2, 2015, at 6:30 PM, Arun Murthy  wrote:

> Andrew,
> 
> Thanks for bringing up this discussion.
> 
> I'm a little puzzled for I feel like we are rehashing the same discussion 
> from last year - where we agreed on a different course of action w.r.t switch 
> to JDK7.
> 
> IAC, breaking compatibility for hadoop-3 is a pretty big cost - particularly 
> for users such as Yahoo/Twitter/eBay who have several clusters between which 
> compatibility is paramount. 
> 
> Now, breaking compatibility is perfectly fine over time where there is 
> sufficient benefit e.g. HDFS HA or YARN in hadoop-2 (v/s hadoop-1). 
> 
> However, I'm struggling to quantify the benefit of hadoop-3 for users for the 
> cost of the breakage.
> 
> Given that we already agreed to put in JDK7 in 2.7, and that the classpath is 
> a fairly minor irritant given some existing solutions (e.g. a new default 
> classloader), how do you quantify the benefit for users?
> 
> We could just do JDK8 in hadoop-2.10 or some such, you are definitely welcome 
> to run the RM role for that release.
> 
> Furthermore, I'm really concerned that this will be used as an opportunity to 
> further break compat in more egregious ways. 
> 
> Also, are you foreseeing more compat breaks? OTOH, if we all agree that we 
> should absolutely prevent compat breakages such as the client-server wire 
> protocol, I feel the point of a major release is kinda lost.
> 
> Overall, my biggest concern is the compatibility story vis-a-vis the benefit. 
> 
> Thoughts?
> 
> thanks,
> Arun

Re: Looking to a Hadoop 3 release

2015-03-02 Thread Vinod Kumar Vavilapalli


> Second, bumping the source and target JDK version to JDK8 (related to
> HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two
> months from now). In the past, we've had issues with our dependencies
> discontinuing support for old JDKs, so this will future-proof us.


Is moving to JDK8 fundamentally different from the move to JDK7? We are moving 
to JDK7 via release 2.7 that I am helping with now.


> I'd like to propose that we start rolling a series of monthly-ish series of
> 3.0 alpha releases ASAP, with myself volunteering to take on the RM and
> other cat herding responsibilities. There are already quite a few changes
> slated for 3.0 besides the above (for instance the shell script rewrite) so
> there's already value in a 3.0 alpha, and the more time we give downstreams
> to integrate, the better.


Aren't the shell script rewrite changes supposed to be compatible?

Thanks,
+Vinod

Re: Looking to a Hadoop 3 release

2015-03-02 Thread sanjay Radia

Andrew,
Thanks for bringing up the issue of moving to Java8. Java8 is important.
However, I am not seeing a strong motivation for changing the major number.
We can go to Java8 in the 2.x series.
The classpath issue for HADOOP-11656 is too minor to force a major number
change (no pun intended).

Let's separate the issue of Java8 and Hadoop 3.0.

sanjay



Re: Looking to a Hadoop 3 release

2015-03-02 Thread Arun Murthy

Andrew,

Thanks for bringing up this discussion.

I'm a little puzzled for I feel like we are rehashing the same discussion from
last year - where we agreed on a different course of action w.r.t switch to
JDK7.

IAC, breaking compatibility for hadoop-3 is a pretty big cost - particularly
for users such as Yahoo/Twitter/eBay who have several clusters between which
compatibility is paramount.

Now, breaking compatibility is perfectly fine over time where there is
sufficient benefit e.g. HDFS HA or YARN in hadoop-2 (v/s hadoop-1).

However, I'm struggling to quantify the benefit of hadoop-3 for users for the
cost of the breakage.

Given that we already agreed to put in JDK7 in 2.7, and that the classpath is
a fairly minor irritant given some existing solutions (e.g. a new default
classloader), how do you quantify the benefit for users?

We could just do JDK8 in hadoop-2.10 or some such, you are definitely welcome
to run the RM role for that release.

Furthermore, I'm really concerned that this will be used as an opportunity to
further break compat in more egregious ways.

Also, are you foreseeing more compat breaks? OTOH, if we all agree that we
should absolutely prevent compat breakages such as the client-server wire
protocol, I feel the point of a major release is kinda lost.

Overall, my biggest concern is the compatibility story vis-a-vis the benefit.

Thoughts?

thanks,
Arun
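Arun mentions "a new default classloader" as an existing way to ease classpath conflicts such as the Guava pain described elsewhere in this thread. A minimal, hypothetical sketch of the idea is a child-first loader: it searches the application's own jars before asking its parent (for example, Hadoop's classpath), so a user's dependency version would win over Hadoop's. The class name ChildFirstClassLoader is invented here; this is neither the HADOOP-11656 design nor the code of any existing Hadoop class, and it only uses the standard java.net.URLClassLoader API.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical child-first classloader: looks in its own URLs (e.g. a
// user's jars) before asking the parent (e.g. Hadoop's own classpath).
final class ChildFirstClassLoader extends URLClassLoader {

    ChildFirstClassLoader(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    // JDK classes must come from the parent chain; including javax.* is an assumption.
    private static boolean isSystemClass(String name) {
        return name.startsWith("java.") || name.startsWith("javax.");
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve)
            throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            // 1. Reuse a class this loader has already defined.
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                if (isSystemClass(name)) {
                    // 2a. Ordinary parent-first delegation for JDK classes.
                    c = super.loadClass(name, false);
                } else {
                    try {
                        // 2b. Child first: search this loader's own URLs.
                        c = findClass(name);
                    } catch (ClassNotFoundException e) {
                        // 3. Not in the user's jars: fall back to the parent.
                        c = super.loadClass(name, false);
                    }
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    public static void main(String[] args) throws Exception {
        // With no URLs of its own, every lookup falls through to the parent.
        try (ChildFirstClassLoader loader = new ChildFirstClassLoader(
                new URL[0], ChildFirstClassLoader.class.getClassLoader())) {
            System.out.println(loader.loadClass("java.util.ArrayList"));
        }
    }
}
```

java.* classes are always delegated because the JVM refuses to define application classes in java.* packages; a real isolation scheme would likely need a configurable list of such "system" prefixes rather than the two hard-coded here.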



RE: Looking to a Hadoop 3 release

2015-03-02 Thread Zheng, Kai

Sorry, my bad. I thought I was sending this to my colleagues.

By the way, for the JDK8 support, we (Intel) would like to investigate further 
and help, thanks.

Regards,
Kai

-Original Message-
From: Zheng, Kai 
Sent: Tuesday, March 03, 2015 8:49 AM
To: common-...@hadoop.apache.org; mapreduce-...@hadoop.apache.org; 
hdfs-dev@hadoop.apache.org; yarn-...@hadoop.apache.org
Subject: RE: Looking to a Hadoop 3 release

JDK8 support is under consideration; it looks like many issues have already been
reported and resolved.

https://issues.apache.org/jira/browse/HADOOP-11090



RE: Looking to a Hadoop 3 release

2015-03-02 Thread Zheng, Kai

JDK8 support is under consideration; it looks like many issues have already been
reported and resolved.

https://issues.apache.org/jira/browse/HADOOP-11090


RE: Looking to a Hadoop 3 release

2015-03-02 Thread Liu, Yi A

+1

Regards,
Yi Liu


Re: Looking to a Hadoop 3 release

2015-03-02 Thread Robert Kanter

+1  Happy to help too


Re: Looking to a Hadoop 3 release

2015-03-02 Thread Yongjun Zhang

Thanks Andrew for the proposal.

+1, and I will be happy to help.

--Yongjun





Re: Looking to a Hadoop 3 release

2015-03-02 Thread Lei Xu

+1.  Would love to help.






-- 
Lei (Eddy) Xu
Software Engineer, Cloudera

Re: Looking to a Hadoop 3 release

2015-03-02 Thread Karthik Kambatla

+1

On Mon, Mar 2, 2015 at 3:19 PM, Andrew Wang 
wrote:

> Hi devs,
>
> It's been a year and a half since 2.x went GA, and I think we're about due
> for a 3.x release.
> Notably, there are two incompatible changes I'd like to call out, that will
> have a tremendous positive impact for our users.


> First, classpath isolation being done at HADOOP-11656, which has been a
> long-standing request from many downstreams and Hadoop users.
>

Guava etc. have been such a pain in the past. Can't wait to have a release
where we don't have to worry about which versions of dependencies users want
to use.


>
> Second, bumping the source and target JDK version to JDK8 (related to
> HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two
> months from now). In the past, we've had issues with our dependencies
> discontinuing support for old JDKs, so this will future-proof us.
>

Are you saying we can use lambdas without re-writing all of Hadoop in
Scala?
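Karthik's aside refers to Java 8 lambdas, which a JDK8 source level would allow in Hadoop's own Java code. A minimal, hypothetical illustration (the class and method names are invented for this sketch and are not Hadoop code): the same length-based sort written with a JDK7 anonymous Comparator and with a JDK8 method reference, plus a short stream pipeline using a lambda.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical example class; not part of Hadoop.
final class LambdaSketch {

    // JDK7 style: an anonymous inner class just to pass one comparison.
    static List<String> sortByLengthJdk7(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        Collections.sort(copy, new Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                return Integer.compare(a.length(), b.length());
            }
        });
        return copy;
    }

    // JDK8 style: the same ordering via a method reference.
    static List<String> sortByLengthJdk8(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        copy.sort(Comparator.comparingInt(String::length));
        return copy;
    }

    // JDK8 streams: keep names with at least minLength chars, upper-cased.
    static List<String> upperCaseLongNames(List<String> input, int minLength) {
        return input.stream()
                .filter(s -> s.length() >= minLength)
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("hdfs", "yarn", "mapreduce", "common");
        System.out.println(sortByLengthJdk7(names)); // [hdfs, yarn, common, mapreduce]
        System.out.println(sortByLengthJdk8(names)); // [hdfs, yarn, common, mapreduce]
        System.out.println(upperCaseLongNames(names, 5)); // [MAPREDUCE, COMMON]
    }
}
```

Both sorts are stable, so equal-length names keep their input order; the second and third methods only compile with a source level of 1.8 or later, which is the point of the JDK8 bump.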


>
> Between the two, we'll also have quite an opportunity to clean up and
> upgrade our dependencies, another common user and developer request.
>
> I'd like to propose that we start rolling a series of monthly-ish series of
> 3.0 alpha releases ASAP, with myself volunteering to take on the RM and
> other cat herding responsibilities.


Will be glad to help.


> There are already quite a few changes
> slated for 3.0 besides the above (for instance the shell script rewrite) so
> there's already value in a 3.0 alpha, and the more time we give downstreams
> to integrate, the better.
>
> This opens up discussion about inclusion of other changes, but I'm hoping
> to freeze incompatible changes after maybe two alphas, do a beta (with no
> further incompat changes allowed), and then finally a 3.x GA. For those
> keeping track, that means a 3.x GA in about four months.


> I would also like to stress though that this is not intended to be a big
> bang release. For instance, it would be great if we could maintain wire
> compatibility between 2.x and 3.x, so rolling upgrades work. Keeping
> branch-2 and branch-3 similar also makes backports easier, since we're
> likely maintaining 2.x for a while yet.
>
> Please let me know any comments / concerns related to the above. If people
> are friendly to the idea, I'd like to cut a branch-3 and start working on
> the first alpha.
>
> Best,
> Andrew
>



-- 
Karthik Kambatla
Software Engineer, Cloudera Inc.

http://five.sentenc.es

Re: Looking to a Hadoop 3 release

2015-03-02 Thread Aaron T. Myers

+1, this sounds like a good plan to me.

Thanks a lot for volunteering to take this on, Andrew.

Best,
Aaron

On Mon, Mar 2, 2015 at 3:19 PM, Andrew Wang 
wrote:

> Hi devs,
>
> It's been a year and a half since 2.x went GA, and I think we're about due
> for a 3.x release.
> Notably, there are two incompatible changes I'd like to call out that will
> have a tremendous positive impact for our users.
>
> First, classpath isolation being done at HADOOP-11656, which has been a
> long-standing request from many downstreams and Hadoop users.
>
> Second, bumping the source and target JDK version to JDK8 (related to
> HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two
> months from now). In the past, we've had issues with our dependencies
> discontinuing support for old JDKs, so this will future-proof us.
>
> Between the two, we'll also have quite an opportunity to clean up and
> upgrade our dependencies, another common user and developer request.
>
> I'd like to propose that we start rolling a monthly-ish series of
> 3.0 alpha releases ASAP, with myself volunteering to take on the RM and
> other cat-herding responsibilities. There are already quite a few changes
> slated for 3.0 besides the above (for instance the shell script rewrite) so
> there's already value in a 3.0 alpha, and the more time we give downstreams
> to integrate, the better.
>
> This opens up discussion about inclusion of other changes, but I'm hoping
> to freeze incompatible changes after maybe two alphas, do a beta (with no
> further incompat changes allowed), and then finally a 3.x GA. For those
> keeping track, that means a 3.x GA in about four months.
>
> I would also like to stress though that this is not intended to be a big
> bang release. For instance, it would be great if we could maintain wire
> compatibility between 2.x and 3.x, so rolling upgrades work. Keeping
> branch-2 and branch-3 similar also makes backports easier, since we're
> likely maintaining 2.x for a while yet.
>
> Please let me know any comments / concerns related to the above. If people
> are friendly to the idea, I'd like to cut a branch-3 and start working on
> the first alpha.
>
> Best,
> Andrew
>

Looking to a Hadoop 3 release

2015-03-02 Thread Andrew Wang

Hi devs,

It's been a year and a half since 2.x went GA, and I think we're about due
for a 3.x release.
Notably, there are two incompatible changes I'd like to call out that will
have a tremendous positive impact for our users.

First, classpath isolation being done at HADOOP-11656, which has been a
long-standing request from many downstreams and Hadoop users.

Second, bumping the source and target JDK version to JDK8 (related to
HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two
months from now). In the past, we've had issues with our dependencies
discontinuing support for old JDKs, so this will future-proof us.

Between the two, we'll also have quite an opportunity to clean up and
upgrade our dependencies, another common user and developer request.

I'd like to propose that we start rolling a monthly-ish series of
3.0 alpha releases ASAP, with myself volunteering to take on the RM and
other cat-herding responsibilities. There are already quite a few changes
slated for 3.0 besides the above (for instance the shell script rewrite) so
there's already value in a 3.0 alpha, and the more time we give downstreams
to integrate, the better.

This opens up discussion about inclusion of other changes, but I'm hoping
to freeze incompatible changes after maybe two alphas, do a beta (with no
further incompat changes allowed), and then finally a 3.x GA. For those
keeping track, that means a 3.x GA in about four months.

I would also like to stress though that this is not intended to be a big
bang release. For instance, it would be great if we could maintain wire
compatibility between 2.x and 3.x, so rolling upgrades work. Keeping
branch-2 and branch-3 similar also makes backports easier, since we're
likely maintaining 2.x for a while yet.

Please let me know any comments / concerns related to the above. If people
are friendly to the idea, I'd like to cut a branch-3 and start working on
the first alpha.

Best,
Andrew