Re: [DISCUSS] Looking to a 2.8.0 release

2015-11-30 Thread Andrew Wang
>
>
> maybe discuss having a list @ release time. As an example, s3 and
> encryption at rest shipped in beta stage... what's in 2.8 that "we don't
> yet trust ourselves?".  Me, I'd put erasure coding in there just because
> I've no familiarity with it
>
Quick clarification, EC isn't scheduled for 2.8. IMO it's still an open
question whether we want to include it in any branch-2 release. Elliot
(wearing his Facebook hat) said he'd be hesitant to deploy it because of
the significant NN changes. This might apply to our other big users like
Yahoo or Twitter.


Re: [DISCUSS] Looking to a 2.8.0 release

2015-11-26 Thread Steve Loughran

> On 25 Nov 2015, at 22:01, Vinod Kumar Vavilapalli  wrote:
> 
> Tx for your comments, Andrew!
> 
> I did talk about it in a few discussions in the past related to this but yes, 
> we never codified the feature-level alpha/beta tags. Part of the reason why I 
> never pushed for such a codification is that (a) it is a subjective decision 
> that the feature contributors usually have the best say on and (b) voting on 
> the alpha-ness / beta-ness may not be a productive exercise in a non-trivial 
> number of cases (as I have seen with the release-level tags, some users think 
> an alpha release is of production quality enough for _their_ use-cases).
> 
> That said, I agree about noting down our general recommendations on what an 
> alpha feature means, what a beta feature means etc. Let me file a JIRA for 
> this.

Maybe discuss having a list at release time. As an example, s3 and encryption at 
rest shipped in beta stage... what's in 2.8 that "we don't yet trust 
ourselves"?  Me, I'd put erasure coding in there just because I've no 
familiarity with it.


> 
> The second point you made is absolutely true. At least on YARN / MR side, I 
> usually end up traversing (some if not all of) alpha features and making sure 
> the corresponding APIs are explicitly marked private or public unstable / 
> evolving. I do think that there is a lot of value in us getting more 
> systematic with this - how about we do this for the feature list of 2.8 and 
> evolve the process?
> 
> In general, maybe we could have a list of ‘check-list’ JIRAs that we always 
> address before every release. A few things already come to my mind:


> - Mark which features are alpha / beta and make sure the corresponding APIs, 
> public interfaces reflect the state

+ have people add JIRAs for the next version to actually mark things as 
stable/out of beta

Re: [DISCUSS] Looking to a 2.8.0 release

2015-11-25 Thread Vinod Kumar Vavilapalli
This is the current state from the feedback I gathered.
 - Support priorities across applications within the same queue YARN-1963
— Can push as an alpha / beta feature per Sunil
 - YARN-1197 Support changing resources of an allocated container:
— Can push as an alpha/beta feature per Wangda
 - YARN-3611 Support Docker Containers In LinuxContainerExecutor: Well most of 
it anyways.
— Can push as an alpha feature.
 - YARN Timeline Service v1.5 - YARN-4233
— Should include per Li Lu
 - YARN Timeline Service Next generation: YARN-2928
— Per analysis from Sangjin, drop this from 2.8.

One open feature status
 - HDFS-8155 Support OAuth2 in WebHDFS: Alpha / Early feature?

Updated the Roadmap wiki with the same.

Thanks
+Vinod

> On Nov 13, 2015, at 12:12 PM, Sangjin Lee  wrote:
> 
> I reviewed the current state of the YARN-2928 changes regarding its impact
> if the timeline service v.2 is disabled. It does appear that there are a
> lot of things that still do get created and enabled unconditionally
> regardless of configuration. While this is understandable when we were
> working to implement the feature, this clearly needs to be cleaned up so
> that when disabled the timeline service v.2 doesn't impact other things.
> 
> I filed a JIRA for that work:
> https://issues.apache.org/jira/browse/YARN-4356
> 
> We need to complete it before we can merge.
> 
> Somewhat related is the status of the configuration and what it means in
> various contexts (client/app-side vs. server-side, v.1 vs. v.2, etc.). I
> know there is an ongoing discussion regarding YARN-4183. We'll need to
> reflect the outcome of that discussion.
> 
> My overall impression of whether this can be done for 2.8 is that it looks
> rather challenging given the suggested timeframe. We also need to complete
> several major tasks before it is ready.
> 
> Sangjin
> 
> 
> On Wed, Nov 11, 2015 at 5:49 PM, Sangjin Lee  wrote:
> 
>> 
>> On Wed, Nov 11, 2015 at 12:13 PM, Vinod Vavilapalli <
>> vino...@hortonworks.com> wrote:
>> 
>>> — YARN Timeline Service Next generation: YARN-2928: Lots of momentum,
>>> but clearly a work in progress. Two options here
>>>    — If it is safe to ship it into 2.8 in a disabled manner, we can
>>> get the early code into trunk and all the way into 2.8.
>>>    — If it is not safe, it organically rolls over into 2.9
>>> 
>> 
>> I'll review the changes on YARN-2928 to see what impact it has (if any) if
>> the timeline service v.2 is disabled.
>> 
>> Another condition for it to make 2.8 is whether the branch will be in a
>> shape in a couple of weeks such that it adds value for folks that want to
>> test it. Hopefully it will become clearer soon.
>> 
>> Sangjin
>> 



Re: [DISCUSS] Looking to a 2.8.0 release

2015-11-25 Thread Allen Wittenauer

> On Nov 25, 2015, at 11:23 AM, Vinod Kumar Vavilapalli  
> wrote:
> 
> There are 40 odd incompatible changes in 3.x: 
> https://issues.apache.org/jira/issues/?jql=project%20in%20%28HADOOP%2C%20YARN%2C%20HDFS%2C%20MAPREDUCE%29%20AND%20resolution%20%3D%20Fixed%20AND%20fixVersion%20%3D%203.0.0%20AND%20fixVersion%20not%20in%20%282.6.2%2C%202.6.3%2C%202.7.1%2C%202.7.2%2C%202.7.3%2C%202.8.0%29%20and%20%22Hadoop%20Flags%22%20in%20%28%22Incompatible%20change%22%29%20ORDER%20BY%20key%20ASC%2C%20due%20ASC%2C%20priority%20DESC%2C%20created%20ASC
> 
> Need to dig deeper on their impact. Clearly all my local shell scripts 
> completely stopped working, it will be good to have some bridging there to 
> help users migrate.

I think you should file a JIRA on what actually broke.  I’m genuinely 
curious.

> Like I said before, I will spend more time on trunk only changes in order to 
> kick-start a 3.x discussion.
> 
> What are the incompatible changes in the 2.x line that you are talking about?

Thanks for confirming what I’ve always suspected.
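
[Editor's sketch] The JIRA filter link quoted earlier in this message is percent-encoded, which makes the query hard to read. A minimal Python sketch, assuming only the standard library, that recovers the JQL text from that URL. The names FILTER_URL and decode_jql are illustrative and not part of any Hadoop or JIRA tooling.

```python
from urllib.parse import urlsplit, parse_qs

# URL copied from the quoted message; split across lines only for readability.
FILTER_URL = (
    "https://issues.apache.org/jira/issues/?jql="
    "project%20in%20%28HADOOP%2C%20YARN%2C%20HDFS%2C%20MAPREDUCE%29"
    "%20AND%20resolution%20%3D%20Fixed"
    "%20AND%20fixVersion%20%3D%203.0.0"
    "%20AND%20fixVersion%20not%20in%20%282.6.2%2C%202.6.3%2C%202.7.1"
    "%2C%202.7.2%2C%202.7.3%2C%202.8.0%29"
    "%20and%20%22Hadoop%20Flags%22%20in%20%28%22Incompatible%20change%22%29"
    "%20ORDER%20BY%20key%20ASC%2C%20due%20ASC%2C%20priority%20DESC"
    "%2C%20created%20ASC"
)


def decode_jql(url):
    """Return the decoded value of the 'jql' query parameter (hypothetical helper)."""
    query = urlsplit(url).query
    # parse_qs percent-decodes the values and returns a list per parameter.
    return parse_qs(query)["jql"][0]


JQL = decode_jql(FILTER_URL)
print(JQL)
```

Read this way, the filter selects fixed issues across the four projects with fixVersion 3.0.0, excludes anything also fixed in the listed 2.x versions, and keeps only those flagged "Incompatible change".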

Re: [DISCUSS] Looking to a 2.8.0 release

2015-11-25 Thread Subramaniam V K
Hi Vinod,

Thanks for driving this. Can you add YARN-2573, which includes the work done
to integrate ReservationSystem with the RM failover mechanism, to your list?
This was also reviewed and committed (branch-2) about a month back.

Cheers,
Subru

On Wed, Nov 25, 2015 at 11:37 AM, Vinod Kumar Vavilapalli <
vino...@apache.org> wrote:

> [...]


Re: [DISCUSS] Looking to a 2.8.0 release

2015-11-25 Thread Andrew Wang
Hey Vinod,

I'm fine with the idea of alpha/beta marking in the abstract, but had a
question: do we define these terms in our compatibility policy or
elsewhere? I think it's commonly understood among us developers (alpha
means not fully tested and API unstable, beta means it's not fully tested
but is API stable), but it'd be good to have it written down.

Also I think we've only done alpha/beta tagging at the release-level
previously which is a simpler story to tell users. So it's important for
this release that alpha features set their interface stability annotations
to "evolving". There isn't a corresponding annotation for "interface
quality", but IMO that's overkill.

Thanks,
Andrew
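
[Editor's sketch] The informal alpha/beta definitions in the message above can be written down as data. This is a minimal Python sketch and not an official policy. The Stage class and expected_stability helper are hypothetical. Mapping alpha to "Evolving" follows the message; mapping beta to "Stable" is an assumption drawn from the "API stable" wording.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    fully_tested: bool
    api_stable: bool


# Definitions as stated in the message: neither stage is fully tested,
# and only beta promises a stable API.
STAGES = {
    "alpha": Stage(fully_tested=False, api_stable=False),
    "beta": Stage(fully_tested=False, api_stable=True),
}


def expected_stability(stage):
    """Name of the stability annotation a feature at this stage would carry.

    "Evolving" for alpha comes from the message; "Stable" for beta is an
    assumption made for this sketch.
    """
    return "Stable" if STAGES[stage].api_stable else "Evolving"


for name in ("alpha", "beta"):
    print(name, STAGES[name], expected_stability(name))
```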

On Wed, Nov 25, 2015 at 11:08 AM, Vinod Kumar Vavilapalli <
vino...@apache.org> wrote:

> [...]


Re: [DISCUSS] Looking to a 2.8.0 release

2015-11-25 Thread Chris Nauroth
Regarding interface visibility/stability, I'm aware of 2 relevant JIRAs
right now.

HADOOP-10776 proposes to mark Public some of the security plumbing like
the FileSystem delegation token methods, UserGroupInformation, Token and
Credentials.  At this point, I think we are only fooling ourselves trying
to treat these as Private or LimitedPrivate.  I believe they are de facto
Public, because downstream applications just don't have any other reliable
way to do what these interfaces do.

HADOOP-12600 proposes to mark FileContext Stable.  I expect this one is
simply an oversight.

I've taken the bold step of marking both issues as 2.8.0 blockers.  We can
of course reconsider if this is controversial.

--Chris Nauroth




On 11/25/15, 11:30 AM, "Vinod Kumar Vavilapalli" 
wrote:

>Steve,
>
>
>> There's a lot of stuff in 2.8; I note that I'd like to see the s3a perf
>>improvements & openstack fixes in there: for which I need reviewers. I
>>don't have the spare time to do this myself.
>
>If you think they are useful, it helps to file tickets (or point out
>existing tickets), start discussion etc w.r.t these areas in order to
>attract contributors.
>
>
>> -likewise, DFSConfigKeys stayed in hdfs-server. I know it's tagged as
>>@Private, but it's long been where all the string constants for HDFS
>>options live. Forcing users to retype them in their own source is not
>>only dangerous (it only encourages typos), it actually stops you using
>>your IDE finding out where those constants get used.
>
>> We do now have a set of keys in the client, HdfsClientConfigKeys, but
>>these are still declared as @Private. Which is a mistake for the reasons
>>above, and because it encourages hadoop developers to assume that they
>>are free to make whatever changes they want to this code, and if it
>>breaks something, say "it was tagged as private"
>
>
>If these are worth going after, please file tickets under HDFS-6200 if
>they don't exist already.
>
>
>> 
>> 1. We have to recognise that a lot of things marked @Private are in
>>fact essential for clients to use. Not just constants, but actual
>>classes.
>> 
>> 2. We have to look hard at @LimitedPrivate and question the legitimacy
>>of tagging things as so, especially anything
>>@InterfaceAudience.LimitedPrivate({"MapReduce"}) - because any YARN app
>>you write ends up needing those classes. For evidence, look at
>>DistributedShell's imports, and pick a few at random: NMClientAsyncImpl,
>>ConverterUtils being easy targets.
>
>There are existing tickets for some of these under YARN-1953 that need
>some developer love.
>
>
>> Returning to the pending 2.8.0 release, there's a way to find out
>>what's going to break: build and test things against the snapshots,
>>without waiting for the beta releases and expecting the downstream
>>projects to do it for you. If they don't build, that's a success: you've
>>found a compatibility problem to fix. If they don't test, well that's
>>trouble - you are in finger pointing time.
>
>
>I've tried doing this in the past without much success. Some downstream
>components did pick up RCs but a majority of them needed a release -
>hence my current approach.
>
>Thanks
>+Vinod
>
>



Re: [DISCUSS] Looking to a 2.8.0 release

2015-11-25 Thread Vinod Kumar Vavilapalli
I think we’ve converged at a high level w.r.t 2.8. And as I just sent out an 
email, I updated the Roadmap wiki reflecting the same: 
https://wiki.apache.org/hadoop/Roadmap

I plan to create a 2.8 branch EOD today.

The goal for all of us should be to restrict improvements & fixes to only (a) 
the feature-set documented under 2.8 in the RoadMap wiki and (b) other minor 
features that are already in 2.8.

Thanks
+Vinod


> On Nov 11, 2015, at 12:13 PM, Vinod Kumar Vavilapalli 
>  wrote:
> 
>  - Cut a branch about two weeks from now
>  - Do an RC mid next month (leaving ~4 weeks since branch-cut)
>  - As with 2.7.x series, the first release will still be called an early / 
> alpha release in the interest of
> — gaining downstream adoption
> — wider testing,
> — yet reserving our right to fix any inadvertent incompatibilities 
> introduced.



Re: [DISCUSS] Looking to a 2.8.0 release

2015-11-25 Thread Vinod Kumar Vavilapalli
Okay, tx for this clarification Chris! I dug more into this and now realized 
the actual scope of this. Given the limited nature of this feature 
(non-Namenode etc) and the WIP nature of the larger umbrella HADOOP-11744, we 
will ship the feature but I’ll stop calling this out as a notable feature.

Thanks
+Vinod


> On Nov 25, 2015, at 12:04 PM, Chris Nauroth  wrote:
> 
> Hi Vinod,
> 
> The HDFS-8155 work is complete in branch-2 already, so feel free to
> include it in the roadmap.
> 
> For those watching the thread that aren't familiar with HDFS-8155, I want
> to call out that it was a client-side change only.  The WebHDFS client is
> capable of obtaining OAuth2 tokens and passing them along in its HTTP
> requests.  The NameNode and DataNode server side currently do not have any
> support for OAuth2, so overall, this feature is only useful in some very
> unique deployment architectures right now.  This is all discussed
> explicitly in documentation committed with HDFS-8155, but I wanted to
> prevent any mistaken assumptions for people only reading this thread.
> 
> --Chris Nauroth
> 
> 
> 
> 
> On 11/25/15, 11:08 AM, "Vinod Kumar Vavilapalli" 
> wrote:
> 
>> [...]



Re: [DISCUSS] Looking to a 2.8.0 release

2015-11-25 Thread Vinod Kumar Vavilapalli
Tx for your comments, Andrew!

I did talk about it in a few discussions in the past related to this but yes, 
we never codified the feature-level alpha/beta tags. Part of the reason why I 
never pushed for such a codification is that (a) it is a subjective decision 
that the feature contributors usually have the best say on and (b) voting on 
the alpha-ness / beta-ness may not be a productive exercise in a non-trivial 
number of cases (as I have seen with the release-level tags, some users think 
an alpha release is of production quality enough for _their_ use-cases).

That said, I agree about noting down our general recommendations on what an 
alpha feature means, what a beta feature means etc. Let me file a JIRA for this.

The second point you made is absolutely true. At least on YARN / MR side, I 
usually end up traversing (some if not all of) alpha features and making sure 
the corresponding APIs are explicitly marked private or public unstable / 
evolving. I do think that there is a lot of value in us getting more 
systematic with this - how about we do this for the feature list of 2.8 and 
evolve the process?

In general, maybe we could have a list of ‘check-list’ JIRAs that we always 
address before every release. A few things already come to my mind:
 - Mark which features are alpha / beta and make sure the corresponding APIs, 
public interfaces reflect the state
 - Revise all newly added configuration properties to make sure they follow our 
general naming patterns. New contributors sometimes create non-standard 
properties that we come to regret supporting.
 - Generate a list of newly added public entry-points and validate that they 
are all indeed meant to be public
 - [...]

Thoughts?

+Vinod
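
[Editor's sketch] Two of the checklist items above (naming review for new configuration properties, and listing newly added public entry points) lend themselves to a small script. This is a minimal Python sketch under stated assumptions. PROPERTY_PATTERN is a guessed naming rule (lowercase, dot-separated segments, hyphens allowed inside a segment) and not the project's documented convention. Both helper functions and the sample names are hypothetical.

```python
import re

# Assumed rule: at least two dot-separated segments, each lowercase
# alphanumeric with optional hyphens, e.g. "yarn.timeline-service.enabled".
PROPERTY_PATTERN = re.compile(
    r"^[a-z0-9]+(-[a-z0-9]+)*(\.[a-z0-9]+(-[a-z0-9]+)*)+$"
)


def nonstandard_properties(names):
    """Return the property names that do not match the assumed pattern."""
    return sorted(n for n in names if not PROPERTY_PATTERN.match(n))


def new_public_entry_points(previous, current):
    """Return entry points present in the current release but not the previous one."""
    return sorted(set(current) - set(previous))


# Illustrative inputs only; these lists are made up for the example.
added_properties = [
    "yarn.timeline-service.enabled",
    "yarn.NewFeature.Enabled",
    "dfs_new_option",
]
flagged = nonstandard_properties(added_properties)

old_api = ["org.example.Foo#open"]
new_api = ["org.example.Foo#open", "org.example.Foo#openAsync"]
to_review = new_public_entry_points(old_api, new_api)

print(flagged)
print(to_review)
```

A script like this only produces a review list. Whether a flagged property or entry point is acceptable remains a judgment call for the release manager and the feature contributors.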


> On Nov 25, 2015, at 11:47 AM, Andrew Wang  wrote:
> 
> [...]

Re: [DISCUSS] Looking to a 2.8.0 release

2015-11-25 Thread Chris Nauroth
+1.  Thanks, Vinod.

--Chris Nauroth




On 11/25/15, 1:45 PM, "Vinod Kumar Vavilapalli"  wrote:

>[...]



Re: [DISCUSS] Looking to a 2.8.0 release

2015-11-25 Thread Andrew Wang
SGTM, thanks Vinod! LMK if you need reviews on any of that.

Regarding the release checklist, another item I'd add is updating the
release notes in the project documentation; we've forgotten this in the past.

On Wed, Nov 25, 2015 at 2:01 PM, Vinod Kumar Vavilapalli  wrote:

> Tx for your comments, Andrew!
>
> I did talk about it in a few discussions in the past related to this but
> yes, we never codified the feature-level alpha/beta tags. Part of the
> reason why I never pushed for such a codification is that (a) it is a
> subjective decision that the feature contributors usually have the best say
> on and (2) voting on the alpha-ness / beta-ness may not be a productive
> exercise in non-trivial number of cases (as I have seen with the
> release-level tags, some users think an alpha release is of production
> quality enough for _their_ use-cases).
>
> That said, I agree about noting down our general recommendations on what
> an alpha feature means, what a beta feature means etc. Let me file a JIRA
> for this.
>
> The second point you made is absolutely true. Atleast on YARN / MR side, I
> usually end up traversing (some if not all of) alpha features and making
> sure the corresponding APIs are explicitly marked private or public
> unstable / evolving. I do think that there is a lot of value in us  getting
> more systematic with this - how about we do this for the feature list of
> 2.8 and evolve the process?
>
> In general, may be we could have a list of ‘check-list’ JIRAs that we
> always address before every release. Few things already come to my mind:
>  - Mark which features are alpha / beta and make sure the corresponding
> APIs, public interfaces reflect the state
>  - Revise all newly added configuration properties to make sure they
> follow our general naming patterns. New contributors sometimes create
> non-standard properties that we come to regret supporting.
>  - Generate a list of newly added public entry-points and validate that
> they are all indeed meant to be public
>  - [...]
>
> Thoughts?
>
> +Vinod
>
>
> > On Nov 25, 2015, at 11:47 AM, Andrew Wang 
> wrote:
> >
> > Hey Vinod,
> >
> > I'm fine with the idea of alpha/beta marking in the abstract, but had a
> > question: do we define these terms in our compatibility policy or
> > elsewhere? I think it's commonly understood among us developers (alpha
> > means not fully tested and API unstable, beta means it's not fully tested
> > but is API stable), but it'd be good to have it written down.
> >
> > Also I think we've only done alpha/beta tagging at the release-level
> > previously which is a simpler story to tell users. So it's important for
> > this release that alpha features set their interface stability
> annotations
> > to "evolving". There isn't a corresponding annotation for "interface
> > quality", but IMO that's overkill.
> >
> > Thanks,
> > Andrew
> >
> > On Wed, Nov 25, 2015 at 11:08 AM, Vinod Kumar Vavilapalli <
> > vino...@apache.org> wrote:
> >
> >> This is the current state from the feedback I gathered.
> >> - Support priorities across applications within the same queue YARN-1963
> >>— Can push as an alpha / beta feature per Sunil
> >> - YARN-1197 Support changing resources of an allocated container:
> >>— Can push as an alpha/beta feature per Wangda
> >> - YARN-3611 Support Docker Containers In LinuxContainerExecutor: Well
> >> most of it anyways.
> >>— Can push as an alpha feature.
> >> - YARN Timeline Service v1.5 - YARN-4233
> >>— Should include per Li Lu
> >> - YARN Timeline Service Next generation: YARN-2928
> >>— Per analysis from Sangjin, drop this from 2.8.
> >>
> >> One open feature status
> >> - HDFS-8155 Support OAuth2 in WebHDFS: Alpha / Early feature?
> >>
> >> Updated the Roadmap wiki with the same.
> >>
> >> Thanks
> >> +Vinod
> >>
> >>> On Nov 13, 2015, at 12:12 PM, Sangjin Lee  wrote:
> >>>
> >>> I reviewed the current state of the YARN-2928 changes regarding its
> >>> impact if the timeline service v.2 is disabled. It does appear that
> >>> there are a lot of things that still do get created and enabled
> >>> unconditionally regardless of configuration. While this is
> >>> understandable when we were working to implement the feature, this
> >>> clearly needs to be cleaned up so that when disabled the timeline
> >>> service v.2 doesn't impact other things.
> >>>
> >>> I filed a JIRA for that work:
> >>> https://issues.apache.org/jira/browse/YARN-4356
> >>>
> >>> We need to complete it before we can merge.
> >>>
> >>> Somewhat related is the status of the configuration and what it means
> >>> in various contexts (client/app-side vs. server-side, v.1 vs. v.2,
> >>> etc.). I know there is an ongoing discussion regarding YARN-4183. We'll
> >>> need to reflect the outcome of that discussion.
> >>>
> >>> My overall impression of whether this can be done for 2.8 is that it
> >>> looks rather challenging given the 

Re: [DISCUSS] Looking to a 2.8.0 release

2015-11-25 Thread Chris Nauroth
Hi Vinod,

The HDFS-8155 work is complete in branch-2 already, so feel free to
include it in the roadmap.

For those watching the thread that aren't familiar with HDFS-8155, I want
to call out that it was a client-side change only.  The WebHDFS client is
capable of obtaining OAuth2 tokens and passing them along in its HTTP
requests.  The NameNode and DataNode server side currently do not have any
support for OAuth2, so overall, this feature is only useful in some very
unique deployment architectures right now.  This is all discussed
explicitly in documentation committed with HDFS-8155, but I wanted to
prevent any mistaken assumptions for people only reading this thread.

--Chris Nauroth




On 11/25/15, 11:08 AM, "Vinod Kumar Vavilapalli" wrote:

>This is the current state from the feedback I gathered.
> - Support priorities across applications within the same queue YARN-1963
>    - Can push as an alpha / beta feature per Sunil
> - YARN-1197 Support changing resources of an allocated container:
>    - Can push as an alpha/beta feature per Wangda
> - YARN-3611 Support Docker Containers In LinuxContainerExecutor: Well
>most of it anyways.
>    - Can push as an alpha feature.
> - YARN Timeline Service v1.5 - YARN-4233
>    - Should include per Li Lu
> - YARN Timeline Service Next generation: YARN-2928
>    - Per analysis from Sangjin, drop this from 2.8.
>
>One open feature status
> - HDFS-8155 Support OAuth2 in WebHDFS: Alpha / Early feature?
>
>Updated the Roadmap wiki with the same.
>
>Thanks
>+Vinod
>
>> On Nov 13, 2015, at 12:12 PM, Sangjin Lee  wrote:
>> 
>> I reviewed the current state of the YARN-2928 changes regarding its impact
>> if the timeline service v.2 is disabled. It does appear that there are a
>> lot of things that still do get created and enabled unconditionally
>> regardless of configuration. While this is understandable when we were
>> working to implement the feature, this clearly needs to be cleaned up so
>> that when disabled the timeline service v.2 doesn't impact other things.
>> 
>> I filed a JIRA for that work:
>> https://issues.apache.org/jira/browse/YARN-4356
>> 
>> We need to complete it before we can merge.
>> 
>> Somewhat related is the status of the configuration and what it means in
>> various contexts (client/app-side vs. server-side, v.1 vs. v.2, etc.). I
>> know there is an ongoing discussion regarding YARN-4183. We'll need to
>> reflect the outcome of that discussion.
>> 
>> My overall impression of whether this can be done for 2.8 is that it looks
>> rather challenging given the suggested timeframe. We also need to complete
>> several major tasks before it is ready.
>> 
>> Sangjin
>> 
>> 
>> On Wed, Nov 11, 2015 at 5:49 PM, Sangjin Lee  wrote:
>> 
>>> 
>>> On Wed, Nov 11, 2015 at 12:13 PM, Vinod Vavilapalli <
>>> vino...@hortonworks.com> wrote:
>>> 
>>>> - YARN Timeline Service Next generation: YARN-2928: Lots of momentum,
>>>> but clearly a work in progress. Two options here
>>>>    - If it is safe to ship it into 2.8 in a disabled manner, we can
>>>> get the early code into trunk and all the way into 2.8.
>>>>    - If it is not safe, it organically rolls over into 2.9
>>>>
>>> 
>>> I'll review the changes on YARN-2928 to see what impact it has (if any)
>>> if the timeline service v.2 is disabled.
>>> 
>>> Another condition for it to make 2.8 is whether the branch will be in a
>>> shape in a couple of weeks such that it adds value for folks that want
>>> to test it. Hopefully it will become clearer soon.
>>> 
>>> Sangjin
>>> 
>



Re: [DISCUSS] Looking to a 2.8.0 release

2015-11-25 Thread Vinod Kumar Vavilapalli
Haohui,

It’ll help to document this whole line of discussion about hdfs jar change and 
its impact/non-impact for existing users so there is less confusion.

Thanks
+Vinod


> On Nov 11, 2015, at 3:26 PM, Haohui Mai  wrote:
> 
> bq. If and only if they take the Hadoop class path at face value.
> Many applications don’t because of conflicting dependencies and
> instead import specific jars.
> 
> We do assume that applications pick up all the dependencies
> (either automatically or manually). The situation is
> similar to adding a new dependency to hdfs in a minor release.
> 
> Maven / gradle obviously help, but I'd love to hear more about how
> you get it to work. In trunk hadoop-env.sh adds 118 jars to the
> class path. Are you manually importing 118 jars for every single
> application?
> 
> 
> 
> On Wed, Nov 11, 2015 at 3:09 PM, Haohui Mai  wrote:
>> bq. currently pulling in hadoop-client gives downstream apps
>> hadoop-hdfs-client, but not hadoop-hdfs server side, right?
>> 
>> Right now hadoop-client pulls in hadoop-hdfs directly to ensure a
>> smooth transition. Maybe we can revisit the decision in the 2.9 / 3.x?
>> 
>> On Wed, Nov 11, 2015 at 3:00 PM, Steve Loughran wrote:
>>> 
>>>> On 11 Nov 2015, at 22:15, Haohui Mai  wrote:
>>>> 
>>>> bq.  it basically makes the assumption that everyone recompiles for
>>>> every minor release.
>>>> 
>>>> I don't think that the statement holds. HDFS-6200 keeps classes in the
>>>> same package. hdfs-client becomes a transitive dependency of the
>>>> original hdfs jar.
>>>> 
>>>> Applications continue to work without recompilation as the classes
>>>> will be in the same name and will be available in the classpath. They
>>>> have the option of switching to depending only on hdfs-client to
>>>> minimize the dependency when they are comfortable.
>>>> 
>>>> I'm not claiming that there are no bugs in HDFS-6200, but just like
>>>> other features we discover bugs and fix them continuously.
>>>> 
>>>> ~Haohui
>>>> 
>>> 
>>> currently pulling in hadoop-client gives downstream apps 
>>> hadoop-hdfs-client, but not hadoop-hdfs server side, right?
> 



Re: [DISCUSS] Looking to a 2.8.0 release

2015-11-25 Thread Vinod Kumar Vavilapalli
Steve,


> There's a lot of stuff in 2.8; I note that I'd like to see the s3a perf 
> improvements & openstack fixes in there: for which I need reviewers. I don't 
> have the spare time to do this myself.

If you think they are useful, it helps to file tickets (or point out existing 
tickets), start discussion etc w.r.t these areas in order to attract 
contributors.


> -likewise, DFSConfigKeys stayed in hdfs-server. I know it's tagged as 
> @Private, but it's long been where all the string constants for HDFS options 
> live. Forcing users to retype them in their own source is not only dangerous 
> (it only encourages typos), it actually stops you using your IDE finding out 
> where those constants get used. 

> We do now have a set of keys in the client, HdfsClientConfigKeys, but these 
> are still declared as @Private. Which is a mistake for the reasons above, and 
> because it encourages hadoop developers to assume that they are free to make 
> whatever changes they want to this code, and if it breaks something, say "it 
> was tagged as private”


If these are worth going after, please file tickets under HDFS-6200 if they 
don’t exist already.


> 
> 1. We have to recognise that a lot of things marked @Private are in fact 
> essential for clients to use. Not just constants, but actual classes.
> 
> 2. We have to look hard at @LimitedPrivate and question the legitimacy of 
> tagging things as so, especially anything tagged 
> @InterfaceAudience.LimitedPrivate({"MapReduce"}), because any YARN app you 
> write ends up needing those classes. For evidence, look at DistributedShell's 
> imports, and pick a few at random: NMClientAsyncImpl, ConverterUtils being 
> easy targets.

There are existing tickets for some of these under YARN-1953 that need some 
developer love.


> Returning to the pending 2.8.0 release, there's a way to find out what's 
> going to break: build and test things against the snapshots, without waiting 
> for the beta releases and expecting the downstream projects to do it for you. 
> If they don't build, that's a success: you've found a compatibility problem 
> to fix. If they don't test, well that's trouble —you are in finger pointing 
> time.


I’ve tried doing this in the past without much success. Some downstream 
components did pick up RCs but a majority of them needed a release - hence my 
current approach.

Thanks
+Vinod



Re: [DISCUSS] Looking to a 2.8.0 release

2015-11-13 Thread Sangjin Lee
I reviewed the current state of the YARN-2928 changes regarding its impact
if the timeline service v.2 is disabled. It does appear that there are a
lot of things that still do get created and enabled unconditionally
regardless of configuration. While this is understandable when we were
working to implement the feature, this clearly needs to be cleaned up so
that when disabled the timeline service v.2 doesn't impact other things.
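
A minimal sketch of the kind of guard this cleanup implies, assuming a single boolean switch. The property name and the Publisher class below are hypothetical and used only for illustration; they are not taken from YARN-2928 or YARN-4356.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch: nothing for an optional feature is created unless it is enabled. */
final class OptionalFeatureGuard {

    // Hypothetical property name, used only for this illustration.
    static final String FEATURE_ENABLED_KEY = "example.timeline-service.v2.enabled";

    /** Stand-in for a component that should exist only when the feature is on. */
    static final class Publisher {
        void publish(String event) {
            System.out.println("published: " + event);
        }
    }

    /** Reads the flag, treating a missing key as "disabled". */
    static boolean isEnabled(Map<String, String> conf) {
        return Boolean.parseBoolean(conf.getOrDefault(FEATURE_ENABLED_KEY, "false"));
    }

    /** Creates the publisher only when enabled; otherwise nothing is allocated. */
    static Optional<Publisher> createPublisher(Map<String, String> conf) {
        return isEnabled(conf) ? Optional.of(new Publisher()) : Optional.empty();
    }

    public static void main(String[] args) {
        Map<String, String> disabled = Collections.emptyMap();
        Map<String, String> enabled = Collections.singletonMap(FEATURE_ENABLED_KEY, "true");

        // With the feature off, the lambda never runs because no publisher exists.
        createPublisher(disabled).ifPresent(p -> p.publish("ignored"));
        createPublisher(enabled).ifPresent(p -> p.publish("container started"));
    }
}
```

The point of returning an Optional is that callers cannot touch the component by accident when the feature is off.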

I filed a JIRA for that work:
https://issues.apache.org/jira/browse/YARN-4356

We need to complete it before we can merge.

Somewhat related is the status of the configuration and what it means in
various contexts (client/app-side vs. server-side, v.1 vs. v.2, etc.). I
know there is an ongoing discussion regarding YARN-4183. We'll need to
reflect the outcome of that discussion.

My overall impression of whether this can be done for 2.8 is that it looks
rather challenging given the suggested timeframe. We also need to complete
several major tasks before it is ready.

Sangjin


On Wed, Nov 11, 2015 at 5:49 PM, Sangjin Lee  wrote:

>
> On Wed, Nov 11, 2015 at 12:13 PM, Vinod Vavilapalli <
> vino...@hortonworks.com> wrote:
>
>> — YARN Timeline Service Next generation: YARN-2928: Lots of momentum,
>> but clearly a work in progress. Two options here
>> — If it is safe to ship it into 2.8 in a disable manner, we can
>> get the early code into trunk and all the way into 2.8.
>> — If it is not safe, it organically rolls over into 2.9
>>
>
> I'll review the changes on YARN-2928 to see what impact it has (if any) if
> the timeline service v.2 is disabled.
>
> Another condition for it to make 2.8 is whether the branch will be in a
> shape in a couple of weeks such that it adds value for folks that want to
> test it. Hopefully it will become clearer soon.
>
> Sangjin
>


Re: [DISCUSS] Looking to a 2.8.0 release

2015-11-12 Thread Sunil Govind
Thank you Vinod for starting this discussion.

+1 for getting a beta/alpha version for Application Priority (YARN-1963).
Major patches are already in and MAPREDUCE-5870 (making MR also use
app-priority) is in final review stages. This feature has also been tested
in-house. With a 2.8.0 alpha, I feel we can see more use-case-based testing
of it.

A documentation JIRA is already raised for App Priority, and I will mark it
for 2.8 so that it will be done before the RC cut.

Thank You
Sunil
On Thu, Nov 12, 2015 at 2:41 AM Vinod Vavilapalli wrote:

> I’ll let others comment on specific features.
>
> Regarding the 3.x vs 2.x point, as I noted before on other threads, given
> all the incompatibilities in trunk it will be ways off before users can run
> their production workloads on a 3.x release. Therefore, as I was proposing
> before, we should continue the 2.x lines, but soon get started on rolling
> out a release candidate based off trunk.
>
> Like with 2.8, I’d like to go back and prepare some notes on trunk’s
> content so we can objectively discuss about it.
>
> +Vinod


Re: [DISCUSS] Looking to a 2.8.0 release

2015-11-12 Thread Steve Loughran

There's a lot of stuff in 2.8; I note that I'd like to see the s3a perf 
improvements & openstack fixes in there: for which I need reviewers. I don't 
have the spare time to do this myself.

I've already been building & testing both Apache Slider (incubating) and Apache 
Spark against both 2.8.0-SNAPSHOT & 3.0.0-SNAPSHOT. 

What's been troublesome for builds which use maven as the way of managing 
dependencies (I'm ignoring the fact that spark *also* has an SBT build with ivy 
doing dep management)? HDFS client

-hadoop-hdfs-client pulled HdfsConfiguration. I'd been explicitly creating this 
to force in hdfs-default.xml & hdfs-site.xml loading, so that I could do sanity 
checks on things like security settings prior to attempting AM launch.

-likewise, DFSConfigKeys stayed in hdfs-server. I know it's tagged as @Private, 
but it's long been where all the string constants for HDFS options live. 
Forcing users to retype them in their own source is not only dangerous (it only 
encourages typos), it actually stops you using your IDE finding out where those 
constants get used. 

We do now have a set of keys in the client, HdfsClientConfigKeys, but these are 
still declared as @Private. Which is a mistake for the reasons above, and 
because it encourages hadoop developers to assume that they are free to make 
whatever changes they want to this code, and if it breaks something, say "it 
was tagged as private"

1. We have to recognise that a lot of things marked @Private are in fact 
essential for clients to use. Not just constants, but actual classes.

2. We have to look hard at @LimitedPrivate and question the legitimacy of 
tagging things as so, especially anything tagged 
@InterfaceAudience.LimitedPrivate({"MapReduce"}), because any YARN app you 
write ends up needing those classes. For evidence, look at DistributedShell's 
imports, and pick a few at random: NMClientAsyncImpl, ConverterUtils being easy 
targets.

3. Or for real fun, UGI: @InterfaceAudience.LimitedPrivate({"HDFS", 
"MapReduce", "HBase", "Hive", "Oozie"})

I'd advocate marking all "MapReduce" as "YarnApp" and have people working on 
those classes accept that they will be used downstream and treat changes with 
caution. Yes, they may be messy, but that's how things are used. At least with 
a modern IDE you can add in the downstream projects and identify those uses 
with ease.
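
To make the retagging idea concrete, here is a small sketch. It assumes a local stand-in annotation that mimics the shape of @InterfaceAudience.LimitedPrivate (a string array of intended consumers); the stand-in, the example classes and the "YarnApp" audience are all hypothetical and are not Hadoop's own definitions.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.util.Arrays;

/** Hypothetical sketch of checking which audiences a class is tagged for. */
final class AudienceCheck {

    /** Local stand-in annotation listing the intended consumers of a class. */
    @Retention(RetentionPolicy.RUNTIME)
    @interface LimitedPrivate {
        String[] value();
    }

    /** Hypothetical class retagged the way the proposal suggests. */
    @LimitedPrivate({"YarnApp"})
    static final class RetaggedHelper { }

    /** Hypothetical class still carrying the older, narrower tag. */
    @LimitedPrivate({"MapReduce"})
    static final class LegacyHelper { }

    /** Hypothetical class with no audience tag at all. */
    static final class Untagged { }

    /** True when the class is tagged and the tag names the given audience. */
    static boolean isIntendedFor(Class<?> type, String audience) {
        LimitedPrivate tag = type.getAnnotation(LimitedPrivate.class);
        return tag != null && Arrays.asList(tag.value()).contains(audience);
    }

    public static void main(String[] args) {
        System.out.println(isIntendedFor(RetaggedHelper.class, "YarnApp"));
        System.out.println(isIntendedFor(LegacyHelper.class, "YarnApp"));
        System.out.println(isIntendedFor(Untagged.class, "YarnApp"));
    }
}
```

A check like this only reports what the tag says; whether downstream code is actually using the class is still something you find by building against it.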

In the end SLIDER-948 addressed the problems for me. I switched to pulling in 
hadoop-hdfs *and* copied and pasted all the DFSConfigKeys constants I used into 
my own file of constants. 
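
For anyone wondering what such a file looks like, a minimal sketch follows. The class name and the helper are hypothetical; the two property strings are ordinary HDFS option names, and a plain Map stands in for a real configuration object so the example has no Hadoop dependency.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical application-local copy of the few HDFS option names the app reads. */
final class LocalHdfsKeys {

    private LocalHdfsKeys() { }

    // Copied by hand so the application does not compile against a @Private class.
    static final String DFS_REPLICATION_KEY = "dfs.replication";
    static final String DFS_BLOCK_SIZE_KEY = "dfs.blocksize";

    /** Reads an integer option, falling back to the caller's default when unset. */
    static int getInt(Map<String, String> conf, String key, int defaultValue) {
        String raw = conf.get(key);
        return raw == null ? defaultValue : Integer.parseInt(raw.trim());
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(DFS_REPLICATION_KEY, "2");

        System.out.println(getInt(conf, DFS_REPLICATION_KEY, 3));
        // Unset key: the caller-supplied default is returned.
        System.out.println(getInt(conf, DFS_BLOCK_SIZE_KEY, 1024));
    }
}
```

The obvious cost is that the copies can drift from upstream, which is exactly the typo risk described above, traded for a build that no longer breaks when the class moves.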

HDFS-9301 should make these changes things I could revert (and other projects 
not notice them ever existing), but I've left them in to isolate me from 
any more situations like this. To be completely ruthless: I don't trust that 
code to not break my builds any more.

Behaviour-wise, I've not seen much in the way of changes; all tests work the 
same. Oh and Spark wouldn't compile against 3.0 as an exception tagged as 
@Deprecated since Hadoop 0.18 got pulled. Trivially fixed.

Returning to the pending 2.8.0 release, there's a way to find out what's going 
to break: build and test things against the snapshots, without waiting for the 
beta releases and expecting the downstream projects to do it for you. If they 
don't build, that's a success: you've found a compatibility problem to fix. If 
they don't test, well that's trouble —you are in finger pointing time.

-Steve

> On 11 Nov 2015, at 23:26, Haohui Mai  wrote:
> 
> bq. If and only if they take the Hadoop class path at face value.
> Many applications don’t because of conflicting dependencies and
> instead import specific jars.
> 
> We do assume that applications pick up all the dependencies
> (either automatically or manually). The situation is
> similar to adding a new dependency to hdfs in a minor release.
> 
> Maven / gradle obviously help, but I'd love to hear more about how
> you get it to work. In trunk hadoop-env.sh adds 118 jars to the
> class path. Are you manually importing 118 jars for every single
> application?
> 
> 
> 
> On Wed, Nov 11, 2015 at 3:09 PM, Haohui Mai  wrote:
>> bq. currently pulling in hadoop-client gives downstream apps
>> hadoop-hdfs-client, but not hadoop-hdfs server side, right?
>> 
>> Right now hadoop-client pulls in hadoop-hdfs directly to ensure a
>> smooth transition. Maybe we can revisit the decision in the 2.9 / 3.x?
>> 





Re: [DISCUSS] Looking to a 2.8.0 release

2015-11-12 Thread Karthik Kambatla
I am really against the notion of calling x.y.0 releases alpha/beta; it is
very confusing. If we think a release is alpha/beta quality, why not
release it as x.y.0-alpha or x.y.0-beta, and follow it up eventually with
x.y.0 GA?
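
As a sketch of why an explicit suffix is easier on users and tooling: with the quality in the version string, anything can read it back mechanically. The class and the accepted pattern below are hypothetical, chosen only to match the x.y.0-alpha / x.y.0-beta shape suggested here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that reads the release quality out of a version string. */
final class ReleaseLabel {

    private ReleaseLabel() { }

    // x.y.z, optionally followed by -alpha or -beta and an optional number.
    private static final Pattern VERSION =
            Pattern.compile("(\\d+)\\.(\\d+)\\.(\\d+)(?:-(alpha|beta)(\\d*))?");

    /** Returns "alpha", "beta", or "GA" when no qualifier is present. */
    static String qualityOf(String version) {
        Matcher m = VERSION.matcher(version);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognised version: " + version);
        }
        return m.group(4) == null ? "GA" : m.group(4);
    }

    public static void main(String[] args) {
        System.out.println(qualityOf("2.8.0-alpha"));
        System.out.println(qualityOf("2.8.0-beta1"));
        System.out.println(qualityOf("2.8.0"));
    }
}
```

With an unqualified x.y.0 that is merely described as alpha in the release notes, no such check is possible.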

I am in favor of labeling any of the newer features to be of alpha/beta
quality.

SharedCache is another close-to-done feature.

On Wed, Nov 11, 2015 at 12:13 PM, Vinod Vavilapalli  wrote:

> Agreed on not mixing this with major release discussions.
>
> Okay, I just finished my review of 2.8 content.
>
> A quick summary follows.
>
> Current state of originally planned items
>
>  - Nearly Done / Done and so need to close down quickly
> — Support *both* JDK7 and JDK8 runtimes HADOOP-11090
> — Supporting non-exclusive node-labels: YARN-3214: Done, can push as
> an alpha / beta feature
> — Support priorities across applications within the same queue
> YARN-1963: Can push as an alpha / beta feature
>
>  - Definitely have to move out into 2.9 and beyond
> — Early work for disk and network isolation in YARN: YARN-2139,
> YARN-2140: Early noise, some critical pieces designed, done but not a lot
> of movement of late
> — Classpath isolation for downstream clients HADOOP-11656: Lots of
> chatter a while ago, not much movement of late
> — Support for Erasure Codes in HDFS HDFS-7285<
> https://issues.apache.org/jira/browse/HDFS-7285>: Moved out to 2.9 in the
> interest of stability / bake-in
>
> Non-planned features that went into 2.8.0
>
> — The overall list of new features:
> https://issues.apache.org/jira/issues/?filter=12333994
> — HDFS-6200 Create a separate jar for hdfs-client: Compatible
> improvement - no dimension of alpha/betaness here.
> — HDFS-8155Support OAuth2 in WebHDFS: Alpha / Early feature?
> — Stability improvements ready to use:
> — HDFS-8008Support client-side back off when the datanodes are
> congested
> — HDFS-8009Signal congestion on the DataNode
> — YARN-3611 Support Docker Containers In LinuxContainerExecutor: Well
> most of it anyways. Can push as an alpha feature.
> — YARN-1197 Support changing resources of an allocated container: Can
> push as an alpha/beta feature
>
> Items in progress to think about in 2 weeks
>
> — YARN Timeline Service v1.5 - YARN-4233: A short term bridge before
> YARN-2928 comes around. I think this should go in given the tremendous
> activity recently.
> — YARN Timeline Service Next generation: YARN-2928: Lots of momentum,
> but clearly a work in progress. Two options here
> — If it is safe to ship it into 2.8 in a disable manner, we can
> get the early code into trunk and all the way into 2.8.
> — If it is not safe, it organically rolls over into 2.9
> — Compatibility tools to catch backwards, forwards compatibility
> issues at patch submission, release times. Some of it is captured at
> YARN-3292. This also involves resurrecting jdiff
> (HADOOP-11776/YARN-3426/MAPREDUCE-6310) and/or investing in new tools.
>
> This is my plan of action for now in terms of the release itself
>
>  - Cut a branch about two weeks from now
>  - Do an RC mid next month (leaving ~4weeks since branch-cut)
>  - As with 2.7.x series, the first release will still be called as early /
> alpha release in the interest of
> — gaining downstream adoption
> — wider testing,
> — yet reserving our right to fix any inadvertent incompatibilities
> introduced.
>
> If we can get answers on “Items to think about now” during this and next
> week, we will overall be in good shape.
>
> Thoughts?
>
> Thanks
> +Vinod
> PS:As you may have noted above, this time around, I want to do something
> that we’ve always wanted to do, but never explicitly did. I’m calling out
> readiness of each feature as they stand today so we can inform our users
> better of what they can start relying on in production clusters.
>
>
> On Oct 5, 2015, at 11:53 AM, Colin P. McCabe wrote:
>
> I think it makes sense to have a 2.8 release since there are a
> tremendous number of JIRAs in 2.8 that are not in 2.7.  Doing a 3.x
> release seems like something we should consider separately since it
> would not have the same compatibility guarantees as a 2.8 release.
> There's a pretty big delta between trunk and 2.8 as well.
>
> cheers,
> Colin
>
> On Sat, Sep 26, 2015 at 1:36 PM, Chris Douglas wrote:
> With two active sustaining branches (2.6, 2.7), what would you think
> of releasing trunk as 3.x instead of pushing 2.8? There are many new
> features (EC, Y1197, etc.), and trunk could be the source of several
> alpha/beta releases before we fork the 3.x line. -C
>
> On Sat, Sep 26, 2015 at 12:49 PM, Vinod Vavilapalli wrote:
> As you may have noted, 2.8.0 got completely derailed what with 2.7.x and
> the unusually long 2.6.1 

Re: [DISCUSS] Looking to a 2.8.0 release

2015-11-12 Thread Karthik Kambatla
Did we consider cutting a branch-3 that borrows relatively compatible
patches from trunk to run the 3.x line? That said, I would like for us to
really tighten our compatibility policies and actually stick to them
starting the next major release.

On Wed, Nov 11, 2015 at 1:11 PM, Vinod Vavilapalli wrote:

> I’ll let others comment on specific features.
>
> Regarding the 3.x vs 2.x point, as I noted before on other threads, given
> all the incompatibilities in trunk it will be ways off before users can run
> their production workloads on a 3.x release. Therefore, as I was proposing
> before, we should continue the 2.x lines, but soon get started on rolling
> out a release candidate based off trunk.
>
> Like with 2.8, I’d like to go back and prepare some notes on trunk’s
> content so we can objectively discuss about it.
>
> +Vinod


Re: [DISCUSS] Looking to a 2.8.0 release

2015-11-11 Thread Sangjin Lee
On Wed, Nov 11, 2015 at 12:13 PM, Vinod Vavilapalli  wrote:

> — YARN Timeline Service Next generation: YARN-2928: Lots of momentum,
> but clearly a work in progress. Two options here
> — If it is safe to ship it into 2.8 in a disable manner, we can
> get the early code into trunk and all the way into 2.8.
> — If it is not safe, it organically rolls over into 2.9
>

I'll review the changes on YARN-2928 to see what impact it has (if any) if
the timeline service v.2 is disabled.

Another condition for it to make 2.8 is whether the branch will be in a
shape in a couple of weeks such that it adds value for folks that want to
test it. Hopefully it will become clearer soon.

Sangjin


Re: [DISCUSS] Looking to a 2.8.0 release

2015-11-11 Thread Vinod Vavilapalli
Agreed on not mixing this with major release discussions.

Okay, I just finished my review of 2.8 content.

A quick summary follows.

Current state of originally planned items

 - Nearly Done / Done and so need to close down quickly
    - Support *both* JDK7 and JDK8 runtimes HADOOP-11090
    - Supporting non-exclusive node-labels: YARN-3214: Done, can push as an
      alpha / beta feature
    - Support priorities across applications within the same queue YARN-1963:
      Can push as an alpha / beta feature

 - Definitely have to move out into 2.9 and beyond
    - Early work for disk and network isolation in YARN: YARN-2139, YARN-2140:
      Early noise, some critical pieces designed, done but not a lot of
      movement of late
    - Classpath isolation for downstream clients HADOOP-11656: Lots of chatter
      a while ago, not much movement of late
    - Support for Erasure Codes in HDFS HDFS-7285: Moved out to 2.9 in the
      interest of stability / bake-in

Non-planned features that went into 2.8.0

 - The overall list of new features:
   https://issues.apache.org/jira/issues/?filter=12333994
 - HDFS-6200 Create a separate jar for hdfs-client: Compatible improvement -
   no dimension of alpha/betaness here.
 - HDFS-8155 Support OAuth2 in WebHDFS: Alpha / Early feature?
 - Stability improvements ready to use:
    - HDFS-8008 Support client-side back off when the datanodes are congested
    - HDFS-8009 Signal congestion on the DataNode
 - YARN-3611 Support Docker Containers In LinuxContainerExecutor: Well most
   of it anyways. Can push as an alpha feature.
 - YARN-1197 Support changing resources of an allocated container: Can push
   as an alpha/beta feature

Items in progress to think about in 2 weeks

 - YARN Timeline Service v1.5 - YARN-4233: A short term bridge before
   YARN-2928 comes around. I think this should go in given the tremendous
   activity recently.
 - YARN Timeline Service Next generation: YARN-2928: Lots of momentum, but
   clearly a work in progress. Two options here
    - If it is safe to ship it into 2.8 in a disabled manner, we can get the
      early code into trunk and all the way into 2.8.
    - If it is not safe, it organically rolls over into 2.9
 - Compatibility tools to catch backwards, forwards compatibility issues at
   patch submission, release times. Some of it is captured at YARN-3292. This
   also involves resurrecting jdiff (HADOOP-11776/YARN-3426/MAPREDUCE-6310)
   and/or investing in new tools.

This is my plan of action for now in terms of the release itself

 - Cut a branch about two weeks from now
 - Do an RC mid next month (leaving ~4 weeks since branch-cut)
 - As with 2.7.x series, the first release will still be called an early /
   alpha release in the interest of
    - gaining downstream adoption
    - wider testing,
    - yet reserving our right to fix any inadvertent incompatibilities
      introduced.

If we can get answers on “Items to think about now” during this and next week, 
we will overall be in good shape.

Thoughts?

Thanks
+Vinod
PS:As you may have noted above, this time around, I want to do something that 
we’ve always wanted to do, but never explicitly did. I’m calling out readiness 
of each feature as they stand today so we can inform our users better of what 
they can start relying on in production clusters.


On Oct 5, 2015, at 11:53 AM, Colin P. McCabe wrote:

I think it makes sense to have a 2.8 release since there are a
tremendous number of JIRAs in 2.8 that are not in 2.7.  Doing a 3.x
release seems like something we should consider separately since it
would not have the same compatibility guarantees as a 2.8 release.
There's a pretty big delta between trunk and 2.8 as well.

cheers,
Colin

On Sat, Sep 26, 2015 at 1:36 PM, Chris Douglas wrote:
With two active sustaining branches (2.6, 2.7), what would you think
of releasing trunk as 3.x instead of pushing 2.8? There are many new
features (EC, Y1197, etc.), and trunk could be the source of several
alpha/beta releases before we fork the 3.x line. -C

On Sat, Sep 26, 2015 at 12:49 PM, Vinod Vavilapalli wrote:
As you may have noted, 2.8.0 got completely derailed what with 2.7.x and the 
unusually long 2.6.1 release.

With 2.6.1 out of the way, and two parallel threads in progress for 2.6.2 and 
2.7.2, it’s time for us to look back at where we are with Hadoop 2.8.

I’ll do a quick survey of where the individual features are and the amount of 
content already present in 2.8 and kick-start 2.8.0 process again.

+Vinod


On Apr 21, 2015, at 2:39 PM, vino...@apache.org wrote:

With 2.7.0 out of the way, and with more maintenance releases to stabilize it, 
I propose we start thinking about 2.8.0.

Here's my first cut of the proposal, will update the Roadmap wiki.
- Support 

Re: [DISCUSS] Looking to a 2.8.0 release

2015-11-11 Thread Vinod Vavilapalli
I’ll let others comment on specific features.

Regarding the 3.x vs 2.x point, as I noted before on other threads, given all 
the incompatibilities in trunk it will be ways off before users can run their 
production workloads on a 3.x release. Therefore, as I was proposing before, we 
should continue the 2.x lines, but soon get started on rolling out a release 
candidate based off trunk.

Like with 2.8, I’d like to go back and prepare some notes on trunk’s content so 
we can discuss it objectively.

+Vinod

Re: [DISCUSS] Looking to a 2.8.0 release

2015-11-11 Thread Wangda Tan
Thanks to Vinod for starting this discussion!

+1 to add YARN-1197 (container resizing) to 2.8.0; it is end-to-end tested.
I'd prefer to push it as an Alpha feature before wider testing.

I also agree with calling the first release of 2.8 an Alpha release, given
the number of new features / code changes.

Regards,
Wangda

On Wed, Nov 11, 2015 at 12:13 PM, Vinod Vavilapalli  wrote:

> Agreed on not mixing this with major release discussions.
>
> Okay, I just finished my review of 2.8 content.
>
> A quick summary follows.
>
> Current state of originally planned items
>
>  - Nearly Done / Done and so need to close down quickly
> — Support *both* JDK7 and JDK8 runtimes HADOOP-11090
> — Supporting non-exclusive node-labels: YARN-3214: Done, can push as
> an alpha / beta feature
> — Support priorities across applications within the same queue
> YARN-1963: Can push as an alpha / beta feature
>
>  - Definitely have to move out into 2.9 and beyond
> — Early work for disk and network isolation in YARN: YARN-2139,
> YARN-2140: Early noise, some critical pieces designed, done but not a lot
> of movement of late
> — Classpath isolation for downstream clients HADOOP-11656: Lots of
> chatter a while ago, not much movement of late
> — Support for Erasure Codes in HDFS HDFS-7285<
> https://issues.apache.org/jira/browse/HDFS-7285>: Moved out to 2.9 in the
> interest of stability / bake-in
>
> Non-planned features that went into 2.8.0
>
> — The overall list of new features:
> https://issues.apache.org/jira/issues/?filter=12333994
> — HDFS-6200 Create a separate jar for hdfs-client: Compatible
> improvement - no dimension of alpha/betaness here.
> — HDFS-8155Support OAuth2 in WebHDFS: Alpha / Early feature?
> — Stability improvements ready to use:
> — HDFS-8008Support client-side back off when the datanodes are
> congested
> — HDFS-8009Signal congestion on the DataNode
> — YARN-3611 Support Docker Containers In LinuxContainerExecutor: Well
> most of it anyways. Can push as an alpha feature.
> — YARN-1197 Support changing resources of an allocated container: Can
> push as an alpha/beta feature
>
> Items in progress to think about in 2 weeks
>
> — YARN Timeline Service v1.5 - YARN-4233: A short term bridge before
> YARN-2928 comes around. I think this should go in given the tremendous
> activity recently.
> — YARN Timeline Service Next generation: YARN-2928: Lots of momentum,
> but clearly a work in progress. Two options here
> — If it is safe to ship it into 2.8 in a disable manner, we can
> get the early code into trunk and all the way into 2.8.
> — If it is not safe, it organically rolls over into 2.9
> — Compatibility tools to catch backwards, forwards compatibility
> issues at patch submission, release times. Some of it is captured at
> YARN-3292. This also involves resurrecting jdiff
> (HADOOP-11776/YARN-3426/MAPREDUCE-6310) and/or investing in new tools.
>
> This is my plan of action for now in terms of the release itself
>
>  - Cut a branch about two weeks from now
>  - Do an RC mid next month (leaving ~4 weeks since branch-cut)
>  - As with 2.7.x series, the first release will still be called an early /
> alpha release in the interest of
> — gaining downstream adoption
> — wider testing,
> — yet reserving our right to fix any inadvertent incompatibilities
> introduced.
>
> If we can get answers on “Items to think about now” during this and next
> week, we will overall be in good shape.
>
> Thoughts?
>
> Thanks
> +Vinod
> PS: As you may have noted above, this time around, I want to do something
> that we’ve always wanted to do, but never explicitly did. I’m calling out
> readiness of each feature as they stand today so we can inform our users
> better of what they can start relying on in production clusters.
>
>
> On Oct 5, 2015, at 11:53 AM, Colin P. McCabe wrote:
>
> I think it makes sense to have a 2.8 release since there are a
> tremendous number of JIRAs in 2.8 that are not in 2.7.  Doing a 3.x
> release seems like something we should consider separately since it
> would not have the same compatibility guarantees as a 2.8 release.
> There's a pretty big delta between trunk and 2.8 as well.
>
> cheers,
> Colin
>
> On Sat, Sep 26, 2015 at 1:36 PM, Chris Douglas wrote:
> With two active sustaining branches (2.6, 2.7), what would you think
> of releasing trunk as 3.x instead of pushing 2.8? There are many new
> features (EC, Y1197, etc.), and trunk could be the source of several
> alpha/beta releases before we fork the 3.x line. -C
>
> On Sat, Sep 26, 2015 at 12:49 PM, Vinod Vavilapalli wrote:
> As you may have noted, 2.8.0 got completely derailed what with 2.7.x and
> the unusually long 2.6.1 release.
>
> With 2.6.1 out of 

Re: [DISCUSS] Looking to a 2.8.0 release

2015-11-11 Thread Allen Wittenauer

> On Nov 11, 2015, at 12:13 PM, Vinod Vavilapalli  
> wrote:
> 
>— HDFS-6200 Create a separate jar for hdfs-client: Compatible improvement 
> - no dimension of alpha/betaness here.

IMO: this feels like a massive break in backwards compatibility. Anyone 
who is looking for specific methods in specific jars is going to have a bad 
time. Also, it seems as though every week a new issue crops up that is related 
to this change.  Is Slider still having problems with it?  The reasoning “well, 
the pom sets the dependencies so it’s ok” feels like an *extremely weak* reason 
this wasn’t marked incompatible— it basically makes the assumption that 
everyone recompiles for every minor release.

>— Compatibility tools to catch backwards, forwards compatibility issues at 
> patch submission, release times. Some of it is captured at YARN-3292. This 
> also involves resurrecting jdiff (HADOOP-11776/YARN-3426/MAPREDUCE-6310) 
> and/or investing in new tools.

There has been talk in the past about adding Java ACC support to Yetus.

> Thoughts?

I’d rather see efforts on 3.x than another disastrous 2.x release.  The 
track record is not good.  At least a new major will signify that danger looms 
ahead.  We’re already treating 2.x minor releases as effectively major (see the 
list of incompatible JIRAs) so what difference does it make if we do 2.x vs. 3.x 
anyway?

> 
> Thanks
> +Vinod
> PS: As you may have noted above, this time around, I want to do something that 
> we’ve always wanted to do, but never explicitly did. I’m calling out 
> readiness of each feature as they stand today so we can inform our users 
> better of what they can start relying on in production clusters.

… except some of these changes are so deep reaching that even if you 
don’t use the feature, you’re still impacted by it ...




Re: [DISCUSS] Looking to a 2.8.0 release

2015-11-11 Thread Li Lu


On Nov 11, 2015, at 12:13, Vinod Vavilapalli 
> wrote:

   — YARN Timeline Service v1.5 - YARN-4233: A short term bridge before 
YARN-2928 comes around. I think this should go in given the tremendous activity 
recently.

+1, let’s target ATS v1.5 work to 2.8. Most critical patches will be ready for 
review this week. YARN-4234 also addresses the versioning problem for ATS v1, 
v1.5, and v2, which will bridge the gap between v1 and v2.

   — YARN Timeline Service Next generation: YARN-2928: Lots of momentum, but 
clearly a work in progress. Two options here
   — If it is safe to ship it into 2.8 in a disabled manner, we can get the 
early code into trunk and all the way into 2.8.

Let’s review all changes in YARN-2928 branch and evaluate their impact on the 
upstream code. Right now in YARN-2928 branch we’ve modified RM/NM/Distributed 
shell and MR. Before we make a decision on this I think we need to review their 
effects so that ATS v2 (in its alpha mode) is not affecting existing parts.

   — If it is not safe, it organically rolls over into 2.9

Sure.



Re: [DISCUSS] Looking to a 2.8.0 release

2015-11-11 Thread Allen Wittenauer

> On Nov 11, 2015, at 1:11 PM, Vinod Vavilapalli  
> wrote:
> 
> I’ll let others comment on specific features.
> 
> Regarding the 3.x vs 2.x point, as I noted before on other threads, given all 
> the incompatibilities in trunk it will be a ways off before users can run their 
> production workloads on a 3.x release.

[citation needed]

Seriously. Back that statement up especially in light of there having 
been more incompatibilities in all the 2.x releases combined than in 3.x. 

Re: [DISCUSS] Looking to a 2.8.0 release

2015-11-11 Thread Haohui Mai
bq.  it basically makes the assumption that everyone recompiles for
every minor release.

I don't think that the statement holds. HDFS-6200 keeps classes in the
same package. hdfs-client becomes a transitive dependency of the
original hdfs jar.

Applications continue to work without recompilation as the classes
will keep the same names and will be available in the classpath. They
have the option of switching to depending only on hdfs-client to
minimize the dependency when they are comfortable.

I'm not claiming that there are no bugs in HDFS-6200, but just like
other features we discover bugs and fix them continuously.

~Haohui


On Wed, Nov 11, 2015 at 12:43 PM, Allen Wittenauer  wrote:
>
>> On Nov 11, 2015, at 12:13 PM, Vinod Vavilapalli  
>> wrote:
>>
>>— HDFS-6200 Create a separate jar for hdfs-client: Compatible improvement 
>> - no dimension of alpha/betaness here.
>
> IMO: this feels like a massive break in backwards compatibility. 
> Anyone who is looking for specific methods in specific jars is going to have 
> a bad time. Also, it seems as though every week a new issue crops up that is 
> related to this change.  Is Slider still having problems with it?  The 
> reasoning “well, the pom sets the dependencies so it’s ok” feels like an 
> *extremely weak* reason this wasn’t marked incompatible— it basically makes 
> the assumption that everyone recompiles for every minor release.
>
>>— Compatibility tools to catch backwards, forwards compatibility issues 
>> at patch submission, release times. Some of it is captured at YARN-3292. 
>> This also involves resurrecting jdiff 
>> (HADOOP-11776/YARN-3426/MAPREDUCE-6310) and/or investing in new tools.
>
> There has been talk in the past about adding Java ACC support to 
> Yetus.
>
>> Thoughts?
>
> I’d rather see efforts on 3.x than another disastrous 2.x release.  
> The track record is not good.  At least a new major will signify that danger 
> looms ahead.  We’re already treating 2.x minor releases as effectively major 
> (see the list of incompatible JIRAs) so what difference does it make if we do 
> 2.x vs. 3.x anyway?
>
>>
>> Thanks
>> +Vinod
>> PS: As you may have noted above, this time around, I want to do something 
>> that we’ve always wanted to do, but never explicitly did. I’m calling out 
>> readiness of each feature as they stand today so we can inform our users 
>> better of what they can start relying on in production clusters.
>
> … except some of these changes are so deep reaching that even if you 
> don’t use the feature, you’re still impacted by it ...
>
>


Re: [DISCUSS] Looking to a 2.8.0 release

2015-11-11 Thread Allen Wittenauer

> On Nov 11, 2015, at 2:15 PM, Haohui Mai  wrote:
> 
> bq.  it basically makes the assumption that everyone recompiles for
> every minor release.
> 
> I don't think that the statement holds. HDFS-6200 keeps classes in the
> same package. hdfs-client becomes a transitive dependency of the
> original hdfs jar.
> 
> Applications continue to work without recompilation as the classes
> will be in the same name and will be available in the class path.

If and only if they take the Hadoop class path at face value.  Many 
applications don’t because of conflicting dependencies and instead import 
specific jars.
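
One way to check this concern on a given deployment is to ask the JVM which jar a class was actually loaded from. The following is a minimal sketch and not something from the thread: `JarLocator` and its methods are hypothetical names, and `org.apache.hadoop.hdfs.DistributedFileSystem` is used only as an example class name. Without Hadoop on the class path it simply reports that the class was not found.

```java
import java.security.CodeSource;

// Hypothetical helper: reports where a class was loaded from.
public class JarLocator {

    // Returns the code source location (typically a jar URL) of a loaded class.
    public static String locate(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        if (src == null || src.getLocation() == null) {
            // JDK bootstrap classes usually have no code source.
            return "(bootstrap or unknown)";
        }
        return src.getLocation().toString();
    }

    // Looks a class up by name without initializing it.
    public static String locateByName(String className) {
        try {
            Class<?> cls = Class.forName(
                className, false, JarLocator.class.getClassLoader());
            return locate(cls);
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        // Example class name only; pass another name as the first argument.
        String name = args.length > 0
            ? args[0]
            : "org.apache.hadoop.hdfs.DistributedFileSystem";
        System.out.println(name + " -> " + locateByName(name));
    }
}
```

An application that imports specific jars could run this against the class names it depends on, once per Hadoop version, and compare the reported locations.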

Re: [DISCUSS] Looking to a 2.8.0 release

2015-11-11 Thread Steve Loughran

> On 11 Nov 2015, at 22:15, Haohui Mai  wrote:
> 
> bq.  it basically makes the assumption that everyone recompiles for
> every minor release.
> 
> I don't think that the statement holds. HDFS-6200 keeps classes in the
> same package. hdfs-client becomes a transitive dependency of the
> original hdfs jar.
> 
> Applications continue to work without recompilation as the classes
> will be in the same name and will be available in the classpath. They
> have the option of switching to depending only on hdfs-client to
> minimize the dependency when they are comfortable.
> 
> I'm not claiming that there are no bugs in HDFS-6200, but just like
> other features we discover bugs and fix them continuously.
> 
> ~Haohui
> 

currently pulling in hadoop-client gives downstream apps hadoop-hdfs-client, 
but not hadoop-hdfs server side, right?


Re: [DISCUSS] Looking to a 2.8.0 release

2015-11-11 Thread Haohui Mai
bq. currently pulling in hadoop-client gives downstream apps
hadoop-hdfs-client, but not hadoop-hdfs server side, right?

Right now hadoop-client pulls in hadoop-hdfs directly to ensure a
smooth transition. Maybe we can revisit the decision in 2.9 / 3.x?

On Wed, Nov 11, 2015 at 3:00 PM, Steve Loughran  wrote:
>
>> On 11 Nov 2015, at 22:15, Haohui Mai  wrote:
>>
>> bq.  it basically makes the assumption that everyone recompiles for
>> every minor release.
>>
>> I don't think that the statement holds. HDFS-6200 keeps classes in the
>> same package. hdfs-client becomes a transitive dependency of the
>> original hdfs jar.
>>
>> Applications continue to work without recompilation as the classes
>> will be in the same name and will be available in the classpath. They
>> have the option of switching to depending only on hdfs-client to
>> minimize the dependency when they are comfortable.
>>
>> I'm not claiming that there are no bugs in HDFS-6200, but just like
>> other features we discover bugs and fix them continuously.
>>
>> ~Haohui
>>
>
> currently pulling in hadoop-client gives downstream apps hadoop-hdfs-client, 
> but not hadoop-hdfs server side, right?


Re: [DISCUSS] Looking to a 2.8.0 release

2015-11-11 Thread Haohui Mai
bq. If and only if they take the Hadoop class path at face value.
Many applications don’t because of conflicting dependencies and
instead import specific jars.

We do make the assumption that applications need to pick up all the
dependencies (either automatically or manually). The situation is
similar to adding a new dependency to hdfs in a minor release.

Maven / Gradle obviously help, but I'd love to hear more about how
you get it to work. In trunk hadoop-env.sh adds 118 jars into the
class path. Are you manually importing 118 jars for every single
application?
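
To see how many jars a process actually ends up with, the class path can be inspected from inside the JVM. This is a minimal sketch under stated assumptions: `ClasspathJars` is a hypothetical name, and the code only reads the standard `java.class.path` system property, so it says nothing about jars loaded through custom class loaders.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: lists the jar entries of a class path string.
public class ClasspathJars {

    // Splits a class path on the platform separator and keeps *.jar entries.
    public static List<String> jarEntries(String classPath) {
        List<String> jars = new ArrayList<>();
        if (classPath == null || classPath.isEmpty()) {
            return jars;
        }
        String separator = Pattern.quote(File.pathSeparator);
        for (String entry : classPath.split(separator)) {
            if (entry.toLowerCase().endsWith(".jar")) {
                jars.add(entry);
            }
        }
        return jars;
    }

    public static void main(String[] args) {
        List<String> jars = jarEntries(System.getProperty("java.class.path"));
        System.out.println(jars.size() + " jar(s) on the class path");
        for (String jar : jars) {
            System.out.println("  " + jar);
        }
    }
}
```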



On Wed, Nov 11, 2015 at 3:09 PM, Haohui Mai  wrote:
> bq. currently pulling in hadoop-client gives downstream apps
> hadoop-hdfs-client, but not hadoop-hdfs server side, right?
>
> Right now hadoop-client pulls in hadoop-hdfs directly to ensure a
> smooth transition. Maybe we can revisit the decision in the 2.9 / 3.x?
>
> On Wed, Nov 11, 2015 at 3:00 PM, Steve Loughran  
> wrote:
>>
>>> On 11 Nov 2015, at 22:15, Haohui Mai  wrote:
>>>
>>> bq.  it basically makes the assumption that everyone recompiles for
>>> every minor release.
>>>
>>> I don't think that the statement holds. HDFS-6200 keeps classes in the
>>> same package. hdfs-client becomes a transitive dependency of the
>>> original hdfs jar.
>>>
>>> Applications continue to work without recompilation as the classes
>>> will be in the same name and will be available in the classpath. They
>>> have the option of switching to depending only on hdfs-client to
>>> minimize the dependency when they are comfortable.
>>>
>>> I'm not claiming that there are no bugs in HDFS-6200, but just like
>>> other features we discover bugs and fix them continuously.
>>>
>>> ~Haohui
>>>
>>
>> currently pulling in hadoop-client gives downstream apps hadoop-hdfs-client, 
>> but not hadoop-hdfs server side, right?


Re: [DISCUSS] Looking to a 2.8.0 release

2015-10-05 Thread Colin P. McCabe
I think it makes sense to have a 2.8 release since there are a
tremendous number of JIRAs in 2.8 that are not in 2.7.  Doing a 3.x
release seems like something we should consider separately since it
would not have the same compatibility guarantees as a 2.8 release.
There's a pretty big delta between trunk and 2.8 as well.

cheers,
Colin

On Sat, Sep 26, 2015 at 1:36 PM, Chris Douglas  wrote:
> With two active sustaining branches (2.6, 2.7), what would you think
> of releasing trunk as 3.x instead of pushing 2.8? There are many new
> features (EC, Y1197, etc.), and trunk could be the source of several
> alpha/beta releases before we fork the 3.x line. -C
>
> On Sat, Sep 26, 2015 at 12:49 PM, Vinod Vavilapalli
>  wrote:
>> As you may have noted, 2.8.0 got completely derailed what with 2.7.x and the 
>> unusually long 2.6.1 release.
>>
>> With 2.6.1 out of the way, and two parallel threads in progress for 2.6.2 
>> and 2.7.2, it’s time for us to look back at where we are with Hadoop 2.8.
>>
>> I’ll do a quick survey of where the individual features are and the amount 
>> of content already present in 2.8 and kick-start 2.8.0 process again.
>>
>> +Vinod
>>
>>
>>> On Apr 21, 2015, at 2:39 PM, vino...@apache.org wrote:
>>>
>>> With 2.7.0 out of the way, and with more maintenance releases to stabilize 
>>> it, I propose we start thinking about 2.8.0.
>>>
>>> Here's my first cut of the proposal, will update the Roadmap wiki.
>>>  - Support *both* JDK7 and JDK8 runtimes: HADOOP-11090
>>>  - Compatibility tools to catch backwards, forwards compatibility issues at 
>>> patch submission, release times. Some of it is captured at YARN-3292. This 
>>> also involves resurrecting jdiff (HADOOP-11776/YARN-3426/MAPREDUCE-6310) 
>>> and/or investing in new tools.
>>>  - HADOOP-11656 Classpath isolation for downstream clients
>>>  - Support for Erasure Codes in HDFS HDFS-7285
>>>  - Early work for disk and network isolation in YARN: YARN-2139, YARN-2140
>>>  - YARN Timeline Service Next generation: YARN-2928. At least branch-merge 
>>> + early peek.
>>>  - Supporting non-exclusive node-labels: YARN-3214
>>>
>>> I'm experimenting with more agile 2.7.x releases and would like to continue 
>>> the same by volunteering as the RM for 2.8.x too.
>>>
>>> Given the long time we took with 2.7.0, the timeline I am looking at is 
>>> 8-12 weeks. We can pick as many features as they finish along and make 
>>> more predictable releases instead of holding up releases forever.
>>>
>>> Thoughts?
>>>
>>> Thanks
>>> +Vinod
>>


Re: [DISCUSS] Looking to a 2.8.0 release

2015-09-26 Thread Chris Douglas
With two active sustaining branches (2.6, 2.7), what would you think
of releasing trunk as 3.x instead of pushing 2.8? There are many new
features (EC, Y1197, etc.), and trunk could be the source of several
alpha/beta releases before we fork the 3.x line. -C

On Sat, Sep 26, 2015 at 12:49 PM, Vinod Vavilapalli
 wrote:
> As you may have noted, 2.8.0 got completely derailed what with 2.7.x and the 
> unusually long 2.6.1 release.
>
> With 2.6.1 out of the way, and two parallel threads in progress for 2.6.2 and 
> 2.7.2, it’s time for us to look back at where we are with Hadoop 2.8.
>
> I’ll do a quick survey of where the individual features are and the amount of 
> content already present in 2.8 and kick-start 2.8.0 process again.
>
> +Vinod
>
>
>> On Apr 21, 2015, at 2:39 PM, vino...@apache.org wrote:
>>
>> With 2.7.0 out of the way, and with more maintenance releases to stabilize 
>> it, I propose we start thinking about 2.8.0.
>>
>> Here's my first cut of the proposal, will update the Roadmap wiki.
>>  - Support *both* JDK7 and JDK8 runtimes: HADOOP-11090
>>  - Compatibility tools to catch backwards, forwards compatibility issues at 
>> patch submission, release times. Some of it is captured at YARN-3292. This 
>> also involves resurrecting jdiff (HADOOP-11776/YARN-3426/MAPREDUCE-6310) 
>> and/or investing in new tools.
>>  - HADOOP-11656 Classpath isolation for downstream clients
>>  - Support for Erasure Codes in HDFS HDFS-7285
>>  - Early work for disk and network isolation in YARN: YARN-2139, YARN-2140
>>  - YARN Timeline Service Next generation: YARN-2928. At least branch-merge + 
>> early peek.
>>  - Supporting non-exclusive node-labels: YARN-3214
>>
>> I'm experimenting with more agile 2.7.x releases and would like to continue 
>> the same by volunteering as the RM for 2.8.x too.
>>
>> Given the long time we took with 2.7.0, the timeline I am looking at is 8-12 
>> weeks. We can pick as many features as they finish along and make more 
>> predictable releases instead of holding up releases forever.
>>
>> Thoughts?
>>
>> Thanks
>> +Vinod
>


Re: [DISCUSS] Looking to a 2.8.0 release

2015-04-22 Thread Vinod Kumar Vavilapalli
+dev lists.

Forgot about that, sure. I added this and my initial list to the Roadmap wiki.

Thanks
+Vinod

On Apr 21, 2015, at 9:34 PM, Rohith Sharma K S rohithsharm...@huawei.com 
wrote:

 Dear Vinod
 
   Regarding the road map of Hadoop-2.8.0, can the basic application priority 
 working model be included in the next release?
 
 Thanks & Regards
 Rohith Sharma K S
 
 -Original Message-
 From: Vinod Kumar Vavilapalli [mailto:vino...@apache.org] 
 Sent: 22 April 2015 03:09
 To: common-dev@hadoop.apache.org; yarn-...@hadoop.apache.org; 
 hdfs-...@hadoop.apache.org; mapreduce-...@hadoop.apache.org
 Cc: vino...@apache.org
 Subject: [DISCUSS] Looking to a 2.8.0 release
 
 With 2.7.0 out of the way, and with more maintenance releases to stabilize 
 it, I propose we start thinking about 2.8.0.
 
 Here's my first cut of the proposal, will update the Roadmap wiki.
 - Support *both* JDK7 and JDK8 runtimes: HADOOP-11090
 - Compatibility tools to catch backwards, forwards compatibility issues at 
 patch submission, release times. Some of it is captured at YARN-3292. This 
 also involves resurrecting jdiff (HADOOP-11776/YARN-3426/MAPREDUCE-6310)
 and/or investing in new tools.
 - HADOOP-11656 Classpath isolation for downstream clients
 - Support for Erasure Codes in HDFS HDFS-7285
 - Early work for disk and network isolation in YARN: YARN-2139, YARN-2140
 - YARN Timeline Service Next generation: YARN-2928. At least branch-merge
 + early peek.
 - Supporting non-exclusive node-labels: YARN-3214
 
 I'm experimenting with more agile 2.7.x releases and would like to continue 
 the same by volunteering as the RM for 2.8.x too.
 
 Given the long time we took with 2.7.0, the timeline I am looking at is
 8-12 weeks. We can pick as many features as they finish along and make more 
 predictable releases instead of holding up releases forever.
 
 Thoughts?
 
 Thanks
 +Vinod



Re: [DISCUSS] Looking to a 2.8.0 release

2015-04-21 Thread Karthik Kambatla
The feature set here seems pretty long, even for 2 - 3 months. Can we come
up with a minimum set of features (or a number of features) that justify a
new minor release, and start stabilizing as soon as those are in?

On Tue, Apr 21, 2015 at 2:39 PM, Vinod Kumar Vavilapalli vino...@apache.org
 wrote:

 With 2.7.0 out of the way, and with more maintenance releases to stabilize
 it, I propose we start thinking about 2.8.0.

 Here's my first cut of the proposal, will update the Roadmap wiki.
  - Support *both* JDK7 and JDK8 runtimes: HADOOP-11090
  - Compatibility tools to catch backwards, forwards compatibility issues at
 patch submission, release times. Some of it is captured at YARN-3292. This
 also involves resurrecting jdiff (HADOOP-11776/YARN-3426/MAPREDUCE-6310)
 and/or investing in new tools.
  - HADOOP-11656 Classpath isolation for downstream clients
  - Support for Erasure Codes in HDFS HDFS-7285
  - Early work for disk and network isolation in YARN: YARN-2139, YARN-2140
  - YARN Timeline Service Next generation: YARN-2928. At least branch-merge
 + early peek.
  - Supporting non-exclusive node-labels: YARN-3214

 I'm experimenting with more agile 2.7.x releases and would like to continue
 the same by volunteering as the RM for 2.8.x too.

 Given the long time we took with 2.7.0, the timeline I am looking at is
 8-12 weeks. We can pick as many features as they finish along and make
 more predictable releases instead of holding up releases forever.

 Thoughts?

 Thanks
 +Vinod




-- 
Karthik Kambatla
Software Engineer, Cloudera Inc.

http://five.sentenc.es


Re: [DISCUSS] Looking to a 2.8.0 release

2015-04-21 Thread Andrew Wang
I would also like to support Karthik's proposal on the release thread about
version numbering. 2.7.0 being alpha is pretty confusing since all of the
other 2.x releases since GA have been stable. I think users would prefer a
version like 2.8.0-alpha1 instead, which is very clear and similar to
what we did for 2.0 and 2.1. Then we release 2.8.0 when we're actually
stable.

I don't know if it's retroactively possible to do this for 2.7.0, but it's
something to consider for the next 2.7 alpha or beta or whatever.
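
The ordering this naming scheme implies can be sketched in code. This is an illustration only, assuming versions of the form `X.Y.Z` with an optional `-alphaN` or `-betaN` suffix; `ReleaseVersion` is a hypothetical class, not anything in Hadoop's build or release tooling.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: orders 2.8.0-alpha1 < 2.8.0-beta1 < 2.8.0.
public class ReleaseVersion implements Comparable<ReleaseVersion> {
    private static final Pattern FORMAT =
        Pattern.compile("(\\d+)\\.(\\d+)\\.(\\d+)(?:-(alpha|beta)(\\d+))?");

    private final int[] numbers = new int[3]; // major, minor, patch
    private final int stage;       // 0 = alpha, 1 = beta, 2 = final release
    private final int stageNumber; // the N in alphaN / betaN, 0 if final

    public ReleaseVersion(String text) {
        Matcher m = FORMAT.matcher(text);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized version: " + text);
        }
        for (int i = 0; i < 3; i++) {
            numbers[i] = Integer.parseInt(m.group(i + 1));
        }
        String label = m.group(4);
        stage = label == null ? 2 : (label.equals("alpha") ? 0 : 1);
        stageNumber = label == null ? 0 : Integer.parseInt(m.group(5));
    }

    // Only a release without an alpha/beta suffix counts as stable.
    public boolean isStable() {
        return stage == 2;
    }

    @Override
    public int compareTo(ReleaseVersion other) {
        for (int i = 0; i < 3; i++) {
            int c = Integer.compare(numbers[i], other.numbers[i]);
            if (c != 0) {
                return c;
            }
        }
        int c = Integer.compare(stage, other.stage);
        return c != 0 ? c : Integer.compare(stageNumber, other.stageNumber);
    }

    public static void main(String[] args) {
        ReleaseVersion alpha = new ReleaseVersion("2.8.0-alpha1");
        ReleaseVersion stable = new ReleaseVersion("2.8.0");
        System.out.println(alpha.compareTo(stable) < 0); // true
        System.out.println(alpha.isStable());            // false
    }
}
```

With a suffix in the version string, a tool or a user can tell from the artifact name alone that a release is not yet considered stable.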

On Tue, Apr 21, 2015 at 3:12 PM, Karthik Kambatla ka...@cloudera.com
wrote:

 The feature set here seems pretty long, even for 2 - 3 months. Can we come
 up with a minimum set of features (or a number of features) that justify a
 new minor release, and start stabilizing as soon as those are in?

 On Tue, Apr 21, 2015 at 2:39 PM, Vinod Kumar Vavilapalli 
 vino...@apache.org
  wrote:

  With 2.7.0 out of the way, and with more maintenance releases to
 stabilize
  it, I propose we start thinking about 2.8.0.
 
  Here's my first cut of the proposal, will update the Roadmap wiki.
   - Support *both* JDK7 and JDK8 runtimes: HADOOP-11090
   - Compatibility tools to catch backwards, forwards compatibility issues
 at
  patch submission, release times. Some of it is captured at YARN-3292.
 This
  also involves resurrecting jdiff (HADOOP-11776/YARN-3426/MAPREDUCE-6310)
  and/or investing in new tools.
   - HADOOP-11656 Classpath isolation for downstream clients
   - Support for Erasure Codes in HDFS HDFS-7285
   - Early work for disk and network isolation in YARN: YARN-2139,
 YARN-2140
   - YARN Timeline Service Next generation: YARN-2928. At least
 branch-merge
  + early peek.
   - Supporting non-exclusive node-labels: YARN-3214
 
  I'm experimenting with more agile 2.7.x releases and would like to
 continue
  the same by volunteering as the RM for 2.8.x too.
 
  Given the long time we took with 2.7.0, the timeline I am looking at is
  8-12 weeks. We can pick as many features as they finish along and make
  more predictable releases instead of holding up releases forever.
 
  Thoughts?
 
  Thanks
  +Vinod
 



 --
 Karthik Kambatla
 Software Engineer, Cloudera Inc.
 
 http://five.sentenc.es



Re: [DISCUSS] Looking to a 2.8.0 release

2015-04-21 Thread Vinod Kumar Vavilapalli
Thanks for your comment Karthik. Here's my take.

Feature based release is going to put us back into the same prolonged release 
cycle. That is why I am proposing that we look at 2-3 months time and get 
whatever is ready. If we don't have even a single feature in by then, clearly 
we can drop the timeline.

I can understand the initial reaction that the list is long, but some of the 
features in the list are just early milestones - disk/network, Timeline Service 
next-gen and not stable functionality. Related to that, I want us to start 
thinking about alpha'ness of features. Will start a separate thread for that.

Thanks
+Vinod

On Apr 21, 2015, at 3:12 PM, Karthik Kambatla 
ka...@cloudera.com wrote:

The feature set here seems pretty long, even for 2 - 3 months. Can we come up 
with a minimum set of features (or a number of features) that justify a new 
minor release, and start stabilizing as soon as those are in?

On Tue, Apr 21, 2015 at 2:39 PM, Vinod Kumar Vavilapalli 
vino...@apache.org wrote:
With 2.7.0 out of the way, and with more maintenance releases to stabilize
it, I propose we start thinking about 2.8.0.

Here's my first cut of the proposal, will update the Roadmap wiki.
 - Support *both* JDK7 and JDK8 runtimes: HADOOP-11090
 - Compatibility tools to catch backwards, forwards compatibility issues at
patch submission, release times. Some of it is captured at YARN-3292. This
also involves resurrecting jdiff (HADOOP-11776/YARN-3426/MAPREDUCE-6310)
and/or investing in new tools.
 - HADOOP-11656 Classpath isolation for downstream clients
 - Support for Erasure Codes in HDFS HDFS-7285
 - Early work for disk and network isolation in YARN: YARN-2139, YARN-2140
 - YARN Timeline Service Next generation: YARN-2928. At least branch-merge
+ early peek.
 - Supporting non-exclusive node-labels: YARN-3214

I'm experimenting with more agile 2.7.x releases and would like to continue
the same by volunteering as the RM for 2.8.x too.

Given the long time we took with 2.7.0, the timeline I am looking at is
8-12 weeks. We can pick as many features as they finish along and make
more predictable releases instead of holding up releases forever.

Thoughts?

Thanks
+Vinod



--
Karthik Kambatla
Software Engineer, Cloudera Inc.

http://five.sentenc.es




Re: [DISCUSS] Looking to a 2.8.0 release

2015-04-21 Thread Vinod Kumar Vavilapalli

Sure, I agree it's better to have clear guidelines and scheme. Let me fork this 
thread about that.

Re 2.7.0, I just forgot about the naming initially though I was clear in the 
discussion/voting. So I had to end up calling it alpha outside of the release 
artifact naming.

Thanks
+Vinod

On Apr 21, 2015, at 4:26 PM, Andrew Wang andrew.w...@cloudera.com wrote:

 I would also like to support Karthik's proposal on the release thread about
 version numbering. 2.7.0 being alpha is pretty confusing since all of the
 other 2.x releases since GA have been stable. I think users would prefer a
 version like 2.8.0-alpha1 instead, which is very clear and similar to
 what we did for 2.0 and 2.1. Then we release 2.8.0 when we're actually
 stable.
 
 I don't know if it's retroactively possible to do this for 2.7.0, but it's
 something to consider for the next 2.7 alpha or beta or whatever.
 
 On Tue, Apr 21, 2015 at 3:12 PM, Karthik Kambatla ka...@cloudera.com
 wrote:
 
 The feature set here seems pretty long, even for 2 - 3 months. Can we come
 up with a minimum set of features (or a number of features) that justify a
 new minor release, and start stabilizing as soon as those are in?
 
 On Tue, Apr 21, 2015 at 2:39 PM, Vinod Kumar Vavilapalli 
 vino...@apache.org
 wrote:
 
 With 2.7.0 out of the way, and with more maintenance releases to
 stabilize
 it, I propose we start thinking about 2.8.0.
 
 Here's my first cut of the proposal, will update the Roadmap wiki.
 - Support *both* JDK7 and JDK8 runtimes: HADOOP-11090
 - Compatibility tools to catch backwards, forwards compatibility issues
 at
 patch submission, release times. Some of it is captured at YARN-3292.
 This
 also involves resurrecting jdiff (HADOOP-11776/YARN-3426/MAPREDUCE-6310)
 and/or investing in new tools.
 - HADOOP-11656 Classpath isolation for downstream clients
 - Support for Erasure Codes in HDFS HDFS-7285
 - Early work for disk and network isolation in YARN: YARN-2139,
 YARN-2140
 - YARN Timeline Service Next generation: YARN-2928. At least
 branch-merge
 + early peek.
 - Supporting non-exclusive node-labels: YARN-3214
 
 I'm experimenting with more agile 2.7.x releases and would like to
 continue
 the same by volunteering as the RM for 2.8.x too.
 
 Given the long time we took with 2.7.0, the timeline I am looking at is
 8-12 weeks. We can pick as many features as they finish along and make
 more predictable releases instead of holding up releases forever.
 
 Thoughts?
 
 Thanks
 +Vinod
 
 
 
 
 --
 Karthik Kambatla
 Software Engineer, Cloudera Inc.
 
 http://five.sentenc.es