Jenkins build is back to normal : Hadoop-Common-trunk #1424

2015-03-04 Thread Apache Jenkins Server



Re: Looking to a Hadoop 3 release

2015-03-04 Thread Andrew Wang
Let's not dismiss this quite so handily.

Sean, Jason, and Stack replied on HADOOP-11656 pointing out that while we
could make classpath isolation opt-in via configuration, what we really
want longer term is to have it on by default (or just always on). Stack in
particular points out the practical difficulties in using an opt-in method
in 2.x from a downstream project perspective. It's not pretty.

The plan that both Sean and Jason propose (which I support) is to have an
opt-in solution in 2.x, bake it there, then turn it on by default
(incompatible) in a new major release. I think this lines up well with my
proposal of some alphas and betas leading up to a GA 3.x. I'm also willing
to help with 2.x release management if that would help with testing this
feature.

Even setting aside classpath isolation, a new major release is still
justified by JDK8. Somehow this is being ignored in the discussion. Allen,
historically the voice of the user in our community, just highlighted it as
a major compatibility issue, and Tucu and I have also expressed our
very strong concerns about bumping this in a minor release. 2.7's bump is a
unique exception, but this is not something to be cited as precedent or
policy.

Where does this resistance to a new major release stem from? As I've
described from the beginning, this will look basically like a 2.x release,
except for the inclusion of classpath isolation by default and target
version JDK8. I've expressed my desire to maintain API and wire
compatibility, and we can audit the set of incompatible changes in trunk to
ensure this. My proposal for doing alpha and beta releases leading up to GA
also gives downstreams a nice amount of time for testing and validation.

Regards,
Andrew

On Tue, Mar 3, 2015 at 2:32 PM, Arun Murthy  wrote:

> Awesome, looks like we can just do this in a compatible manner - nothing
> else on the list seems like it warrants a (premature) major release.
>
> Thanks Vinod.
>
> Arun
>
> 
> From: Vinod Kumar Vavilapalli 
> Sent: Tuesday, March 03, 2015 2:30 PM
> To: common-dev@hadoop.apache.org
> Cc: hdfs-...@hadoop.apache.org; mapreduce-...@hadoop.apache.org;
> yarn-...@hadoop.apache.org
> Subject: Re: Looking to a Hadoop 3 release
>
> I started pitching in more on that JIRA.
>
> To add, I think we can and should strive for doing this in a compatible
> manner, whatever the approach. Marking and calling it incompatible before
> we see proposal/patch seems premature to me. Commented the same on JIRA:
> https://issues.apache.org/jira/browse/HADOOP-11656?focusedCommentId=14345875&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14345875
> .
>
> Thanks
> +Vinod
>
> On Mar 2, 2015, at 8:08 PM, Andrew Wang  andrew.w...@cloudera.com>> wrote:
>
> Regarding classpath isolation, based on what I hear from our customers,
> it's still a big problem (even after the MR classloader work). The latest
> Jackson version bump was quite painful for our downstream projects, and the
> HDFS client still leaks a lot of dependencies. Would welcome more
> discussion of this on HADOOP-11656, Steve, Colin, and Haohui have already
> chimed in.
>
>


Re: Looking to a Hadoop 3 release

2015-03-04 Thread Stack
In general +1 on 3.0.0. It's time. If we start now, it might make it out by
2016. If we start now, downstreamers can start aligning themselves to land
versions that suit at about the same time.

While two big items have been called out as possible incompatible changes,
and there is ongoing discussion as to whether they are or not*, is there
any chance of getting a longer list of big differences between the
branches? In particular I'd be interested in improvements that are 'off' by
default that would be better defaulted 'on'.

Thanks,
St.Ack

* Let me note that 'compatible' around these parts is a trampled concept,
seemingly open to interpretation, with a definition other than the one that
prevails elsewhere in software. See Allen's list above, and in our
downstream project, the recent HBASE-13149 "HBase server MR tools are
broken on Hadoop 2.5+ Yarn", among others.  Let 3.x be incompatible with
2.x if only so we can leave behind all current notions of 'compatibility'
and just start over (as per Allen).


On Mon, Mar 2, 2015 at 3:19 PM, Andrew Wang 
wrote:

> Hi devs,
>
> It's been a year and a half since 2.x went GA, and I think we're about due
> for a 3.x release.
> Notably, there are two incompatible changes I'd like to call out, that will
> have a tremendous positive impact for our users.
>
> First, classpath isolation being done at HADOOP-11656, which has been a
> long-standing request from many downstreams and Hadoop users.
>
> Second, bumping the source and target JDK version to JDK8 (related to
> HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two
> months from now). In the past, we've had issues with our dependencies
> discontinuing support for old JDKs, so this will future-proof us.
>
> Between the two, we'll also have quite an opportunity to clean up and
> upgrade our dependencies, another common user and developer request.
>
> I'd like to propose that we start rolling a series of monthly-ish series of
> 3.0 alpha releases ASAP, with myself volunteering to take on the RM and
> other cat herding responsibilities. There are already quite a few changes
> slated for 3.0 besides the above (for instance the shell script rewrite) so
> there's already value in a 3.0 alpha, and the more time we give downstreams
> to integrate, the better.
>
> This opens up discussion about inclusion of other changes, but I'm hoping
> to freeze incompatible changes after maybe two alphas, do a beta (with no
> further incompat changes allowed), and then finally a 3.x GA. For those
> keeping track, that means a 3.x GA in about four months.
>
> I would also like to stress though that this is not intended to be a big
> bang release. For instance, it would be great if we could maintain wire
> compatibility between 2.x and 3.x, so rolling upgrades work. Keeping
> branch-2 and branch-3 similar also makes backports easier, since we're
> likely maintaining 2.x for a while yet.
>
> Please let me know any comments / concerns related to the above. If people
> are friendly to the idea, I'd like to cut a branch-3 and start working on
> the first alpha.
>
> Best,
> Andrew
>


Reviving HADOOP-7435: Making Jenkins pre-commit build work with branches

2015-03-04 Thread Vinod Kumar Vavilapalli
Hi all,

I'd like us to revive the effort at 
https://issues.apache.org/jira/browse/HADOOP-7435 to make precommit builds 
able to work with branches. Having Jenkins verify patches on branches is very 
useful even if there may be relaxed review oversight on the branch in question.

Unless there are objections, I'd request help from Giri, who has had a patch 
sitting there for more than a year. This may need us to collectively agree on 
some convention - the last comment says that the branch patch name should 
follow a certain format for this to work.

Thanks,
+Vinod


Re: Reviving HADOOP-7435: Making Jenkins pre-commit build work with branches

2015-03-04 Thread Karthik Kambatla
Thanks for reviving this on email, Vinod. Newer folks like me might not be
aware of this JIRA/effort.

This would be wonderful to have so (1) we know the status of release
branches (branch-2, etc.) and also (2) feature branches (YARN-2928).
Jonathan's or Matt's proposal for including branch name looks reasonable to
me.

If no one has any objections, I think we can continue on JIRA and get this
in.

On Wed, Mar 4, 2015 at 1:20 PM, Vinod Kumar Vavilapalli <
vino...@hortonworks.com> wrote:

> Hi all,
>
> I'd like us to revive the effort at
> https://issues.apache.org/jira/browse/HADOOP-7435 to make precommit
> builds being able to work with branches. Having the Jenkins verify patches
> on branches is very useful even if there may be relaxed review oversight on
> the said-branch.
>
> Unless there are objections, I'd request help from Giri who already has a
> patch sitting there for more than a year before. This may need us to
> collectively agree on some convention - the last comment says that the
> branch patch name should be in some format for this to work.
>
> Thanks,
> +Vinod
>



-- 
Karthik Kambatla
Software Engineer, Cloudera Inc.

http://five.sentenc.es


Re: Reviving HADOOP-7435: Making Jenkins pre-commit build work with branches

2015-03-04 Thread Sean Busbey
+1

If we can make things look like HBase support for precommit testing on
branches (HBASE-12944), that would make it easier for new and occasional
contributors who might end up working in other ecosystem projects. AFAICT,
Jonathan's proposal for branch names in patch names does this.



On Wed, Mar 4, 2015 at 3:41 PM, Karthik Kambatla  wrote:

> Thanks for reviving this on email, Vinod. Newer folks like me might not be
> aware of this JIRA/effort.
>
> This would be wonderful to have so (1) we know the status of release
> branches (branch-2, etc.) and also (2) feature branches (YARN-2928).
> Jonathan's or Matt's proposal for including branch name looks reasonable to
> me.
>
> If none has any objections, I think we can continue on JIRA and get this
> in.
>
> On Wed, Mar 4, 2015 at 1:20 PM, Vinod Kumar Vavilapalli <
> vino...@hortonworks.com> wrote:
>
> > Hi all,
> >
> > I'd like us to revive the effort at
> > https://issues.apache.org/jira/browse/HADOOP-7435 to make precommit
> > builds being able to work with branches. Having the Jenkins verify
> patches
> > on branches is very useful even if there may be relaxed review oversight
> on
> > the said-branch.
> >
> > Unless there are objections, I'd request help from Giri who already has a
> > patch sitting there for more than a year before. This may need us to
> > collectively agree on some convention - the last comment says that the
> > branch patch name should be in some format for this to work.
> >
> > Thanks,
> > +Vinod
> >
>
>
>
> --
> Karthik Kambatla
> Software Engineer, Cloudera Inc.
> 
> http://five.sentenc.es
>



-- 
Sean


[jira] [Reopened] (HADOOP-11668) start-dfs.sh and stop-dfs.sh no longer works in HA mode after --slaves shell option

2015-03-04 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer reopened HADOOP-11668:
---
  Assignee: Allen Wittenauer  (was: Vinayakumar B)

Re-opening.  The problem here isn't start/stop, it's *-daemons.sh, which are 
now broken.

> start-dfs.sh and stop-dfs.sh no longer works in HA mode after --slaves shell 
> option
> ---
>
> Key: HADOOP-11668
> URL: https://issues.apache.org/jira/browse/HADOOP-11668
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: scripts
>Reporter: Vinayakumar B
>Assignee: Allen Wittenauer
> Attachments: HADOOP-11668-01.patch
>
>
> After introduction of "--slaves" option for the scripts, start-dfs.sh and 
> stop-dfs.sh will no longer work in HA mode.
> This is because multiple hostnames are passed to '--hostnames' delimited by 
> spaces; the extra hostnames are treated as commands and the script fails.
> So, delimiting with a comma (,) instead of a space before passing to 
> hadoop-daemons.sh will solve the problem.
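The word-splitting problem and the proposed fix can be sketched in a few lines of shell (the hostnames and argument layout here are illustrative, not taken from the actual scripts):

```shell
# Illustrative only: example hostnames, not the real patch.
namenodes="nn1.example.com nn2.example.com"     # HA namenodes, space-delimited

# Space-delimited: the unquoted expansion is word-split, so everything after
# the first hostname becomes a separate argument that downstream scripts
# misinterpret as commands.
set -- --hostnames $namenodes start namenode
echo "args when space-delimited: $#"            # 5 -- the second host is a stray arg

# Comma-delimited: the whole list stays a single argument.
hostlist=$(echo "$namenodes" | tr ' ' ',')
set -- --hostnames "$hostlist" start namenode
echo "args when comma-delimited: $#"            # 4
echo "$hostlist"                                # nn1.example.com,nn2.example.com
```

The fix keeps the host list opaque to the shell until hadoop-daemons.sh splits it itself.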



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-11670) Fix IAM instance profile auth for s3a (broken in HADOOP-11446)

2015-03-04 Thread Adam Budde (JIRA)
Adam Budde created HADOOP-11670:
---

 Summary: Fix IAM instance profile auth for s3a (broken in 
HADOOP-11446)
 Key: HADOOP-11670
 URL: https://issues.apache.org/jira/browse/HADOOP-11670
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 2.6.0
Reporter: Adam Budde
 Fix For: 2.7.0


One big advantage provided by the s3a filesystem is the ability to use an IAM 
instance profile in order to authenticate when attempting to access an S3 
bucket from an EC2 instance. This eliminates the need to deploy AWS account 
credentials to the instance or to provide them to Hadoop via the 
fs.s3a.awsAccessKeyId and fs.s3a.awsSecretAccessKey params.

The patch submitted to resolve HADOOP-11446 breaks this behavior by using the 
S3Credentials class to read the value of these two params (this change is 
unrelated to resolving HADOOP-11446). 

S3AFileSystem.java, lines 161-170:
{code}
// Try to get our credentials or just connect anonymously
S3Credentials s3Credentials = new S3Credentials();
s3Credentials.initialize(name, conf);

AWSCredentialsProviderChain credentials = new AWSCredentialsProviderChain(
new BasicAWSCredentialsProvider(s3Credentials.getAccessKey(),
s3Credentials.getSecretAccessKey()),
new InstanceProfileCredentialsProvider(),
new AnonymousAWSCredentialsProvider()
);
{code}

As you can see, the getAccessKey() and getSecretAccessKey() methods from the 
S3Credentials class are now used to provide constructor arguments to 
BasicAWSCredentialsProvider. These methods will raise an exception if the 
fs.s3a.awsAccessKeyId or fs.s3a.awsSecretAccessKey params are missing, 
respectively. If a user is relying on an IAM instance profile to authenticate 
to an S3 bucket and therefore doesn't supply values for these params, they will 
receive an exception and won't be able to access the bucket.
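The fix being asked for amounts to only adding the key/secret provider to the chain when both params are actually configured, letting the chain fall through to the instance profile otherwise. Below is a minimal, self-contained sketch of that logic using stand-in classes; it is not the actual AWS SDK or S3AFileSystem API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class CredsChainSketch {
    // Hypothetical stand-in for an AWS credentials provider: returns null when unavailable.
    interface Provider { String creds(); }

    // Sketch of the fix: include the basic (key/secret) provider only when both
    // fs.s3a.* params are present, instead of throwing when they are missing.
    static String resolve(Properties conf) {
        List<Provider> chain = new ArrayList<>();
        String key = conf.getProperty("fs.s3a.awsAccessKeyId");
        String secret = conf.getProperty("fs.s3a.awsSecretAccessKey");
        if (key != null && secret != null) {
            chain.add(() -> "basic:" + key);     // explicit credentials, when configured
        }
        chain.add(() -> "instance-profile");     // IAM instance profile fallback
        chain.add(() -> "anonymous");            // anonymous access as a last resort
        for (Provider p : chain) {
            String c = p.creds();
            if (c != null) return c;
        }
        return "none";
    }

    public static void main(String[] args) {
        // No fs.s3a.* keys set, as on an EC2 instance relying on its IAM instance profile.
        System.out.println(resolve(new Properties())); // falls through to instance-profile
    }
}
```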



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


RE: 2.7 status

2015-03-04 Thread Zheng, Kai
Thanks Vinod for the hints. 

I have updated both patches to align with the latest code, and added more unit 
tests. The build results look reasonable. Thanks to anyone who can give them 
further review; I will update them in a timely manner. 

Regards,
Kai

-Original Message-
From: Vinod Kumar Vavilapalli [mailto:vino...@hortonworks.com] 
Sent: Tuesday, March 03, 2015 11:31 AM
To: Zheng, Kai
Cc: mapreduce-...@hadoop.apache.org; hdfs-...@hadoop.apache.org; Hadoop Common; 
yarn-...@hadoop.apache.org
Subject: Re: 2.7 status

Kai, please ping the reviewers that were already looking at your patches 
before. If the patches go in by end of this week, we can include them.

Thanks,
+Vinod

On Mar 2, 2015, at 7:04 PM, Zheng, Kai  wrote:

> Is it interested to get the following issues in the release ? Thanks !
> 
> HADOOP-10670
> HADOOP-10671
> 
> Regards,
> Kai
> 
> -Original Message-
> From: Yongjun Zhang [mailto:yzh...@cloudera.com]
> Sent: Monday, March 02, 2015 4:46 AM
> To: hdfs-...@hadoop.apache.org
> Cc: Vinod Kumar Vavilapalli; Hadoop Common; 
> mapreduce-...@hadoop.apache.org; yarn-...@hadoop.apache.org
> Subject: Re: 2.7 status
> 
> Hi,
> 
> Thanks for working on 2.7 release.
> 
> Currently the fallback from KerberosAuthenticator to PseudoAuthenticator is 
> enabled by default in a hardcoded way. HADOOP-10895 changes the default and 
> requires applications (such as Oozie) to set a config property or call an API 
> to enable the fallback.
> 
> This jira has been reviewed, and is "almost" ready to get in. However, there is 
> a concern that we have to change the relevant applications. Please see my 
> comment here:
> 
> https://issues.apache.org/jira/browse/HADOOP-10895?focusedCommentId=14
> 321823&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-ta
> bpanel#comment-14321823
> 
> Any of your comments will be highly appreciated. This jira was postponed from 
> 2.6. I think it should be no problem to skip 2.7. But your comments would 
> help us to decide what to do with this jira for future releases.
> 
> Thanks.
> 
> --Yongjun
> 
> 
> On Sun, Mar 1, 2015 at 11:58 AM, Arun Murthy  wrote:
> 
>> Sounds good, thanks for the help Vinod!
>> 
>> Arun
>> 
>> 
>> From: Vinod Kumar Vavilapalli
>> Sent: Sunday, March 01, 2015 11:43 AM
>> To: Hadoop Common; Jason Lowe; Arun Murthy
>> Subject: Re: 2.7 status
>> 
>> Agreed. How about we roll an RC end of this week? As a Java 7+ 
>> release with features, patches that already got in?
>> 
>> Here's a filter tracking blocker tickets - 
>> https://issues.apache.org/jira/issues/?filter=12330598. Nine open now.
>> 
>> +Arun
>> Arun, I'd like to help get 2.7 out without further delay. Do you mind 
>> me taking over release duties?
>> 
>> Thanks,
>> +Vinod
>> 
>> From: Jason Lowe 
>> Sent: Friday, February 13, 2015 8:11 AM
>> To: common-dev@hadoop.apache.org
>> Subject: Re: 2.7 status
>> 
>> I'd like to see a 2.7 release sooner than later.  It has been almost 
>> 3 months since Hadoop 2.6 was released, and there have already been 
>> 634 JIRAs committed to 2.7.  That's a lot of changes waiting for an official 
>> release.
>> 
>> https://issues.apache.org/jira/issues/?jql=project%20in%20%28hadoop%2
>> C 
>> hdfs%2Cyarn%2Cmapreduce%29%20AND%20fixversion%3D2.7.0%20AND%20resolut
>> i
>> on%3DFixed
>> Jason
>> 
>>  From: Sangjin Lee 
>> To: "common-dev@hadoop.apache.org" 
>> Sent: Tuesday, February 10, 2015 1:30 PM
>> Subject: 2.7 status
>> 
>> Folks,
>> 
>> What is the current status of the 2.7 release? I know initially it 
>> started out as a "java-7" only release, but looking at the JIRAs that 
>> is very much not the case.
>> 
>> Do we have a certain timeframe for 2.7 or is it time to discuss it?
>> 
>> Thanks,
>> Sangjin
>> 
>> 



Re: Reviving HADOOP-7435: Making Jenkins pre-commit build work with branches

2015-03-04 Thread Zhijie Shen
+1. It's really helpful for branch development. To continue Karthik's
point, would it be good to make pre-commit testing against branch-2 the
default too, like it is for trunk?

On 3/4/15, 1:47 PM, "Sean Busbey"  wrote:

>+1
>
>If we can make things look like HBase support for precommit testing on
>branches (HBASE-12944), that would make it easier for new and occasional
>contributors who might end up working in other ecosystem projects. AFAICT,
>Jonathan's proposal for branch names in patch names does this.
>
>
>
>On Wed, Mar 4, 2015 at 3:41 PM, Karthik Kambatla 
>wrote:
>
>> Thanks for reviving this on email, Vinod. Newer folks like me might not
>>be
>> aware of this JIRA/effort.
>>
>> This would be wonderful to have so (1) we know the status of release
>> branches (branch-2, etc.) and also (2) feature branches (YARN-2928).
>> Jonathan's or Matt's proposal for including branch name looks
>>reasonable to
>> me.
>>
>> If none has any objections, I think we can continue on JIRA and get this
>> in.
>>
>> On Wed, Mar 4, 2015 at 1:20 PM, Vinod Kumar Vavilapalli <
>> vino...@hortonworks.com> wrote:
>>
>> > Hi all,
>> >
>> > I'd like us to revive the effort at
>> > https://issues.apache.org/jira/browse/HADOOP-7435 to make precommit
>> > builds being able to work with branches. Having the Jenkins verify
>> patches
>> > on branches is very useful even if there may be relaxed review
>>oversight
>> on
>> > the said-branch.
>> >
>> > Unless there are objections, I'd request help from Giri who already
>>has a
>> > patch sitting there for more than a year before. This may need us to
>> > collectively agree on some convention - the last comment says that the
>> > branch patch name should be in some format for this to work.
>> >
>> > Thanks,
>> > +Vinod
>> >
>>
>>
>>
>> --
>> Karthik Kambatla
>> Software Engineer, Cloudera Inc.
>> 
>> http://five.sentenc.es
>>
>
>
>
>-- 
>Sean



RE: Looking to a Hadoop 3 release

2015-03-04 Thread Zheng, Kai
May I offer some comments on this, just to share my thoughts. Thanks.

>> If we start now, it might make it out by 2016. If we start now, 
>> downstreamers can start aligning themselves to land versions that suit at 
>> about the same time.
Not only for downstreamers to align with the long-term release, but perhaps 
also for contributors like me to align our future efforts.

In addition to JDK8 support and classpath isolation, might we add more 
candidates for consideration? 
How about this one: HADOOP-9797, a pluggable and compatible UGI change?
https://issues.apache.org/jira/browse/HADOOP-9797

The benefits: 
1) allow multiple login sessions/contexts and authentication methods to be used 
in the same Java application/process without conflicts, providing good 
isolation by getting rid of globals and statics.
2) allow new authentication methods to be plugged into UGI in a modular, 
manageable, and maintainable manner.

Additionally, we would push out the first release of Apache Kerby, providing a 
strong, dedicated, and clean Kerberos library in Java for both the client and 
KDC sides; by leveraging that library, we could update Hadoop-MiniKDC and 
perform more security tests.
https://issues.apache.org/jira/browse/DIRKRB-102

Hope this makes sense. Thanks.

Regards,
Kai

-Original Message-
From: saint@gmail.com [mailto:saint@gmail.com] On Behalf Of Stack
Sent: Thursday, March 05, 2015 2:47 AM
To: common-dev@hadoop.apache.org
Cc: mapreduce-...@hadoop.apache.org; hdfs-...@hadoop.apache.org; 
yarn-...@hadoop.apache.org
Subject: Re: Looking to a Hadoop 3 release

In general +1 on 3.0.0. Its time. If we start now, it might make it out by 
2016. If we start now, downstreamers can start aligning themselves to land 
versions that suit at about the same time.

While two big items have been called out as possible incompatible changes, and 
there is ongoing discussion as to whether they are or not*, is there any chance 
of getting a longer list of big differences between the branches? In particular 
I'd be interested in improvements that are 'off' by default that would be 
better defaulted 'on'.

Thanks,
St.Ack

* Let me note that 'compatible' around these parts is a trampled concept 
seemingly open to interpretation with a definition that is other than prevails 
elsewhere in software. See Allen's list above, and in our downstream project, 
the recent HBASE-13149 "HBase server MR tools are broken on Hadoop 2.5+ Yarn", 
among others.  Let 3.x be incompatible with 2.x if only so we can leave behind 
all current notions of 'compatibility'
and just start over (as per Allen).


On Mon, Mar 2, 2015 at 3:19 PM, Andrew Wang 
wrote:

> Hi devs,
>
> It's been a year and a half since 2.x went GA, and I think we're about 
> due for a 3.x release.
> Notably, there are two incompatible changes I'd like to call out, that 
> will have a tremendous positive impact for our users.
>
> First, classpath isolation being done at HADOOP-11656, which has been 
> a long-standing request from many downstreams and Hadoop users.
>
> Second, bumping the source and target JDK version to JDK8 (related to 
> HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two 
> months from now). In the past, we've had issues with our dependencies 
> discontinuing support for old JDKs, so this will future-proof us.
>
> Between the two, we'll also have quite an opportunity to clean up and 
> upgrade our dependencies, another common user and developer request.
>
> I'd like to propose that we start rolling a series of monthly-ish 
> series of
> 3.0 alpha releases ASAP, with myself volunteering to take on the RM 
> and other cat herding responsibilities. There are already quite a few 
> changes slated for 3.0 besides the above (for instance the shell 
> script rewrite) so there's already value in a 3.0 alpha, and the more 
> time we give downstreams to integrate, the better.
>
> This opens up discussion about inclusion of other changes, but I'm 
> hoping to freeze incompatible changes after maybe two alphas, do a 
> beta (with no further incompat changes allowed), and then finally a 
> 3.x GA. For those keeping track, that means a 3.x GA in about four months.
>
> I would also like to stress though that this is not intended to be a 
> big bang release. For instance, it would be great if we could maintain 
> wire compatibility between 2.x and 3.x, so rolling upgrades work. 
> Keeping
> branch-2 and branch-3 similar also makes backports easier, since we're 
> likely maintaining 2.x for a while yet.
>
> Please let me know any comments / concerns related to the above. If 
> people are friendly to the idea, I'd like to cut a branch-3 and start 
> working on the first alpha.
>
> Best,
> Andrew
>


Re: Looking to a Hadoop 3 release

2015-03-04 Thread Karthik Kambatla
On Wed, Mar 4, 2015 at 10:46 AM, Stack  wrote:

> In general +1 on 3.0.0. Its time. If we start now, it might make it out by
> 2016. If we start now, downstreamers can start aligning themselves to land
> versions that suit at about the same time.
>
> While two big items have been called out as possible incompatible changes,
> and there is ongoing discussion as to whether they are or not*, is there
> any chance of getting a longer list of big differences between the
> branches? In particular I'd be interested in improvements that are 'off' by
> default that would be better defaulted 'on'.
>
> Thanks,
> St.Ack
>
> * Let me note that 'compatible' around these parts is a trampled concept
> seemingly open to interpretation with a definition that is other than
> prevails elsewhere in software. See Allen's list above, and in our
> downstream project, the recent HBASE-13149 "HBase server MR tools are
> broken on Hadoop 2.5+ Yarn", among others.  Let 3.x be incompatible with
> 2.x if only so we can leave behind all current notions of 'compatibility'
> and just start over (as per Allen).
>

Unfortunately, our compatibility policies are rather loose and allow for
changes that break downstream projects. Fixing the classpath issues would let
us tighten our policies and bring our "compatibility store" more in line with
the general expectations.




>
>
> On Mon, Mar 2, 2015 at 3:19 PM, Andrew Wang 
> wrote:
>
> > Hi devs,
> >
> > It's been a year and a half since 2.x went GA, and I think we're about
> due
> > for a 3.x release.
> > Notably, there are two incompatible changes I'd like to call out, that
> will
> > have a tremendous positive impact for our users.
> >
> > First, classpath isolation being done at HADOOP-11656, which has been a
> > long-standing request from many downstreams and Hadoop users.
> >
> > Second, bumping the source and target JDK version to JDK8 (related to
> > HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two
> > months from now). In the past, we've had issues with our dependencies
> > discontinuing support for old JDKs, so this will future-proof us.
> >
> > Between the two, we'll also have quite an opportunity to clean up and
> > upgrade our dependencies, another common user and developer request.
> >
> > I'd like to propose that we start rolling a series of monthly-ish series
> of
> > 3.0 alpha releases ASAP, with myself volunteering to take on the RM and
> > other cat herding responsibilities. There are already quite a few changes
> > slated for 3.0 besides the above (for instance the shell script rewrite)
> so
> > there's already value in a 3.0 alpha, and the more time we give
> downstreams
> > to integrate, the better.
> >
> > This opens up discussion about inclusion of other changes, but I'm hoping
> > to freeze incompatible changes after maybe two alphas, do a beta (with no
> > further incompat changes allowed), and then finally a 3.x GA. For those
> > keeping track, that means a 3.x GA in about four months.
> >
> > I would also like to stress though that this is not intended to be a big
> > bang release. For instance, it would be great if we could maintain wire
> > compatibility between 2.x and 3.x, so rolling upgrades work. Keeping
> > branch-2 and branch-3 similar also makes backports easier, since we're
> > likely maintaining 2.x for a while yet.
> >
> > Please let me know any comments / concerns related to the above. If
> people
> > are friendly to the idea, I'd like to cut a branch-3 and start working on
> > the first alpha.
> >
> > Best,
> > Andrew
> >
>



-- 
Karthik Kambatla
Software Engineer, Cloudera Inc.

http://five.sentenc.es


Re: timsort bug in the JDK

2015-03-04 Thread Colin P. McCabe
Tsuyoshi Ozawa sent out an email to the common-dev list about this
recently.  It seems like the bug only bites when the number of
elements is larger than 67108864, which may limit its impact (to state
it mildly).  Also, the flawed sorting algorithm is not used on arrays
of primitives, just on arrays of Objects.  We should probably file a
JIRA to track this, though, just in case there is an impact.  And
maybe look at some of the uses of sort() in the code.

best,
Colin


On Tue, Mar 3, 2015 at 8:56 AM, Steve Loughran  wrote:
> One other late-breaking issue may be "what to do about the fact that Java 7 & 
> 8 have a broken sort algorithm?", which has surfaced 
> recently.
>
> I believe some other OSS projects have tried to address this.
>
> Looking at LUCENE-6293, they weren’t clear whether it was worth the effort 
> for a problem that didn’t corrupt their data. I’m fairly tempted to argue the 
> same for doing something for 2.7, especially as a switch throughout the code 
> base could be expensive. Except: what if Oracle don’t ship a patch for JDK7?
>
> -Steve


[jira] [Created] (HADOOP-11672) test

2015-03-04 Thread xiangqian.xu (JIRA)
xiangqian.xu created HADOOP-11672:
-

 Summary: test
 Key: HADOOP-11672
 URL: https://issues.apache.org/jira/browse/HADOOP-11672
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: xiangqian.xu






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-11673) Use org.junit.Assume to skip tests instead of return

2015-03-04 Thread Akira AJISAKA (JIRA)
Akira AJISAKA created HADOOP-11673:
--

 Summary: Use org.junit.Assume to skip tests instead of return
 Key: HADOOP-11673
 URL: https://issues.apache.org/jira/browse/HADOOP-11673
 Project: Hadoop Common
  Issue Type: Improvement
  Components: test
Reporter: Akira AJISAKA
Priority: Minor


We see the following code many times:
{code:title=TestCodec.java}
if (!ZlibFactory.isNativeZlibLoaded(conf)) {
  LOG.warn("skipped: native libs not loaded");
  return;
}
{code}
If {{ZlibFactory.isNativeZlibLoaded(conf)}} is false, the test will *pass*, 
with a warn log. I'd like to *skip* this test case by using 
{{org.junit.Assume}}.
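For illustration, here is a minimal, self-contained sketch of the skip-vs-pass semantics; the real change would simply call `org.junit.Assume.assumeTrue(ZlibFactory.isNativeZlibLoaded(conf))`, which throws an `AssumptionViolatedException` that JUnit reports as a skipped test rather than a pass. The classes below are hand-rolled stand-ins, not JUnit itself.

```java
public class AssumeSketch {
    // Stand-in for JUnit's AssumptionViolatedException, just to show
    // the skip-vs-pass distinction described above.
    static class AssumptionViolated extends RuntimeException {}

    static void assumeTrue(boolean condition) {
        if (!condition) throw new AssumptionViolated();
    }

    // Returns "skipped" or "ran", the way a JUnit runner would report the test.
    static String runTest(boolean nativeZlibLoaded) {
        try {
            assumeTrue(nativeZlibLoaded); // replaces the LOG.warn(...); return; pattern
            return "ran";                 // test body would execute here
        } catch (AssumptionViolated e) {
            return "skipped";             // reported as skipped, not as a silent pass
        }
    }

    public static void main(String[] args) {
        System.out.println(runTest(false)); // skipped
        System.out.println(runTest(true));  // ran
    }
}
```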



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Looking to a Hadoop 3 release

2015-03-04 Thread Allen Wittenauer

One of the questions that keeps popping up is “what exactly is in trunk?”

As some may recall, I had done some experiments creating the change log based 
upon JIRA.  While the interest level appeared to be approaching zero, I kept 
playing with it a bit and eventually also started playing with the release 
notes script (for various reasons I won’t bore you with.)

In any case, I’ve started posting the results of these runs on one of my github 
repos if anyone was wanting a quick reference as to JIRA’s opinion on the 
matter:

https://github.com/aw-altiscale/hadoop-release-metadata/tree/master/3.0.0




[jira] [Created] (HADOOP-11674) data corruption for parallel CryptoInputStream and CryptoOutputStream

2015-03-04 Thread Sean Busbey (JIRA)
Sean Busbey created HADOOP-11674:


 Summary: data corruption for parallel CryptoInputStream and 
CryptoOutputStream
 Key: HADOOP-11674
 URL: https://issues.apache.org/jira/browse/HADOOP-11674
 Project: Hadoop Common
  Issue Type: Bug
  Components: io
Affects Versions: 2.6.0
Reporter: Sean Busbey
Assignee: Sean Busbey
Priority: Critical


A common optimization in the io classes for Input/Output Streams is to save a 
single length-1 byte array to use in single byte read/write calls.

CryptoInputStream and CryptoOutputStream both attempt to follow this practice 
but mistakenly mark the array as static. That means that only a single instance 
of each can be present in a JVM safely.
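The static-versus-instance difference can be demonstrated with a small stand-alone sketch; class and method names here are illustrative, not the actual CryptoInputStream/CryptoOutputStream members.

```java
public class SharedBufferSketch {
    // Buggy pattern: the scratch buffer is static, so every instance shares it.
    static class BadStream {
        private static final byte[] oneByte = new byte[1];
        void stash(int b) { oneByte[0] = (byte) b; }
        int recall()      { return oneByte[0] & 0xff; }
    }

    // Fixed pattern: each instance owns its own length-1 scratch buffer.
    static class GoodStream {
        private final byte[] oneByte = new byte[1];
        void stash(int b) { oneByte[0] = (byte) b; }
        int recall()      { return oneByte[0] & 0xff; }
    }

    public static void main(String[] args) {
        BadStream a = new BadStream();
        BadStream b = new BadStream();
        a.stash(1);
        b.stash(2);                     // clobbers a's byte via the shared static array
        System.out.println(a.recall()); // 2 -- corruption

        GoodStream c = new GoodStream();
        GoodStream d = new GoodStream();
        c.stash(1);
        d.stash(2);
        System.out.println(c.recall()); // 1 -- isolated per instance
    }
}
```

With concurrent streams in one JVM the same interference happens nondeterministically mid-read/write, which is why it surfaces as data corruption.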



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HADOOP-11672) test

2015-03-04 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula resolved HADOOP-11672.
---
Resolution: Not a Problem

> test
> 
>
> Key: HADOOP-11672
> URL: https://issues.apache.org/jira/browse/HADOOP-11672
> Project: Hadoop Common
>  Issue Type: New Feature
>Reporter: xiangqian.xu
>






[jira] [Created] (HADOOP-11675) tiny exception log with checking storedBlock is null or not

2015-03-04 Thread Liang Xie (JIRA)
Liang Xie created HADOOP-11675:
--

 Summary: tiny exception log with checking storedBlock is null or 
not
 Key: HADOOP-11675
 URL: https://issues.apache.org/jira/browse/HADOOP-11675
 Project: Hadoop Common
  Issue Type: Improvement
Affects Versions: 2.6.0
Reporter: Liang Xie
Assignee: Liang Xie
Priority: Minor


Found this log on our production cluster:
{code}
2015-03-05,10:33:31,778 ERROR 
org.apache.hadoop.hbase.regionserver.compactions.CompactionRequest: Compaction 
failed 
regionName=xiaomi_device_info_test,ff,1425377429116.41437dc231fe370f1304104a75aad78f.,
 storeName=A, fileCount=7, fileSize=899.7 M (470.7 M, 259.7 M, 75.9 M, 24.4 M, 
24.8 M, 25.7 M, 18.6 M), priority=23, time=44765894600479
java.io.IOException: 
BP-1356983882-10.2.201.14-1359086191297:blk_1211511211_1100144235504 does not 
exist or is not under Constructionnull
{code}

Let's check whether storedBlock is null before building the message, so the log reads cleanly (a null reference is currently concatenated into the text, yielding "...under Constructionnull").
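A sketch of the suggested tidy-up (method and variable names are illustrative, not the actual HDFS code): branch on whether the looked-up block is null instead of unconditionally concatenating it, which is what produces the glued-together "Constructionnull" text above.

```java
// Sketch only: null-safe construction of the exception/log message.
class BlockLookupLogSketch {
    static String describe(String requestedBlockId, Object storedBlock) {
        if (storedBlock == null) {
            // No block found at all: say so, without appending "null".
            return requestedBlockId + " does not exist";
        }
        // Block exists but is in the wrong state: include its details.
        return requestedBlockId + " is not under construction: " + storedBlock;
    }

    public static void main(String[] args) {
        // With a null storedBlock the message no longer ends in "null".
        System.out.println(describe("blk_1211511211_1100144235504", null));
    }
}
```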





[jira] [Resolved] (HADOOP-11643) Define EC schema API for ErasureCodec

2015-03-04 Thread Kai Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Zheng resolved HADOOP-11643.

  Resolution: Fixed
Target Version/s: HDFS-7285
Hadoop Flags: Reviewed

> Define EC schema API for ErasureCodec
> -
>
> Key: HADOOP-11643
> URL: https://issues.apache.org/jira/browse/HADOOP-11643
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: io
>Reporter: Kai Zheng
>Assignee: Kai Zheng
> Fix For: HDFS-7285
>
> Attachments: HADOOP-11643-v3.patch, HADOOP-11643_v1.patch, 
> HADOOP-11643_v2.patch
>
>
> As part of {{ErasureCodec}} API to be defined in HDFS-7699, {{ECSchema}} API 
> will be first defined here for better sync among related issues.





RE: [RFE] Support MIT Kerberos localauth plugin API

2015-03-04 Thread Sunny Cheung
Sorry I was not clear enough about the problem. Let me explain more here.

Our problem is that normal users' principal names can be very different from 
their Unix logins. Some customers simply have an arbitrary mapping between their 
Kerberos principals and Unix user accounts. For example, one customer has over 
200K users on AD with Kerberos principals in the format "<first>.<last>@REALM" 
(e.g. john@example.com), but their Unix names are in the format "user<id>" or 
just "<id>" (e.g. user123456, 123456).

So, when Kerberos security is enabled on Hadoop clusters, how should we 
configure to authenticate these users from Hadoop clients?

The current way is to use the hadoop.security.auth_to_local setting, e.g. from 
core-site.xml:


<property>
  <name>hadoop.security.auth_to_local</name>
  <value>
    RULE:[2:$1@$0]([jt]t@.*EXAMPLE.COM)s/.*/mapred/
    RULE:[2:$1@$0]([nd]n@.*EXAMPLE.COM)s/.*/hdfs/
    RULE:[2:$1@$0](hm@.*EXAMPLE.COM)s/.*/hbase/
    RULE:[2:$1@$0](rs@.*EXAMPLE.COM)s/.*/hbase/
    DEFAULT
  </value>
  <description>The mapping from kerberos principal names
  to local OS user names.</description>
</property>


These name translation rules can handle cases like mapping service accounts' 
principals (e.g. nn/<host>@REALM or dn/<host>@REALM to hdfs). But that approach 
does not scale for normal users: there are simply too many of them, compared 
with the small, fixed set of service accounts.
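To illustrate how one of the rules above fires, here is a deliberately simplified model of a single RULE:[2:$1@$0](regex)s/.*/name/ entry (an illustration of the semantics only, not Hadoop's actual implementation): the rule applies only to two-component principals, formats "$1@$0" ($1 = first component, $0 = realm), and substitutes when the regex fully matches.

```java
import java.util.regex.Pattern;

// Sketch only: evaluates one [2:$1@$0]-style auth_to_local rule.
class AuthToLocalRuleSketch {
    static String apply(String principal, String matchRegex, String replacement) {
        String[] atSplit = principal.split("@", 2);
        if (atSplit.length != 2) return null;
        String[] components = atSplit[0].split("/");
        if (components.length != 2) return null;      // rule fires only on 2-component names
        String shortName = components[0] + "@" + atSplit[1]; // [2:$1@$0] -> "$1@$0"
        if (!Pattern.matches(matchRegex, shortName)) return null;
        return shortName.replaceFirst(".*", replacement);    // s/.*/name/
    }

    public static void main(String[] args) {
        // nn/<host>@EXAMPLE.COM collapses to the hdfs service account.
        System.out.println(apply("nn/host1.example.com@EXAMPLE.COM",
                                 "[nd]n@.*EXAMPLE.COM", "hdfs")); // prints hdfs
    }
}
```

The sketch makes the scaling problem concrete: each rule encodes one regex pattern, so arbitrary per-user mappings would need an unbounded number of rules.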

Therefore, we would like to ask if alternative name resolution plugin interface 
can be supported by Hadoop. It could be similar to the way alternative 
authentication plugin is supported for HTTP web-consoles [1]:


<property>
  <name>hadoop.http.authentication.type</name>
  <value>org.my.subclass.of.AltKerberosAuthenticationHandler</value>
</property>


And the plugin interface can be as simple as this function (error handling 
ignored here):

String auth_to_local (String krb5Principal)
{
...
return unixName;
}

If this plugin interface is supported by Hadoop, then everyone can provide a 
plugin to support arbitrary mapping. This will be extremely useful when 
administrators need to tighten security on Hadoop with existing Kerberos 
infrastructure.
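A hypothetical sketch of what such a plugin could look like (the interface name and the backing table are assumptions for illustration; Hadoop does not define this API today). A site could back the lookup with LDAP/AD, SSSD, or a flat file instead of regex rules:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: a pluggable principal-to-Unix-name resolver.
interface PrincipalToUnixResolver {
    /** Return the Unix account for a Kerberos principal, or null if unknown. */
    String authToLocal(String krb5Principal);
}

class TableBackedResolver implements PrincipalToUnixResolver {
    private final Map<String, String> table = new HashMap<>();

    TableBackedResolver() {
        // Illustrative entry mirroring the arbitrary mapping described above;
        // in practice this would be loaded from a directory service or file.
        table.put("john@example.com", "user123456");
    }

    @Override
    public String authToLocal(String krb5Principal) {
        return table.get(krb5Principal);
    }

    public static void main(String[] args) {
        System.out.println(new TableBackedResolver()
                .authToLocal("john@example.com")); // prints user123456
    }
}
```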

References:
[1] Authentication for Hadoop HTTP web-consoles
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/HttpAuthentication.html


-Original Message-
From: Allen Wittenauer [mailto:a...@altiscale.com] 
Sent: Tuesday, February 24, 2015 12:47 AM
To: common-dev@hadoop.apache.org
Subject: Re: [RFE] Support MIT Kerberos localauth plugin API


	The big question is whether or not Java's implementation of Kerberos 
supports it, and if so, in which JDK release. Java's implementation tends to run 
a bit behind MIT's. Additionally, there is a general reluctance to move Hadoop's 
baseline Java version, even to a still-supported release, until user outcry 
demands it. So I'd expect support to be a long way off.

It's worth noting that trunk exposes the hadoop kerbname command to 
help out with auth_to_local mapping, BTW.

On Feb 23, 2015, at 2:12 AM, Sunny Cheung  wrote:

> Hi Hadoop Common developers,
> 
> I am writing to seek your opinion about a feature request: support MIT 
> Kerberos localauth plugin API [1].
> 
> Hadoop currently provides the hadoop.security.auth_to_local setting to map 
> Kerberos principal to OS user account [2][3]. However, the regex-based 
> mappings (which mimics krb5.conf auth_to_local) could be difficult to use in 
> complex scenarios. Therefore, MIT Kerberos 1.12 added a plugin interface to 
> control krb5_aname_to_localname and krb5_kuserok behavior. And system daemon 
> SSSD (RHEL/Fedora) has already implemented a plugin to leverage this feature 
> [4].
> 
> Is that possible for Hadoop to support a plugin API similar to localauth 
> (when Kerberos security is enabled)? Thanks.
> 
> References:
> [1] Local authorization interface (localauth) 
> http://web.mit.edu/kerberos/krb5-1.12/doc/plugindev/localauth.html
> [2] Hadoop in Secure Mode - Mapping from Kerberos principal to OS user 
> account 
> http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-commo
> n/SecureMode.html#Mapping_from_Kerberos_principal_to_OS_user_account
> [3] Need mapping from long principal names to local OS user names
> https://issues.apache.org/jira/browse/HADOOP-6526
> [4] Allow Kerberos Principals in getpwnam() calls 
> https://fedorahosted.org/sssd/wiki/DesignDocs/NSSWithKerberosPrincipal