Re: Looking to a Hadoop 3 release

2015-03-04 Thread Andrew Wang
Let's not dismiss this quite so handily.

Sean, Jason, and Stack replied on HADOOP-11656 pointing out that while we
could make classpath isolation opt-in via configuration, what we really
want longer term is to have it on by default (or just always on). Stack in
particular points out the practical difficulties in using an opt-in method
in 2.x from a downstream project perspective. It's not pretty.

The plan that both Sean and Jason propose (which I support) is to have an
opt-in solution in 2.x, bake it there, then turn it on by default
(incompatible) in a new major release. I think this lines up well with my
proposal of some alphas and betas leading up to a GA 3.x. I'm also willing
to help with 2.x release management if that would help with testing this
feature.

Even setting aside classpath isolation, a new major release is still
justified by JDK8. Somehow this is being ignored in the discussion. Allen,
historically the voice of the user in our community, just highlighted it as
a major compatibility issue, and Tucu and I have also expressed our
very strong concerns about bumping this in a minor release. 2.7's bump is a
unique exception, but this is not something to be cited as precedent or
policy.

Where does this resistance to a new major release stem from? As I've
described from the beginning, this will look basically like a 2.x release,
except for the inclusion of classpath isolation by default and target
version JDK8. I've expressed my desire to maintain API and wire
compatibility, and we can audit the set of incompatible changes in trunk to
ensure this. My proposal for doing alpha and beta releases leading up to GA
also gives downstreams a nice amount of time for testing and validation.

Regards,
Andrew

On Tue, Mar 3, 2015 at 2:32 PM, Arun Murthy a...@hortonworks.com wrote:

 Awesome, looks like we can just do this in a compatible manner - nothing
 else on the list seems like it warrants a (premature) major release.

 Thanks Vinod.

 Arun

 
 From: Vinod Kumar Vavilapalli vino...@hortonworks.com
 Sent: Tuesday, March 03, 2015 2:30 PM
 To: common-dev@hadoop.apache.org
 Cc: hdfs-...@hadoop.apache.org; mapreduce-...@hadoop.apache.org;
 yarn-...@hadoop.apache.org
 Subject: Re: Looking to a Hadoop 3 release

 I started pitching in more on that JIRA.

 To add, I think we can and should strive for doing this in a compatible
 manner, whatever the approach. Marking and calling it incompatible before
 we see a proposal/patch seems premature to me. Commented the same on JIRA:
 https://issues.apache.org/jira/browse/HADOOP-11656?focusedCommentId=14345875&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14345875
 .

 Thanks
 +Vinod

 On Mar 2, 2015, at 8:08 PM, Andrew Wang andrew.w...@cloudera.com wrote:

 Regarding classpath isolation, based on what I hear from our customers,
 it's still a big problem (even after the MR classloader work). The latest
 Jackson version bump was quite painful for our downstream projects, and the
 HDFS client still leaks a lot of dependencies. I'd welcome more
 discussion of this on HADOOP-11656; Steve, Colin, and Haohui have already
 chimed in.




Re: Reviving HADOOP-7435: Making Jenkins pre-commit build work with branches

2015-03-04 Thread Karthik Kambatla
Thanks for reviving this on email, Vinod. Newer folks like me might not be
aware of this JIRA/effort.

This would be wonderful to have so (1) we know the status of release
branches (branch-2, etc.) and also (2) feature branches (YARN-2928).
Jonathan's or Matt's proposal for including branch name looks reasonable to
me.

If no one has any objections, I think we can continue on JIRA and get this
in.

On Wed, Mar 4, 2015 at 1:20 PM, Vinod Kumar Vavilapalli 
vino...@hortonworks.com wrote:

 Hi all,

 I'd like us to revive the effort at
 https://issues.apache.org/jira/browse/HADOOP-7435 to make precommit
 builds work with branches. Having Jenkins verify patches
 on branches is very useful even if review oversight on that branch may
 be relaxed.

 Unless there are objections, I'd request help from Giri, who has had a
 patch sitting there for more than a year. This may need us to
 collectively agree on some convention - the last comment says that the
 branch patch name should be in some format for this to work.

 Thanks,
 +Vinod




-- 
Karthik Kambatla
Software Engineer, Cloudera Inc.

http://five.sentenc.es


Re: Reviving HADOOP-7435: Making Jenkins pre-commit build work with branches

2015-03-04 Thread Sean Busbey
+1

If we can make things look like HBase's support for precommit testing on
branches (HBASE-12944), that would make it easier for new and occasional
contributors who might end up working in other ecosystem projects. AFAICT,
Jonathan's proposal for branch names in patch names does this.



On Wed, Mar 4, 2015 at 3:41 PM, Karthik Kambatla ka...@cloudera.com wrote:

 Thanks for reviving this on email, Vinod. Newer folks like me might not be
 aware of this JIRA/effort.

 This would be wonderful to have so (1) we know the status of release
 branches (branch-2, etc.) and also (2) feature branches (YARN-2928).
 Jonathan's or Matt's proposal for including branch name looks reasonable to
 me.

 If no one has any objections, I think we can continue on JIRA and get this
 in.

 On Wed, Mar 4, 2015 at 1:20 PM, Vinod Kumar Vavilapalli 
 vino...@hortonworks.com wrote:

  Hi all,
 
  I'd like us to revive the effort at
  https://issues.apache.org/jira/browse/HADOOP-7435 to make precommit
  builds work with branches. Having Jenkins verify patches
  on branches is very useful even if review oversight on that branch
  may be relaxed.
 
  Unless there are objections, I'd request help from Giri, who has had a
  patch sitting there for more than a year. This may need us to
  collectively agree on some convention - the last comment says that the
  branch patch name should be in some format for this to work.
 
  Thanks,
  +Vinod
 







-- 
Sean


Reviving HADOOP-7435: Making Jenkins pre-commit build work with branches

2015-03-04 Thread Vinod Kumar Vavilapalli
Hi all,

I'd like us to revive the effort at 
https://issues.apache.org/jira/browse/HADOOP-7435 to make precommit builds 
work with branches. Having Jenkins verify patches on branches 
is very useful even if review oversight on that branch may be relaxed.

Unless there are objections, I'd request help from Giri, who has had a patch 
sitting there for more than a year. This may need us to collectively 
agree on some convention - the last comment says that the branch patch name 
should be in some format for this to work.
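The naming convention itself isn't spelled out in this thread. As a minimal sketch, assume a hypothetical scheme where ISSUE.patch targets trunk and ISSUE.BRANCH.patch targets the named branch; the class and method names below are illustrative only and are not part of the existing precommit tooling:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PatchBranchSketch {
    // Hypothetical convention: <ISSUE>.patch targets trunk,
    // <ISSUE>.<branch>.patch targets the named branch.
    private static final Pattern PATCH_NAME =
            Pattern.compile("^([A-Z][A-Z0-9]*-\\d+)(?:\\.(.+))?\\.patch$");

    /** Returns the target branch, or null if the name does not follow the convention. */
    static String targetBranch(String patchFileName) {
        Matcher m = PATCH_NAME.matcher(patchFileName);
        if (!m.matches()) {
            return null;
        }
        String branch = m.group(2);
        return branch == null ? "trunk" : branch;
    }

    public static void main(String[] args) {
        // Illustrative file names only.
        for (String name : new String[] {"HADOOP-7435.patch", "HADOOP-7435.branch-2.patch",
                "YARN-1234.YARN-2928.patch", "fix.diff"}) {
            System.out.println(name + " -> " + targetBranch(name));
        }
    }
}
```

A name that doesn't follow the scheme yields null, so a precommit job could decline to guess rather than test a patch against the wrong branch.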

Thanks,
+Vinod


Re: Looking to a Hadoop 3 release

2015-03-04 Thread Stack
In general +1 on 3.0.0. It's time. If we start now, it might make it out by
2016. If we start now, downstreamers can start aligning themselves to land
versions that suit at about the same time.

While two big items have been called out as possible incompatible changes,
and there is ongoing discussion as to whether they are or not*, is there
any chance of getting a longer list of big differences between the
branches? In particular I'd be interested in improvements that are 'off' by
default that would be better defaulted 'on'.

Thanks,
St.Ack

* Let me note that 'compatible' around these parts is a trampled concept
seemingly open to interpretation, with a definition other than the one that
prevails elsewhere in software. See Allen's list above, and in our
downstream project, the recent HBASE-13149, "HBase server MR tools are
broken on Hadoop 2.5+ Yarn", among others. Let 3.x be incompatible with
2.x if only so we can leave behind all current notions of 'compatibility'
and just start over (as per Allen).


On Mon, Mar 2, 2015 at 3:19 PM, Andrew Wang andrew.w...@cloudera.com
wrote:

 Hi devs,

 It's been a year and a half since 2.x went GA, and I think we're about due
 for a 3.x release.
 Notably, there are two incompatible changes I'd like to call out that will
 have a tremendous positive impact for our users.

 First, classpath isolation being done at HADOOP-11656, which has been a
 long-standing request from many downstreams and Hadoop users.

 Second, bumping the source and target JDK version to JDK8 (related to
 HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two
 months from now). In the past, we've had issues with our dependencies
 discontinuing support for old JDKs, so this will future-proof us.

 Between the two, we'll also have quite an opportunity to clean up and
 upgrade our dependencies, another common user and developer request.

 I'd like to propose that we start rolling a monthly-ish series of
 3.0 alpha releases ASAP, with myself volunteering to take on the RM and
 other cat herding responsibilities. There are already quite a few changes
 slated for 3.0 besides the above (for instance the shell script rewrite) so
 there's already value in a 3.0 alpha, and the more time we give downstreams
 to integrate, the better.

 This opens up discussion about inclusion of other changes, but I'm hoping
 to freeze incompatible changes after maybe two alphas, do a beta (with no
 further incompat changes allowed), and then finally a 3.x GA. For those
 keeping track, that means a 3.x GA in about four months.

 I would also like to stress though that this is not intended to be a big
 bang release. For instance, it would be great if we could maintain wire
 compatibility between 2.x and 3.x, so rolling upgrades work. Keeping
 branch-2 and branch-3 similar also makes backports easier, since we're
 likely maintaining 2.x for a while yet.

 Please let me know any comments / concerns related to the above. If people
 are friendly to the idea, I'd like to cut a branch-3 and start working on
 the first alpha.

 Best,
 Andrew



[jira] [Reopened] (HADOOP-11668) start-dfs.sh and stop-dfs.sh no longer works in HA mode after --slaves shell option

2015-03-04 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer reopened HADOOP-11668:
---
  Assignee: Allen Wittenauer  (was: Vinayakumar B)

Re-opening.  The problem here isn't start/stop, it's *-daemons.sh, which are 
now broken.

 start-dfs.sh and stop-dfs.sh no longer works in HA mode after --slaves shell 
 option
 ---

 Key: HADOOP-11668
 URL: https://issues.apache.org/jira/browse/HADOOP-11668
 Project: Hadoop Common
  Issue Type: Bug
  Components: scripts
Reporter: Vinayakumar B
Assignee: Allen Wittenauer
 Attachments: HADOOP-11668-01.patch


 After the introduction of the --slaves option for the scripts, start-dfs.sh and 
 stop-dfs.sh no longer work in HA mode.
 This is because multiple hostnames are passed to '--hostnames' delimited by 
 spaces, so every hostname after the first is treated as a command and the script fails.
 Delimiting the hostnames with commas (,) instead of spaces before passing them 
 to hadoop-daemons.sh solves the problem.
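To illustrate the failure mode, here is a toy Java model of unquoted shell word splitting. Real shell rules are richer, and the hostnames nn1/nn2 and the trailing "start namenode" arguments are made up for illustration. With spaces, only the first hostname reaches --hostnames and the next one becomes a separate argument; with commas, the list survives as a single argument:

```java
import java.util.Arrays;
import java.util.List;

final class HostnamesArgSketch {
    // Toy model of unquoted shell word splitting: any run of whitespace
    // separates arguments. Only illustrative; not the script's actual parsing.
    static List<String> wordSplit(String commandLine) {
        return Arrays.asList(commandLine.trim().split("\\s+"));
    }

    // Returns the single argument that follows --hostnames, or null if absent.
    static String hostnamesValue(List<String> args) {
        int i = args.indexOf("--hostnames");
        return (i >= 0 && i + 1 < args.size()) ? args.get(i + 1) : null;
    }

    public static void main(String[] args) {
        // Space-delimited: only nn1 reaches --hostnames; nn2 becomes the next
        // argument, which the script then tries to treat as a command.
        List<String> spaced = wordSplit("--hostnames nn1 nn2 start namenode");
        System.out.println(hostnamesValue(spaced) + " then " + spaced.get(2));

        // Comma-delimited: the whole list stays one argument and can be split later.
        List<String> commas = wordSplit("--hostnames nn1,nn2 start namenode");
        System.out.println(Arrays.toString(hostnamesValue(commas).split(",")));
    }
}
```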



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


RE: 2.7 status

2015-03-04 Thread Zheng, Kai
Thanks Vinod for the hints. 

I have updated both patches to align with the latest code and added more unit 
tests. The build results look reasonable. Thanks to anyone who can give them 
more review; I will update them in a timely manner. 

Regards,
Kai

-Original Message-
From: Vinod Kumar Vavilapalli [mailto:vino...@hortonworks.com] 
Sent: Tuesday, March 03, 2015 11:31 AM
To: Zheng, Kai
Cc: mapreduce-...@hadoop.apache.org; hdfs-...@hadoop.apache.org; Hadoop Common; 
yarn-...@hadoop.apache.org
Subject: Re: 2.7 status

Kai, please ping the reviewers that were already looking at your patches 
before. If the patches go in by end of this week, we can include them.

Thanks,
+Vinod

On Mar 2, 2015, at 7:04 PM, Zheng, Kai kai.zh...@intel.com wrote:

 Would there be interest in getting the following issues into the release? Thanks!
 
 HADOOP-10670
 HADOOP-10671
 
 Regards,
 Kai
 
 -Original Message-
 From: Yongjun Zhang [mailto:yzh...@cloudera.com]
 Sent: Monday, March 02, 2015 4:46 AM
 To: hdfs-...@hadoop.apache.org
 Cc: Vinod Kumar Vavilapalli; Hadoop Common; 
 mapreduce-...@hadoop.apache.org; yarn-...@hadoop.apache.org
 Subject: Re: 2.7 status
 
 Hi,
 
 Thanks for working on 2.7 release.
 
 Currently the fallback from KerberosAuthenticator to PseudoAuthenticator is 
 enabled by default in a hardcoded way. HADOOP-10895 changes the default and 
 requires applications (such as oozie) to set a config property or call an API 
 to enable the fallback.
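A rough Java sketch of the behavior change described above, using hypothetical classes (not the hadoop-auth KerberosAuthenticator/PseudoAuthenticator API, and not the actual HADOOP-10895 patch): before, the fallback was effectively always on; with the change, it happens only when the application opts in.

```java
final class AuthFallbackSketch {
    // Hypothetical interface; not the hadoop-auth Authenticator API.
    interface Authenticator {
        String authenticate(boolean serverOffersKerberos);
    }

    static final class PseudoAuth implements Authenticator {
        @Override
        public String authenticate(boolean serverOffersKerberos) {
            return "pseudo";
        }
    }

    static final class KerberosAuth implements Authenticator {
        private final boolean allowFallback;

        // Before the proposed change, think of allowFallback as hardcoded to true;
        // after it, the default is false and applications must opt in.
        KerberosAuth(boolean allowFallback) {
            this.allowFallback = allowFallback;
        }

        @Override
        public String authenticate(boolean serverOffersKerberos) {
            if (serverOffersKerberos) {
                return "kerberos";
            }
            if (allowFallback) {
                return new PseudoAuth().authenticate(false);
            }
            throw new IllegalStateException("server did not offer Kerberos and fallback is disabled");
        }
    }
}
```

The concern raised on the JIRA maps to the last branch: an application that silently relied on the fallback now gets an error until it opts in.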
 
 This jira has been reviewed and is almost ready to go in. However, there is 
 a concern that we would have to change the relevant applications. Please see my 
 comment here:
 
 https://issues.apache.org/jira/browse/HADOOP-10895?focusedCommentId=14321823&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14321823
 
 Any comments would be highly appreciated. This jira was postponed from 
 2.6. I think it should be no problem to skip 2.7. But your comments would 
 help us to decide what to do with this jira for future releases.
 
 Thanks.
 
 --Yongjun
 
 
 On Sun, Mar 1, 2015 at 11:58 AM, Arun Murthy a...@hortonworks.com wrote:
 
 Sounds good, thanks for the help Vinod!
 
 Arun
 
 
 From: Vinod Kumar Vavilapalli
 Sent: Sunday, March 01, 2015 11:43 AM
 To: Hadoop Common; Jason Lowe; Arun Murthy
 Subject: Re: 2.7 status
 
 Agreed. How about we roll an RC at the end of this week, as a Java 7+ 
 release with the features and patches that have already gone in?
 
 Here's a filter tracking blocker tickets - 
 https://issues.apache.org/jira/issues/?filter=12330598. Nine open now.
 
 +Arun
 Arun, I'd like to help get 2.7 out without further delay. Do you mind 
 me taking over release duties?
 
 Thanks,
 +Vinod
 
 From: Jason Lowe jl...@yahoo-inc.com.INVALID
 Sent: Friday, February 13, 2015 8:11 AM
 To: common-dev@hadoop.apache.org
 Subject: Re: 2.7 status
 
 I'd like to see a 2.7 release sooner rather than later. It has been almost 
 3 months since Hadoop 2.6 was released, and there have already been 
 634 JIRAs committed to 2.7. That's a lot of changes waiting for an official 
 release.
 
 https://issues.apache.org/jira/issues/?jql=project%20in%20%28hadoop%2Chdfs%2Cyarn%2Cmapreduce%29%20AND%20fixversion%3D2.7.0%20AND%20resolution%3DFixed
 Jason
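For anyone wanting to tweak that query, the link is just the JQL "project in (hadoop,hdfs,yarn,mapreduce) AND fixversion=2.7.0 AND resolution=Fixed", URL-encoded. A minimal sketch using java.net.URLEncoder; the helper name is made up for illustration:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class JqlUrlSketch {
    static final String SEARCH_BASE = "https://issues.apache.org/jira/issues/?jql=";

    // URLEncoder performs form encoding, which turns spaces into '+'. A literal '+'
    // in the query would already be escaped as %2B, so swapping '+' for %20 here
    // only affects spaces.
    static String searchUrl(String jql) {
        try {
            return SEARCH_BASE + URLEncoder.encode(jql, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(searchUrl(
            "project in (hadoop,hdfs,yarn,mapreduce) AND fixversion=2.7.0 AND resolution=Fixed"));
    }
}
```

Running main prints the same link as the one above.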
 
  From: Sangjin Lee sj...@apache.org
 To: common-dev@hadoop.apache.org common-dev@hadoop.apache.org
 Sent: Tuesday, February 10, 2015 1:30 PM
 Subject: 2.7 status
 
 Folks,
 
 What is the current status of the 2.7 release? I know it initially 
 started out as a Java 7-only release, but looking at the JIRAs, that 
 is very much not the case.
 
 Do we have a certain timeframe for 2.7 or is it time to discuss it?
 
 Thanks,
 Sangjin
 
 



[jira] [Created] (HADOOP-11670) Fix IAM instance profile auth for s3a (broken in HADOOP-11446)

2015-03-04 Thread Adam Budde (JIRA)
Adam Budde created HADOOP-11670:
---

 Summary: Fix IAM instance profile auth for s3a (broken in 
HADOOP-11446)
 Key: HADOOP-11670
 URL: https://issues.apache.org/jira/browse/HADOOP-11670
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 2.6.0
Reporter: Adam Budde
 Fix For: 2.7.0


One big advantage provided by the s3a filesystem is the ability to use an IAM 
instance profile in order to authenticate when attempting to access an S3 
bucket from an EC2 instance. This eliminates the need to deploy AWS account 
credentials to the instance or to provide them to Hadoop via the 
fs.s3a.awsAccessKeyId and fs.s3a.awsSecretAccessKey params.

The patch submitted to resolve HADOOP-11446 breaks this behavior by using the 
S3Credentials class to read the value of these two params (this change is 
unrelated to resolving HADOOP-11446). 

S3AFileSystem.java, lines 161-170:
{code}
// Try to get our credentials or just connect anonymously
S3Credentials s3Credentials = new S3Credentials();
s3Credentials.initialize(name, conf);

AWSCredentialsProviderChain credentials = new AWSCredentialsProviderChain(
    new BasicAWSCredentialsProvider(s3Credentials.getAccessKey(),
        s3Credentials.getSecretAccessKey()),
    new InstanceProfileCredentialsProvider(),
    new AnonymousAWSCredentialsProvider()
);
{code}

As you can see, the getAccessKey() and getSecretAccessKey() methods from the 
S3Credentials class are now used to provide constructor arguments to 
BasicAWSCredentialsProvider. These methods will raise an exception if the 
fs.s3a.awsAccessKeyId or fs.s3a.awsSecretAccessKey params are missing, 
respectively. If a user is relying on an IAM instance profile to authenticate 
to an S3 bucket and therefore doesn't supply values for these params, they will 
receive an exception and won't be able to access the bucket.
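The key point is that the exception fires while the constructor arguments are being evaluated, before the provider chain even exists, so the instance-profile provider never gets a chance. Below is a toy Java model of the intended behavior, with a hypothetical Provider interface (not the AWS SDK classes, and not necessarily what the eventual patch does): the config-based provider reads the keys lazily and reports "no credentials" when they are absent, so the chain falls through to the instance profile.

```java
import java.util.List;
import java.util.Map;

final class CredentialChainSketch {
    // Hypothetical stand-in for a credentials provider: returns a description of
    // the credentials it found, or null if it has none. Not the AWS SDK interface.
    interface Provider {
        String credentials();
    }

    // Reads the keys only when credentials are requested, and reports "none"
    // instead of throwing if they are missing, so the chain can move on.
    static Provider fromConfig(Map<String, String> conf) {
        return () -> {
            String id = conf.get("fs.s3a.awsAccessKeyId");
            String secret = conf.get("fs.s3a.awsSecretAccessKey");
            return (id != null && secret != null) ? "basic:" + id : null;
        };
    }

    // Models an EC2 instance that may or may not have an IAM instance profile.
    static Provider instanceProfile(boolean hasInstanceProfile) {
        return () -> hasInstanceProfile ? "instance-profile" : null;
    }

    static Provider anonymous() {
        return () -> "anonymous";
    }

    // The first provider that yields credentials wins, in list order.
    static String resolve(List<Provider> chain) {
        for (Provider p : chain) {
            String c = p.credentials();
            if (c != null) {
                return c;
            }
        }
        throw new IllegalStateException("no provider supplied credentials");
    }
}
```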





RE: Looking to a Hadoop 3 release

2015-03-04 Thread Zheng, Kai
I'd like to add a few comments, just sharing my thoughts. Thanks.

 If we start now, it might make it out by 2016. If we start now, 
 downstreamers can start aligning themselves to land versions that suit at 
 about the same time.
This would help not only downstreamers align with the long-term release, but 
maybe also contributors like me align our future efforts.

In addition to JDK8 support and classpath isolation, might we consider more 
candidates? 
How about HADOOP-9797, Pluggable and compatible UGI change?
https://issues.apache.org/jira/browse/HADOOP-9797

The benefits: 
1) Allow multiple login sessions/contexts and authentication methods to be used 
in the same Java application/process without conflicts, providing good 
isolation by getting rid of globals and statics.
2) Allow new authentication methods to be plugged into UGI in a modular, 
manageable, and maintainable manner.
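A hypothetical Java sketch of what benefits 1) and 2) could look like. AuthMethod, LoginSession, and named are illustrative names, not part of UGI or the HADOOP-9797 patch: authentication methods are registered on a per-session object instead of static globals, so two sessions in one process stay isolated.

```java
import java.util.HashMap;
import java.util.Map;

final class PluggableLoginSketch {
    // Hypothetical plug-in point for an authentication method; not an existing UGI API.
    interface AuthMethod {
        String name();

        // Returns an opaque session token for the principal.
        String login(String principal);
    }

    // A trivial method for illustration: the token is just "<name>:<principal>".
    static AuthMethod named(String methodName) {
        return new AuthMethod() {
            @Override
            public String name() {
                return methodName;
            }

            @Override
            public String login(String principal) {
                return methodName + ":" + principal;
            }
        };
    }

    // One instance per login session. Nothing here is static, so two sessions in
    // the same JVM cannot overwrite each other's user or registered methods.
    static final class LoginSession {
        private final Map<String, AuthMethod> methods = new HashMap<>();
        private String currentUser;

        void register(AuthMethod method) {
            methods.put(method.name(), method);
        }

        String loginWith(String methodName, String principal) {
            AuthMethod method = methods.get(methodName);
            if (method == null) {
                throw new IllegalArgumentException("unknown auth method: " + methodName);
            }
            String token = method.login(principal);
            currentUser = principal;
            return token;
        }

        String currentUser() {
            return currentUser;
        }
    }
}
```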

Another candidate: we are also pushing the first release of Apache Kerby, a 
strong, dedicated, and clean Kerberos library in Java for both the client and 
KDC sides. By leveraging the library, we could update Hadoop-MiniKDC and 
perform more security tests.
https://issues.apache.org/jira/browse/DIRKRB-102

Hope this makes sense. Thanks.

Regards,
Kai

-Original Message-
From: saint@gmail.com [mailto:saint@gmail.com] On Behalf Of Stack
Sent: Thursday, March 05, 2015 2:47 AM
To: common-dev@hadoop.apache.org
Cc: mapreduce-...@hadoop.apache.org; hdfs-...@hadoop.apache.org; 
yarn-...@hadoop.apache.org
Subject: Re: Looking to a Hadoop 3 release

In general +1 on 3.0.0. It's time. If we start now, it might make it out by 
2016. If we start now, downstreamers can start aligning themselves to land 
versions that suit at about the same time.

While two big items have been called out as possible incompatible changes, and 
there is ongoing discussion as to whether they are or not*, is there any chance 
of getting a longer list of big differences between the branches? In particular 
I'd be interested in improvements that are 'off' by default that would be 
better defaulted 'on'.

Thanks,
St.Ack

* Let me note that 'compatible' around these parts is a trampled concept 
seemingly open to interpretation, with a definition other than the one that 
prevails elsewhere in software. See Allen's list above, and in our downstream 
project, the recent HBASE-13149, "HBase server MR tools are broken on Hadoop 
2.5+ Yarn", among others. Let 3.x be incompatible with 2.x if only so we can 
leave behind all current notions of 'compatibility' and just start over (as 
per Allen).


On Mon, Mar 2, 2015 at 3:19 PM, Andrew Wang andrew.w...@cloudera.com
wrote:

 Hi devs,

 It's been a year and a half since 2.x went GA, and I think we're about 
 due for a 3.x release.
 Notably, there are two incompatible changes I'd like to call out that 
 will have a tremendous positive impact for our users.

 First, classpath isolation being done at HADOOP-11656, which has been 
 a long-standing request from many downstreams and Hadoop users.

 Second, bumping the source and target JDK version to JDK8 (related to 
 HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two 
 months from now). In the past, we've had issues with our dependencies 
 discontinuing support for old JDKs, so this will future-proof us.

 Between the two, we'll also have quite an opportunity to clean up and 
 upgrade our dependencies, another common user and developer request.

 I'd like to propose that we start rolling a monthly-ish series of
 3.0 alpha releases ASAP, with myself volunteering to take on the RM 
 and other cat herding responsibilities. There are already quite a few 
 changes slated for 3.0 besides the above (for instance the shell 
 script rewrite) so there's already value in a 3.0 alpha, and the more 
 time we give downstreams to integrate, the better.

 This opens up discussion about inclusion of other changes, but I'm 
 hoping to freeze incompatible changes after maybe two alphas, do a 
 beta (with no further incompat changes allowed), and then finally a 
 3.x GA. For those keeping track, that means a 3.x GA in about four months.

 I would also like to stress though that this is not intended to be a 
 big bang release. For instance, it would be great if we could maintain 
 wire compatibility between 2.x and 3.x, so rolling upgrades work. Keeping
 branch-2 and branch-3 similar also makes backports easier, since we're 
 likely maintaining 2.x for a while yet.

 Please let me know any comments / concerns related to the above. If 
 people are friendly to the idea, I'd like to cut a branch-3 and start 
 working on the first alpha.

 Best,
 Andrew



Re: timsort bug in the JDK

2015-03-04 Thread Colin P. McCabe
Tsuyoshi Ozawa sent out an email to the common-dev list about this
recently.  It seems like the bug only bites when the number of
elements is larger than 67108864, which may limit its impact (to state
it mildly).  Also, the flawed sorting algorithm is not used on arrays
of primitives, just on arrays of Objects.  We should probably file a
JIRA to track this, though, just in case there is an impact.  And
maybe look at some of the uses of sort() in the code.
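If someone does audit the sort() call sites, the rule of thumb above can be written down as a tiny helper. This is a sketch that assumes the 67108864 threshold quoted here (which equals 1 << 26); the class and method are hypothetical:

```java
final class TimSortRiskSketch {
    // Element count quoted in this thread: 67108864 == 1 << 26.
    static final int REPORTED_THRESHOLD = 1 << 26;

    // Object sorts (Arrays.sort(Object[]), Collections.sort) use TimSort in JDK 7/8;
    // primitive sorts such as Arrays.sort(int[]) use dual-pivot quicksort instead,
    // so they are never flagged here.
    static boolean mayHitTimSortBug(int elementCount, boolean primitiveArray) {
        return !primitiveArray && elementCount > REPORTED_THRESHOLD;
    }
}
```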

best,
Colin


On Tue, Mar 3, 2015 at 8:56 AM, Steve Loughran ste...@hortonworks.com wrote:
 One other late-breaking issue may be what to do about the fact that Java 7 and 
 8 have a broken sort algorithm, which surfaced 
 recently: http://envisage-project.eu/proving-android-java-and-python-sorting-algorithm-is-broken-and-how-to-fix-it/

 I believe some other OSS projects have tried to address this.

 Looking at LUCENE-6293, they weren’t clear whether it was worth the effort 
 for a problem that didn’t corrupt their data. I’m fairly tempted to argue the 
 same for doing something for 2.7, especially as a switch throughout the code 
 base could be expensive. Except: what if Oracle don’t ship a patch for JDK7?

 -Steve


[jira] [Created] (HADOOP-11673) Use org.junit.Assume to skip tests instead of return

2015-03-04 Thread Akira AJISAKA (JIRA)
Akira AJISAKA created HADOOP-11673:
--

 Summary: Use org.junit.Assume to skip tests instead of return
 Key: HADOOP-11673
 URL: https://issues.apache.org/jira/browse/HADOOP-11673
 Project: Hadoop Common
  Issue Type: Improvement
  Components: test
Reporter: Akira AJISAKA
Priority: Minor


We see the following code many times:
{code:title=TestCodec.java}
if (!ZlibFactory.isNativeZlibLoaded(conf)) {
  LOG.warn("skipped: native libs not loaded");
  return;
}
{code}
If {{ZlibFactory.isNativeZlibLoaded(conf)}} is false, the test will *pass*, 
with a warn log. I'd like to *skip* this test case by using 
{{org.junit.Assume}}.
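Why this matters: JUnit runners report a test whose assumption fails as skipped, whereas an early return is reported as passed. The toy Java model below mimics that classification with hypothetical stand-ins for org.junit.Assume and its exception, so it runs without JUnit on the classpath; the comment shows roughly what the real TestCodec change might look like:

```java
final class AssumeSketch {
    // Hypothetical stand-in for JUnit's AssumptionViolatedException.
    static final class AssumptionFailed extends RuntimeException {
        AssumptionFailed(String message) {
            super(message);
        }
    }

    // Hypothetical stand-in for org.junit.Assume.assumeTrue(boolean).
    // In the real test this would be, as a sketch:
    //   Assume.assumeTrue(ZlibFactory.isNativeZlibLoaded(conf));
    static void assumeTrue(boolean condition) {
        if (!condition) {
            throw new AssumptionFailed("assumption not met");
        }
    }

    enum Outcome { PASSED, FAILED, SKIPPED }

    // Minimal model of how a runner classifies a test body.
    static Outcome run(Runnable testBody) {
        try {
            testBody.run();
            return Outcome.PASSED;
        } catch (AssumptionFailed e) {
            return Outcome.SKIPPED;
        } catch (AssertionError e) {
            return Outcome.FAILED;
        }
    }
}
```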





Re: Reviving HADOOP-7435: Making Jenkins pre-commit build work with branches

2015-03-04 Thread Zhijie Shen
+1. It's really helpful for branch development. To continue Karthik's
point, would it be good to make pre-commit testing against branch-2 a default
too, like testing against trunk?

On 3/4/15, 1:47 PM, Sean Busbey bus...@cloudera.com wrote:

+1

If we can make things look like HBase's support for precommit testing on
branches (HBASE-12944), that would make it easier for new and occasional
contributors who might end up working in other ecosystem projects. AFAICT,
Jonathan's proposal for branch names in patch names does this.



On Wed, Mar 4, 2015 at 3:41 PM, Karthik Kambatla ka...@cloudera.com
wrote:

 Thanks for reviving this on email, Vinod. Newer folks like me might not be
 aware of this JIRA/effort.

 This would be wonderful to have so (1) we know the status of release
 branches (branch-2, etc.) and also (2) feature branches (YARN-2928).
 Jonathan's or Matt's proposal for including branch name looks reasonable to
 me.

 If no one has any objections, I think we can continue on JIRA and get this
 in.

 On Wed, Mar 4, 2015 at 1:20 PM, Vinod Kumar Vavilapalli 
 vino...@hortonworks.com wrote:

  Hi all,
 
  I'd like us to revive the effort at
  https://issues.apache.org/jira/browse/HADOOP-7435 to make precommit
  builds work with branches. Having Jenkins verify patches
  on branches is very useful even if review oversight on that branch
  may be relaxed.
 
  Unless there are objections, I'd request help from Giri, who has had a
  patch sitting there for more than a year. This may need us to
  collectively agree on some convention - the last comment says that the
  branch patch name should be in some format for this to work.
 
  Thanks,
  +Vinod
 










[jira] [Created] (HADOOP-11672) test

2015-03-04 Thread xiangqian.xu (JIRA)
xiangqian.xu created HADOOP-11672:
-

 Summary: test
 Key: HADOOP-11672
 URL: https://issues.apache.org/jira/browse/HADOOP-11672
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: xiangqian.xu








[jira] [Created] (HADOOP-11674) data corruption for parallel CryptoInputStream and CryptoOutputStream

2015-03-04 Thread Sean Busbey (JIRA)
Sean Busbey created HADOOP-11674:


 Summary: data corruption for parallel CryptoInputStream and 
CryptoOutputStream
 Key: HADOOP-11674
 URL: https://issues.apache.org/jira/browse/HADOOP-11674
 Project: Hadoop Common
  Issue Type: Bug
  Components: io
Affects Versions: 2.6.0
Reporter: Sean Busbey
Assignee: Sean Busbey
Priority: Critical


A common optimization in the io classes for Input/Output Streams is to save a 
single length-1 byte array to use in single byte read/write calls.

CryptoInputStream and CryptoOutputStream both attempt to follow this practice 
but mistakenly mark the array as static. That means only a single instance 
of each can safely be used in a JVM at a time.
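A minimal Java sketch of the per-instance pattern. The stream class is hypothetical and only models the single-byte read path, not CryptoInputStream's decryption. Each instance owns its one-byte array, so two streams read on two threads each get back their own data:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

final class OneByteBufSketch {
    // Hypothetical stream illustrating the pattern; not CryptoInputStream itself.
    static final class SingleByteReadStream extends FilterInputStream {
        // Per-instance scratch array for read(). Declaring it static would make every
        // instance in the JVM share one array, so concurrent single-byte reads on
        // different streams could overwrite each other's byte.
        private final byte[] oneByteBuf = new byte[1];

        SingleByteReadStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            int n = read(oneByteBuf, 0, 1);
            return n == -1 ? -1 : (oneByteBuf[0] & 0xff);
        }
    }

    // Drains a stream one byte at a time through read().
    static byte[] drainByteByByte(InputStream in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            int b;
            while ((b = in.read()) != -1) {
                out.write(b);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Reads two independent streams on two threads; true if each got back its own data.
    static boolean concurrentReadsIntact(int size) {
        byte[] ones = new byte[size];
        byte[] twos = new byte[size];
        Arrays.fill(ones, (byte) 1);
        Arrays.fill(twos, (byte) 2);
        byte[][] results = new byte[2][];
        Thread t1 = new Thread(() ->
            results[0] = drainByteByByte(new SingleByteReadStream(new ByteArrayInputStream(ones))));
        Thread t2 = new Thread(() ->
            results[1] = drainByteByByte(new SingleByteReadStream(new ByteArrayInputStream(twos))));
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return Arrays.equals(ones, results[0]) && Arrays.equals(twos, results[1]);
    }
}
```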





[jira] [Resolved] (HADOOP-11643) Define EC schema API for ErasureCodec

2015-03-04 Thread Kai Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Zheng resolved HADOOP-11643.

  Resolution: Fixed
Target Version/s: HDFS-7285
Hadoop Flags: Reviewed

 Define EC schema API for ErasureCodec
 -

 Key: HADOOP-11643
 URL: https://issues.apache.org/jira/browse/HADOOP-11643
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: io
Reporter: Kai Zheng
Assignee: Kai Zheng
 Fix For: HDFS-7285

 Attachments: HADOOP-11643-v3.patch, HADOOP-11643_v1.patch, 
 HADOOP-11643_v2.patch


 As part of the {{ErasureCodec}} API to be defined in HDFS-7699, the {{ECSchema}} API 
 will be defined here first, to keep the related issues in sync.





RE: [RFE] Support MIT Kerberos localauth plugin API

2015-03-04 Thread Sunny Cheung
Sorry I was not clear enough about the problem. Let me explain more here.

Our problem is that normal users' principal names can be very different from 
their Unix login names. Some customers simply have arbitrary mappings between their 
Kerberos principals and Unix user accounts. For example, one customer has over 
200K users on AD with Kerberos principals in the format <first name>.<last name>@REALM 
(e.g. john@example.com). But their Unix names are in the format 
user<ID> or just <ID> (e.g. user123456, 123456).

So, when Kerberos security is enabled on Hadoop clusters, how should we 
configure Hadoop to authenticate these users from Hadoop clients?

The current way is to use the hadoop.security.auth_to_local setting, e.g. from 
core-site.xml:

<property>
  <name>hadoop.security.auth_to_local</name>
  <value>
    RULE:[2:$1@$0]([jt]t@.*EXAMPLE.COM)s/.*/mapred/
    RULE:[2:$1@$0]([nd]n@.*EXAMPLE.COM)s/.*/hdfs/
    RULE:[2:$1@$0](hm@.*EXAMPLE.COM)s/.*/hbase/
    RULE:[2:$1@$0](rs@.*EXAMPLE.COM)s/.*/hbase/
    DEFAULT
  </value>
  <description>The mapping from kerberos principal names
    to local OS user names.</description>
</property>

These name translation rules can handle cases like mapping service accounts' 
principals (e.g. nn/host@REALM or dn/host@REALM to hdfs). But that approach does not 
scale to normal users: there are just too many of them to handle (compared 
to the finite number of service accounts).
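To make the scaling problem concrete, here is a simplified Java evaluation of a single RULE entry. It is a sketch, not Hadoop's KerberosName implementation: no g flag, no DEFAULT, and components $1..$9 only. For jt/host@EXAMPLE.COM, the format [2:$1@$0] yields jt@EXAMPLE.COM, which matches ([jt]t@.*EXAMPLE.COM) and is rewritten by s/.*/mapred/ to mapred. A one-component user principal never matches these rules, and arbitrary per-user mappings can't be expressed this way short of one rule per user:

```java
import java.util.regex.Pattern;

final class AuthToLocalRuleSketch {
    /**
     * Simplified evaluation of one RULE:[n:format](match)s/from/to/ entry.
     * Returns the local name, or null if the rule does not apply.
     */
    static String applyRule(String principal, int numComponents, String format,
                            String match, String sedFrom, String sedTo) {
        int at = principal.lastIndexOf('@');
        if (at < 0) {
            return null;
        }
        String realm = principal.substring(at + 1);
        String[] components = principal.substring(0, at).split("/");
        if (components.length != numComponents) {
            return null; // e.g. user principals have 1 component, these service rules expect 2
        }
        // Expand $0 (realm) and $1..$n (components) in the format string.
        String base = format.replace("$0", realm);
        for (int i = 0; i < components.length; i++) {
            base = base.replace("$" + (i + 1), components[i]);
        }
        // In this sketch the match pattern must match the whole expanded string.
        if (!Pattern.matches(match, base)) {
            return null;
        }
        // s/from/to/ without the g flag replaces only the first match.
        return base.replaceFirst(sedFrom, sedTo);
    }
}
```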

Therefore, we would like to ask whether an alternative name resolution plugin interface 
could be supported by Hadoop. It could be similar to the way an alternative 
authentication plugin is supported for HTTP web-consoles [1]:

<property>
  <name>hadoop.http.authentication.type</name>
  <value>org.my.subclass.of.AltKerberosAuthenticationHandler</value>
</property>

And the plugin interface can be as simple as this function (error handling 
ignored here):

String auth_to_local(String krb5Principal) {
    ...
    return unixName;
}
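A Java sketch of how such a plug-in could be shaped, assuming hypothetical names (PrincipalToLocalName, fromTable, and resolve are not Hadoop APIs), with a table-backed plug-in and a simplified fallback to the first principal component in place of the DEFAULT rule:

```java
import java.util.Map;

final class PrincipalMapperSketch {
    // Hypothetical plug-in interface mirroring the proposed auth_to_local function.
    // Not an existing Hadoop API.
    interface PrincipalToLocalName {
        // Returns the local Unix name, or null if this plug-in cannot map the principal.
        String authToLocal(String krb5Principal);
    }

    // Example plug-in backed by a lookup table (for instance, one built from a directory).
    static PrincipalToLocalName fromTable(Map<String, String> table) {
        return table::get;
    }

    // Tries the plug-in first; otherwise falls back to the first principal component,
    // a simplified stand-in for the DEFAULT rule.
    static String resolve(PrincipalToLocalName plugin, String principal) {
        String mapped = plugin.authToLocal(principal);
        return mapped != null ? mapped : principal.split("[/@]", 2)[0];
    }
}
```

Because the plug-in is just a function from principal to name, a site with an arbitrary mapping (such as the user123456 example above) only has to supply the lookup, not a regex per user.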

If this plugin interface is supported by Hadoop, then anyone can provide a 
plugin to support arbitrary mapping. This will be extremely useful when 
administrators need to tighten security on Hadoop with existing Kerberos 
infrastructure.

References:
[1] Authentication for Hadoop HTTP web-consoles
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/HttpAuthentication.html


-Original Message-
From: Allen Wittenauer [mailto:a...@altiscale.com] 
Sent: Tuesday, February 24, 2015 12:47 AM
To: common-dev@hadoop.apache.org
Subject: Re: [RFE] Support MIT Kerberos localauth plugin API


The big question is whether or not Java's implementation of Kerberos 
supports it, and if so, in which JDK release. Java's implementation tends to run a 
bit behind MIT. Additionally, there is a general reluctance to move Hadoop's 
baseline Java version to something even supported until user outcry demands it. 
So I'd expect support to be a long way off.

It's worth noting that trunk exposes the hadoop kerbname command to 
help out with auth_to_local mapping, BTW.

On Feb 23, 2015, at 2:12 AM, Sunny Cheung sunny.che...@centrify.com wrote:

 Hi Hadoop Common developers,
 
 I am writing to seek your opinion about a feature request: support MIT 
 Kerberos localauth plugin API [1].
 
 Hadoop currently provides the hadoop.security.auth_to_local setting to map 
 Kerberos principals to OS user accounts [2][3]. However, the regex-based 
 mappings (which mimic krb5.conf auth_to_local) can be difficult to use in 
 complex scenarios. Therefore, MIT Kerberos 1.12 added a plugin interface to 
 control krb5_aname_to_localname and krb5_kuserok behavior, and the SSSD system 
 daemon (RHEL/Fedora) has already implemented a plugin to leverage this feature 
 [4].
 
 Would it be possible for Hadoop to support a plugin API similar to localauth 
 (when Kerberos security is enabled)? Thanks.
 
 References:
 [1] Local authorization interface (localauth) 
 http://web.mit.edu/kerberos/krb5-1.12/doc/plugindev/localauth.html
 [2] Hadoop in Secure Mode - Mapping from Kerberos principal to OS user 
 account 
 http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SecureMode.html#Mapping_from_Kerberos_principal_to_OS_user_account
 [3] Need mapping from long principal names to local OS user names
 https://issues.apache.org/jira/browse/HADOOP-6526
 [4] Allow Kerberos Principals in getpwnam() calls 
 https://fedorahosted.org/sssd/wiki/DesignDocs/NSSWithKerberosPrincipal



Re: Looking to a Hadoop 3 release

2015-03-04 Thread Allen Wittenauer

One of the questions that keeps popping up is “what exactly is in trunk?”

As some may recall, I had done some experiments creating the change log based 
upon JIRA.  While the interest level appeared to be approaching zero, I kept 
playing with it a bit and eventually also started playing with the release 
notes script (for various reasons I won’t bore you with.)

In any case, I’ve started posting the results of these runs on one of my github 
repos if anyone was wanting a quick reference as to JIRA’s opinion on the 
matter:

https://github.com/aw-altiscale/hadoop-release-metadata/tree/master/3.0.0




[jira] [Resolved] (HADOOP-11672) test

2015-03-04 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula resolved HADOOP-11672.
---
Resolution: Not a Problem

 test
 

 Key: HADOOP-11672
 URL: https://issues.apache.org/jira/browse/HADOOP-11672
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: xiangqian.xu







[jira] [Created] (HADOOP-11675) tiny exception log with checking storedBlock is null or not

2015-03-04 Thread Liang Xie (JIRA)
Liang Xie created HADOOP-11675:
--

 Summary: tiny exception log with checking storedBlock is null or 
not
 Key: HADOOP-11675
 URL: https://issues.apache.org/jira/browse/HADOOP-11675
 Project: Hadoop Common
  Issue Type: Improvement
Affects Versions: 2.6.0
Reporter: Liang Xie
Assignee: Liang Xie
Priority: Minor


We found this log on our production cluster:
{code}
2015-03-05,10:33:31,778 ERROR 
org.apache.hadoop.hbase.regionserver.compactions.CompactionRequest: Compaction 
failed 
regionName=xiaomi_device_info_test,ff,1425377429116.41437dc231fe370f1304104a75aad78f.,
 storeName=A, fileCount=7, fileSize=899.7 M (470.7 M, 259.7 M, 75.9 M, 24.4 M, 
24.8 M, 25.7 M, 18.6 M), priority=23, time=44765894600479
java.io.IOException: 
BP-1356983882-10.2.201.14-1359086191297:blk_1211511211_1100144235504 does not 
exist or is not under Constructionnull
{code}

Let's check whether storedBlock is null, so the log message reads cleanly.
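The log suggests storedBlock is concatenated directly after "Construction", so a null prints as "Constructionnull". A sketch of the null check with a hypothetical helper; the real message is built elsewhere in the HDFS code and its exact wording may differ:

```java
final class BlockLogSketch {
    // Hypothetical helper: append storedBlock only when it is non-null, instead of
    // concatenating it unconditionally (which produces "...Constructionnull").
    static String notUnderConstructionMessage(String block, Object storedBlock) {
        String msg = block + " does not exist or is not under Construction";
        return storedBlock == null ? msg : msg + ", storedBlock=" + storedBlock;
    }
}
```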





Jenkins build is back to normal : Hadoop-Common-trunk #1424

2015-03-04 Thread Apache Jenkins Server
See https://builds.apache.org/job/Hadoop-Common-trunk/1424/changes