Re: Looking to a Hadoop 3 release
Let's not dismiss this quite so handily. Sean, Jason, and Stack replied on HADOOP-11656 pointing out that while we could make classpath isolation opt-in via configuration, what we really want longer term is to have it on by default (or just always on). Stack in particular points out the practical difficulties in using an opt-in method in 2.x from a downstream project perspective. It's not pretty. The plan that both Sean and Jason propose (which I support) is to have an opt-in solution in 2.x, bake it there, then turn it on by default (incompatible) in a new major release. I think this lines up well with my proposal of some alphas and betas leading up to a GA 3.x. I'm also willing to help with 2.x release management if that would help with testing this feature.

Even setting aside classpath isolation, a new major release is still justified by JDK8. Somehow this is being ignored in the discussion. Allen, historically the voice of the user in our community, just highlighted it as a major compatibility issue, and myself and Tucu have also expressed our very strong concerns about bumping this in a minor release. 2.7's bump is a unique exception, but this is not something to be cited as precedent or policy.

Where does this resistance to a new major release stem from? As I've described from the beginning, this will look basically like a 2.x release, except for the inclusion of classpath isolation by default and target version JDK8. I've expressed my desire to maintain API and wire compatibility, and we can audit the set of incompatible changes in trunk to ensure this. My proposal for doing alpha and beta releases leading up to GA also gives downstreams a nice amount of time for testing and validation.

Regards, Andrew

On Tue, Mar 3, 2015 at 2:32 PM, Arun Murthy a...@hortonworks.com wrote: Awesome, looks like we can just do this in a compatible manner - nothing else on the list seems like it warrants a (premature) major release. Thanks Vinod.
Arun From: Vinod Kumar Vavilapalli vino...@hortonworks.com Sent: Tuesday, March 03, 2015 2:30 PM To: common-dev@hadoop.apache.org Cc: hdfs-...@hadoop.apache.org; mapreduce-...@hadoop.apache.org; yarn-...@hadoop.apache.org Subject: Re: Looking to a Hadoop 3 release I started pitching in more on that JIRA. To add, I think we can and should strive for doing this in a compatible manner, whatever the approach. Marking and calling it incompatible before we see proposal/patch seems premature to me. Commented the same on JIRA: https://issues.apache.org/jira/browse/HADOOP-11656?focusedCommentId=14345875&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14345875 . Thanks +Vinod On Mar 2, 2015, at 8:08 PM, Andrew Wang andrew.w...@cloudera.com wrote: Regarding classpath isolation, based on what I hear from our customers, it's still a big problem (even after the MR classloader work). The latest Jackson version bump was quite painful for our downstream projects, and the HDFS client still leaks a lot of dependencies. Would welcome more discussion of this on HADOOP-11656, Steve, Colin, and Haohui have already chimed in.
Re: Reviving HADOOP-7435: Making Jenkins pre-commit build work with branches
Thanks for reviving this on email, Vinod. Newer folks like me might not be aware of this JIRA/effort. This would be wonderful to have so (1) we know the status of release branches (branch-2, etc.) and also (2) feature branches (YARN-2928). Jonathan's or Matt's proposal for including the branch name looks reasonable to me. If no one has any objections, I think we can continue on JIRA and get this in. On Wed, Mar 4, 2015 at 1:20 PM, Vinod Kumar Vavilapalli vino...@hortonworks.com wrote: Hi all, I'd like us to revive the effort at https://issues.apache.org/jira/browse/HADOOP-7435 to make precommit builds being able to work with branches. Having the Jenkins verify patches on branches is very useful even if there may be relaxed review oversight on the said-branch. Unless there are objections, I'd request help from Giri who already has a patch sitting there for more than a year before. This may need us to collectively agree on some convention - the last comment says that the branch patch name should be in some format for this to work. Thanks, +Vinod -- Karthik Kambatla Software Engineer, Cloudera Inc. http://five.sentenc.es
Re: Reviving HADOOP-7435: Making Jenkins pre-commit build work with branches
+1 If we can make things look like HBase support for precommit testing on branches (HBASE-12944), that would make it easier for new and occasional contributors who might end up working in other ecosystem projects. AFAICT, Jonathan's proposal for branch names in patch names does this. On Wed, Mar 4, 2015 at 3:41 PM, Karthik Kambatla ka...@cloudera.com wrote: Thanks for reviving this on email, Vinod. Newer folks like me might not be aware of this JIRA/effort. This would be wonderful to have so (1) we know the status of release branches (branch-2, etc.) and also (2) feature branches (YARN-2928). Jonathan's or Matt's proposal for including branch name looks reasonable to me. If none has any objections, I think we can continue on JIRA and get this in. On Wed, Mar 4, 2015 at 1:20 PM, Vinod Kumar Vavilapalli vino...@hortonworks.com wrote: Hi all, I'd like us to revive the effort at https://issues.apache.org/jira/browse/HADOOP-7435 to make precommit builds being able to work with branches. Having the Jenkins verify patches on branches is very useful even if there may be relaxed review oversight on the said-branch. Unless there are objections, I'd request help from Giri who already has a patch sitting there for more than a year before. This may need us to collectively agree on some convention - the last comment says that the branch patch name should be in some format for this to work. Thanks, +Vinod -- Karthik Kambatla Software Engineer, Cloudera Inc. http://five.sentenc.es -- Sean
Reviving HADOOP-7435: Making Jenkins pre-commit build work with branches
Hi all, I'd like us to revive the effort at https://issues.apache.org/jira/browse/HADOOP-7435 to make precommit builds able to work with branches. Having Jenkins verify patches on branches is very useful even if there may be relaxed review oversight on the branch in question. Unless there are objections, I'd request help from Giri, who already has a patch that has been sitting there for more than a year. This may need us to collectively agree on some convention - the last comment says that the branch patch name should be in some format for this to work. Thanks, +Vinod
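The thread has not yet settled on the naming convention, but assuming an HBase-style scheme (HBASE-12944) where the target branch is embedded in the patch file name — e.g. a hypothetical "HADOOP-7435-branch-2.00.patch" versus a trunk patch "HADOOP-7435.00.patch" — the precommit script's branch detection could look roughly like this sketch (class and format are illustrative, not the actual convention):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: parse the target branch out of a patch file name,
// assuming an HBase-like ISSUE[-BRANCH][.REV].patch naming scheme.
public class BranchFromPatchName {
    // Matches ISSUE[-BRANCH][.REV].patch; the branch group is optional.
    private static final Pattern PATCH =
        Pattern.compile("^([A-Z]+-\\d+)(?:-(.+?))?(?:\\.(\\d+))?\\.patch$");

    /** Returns the branch encoded in the patch file name, or "trunk" if none. */
    public static String targetBranch(String fileName) {
        Matcher m = PATCH.matcher(fileName);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a patch file name: " + fileName);
        }
        return m.group(2) == null ? "trunk" : m.group(2);
    }

    public static void main(String[] args) {
        System.out.println(targetBranch("HADOOP-7435-branch-2.00.patch")); // branch-2
        System.out.println(targetBranch("YARN-2928-YARN-2928.01.patch"));  // YARN-2928
        System.out.println(targetBranch("HADOOP-7435.00.patch"));          // trunk
    }
}
```

A scheme like this would let the same Jenkins job serve release branches (branch-2) and feature branches (YARN-2928) without extra configuration.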
Re: Looking to a Hadoop 3 release
In general +1 on 3.0.0. Its time. If we start now, it might make it out by 2016. If we start now, downstreamers can start aligning themselves to land versions that suit at about the same time. While two big items have been called out as possible incompatible changes, and there is ongoing discussion as to whether they are or not*, is there any chance of getting a longer list of big differences between the branches? In particular I'd be interested in improvements that are 'off' by default that would be better defaulted 'on'. Thanks, St.Ack * Let me note that 'compatible' around these parts is a trampled concept seemingly open to interpretation with a definition that is other than prevails elsewhere in software. See Allen's list above, and in our downstream project, the recent HBASE-13149 HBase server MR tools are broken on Hadoop 2.5+ Yarn, among others. Let 3.x be incompatible with 2.x if only so we can leave behind all current notions of 'compatibility' and just start over (as per Allen). On Mon, Mar 2, 2015 at 3:19 PM, Andrew Wang andrew.w...@cloudera.com wrote: Hi devs, It's been a year and a half since 2.x went GA, and I think we're about due for a 3.x release. Notably, there are two incompatible changes I'd like to call out, that will have a tremendous positive impact for our users. First, classpath isolation being done at HADOOP-11656, which has been a long-standing request from many downstreams and Hadoop users. Second, bumping the source and target JDK version to JDK8 (related to HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two months from now). In the past, we've had issues with our dependencies discontinuing support for old JDKs, so this will future-proof us. Between the two, we'll also have quite an opportunity to clean up and upgrade our dependencies, another common user and developer request. 
I'd like to propose that we start rolling a monthly-ish series of 3.0 alpha releases ASAP, with myself volunteering to take on the RM and other cat herding responsibilities. There are already quite a few changes slated for 3.0 besides the above (for instance the shell script rewrite) so there's already value in a 3.0 alpha, and the more time we give downstreams to integrate, the better. This opens up discussion about inclusion of other changes, but I'm hoping to freeze incompatible changes after maybe two alphas, do a beta (with no further incompat changes allowed), and then finally a 3.x GA. For those keeping track, that means a 3.x GA in about four months. I would also like to stress though that this is not intended to be a big bang release. For instance, it would be great if we could maintain wire compatibility between 2.x and 3.x, so rolling upgrades work. Keeping branch-2 and branch-3 similar also makes backports easier, since we're likely maintaining 2.x for a while yet. Please let me know any comments / concerns related to the above. If people are friendly to the idea, I'd like to cut a branch-3 and start working on the first alpha. Best, Andrew
[jira] [Reopened] (HADOOP-11668) start-dfs.sh and stop-dfs.sh no longer works in HA mode after --slaves shell option
[ https://issues.apache.org/jira/browse/HADOOP-11668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer reopened HADOOP-11668: --- Assignee: Allen Wittenauer (was: Vinayakumar B) Re-opening. The problem here isn't start/stop, it's *-daemons.sh, which are now broken. start-dfs.sh and stop-dfs.sh no longer works in HA mode after --slaves shell option --- Key: HADOOP-11668 URL: https://issues.apache.org/jira/browse/HADOOP-11668 Project: Hadoop Common Issue Type: Bug Components: scripts Reporter: Vinayakumar B Assignee: Allen Wittenauer Attachments: HADOOP-11668-01.patch After the introduction of the --slaves option for the scripts, start-dfs.sh and stop-dfs.sh will no longer work in HA mode. This is due to multiple hostnames passed for '--hostnames' delimited with spaces. These hostnames are treated as commands and the script fails. So, delimiting with a comma (,) instead of a space before passing to hadoop-daemons.sh will solve the problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
RE: 2.7 status
Thanks Vinod for the hints. I have updated both patches to align with the latest code, and added more unit tests. The build results look reasonable. Thanks to anyone who can give them more review; I will update them in a timely manner. Regards, Kai -Original Message- From: Vinod Kumar Vavilapalli [mailto:vino...@hortonworks.com] Sent: Tuesday, March 03, 2015 11:31 AM To: Zheng, Kai Cc: mapreduce-...@hadoop.apache.org; hdfs-...@hadoop.apache.org; Hadoop Common; yarn-...@hadoop.apache.org Subject: Re: 2.7 status Kai, please ping the reviewers that were already looking at your patches before. If the patches go in by end of this week, we can include them. Thanks, +Vinod On Mar 2, 2015, at 7:04 PM, Zheng, Kai kai.zh...@intel.com wrote: Is there interest in getting the following issues into the release? Thanks! HADOOP-10670 HADOOP-10671 Regards, Kai -Original Message- From: Yongjun Zhang [mailto:yzh...@cloudera.com] Sent: Monday, March 02, 2015 4:46 AM To: hdfs-...@hadoop.apache.org Cc: Vinod Kumar Vavilapalli; Hadoop Common; mapreduce-...@hadoop.apache.org; yarn-...@hadoop.apache.org Subject: Re: 2.7 status Hi, Thanks for working on 2.7 release. Currently the fallback from KerberosAuthenticator to PseudoAuthenticator is enabled by default in a hardcoded way. HADOOP-10895 changes the default and requires applications (such as oozie) to set a config property or call an API to enable the fallback. This jira has been reviewed, and is almost ready to get in. However, there is a concern that we have to change the relevant applications. Please see my comment here: https://issues.apache.org/jira/browse/HADOOP-10895?focusedCommentId=14321823&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14321823 Any of your comments will be highly appreciated. This jira was postponed from 2.6. I think it should be no problem to skip 2.7. But your comments would help us to decide what to do with this jira for future releases. Thanks.
--Yongjun On Sun, Mar 1, 2015 at 11:58 AM, Arun Murthy a...@hortonworks.com wrote: Sounds good, thanks for the help Vinod! Arun From: Vinod Kumar Vavilapalli Sent: Sunday, March 01, 2015 11:43 AM To: Hadoop Common; Jason Lowe; Arun Murthy Subject: Re: 2.7 status Agreed. How about we roll an RC end of this week? As a Java 7+ release with features, patches that already got in? Here's a filter tracking blocker tickets - https://issues.apache.org/jira/issues/?filter=12330598. Nine open now. +Arun Arun, I'd like to help get 2.7 out without further delay. Do you mind me taking over release duties? Thanks, +Vinod From: Jason Lowe jl...@yahoo-inc.com.INVALID Sent: Friday, February 13, 2015 8:11 AM To: common-dev@hadoop.apache.org Subject: Re: 2.7 status I'd like to see a 2.7 release sooner than later. It has been almost 3 months since Hadoop 2.6 was released, and there have already been 634 JIRAs committed to 2.7. That's a lot of changes waiting for an official release. https://issues.apache.org/jira/issues/?jql=project%20in%20%28hadoop%2Chdfs%2Cyarn%2Cmapreduce%29%20AND%20fixversion%3D2.7.0%20AND%20resolution%3DFixed Jason From: Sangjin Lee sj...@apache.org To: common-dev@hadoop.apache.org common-dev@hadoop.apache.org Sent: Tuesday, February 10, 2015 1:30 PM Subject: 2.7 status Folks, What is the current status of the 2.7 release? I know initially it started out as a java-7 only release, but looking at the JIRAs that is very much not the case. Do we have a certain timeframe for 2.7 or is it time to discuss it? Thanks, Sangjin
[jira] [Created] (HADOOP-11670) Fix IAM instance profile auth for s3a (broken in HADOOP-11446)
Adam Budde created HADOOP-11670: --- Summary: Fix IAM instance profile auth for s3a (broken in HADOOP-11446) Key: HADOOP-11670 URL: https://issues.apache.org/jira/browse/HADOOP-11670 Project: Hadoop Common Issue Type: Sub-task Components: fs/s3 Affects Versions: 2.6.0 Reporter: Adam Budde Fix For: 2.7.0 One big advantage provided by the s3a filesystem is the ability to use an IAM instance profile in order to authenticate when attempting to access an S3 bucket from an EC2 instance. This eliminates the need to deploy AWS account credentials to the instance or to provide them to Hadoop via the fs.s3a.awsAccessKeyId and fs.s3a.awsSecretAccessKey params. The patch submitted to resolve HADOOP-11446 breaks this behavior by using the S3Credentials class to read the value of these two params (this change is unrelated to resolving HADOOP-11446). S3AFileSystem.java, lines 161-170:
{code}
// Try to get our credentials or just connect anonymously
S3Credentials s3Credentials = new S3Credentials();
s3Credentials.initialize(name, conf);
AWSCredentialsProviderChain credentials = new AWSCredentialsProviderChain(
    new BasicAWSCredentialsProvider(s3Credentials.getAccessKey(),
        s3Credentials.getSecretAccessKey()),
    new InstanceProfileCredentialsProvider(),
    new AnonymousAWSCredentialsProvider()
);
{code}
As you can see, the getAccessKey() and getSecretAccessKey() methods from the S3Credentials class are now used to provide constructor arguments to BasicAWSCredentialsProvider. These methods will raise an exception if the fs.s3a.awsAccessKeyId or fs.s3a.awsSecretAccessKey params are missing, respectively. If a user is relying on an IAM instance profile to authenticate to an S3 bucket and therefore doesn't supply values for these params, they will receive an exception and won't be able to access the bucket. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
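The mechanics of the breakage are easy to reproduce outside the AWS SDK. The following is a simplified stand-in for the provider-chain idea (the class and method names below are hypothetical sketches, not the real SDK or s3a API): the fix direction is to add the static-key provider to the chain only when both keys are configured, rather than calling getters that throw when the keys are absent, so that IAM-instance-profile users fall through to the next provider.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Supplier;

// Stand-in sketch of a credentials provider chain; not the AWS SDK classes.
public class CredentialChainSketch {
    static class Credentials {
        final String accessKey;
        final String secretKey;
        Credentials(String accessKey, String secretKey) {
            this.accessKey = accessKey;
            this.secretKey = secretKey;
        }
    }

    /** Tries each provider in order; returns the first credentials produced. */
    static Optional<Credentials> resolve(List<Supplier<Optional<Credentials>>> chain) {
        for (Supplier<Optional<Credentials>> provider : chain) {
            Optional<Credentials> c = provider.get();
            if (c.isPresent()) {
                return c;
            }
        }
        return Optional.empty();
    }

    /** Builds a chain guarded on key presence and returns the resolved access key. */
    static String resolveAccessKey(String accessKey, String secretKey) {
        List<Supplier<Optional<Credentials>>> chain = new ArrayList<>();
        // Guard: only add the static-key provider when both keys are configured,
        // instead of unconditionally calling getters that throw when they are missing.
        if (accessKey != null && secretKey != null) {
            chain.add(() -> Optional.of(new Credentials(accessKey, secretKey)));
        }
        // Stand-in for InstanceProfileCredentialsProvider (always succeeds here).
        chain.add(() -> Optional.of(new Credentials("role-key", "role-secret")));
        return resolve(chain).get().accessKey;
    }

    public static void main(String[] args) {
        System.out.println(resolveAccessKey(null, null));       // falls through to the profile stand-in
        System.out.println(resolveAccessKey("AKID", "SECRET")); // static keys win when configured
    }
}
```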
RE: Looking to a Hadoop 3 release
Let me offer a few comments on this, just providing my thoughts. Thanks. If we start now, it might make it out by 2016. If we start now, downstreamers can start aligning themselves to land versions that suit at about the same time. Not only can downstreamers align with the long-term release, but contributors like me can also align their future efforts with it. In addition to the JDK8 support and classpath isolation, might we add more candidate items for consideration? How about this one: HADOOP-9797, Pluggable and compatible UGI change? https://issues.apache.org/jira/browse/HADOOP-9797 The benefits: 1) allow multiple login sessions/contexts and authentication methods to be used in the same Java application/process without conflicts, providing good isolation by getting rid of globals and statics. 2) allow plugging in new authentication methods for UGI in a modular, manageable and maintainable manner. Additionally, we would also like to push out the first release of Apache Kerby, preparing a strong, dedicated and clean Kerberos library in Java for both the client and KDC sides, and by leveraging the library, update Hadoop-MiniKDC and perform more security tests. https://issues.apache.org/jira/browse/DIRKRB-102 Hope this makes sense. Thanks. Regards, Kai -Original Message- From: saint@gmail.com [mailto:saint@gmail.com] On Behalf Of Stack Sent: Thursday, March 05, 2015 2:47 AM To: common-dev@hadoop.apache.org Cc: mapreduce-...@hadoop.apache.org; hdfs-...@hadoop.apache.org; yarn-...@hadoop.apache.org Subject: Re: Looking to a Hadoop 3 release In general +1 on 3.0.0. Its time. If we start now, it might make it out by 2016. If we start now, downstreamers can start aligning themselves to land versions that suit at about the same time. While two big items have been called out as possible incompatible changes, and there is ongoing discussion as to whether they are or not*, is there any chance of getting a longer list of big differences between the branches?
In particular I'd be interested in improvements that are 'off' by default that would be better defaulted 'on'. Thanks, St.Ack * Let me note that 'compatible' around these parts is a trampled concept seemingly open to interpretation with a definition that is other than prevails elsewhere in software. See Allen's list above, and in our downstream project, the recent HBASE-13149 HBase server MR tools are broken on Hadoop 2.5+ Yarn, among others. Let 3.x be incompatible with 2.x if only so we can leave behind all current notions of 'compatibility' and just start over (as per Allen). On Mon, Mar 2, 2015 at 3:19 PM, Andrew Wang andrew.w...@cloudera.com wrote: Hi devs, It's been a year and a half since 2.x went GA, and I think we're about due for a 3.x release. Notably, there are two incompatible changes I'd like to call out, that will have a tremendous positive impact for our users. First, classpath isolation being done at HADOOP-11656, which has been a long-standing request from many downstreams and Hadoop users. Second, bumping the source and target JDK version to JDK8 (related to HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two months from now). In the past, we've had issues with our dependencies discontinuing support for old JDKs, so this will future-proof us. Between the two, we'll also have quite an opportunity to clean up and upgrade our dependencies, another common user and developer request. I'd like to propose that we start rolling a series of monthly-ish series of 3.0 alpha releases ASAP, with myself volunteering to take on the RM and other cat herding responsibilities. There are already quite a few changes slated for 3.0 besides the above (for instance the shell script rewrite) so there's already value in a 3.0 alpha, and the more time we give downstreams to integrate, the better. 
This opens up discussion about inclusion of other changes, but I'm hoping to freeze incompatible changes after maybe two alphas, do a beta (with no further incompat changes allowed), and then finally a 3.x GA. For those keeping track, that means a 3.x GA in about four months. I would also like to stress though that this is not intended to be a big bang release. For instance, it would be great if we could maintain wire compatibility between 2.x and 3.x, so rolling upgrades work. Keeping branch-2 and branch-3 similar also makes backports easier, since we're likely maintaining 2.x for a while yet. Please let me know any comments / concerns related to the above. If people are friendly to the idea, I'd like to cut a branch-3 and start working on the first alpha. Best, Andrew
Re: timsort bug in the JDK
Tsuyoshi Ozawa sent out an email to the common-dev list about this recently. It seems like the bug only bites when the number of elements is larger than 67108864, which may limit its impact (to state it mildly). Also, the flawed sorting algorithm is not used on arrays of primitives, just on arrays of Objects. We should probably file a JIRA to track this, though, just in case there is an impact. And maybe look at some of the uses of sort() in the code. best, Colin On Tue, Mar 3, 2015 at 8:56 AM, Steve Loughran ste...@hortonworks.com wrote: One other late-breaking issue may be what to do about the fact that Java 7 & 8 have a broken sort algorithm, which has surfaced recently: http://envisage-project.eu/proving-android-java-and-python-sorting-algorithm-is-broken-and-how-to-fix-it/ I believe some other OSS projects have tried to address this. Looking at LUCENE-6293, they weren't clear whether it was worth the effort for a problem that didn't corrupt their data. I'm fairly tempted to argue the same for doing something for 2.7, especially as a switch throughout the code base could be expensive. Except: what if Oracle don't ship a patch for JDK7? -Steve
[jira] [Created] (HADOOP-11673) Use org.junit.Assume to skip tests instead of return
Akira AJISAKA created HADOOP-11673: -- Summary: Use org.junit.Assume to skip tests instead of return Key: HADOOP-11673 URL: https://issues.apache.org/jira/browse/HADOOP-11673 Project: Hadoop Common Issue Type: Improvement Components: test Reporter: Akira AJISAKA Priority: Minor We see the following code many times:
{code:title=TestCodec.java}
if (!ZlibFactory.isNativeZlibLoaded(conf)) {
  LOG.warn("skipped: native libs not loaded");
  return;
}
{code}
If {{ZlibFactory.isNativeZlibLoaded(conf)}} is false, the test will *pass*, with a warn log. I'd like to *skip* this test case by using {{org.junit.Assume}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
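The difference matters because JUnit's Assume works by throwing an assumption-violated exception that runners report as skipped, whereas an early return is reported as a (misleading) pass. The sketch below mimics those semantics without depending on JUnit — the runner and exception here are tiny stand-ins, not the real JUnit classes:

```java
// Dependency-free sketch of why Assume beats a bare return: an assumption
// failure surfaces as SKIPPED, while an early return surfaces as PASSED.
public class AssumeSketch {
    static class AssumptionViolated extends RuntimeException {
        AssumptionViolated(String msg) { super(msg); }
    }

    // Stand-in for org.junit.Assume.assumeTrue.
    static void assumeTrue(String message, boolean condition) {
        if (!condition) throw new AssumptionViolated(message);
    }

    /** Mini-runner: distinguishes skipped from passed/failed like JUnit does. */
    static String run(Runnable test) {
        try {
            test.run();
            return "PASSED";
        } catch (AssumptionViolated e) {
            return "SKIPPED: " + e.getMessage();
        } catch (Throwable t) {
            return "FAILED";
        }
    }

    public static void main(String[] args) {
        boolean nativeZlibLoaded = false; // stand-in for ZlibFactory.isNativeZlibLoaded(conf)

        // Old pattern: the guard returns early, so the result is a false PASSED.
        System.out.println(run(() -> {
            if (!nativeZlibLoaded) { return; }
            // ... actual codec assertions would go here ...
        }));

        // Assume pattern: the same guard surfaces as SKIPPED.
        System.out.println(run(() -> {
            assumeTrue("native libs not loaded", nativeZlibLoaded);
            // ... actual codec assertions would go here ...
        }));
    }
}
```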
Re: Reviving HADOOP-7435: Making Jenkins pre-commit build work with branches
+1. It's really helpful for branch development. To continue Karthik's point, would it be good to make pre-commit testing against branch-2 the default too, like it is against trunk? On 3/4/15, 1:47 PM, Sean Busbey bus...@cloudera.com wrote: +1 If we can make things look like HBase support for precommit testing on branches (HBASE-12944), that would make it easier for new and occasional contributors who might end up working in other ecosystem projects. AFAICT, Jonathan's proposal for branch names in patch names does this. On Wed, Mar 4, 2015 at 3:41 PM, Karthik Kambatla ka...@cloudera.com wrote: Thanks for reviving this on email, Vinod. Newer folks like me might not be aware of this JIRA/effort. This would be wonderful to have so (1) we know the status of release branches (branch-2, etc.) and also (2) feature branches (YARN-2928). Jonathan's or Matt's proposal for including branch name looks reasonable to me. If none has any objections, I think we can continue on JIRA and get this in. On Wed, Mar 4, 2015 at 1:20 PM, Vinod Kumar Vavilapalli vino...@hortonworks.com wrote: Hi all, I'd like us to revive the effort at https://issues.apache.org/jira/browse/HADOOP-7435 to make precommit builds being able to work with branches. Having the Jenkins verify patches on branches is very useful even if there may be relaxed review oversight on the said-branch. Unless there are objections, I'd request help from Giri who already has a patch sitting there for more than a year before. This may need us to collectively agree on some convention - the last comment says that the branch patch name should be in some format for this to work. Thanks, +Vinod -- Karthik Kambatla Software Engineer, Cloudera Inc. http://five.sentenc.es -- Sean
[jira] [Created] (HADOOP-11672) test
xiangqian.xu created HADOOP-11672: - Summary: test Key: HADOOP-11672 URL: https://issues.apache.org/jira/browse/HADOOP-11672 Project: Hadoop Common Issue Type: New Feature Reporter: xiangqian.xu -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HADOOP-11674) data corruption for parallel CryptoInputStream and CryptoOutputStream
Sean Busbey created HADOOP-11674: Summary: data corruption for parallel CryptoInputStream and CryptoOutputStream Key: HADOOP-11674 URL: https://issues.apache.org/jira/browse/HADOOP-11674 Project: Hadoop Common Issue Type: Bug Components: io Affects Versions: 2.6.0 Reporter: Sean Busbey Assignee: Sean Busbey Priority: Critical A common optimization in the io classes for Input/Output Streams is to save a single length-1 byte array to use in single byte read/write calls. CryptoInputStream and CryptoOutputStream both attempt to follow this practice but mistakenly mark the array as static. That means that only a single instance of each can be present in a JVM safely. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
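The bug pattern is easy to see in miniature. The classes below are stand-ins illustrating the HADOOP-11674 report, not the real CryptoInputStream/CryptoOutputStream code: a single-byte scratch buffer for the one-byte read/write path must be an instance field, because a static field is shared by every stream in the JVM and two streams used at the same time clobber each other's byte.

```java
// Stand-in illustration of the static-vs-instance scratch buffer bug.
public class OneByteBufSketch {
    // Buggy: one buffer shared by all instances in the JVM.
    static class BuggyStream {
        private static final byte[] oneByteBuf = new byte[1];
        byte[] stage(int b) { oneByteBuf[0] = (byte) b; return oneByteBuf; }
    }

    // Fixed: each stream owns its buffer.
    static class FixedStream {
        private final byte[] oneByteBuf = new byte[1];
        byte[] stage(int b) { oneByteBuf[0] = (byte) b; return oneByteBuf; }
    }

    public static void main(String[] args) {
        BuggyStream b1 = new BuggyStream();
        BuggyStream b2 = new BuggyStream();
        byte[] staged = b1.stage('x');
        b2.stage('y');                        // second stream overwrites the shared buffer
        System.out.println((char) staged[0]); // 'y' -- b1's byte was corrupted

        FixedStream f1 = new FixedStream();
        FixedStream f2 = new FixedStream();
        byte[] ok = f1.stage('x');
        f2.stage('y');
        System.out.println((char) ok[0]);     // 'x' -- unaffected
    }
}
```

With concurrent use the corruption is non-deterministic, which is what makes the static version dangerous rather than merely wasteful.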
[jira] [Resolved] (HADOOP-11643) Define EC schema API for ErasureCodec
[ https://issues.apache.org/jira/browse/HADOOP-11643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kai Zheng resolved HADOOP-11643. Resolution: Fixed Target Version/s: HDFS-7285 Hadoop Flags: Reviewed Define EC schema API for ErasureCodec - Key: HADOOP-11643 URL: https://issues.apache.org/jira/browse/HADOOP-11643 Project: Hadoop Common Issue Type: Sub-task Components: io Reporter: Kai Zheng Assignee: Kai Zheng Fix For: HDFS-7285 Attachments: HADOOP-11643-v3.patch, HADOOP-11643_v1.patch, HADOOP-11643_v2.patch As part of {{ErasureCodec}} API to be defined in HDFS-7699, {{ECSchema}} API will be first defined here for better sync among related issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
RE: [RFE] Support MIT Kerberos localauth plugin API
Sorry I was not clear enough about the problem. Let me explain more here. Our problem is that normal user principal names can be very different from their Unix logins. Some customers simply have arbitrary mappings between their Kerberos principals and Unix user accounts. For example, one customer has over 200K users on AD with Kerberos principals in the format firstname.lastname@REALM (e.g. john@example.com). But their Unix names are in the format userID or just ID (e.g. user123456, 123456). So, when Kerberos security is enabled on Hadoop clusters, how should we configure to authenticate these users from Hadoop clients? The current way is to use the hadoop.security.auth_to_local setting, e.g. from core-site.xml:

<property>
  <name>hadoop.security.auth_to_local</name>
  <value>
    RULE:[2:$1@$0]([jt]t@.*EXAMPLE.COM)s/.*/mapred/
    RULE:[2:$1@$0]([nd]n@.*EXAMPLE.COM)s/.*/hdfs/
    RULE:[2:$1@$0](hm@.*EXAMPLE.COM)s/.*/hbase/
    RULE:[2:$1@$0](rs@.*EXAMPLE.COM)s/.*/hbase/
    DEFAULT
  </value>
  <description>The mapping from kerberos principal names to local OS user names.</description>
</property>

These name translation rules can handle cases like mapping service accounts' principals (e.g. nn/host@REALM or dn/host@REALM to hdfs). But that is not scalable for normal users. There are just too many users to handle (as compared to the finite number of service accounts). Therefore, we would like to ask if an alternative name resolution plugin interface can be supported by Hadoop. It could be similar to the way an alternative authentication plugin is supported for HTTP web-consoles [1]:

<property>
  <name>hadoop.http.authentication.type</name>
  <value>org.my.subclass.of.AltKerberosAuthenticationHandler</value>
</property>

And the plugin interface can be as simple as this function (error handling ignored here):

String auth_to_local(String krb5Principal) {
    ...
    return unixName;
}

If this plugin interface is supported by Hadoop, then everyone can provide a plugin to support arbitrary mappings.
This will be extremely useful when administrators need to tighten security on Hadoop with existing Kerberos infrastructure. References: [1] Authentication for Hadoop HTTP web-consoles http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/HttpAuthentication.html -Original Message- From: Allen Wittenauer [mailto:a...@altiscale.com] Sent: Tuesday, February 24, 2015 12:47 AM To: common-dev@hadoop.apache.org Subject: Re: [RFE] Support MIT Kerberos localauth plugin API The big question is whether or not Java's implementation of Kerberos supports it. If so, which JDK release. Java's implementation tends to run a bit behind MIT. Additionally, there is a general reluctance to move Hadoop's baseline Java version to something even supported until user outcry demands it. So I'd expect support to be a long way off. It's worth noting that trunk exposes the hadoop kerbname command to help out with auth_to_local mapping, BTW. On Feb 23, 2015, at 2:12 AM, Sunny Cheung sunny.che...@centrify.com wrote: Hi Hadoop Common developers, I am writing to seek your opinion about a feature request: support MIT Kerberos localauth plugin API [1]. Hadoop currently provides the hadoop.security.auth_to_local setting to map Kerberos principal to OS user account [2][3]. However, the regex-based mappings (which mimics krb5.conf auth_to_local) could be difficult to use in complex scenarios. Therefore, MIT Kerberos 1.12 added a plugin interface to control krb5_aname_to_localname and krb5_kuserok behavior. And system daemon SSSD (RHEL/Fedora) has already implemented a plugin to leverage this feature [4]. Is that possible for Hadoop to support a plugin API similar to localauth (when Kerberos security is enabled)? Thanks. 
References: [1] Local authorization interface (localauth) http://web.mit.edu/kerberos/krb5-1.12/doc/plugindev/localauth.html [2] Hadoop in Secure Mode - Mapping from Kerberos principal to OS user account http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SecureMode.html#Mapping_from_Kerberos_principal_to_OS_user_account [3] Need mapping from long principal names to local OS user names https://issues.apache.org/jira/browse/HADOOP-6526 [4] Allow Kerberos Principals in getpwnam() calls https://fedorahosted.org/sssd/wiki/DesignDocs/NSSWithKerberosPrincipal
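The kind of hook being requested above could be sketched as follows. The interface and class names here are hypothetical, not an existing Hadoop API: a single principal-to-local-name function that a site could implement with a directory lookup, which is exactly the arbitrary-mapping case the regex rules cannot express at 200K-user scale.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a pluggable principal-to-Unix-name interface.
public class LocalAuthSketch {
    /** The proposed pluggable hook: map a Kerberos principal to a Unix name. */
    interface KerberosNameMapper {
        String authToLocal(String krb5Principal);
    }

    // Example site plugin: arbitrary mapping backed by a lookup table. A real
    // plugin would query AD/LDAP (or MIT krb5 localauth) here instead of a map.
    static class DirectoryLookupMapper implements KerberosNameMapper {
        private final Map<String, String> directory = new LinkedHashMap<>();
        DirectoryLookupMapper() {
            directory.put("john.doe@EXAMPLE.COM", "user123456"); // hypothetical entry
        }
        @Override
        public String authToLocal(String krb5Principal) {
            String unixName = directory.get(krb5Principal);
            if (unixName == null) {
                throw new IllegalArgumentException("no mapping for " + krb5Principal);
            }
            return unixName;
        }
    }

    public static void main(String[] args) {
        KerberosNameMapper mapper = new DirectoryLookupMapper();
        System.out.println(mapper.authToLocal("john.doe@EXAMPLE.COM")); // user123456
    }
}
```

The existing auth_to_local rules could remain the default implementation behind the same interface, so sites that don't need arbitrary mappings would see no change.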
Re: Looking to a Hadoop 3 release
One of the questions that keeps popping up is “what exactly is in trunk?” As some may recall, I had done some experiments creating the change log based upon JIRA. While the interest level appeared to be approaching zero, I kept playing with it a bit and eventually also started playing with the release notes script (for various reasons I won’t bore you with.) In any case, I’ve started posting the results of these runs on one of my github repos if anyone was wanting a quick reference as to JIRA’s opinion on the matter: https://github.com/aw-altiscale/hadoop-release-metadata/tree/master/3.0.0
[jira] [Resolved] (HADOOP-11672) test
[ https://issues.apache.org/jira/browse/HADOOP-11672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula resolved HADOOP-11672. --- Resolution: Not a Problem test Key: HADOOP-11672 URL: https://issues.apache.org/jira/browse/HADOOP-11672 Project: Hadoop Common Issue Type: New Feature Reporter: xiangqian.xu -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HADOOP-11675) tiny exception log with checking storedBlock is null or not
Liang Xie created HADOOP-11675: -- Summary: tiny exception log with checking storedBlock is null or not Key: HADOOP-11675 URL: https://issues.apache.org/jira/browse/HADOOP-11675 Project: Hadoop Common Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Liang Xie Assignee: Liang Xie Priority: Minor Found this log at our production cluster:
{code}
2015-03-05,10:33:31,778 ERROR org.apache.hadoop.hbase.regionserver.compactions.CompactionRequest: Compaction failed regionName=xiaomi_device_info_test,ff,1425377429116.41437dc231fe370f1304104a75aad78f., storeName=A, fileCount=7, fileSize=899.7 M (470.7 M, 259.7 M, 75.9 M, 24.4 M, 24.8 M, 25.7 M, 18.6 M), priority=23, time=44765894600479
java.io.IOException: BP-1356983882-10.2.201.14-1359086191297:blk_1211511211_1100144235504 does not exist or is not under Constructionnull
{code}
Let's check whether storedBlock is null to make the log prettier. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
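The fused "Constructionnull" at the end of that message comes from concatenating a null reference straight into the string. A minimal sketch of the proposed tweak (variable names and message shape are illustrative, not the actual HDFS code):

```java
// Illustrative null guard so the message never ends in a fused "...null".
public class BlockLogSketch {
    static String describe(String blockId, Object storedBlock) {
        if (storedBlock == null) {
            return blockId + " does not exist or is not under construction (no stored block found)";
        }
        return blockId + " is not under construction: " + storedBlock;
    }

    public static void main(String[] args) {
        System.out.println(describe("blk_1211511211_1100144235504", null));
    }
}
```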
Jenkins build is back to normal : Hadoop-Common-trunk #1424
See https://builds.apache.org/job/Hadoop-Common-trunk/1424/changes