[ https://issues.apache.org/jira/browse/HADOOP-13433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15837827#comment-15837827 ]
Hadoop QA commented on HADOOP-13433:
------------------------------------

(x) -1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 13s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| +1 | mvninstall | 6m 35s | branch-2 passed |
| +1 | compile | 5m 35s | branch-2 passed with JDK v1.8.0_121 |
| +1 | compile | 6m 40s | branch-2 passed with JDK v1.7.0_121 |
| +1 | checkstyle | 0m 26s | branch-2 passed |
| +1 | mvnsite | 0m 59s | branch-2 passed |
| +1 | mvneclipse | 0m 15s | branch-2 passed |
| +1 | findbugs | 1m 45s | branch-2 passed |
| +1 | javadoc | 0m 47s | branch-2 passed with JDK v1.8.0_121 |
| +1 | javadoc | 0m 58s | branch-2 passed with JDK v1.7.0_121 |
| +1 | mvninstall | 0m 40s | the patch passed |
| +1 | compile | 5m 42s | the patch passed with JDK v1.8.0_121 |
| +1 | javac | 5m 42s | the patch passed |
| +1 | compile | 6m 28s | the patch passed with JDK v1.7.0_121 |
| +1 | javac | 6m 28s | the patch passed |
| -0 | checkstyle | 0m 25s | hadoop-common-project/hadoop-common: The patch generated 1 new + 94 unchanged - 2 fixed = 95 total (was 96) |
| +1 | mvnsite | 0m 56s | the patch passed |
| +1 | mvneclipse | 0m 15s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | findbugs | 1m 53s | the patch passed |
| +1 | javadoc | 0m 45s | the patch passed with JDK v1.8.0_121 |
| +1 | javadoc | 0m 57s | the patch passed with JDK v1.7.0_121 |
| +1 | unit | 8m 29s | hadoop-common in the patch passed with JDK v1.7.0_121. |
| +1 | asflicense | 0m 19s | The patch does not generate ASF License warnings. |
| | | 61m 5s | |

|| Reason || Tests ||
| JDK v1.8.0_121 Failed junit tests | hadoop.ipc.TestRPCWaitForProxy |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:b59b8b7 |
| JIRA Issue | HADOOP-13433 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12849290/HADOOP-13433-branch-2.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 85ea8d7b3fa9 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | branch-2 / 1af0a4d |
| Default Java | 1.7.0_121 |
| Multi-JDK versions | /usr/lib/jvm/java-8-oracle:1.8.0_121 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_121 |
| findbugs | v3.0.0 |
| checkstyle | https://builds.apache.org/job/PreCommit-HADOOP-Build/11506/artifact/patchprocess/diff-checkstyle-hadoop-common-project_hadoop-common.txt |
| JDK v1.7.0_121 Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/11506/testReport/ |
| modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/11506/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.


> Race in UGI.reloginFromKeytab
> -----------------------------
>
>                 Key: HADOOP-13433
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13433
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 2.8.0, 2.7.3, 2.6.5, 3.0.0-alpha1
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>             Fix For: 2.8.0, 2.7.4, 3.0.0-alpha2, 2.6.6
>
>         Attachments: HADOOP-13433-branch-2.patch, HADOOP-13433.patch, HADOOP-13433-v1.patch, HADOOP-13433-v2.patch, HADOOP-13433-v4.patch, HADOOP-13433-v5.patch, HADOOP-13433-v6.patch, HBASE-13433-testcase-v3.patch
>
>
> This is a problem that has troubled us for several years.
> For our HBase cluster, sometimes the RS will be stuck due to
> {noformat}
> 2016-06-20,03:44:12,936 INFO org.apache.hadoop.ipc.SecureClient: Exception encountered while connecting to the server :
> javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: The ticket isn't for us (35) - BAD TGS SERVER NAME)]
>         at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:194)
>         at org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:140)
>         at org.apache.hadoop.hbase.ipc.SecureClient$SecureConnection.setupSaslConnection(SecureClient.java:187)
>         at org.apache.hadoop.hbase.ipc.SecureClient$SecureConnection.access$700(SecureClient.java:95)
>         at org.apache.hadoop.hbase.ipc.SecureClient$SecureConnection$2.run(SecureClient.java:325)
>         at org.apache.hadoop.hbase.ipc.SecureClient$SecureConnection$2.run(SecureClient.java:322)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1781)
>         at sun.reflect.GeneratedMethodAccessor23.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.hbase.util.Methods.call(Methods.java:37)
>         at org.apache.hadoop.hbase.security.User.call(User.java:607)
>         at org.apache.hadoop.hbase.security.User.access$700(User.java:51)
>         at org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:461)
>         at org.apache.hadoop.hbase.ipc.SecureClient$SecureConnection.setupIOstreams(SecureClient.java:321)
>         at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1164)
>         at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:1004)
>         at org.apache.hadoop.hbase.ipc.SecureRpcEngine$Invoker.invoke(SecureRpcEngine.java:107)
>         at $Proxy24.replicateLogEntries(Unknown Source)
>         at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:962)
>         at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.runLoop(ReplicationSource.java:466)
>         at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:515)
> Caused by: GSSException: No valid credentials provided (Mechanism level: The ticket isn't for us (35) - BAD TGS SERVER NAME)
>         at sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:663)
>         at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:248)
>         at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:180)
>         at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:175)
>         ... 23 more
> Caused by: KrbException: The ticket isn't for us (35) - BAD TGS SERVER NAME
>         at sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:64)
>         at sun.security.krb5.KrbTgsReq.getReply(KrbTgsReq.java:185)
>         at sun.security.krb5.internal.CredentialsUtil.serviceCreds(CredentialsUtil.java:294)
>         at sun.security.krb5.internal.CredentialsUtil.acquireServiceCreds(CredentialsUtil.java:106)
>         at sun.security.krb5.Credentials.acquireServiceCreds(Credentials.java:557)
>         at sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:594)
>         ... 26 more
> Caused by: KrbException: Identifier doesn't match expected value (906)
>         at sun.security.krb5.internal.KDCRep.init(KDCRep.java:133)
>         at sun.security.krb5.internal.TGSRep.init(TGSRep.java:58)
>         at sun.security.krb5.internal.TGSRep.<init>(TGSRep.java:53)
>         at sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:46)
>         ... 31 more
> {noformat}
> It rarely happens, but when it does, the regionserver gets stuck and can never recover.
> Recently we added a log statement after a successful re-login which prints the private credentials, and finally caught the direct cause. After a successful re-login, there are two Kerberos tickets in the credentials: one is the TGT, and the other is a service ticket. The strange thing is that the service ticket is placed before the TGT. This breaks an assumption in the JDK's Kerberos library. See http://hg.openjdk.java.net/jdk8u/jdk8u60/jdk/file/935758609767/src/share/classes/sun/security/jgss/krb5/Krb5InitCredential.java, the {{getTgt}} method:
> {code:title=Krb5InitCredential}
>     return AccessController.doPrivileged(
>         new PrivilegedExceptionAction<KerberosTicket>() {
>             public KerberosTicket run() throws Exception {
>                 // It's OK to use null as serverPrincipal. TGT is almost
>                 // the first ticket for a principal and we use list.
>                 return Krb5Util.getTicket(
>                     realCaller,
>                     clientPrincipal, null, acc);
>             }});
> {code}
> So here the library will use the service ticket as the TGT to acquire a new service ticket, and the KDC will reject the request since the 'TGT' does not start with 'krbtgt'. And it can never recover, because the re-login in UGI first checks whether there is a valid TGT, and no doubt we have one...
> This usually happens when a secure connection initialization coincides with the re-login, and the end time of the service ticket indicates that it was acquired with the previous TGT. Since UGI does not prevent doAs and re-login from running at the same time, we believed there was a race condition.
> After reading the code, we found a possible race condition. See http://hg.openjdk.java.net/jdk8u/jdk8u60/jdk/file/935758609767/src/share/classes/sun/security/jgss/krb5/Krb5Context.java, the {{initSecContext}} method: it first gets the TGT, then checks whether there is already a service ticket; if not, it acquires a service ticket using the TGT and puts it into the credentials.
> And in Krb5LoginModule.logout (the Sun version), the Kerberos tickets are first removed from the credentials and then destroyed.
> Here comes the race condition. Let T1 be the secure-connection setup thread and T2 be the re-login thread.
> T1: get the TGT
> T2: remove all tickets from the credentials
> T1: check for a service ticket; find none (since all tickets have been removed)
> T1: acquire a new service ticket using the TGT and put it into the credentials
> T2: destroy all tickets
> T2: login, i.e., put a new TGT into the credentials
> It is hard to write a UT to reproduce the problem because the racing code is in the JDK, which is not written by us...
> Suggestions are welcome. Thanks.
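To make the T1/T2 interleaving above concrete, here is a minimal sketch of how an application-side caller could serialize re-login against secure connection setup so the logout/login window can never overlap initSecContext. This is an illustration under assumptions, not the HADOOP-13433 patch: the {{ReloginGuard}} class, its lock, and its method names are hypothetical; only {{UserGroupInformation.reloginFromKeytab()}} and {{doAs()}} are real Hadoop APIs.

{code:java}
import java.io.IOException;
import java.security.PrivilegedExceptionAction;
import java.util.concurrent.locks.ReentrantLock;

import org.apache.hadoop.security.UserGroupInformation;

/**
 * Illustration only (hypothetical helper, not part of Hadoop or of the
 * HADOOP-13433 patch): serialize reloginFromKeytab() against secure
 * connection setup so Krb5LoginModule's logout()/login() can never run
 * while Krb5Context.initSecContext() is between "read TGT" and
 * "store new service ticket" -- the window described above as T1/T2.
 */
public final class ReloginGuard {

  private static final ReentrantLock KERBEROS_LOCK = new ReentrantLock();

  /** T2 in the description: refresh the TGT from the keytab. */
  public static void relogin(UserGroupInformation ugi) throws IOException {
    KERBEROS_LOCK.lock();
    try {
      ugi.reloginFromKeytab();
    } finally {
      KERBEROS_LOCK.unlock();
    }
  }

  /** T1 in the description: run SASL/GSSAPI connection setup under the same lock. */
  public static <T> T setupConnection(UserGroupInformation ugi,
                                      PrivilegedExceptionAction<T> action)
      throws IOException, InterruptedException {
    KERBEROS_LOCK.lock();
    try {
      return ugi.doAs(action);
    } finally {
      KERBEROS_LOCK.unlock();
    }
  }

  private ReloginGuard() {
  }
}
{code}

Holding one process-wide lock across the whole doAs is far coarser than what a fix inside UGI would do, but it shows why removing the overlap removes the bad ticket ordering: T2 can no longer empty the Subject's private credentials between T1's TGT lookup and T1's insertion of the new service ticket.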