[ 
https://issues.apache.org/jira/browse/HADOOP-13433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15855543#comment-15855543
 ] 

Duo Zhang commented on HADOOP-13433:
------------------------------------

Any other concerns on the patches for branch-2.8 and branch-2.7? [~xiaochen] 
[~ste...@apache.org].

Thanks.

> Race in UGI.reloginFromKeytab
> -----------------------------
>
>                 Key: HADOOP-13433
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13433
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 2.8.0, 2.7.3, 2.6.5, 3.0.0-alpha1
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>             Fix For: 2.9.0, 3.0.0-alpha3
>
>         Attachments: HADOOP-13433-branch-2.7.patch, 
> HADOOP-13433-branch-2.7-v1.patch, HADOOP-13433-branch-2.8.patch, 
> HADOOP-13433-branch-2.8.patch, HADOOP-13433-branch-2.patch, 
> HADOOP-13433.patch, HADOOP-13433-v1.patch, HADOOP-13433-v2.patch, 
> HADOOP-13433-v4.patch, HADOOP-13433-v5.patch, HADOOP-13433-v6.patch, 
> HBASE-13433-testcase-v3.patch
>
>
> This is a problem that has troubled us for several years. For our HBase 
> cluster, sometimes the RS will be stuck due to
> {noformat}
> 2016-06-20,03:44:12,936 INFO org.apache.hadoop.ipc.SecureClient: Exception 
> encountered while connecting to the server :
> javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: No valid credentials provided (Mechanism level: The ticket 
> isn't for us (35) - BAD TGS SERVER NAME)]
>         at 
> com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:194)
>         at 
> org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:140)
>         at 
> org.apache.hadoop.hbase.ipc.SecureClient$SecureConnection.setupSaslConnection(SecureClient.java:187)
>         at 
> org.apache.hadoop.hbase.ipc.SecureClient$SecureConnection.access$700(SecureClient.java:95)
>         at 
> org.apache.hadoop.hbase.ipc.SecureClient$SecureConnection$2.run(SecureClient.java:325)
>         at 
> org.apache.hadoop.hbase.ipc.SecureClient$SecureConnection$2.run(SecureClient.java:322)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1781)
>         at sun.reflect.GeneratedMethodAccessor23.invoke(Unknown Source)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.hbase.util.Methods.call(Methods.java:37)
>         at org.apache.hadoop.hbase.security.User.call(User.java:607)
>         at org.apache.hadoop.hbase.security.User.access$700(User.java:51)
>         at 
> org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:461)
>         at 
> org.apache.hadoop.hbase.ipc.SecureClient$SecureConnection.setupIOstreams(SecureClient.java:321)
>         at 
> org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1164)
>         at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:1004)
>         at 
> org.apache.hadoop.hbase.ipc.SecureRpcEngine$Invoker.invoke(SecureRpcEngine.java:107)
>         at $Proxy24.replicateLogEntries(Unknown Source)
>         at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:962)
>         at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.runLoop(ReplicationSource.java:466)
>         at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:515)
> Caused by: GSSException: No valid credentials provided (Mechanism level: The 
> ticket isn't for us (35) - BAD TGS SERVER NAME)
>         at 
> sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:663)
>         at 
> sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:248)
>         at 
> sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:180)
>         at 
> com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:175)
>         ... 23 more
> Caused by: KrbException: The ticket isn't for us (35) - BAD TGS SERVER NAME
>         at sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:64)
>         at sun.security.krb5.KrbTgsReq.getReply(KrbTgsReq.java:185)
>         at 
> sun.security.krb5.internal.CredentialsUtil.serviceCreds(CredentialsUtil.java:294)
>         at 
> sun.security.krb5.internal.CredentialsUtil.acquireServiceCreds(CredentialsUtil.java:106)
>         at 
> sun.security.krb5.Credentials.acquireServiceCreds(Credentials.java:557)
>         at 
> sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:594)
>         ... 26 more
> Caused by: KrbException: Identifier doesn't match expected value (906)
>         at sun.security.krb5.internal.KDCRep.init(KDCRep.java:133)
>         at sun.security.krb5.internal.TGSRep.init(TGSRep.java:58)
>         at sun.security.krb5.internal.TGSRep.<init>(TGSRep.java:53)
>         at sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:46)
>         ... 31 more​
> {noformat}
> It rarely happens, but if it happens, the regionserver will be stuck and can 
> never recover.
> Recently we added a log after a successful re-login which prints the private 
> credentials, and finally catched the direct reason. After a successful 
> re-login, we have two kerberos tickets in the credentials, one is the TGT, 
> and the other is a service ticket. The strange thing is that, the service 
> ticket is placed before TGT. This breaks the assumption of jdk's kerberos 
> library. See 
> http://hg.openjdk.java.net/jdk8u/jdk8u60/jdk/file/935758609767/src/share/classes/sun/security/jgss/krb5/Krb5InitCredential.java,
>  the {{getTgt}} Method
> {code:title=Krb5InitCredential}
>             return AccessController.doPrivileged(
>                 new PrivilegedExceptionAction<KerberosTicket>() {
>                 public KerberosTicket run() throws Exception {
>                     // It's OK to use null as serverPrincipal. TGT is almost
>                     // the first ticket for a principal and we use list.
>                     return Krb5Util.getTicket(
>                         realCaller,
>                         clientPrincipal, null, acc);
>                         }});
> {code}
> So here, the library will use the service ticket as TGT to acquire a service 
> ticket, and KDC will reject the request since the 'TGT' does not start with 
> 'krbtgt'. And it can never recover because in UGI, the re-login will check if 
> there is a valid TGT first and no doubt, we have one...
> This usually happens when a secure connection initialization comes along with 
> the re-login, and the end time indicates that the service ticket is acquired 
> by the previous TGT. Since UGI does not prevent doAs and re-login happen at 
> the same time, we believe that there is a race condition.
> After reading the code, we found a possible race condition.
> See 
> http://hg.openjdk.java.net/jdk8u/jdk8u60/jdk/file/935758609767/src/share/classes/sun/security/jgss/krb5/Krb5Context.java,
>  the {{initSecContext}} method, we will get TGT first, then check if there is 
> already a service ticket, if not, acquire a service ticket using the TGT, and 
> put it into the credentials.
> And in Krb5LoginModule.logout(the sun version), we will remove the kerberos 
> tickets from the credentials first, and then destroy them.
> Here comes the race condition. Let T1 be the secure connection set up thread, 
> T2 be the re-login thread.
> T1: get TGT
> T2: remove all tickets from credentials
> T1: check service ticket, none(since all tickets have been removed)
> T1: acquire a new service ticket using TGT and put it into the credentials
> T2: destroy all tickets
> T2: login, i.e., put a new TGT into the credentials.
> It is hard to write a UT to produce the problem because the racing code is in 
> jdk, which is not written by us...
> Suggestions are welcomed. Thanks.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org

Reply via email to