[ 
https://issues.apache.org/jira/browse/HDFS-13040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16359236#comment-16359236
 ] 

Xiao Chen commented on HDFS-13040:
----------------------------------

Hey [~daryn],

Wei-Chiu is out and I worked with Istvan on this one. (Thanks again for having 
a reproducing env [~pifta]!)

The stacktrace is below:
{noformat}
2018-02-09 17:59:55,779 ERROR 
org.apache.hadoop.hdfs.server.namenode.EditLogInputStream: caught exception 
initializing 
http://IP:8480/getJournal?jid=nameservice1&segmentTxId=230876&storageInfo=-60%3A1546104427%3A0%3Acluster2
java.io.IOException: 
org.apache.hadoop.security.authentication.client.AuthenticationException: 
GSSException: No valid credentials provided (Mechanism level: Failed to find 
any Kerberos tgt)
        at 
org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream$URLLog$1.run(EditLogFileInputStream.java:473)
        at 
org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream$URLLog$1.run(EditLogFileInputStream.java:465)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1944)
        at 
org.apache.hadoop.security.SecurityUtil.doAsUser(SecurityUtil.java:477)
        at 
org.apache.hadoop.security.SecurityUtil.doAsCurrentUser(SecurityUtil.java:471)
        at 
org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream$URLLog.getInputStream(EditLogFileInputStream.java:464)
        at 
org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream.init(EditLogFileInputStream.java:141)
        at 
org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream.nextOpImpl(EditLogFileInputStream.java:192)
        at 
org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream.nextOp(EditLogFileInputStream.java:250)
        at 
org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:85)
        at 
org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.skipUntil(EditLogInputStream.java:151)
        at 
org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:178)
        at 
org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:85)
        at 
org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.skipUntil(EditLogInputStream.java:151)
        at 
org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:178)
        at 
org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:85)
        at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.readOp(NameNodeRpcServer.java:1716)
        at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getEditsFromTxid(NameNodeRpcServer.java:1778)
        at 
org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getEditsFromTxid(AuthorizationProviderProxyClientProtocol.java:1011)
        at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getEditsFromTxid(ClientNamenodeProtocolServerSideTranslatorPB.java:1490)
        at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2220)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2216)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1944)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2214)
Caused by: 
org.apache.hadoop.security.authentication.client.AuthenticationException: 
GSSException: No valid credentials provided (Mechanism level: Failed to find 
any Kerberos tgt)
        at 
org.apache.hadoop.security.authentication.client.KerberosAuthenticator.doSpnegoSequence(KerberosAuthenticator.java:338)
        at 
org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:206)
        at 
org.apache.hadoop.security.authentication.client.AuthenticatedURL.openConnection(AuthenticatedURL.java:215)
        at 
org.apache.hadoop.hdfs.web.URLConnectionFactory.openConnection(URLConnectionFactory.java:161)
        at 
org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream$URLLog$1.run(EditLogFileInputStream.java:470)
        ... 30 more
Caused by: GSSException: No valid credentials provided (Mechanism level: Failed 
to find any Kerberos tgt)
        at 
sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:147)
        at 
sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:121)
        at 
sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Krb5MechFactory.java:187)
        at 
sun.security.jgss.GSSManagerImpl.getMechanismContext(GSSManagerImpl.java:223)
        at 
sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:212)
        at 
sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
        at 
org.apache.hadoop.security.authentication.client.KerberosAuthenticator$1.run(KerberosAuthenticator.java:317)
        at 
org.apache.hadoop.security.authentication.client.KerberosAuthenticator$1.run(KerberosAuthenticator.java:288)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at 
org.apache.hadoop.security.authentication.client.KerberosAuthenticator.doSpnegoSequence(KerberosAuthenticator.java:288)
        ... 34 more
{noformat}

Fun fact from git history looking, seems like inotify (or at least the 
{{DFSInotifyEventInputStream}} class) only has 2 real historical changes - 
HDFS-6634 (the initial implementation), and HDFS-7446 (improvement, unrelated 
to security). HDFS-6634 doesn't have a clear security design, and the most 
'secure' comments I can search out is by Hadoop QA. :)

> Kerberized inotify client fails despite kinit properly
> ------------------------------------------------------
>
>                 Key: HDFS-13040
>                 URL: https://issues.apache.org/jira/browse/HDFS-13040
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.6.0
>         Environment: Kerberized, HA cluster, iNotify client, CDH5.10.2
>            Reporter: Wei-Chiu Chuang
>            Assignee: Wei-Chiu Chuang
>            Priority: Major
>         Attachments: HDFS-13040.001.patch, 
> TestDFSInotifyEventInputStreamKerberized.java, TransactionReader.java
>
>
> This issue is similar to HDFS-10799.
> HDFS-10799 turned out to be a client side issue where client is responsible 
> for renewing kerberos ticket actively.
> However we found in a slightly setup even if client has valid Kerberos 
> credentials, inotify still fails.
> Suppose client uses principal h...@example.com, 
>  namenode 1 uses server principal hdfs/nn1.example....@example.com
>  namenode 2 uses server principal hdfs/nn2.example....@example.com
> *After Namenodes starts for longer than kerberos ticket lifetime*, the client 
> fails with the following error:
> {noformat}
> 18/01/19 11:23:02 WARN security.UserGroupInformation: 
> PriviledgedActionException as:h...@gce.cloudera.com (auth:KERBEROS) 
> cause:org.apache.hadoop.ipc.RemoteException(java.io.IOException): We 
> encountered an error reading 
> https://nn2.example.com:8481/getJournal?jid=ns1&segmentTxId=8662&storageInfo=-60%3A353531113%3A0%3Acluster3,
>  
> https://nn1.example.com:8481/getJournal?jid=ns1&segmentTxId=8662&storageInfo=-60%3A353531113%3A0%3Acluster3.
>   During automatic edit log failover, we noticed that all of the remaining 
> edit log streams are shorter than the current one!  The best remaining edit 
> log ends at transaction 8683, but we thought we could read up to transaction 
> 8684.  If you continue, metadata will be lost forever!
>         at 
> org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:213)
>         at 
> org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:85)
>         at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.readOp(NameNodeRpcServer.java:1701)
>         at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getEditsFromTxid(NameNodeRpcServer.java:1763)
>         at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getEditsFromTxid(AuthorizationProviderProxyClientProtocol.java:1011)
>         at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getEditsFromTxid(ClientNamenodeProtocolServerSideTranslatorPB.java:1490)
>         at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2216)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2212)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2210)
> {noformat}
> Typically if NameNode has an expired Kerberos ticket, the error handling for 
> the typical edit log tailing would let NameNode to relogin with its own 
> Kerberos principal. However, when inotify uses the same code path to retrieve 
> edits, since the current user is the inotify client's principal, unless 
> client uses the same principal as the NameNode, NameNode can't do it on 
> behalf of the client.
> Therefore, a more appropriate approach is to use proxy user so that NameNode 
> can retrieving edits on behalf of the client.
> I will attach a patch to fix it. This patch has been verified to work for a 
> CDH5.10.2 cluster, however it seems impossible to craft a unit test for this 
> fix because the way Hadoop UGI handles Kerberos credentials (I can't have a 
> single process that logins as two Kerberos principals simultaneously and let 
> them establish connection)
> A possible workaround is for the inotify client to use the active NameNode's 
> server principal. However, that's not going to work when there's a namenode 
> failover, because then the client's principal will not be consistent with the 
> active NN's one, and then fails to authenticate.
> Credit: this bug was confirmed and reproduced by [~pifta] and [~r1pp3rj4ck]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

Reply via email to