[ https://issues.apache.org/jira/browse/HDFS-10799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15449857#comment-15449857 ]
Wei-Chiu Chuang commented on HDFS-10799:
----------------------------------------

Correct me if I am wrong, but wouldn't ugi.doAs() in Server$Handler.run at the NameNode throw a PriviledgedActionException if the client's ticket expires?

> NameNode should use loginUser(hdfs) to serve iNotify requests
> -------------------------------------------------------------
>
>                 Key: HDFS-10799
>                 URL: https://issues.apache.org/jira/browse/HDFS-10799
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.6.0
>         Environment: Kerberized, HA cluster, iNotify client, CDH5.7.0
>            Reporter: Wei-Chiu Chuang
>            Assignee: Wei-Chiu Chuang
>         Attachments: HDFS-10799.001.patch
>
>
> When a NameNode serves iNotify requests from a client, it verifies that the
> client has superuser permission and then uses the client's Kerberos principal
> to read edits from the journal nodes.
> However, if the client does not renew its TGT, the connection from the
> NameNode to the journal nodes may fail. In that case, the NameNode thinks the
> edits are corrupt and prints a scary error message:
> "During automatic edit log failover, we noticed that all of the remaining
> edit log streams are shorter than the current one! The best remaining edit
> log ends at transaction 11577603, but we thought we could read up to
> transaction 11577606. If you continue, metadata will be lost forever!"
> However, the edits are actually good. The NameNode _should not freak out when
> an iNotify client's TGT expires_.
> I think an easy solution to this bug is, after the NameNode verifies that the
> client has superuser permission, to call {{SecurityUtil.doAsLoginUser}} and
> then read the edits. This will make sure the operation does not fail due to an
> expired client ticket.
> Excerpt of related logs:
> {noformat}
> 2016-08-18 19:05:13,979 WARN org.apache.hadoop.security.UserGroupInformation:
> PriviledgedActionException as:h...@example.com (auth:KERBEROS)
> cause:java.io.IOException: We encountered an error reading
> http://jn1.example.com:8480/getJournal?jid=nameservice1&segmentTxId=11577487&storageInfo=yyy,
> http://jn1.example.com:8480/getJournal?jid=nameservice1&segmentTxId=11577487&storageInfo=yyy.
> During automatic edit log failover, we noticed that all of the remaining
> edit log streams are shorter than the current one! The best remaining edit
> log ends at transaction 11577603, but we thought we could read up to
> transaction 11577606. If you continue, metadata will be lost forever!
> 2016-08-18 19:05:13,979 INFO org.apache.hadoop.ipc.Server: IPC Server handler
> 112 on 8020, call
> org.apache.hadoop.hdfs.protocol.ClientProtocol.getEditsFromTxid from [client
> IP:port] Call#73 Retry#0
> java.io.IOException: We encountered an error reading
> http://jn1.example.com:8480/getJournal?jid=nameservice1&segmentTxId=11577487&storageInfo=yyy,
> http://jn1.example.com:8480/getJournal?jid=nameservice1&segmentTxId=11577487&storageInfo=yyy.
> During automatic edit log failover, we noticed that all of the remaining
> edit log streams are shorter than the current one! The best remaining edit
> log ends at transaction 11577603, but we thought we could read up to
> transaction 11577606. If you continue, metadata will be lost forever!
>     at org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:213)
>     at org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:85)
>     at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.readOp(NameNodeRpcServer.java:1674)
>     at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getEditsFromTxid(NameNodeRpcServer.java:1736)
>     at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getEditsFromTxid(AuthorizationProviderProxyClientProtocol.java:1010)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getEditsFromTxid(ClientNamenodeProtocolServerSideTranslatorPB.java:1475)
>     at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
> {noformat}
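
To make the fix proposed in the issue description concrete, here is a minimal, self-contained sketch of the pattern: check the caller's privilege first, then perform the journal read under the server's own login identity instead of the caller's. It deliberately does not use Hadoop classes. Identity, doAs, readEditsFromJournal, and this simplified getEditsFromTxid are hypothetical stand-ins invented for illustration; in the real NameNode the identity switch would go through {{SecurityUtil.doAsLoginUser}}, as the description suggests, and the attached patch may differ from this sketch.

```java
import java.io.IOException;
import java.security.PrivilegedExceptionAction;

final class LoginUserEditsSketch {

    /** Hypothetical stand-in for a Kerberos identity (not Hadoop's UserGroupInformation). */
    static final class Identity {
        final String name;
        final boolean ticketValid;

        Identity(String name, boolean ticketValid) {
            this.name = name;
            this.ticketValid = ticketValid;
        }
    }

    // Identity the current thread is acting as; a toy replacement for a security context.
    private static final ThreadLocal<Identity> CURRENT = new ThreadLocal<>();

    /** Runs the action as the given identity, restoring the previous identity afterwards. */
    static <T> T doAs(Identity id, PrivilegedExceptionAction<T> action) throws IOException {
        Identity previous = CURRENT.get();
        CURRENT.set(id);
        try {
            return action.run();
        } catch (IOException e) {
            throw e;
        } catch (Exception e) {
            throw new IOException(e);
        } finally {
            CURRENT.set(previous);
        }
    }

    /** Hypothetical journal read: fails when the acting identity has no valid ticket. */
    static String readEditsFromJournal() throws IOException {
        Identity id = CURRENT.get();
        if (id == null || !id.ticketValid) {
            throw new IOException("cannot authenticate to journal node");
        }
        return "edits read as " + id.name;
    }

    /** Proposed flow: verify the caller is a superuser, then read as the login user. */
    static String getEditsFromTxid(Identity loginUser, boolean callerIsSuperuser)
            throws IOException {
        if (!callerIsSuperuser) {
            throw new IOException("superuser privilege required");
        }
        return doAs(loginUser, LoginUserEditsSketch::readEditsFromJournal);
    }

    public static void main(String[] args) throws IOException {
        Identity loginUser = new Identity("hdfs", true);
        System.out.println(getEditsFromTxid(loginUser, true));
    }
}
```

In this sketch, reading as a client whose ticket has expired fails inside doAs, while the same read under the login identity succeeds. That is the behavior the description is asking for: an expired client ticket should not be reported as a corrupt edit log.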