[ https://issues.apache.org/jira/browse/HDFS-10799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wei-Chiu Chuang updated HDFS-10799:
-----------------------------------
    Description: 
When a NameNode serves iNotify requests from a client, it verifies that the 
client has superuser permission and then uses the client's Kerberos principal 
to read edits from the journal nodes.

However, if the client does not renew its Kerberos TGT, the connection from 
the NameNode to the journal nodes may fail. In that case, the NameNode thinks 
the edits are corrupt and prints a scary error message:
"During automatic edit log failover, we noticed that all of the remaining edit 
log streams are shorter than the current one!  The best remaining edit log ends 
at transaction 11577603, but we thought we could read up to transaction 
11577606.  If you continue, metadata will be lost forever!"

However, the edits are actually good. The NameNode _should not freak out when 
an iNotify client's TGT expires_.

I think an easy fix for this bug is to have the NameNode, after it verifies 
that the client has superuser permission, wrap the edit read in 
{{SecurityUtil.doAsLoginUser}}. This makes sure the operation does not fail 
because of an expired client ticket.
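
Roughly what I have in mind (a sketch only, not a patch; {{readEditsSinceTxid}} 
below is just a stand-in for the existing body of 
{{NameNodeRpcServer#getEditsFromTxid}}, and the surrounding names are from 
memory):

{code:java}
import java.io.IOException;
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.hdfs.inotify.EventBatchList;
import org.apache.hadoop.security.SecurityUtil;

// Sketch: authorize the iNotify caller as before, but read the edits from the
// journal nodes as the NameNode's own login user (e.g. hdfs), so an expired
// client TGT cannot break the read.
public EventBatchList getEditsFromTxid(final long txid) throws IOException {
  namesystem.checkSuperuserPrivilege();  // still reject non-superuser callers
  return SecurityUtil.doAsLoginUser(
      new PrivilegedExceptionAction<EventBatchList>() {
        @Override
        public EventBatchList run() throws IOException {
          // placeholder for the existing logic that opens the edit log
          // streams and reads ops, now under the NameNode's own credentials
          return readEditsSinceTxid(txid);
        }
      });
}
{code}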

Excerpt of related logs:
{noformat}
2016-08-18 19:05:13,979 WARN org.apache.hadoop.security.UserGroupInformation: 
PriviledgedActionException as:h...@example.com (auth:KERBEROS) 
cause:java.io.IOException: We encountered an error reading 
http://jn1.example.com:8480/getJournal?jid=nameservice1&segmentTxId=11577487&storageInfo=yyy,
 
http://jn1.example.com:8480/getJournal?jid=nameservice1&segmentTxId=11577487&storageInfo=yyy.
  During automatic edit log failover, we noticed that all of the remaining edit 
log streams are shorter than the current one!  The best remaining edit log ends 
at transaction 11577603, but we thought we could read up to transaction 
11577606.  If you continue, metadata will be lost forever!
2016-08-18 19:05:13,979 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
112 on 8020, call 
org.apache.hadoop.hdfs.protocol.ClientProtocol.getEditsFromTxid from [client 
IP:port] Call#73 Retry#0
java.io.IOException: We encountered an error reading 
http://jn1.example.com:8480/getJournal?jid=nameservice1&segmentTxId=11577487&storageInfo=yyy,
 
http://jn1.example.com:8480/getJournal?jid=nameservice1&segmentTxId=11577487&storageInfo=yyy.
  During automatic edit log failover, we noticed that all of the remaining edit 
log streams are shorter than the current one!  The best remaining edit log ends 
at transaction 11577603, but we thought we could read up to transaction 
11577606.  If you continue, metadata will be lost forever!
        at 
org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:213)
        at 
org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:85)
        at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.readOp(NameNodeRpcServer.java:1674)
        at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getEditsFromTxid(NameNodeRpcServer.java:1736)
        at 
org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getEditsFromTxid(AuthorizationProviderProxyClientProtocol.java:1010)
        at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getEditsFromTxid(ClientNamenodeProtocolServerSideTranslatorPB.java:1475)
        at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
{noformat}



> NameNode should use loginUser(hdfs) to serve iNotify requests
> -------------------------------------------------------------
>
>                 Key: HDFS-10799
>                 URL: https://issues.apache.org/jira/browse/HDFS-10799
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.6.0
>         Environment: Kerberized, HA cluster, iNotify client, CDH5.7.0
>            Reporter: Wei-Chiu Chuang
>            Assignee: Wei-Chiu Chuang
>


