[ https://issues.apache.org/jira/browse/HDFS-11590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16018834#comment-16018834 ]
Nicolas Fraison commented on HDFS-11590:
----------------------------------------

Hi, any feedback on the attached patch?

> Nodemanagers have DDoS our namenode due to HDFS_DELEGATION_TOKEN expired or
> not in the cache
> --------------------------------------------------------------------------------------------
>
>                 Key: HDFS-11590
>                 URL: https://issues.apache.org/jira/browse/HDFS-11590
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs-client
>    Affects Versions: 2.6.0
>         Environment: Releases:
>                      cloudera release cdh-5.5.0
>                      openjdk version "1.8.0_91"
>                      linux centos6 servers
>                      Cluster info:
>                      Namenode and resourcemanager in HA with kerberos authentication
>                      More than 1300 datanodes/nodemanagers
>            Reporter: Nicolas Fraison
>            Priority: Minor
>         Attachments: HDFS-11590.patch
>
>
> We have faced some huge slowdowns on our namenode because all our
> nodemanagers kept retrying to renew a lease, reconnecting to the namenode
> every second for one hour, after some HDFS_DELEGATION_TOKEN had expired or
> was no longer in the cache.
> The number of TIME_WAIT connections on our namenode was stuck at the
> configured maximum of 250k during this period because of these repeated
> reconnections.
> {code}
> 2017-03-02 11:51:42,817 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: Authorization successful for appattempt_1488396860014_156103_000001 (auth:TOKEN) for protocol=interface org.apache.hadoop.yarn.api.ContainerManagementProtocolPB
> 2017-03-02 11:51:43,414 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: Authorization successful for appattempt_1488396860014_156120_000001 (auth:TOKEN) for protocol=interface org.apache.hadoop.yarn.api.ContainerManagementProtocolPB
> 2017-03-02 11:51:51,994 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:prediction (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 111018676 for prediction) is expired
> 2017-03-02 11:51:51,995 WARN org.apache.hadoop.ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 111018676 for prediction) is expired
> 2017-03-02 11:51:51,995 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:prediction (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 111018676 for prediction) is expired
> 2017-03-02 11:51:51,995 WARN org.apache.hadoop.hdfs.LeaseRenewer: Failed to renew lease for [DFSClient_NONMAPREDUCE_1560141256_4187204] for 30 seconds. Will retry shortly ...
> token (HDFS_DELEGATION_TOKEN token 111018676 for prediction) is expired
>     at org.apache.hadoop.ipc.Client.call(Client.java:1472)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1403)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
>     at com.sun.proxy.$Proxy20.renewLease(Unknown Source)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.renewLease(ClientNamenodeProtocolTranslatorPB.java:571)
>     at sun.reflect.GeneratedMethodAccessor74.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
>     at com.sun.proxy.$Proxy21.renewLease(Unknown Source)
>     at org.apache.hadoop.hdfs.DFSClient.renewLease(DFSClient.java:921)
>     at org.apache.hadoop.hdfs.LeaseRenewer.renew(LeaseRenewer.java:423)
>     at org.apache.hadoop.hdfs.LeaseRenewer.run(LeaseRenewer.java:448)
>     at org.apache.hadoop.hdfs.LeaseRenewer.access$700(LeaseRenewer.java:71)
>     at org.apache.hadoop.hdfs.LeaseRenewer$1.run(LeaseRenewer.java:304)
>     at java.lang.Thread.run(Thread.java:745)
> 2017-03-02 12:51:22,032 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:prediction (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 111018676 for prediction) can't be found in cache
> 2017-03-02 12:51:22,032 WARN org.apache.hadoop.ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 111018676 for prediction) can't be found in cache
> 2017-03-02 12:51:22,033 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:prediction (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 111018676 for prediction) can't be found in cache
> 2017-03-02 12:51:22,033 WARN org.apache.hadoop.hdfs.DFSClient: Failed to renew lease for DFSClient_NONMAPREDUCE_1560141256_4187204 for 3600 seconds (>= hard-limit =3600 seconds.) Closing all files being written ...
> token (HDFS_DELEGATION_TOKEN token 111018676 for prediction) can't be found in cache
>     at org.apache.hadoop.ipc.Client.call(Client.java:1472)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1403)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
>     at com.sun.proxy.$Proxy20.renewLease(Unknown Source)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.renewLease(ClientNamenodeProtocolTranslatorPB.java:571)
>     at sun.reflect.GeneratedMethodAccessor74.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
>     at com.sun.proxy.$Proxy21.renewLease(Unknown Source)
>     at org.apache.hadoop.hdfs.DFSClient.renewLease(DFSClient.java:921)
>     at org.apache.hadoop.hdfs.LeaseRenewer.renew(LeaseRenewer.java:423)
>     at org.apache.hadoop.hdfs.LeaseRenewer.run(LeaseRenewer.java:448)
>     at org.apache.hadoop.hdfs.LeaseRenewer.access$700(LeaseRenewer.java:71)
>     at org.apache.hadoop.hdfs.LeaseRenewer$1.run(LeaseRenewer.java:304)
>     at java.lang.Thread.run(Thread.java:745)
> 2017-03-02 12:51:27,364 WARN
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl: rollingMonitorInterval is set as -1. The log rolling mornitoring interval is disabled. The logs will be aggregated after this application is finished.
> {code}
> The root cause was that the yarn proxy configuration had been removed,
> which left the resource manager unable to renew the HDFS_DELEGATION_TOKEN.
> Even though the root cause has been identified, I don't think it is normal
> to retry a lease renewal every second for an hour when the token is expired
> or not found, because this is not an error the client can recover from.
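To make the last point concrete, here is a minimal sketch of a retry loop that stops as soon as the error is an invalid-token one. It is not the attached HDFS-11590.patch and not Hadoop's LeaseRenewer code. `LeaseRenewSketch`, `InvalidTokenException`, `Renewal`, `Outcome` and `renewWithRetry` are hypothetical names made up for illustration, with `InvalidTokenException` standing in for SecretManager$InvalidToken from the logs.

```java
import java.io.IOException;

public class LeaseRenewSketch {

    /** Hypothetical stand-in for an "expired" / "can't be found in cache" token error. */
    public static class InvalidTokenException extends IOException {
        public InvalidTokenException(String message) {
            super(message);
        }
    }

    /** Hypothetical single renewal attempt against the namenode. */
    public interface Renewal {
        void renew() throws IOException;
    }

    public enum Outcome { RENEWED, GAVE_UP_UNRECOVERABLE, GAVE_UP_AFTER_RETRIES, INTERRUPTED }

    /**
     * Retries renewal up to maxAttempts times, sleeping between attempts,
     * but returns immediately when the failure is an invalid token,
     * since reconnecting cannot make an expired or unknown token valid again.
     */
    public static Outcome renewWithRetry(Renewal renewal, int maxAttempts, long sleepMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                renewal.renew();
                return Outcome.RENEWED;
            } catch (InvalidTokenException e) {
                // Unrecoverable: stop here instead of reconnecting every second.
                return Outcome.GAVE_UP_UNRECOVERABLE;
            } catch (IOException e) {
                // Possibly transient (for example a network error): wait, then retry.
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(sleepMillis);
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        return Outcome.INTERRUPTED;
                    }
                }
            }
        }
        return Outcome.GAVE_UP_AFTER_RETRIES;
    }

    public static void main(String[] args) {
        // A renewal that always fails the way the logs above do.
        Outcome outcome = renewWithRetry(() -> {
            throw new InvalidTokenException("token is expired");
        }, 3600, 1000L);
        System.out.println(outcome); // prints GAVE_UP_UNRECOVERABLE after a single attempt
    }
}
```

With this shape, 3600 attempts one second apart (the one-hour hard limit seen in the logs) collapse to a single call when the token is invalid, while other IOExceptions are still retried.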