Nicolas Fraison created HADOOP-14252: ----------------------------------------
Summary: Nodemanagers have DDoS our namenode due to HDFS_DELEGATION_TOKEN expired or not in the cache Key: HADOOP-14252 URL: https://issues.apache.org/jira/browse/HADOOP-14252 Project: Hadoop Common Issue Type: Bug Components: hdfs-client Affects Versions: 2.6.0 Environment: Releases: cloudera release cdh-5.5.0 openjdk version "1.8.0_91" linux centos6 servers Cluster info: Namenode and resourcemanager in HA with kerberos authentication More than 1300 datanodes/nodemanagers Reporter: Nicolas Fraison Priority: Minor We have faced some huge slowdowns on our namenode due to all our nodemanagers continuing to retry to renew a lease and reconnecting to the namenode every second during 1 hour due to some HDFS_DELEGATION_TOKEN being expired or not in the cache. The number of time_wait connection on our namenode was stuck to the maximum configured of 250k during this period due to the reconnections each time. {code} 2017-03-02 11:51:42,817 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: Authorization successful for appattempt_1488396860014_156103_000001 (auth:TOKEN) for protocol=interface org.apache.hadoop.yarn.api.ContainerManagementProtocolPB 2017-03-02 11:51:43,414 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: Authorization successful for appattempt_1488396860014_156120_000001 (auth:TOKEN) for protocol=interface org.apache.hadoop.yarn.api.ContainerManagementProtocolPB 2017-03-02 11:51:51,994 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:prediction (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 111018676 for prediction) is expired 2017-03-02 11:51:51,995 WARN org.apache.hadoop.ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 111018676 for prediction) is expired 2017-03-02 11:51:51,995 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:prediction (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 111018676 for prediction) is expired 2017-03-02 11:51:51,995 WARN org.apache.hadoop.hdfs.LeaseRenewer: Failed to renew lease for [DFSClient_NONMAPREDUCE_1560141256_4187204] for 30 seconds. Will retry shortly ... token (HDFS_DELEGATION_TOKEN token 111018676 for prediction) is expired at org.apache.hadoop.ipc.Client.call(Client.java:1472) at org.apache.hadoop.ipc.Client.call(Client.java:1403) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230) at com.sun.proxy.$Proxy20.renewLease(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.renewLease(ClientNamenodeProtocolTranslatorPB.java:571) at sun.reflect.GeneratedMethodAccessor74.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104) at com.sun.proxy.$Proxy21.renewLease(Unknown Source) at org.apache.hadoop.hdfs.DFSClient.renewLease(DFSClient.java:921) at org.apache.hadoop.hdfs.LeaseRenewer.renew(LeaseRenewer.java:423) at org.apache.hadoop.hdfs.LeaseRenewer.run(LeaseRenewer.java:448) at org.apache.hadoop.hdfs.LeaseRenewer.access$700(LeaseRenewer.java:71) at org.apache.hadoop.hdfs.LeaseRenewer$1.run(LeaseRenewer.java:304) at java.lang.Thread.run(Thread.java:745) 2017-03-02 12:51:22,032 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:prediction (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 111018676 for prediction) can't be found in cache 2017-03-02 12:51:22,032 WARN org.apache.hadoop.ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 111018676 for prediction) can't be found in cache 2017-03-02 12:51:22,033 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:prediction (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 111018676 for prediction) can't be found in cache 2017-03-02 12:51:22,033 WARN org.apache.hadoop.hdfs.DFSClient: Failed to renew lease for DFSClient_NONMAPREDUCE_1560141256_4187204 for 3600 seconds (>= hard-limit =3600 seconds.) Closing all files being written ... token (HDFS_DELEGATION_TOKEN token 111018676 for prediction) can't be found in cache at org.apache.hadoop.ipc.Client.call(Client.java:1472) at org.apache.hadoop.ipc.Client.call(Client.java:1403) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230) at com.sun.proxy.$Proxy20.renewLease(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.renewLease(ClientNamenodeProtocolTranslatorPB.java:571) at sun.reflect.GeneratedMethodAccessor74.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104) at com.sun.proxy.$Proxy21.renewLease(Unknown Source) at org.apache.hadoop.hdfs.DFSClient.renewLease(DFSClient.java:921) at org.apache.hadoop.hdfs.LeaseRenewer.renew(LeaseRenewer.java:423) at org.apache.hadoop.hdfs.LeaseRenewer.run(LeaseRenewer.java:448) at org.apache.hadoop.hdfs.LeaseRenewer.access$700(LeaseRenewer.java:71) at org.apache.hadoop.hdfs.LeaseRenewer$1.run(LeaseRenewer.java:304) at java.lang.Thread.run(Thread.java:745) 2017-03-02 12:51:27,364 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl: rollingMonitorInterval is set as -1. The log rolling mornitoring interval is disabled. The logs will be aggregated after this application is finished. {code} The root cause is the yarn proxy configuration having been removed, which in turn causes the resource manager to be unable to renew the HDFS_DELEGATION_TOKEN. Even though the root cause has been identified, I don't think retrying to renew a lease every second for an hour when there is an expiry/not found token issue is normal because this is not an issue that can be recovered. -- This message was sent by Atlassian JIRA (v6.3.15#6346) --------------------------------------------------------------------- To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org