[ https://issues.apache.org/jira/browse/AMBARI-18096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415484#comment-15415484 ]
Hudson commented on AMBARI-18096:
---------------------------------

FAILURE: Integrated in Ambari-trunk-Commit #5498 (See [https://builds.apache.org/job/Ambari-trunk-Commit/5498/])
AMBARI-18096. YARN config to fetch new HDFS delegation tokens is not (smohanty: [http://git-wip-us.apache.org/repos/asf?p=ambari.git&a=commit&h=84609068e03bf0d9a97be15dcc192d734970e512])
* ambari-server/src/main/resources/stacks/HDP/2.3/services/stack_advisor.py


> YARN config to fetch new HDFS delegation tokens is not enabled
> --------------------------------------------------------------
>
> Key: AMBARI-18096
> URL: https://issues.apache.org/jira/browse/AMBARI-18096
> Project: Ambari
> Issue Type: Bug
> Components: stacks
> Affects Versions: 2.4.0
> Reporter: Sumit Mohanty
> Assignee: Sumit Mohanty
> Priority: Critical
> Fix For: 2.4.0
>
> Attachments: AMBARI-18096.patch
>
>
> Scenario:
> * Set dfs.namenode.delegation.token.max-lifetime=43200000 (12 hours) and
> dfs.namenode.delegation.token.renew-interval=28800000 (8 hours)
> * Start two long-running Spark Streaming applications (yarn-client mode:
> 1470217907078_0001, yarn-cluster mode: 1470217907078_0002)
> * Let these applications run for ~2 days
> * While the applications are running, inject RM failover and NN failover
> at random
> * Kill the applications
> * Try to get the application logs for the above long-running apps
> We see the error message below, which complains that the log aggregation
> service failed to init, so the app logs could not be gathered.
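The scenario's values work out to an 8-hour renew interval and a 12-hour max lifetime, so an application running for ~2 days outlives its original HDFS delegation token by a wide margin. A minimal sketch of that arithmetic, assuming an idealized renewal model (every renewal happens exactly at expiry and moves the expiry to min(max date, now + renew interval)); `renewal_schedule` and `hours_without_valid_token` are hypothetical helpers, not Hadoop code:

```python
# Scenario values (milliseconds) from the issue description.
MAX_LIFETIME_MS = 43_200_000    # dfs.namenode.delegation.token.max-lifetime (12 h)
RENEW_INTERVAL_MS = 28_800_000  # dfs.namenode.delegation.token.renew-interval (8 h)
HOUR_MS = 3_600_000


def renewal_schedule(issue_ms, renew_interval_ms, max_lifetime_ms):
    """Return the successive expiry times of one delegation token.

    Idealized model (an assumption, not Hadoop code): the token is renewed
    exactly at each expiry, every renewal moves the expiry to
    min(max_date, renewal_time + renew_interval), and no renewal is
    possible once the expiry reaches max_date = issue + max_lifetime.
    """
    if renew_interval_ms <= 0 or max_lifetime_ms <= 0:
        raise ValueError("intervals must be positive")
    max_date = issue_ms + max_lifetime_ms
    expiry = min(max_date, issue_ms + renew_interval_ms)
    expiries = [expiry]
    while expiry < max_date:
        expiry = min(max_date, expiry + renew_interval_ms)
        expiries.append(expiry)
    return expiries


def hours_without_valid_token(app_runtime_ms, issue_ms=0):
    """Hours an app outlives its original token under the model above."""
    final_expiry = renewal_schedule(issue_ms, RENEW_INTERVAL_MS, MAX_LIFETIME_MS)[-1]
    return max(0, issue_ms + app_runtime_ms - final_expiry) / HOUR_MS


print([t / HOUR_MS for t in renewal_schedule(0, RENEW_INTERVAL_MS, MAX_LIFETIME_MS)])  # [8.0, 12.0]
print(hours_without_valid_token(48 * HOUR_MS))  # 36.0
```

Under this model the token can be renewed once (8 h to 12 h) and is then dead for roughly 36 of the 48 hours, so any HDFS call made with it afterwards fails. The "can't be found in cache" error in the trace is consistent with the NameNode having already dropped the expired token.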
> {code:title=yarn-yarn-nodemanager-u14-spark-lr1-2}
> 2016-08-03 23:01:05,568 ERROR logaggregation.LogAggregationService (LogAggregationService.java:run(338)) - Failed to setup application log directory for application_1470217907078_0002
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 4 for hrt_qa) can't be found in cache
>         at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1552)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1496)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1396)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
>         at com.sun.proxy.$Proxy86.getFileInfo(Unknown Source)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:816)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:278)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:194)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:176)
>         at com.sun.proxy.$Proxy87.getFileInfo(Unknown Source)
>         at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2158)
>         at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1423)
>         at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1419)
>         at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1419)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.checkExists(LogAggregationService.java:286)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.access$100(LogAggregationService.java:67)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService$1.run(LogAggregationService.java:314)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.createAppDir(LogAggregationService.java:299)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initAppAggregator(LogAggregationService.java:405)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initApp(LogAggregationService.java:358)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:487)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:67)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
>         at java.lang.Thread.run(Thread.java:745)
> 2016-08-03 23:01:05,573 WARN logaggregation.LogAggregationService (LogAggregationService.java:initApp(363)) - Application failed to init aggregation
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 4 for hrt_qa) can't be found in cache
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
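The trace shows the NodeManager's LogAggregationService calling HDFS getFileInfo as hrt_qa with the application's original delegation token, which the NameNode no longer recognizes. The commit touches the HDP 2.3 stack_advisor.py, but this notification does not say which property it sets. In upstream YARN, `yarn.resourcemanager.proxy-user-privileges.enabled` is documented for this case: when true, the ResourceManager can request new HDFS delegation tokens on behalf of the user, which long-running services need for localization and log aggregation once the original tokens pass their max lifetime. The NameNode must also allow the RM user as a proxy user (`hadoop.proxyuser.<user>.hosts` / `.groups` in core-site). A minimal sketch of such a recommendation, assuming that property is the relevant one and that the RM runs as `yarn`; `recommend_token_refetch` is a hypothetical standalone function, not Ambari's stack-advisor API, and the `*` values are permissive placeholders:

```python
def recommend_token_refetch(yarn_site, core_site, rm_user="yarn", security_enabled=True):
    """Hypothetical helper: return config updates that would let the RM
    fetch fresh HDFS delegation tokens for long-running applications.

    yarn_site / core_site are plain dicts of current property values.
    Returns {"yarn-site": {...}, "core-site": {...}} with only the changes.
    """
    updates = {"yarn-site": {}, "core-site": {}}
    if not security_enabled:
        # This sketch only acts on Kerberized clusters, where HDFS
        # delegation tokens are in use.
        return updates

    # Let the RM request new HDFS delegation tokens on the user's behalf.
    key = "yarn.resourcemanager.proxy-user-privileges.enabled"
    if str(yarn_site.get(key, "false")).lower() != "true":
        updates["yarn-site"][key] = "true"

    # The NameNode must accept the RM user as a proxy user. "*" is a
    # placeholder; a real deployment would restrict hosts/groups.
    for suffix in ("hosts", "groups"):
        proxy_key = "hadoop.proxyuser.{}.{}".format(rm_user, suffix)
        if proxy_key not in core_site:
            updates["core-site"][proxy_key] = "*"
    return updates


print(recommend_token_refetch({}, {}))
```

Returning only the delta keeps the sketch side-effect free and leaves any existing, narrower proxy-user settings untouched.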