HA+security: failed to run a mapred job from yarn after a manual failover
-------------------------------------------------------------------------

                 Key: HDFS-3083
                 URL: https://issues.apache.org/jira/browse/HDFS-3083
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: ha, security
    Affects Versions: 0.24.0, 0.23.3
            Reporter: Mingjie Lai
            Priority: Critical
             Fix For: 0.24.0, 0.23.3

Steps to reproduce:
- turn on HA and security
- run a mapred job and wait for it to finish
- fail over to another namenode
- run the mapred job again; it fails

The job's delegation token still indicates the original active namenode. This seems to cause the NM to fail to obtain a DT for the new NN. (?)

{code}
$ hdfs dfs -cat hdfs://ns1:8020/tmp/hadoop-yarn/staging/yarn/.staging/job_1331619043691_0001/appTokens
HDTS ha-hdfs:ns1@(yarn/nn1.hadoop.local@HADOOP.LOCALDOMAINyarn�6 �L��6.�ЛFs��r�%�B�'��{pR�HDFS_DELEGATION_TOKEN ha-hdfs:ns
{code}

Exceptions:
{code}
12/03/13 06:19:44 INFO mapred.ResourceMgrDelegate: Submitted application application_1331619043691_0002 to ResourceManager at nn1.hadoop.local/10.177.23.38:7090
12/03/13 06:19:45 INFO mapreduce.Job: The url to track the job: http://nn1.hadoop.local:7050/proxy/application_1331619043691_0002/
12/03/13 06:19:45 INFO mapreduce.Job: Running job: job_1331619043691_0002
12/03/13 06:19:47 INFO mapreduce.Job: Job job_1331619043691_0002 running in uber mode : false
12/03/13 06:19:47 INFO mapreduce.Job: map 0% reduce 0%
12/03/13 06:19:47 INFO mapreduce.Job: Job job_1331619043691_0002 failed with state FAILED due to: Application application_1331619043691_0002 failed 1 times due to AM Container for appattempt_1331619043691_0002_000001 exited with exitCode: -1000 due to: RemoteTrace:
org.apache.hadoop.security.token.SecretManager$InvalidToken: token (HDFS_DELEGATION_TOKEN token 40 for yarn) can't be found in cache
    at org.apache.hadoop.ipc.Client.call(Client.java:1159)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:188)
    at $Proxy28.getFileInfo(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:622)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
    at $Proxy29.getFileInfo(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1260)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:718)
    at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:88)
    at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:49)
    at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:157)
    at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:155)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:153)
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
 at LocalTrace:
    org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: token (HDFS_DELEGATION_TOKEN token 40 for yarn) can't be found in cache
        at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.convertFromProtoFormat(LocalResourceStatusPBImpl.java:217)
        at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.getException(LocalResourceStatusPBImpl.java:147)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.update(ResourceLocalizationService.java:827)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:497)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:222)
        at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:46)
        at org.apache.hadoop.yarn.proto.LocalizationProtocol$LocalizationProtocolService$2.callBlockingMethod(LocalizationProtocol.java:57)
        at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Server.call(ProtoOverHadoopRpcEngine.java:355)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1660)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1656)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1654)
.Failing this attempt.. Failing the application.
12/03/13 06:19:47 INFO mapreduce.Job: Counters: 0
Job ended: Tue Mar 13 06:19:47 UTC 2012
The job took 3 seconds.
{code}

--
This message is automatically generated by JIRA.