[ https://issues.apache.org/jira/browse/MAPREDUCE-5471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jian He resolved MAPREDUCE-5471.
--------------------------------

    Resolution: Duplicate

Closed as a duplicate of YARN-540

> Succeed job tries to restart after RMrestart
> --------------------------------------------
>
>                 Key: MAPREDUCE-5471
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5471
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Yesha Vora
>            Assignee: Jian He
>            Priority: Blocker
>         Attachments: MR5471-1AM.log, MR5471-2AM.log
>
>
> Run a job and restart the RM just as the job finishes. The job should not be restarted once it has succeeded.
> After the RM restart, the AM of the restarted job fails with the error below.
> AM log after RM restart:
> 013-08-19 17:29:21,144 INFO [main] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Stopping JobHistoryEventHandler. Size of the outstanding queue size is 0
> 2013-08-19 17:29:21,145 INFO [main] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Stopped JobHistoryEventHandler. super.stop()
> 2013-08-19 17:29:21,146 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Deleting staging directory hdfs://host1:port1/user/ABC/.staging/job_1376933101704_0001
> 2013-08-19 17:29:21,156 FATAL [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.FileNotFoundException: File does not exist: hdfs://host1:port1/ABC/.staging/job_1376933101704_0001/job.splitmetainfo
>     at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1469)
>     at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1324)
>     at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1291)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>     at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:922)
>     at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:131)
>     at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1184)
>     at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:995)
>     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>     at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1394)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1477)
>     at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1390)
>     at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1323)
> Caused by: java.io.FileNotFoundException: File does not exist: hdfs://host1:port1/ABC/.staging/job_1376933101704_0001/job.splitmetainfo
>     at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1121)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1113)
>     at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:78)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1113)
>     at org.apache.hadoop.mapreduce.split.SplitMetaInfoReader.readSplitMetaInfo(SplitMetaInfoReader.java:51)
>     at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1464)
>     ... 17 more
> 2013-08-19 17:29:21,158 INFO [Thread-2] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: MRAppMaster received a signal. Signaling RMCommunicator and JobHistoryEventHandler.
> 2013-08-19 17:29:21,159 WARN [Thread-2] org.apache.hadoop.util.ShutdownHookManager: ShutdownHook 'MRAppMasterShutdownHook' failed, java.lang.NullPointerException
> java.lang.NullPointerException
>     at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter.setSignalled(MRAppMaster.java:805)
>     at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$MRAppMasterShutdownHook.run(MRAppMaster.java:1344)
>     at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira