Cindy Li created MAPREDUCE-6175:
---
Summary: RM failover lead to failure to recognition of application
attempt
Key: MAPREDUCE-6175
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6175
Project: Hadoop Map/Reduce
Issue Type: Bug
Reporter: Cindy Li
Seeing this from client side (Hive job):
2014-11-25 10:00:50,179 Stage-3 map = 100%, reduce = 99%, Cumulative CPU
136560.72 sec
2014-11-25 10:00:54,776 Stage-3 map = 100%, reduce = 0%, Cumulative CPU
42627.6 sec
2014-11-25 10:01:20,097 Stage-3 map = 0%, reduce = 0%
2014-11-25 10:02:20,348 Stage-3 map = 0%, reduce = 0%
2014-11-25 10:02:30,702 Stage-3 map = 1%, reduce = 0%, Cumulative CPU 1511.98
sec
Seeing this resource manager (rm2):
[14:16]:[hadoop@phxaishdc20en0008-be:logs]# grep
container_1416845430616_0009_01_01 *
hadoop-hadoop-resourcemanager.log.2014-11-25-10:2014-11-25 10:01:09,757 INFO
org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: received
container statuses on node manager register :[ContainerStatus: [ContainerId:
container_1416845430616_0009_01_01, State: COMPLETE, Diagnostics: ,
ExitStatus: 0, ], ContainerStatus: [ContainerId:
container_1416845430616_0014_01_000241, State: COMPLETE, Diagnostics: Container
Killed by ResourceManager
Seeing this in container log (Logs for container_1416845430616_0009_01_01)
2014-11-25 10:59:17,839 INFO [IPC Server handler 2 on 36552]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt
attempt_1416845430616_0009_r_00_0 is : 0.9893525
2014-11-25 10:59:47,905 INFO [IPC Server handler 26 on 36552]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt
attempt_1416845430616_0009_r_00_0 is : 0.9893525
2014-11-25 11:00:01,672 INFO [RMCommunicator Allocator]
org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider: Failing over
to rm2
2014-11-25 11:00:17,977 INFO [IPC Server handler 5 on 36552]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt
attempt_1416845430616_0009_r_00_0 is : 0.9893525
2014-11-25 11:00:36,882 INFO [RMCommunicator Allocator]
org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider: Failing over
to rm1
2014-11-25 11:00:48,044 INFO [IPC Server handler 2 on 36552]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt
attempt_1416845430616_0009_r_00_0 is : 0.9893525
2014-11-25 11:00:52,901 INFO [RMCommunicator Allocator]
org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider: Failing over
to rm2
2014-11-25 11:00:52,912 ERROR [RMCommunicator Allocator]
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Error communicating
with RM: Resource Manager doesn't recognize AttemptId:
application_1416845430616_0009
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Resource Manager
doesn't recognize AttemptId: application_1416845430616_0009
at
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getResources(RMContainerAllocator.java:580)
at
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:220)
at
org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$1.run(RMCommunicator.java:268)
at java.lang.Thread.run(Thread.java:745)
2014-11-25 11:00:52,915 INFO [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1416845430616_0009Job
Transitioned from RUNNING to REBOOT
2014-11-25 11:00:52,916 INFO [Thread-3928]
org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Notify RMCommunicator
isAMLastRetry: false
2014-11-25 11:00:52,916 INFO [Thread-3928]
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: RMCommunicator
notified that shouldUnregistered is: false
2014-11-25 11:00:52,916 INFO [Thread-3928]
org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Notify JHEH isAMLastRetry: false
2014-11-25 11:00:52,916 INFO [Thread-3928]
org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler:
JobHistoryEventHandler notified that forceJobCompletion is false
2014-11-25 11:00:52,916 INFO [Thread-3928]
org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Calling stop for all the
services
2014-11-25 11:00:52,917 INFO [Thread-3928]
org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Stopping
JobHistoryEventHandler. Size of the outstanding queue size is 0
2014-11-25 11:00:52,946 INFO [Thread-3928]
org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Stopped
JobHistoryEventHandler. super.stop()
2014-11-25 11:00:52,947 INFO [Thread-3928]
org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING
attempt_1416845430616_0009_r_00_0
2014-11-25 11:00:52,955 INFO [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl:
attempt_1416845430616_0009_r_00_0 TaskAttempt Transitioned from RUNNING to
KILLED
2014-11-25