[ https://issues.apache.org/jira/browse/MAPREDUCE-3932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Arun C Murthy closed MAPREDUCE-3932.
------------------------------------

> MR tasks failing and crashing the AM when available-resources/headRoom becomes zero
> -----------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3932
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3932
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Robert Joseph Evans
>            Priority: Critical
>             Fix For: 0.23.3, 2.0.2-alpha
>
>         Attachments: MR-3932.txt, MR-3932.txt
>
>
> [~karams] reported this offline. One reduce task gets preempted because of
> zero headRoom and crashes the AM.
> {code}
> 2012-02-23 11:30:15,956 INFO [RMCommunicator Allocator]
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling:
> PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0
> AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4
> containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4
> availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,959 INFO [RMCommunicator Allocator]
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before
> Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23
> AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0
> containersAllocated:4 containersReleased:0 hostLocalAssigned:0
> rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator]
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling:
> PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0
> AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4
> containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4
> availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator]
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Assign:
> PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0
> AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4
> containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4
> availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator]
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated
> containers 3
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator]
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator]
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned
> container container_1329995034628_0983_01_000006 to
> attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator]
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator]
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned
> container container_1329995034628_0983_01_000007 to
> attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator]
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator]
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned
> container container_1329995034628_0983_01_000008 to
> attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator]
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Assign:
> PendingReduces:377 ScheduledMaps:6 ScheduledReduces:20 AssignedMaps:0
> AssignedReduces:3 completedMaps:4 completedReduces:0 containersAllocated:7
> containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4
> availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator]
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all
> scheduled reduces:20
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator]
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Going to preempt 2
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator]
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting
> attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator]
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting
> attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator]
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating
> schedule...
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator]
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
> completedMapPercent 0.4 totalMemLimit:4608 finalMapMemLimit:2765
> finalReduceMemLimit:1843 netScheduledMapMem:9216 netScheduledReduceMem:4608
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator]
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down 0
> 2012-02-23 11:30:16,968 INFO [AsyncDispatcher event handler]
> org.apache.hadoop.yarn.util.RackResolver: Resolved $host6 to /$rack6
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler]
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl:
> attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from
> UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler]
> org.apache.hadoop.yarn.util.RackResolver: Resolved $host1 to /$rack1
> 2012-02-23 11:30:16,977 INFO [AsyncDispatcher event handler]
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl:
> attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from
> UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,981 INFO [AsyncDispatcher event handler]
> org.apache.hadoop.yarn.util.RackResolver: Resolved $host9 to /$rack9
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler]
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl:
> attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from
> UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler]
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl:
> attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from ASSIGNED
> to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [AsyncDispatcher event handler]
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl:
> attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from ASSIGNED
> to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8]
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing
> the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt
> attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #0]
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing
> the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt
> attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8]
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching
> attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #0]
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching
> attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #1]
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing
> the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt
> attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #3]
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing
> the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt
> attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,987 INFO [ContainerLauncher #9]
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing
> the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt
> attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 INFO [ContainerLauncher #9]
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching
> attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 ERROR [ContainerLauncher #9]
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container
> was killed before it was launched
> 2012-02-23 11:30:17,061 INFO [ContainerLauncher #8]
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle
> port returned by ContainerManager for attempt_1329995034628_0983_r_000000_0 :
> 53990
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler]
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl:
> attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from
> KILL_CONTAINER_CLEANUP to KILL_TASK_CLEANUP
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler]
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics
> report from attempt_1329995034628_0983_r_000001_0: Container was killed
> before it was launched
> 2012-02-23 11:30:17,078 ERROR [AsyncDispatcher event handler]
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle
> this event at current state for attempt_1329995034628_0983_r_000001_0
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event:
> TA_CONTAINER_LAUNCH_FAILED at KILL_TASK_CLEANUP
>     at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
>     at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:926)
>     at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:135)
>     at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:870)
>     at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:862)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
>     at java.lang.Thread.run(Thread.java:619)
> 2012-02-23 11:30:17,080 INFO [AsyncDispatcher event handler]
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: TaskAttempt:
> [attempt_1329995034628_0983_r_000000_0] using containerId:
> [container_1329995034628_0983_01_000006 on NM: [$host6:51529]
> 2012-02-23 11:30:17,081 INFO [AsyncDispatcher event handler]
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl:
> attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from ASSIGNED
> to RUNNING
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #0]
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle
> port returned by ContainerManager for attempt_1329995034628_0983_r_000002_0 :
> 47960
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #1]
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING
> attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:17,215 INFO [AsyncDispatcher event handler]
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl:
> job_1329995034628_0983Job Transitioned from RUNNING to ERROR
> 2012-02-23 11:30:17,216 ERROR [AsyncDispatcher event handler]
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event
> at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event:
> JOB_COUNTER_UPDATE at ERROR
>     at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
>     at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:657)
>     at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:111)
>     at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:848)
>     at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:844)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
>     at java.lang.Thread.run(Thread.java:619)
> {code}
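
The first stack trace in the log can be read as a late-event race: attempt_1329995034628_0983_r_000001_0 had already moved from KILL_CONTAINER_CLEANUP to KILL_TASK_CLEANUP when the launcher's TA_CONTAINER_LAUNCH_FAILED arrived, and the state machine had no transition registered for that state/event pair. The Java sketch below reproduces that shape in isolation. It is not Hadoop's StateMachineFactory: the class LateEventSketch, the event names KILL and CONTAINER_CLEANED, and the tolerate flag are hypothetical, and the self-transition that absorbs the late event is only one possible remedy, not necessarily what the attached patch does.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical, table-driven stand-in for a task-attempt state machine.
class LateEventSketch {
    // State names and TA_CONTAINER_LAUNCH_FAILED are taken from the log;
    // KILL and CONTAINER_CLEANED are assumed names for illustration only.
    enum State { ASSIGNED, KILL_CONTAINER_CLEANUP, KILL_TASK_CLEANUP }
    enum Event { KILL, CONTAINER_CLEANED, TA_CONTAINER_LAUNCH_FAILED }

    private final Map<State, Map<Event, State>> table = new EnumMap<>(State.class);
    private State current = State.ASSIGNED;

    LateEventSketch(boolean tolerateLateLaunchFailure) {
        for (State s : State.values()) {
            table.put(s, new EnumMap<>(Event.class));
        }
        // Preemption path seen in the log: ASSIGNED -> KILL_CONTAINER_CLEANUP -> KILL_TASK_CLEANUP.
        table.get(State.ASSIGNED).put(Event.KILL, State.KILL_CONTAINER_CLEANUP);
        table.get(State.KILL_CONTAINER_CLEANUP).put(Event.CONTAINER_CLEANED, State.KILL_TASK_CLEANUP);
        if (tolerateLateLaunchFailure) {
            // Assumed remedy: absorb the late launch failure with a self-transition.
            table.get(State.KILL_TASK_CLEANUP)
                 .put(Event.TA_CONTAINER_LAUNCH_FAILED, State.KILL_TASK_CLEANUP);
        }
    }

    // Looks up the (state, event) pair and fails when no transition is registered.
    State handle(Event e) {
        State next = table.get(current).get(e);
        if (next == null) {
            throw new IllegalStateException("Invalid event: " + e + " at " + current);
        }
        current = next;
        return current;
    }

    // Replays the event order from the log for attempt ..._r_000001_0.
    static String replay(boolean tolerate) {
        LateEventSketch attempt = new LateEventSketch(tolerate);
        try {
            attempt.handle(Event.KILL);
            attempt.handle(Event.CONTAINER_CLEANED);
            attempt.handle(Event.TA_CONTAINER_LAUNCH_FAILED);
            return "ok: " + attempt.current;
        } catch (IllegalStateException ex) {
            return "error: " + ex.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(replay(false)); // no transition registered for the late event
        System.out.println(replay(true));  // late event absorbed by the self-transition
    }
}
```

With the flag off, the replay ends in the same "Invalid event: TA_CONTAINER_LAUNCH_FAILED at KILL_TASK_CLEANUP" message as the log; with it on, the attempt stays in KILL_TASK_CLEANUP and cleanup can proceed.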