[jira] [Commented] (MAPREDUCE-6513) MR job got hanged forever when one NM unstable for some time
[ https://issues.apache.org/jira/browse/MAPREDUCE-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961773#comment-14961773 ]

Varun Saxena commented on MAPREDUCE-6513:
-----------------------------------------

Filed MAPREDUCE-6514.

> MR job got hanged forever when one NM unstable for some time
> -------------------------------------------------------------
>
>                 Key: MAPREDUCE-6513
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6513
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster, resourcemanager
>    Affects Versions: 2.7.0
>            Reporter: Bob
>            Assignee: Varun Saxena
>            Priority: Critical
>
> While a job with many tasks was in progress, one node became unstable due to an OS issue. After the node became unstable, the maps on that node changed to the KILLED state.
> The maps which were running on the unstable node were rescheduled; all of them are in the scheduled state, waiting for the RM to assign containers. Ask requests for the maps were seen until the node became good again (all of those failed); there are no ask requests after that. Meanwhile the AM keeps preempting the reducers (it keeps recycling them).
> In the end the reducers are waiting for the mappers to complete, and the mappers never got containers.
> My question is: why were map requests not sent by the AM once the node recovered?
[jira] [Created] (MAPREDUCE-6514) Update ask to indicate to RM that it need not allocate for ramped down reducers
Varun Saxena created MAPREDUCE-6514:
---------------------------------------

             Summary: Update ask to indicate to RM that it need not allocate for ramped down reducers
                 Key: MAPREDUCE-6514
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6514
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: applicationmaster
    Affects Versions: 2.7.1
            Reporter: Varun Saxena
            Assignee: Varun Saxena

In RMContainerAllocator#preemptReducesIfNeeded, we simply clear the scheduled reduces map and move these reducers to the pending list. This is not reflected in the ask, so the RM keeps assigning reducer containers that the AM cannot use, because no reducer is scheduled any longer. If the ask were updated immediately, the RM would be able to schedule mappers right away, which is the whole intention of ramping down the reducers.
{code}
    LOG.info("Ramping down all scheduled reduces:"
        + scheduledRequests.reduces.size());
    for (ContainerRequest req : scheduledRequests.reduces.values()) {
      pendingReduces.add(req);
    }
    scheduledRequests.reduces.clear();
{code}
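A minimal sketch of the intended change, assuming the requestor's {{decContainerReq()}} helper can be used to withdraw each ramped-down request from the ask (this is a sketch of the idea, not the final patch):
{code}
    LOG.info("Ramping down all scheduled reduces:"
        + scheduledRequests.reduces.size());
    for (ContainerRequest req : scheduledRequests.reduces.values()) {
      pendingReduces.add(req);
      // Also shrink the ask sent to the RM, so it stops allocating
      // containers for reducers that are no longer scheduled.
      decContainerReq(req);
    }
    scheduledRequests.reduces.clear();
{code}
With the ask shrunk on the same heartbeat, the RM's next allocation round can go to the pending mappers instead of to reducers the AM would only reject.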
[jira] [Commented] (MAPREDUCE-6513) MR job got hanged forever when one NM unstable for some time
[ https://issues.apache.org/jira/browse/MAPREDUCE-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961769#comment-14961769 ]

Sunil G commented on MAPREDUCE-6513:
------------------------------------

OK, I also think so. [~rohithsharma], how do you feel?
[jira] [Commented] (MAPREDUCE-6513) MR job got hanged forever when one NM unstable for some time
[ https://issues.apache.org/jira/browse/MAPREDUCE-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961767#comment-14961767 ]

Varun Saxena commented on MAPREDUCE-6513:
-----------------------------------------

Yes, we see rejections in our case too. I am fine with tracking it separately. Will file a JIRA; we can discuss further there.
[jira] [Commented] (MAPREDUCE-6513) MR job got hanged forever when one NM unstable for some time
[ https://issues.apache.org/jira/browse/MAPREDUCE-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961765#comment-14961765 ]

Sunil G commented on MAPREDUCE-6513:
------------------------------------

Hi [~varun_saxena]
I feel point 1 can be tracked separately, as it may bring more complexity. An example: initially the AM has placed 10 requests for reducers at timeframe1. Assume that in the next AM heartbeat we try to reset this count to 5 because of the issues we found. However, the RM could already have allocated some containers against the earlier requests. So in the new heartbeat the AM sends an updated ask of 5 reducers for timeframe1, while the response may carry containers newly allocated by the RM for the previous requests. The AM then has to reject those containers, or update the count again in the next heartbeat, and this can go on. The AM will reject the allocated reducer containers, but a lot of rejections may occur in these corner cases, so we need to be careful here.
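To make the corner case concrete, here is a sketch of what it implies on the AM side: any reducer container that arrives after the ask was shrunk cannot be assigned and has to be released back to the RM. The names below are illustrative, not the actual RMContainerAllocator code:
{code}
    // On each heartbeat response, containers allocated against an ask we
    // have since withdrawn cannot be assigned and must be given back.
    for (Container c : response.getAllocatedContainers()) {
      boolean isReducer = c.getPriority().equals(PRIORITY_REDUCE);
      if (isReducer && scheduledRequests.reduces.isEmpty()) {
        LOG.info("Releasing reducer container allocated against a stale ask: "
            + c.getId());
        release(c.getId());  // the repeated rejections described above
      }
    }
{code}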
[jira] [Commented] (MAPREDUCE-6513) MR job got hanged forever when one NM unstable for some time
[ https://issues.apache.org/jira/browse/MAPREDUCE-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961762#comment-14961762 ]

Varun Saxena commented on MAPREDUCE-6513:
-----------------------------------------

Yes Sunil, we need to update the ask to indicate to the RM that it need not allocate for these reducers. This is what I talked about in one of my comments yesterday.
In short, in this JIRA I intend to take a two-pronged approach to resolve it:
1. Update the ask to tell the RM that it need not allocate for ramped-down reducers (ramped down in the preemptReducesIfNeeded() method). We are currently testing this change.
2. Introduce a config, or reuse the MAPREDUCE-6302 config, to detect hanging map requests, and do not ramp up reducers if the mappers are starved. I have not looked at the post-MAPREDUCE-6302 code yet, but this is the basic idea.
[jira] [Commented] (MAPREDUCE-6513) MR job got hanged forever when one NM unstable for some time
[ https://issues.apache.org/jira/browse/MAPREDUCE-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961757#comment-14961757 ]

Sunil G commented on MAPREDUCE-6513:
------------------------------------

Hi [~rohithsharma]
Yes, {{getNumHangingRequests}} looks like a correct metric.
To add one thought to this discussion: the already placed reducer requests will still be served by the RM, and the AM has to reject all of those allocations; only after that can the newly placed map requests be served. So, as we discussed earlier, could we also spin out a discussion on resetting already placed reducer requests, for a faster solution?
[jira] [Commented] (MAPREDUCE-6513) MR job got hanged forever when one NM unstable for some time
[ https://issues.apache.org/jira/browse/MAPREDUCE-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961747#comment-14961747 ]

Rohith Sharma K S commented on MAPREDUCE-6513:
----------------------------------------------

Right. The method {{RMContainerAllocator#getNumHangingRequests}} can be reused to get the hanging mapper requests, and to ramp up only when there are no hanging mappers.
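A rough sketch of that reuse, assuming {{getNumHangingRequests()}} takes the map request map plus the allocation-delay threshold from MAPREDUCE-6302 (the exact signature may differ in the committed code):
{code}
    // Do not ramp up reducers while any map ask has been outstanding
    // longer than the configured threshold (illustrative guard).
    int hangingMapRequests = getNumHangingRequests(
        allocationDelayThresholdMs, scheduledRequests.maps);  // assumed signature
    if (hangingMapRequests == 0 && rampUp > 0) {
      rampUp = Math.min(rampUp, numPendingReduces);
      LOG.info("Ramping up " + rampUp);
      rampUpReduces(rampUp);
    }
{code}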
[jira] [Commented] (MAPREDUCE-6513) MR job got hanged forever when one NM unstable for some time
[ https://issues.apache.org/jira/browse/MAPREDUCE-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961380#comment-14961380 ]

Karthik Kambatla commented on MAPREDUCE-6513:
---------------------------------------------

bq. Maybe some config can be kept to decide how long to wait till we consider that mappers have been starved? Thoughts?

MAPREDUCE-6302 essentially adds that. Can we re-use the same config?
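For reference, my reading of MAPREDUCE-6302 is that the knob it adds is an unconditional reducer-preemption delay; the key name and semantics below are assumptions to be verified against the committed patch:
{code}
    // Hedged sketch: key name taken from my reading of MAPREDUCE-6302.
    Configuration conf = new Configuration();
    // Preempt reducers unconditionally once a map ask has been starved
    // for this many seconds; a value <= 0 disables it (assumption).
    conf.setInt("mapreduce.job.reducer.unconditional-preempt.delay.sec", 300);
{code}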
[jira] [Commented] (MAPREDUCE-6513) MR job got hanged forever when one NM unstable for some time
[ https://issues.apache.org/jira/browse/MAPREDUCE-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961191#comment-14961191 ]

Varun Saxena commented on MAPREDUCE-6513:
-----------------------------------------

Yes, I agree. If there are map requests hanging around for a while, we should probably not ramp up the reducers. Maybe some config can be kept to decide how long to wait before we consider the mappers starved? Thoughts?
One more thing, which I pointed out above: we do not update the ask when we ramp down all the reducers (in preemptReducesIfNeeded()). Not sure why we do not do so.
[jira] [Commented] (MAPREDUCE-6513) MR job got hanged forever when one NM unstable for some time
[ https://issues.apache.org/jira/browse/MAPREDUCE-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961156#comment-14961156 ]

Rohith Sharma K S commented on MAPREDUCE-6513:
----------------------------------------------

Oh!! The solution thought of above, i.e. {{rampUp > 0 *&& scheduledMaps == 0*}}, breaks the ramping up of a few reducers :-( But I still feel that ramping up of intermediate reducer requests should not be done. I do not know the story behind why ramping up was introduced!!?
[jira] [Commented] (MAPREDUCE-6513) MR job got hanged forever when one NM unstable for some time
[ https://issues.apache.org/jira/browse/MAPREDUCE-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961113#comment-14961113 ]

Rohith Sharma K S commented on MAPREDUCE-6513:
----------------------------------------------

[~varun_saxena] thanks for your detailed analysis.
From the logs you extracted in your previous comment, I see that ramping up of reducers is done regardless of whether scheduledMaps is zero or greater than zero. I think the code below should not blindly ramp up the reducers:
{code}
    if (rampUp > 0) {
      rampUp = Math.min(rampUp, numPendingReduces);
      LOG.info("Ramping up " + rampUp);
      rampUpReduces(rampUp);
    }
{code}
I think checking for {{scheduledMaps == 0}} while ramping up should avoid the issue regardless of mapper priority. But then the question is: what if the scheduled maps are failed map attempts? A better way to handle this is to check the priority of all the scheduled maps: if every scheduled map's priority is less than the reducers', then ramping up can be done.
{code}
    // If scheduledMaps is non-zero then, regardless of mapper priority,
    // do not ramp up reducers.
    if (rampUp > 0 && scheduledMaps == 0) {
      rampUp = Math.min(rampUp, numPendingReduces);
      LOG.info("Ramping up " + rampUp);
      rampUpReduces(rampUp);
    }
{code}
Any thoughts?
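A rough sketch of that priority check (the constant names follow RMContainerAllocator, where fast-fail maps, reduces and normal maps use priority values 5, 10 and 20 respectively, and a smaller value is served first by the RM; the field access is an assumption, not tested code):
{code}
    // Ramp up only when every scheduled map would be served before the
    // reducers anyway, i.e. has a smaller priority value than reduces.
    private boolean allScheduledMapsOutrankReduces() {
      for (ContainerRequest req : scheduledRequests.maps.values()) {
        if (req.priority.getPriority() >= PRIORITY_REDUCE.getPriority()) {
          return false;  // this map would starve behind the reducers
        }
      }
      return true;
    }

    // In scheduleReduces():
    if (rampUp > 0
        && (scheduledMaps == 0 || allScheduledMapsOutrankReduces())) {
      rampUp = Math.min(rampUp, numPendingReduces);
      LOG.info("Ramping up " + rampUp);
      rampUpReduces(rampUp);
    }
{code}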
[jira] [Commented] (MAPREDUCE-6302) Preempt reducers after a configurable timeout irrespective of headroom
[ https://issues.apache.org/jira/browse/MAPREDUCE-6302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961050#comment-14961050 ]

Wangda Tan commented on MAPREDUCE-6302:
---------------------------------------

+1 to backport this issue to 2.6.x and 2.7.x.

> Preempt reducers after a configurable timeout irrespective of headroom
> -----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6302
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6302
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: mai shurong
>            Assignee: Karthik Kambatla
>            Priority: Critical
>             Fix For: 2.8.0
>
>         Attachments: AM_log_head10.txt.gz, AM_log_tail10.txt.gz, log.txt, mr-6302-1.patch, mr-6302-2.patch, mr-6302-3.patch, mr-6302-4.patch, mr-6302-5.patch, mr-6302-6.patch, mr-6302-7.patch, mr-6302-prelim.patch, mr-6302_branch-2.patch, queue_with_max163cores.png, queue_with_max263cores.png, queue_with_max333cores.png
>
> I submitted a big job, with 500 maps and 350 reduces, to a queue (fair scheduler) with a maximum of 300 cores. When the job had finished 100% of its maps, the 300 reduces had occupied all 300 cores in the queue. Then a map failed and was retried, waiting for a core, while the 300 reduces were waiting for the failed map to finish, so a deadlock occurred. As a result the job was blocked, and later jobs in the queue could not run because no cores were available in the queue.
> I think there is a similar issue for the memory of a queue.
[jira] [Commented] (MAPREDUCE-6302) Preempt reducers after a configurable timeout irrespective of headroom
[ https://issues.apache.org/jira/browse/MAPREDUCE-6302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961045#comment-14961045 ]

Ruslan Dautkhanov commented on MAPREDUCE-6302:
----------------------------------------------

It would be great to have this backported to 2.6. We have seen many times that a single Hive job can self-deadlock because of this problem; Cloudera Support pointed us to MAPREDUCE-6302. Thanks!
[jira] [Commented] (MAPREDUCE-6513) MR job got hanged forever when one NM unstable for some time
[ https://issues.apache.org/jira/browse/MAPREDUCE-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960900#comment-14960900 ]

Karthik Kambatla commented on MAPREDUCE-6513:
---------------------------------------------

Yep, looks like a bug.
[jira] [Commented] (MAPREDUCE-6513) MR job got hanged forever when one NM unstable for some time
[ https://issues.apache.org/jira/browse/MAPREDUCE-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960882#comment-14960882 ]

Varun Saxena commented on MAPREDUCE-6513:
-----------------------------------------

cc [~jlowe], [~kasha], [~devaraj.k]. Your thoughts on this?
[jira] [Commented] (MAPREDUCE-6513) MR job got hanged forever when one NM unstable for some time
[ https://issues.apache.org/jira/browse/MAPREDUCE-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960518#comment-14960518 ]

Varun Saxena commented on MAPREDUCE-6513:
-----------------------------------------

One more thing I noticed: in RMContainerAllocator#preemptReducesIfNeeded, we simply clear the scheduled reduces map and move these reducers to pending. This is not updated in the ask, so the RM keeps assigning reducer containers and the AM is not able to assign them, because no reducer is scheduled (check the logs below the code). Although this eventually leads to these reducers not being assigned, why are we not updating the ask immediately?
{code}
    LOG.info("Ramping down all scheduled reduces:"
        + scheduledRequests.reduces.size());
    for (ContainerRequest req : scheduledRequests.reduces.values()) {
      pendingReduces.add(req);
    }
    scheduledRequests.reduces.clear();
{code}
{noformat}
2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container not assigned : container_1437451211867_1485_01_000215
2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Cannot assign container Container: [ContainerId: container_1437451211867_1485_01_000216, NodeId: hdszzdcxdat6g06u04p:26009, NodeHttpAddress: hdszzdcxdat6g06u04p:26010, Resource: , Priority: 10, Token: Token { kind: ContainerToken, service: 10.2.33.236:26009 }, ] for a reduce as either container memory less than required 4096 or no pending reduce tasks - reduces.isEmpty=true
2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container not assigned : container_1437451211867_1485_01_000216
2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Cannot assign container Container: [ContainerId: container_1437451211867_1485_01_000217, NodeId: hdszzdcxdat6g06u06p:26009, NodeHttpAddress: hdszzdcxdat6g06u06p:26010, Resource: , Priority: 10, Token: Token { kind: ContainerToken, service: 10.2.33.239:26009 }, ] for a reduce as either container memory less than required 4096 or no pending reduce tasks - reduces.isEmpty=true
{noformat}
[jira] [Commented] (MAPREDUCE-5507) MapReduce reducer ramp down is suboptimal with potential job-hanging issues
[ https://issues.apache.org/jira/browse/MAPREDUCE-5507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960489#comment-14960489 ]

Rohith Sharma K S commented on MAPREDUCE-5507:
----------------------------------------------

We are hitting this issue frequently, causing jobs to hang forever. Any update on this issue?

> MapReduce reducer ramp down is suboptimal with potential job-hanging issues
> ----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5507
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5507
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster
>            Reporter: Omkar Vinit Joshi
>            Assignee: Omkar Vinit Joshi
>            Priority: Critical
>         Attachments: MAPREDUCE-5507.20130922.1.patch
>
> Today, if we set "yarn.app.mapreduce.am.job.reduce.rampup.limit" and "mapreduce.job.reduce.slowstart.completedmaps", then reducers are launched more aggressively. However, the calculation for ramping reducers up or down is not done in the most optimal way.
> * If the MR AM at any point sees a situation like
> ** scheduledMaps : 30
> ** scheduledReducers : 10
> ** assignedMaps : 0
> ** assignedReducers : 11
> ** finishedMaps : 120
> ** headroom : 756 (when each map/reduce task needs only 512 MB)
> * then today it simply hangs, because it thinks there is sufficient room to launch one more mapper and therefore no need to ramp down. If this continues forever, that is neither correct nor optimal.
> * Ideally, when the MR AM sees that assignedMaps has dropped to 0 and there are reducers running, it should wait for a certain time (upper-bounded by the average map task completion time, as a heuristic), but if it still does not get a new container for a map task after that, it should preempt the reducers one by one at some interval and ramp back up slowly.
> ** Preemption of reducers can be done in a slightly smarter way (see the sketch after this description):
> *** preempt a reducer on a node manager for which there is a pending map request;
> *** otherwise preempt any other reducer. The MR AM contributes to getting a new mapper by releasing such a reducer/container, because doing so reduces its cluster consumption and may thereby make it a candidate for an allocation.
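A self-contained sketch of that preemption order, using plain maps to stand in for the AM's bookkeeping (all names here are illustrative, not the RMContainerAllocator internals):
{code}
import java.util.Map;
import java.util.Set;

final class ReducerPreemptionSketch {
  /**
   * Picks the next reducer attempt to preempt: first a reducer whose host
   * also has a pending map request, otherwise any running reducer.
   *
   * @param runningReducers attempt id -> host the reducer runs on
   * @param hostsWithPendingMaps hosts that appear in pending map asks
   * @return the attempt id to preempt, or null if no reducer is running
   */
  static String pickReducerToPreempt(Map<String, String> runningReducers,
                                     Set<String> hostsWithPendingMaps) {
    for (Map.Entry<String, String> e : runningReducers.entrySet()) {
      if (hostsWithPendingMaps.contains(e.getValue())) {
        return e.getKey();  // frees capacity exactly where a map waits
      }
    }
    // No co-located reducer; fall back to any reducer at all.
    return runningReducers.isEmpty()
        ? null : runningReducers.keySet().iterator().next();
  }
}
{code}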
[jira] [Updated] (MAPREDUCE-5507) MapReduce reducer ramp down is suboptimal with potential job-hanging issues
[ https://issues.apache.org/jira/browse/MAPREDUCE-5507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohith Sharma K S updated MAPREDUCE-5507:
-----------------------------------------
        Priority: Critical  (was: Major)
[jira] [Updated] (MAPREDUCE-5507) MapReduce reducer ramp down is suboptimal with potential job-hanging issues
[ https://issues.apache.org/jira/browse/MAPREDUCE-5507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohith Sharma K S updated MAPREDUCE-5507:
-----------------------------------------
     Component/s: applicationmaster
[jira] [Commented] (MAPREDUCE-6513) MR job got hanged forever when one NM unstable for some time
[ https://issues.apache.org/jira/browse/MAPREDUCE-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960484#comment-14960484 ]

Varun Saxena commented on MAPREDUCE-6513:
-----------------------------------------

The configuration yarn.app.mapreduce.am.job.reduce.rampup.limit is at its default value of 0.5. Because of this value, the maps are deemed to have enough resources and the reducers are ramped up. Should we really be ramping up, irrespective of the configuration value, if we have hanging map requests?
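For context, a paraphrase of how that limit feeds the decision in {{scheduleReduces()}} (simplified; treat the variable names and arithmetic as an approximation of the real code, not a quote of it):
{code}
    // Reducers may take at most rampup.limit (default 0.5) of the total
    // memory limit, further scaled by map progress.
    int totalMemLimit = getMemLimit();
    int idealReduceMemLimit = Math.min(
        (int) (completedMapPercent * totalMemLimit),
        (int) (maxReduceRampupLimit * totalMemLimit));  // 0.5 * total
    int idealMapMemLimit = totalMemLimit - idealReduceMemLimit;
    // If the memory assigned to maps is below idealMapMemLimit, the maps
    // are deemed to have enough room and reducers are ramped up, no
    // matter how long the map asks have been hanging.
{code}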
[jira] [Commented] (MAPREDUCE-6513) MR job got hanged forever when one NM unstable for some time
[ https://issues.apache.org/jira/browse/MAPREDUCE-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960478#comment-14960478 ]

Varun Saxena commented on MAPREDUCE-6513:
-----------------------------------------

The headroom is not very high (it sometimes comes back as 0 in the response too) because other heavy apps are running. We notice that we always ramp up and that ramping down never happens, which schedules reducers too aggressively. As can be seen below, there is no ramp down (except the first time, when 651 reduces were ramped down), and we always find ramp-up happening.
{noformat}
2015-10-13 04:36:53,038 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:0
2015-10-13 04:53:42,132 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:651
2015-10-13 04:53:43,135 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:0
2015-10-13 04:53:44,137 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:0
[... the same "Ramping down all scheduled reduces:0" line repeats every second, from 04:53:45,140 through 04:54:09,228 and onwards ...]
{noformat}
[jira] [Commented] (MAPREDUCE-6513) MR job got hanged forever when one NM unstable for some time
[ https://issues.apache.org/jira/browse/MAPREDUCE-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960468#comment-14960468 ]

Varun Saxena commented on MAPREDUCE-6513:
-----------------------------------------

Took the logs from Bob offline for analysis. The scenario is as under:

1. All the maps have completed.
{panel}
2015-10-13 04:38:42,229 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: *After Scheduling:* PendingReds:0 {color:red}ScheduledMaps:0{color} ScheduledReds:651 {color:red}AssignedMaps:0{color} AssignedReds:0 {color:red}CompletedMaps:78{color} CompletedReds:0 ContAlloc:79 ContRel:1 HostLocal:64 RackLocal:14
{panel}
2. One node becomes unstable, and hence some of the succeeded map tasks which ran on that node are killed:
{noformat}
2015-10-13 04:53:41,127 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: TaskAttempt killed because it ran on unusable node hdszzdcxdat6g05u06p:26009. AttemptId:attempt_1437451211867_1485_m_77_0
2015-10-13 04:53:41,128 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: TaskAttempt killed because it ran on unusable node hdszzdcxdat6g05u06p:26009. AttemptId:attempt_1437451211867_1485_m_26_0
2015-10-13 04:53:41,128 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: TaskAttempt killed because it ran on unusable node hdszzdcxdat6g05u06p:26009. AttemptId:attempt_1437451211867_1485_m_07_0
2015-10-13 04:53:41,128 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: TaskAttempt killed because it ran on unusable node hdszzdcxdat6g05u06p:26009. AttemptId:attempt_1437451211867_1485_m_34_0
2015-10-13 04:53:41,128 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: TaskAttempt killed because it ran on unusable node hdszzdcxdat6g05u06p:26009. AttemptId:attempt_1437451211867_1485_m_15_0
2015-10-13 04:53:41,128 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: TaskAttempt killed because it ran on unusable node hdszzdcxdat6g05u06p:26009. AttemptId:attempt_1437451211867_1485_m_36_0
{noformat}
3. As can be seen below, 16 maps are now scheduled.
{panel}
2015-10-13 04:53:42,128 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: *Before Scheduling:* PendingReds:0 {color:red}ScheduledMaps:16{color} ScheduledReds:651 {color:red}AssignedMaps:0{color} AssignedReds:0 {color:red}CompletedMaps:62{color} CompletedReds:0 ContAlloc:79 ContRel:1 HostLocal:64 RackLocal:14
{panel}
4. The node comes back up again after a while.
5. After this, we keep seeing the reducers getting preempted and then scheduled again, over and over in a cycle, while the mappers are never assigned (due to their lower priority):
{noformat}
2015-10-13 04:38:40,219 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReds:0 ScheduledMaps:0 ScheduledReds:651 AssignedMaps:2 AssignedReds:0 CompletedMaps:78 CompletedReds:0 ContAlloc:79 ContRel:1 HostLocal:64 RackLocal:14
2015-10-13 04:38:40,223 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:0 ScheduledMaps:0 ScheduledReds:651 AssignedMaps:1 AssignedReds:0 CompletedMaps:78 CompletedReds:0 ContAlloc:79 ContRel:1 HostLocal:64 RackLocal:14
2015-10-13 04:38:42,229 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:0 ScheduledMaps:0 ScheduledReds:651 AssignedMaps:0 AssignedReds:0 CompletedMaps:78 CompletedReds:0 ContAlloc:79 ContRel:1 HostLocal:64 RackLocal:14
2015-10-13 04:53:42,128 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReds:0 ScheduledMaps:16 ScheduledReds:651 AssignedMaps:0 AssignedReds:0 CompletedMaps:62 CompletedReds:0 ContAlloc:79 ContRel:1 HostLocal:64 RackLocal:14
2015-10-13 04:53:42,132 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:651 ScheduledMaps:16 ScheduledReds:0 AssignedMaps:0 AssignedReds:0 CompletedMaps:62 CompletedReds:0 ContAlloc:79 ContRel:1 HostLocal:64 RackLocal:14
2015-10-13 04:54:49,433 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:651 ScheduledMaps:16 ScheduledReds:0 AssignedMaps:0 AssignedReds:0 CompletedMaps:62 CompletedReds:0 ContAlloc:84 ContRel:6 HostLocal:64 RackLocal:14
2015-10-13 04:54:50,451 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:651 ScheduledMaps:16 ScheduledReds:0 AssignedMaps:0 AssignedReds:0 CompletedMaps
{noformat}