Lohit Vijayarenu created MAPREDUCE-5689:
-------------------------------------------

             Summary: MRAppMaster does not preempt reducers when scheduled Maps 
cannot be fulfilled
                 Key: MAPREDUCE-5689
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5689
             Project: Hadoop Map/Reduce
          Issue Type: Bug
    Affects Versions: 2.2.0, 3.0.0
            Reporter: Lohit Vijayarenu


We saw a corner case where a job running on the cluster hung. The scenario was 
as follows: the job was running within a pool that was at capacity. All 
available containers were occupied by reducers and the last 2 mappers, with a 
few more reducers waiting in the pipeline to be scheduled. At this point the 
two running mappers failed and went back to the scheduled state. The two freed 
containers were assigned to reducers, so the whole pool was now full of 
reducers waiting on two maps to complete. The 2 maps never got scheduled 
because the pool was full.

Ideally, reducer preemption should have kicked in to make room for the mappers, 
via this code in RMContainerAllocator:
{code}
// From RMContainerAllocator#heartbeat(): reduce preemption is only
// re-evaluated when the number of completed tasks changes.
int completedMaps = getJob().getCompletedMaps();
int completedTasks = completedMaps + getJob().getCompletedReduces();
if (lastCompletedTasks != completedTasks) {
  lastCompletedTasks = completedTasks;
  recalculateReduceSchedule = true;
}

if (recalculateReduceSchedule) {
  preemptReducesIfNeeded();
{code}

But in this scenario lastCompletedTasks is always equal to completedTasks 
because the maps never completed, so preemptReducesIfNeeded() is never reached. 
This causes the job to hang forever. As a workaround, if we kill a few 
reducers, the mappers get scheduled and the job completes.
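
One possible direction for a fix would be to also trigger the recalculation 
whenever maps are waiting for containers but none are running, instead of 
relying only on the completed-task count changing. A rough sketch of that idea 
(getNumScheduledMaps() and getNumAssignedMaps() are hypothetical placeholders 
for the allocator's actual bookkeeping, not real methods):

{code}
// Sketch: in addition to the completed-task check above, force a
// recalculation when maps are waiting for containers but none are
// running (e.g. after failed maps returned to the scheduled state).
// getNumScheduledMaps()/getNumAssignedMaps() are hypothetical
// stand-ins for the allocator's actual bookkeeping.
if (getNumScheduledMaps() > 0 && getNumAssignedMaps() == 0) {
  recalculateReduceSchedule = true;
}

if (recalculateReduceSchedule) {
  preemptReducesIfNeeded();
}
{code}

This way a map that fails and returns to the scheduled state would still force 
preemptReducesIfNeeded() to run, even though no task has completed since the 
last heartbeat.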

--
This message was sent by Atlassian JIRA
(v6.1.4#6159)