[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973318#comment-13973318
 ] 

Jason Lowe commented on MAPREDUCE-5844:
---------------------------------------

Moved this to MAPREDUCE since the decision to preempt reducers for mappers is 
ultimately an MR AM decision and not a YARN decision.

A headroom of zero should mean there is literally no more room in the queue, 
and I would expect the job would need to take action in those cases to make 
progress in light of fetch failures (e.g., a scenario where all the other 
jobs holding resources are long-running and won't release them anytime soon).

If you are seeing cases where reducers are shot and then immediately relaunched 
along with the failed maps, that implies that either the headroom calculation 
is wrong or resources happened to be freed right when the new containers were 
requested.  Note that there are a number of known issues with headroom 
calculations; see YARN-1198 and related JIRAs.

Assuming those are fixed, there might be some usefulness to a grace period 
where we wait for other apps to free up resources in the queue to avoid 
shooting reducers.  A proper value for that probably depends upon how much work 
would be lost by the reducers in question, how long we can tolerate waiting to 
try to preserve that work, and how likely it is that another app will free up 
resources anytime soon.  If we wait and still don't get our resources, that's 
strictly worse than a job that took decisive action as soon as a map 
retroactively failed and there was no more space left in the queue.  Also, if 
the headroom is zero because a single job has hit its user limit within the 
queue, then waiting serves no purpose -- the job has to shoot a reducer in 
that case to make progress.  In that latter case we'd need additional 
information in the allocate response from the scheduler to know that waiting 
for resources to be released from other applications in the queue isn't going 
to work.
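One way such a grace period could look is sketched below. This is purely an illustration with invented names and a fixed wait; as noted above, the real decision would also need the scheduler's allocate response to report whether the zero headroom comes from a user limit rather than from other apps holding the queue, which the current protocol does not provide. The `blockedByUserLimit` flag is therefore a hypothetical input.

```java
// Hypothetical grace period before shooting reducers; all names are invented
// for illustration and do not correspond to actual RMContainerAllocator code.
public class GracePeriodSketch {
    private final long gracePeriodMs;
    private long headroomZeroSince = -1; // -1 means headroom was recently nonzero

    GracePeriodSketch(long gracePeriodMs) {
        this.gracePeriodMs = gracePeriodMs;
    }

    /**
     * @param headroom           headroom reported in the allocate response
     * @param blockedByUserLimit hypothetical flag: headroom is zero because this
     *                           job hit its user limit, so waiting cannot help
     * @param nowMs              current time in milliseconds
     */
    boolean shouldPreemptNow(long headroom, boolean blockedByUserLimit, long nowMs) {
        if (headroom > 0) {
            headroomZeroSince = -1; // resources exist; no need to preempt
            return false;
        }
        if (blockedByUserLimit) {
            return true; // waiting serves no purpose; preempt immediately
        }
        if (headroomZeroSince < 0) {
            headroomZeroSince = nowMs; // start the grace period
            return false;
        }
        // Preempt only after other apps have had a chance to release resources.
        return nowMs - headroomZeroSince >= gracePeriodMs;
    }
}
```

A proper value for the grace period would presumably be configurable, since (as discussed above) it trades lost reducer work against time spent waiting.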

It would be good to verify from the RM logs what is happening in your case.  If 
the headroom calculation is wrong, we should fix that; otherwise, if resources 
are churning quickly, a grace period before preempting reducers may make sense.

> Reducer Preemption is too aggressive
> ------------------------------------
>
>                 Key: MAPREDUCE-5844
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5844
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Maysam Yabandeh
>            Assignee: Maysam Yabandeh
>
> We observed cases where the reducer preemption makes the job finish much 
> later, and the preemption does not seem to be necessary since after 
> preemption both the preempted reducer and the mapper are assigned 
> immediately--meaning that there was already enough space for the mapper.
> The logic for triggering preemption is at 
> RMContainerAllocator::preemptReducesIfNeeded
> The preemption is triggered if the following is true:
> {code}
> headroom +  am * |m| + pr * |r| < mapResourceRequest
> {code} 
> where am is the number of assigned mappers, |m| is the mapper size, pr is the 
> number of reducers being preempted, and |r| is the reducer size.
> The original idea apparently was that if headroom is not big enough for the 
> new mapper requests, reducers should be preempted. This would work if the job 
> is alone in the cluster. Once we have queues, the headroom calculation 
> becomes more complicated and it would require a separate headroom calculation 
> per queue/job.
> So, as a result, the headroom variable has effectively been given up on: 
> *headroom is always set to 0*. What this implies for preemption is that it 
> becomes very aggressive, not considering whether there is enough space for 
> the mappers or not.
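The trigger condition quoted above can be sketched as a standalone check. The names below mirror the description rather than the actual fields of RMContainerAllocator::preemptReducesIfNeeded, and the sizes are illustrative:

```java
// Minimal sketch of the preemption trigger described in the issue:
//   headroom + am * |m| + pr * |r| < mapResourceRequest
// Names are illustrative; the real check lives in
// RMContainerAllocator::preemptReducesIfNeeded.
public class PreemptionCheck {
    static boolean shouldPreemptReducers(long headroom,
                                         int assignedMaps, long mapSize,
                                         int preemptedReduces, long reduceSize,
                                         long mapResourceRequest) {
        long available = headroom
                + (long) assignedMaps * mapSize
                + (long) preemptedReduces * reduceSize;
        return available < mapResourceRequest;
    }

    public static void main(String[] args) {
        // With headroom forced to 0 and nothing yet assigned or preempted,
        // any pending map request trips the condition.
        System.out.println(shouldPreemptReducers(0, 0, 1024, 0, 2048, 1024));
        // prints "true"
    }
}
```

This makes the reported behavior easy to see: once headroom is hard-coded to 0, the left-hand side starts at zero for a job with no assigned maps and no preempted reducers, so the condition fires for any nonzero map request regardless of actual queue capacity.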



--
This message was sent by Atlassian JIRA
(v6.2#6252)
