[ https://issues.apache.org/jira/browse/MAPREDUCE-6043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14105664#comment-14105664 ]

Maysam Yabandeh commented on MAPREDUCE-6043:
--------------------------------------------

The values of these variables in the first case were:
{code}
headroom = 0;
pr = 0; // reducer preemption was never triggered in this app
{code}
so the triggering condition for reducer preemption simplifies to:
{code}
am * |m| < |m|
{code}
or 
{code}
am < 1
{code}
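Plugging the observed values into the full condition shows why it can never fire once a mapper is stuck in the assigned list. The following is a minimal sketch only; the variable names and the memory sizes are assumptions for illustration, not the actual RMContainerAllocator code:
{code}
// Hypothetical sketch of the triggering condition with the observed values.
// Names and sizes are assumptions, not the actual RMContainerAllocator code.
public class PreemptionCheck {
    public static void main(String[] args) {
        long headroom = 0;           // headroom reported by the RM
        int am = 2;                  // the two stuck assigned mappers
        long mapSize = 2048;         // |m|: memory of one mapper in MB (assumed)
        int pr = 0;                  // preemption was never triggered in this app
        long reduceSize = 4096;      // |r|: memory of one reducer in MB (assumed)
        long mapResourceRequest = mapSize;

        // headroom + am*|m| + pr*|r| < mapResourceRequest
        boolean preempt =
            headroom + am * mapSize + pr * reduceSize < mapResourceRequest;
        // With headroom = 0 and pr = 0 this reduces to am*|m| < |m|, i.e. am < 1,
        // which is false whenever any mapper is still counted as assigned.
        System.out.println(preempt);  // false
    }
}
{code}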
In this erroneous case we had two assigned mappers that were never successfully 
removed from the list, which prevented preemption from kicking in. Those 
mappers had finished, but the MRAppMaster heard nothing more about them 
afterwards, so it marked them successful after the one-minute timeout:
{code}
2014-08-20 04:25:21,665 INFO [Ping Checker] 
org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: 
Expired:attempt_xxx_yyy_m_000288_0 Timed out after 60 secs
2014-08-20 04:25:21,665 WARN [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Task attempt 
attempt_xxx_yyy_m_000288_0 is done fromTaskUmbilicalProtocol's point of view. 
However, it stays in finishing state for too long
2014-08-20 04:25:21,665 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: 
attempt_xxx_yyy_m_000288_0 TaskAttempt Transitioned from FINISHING_CONTAINER to 
SUCCESS_CONTAINER_CLEANUP
{code}
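The log above illustrates the bookkeeping gap: the AM moves the attempt forward on the liveliness timeout, but the allocator's assigned-mapper count is only decremented when the RM reports the completed container, which never happens on this path. A hypothetical sketch of that gap (method and field names are assumptions, not the actual TaskAttemptImpl/RMContainerAllocator API):
{code}
// Hypothetical sketch of the bookkeeping gap; names are assumptions,
// not the actual RMContainerAllocator API.
class AllocatorState {
    int assignedMaps = 2;  // the two stuck mappers

    // Decremented only when the RM reports a completed container...
    void onContainerCompletedReport() { assignedMaps--; }

    // ...but the timeout path in the log above produces no such report,
    // so the count is left untouched when the attempt expires.
    void onAttemptExpiredTimeout() { /* assignedMaps not updated */ }
}

public class Demo {
    public static void main(String[] args) {
        AllocatorState s = new AllocatorState();
        s.onAttemptExpiredTimeout();
        s.onAttemptExpiredTimeout();
        System.out.println(s.assignedMaps);  // still 2, so am < 1 never holds
    }
}
{code}
Updating the count on the exceptional path as well (as the issue description suggests) would let the condition become true once the stuck mappers are accounted for.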

> Reducer-preemption does not kick in
> -----------------------------------
>
>                 Key: MAPREDUCE-6043
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6043
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Maysam Yabandeh
>
> We have seen various cases in which reducer preemption does not kick in and the 
> scheduled mappers wait behind running reducers forever. Each time there seems 
> to be a different scenario. So far we have tracked down two such cases, and 
> the common element between them is that the variables in RMContainerAllocator 
> go out of sync because they are only updated when a completed container is 
> reported by the RM. However, there are many corner cases in which such a report 
> is never received from the RM and yet the MapReduce app moves forward. One 
> possible fix would be to update these variables after exceptional cases as well.
> The logic for triggering preemption is at 
> RMContainerAllocator::preemptReducesIfNeeded
> The preemption is triggered if the following is true:
> {code}
> headroom +  am * |m| + pr * |r| < mapResourceRequest
> {code} 
> where am is the number of assigned mappers, |m| is the mapper size, pr is the 
> number of reducers being preempted, and |r| is the reducer size. If any of 
> these variables goes out of sync, preemption will fail to kick in. In the 
> following comment, we explain two such cases.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
