[ https://issues.apache.org/jira/browse/MAPREDUCE-6302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948179#comment-14948179 ]

Anubhav Dhoot commented on MAPREDUCE-6302:
------------------------------------------

Looks like the whitespace error is genuine, while the checkstyle and release 
audit warnings can be ignored.
bq. MR_JOB_REDUCER_PREEMPT_DELAY_SEC delays the preemption; a positive value 
leads to waiting until it is done. The config we are adding here is more a 
timeout: if we don't get resources by this time, we preempt.
Don't both values wait for resources until the configured time, and skip 
preemption if resources arrive by then? Even 
MR_JOB_REDUCER_PREEMPT_DELAY_SEC will not preempt if the resources were 
obtained before the timeout. I am concerned that this patch introduces an 
inconsistency that will burden administrators. At a minimum, it would be good 
to update the doc comments in yarn-default to describe the effect of negative 
and zero values for both configs.
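
To make the question concrete, here is a minimal sketch of the wait-then-preempt semantics being discussed. This is not Hadoop's actual code; the method name, parameters, and the treatment of negative and zero values are assumptions made for illustration, which is exactly why documenting them in yarn-default matters.

```java
// Hypothetical sketch: a single deadline check that both configs could share.
// Assumption: a negative delay disables preemption entirely, and zero means
// preempt immediately if resources have not been granted.
public class PreemptDeadline {

    /**
     * Decide whether reducers should be preempted to free resources for maps.
     *
     * @param delaySec             configured delay/timeout in seconds
     * @param elapsedSec           seconds spent waiting for map resources
     * @param mapResourcesGranted  true if the pending map already got resources
     */
    static boolean shouldPreempt(long delaySec, long elapsedSec,
                                 boolean mapResourcesGranted) {
        if (mapResourcesGranted) {
            return false;   // resources arrived in time: no preemption needed
        }
        if (delaySec < 0) {
            return false;   // assumed semantics: negative disables preemption
        }
        return elapsedSec >= delaySec;  // zero preempts immediately
    }

    public static void main(String[] args) {
        // Still inside the deadline: keep waiting.
        System.out.println(shouldPreempt(10, 5, false));   // false
        // Deadline reached without resources: preempt.
        System.out.println(shouldPreempt(10, 10, false));  // true
        // Resources granted before the timeout: never preempt.
        System.out.println(shouldPreempt(10, 60, true));   // false
    }
}
```

Under these assumed semantics, both MR_JOB_REDUCER_PREEMPT_DELAY_SEC and the new timeout would reduce to the same check, which is the consistency being asked about above.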

> Incorrect headroom can lead to a deadlock between map and reduce allocations 
> -----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6302
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6302
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: mai shurong
>            Assignee: Karthik Kambatla
>            Priority: Critical
>         Attachments: AM_log_head100000.txt.gz, AM_log_tail100000.txt.gz, 
> log.txt, mr-6302-1.patch, mr-6302-2.patch, mr-6302-3.patch, mr-6302-4.patch, 
> mr-6302-5.patch, mr-6302-6.patch, mr-6302-prelim.patch, 
> queue_with_max163cores.png, queue_with_max263cores.png, 
> queue_with_max333cores.png
>
>
> I submit a big job, with 500 maps and 350 reduces, to a queue 
> (fair scheduler) with a 300-core maximum. While the big MapReduce job is 
> running 100% of its maps, 300 reduces occupy the queue's 300-core maximum. 
> Then a map fails and retries, waiting for a core, while the 300 reduces are 
> waiting for the failed map to finish, so a deadlock occurs. As a result, the 
> job is blocked, and later jobs in the queue cannot run because no cores are 
> available in the queue.
> I think there is a similar issue for the memory of a queue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
