[ https://issues.apache.org/jira/browse/MAPREDUCE-6302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948179#comment-14948179 ]
Anubhav Dhoot commented on MAPREDUCE-6302:
------------------------------------------

Looks like the whitespace error is genuine, while the checkstyle and release-audit warnings can be ignored.

bq. MR_JOB_REDUCER_PREEMPT_DELAY_SEC delays the preemption; a positive value leads to waiting until it is done. The config we are adding here is more a timeout: if we don't get resources by this time, we preempt.

Aren't both values waiting for resources until the configured time, and doing no preemption if resources arrive by then? Even MR_JOB_REDUCER_PREEMPT_DELAY_SEC will not preempt if the resources were obtained before the timeout. I am concerned we are introducing an inconsistency in this patch that will burden administrators. It would be good to at least update the doc comments in yarn-default to indicate the effect of negative values and zero for both configs.

> Incorrect headroom can lead to a deadlock between map and reduce allocations
> -----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6302
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6302
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: mai shurong
>            Assignee: Karthik Kambatla
>            Priority: Critical
>         Attachments: AM_log_head100000.txt.gz, AM_log_tail100000.txt.gz, log.txt, mr-6302-1.patch, mr-6302-2.patch, mr-6302-3.patch, mr-6302-4.patch, mr-6302-5.patch, mr-6302-6.patch, mr-6302-prelim.patch, queue_with_max163cores.png, queue_with_max263cores.png, queue_with_max333cores.png
>
> I submitted a big job, with 500 maps and 350 reduces, to a queue (fair scheduler) with a maximum of 300 cores. Once the job's maps were 100% running, the reduces had occupied all 300 cores of the queue. Then a map failed and retried, waiting for a core, while the reduces were waiting for the failed map to finish. So a deadlock occurred.
> As a result, the job is blocked, and later jobs in the queue cannot run because no cores are available in the queue.
> I think there is a similar issue for the memory of a queue.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
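For context, a sketch of how the two reducer-preemption delays discussed in the comment might surface as configuration in mapred-site.xml. The property names and default values below are assumptions based on later Hadoop releases, not something stated in this thread:

```xml
<!-- Sketch only: names and defaults are assumptions, not confirmed here. -->

<!-- Existing delay (MR_JOB_REDUCER_PREEMPT_DELAY_SEC): wait this many
     seconds for map resources before preempting reducers. No preemption
     occurs if resources arrive within the window. -->
<property>
  <name>mapreduce.job.reducer.preempt.delay.sec</name>
  <value>0</value>
</property>

<!-- Timeout proposed in this patch: unconditionally preempt reducers if
     map requests are still unsatisfied after this many seconds. Per the
     comment above, the behavior of zero and negative values should be
     documented alongside the existing config. -->
<property>
  <name>mapreduce.job.reducer.unconditional-preempt.delay.sec</name>
  <value>300</value>
</property>
```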