[ https://issues.apache.org/jira/browse/MAPREDUCE-6302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14939976#comment-14939976 ]
Jason Lowe commented on MAPREDUCE-6302:
---------------------------------------

I think it's reasonable. There were a number of separate bugs in this area because it was complicated; it would be nice to see it simplified and easier to understand.

Do we really want to avoid any kind of preemption if there's a map running? Think of a case where a node failure causes 20 maps to line up for scheduling due to fetch failures and we only have one map running. Do we really want to feed those 20 maps through the one map hole? Hope they don't run very long. ;-) I haven't studied what the original code did in this case, but I noticed it did not early-out if maps were running, hence the question.

I think the preemption logic could benefit from knowing whether reducers have reported that they're past the SHUFFLE phase, and exempting those reducers from preemption. It seems we would want to preempt as many reducers in the SHUFFLE phase as necessary to run most or all pending maps in parallel, if possible, to minimize job latency in most cases.

Other minor comments on the patch:
- The docs for mapreduce.job.reducer.unconditional-preempt.delay.sec should be clear on how to disable the functionality if desired, since setting it to zero does some pretty bad things.
- preemtping s/b preempting

> Incorrect headroom can lead to a deadlock between map and reduce allocations
> -----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6302
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6302
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: mai shurong
>            Assignee: Karthik Kambatla
>            Priority: Critical
>         Attachments: AM_log_head100000.txt.gz, AM_log_tail100000.txt.gz, log.txt, mr-6302-1.patch, mr-6302-2.patch, mr-6302-3.patch, mr-6302-4.patch, mr-6302-prelim.patch, queue_with_max163cores.png, queue_with_max263cores.png, queue_with_max333cores.png
>
> I submit a big job, which has 500 maps and 350 reduces, to a queue (fair scheduler) with a maximum of 300 cores. When the job's maps are at 100%, its 300 reduces occupy all 300 cores in the queue. Then a map fails and is retried, waiting for a core, while the 300 reduces are waiting for the failed map to finish, so a deadlock occurs. As a result the job is blocked, and later jobs in the queue cannot run because there are no available cores left in the queue.
> I think there is a similar issue for the memory of a queue.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
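As a rough illustration of the SHUFFLE-phase exemption discussed in the comment above, here is a minimal sketch. It is not the actual RMContainerAllocator code or API; the ReducerInfo type, the isPastShufflePhase(), getAllocatedVcores(), and preemptContainer() methods, and the vcore-only accounting are all hypothetical stand-ins chosen to keep the example short.

```java
import java.util.List;

// Minimal sketch (hypothetical types/methods, not the actual RMContainerAllocator API):
// preempt only reducers still in the SHUFFLE phase, and only as many as are needed
// to let the pending maps run in parallel.
class ShuffleAwarePreemptor {

  /** Hypothetical view of a running reducer as reported by task status updates. */
  interface ReducerInfo {
    boolean isPastShufflePhase();   // true once the reducer has reported SORT/REDUCE
    int getAllocatedVcores();       // vcores held by the reducer's container
    void preemptContainer();        // ask the AM to preempt this container
  }

  /**
   * Preempt shuffling reducers until enough vcores are freed to run the pending
   * maps, leaving reducers that are already past SHUFFLE untouched.
   */
  void preemptForPendingMaps(List<ReducerInfo> runningReducers,
                             int pendingMaps, int vcoresPerMap,
                             int availableVcores) {
    int vcoresNeeded = pendingMaps * vcoresPerMap - availableVcores;
    for (ReducerInfo reducer : runningReducers) {
      if (vcoresNeeded <= 0) {
        break;                      // enough room freed for the pending maps
      }
      if (reducer.isPastShufflePhase()) {
        continue;                   // exempt reducers that already fetched their map input
      }
      reducer.preemptContainer();
      vcoresNeeded -= reducer.getAllocatedVcores();
    }
  }
}
```

In a real implementation the accounting would presumably use the scheduler's full Resource (memory as well as vcores) and the headroom reported by the RM rather than a simple vcore count; the sketch only shows the ordering idea: skip post-SHUFFLE reducers, preempt shuffling ones until the pending maps fit.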