[ https://issues.apache.org/jira/browse/MAPREDUCE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973430#comment-13973430 ]
Maysam Yabandeh commented on MAPREDUCE-5844:
--------------------------------------------

Thanks [~jlowe] for your detailed comment.
# As I explained in the description of the jira, the headroom printed in the logs is always zero, e.g.:
{code}
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() for application_x: ask=8 release= 0 newContainers=0 finishedContainers=0 resourcelimit=<memory:0, vCores:0> knownNMs=x
{code}
And this is not because there is no headroom (I know that from checking the available resources while the job was running).
# I was actually not surprised that the headroom is always set to zero, since I found the headroom field abandoned in the FairScheduler source code: SchedulerApplicationAttempt#getHeadroom() is what supplies the headroom field in the response, while SchedulerApplicationAttempt#setHeadroom() is never invoked by FairScheduler (it is invoked by the capacity and fifo schedulers, though).
# I assumed that not invoking setHeadroom in the fair scheduler was intentional, perhaps due to the complications of computing the headroom once fair share is taken into account. But based on your comment, I understand that this could be a "forgotten" case rather than an "abandoned" one.
# At least in the case where we observed this problem, the headroom was available and both the preempted reducer and the mapper were assigned immediately (within a few seconds). So delaying the preemption even for a period as short as one minute could prevent this problem, while having no tangible negative impact in cases where the preemption was actually required. I agree that there are tradeoffs with this preemption delay (especially when it is high), but even a short value would suffice to cover this special case where the headroom is already available.
# Whether or not we end up fixing the headroom calculation in FairScheduler, it seems to me that allowing the user to configure a short delay before preemption would not be hurtful, even where it is not beneficial.

> Reducer Preemption is too aggressive
> ------------------------------------
>
> Key: MAPREDUCE-5844
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5844
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Reporter: Maysam Yabandeh
> Assignee: Maysam Yabandeh
>
> We observed cases where reducer preemption makes the job finish much later, and the preemption does not seem to be necessary, since after preemption both the preempted reducer and the mapper are assigned immediately--meaning that there was already enough space for the mapper.
> The logic for triggering preemption is in RMContainerAllocator::preemptReducesIfNeeded.
> The preemption is triggered if the following is true:
> {code}
> headroom + am * |m| + pr * |r| < mapResourceRequest
> {code}
> where am is the number of assigned mappers, |m| is the mapper size, pr is the number of reducers being preempted, and |r| is the reducer size.
> The original idea apparently was that if the headroom is not big enough for the new mapper requests, reducers should be preempted. This would work if the job were alone in the cluster. Once we have queues, the headroom calculation becomes more complicated and would require a separate headroom calculation per queue/job.
> So, as a result, the headroom variable has effectively been given up: *headroom is always set to 0*. What this implies for the preemption is that preemption becomes very aggressive, not considering whether there is enough space for the mappers or not.

-- This message was sent by Atlassian JIRA (v6.2#6252)
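For readers following along, the preemption condition quoted in the description can be sketched as below. This is a simplified illustration with hypothetical names (PreemptionCheck, shouldPreemptReducers, and the long-based sizes are mine), not the actual RMContainerAllocator code:

{code:java}
// Sketch of the trigger: headroom + am*|m| + pr*|r| < mapResourceRequest.
// Sizes are in MB of memory for simplicity; real code uses YARN Resources.
public class PreemptionCheck {
    static boolean shouldPreemptReducers(long headroomMb,
                                         int assignedMappers, long mapperSizeMb,
                                         int preemptedReducers, long reducerSizeMb,
                                         long mapResourceRequestMb) {
        long available = headroomMb
                + (long) assignedMappers * mapperSizeMb
                + (long) preemptedReducers * reducerSizeMb;
        return available < mapResourceRequestMb;
    }

    public static void main(String[] args) {
        // With headroom forced to 0 (the bug), the condition fires even when
        // the cluster actually has room for the pending mapper request.
        System.out.println(
            shouldPreemptReducers(0, 0, 1024, 0, 1024, 2048)); // prints true
    }
}
{code}

Note how a correctly reported headroom of, say, 4096 MB would make the same call return false, which is exactly why the always-zero headroom makes preemption overly aggressive.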
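The mechanism behind the always-zero headroom (point 2 of the comment) can also be sketched. This is a minimal stand-in, not the real SchedulerApplicationAttempt (which uses YARN's Resource type rather than a long): the headroom field starts at zero, and a scheduler that never calls setHeadroom() leaves getHeadroom() returning that default:

{code:java}
// Minimal sketch, not actual Hadoop source: the headroom defaults to zero,
// and since FairScheduler never invokes setHeadroom(), getHeadroom() keeps
// returning the zero default that shows up as resourcelimit=<memory:0, ...>.
public class SchedulerAppAttemptSketch {
    private long headroomMb = 0; // default: zero

    public long getHeadroom() {
        return headroomMb; // what the AM sees as its resource limit
    }

    public void setHeadroom(long headroomMb) {
        this.headroomMb = headroomMb; // capacity/fifo call this; fair does not
    }

    public static void main(String[] args) {
        SchedulerAppAttemptSketch app = new SchedulerAppAttemptSketch();
        // No setHeadroom() call, mirroring FairScheduler's behavior:
        System.out.println(app.getHeadroom()); // prints 0
    }
}
{code}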