[ https://issues.apache.org/jira/browse/MAPREDUCE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973430#comment-13973430 ]

Maysam Yabandeh commented on MAPREDUCE-5844:
--------------------------------------------

Thanks [~jlowe] for your detailed comment.

# As I explained in the description of the jira, the headroom printed in the 
logs is always zero, e.g.:
{code}
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() for 
application_x: ask=8 release= 0 newContainers=0 finishedContainers=0 
resourcelimit=<memory:0, vCores:0> knownNMs=x
{code}
This is not because there is no headroom (I verified that by checking the 
available resources while the job was running).
# I was actually not surprised to see the headroom always set to zero, since I 
found the headroom field effectively abandoned in the FairScheduler source 
code: SchedulerApplicationAttempt#getHeadroom() is what populates the headroom 
field in the response, but SchedulerApplicationAttempt#setHeadroom() is never 
invoked in FairScheduler (it is invoked in the capacity and fifo schedulers, 
though). See the first sketch after this list for where such a call could go.
# I assumed that not invoking setHeadroom in the fair scheduler was 
intentional, perhaps due to the complications of computing the headroom when 
fair share is taken into account. But based on your comment, I understand that 
this could be a "forgotten" case rather than an "abandoned" one.
# At least in the case where we suffered from this problem, the headroom was 
in fact available: both the preempted reducer and the new mapper were assigned 
immediately (within a few seconds). So delaying the preemption even for a 
period as short as 1 minute would have prevented the problem, while having no 
tangible negative impact in cases where the preemption was actually required. 
I agree that there are tradeoffs with this preemption delay (especially when 
it is high), but even a short value would suffice to cover this special case 
where the headroom is already available.
# Whether or not we end up fixing the headroom calculation in FairScheduler, 
it seems to me that allowing the user to configure the preemption to be 
postponed by a short delay would not be harmful, even if it turns out not to 
be beneficial; the second sketch below shows the delay logic I have in mind.

> Reducer Preemption is too aggressive
> ------------------------------------
>
>                 Key: MAPREDUCE-5844
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5844
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Maysam Yabandeh
>            Assignee: Maysam Yabandeh
>
> We observed cases where the reducer preemption makes the job finish much 
> later, and the preemption does not seem to be necessary since after 
> preemption both the preempted reducer and the mapper are assigned 
> immediately--meaning that there was already enough space for the mapper.
> The logic for triggering preemption is at 
> RMContainerAllocator::preemptReducesIfNeeded
> The preemption is triggered if the following is true:
> {code}
> headroom +  am * |m| + pr * |r| < mapResourceRequest
> {code} 
> where am is the number of assigned mappers, |m| is the mapper size, pr is 
> the number of reducers being preempted, and |r| is the reducer size.
> The original idea apparently was that if headroom is not big enough for the 
> new mapper requests, reducers should be preempted. This would work if the job 
> is alone in the cluster. Once we have queues, the headroom calculation 
> becomes more complicated and it would require a separate headroom calculation 
> per queue/job.
> So, as a result, the headroom variable is effectively abandoned at the 
> moment: *headroom is always set to 0*. What this implies for preemption is 
> that it becomes very aggressive, not considering whether there is actually 
> enough space for the mappers or not.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
