[ https://issues.apache.org/jira/browse/MAPREDUCE-4191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13262799#comment-13262799 ]

Thomas Graves commented on MAPREDUCE-4191:
------------------------------------------

I'm still following this through to fully understand, but there is a comment in 
the code in LeafQueue that tries to explain this:

   // Note: We aren't considering the current request since there is a fixed
   // overhead of the AM, but it's a > check, not a >= check, so... 
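If I'm reading it right, the effect of that strict ">" comparison can be sketched like this (a simplified illustration with made-up numbers, not the actual LeafQueue code):

```java
// Simplified sketch of the headroom check described above -- NOT the real
// LeafQueue logic, just an illustration of ">" vs ">=".
public class HeadroomSketch {
    // Returns true if the queue may accept one more container.
    static boolean canAssign(int usedGb, int capacityGb) {
        // Because this is a ">" check, the queue is only blocked once usage
        // already EXCEEDS capacity; the container that takes usage from
        // "exactly at capacity" to "over capacity" is still admitted.
        return !(usedGb > capacityGb);
    }

    public static void main(String[] args) {
        int capacityGb = 51; // queue capacity (hypothetical)
        System.out.println(canAssign(51, capacityGb)); // true: one more gets in
        System.out.println(canAssign(52, capacityGb)); // false: now blocked
    }
}
```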

Which I don't totally follow. I guess the idea is that if you have one job in 
the queue that takes the entire capacity, it lets the job behave more like it 
did in mrv1 and tries not to penalize you for the AM overhead. The AM, 
however, is doing the setup and cleanup tasks, whereas in mrv1 a slot would 
have to be allocated for those. The AM may have a fixed overhead per job, but 
that overhead is configurable: I could create an AM with 24G of memory, or use 
the default of 1.5G. Or, on the flip side, I could have an AM that uses 1.5G 
but a map task that now gets scheduled and uses 24G, which puts the queue way 
over its capacity. That could affect the queue's current usage greatly and 
seems to break the capacity guarantee.

In the case where you have, say, 2 jobs in the queue, you have 2 app masters, 
one of which is "counted" against your queue and the other one is not.

I do see this as beneficial for queues with very small capacities, though; 
without it they could be stuck without enough resources to run a single task.
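To make that concrete: a very small queue can have a capacity below the AM's own resource request, so without the exemption nothing could ever start. A quick sketch with hypothetical numbers (a 2% queue on the 56G cluster from the report below):

```java
// Why exempting the AM helps tiny queues -- hypothetical numbers only.
public class SmallQueueSketch {
    public static void main(String[] args) {
        double clusterGb = 56.0;
        double queuePct = 0.02;                   // a 2% queue (assumed)
        double queueCapGb = clusterGb * queuePct; // 1.12 GB of capacity
        double amGb = 1.5;                        // default AM size mentioned above
        // If the AM request were checked against queue capacity up front,
        // it would exceed capacity before any task ran, and the queue
        // could never run anything:
        System.out.println(amGb > queueCapGb);    // true -> AM would be rejected
    }
}
```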

Arun, or anyone else familiar with the capacity scheduler: if you could 
provide an explanation, that would be great.
                
> capacity scheduler: job unexpectedly exceeds queue capacity limit by one task
> -----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4191
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4191
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2, scheduler
>    Affects Versions: 0.23.3
>            Reporter: Thomas Graves
>            Assignee: Thomas Graves
>
> While testing the queue capacity limits, it appears that the job can exceed
> the queue capacity limit by one task while the user limit factor is 1. It's
> not clear to me why this is.
> Here are the steps to reproduce:
> 1) set yarn.app.mapreduce.am.resource.mb to 2048 (default value)
> 2) set yarn.scheduler.capacity.root.default.user-limit-factor to 1.0 (default)
> 3) set yarn.scheduler.capacity.root.default.capacity to 90 (%)
> 4) For a cluster with a capacity of 56G, 90% rounded up is 51G.
> 5) submit a job with large number of tasks, each task using 1G memory. 
> 6) webui shows that the used resource is 52 G, which is 92.9% of the cluster
> capacity (instead of the expected 90%), and 103.2% of the queue capacity
> (instead of the expected 100%).
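
The percentages in the quoted report check out arithmetically (a quick verification using the numbers above; 103.2% is relative to the unrounded 90% figure of 50.4G, not the rounded 51G):

```java
// Verifying the figures from the quoted report: a 56G cluster, a 90% queue,
// and 52G of observed usage.
public class CapacityMath {
    public static void main(String[] args) {
        double clusterGb = 56.0;
        double queueCapGb = clusterGb * 0.90; // 50.4 GB ("rounded up is 51")
        double usedGb = 52.0;                 // what the web UI reported
        System.out.printf("%.1f%%%n", 100 * usedGb / clusterGb);  // 92.9% of cluster
        System.out.printf("%.1f%%%n", 100 * usedGb / queueCapGb); // 103.2% of queue
    }
}
```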

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira