[ 
https://issues.apache.org/jira/browse/HADOOP-4658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12647552#action_12647552
 ] 

Amar Kamat commented on HADOOP-4658:
------------------------------------

Looking at the logs, it seems that there are two problems:
- When job3 finishes (pending = 0), it takes some time for it to exit the 
scheduler. Although job3 from user3 is actually done (it doesn't require any 
scheduling cycles), user3 is still counted as a valid user and thus affects the 
_limit_ computation.
- Since the limit computation is slow to catch up, job1 always has the 
advantage and schedules more tasks. The problem is that it sometimes goes ahead 
and schedules speculative tasks even when job2 has genuine tasks to run.

The limit computation currently works as follows:
{code}
cap = min (running_tasks + 1, guaranteed_cap)
limit = max( cap/num_users, cap*ulimit)
{code}
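
As a rough illustration of the formula above, here is a minimal Python sketch. The function name, the parameter names and the sample numbers are assumptions made for this example (the capacity of 208 and the 25% user limit are taken from the Environment field of the issue), and the rounding behaviour is not specified in the comment, so plain division is used. It is not the scheduler's actual code.

```python
def current_user_limit(running_tasks, guaranteed_cap, num_users, ulimit):
    """Per-user limit following the pseudocode above.

    All names are hypothetical; ulimit is a fraction (0.25 for 25%).
    """
    cap = min(running_tasks + 1, guaranteed_cap)
    return max(cap / num_users, cap * ulimit)


# user3's job is done but user3 is still registered, so 3 users are counted.
limit_with_stale_user = current_user_limit(207, 208, 3, 0.25)

# Once user3 has left the scheduler, only 2 users are counted.
limit_after_removal = current_user_limit(207, 208, 2, 0.25)

print(limit_with_stale_user, limit_after_removal)
```

With these sample numbers the per-user limit stays near 69 slots while user3 is still counted and rises to 104 only after user3 is removed, which is the lag described in the first problem.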

I think whatever is extra should always be given back equally to all the 
contenders. This can be achieved if we update _limits_ immediately based on how 
many users actually require slots, rather than waiting for the user to be 
removed from the scheduler. We should also make sure that speculative tasks 
are run last; otherwise we will end up wasting resources.

The new limit computation:
{code}
cap = min (running_tasks, guaranteed_cap)
// count only users with slot requirements; avoids users from jobs that are 
// done with their scheduling
num_actual_users = users with slot requirements
limit = max( cap/num_actual_users, cap*ulimit)
{code}
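
The proposed computation can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name and the `pending_by_user` mapping are hypothetical, "users with slot requirements" is modelled as users with more than zero pending tasks, and the behaviour when no user needs slots is not covered by the comment, so the sketch simply returns 0.0 in that case.

```python
def proposed_user_limit(running_tasks, guaranteed_cap, pending_by_user, ulimit):
    """Per-user limit following the proposed pseudocode above.

    pending_by_user maps a user name to that user's pending task count
    (a hypothetical structure used only for this sketch).
    """
    cap = min(running_tasks, guaranteed_cap)
    # Count only users that still need slots (assumed to mean pending > 0).
    num_actual_users = sum(1 for pending in pending_by_user.values() if pending > 0)
    if num_actual_users == 0:
        # Assumption: with no contenders there is nothing to hand out.
        return 0.0
    return max(cap / num_actual_users, cap * ulimit)


# user3 is still registered but has nothing left to schedule.
pending = {"user1": 40, "user2": 25, "user3": 0}
limit = proposed_user_limit(208, 208, pending, 0.25)
print(limit)
```

Here user3 is ignored as soon as its pending count reaches zero, so the limit for user1 and user2 is 104 slots each without waiting for user3 to be removed from the scheduler.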

> User limit is not expanding back properly.
> ------------------------------------------
>
>                 Key: HADOOP-4658
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4658
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/capacity-sched
>    Affects Versions: 0.19.0
>         Environment: GC=100% nodes=104, map_capacity=208, 
> reduce_capacity=208, user-limit=25%;
>            Reporter: Karam Singh
>            Assignee: Amar Kamat
>
> User limit is not expanding back properly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
