[ 
https://issues.apache.org/jira/browse/HADOOP-4471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12641405#action_12641405
 ] 

Hemanth Yamijala commented on HADOOP-4471:
------------------------------------------

Though this patch looks correct, we realized while reviewing it that under some 
conditions it can lead to a 'priority inversion': jobs of a lower priority that 
started running first hold reduce slots that a higher priority job needs, so 
the higher priority job effectively cannot complete. Indeed, in some of the 
initial discussions on the capacity scheduler, we had said that because of this 
issue we would not bump jobs over other running jobs, which is what is 
currently implemented.

I think this needs some discussion and consensus before closing.

> Capacity Scheduler should maintain the right ordering of jobs in its running 
> queue
> ----------------------------------------------------------------------------------
>
>                 Key: HADOOP-4471
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4471
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/capacity-sched
>    Affects Versions: 0.19.0
>            Reporter: Vivek Ratan
>            Assignee: Amar Kamat
>            Priority: Blocker
>             Fix For: 0.19.0
>
>         Attachments: HADOOP-4471-v1.patch
>
>
> Currently, the Capacity Scheduler maintains a simple linked list of jobs 
> which are running. This implies that running jobs are sorted by when they 
> started running (i.e., when they were added to the queue). The Scheduler 
> should maintain the same ordering among running jobs that it does for waiting 
> jobs. Jobs should be sorted by priority (if the queue supports priorities) 
> and by their submit time. 
> This sorting would be fairer in deciding which running jobs get access to 
> a free TaskTracker (TT). It also does not penalize jobs that have a longer 
> setup task, which affects when they enter the run queue. 
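
The ordering proposed in the issue description above (priority first, if the 
queue supports priorities, then submit time) can be sketched as a comparator. 
This is only an illustration under stated assumptions: the RunningJobOrder, 
Job and Priority types below are hypothetical stand-ins, not the classes 
used by the Capacity Scheduler or by the attached patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in types for illustration; NOT the actual Hadoop classes.
class RunningJobOrder {

    // Declared from highest to lowest, so natural enum order puts
    // higher priorities first.
    enum Priority { VERY_HIGH, HIGH, NORMAL, LOW, VERY_LOW }

    static final class Job {
        final String id;
        final Priority priority;
        final long submitTime; // e.g. milliseconds since the epoch

        Job(String id, Priority priority, long submitTime) {
            this.id = id;
            this.priority = priority;
            this.submitTime = submitTime;
        }
    }

    // Priority first (only if the queue supports priorities), then submit time.
    static Comparator<Job> ordering(boolean supportsPriorities) {
        Comparator<Job> bySubmitTime =
                Comparator.comparingLong((Job j) -> j.submitTime);
        if (!supportsPriorities) {
            return bySubmitTime;
        }
        return Comparator.comparing((Job j) -> j.priority)
                .thenComparing(bySubmitTime);
    }

    // Returns job ids in the proposed order without modifying the input list.
    static List<String> orderedIds(List<Job> running, boolean supportsPriorities) {
        List<Job> copy = new ArrayList<>(running);
        copy.sort(ordering(supportsPriorities));
        List<String> ids = new ArrayList<>();
        for (Job j : copy) {
            ids.add(j.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        // Listed in the order the jobs started running; job3 was submitted
        // first but entered the run queue last (say, after a long setup task).
        List<Job> running = Arrays.asList(
                new Job("job1", Priority.NORMAL, 100L),
                new Job("job2", Priority.HIGH, 300L),
                new Job("job3", Priority.NORMAL, 50L));
        System.out.println(orderedIds(running, true));
        System.out.println(orderedIds(running, false));
    }
}
```

With priorities enabled, job2 moves ahead of both NORMAL jobs, and job3 is 
placed ahead of job1 because it was submitted earlier, even though it started 
running last. Note that this sketch covers only the ordering; it does not 
address the priority inversion concern raised in the comment above.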

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
