deficit computation is biased by historical load
------------------------------------------------

                 Key: HADOOP-4872
                 URL: https://issues.apache.org/jira/browse/HADOOP-4872
             Project: Hadoop Core
          Issue Type: Bug
          Components: contrib/fair-share
            Reporter: Joydeep Sen Sarma


going through the code over the weekend - one of the things that struck me 
about the fair scheduler was how the deficit a job accumulates depends on the 
number of jobs running concurrently, and how that can hurt other jobs later on.

for example:
- two jobs are running; one came first and is occupying all the slots. the 
second job therefore accumulates deficit at the rate of half the cluster size 
for some time. this can add up to a fairly high deficit (since only two jobs 
are running, each job's fair share is half the cluster).
- a large number of jobs now arrive. the second job, by virtue of its high 
deficit, out-schedules the new wave of jobs.

what i don't like here is that the deficit is based largely on the historical 
job mix in the cluster. the current set of jobs had no influence on that 
history, yet may be unduly penalized for it, while jobs that happened to be 
pending when there were few jobs in the system benefit a lot.
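
to make the bias concrete, here is a minimal sketch with made-up numbers. it 
uses a simplified form of the deficit update (deficit grows roughly by 
fair share minus running tasks per unit time, if i read the code correctly) 
and is not the actual scheduler code:

// Simplified sketch (assumed numbers) of how deficit piles up under the
// current scheme, where a job's deficit grows roughly by
// (fairShare - runningTasks) slots per unit time.
public class DeficitBiasExample {
  public static void main(String[] args) {
    int clusterSlots = 100;      // hypothetical cluster size
    long phase1Seconds = 600;    // job 2 waits 10 minutes behind job 1

    // phase 1: two jobs, so each job's fair share is 50 slots, but job 1
    // holds every slot and job 2 runs nothing.
    double fairShare = clusterSlots / 2.0;
    double job2Running = 0;
    double job2Deficit = (fairShare - job2Running) * phase1Seconds;

    // phase 2: 18 more jobs arrive. every fair share drops to 5 slots, but
    // the newcomers start at deficit 0 while job 2 carries its history.
    System.out.println("job 2 deficit entering phase 2: "
        + job2Deficit + " slot-seconds");   // 30000.0
    System.out.println("each new job's deficit: 0 slot-seconds");
  }
}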

what i find more helpful is the deficit measured in the context of the current 
load. for example, say a job has used C compute units (slots * time) over its 
lifetime T, i.e. it has effectively been using C/T slots over its lifetime. at 
any given point in time its fair share may be a different number F (based on 
the jobs running at that point in time). one can then compute the deficit as 
(F - C/T)*T as of that point in time.
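
here is a minimal sketch of that computation (the names are mine, not actual 
fields in the fair scheduler). note that (F - C/T)*T simplifies to F*T - C: 
the compute the job would have received at today's fair share over its whole 
lifetime, minus what it actually consumed:

// Sketch of the proposed load-relative deficit. (F - C/T) * T == F*T - C.
// Names are illustrative, not actual FairScheduler fields.
public class LoadRelativeDeficit {

  /**
   * @param computeUsed      C: slot-seconds consumed over the job's lifetime
   * @param lifetime         T: seconds since the job was submitted
   * @param currentFairShare F: slots the job deserves under the current job mix
   * @return deficit in slot-seconds, measured against the current load
   */
  static double deficit(double computeUsed, double lifetime,
                        double currentFairShare) {
    return currentFairShare * lifetime - computeUsed;   // (F - C/T) * T
  }

  public static void main(String[] args) {
    // the earlier scenario: job 2 waited 600 seconds, and 20 jobs now share
    // 100 slots, so F = 5. its deficit becomes 5*600 - 0 = 3000 slot-seconds
    // instead of the 30000 it carried under the current scheme.
    System.out.println(deficit(0, 600, 5.0));
  }
}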

clearly, this would address the scenario outlined above by not granting a 
large deficit to the second job once the wave of new jobs arrives. consider 
another alternative (and corollary) scenario:
- a large job is running alongside lots of other small jobs
- the small jobs finish and another large job arrives (so two large jobs are 
running now)

in this case, too, the current algorithm does not do well. the first large job 
does not accumulate much deficit if fair sharing is working well, so once the 
small jobs disappear and the second large job comes in, the two large jobs get 
roughly 50-50 of the cluster. but this is counter-intuitive: one would think 
that the first large job, by virtue of having run longer at a slower pace, 
should get more slots than the second large job.

again, recomputing the deficit against the current load as described above 
would fix this situation.
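
for completeness, the same formula applied to this corollary scenario (again 
with made-up numbers):

// The corollary scenario under the proposed formula (assumed numbers).
public class TwoLargeJobsExample {
  public static void main(String[] args) {
    double clusterSlots = 100;

    // job A ran for an hour alongside many small jobs at a fair share of
    // 10 slots, and actually received those 10 slots (fair sharing worked),
    // so it carries almost no deficit under the current algorithm.
    double lifetimeA = 3600;            // seconds
    double computeUsedA = 10 * 3600;    // C = 36000 slot-seconds

    // the small jobs finish and job B arrives: fair share jumps to 50 slots.
    double currentFairShare = clusterSlots / 2.0;

    double deficitA = currentFairShare * lifetimeA - computeUsedA;  // 144000
    double deficitB = 0;   // job B has no lifetime yet

    // the current algorithm would split the cluster roughly 50-50; here
    // job A is owed far more and gets slots first until it catches up.
    System.out.println("deficit A = " + deficitA + " slot-seconds");
    System.out.println("deficit B = " + deficitB + " slot-seconds");
  }
}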

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
