deficit computation is biased by historical load
------------------------------------------------
Key: HADOOP-4872
URL: https://issues.apache.org/jira/browse/HADOOP-4872
Project: Hadoop Core
Issue Type: Bug
Components: contrib/fair-share
Reporter: Joydeep Sen Sarma
Going through the code over the weekend, one of the things that struck me about the fair scheduler is that the deficit a job accumulates depends on how many jobs happen to be running concurrently, and that this can hurt other jobs later on.
For example:
- Two jobs are running; the first arrived earlier and is occupying all the slots, so the second job accumulates deficit at a rate of half the cluster size for some time. This can add up to a fairly large deficit (since only two jobs are running).
- A large number of jobs now arrives. The second job, by virtue of its high deficit, out-schedules the entire new wave of jobs (rough numbers below).
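To put illustrative numbers on this (the figures are hypothetical, and the accumulation rule is roughly deficit += (fair share - running tasks) * elapsed time): with a 100-slot cluster and two jobs, the waiting job's fair share is 50 slots, so after 20 minutes of waiting it has built up about 50 slots * 1200 s = 60,000 slot-seconds of deficit. A freshly submitted job starts at a deficit of 0, so the wave of new jobs cannot out-schedule the second job until their deficits catch up.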
What I don't like here is that the deficit is based largely on the historical job mix in the cluster; the current set of jobs had no influence on that history, yet may be unduly penalized for it. Jobs that happen to be pending while there are few jobs in the system benefit a lot.
What I find more helpful is computing the deficit in the context of the current load. For example, say a job has used C compute units (slots * time) over its lifetime T, i.e. it has effectively been using C/T slots on average. At any given point in time its fair share may be a different number F (based on the jobs running at that moment). One can then compute the deficit as (F - C/T) * T as of that point in time.
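A minimal sketch of that computation in Java, assuming we track the job's start time and its accumulated slot-time; the class and field names below are hypothetical, not the existing FairScheduler code:

    /**
     * Sketch of a load-relative deficit: (F - C/T) * T, where C is the
     * compute consumed so far (slot-milliseconds), T is the job's lifetime,
     * and F is the fair share under the *current* job mix. Hypothetical
     * names, not the real FairScheduler classes.
     */
    class JobStats {
      long startTimeMs;          // when the job was submitted
      double computeUnitsUsed;   // C: slot-milliseconds consumed so far

      double loadRelativeDeficit(double currentFairShareSlots, long nowMs) {
        long lifetimeMs = nowMs - startTimeMs;               // T
        if (lifetimeMs <= 0) {
          return 0.0;
        }
        double avgSlotsUsed = computeUnitsUsed / lifetimeMs; // C / T
        return (currentFairShareSlots - avgSlotsUsed) * lifetimeMs; // (F - C/T) * T
      }
    }

The difference from the accumulated deficit is that F is taken from the present job mix, so the job's whole history is effectively re-priced at today's fair share rather than at whatever the share happened to be when the deficit was earned.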
Clearly, this would address the scenario outlined above by not granting a large deficit to the second job once the wave of new jobs arrives. Consider another (corollary) scenario:
- A large job is running alongside lots of small jobs.
- The small jobs finish and another large job arrives (so two large jobs are running now).
In this case too, the current algorithm does not do well. The first large job does not accumulate much deficit if fair sharing is working well. Once the small jobs disappear and the second large job comes in, both large jobs get roughly a 50-50 split of the cluster. But this is counter-intuitive: one would think that the first large job, by virtue of having run longer at a slow pace, should get more slots than the second large job.
Again, recomputing the deficit in the context of the current load, as described above, would fix this situation.
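To illustrate with hypothetical numbers: suppose the cluster has 100 slots and the first large job has been running for T = 1 hour averaging C/T = 20 slots because the small jobs held the rest. When the second large job arrives, both fair shares become F = 50 slots. The recomputed deficit of the first job is (50 - 20) * 3600 s = 108,000 slot-seconds, while the newly arrived job's is essentially 0, so the first job would temporarily be favored for slots, which matches the intuition above.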