[
https://issues.apache.org/jira/browse/HADOOP-4665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689127#action_12689127
]
Matei Zaharia commented on HADOOP-4665:
---------------------------------------
The patch actually makes sure that no job is ever brought below its fair or min
share. This happens in preemptTasks, where we compute tasksToLeave and make
sure we leave the job with at least that many tasks. Since each pool's min and
fair shares are distributed among the jobs in that pool, this also ensures that
pool shares are preserved. The deficit is just a global ordering so that we
service the jobs that have been starving the most first, but in following it,
we also ensure that we never bring jobs below their shares.
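To illustrate, the tasksToLeave guarantee could be sketched roughly as below. All names here (JobShare, tasksToLeave, preemptableTasks) are illustrative, not the patch's actual classes or methods:

```java
import java.util.Comparator;

// Illustrative sketch of the tasksToLeave guarantee: never preempt a job
// below the larger of its min share and its fair share. Names are
// hypothetical, not the patch's API.
public class PreemptionSketch {
    /** Minimal job model: running tasks plus the job's min and fair share. */
    static class JobShare {
        int runningTasks;
        int minShare;
        double fairShare;

        JobShare(int running, int min, double fair) {
            this.runningTasks = running;
            this.minShare = min;
            this.fairShare = fair;
        }

        /** Tasks the preemptor must leave this job with. */
        int tasksToLeave() {
            return (int) Math.max(minShare, Math.floor(fairShare));
        }

        /** Tasks we may safely take from this job. */
        int preemptableTasks() {
            return Math.max(0, runningTasks - tasksToLeave());
        }
    }
}
```

Because each pool's shares are partitioned among its jobs, leaving every job its own tasksToLeave automatically leaves every pool at least its aggregate share.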
Actually, when we remove deficits from the scheduler (I already have working
code for this; it's just too big a change to push that, 4665 and 4667 all in
the same JIRA), the logic will be simpler. We'll just look for jobs that are
far below their fair share (as a ratio of their share) and preempt from jobs
that are far above their fair share (again as a ratio). Then there won't be
this confusion about whether we can have a service order based on deficits or
not. This patch is just an attempt to get some of that code into the scheduler
before pushing the big change. If you'd prefer that I post a patch for the new
non-deficit-based fair scheduler instead, I can do that too.
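The ratio-of-share test for the future non-deficit scheduler might look like the sketch below. The threshold constants and all names are assumptions, not values from the actual code:

```java
// Sketch of ratio-based preemption: preempt on behalf of jobs far below
// their fair share, taking from jobs far above theirs. The 0.5 and 1.5
// thresholds are assumed for illustration only.
public class RatioPreemption {
    static final double STARVED_RATIO = 0.5; // below half of fair share
    static final double DONOR_RATIO = 1.5;   // well above fair share

    /** Running tasks as a fraction of the job's fair share. */
    static double shareRatio(int runningTasks, double fairShare) {
        return fairShare > 0 ? runningTasks / fairShare
                             : Double.POSITIVE_INFINITY;
    }

    /** Job is far enough below its share to trigger preemption. */
    static boolean isStarved(int runningTasks, double fairShare) {
        return shareRatio(runningTasks, fairShare) < STARVED_RATIO;
    }

    /** Job is far enough above its share to give up tasks. */
    static boolean canDonate(int runningTasks, double fairShare) {
        return shareRatio(runningTasks, fairShare) > DONOR_RATIO;
    }
}
```

Using a ratio rather than an absolute deficit makes jobs of different sizes comparable, which is what removes the need for a deficit-based service order.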
> Add preemption to the fair scheduler
> ------------------------------------
>
> Key: HADOOP-4665
> URL: https://issues.apache.org/jira/browse/HADOOP-4665
> Project: Hadoop Core
> Issue Type: New Feature
> Components: contrib/fair-share
> Reporter: Matei Zaharia
> Assignee: Matei Zaharia
> Fix For: 0.21.0
>
> Attachments: fs-preemption-v0.patch, hadoop-4665-v1.patch,
> hadoop-4665-v1b.patch, hadoop-4665-v2.patch, hadoop-4665-v3.patch,
> hadoop-4665-v4.patch
>
>
> Task preemption is necessary in a multi-user Hadoop cluster for two reasons:
> users might submit long-running tasks by mistake (e.g. an infinite loop in a
> map program), or tasks may be long due to having to process large amounts of
> data. The Fair Scheduler (HADOOP-3746) has a concept of guaranteed capacity
> for certain queues, as well as a goal of providing good performance for
> interactive jobs on average through fair sharing. Therefore, it will support
> preemption under two conditions:
> 1) A job isn't getting its _guaranteed_ share of the cluster for at least T1
> seconds.
> 2) A job is getting significantly less than its _fair_ share for T2 seconds
> (e.g. less than half its share).
> T1 will be chosen smaller than T2 (and will be configurable per queue) to
> meet guarantees quickly. T2 is meant as a last resort in case non-critical
> jobs in queues with no guaranteed capacity are being starved.
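The two trigger conditions above could be checked roughly as follows. The half-share threshold for condition 2 comes from the description; the method and parameter names are illustrative assumptions:

```java
// Sketch of the two preemption triggers: T1 guards the guaranteed (min)
// share, T2 guards the fair share. Timestamps are when the job first
// dropped below each threshold. All names are hypothetical.
public class PreemptionTrigger {
    static boolean shouldPreempt(long now, int runningTasks,
                                 int minShare, double fairShare,
                                 long belowMinSince, long belowHalfFairSince,
                                 long t1Millis, long t2Millis) {
        // Condition 1: below guaranteed share for at least T1.
        boolean minStarved = runningTasks < minShare
                && now - belowMinSince >= t1Millis;
        // Condition 2: below half of fair share for at least T2.
        boolean fairStarved = runningTasks < fairShare / 2
                && now - belowHalfFairSince >= t2Millis;
        return minStarved || fairStarved;
    }
}
```

With T1 smaller than T2, the min-share check fires quickly for queues with guarantees, while the fair-share check acts as the slower last resort.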
> When deciding which tasks to kill to make room for the job, we will use the
> following heuristics:
> - Look for tasks to kill only in jobs that have more than their fair share,
> ordering these by deficit (most overscheduled jobs first).
> - For maps: kill tasks that have run for the least amount of time (limiting
> wasted time).
> - For reduces: similar to maps, but give extra preference to reduces in the
> copy phase, where there is not much map output per task (at Facebook, we have
> observed this to be the main time we need preemption: when a job has a long
> map phase and its reducers are mostly sitting idle, filling up slots).
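The per-task ordering in the last two heuristics could be expressed as a comparator like the one below; the class and field names are illustrative, not the scheduler's actual types:

```java
import java.util.Comparator;

// Sketch of the kill-selection order: reduces still in the copy phase
// first (little work is lost), then the most recently started tasks
// (limiting wasted time). Names are hypothetical.
public class KillSelection {
    static class Task {
        long startTime;
        boolean isReduce;
        boolean inCopyPhase;

        Task(long start, boolean reduce, boolean copy) {
            startTime = start;
            isReduce = reduce;
            inCopyPhase = copy;
        }
    }

    /** Candidates sorted into the order they should be killed. */
    static Comparator<Task> killOrder() {
        // false sorts before true, so copy-phase reduces come first.
        Comparator<Task> copyPhaseFirst =
                Comparator.comparing((Task t) -> !(t.isReduce && t.inCopyPhase));
        // Youngest tasks first: least completed work to throw away.
        Comparator<Task> youngestFirst =
                Comparator.comparingLong((Task t) -> t.startTime).reversed();
        return copyPhaseFirst.thenComparing(youngestFirst);
    }
}
```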