[
https://issues.apache.org/jira/browse/AURORA-121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Bill Farner updated AURORA-121:
-------------------------------
Description:
When {{TaskSchedulerImpl}} fails to find an open slot for a task, it falls back
to the preemptor:
{code}
if (!offerQueue.launchFirst(getAssignerFunction(taskId, task))) {
// Task could not be scheduled.
maybePreemptFor(taskId);
return TaskSchedulerResult.TRY_AGAIN;
}
{code}
This can be problematic when the task store is large (O(10k tasks)) and there
is a steady supply of PENDING tasks that cannot be satisfied by open slots. It
manifests as an overall degraded/slow scheduler, and as log messages reporting
slow queries issued for preemption:
{noformat}
I0125 17:47:36.970 THREAD23 org.apache.aurora.scheduler.storage.mem.MemTaskStore.fetchTasks: Query took 107 ms: TaskQuery(owner:null, environment:null, jobName:null, taskIds:null, statuses:[KILLING, ASSIGNED, STARTING, RUNNING, RESTARTING], slaveHost:null, instanceIds:null)
{noformat}
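For context, the sketch below (illustrative only; the class, method, and field
names are invented rather than taken from Aurora's preemptor) shows the shape of
the work implied by that query: each scheduling failure triggers a scan of every
active task, on the TaskScheduler thread and under the storage write lock,
before any candidate can be weighed against the PENDING task.
{code}
// Purely illustrative sketch -- names and types below are invented, not
// Aurora's preemptor.  It shows the shape of the work behind the slow query
// above: every scheduling failure walks all active tasks on the TaskScheduler
// thread, while the storage write lock is held.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PreemptionScanSketch {
  // Minimal stand-ins for the scheduler's task representation.
  record ActiveTask(String taskId, String slaveHost, int priority) {}

  interface TaskStore {
    // Stand-in for the MemTaskStore.fetchTasks call shown in the log above.
    Iterable<ActiveTask> fetchActiveTasks();
  }

  // O(total active tasks) on every call; with ~10k active tasks the fetch
  // alone is the ~100 ms query logged above.
  static Map<String, List<ActiveTask>> candidatesBySlave(TaskStore store) {
    Map<String, List<ActiveTask>> bySlave = new HashMap<>();
    for (ActiveTask task : store.fetchActiveTasks()) {
      bySlave.computeIfAbsent(task.slaveHost(), host -> new ArrayList<>()).add(task);
    }
    return bySlave;
  }
}
{code}
With a steady supply of unschedulable PENDING tasks, that cost is paid on every
scheduling round, which is what degrades the scheduler as a whole.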
Several approaches come to mind to improve this situation (not mutually
exclusive):
- (easy) More aggressively back off on tasks that cannot be satisfied
- (easy) Fall back to preemption less frequently
- (easy) Gather the list of slaves from {{AttributeStore}} rather than
{{TaskStore}}. This breaks the operation up into many smaller queries and
reduces the amount of work in cases where a match is found. However, this
would actually create more work when a match is not found, so this approach is
probably not helpful by itself.
- (harder) Scan for preemption candidates asynchronously, freeing up the
TaskScheduler thread and the storage write lock. Scans could be kicked off by
the task scheduler, ideally in a way that doesn't dogpile. This could also be
done in a weakly-consistent way to minimally contribute to storage contention
(a rough sketch of this option follows the list).
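To make the last option more concrete, here is a hedged sketch of one possible
shape for the asynchronous scan (all names are illustrative and this is not a
proposed patch): the scheduler merely requests a scan, a single-threaded
executor runs it off the scheduling path, and an AtomicBoolean coalesces
concurrent requests so a burst of unschedulable tasks cannot dogpile.
{code}
// Hedged, illustrative sketch only -- not a patch against Aurora.  The
// scheduler calls requestScan() instead of doing a synchronous preemption
// scan; the scan itself runs on a dedicated thread, outside the scheduling
// round, and can read task state weakly-consistently.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;

final class AsyncPreemptionScanner {
  private final ExecutorService executor = Executors.newSingleThreadExecutor();
  private final AtomicBoolean scanQueued = new AtomicBoolean(false);
  private final Runnable weaklyConsistentScan;

  AsyncPreemptionScanner(Runnable weaklyConsistentScan) {
    this.weaklyConsistentScan = weaklyConsistentScan;
  }

  // Called where maybePreemptFor() is called today; returns immediately so the
  // TaskScheduler thread and the storage write lock are released promptly.
  void requestScan() {
    // compareAndSet ensures at most one scan is queued at a time, so repeated
    // scheduling failures coalesce into a single scan instead of dogpiling.
    if (scanQueued.compareAndSet(false, true)) {
      executor.execute(() -> {
        scanQueued.set(false);
        weaklyConsistentScan.run();  // Read-only candidate search.
      });
    }
  }
}
{code}
The coalescing flag is what addresses the "doesn't dogpile" concern above; the
scan body itself is left abstract here.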
was:
When {{TaskSchedulerImpl}} fails to find an open slot for a task, it falls back
to the preemptor:
{code}
if (!offerQueue.launchFirst(getAssignerFunction(taskId, task))) {
// Task could not be scheduled.
maybePreemptFor(taskId);
return TaskSchedulerResult.TRY_AGAIN;
}
{code}
This can be problematic when the task store is large (O(10k tasks)) and there
is a steady supply of PENDING tasks not satisfied by open slots. This will
manifest as an overall degraded/slow scheduler, and logs of slow queries used
for preemption:
{noformat}
I0125 17:47:36.970 THREAD23
org.apache.aurora.scheduler.storage.mem.MemTaskStore.fetchTasks: Query took 107
ms: TaskQuery(owner:null, environment:null, jobName:null,
taskIds:null, statuses:[KILLING, ASSIGNED, STARTING, RUNNING, RESTARTING],
slaveHost:null, instanceIds:null)
{noformat}
Several approaches come to mind to improve this situation:
- (easy) More aggressively back off on tasks that cannot be satisfied
- (easy) Fall back to preemption less frequently
- (harder) Scan for preemption candidates asynchronously, freeing up the
TaskScheduler thread and the storage write lock. Scans could be kicked off by
the task scheduler, ideally in a way that doesn't dogpile. This could also be
done in a weakly-consistent way to minimally contribute to storage contention.
> Make the preemptor more efficient
> ---------------------------------
>
> Key: AURORA-121
> URL: https://issues.apache.org/jira/browse/AURORA-121
> Project: Aurora
> Issue Type: Story
> Components: Scheduler
> Reporter: Bill Farner
>
> When {{TaskSchedulerImpl}} fails to find an open slot for a task, it falls
> back to the preemptor:
> {code}
> if (!offerQueue.launchFirst(getAssignerFunction(taskId, task))) {
> // Task could not be scheduled.
> maybePreemptFor(taskId);
> return TaskSchedulerResult.TRY_AGAIN;
> }
> {code}
> This can be problematic when the task store is large (O(10k tasks)) and there
> is a steady supply of PENDING tasks not satisfied by open slots. This will
> manifest as an overall degraded/slow scheduler, and logs of slow queries used
> for preemption:
> {noformat}
> I0125 17:47:36.970 THREAD23
> org.apache.aurora.scheduler.storage.mem.MemTaskStore.fetchTasks: Query took
> 107 ms: TaskQuery(owner:null, environment:null, jobName:null,
> taskIds:null, statuses:[KILLING, ASSIGNED, STARTING, RUNNING, RESTARTING],
> slaveHost:null, instanceIds:null)
> {noformat}
> Several approaches come to mind to improve this situation (not mutually
> exclusive):
> - (easy) More aggressively back off on tasks that cannot be satisfied
> - (easy) Fall back to preemption less frequently
> - (easy) Gather the list of slaves from {{AttributeStore}} rather than
> {{TaskStore}}. This breaks the operation up into many smaller queries and
> reduces the amount of work in cases where a match is found. However, this
> would actually create more work when a match is not found, so this approach
> is probably not helpful by itself.
> - (harder) Scan for preemption candidates asynchronously, freeing up the
> TaskScheduler thread and the storage write lock. Scans could be kicked off
> by the task scheduler, ideally in a way that doesn't dogpile. This could
> also be done in a weakly-consistent way to minimally contribute to storage
> contention.
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)