On Apr 28, 2009, at 12:14 PM, David Anderson wrote:
At this point I'm interested in reports of bad scheduling
or work-fetch decisions, or bad debt calculation.
When these are all fixed we can address the efficiency
of the scheduling calculations (if it's an issue).
Ok, I will append an RTF file with the pseudo-code below because the
mailing list seems to want to reformat everything and remove pretty
print.
It would be nice if we could start with talking to the high end logic
first and then quibble over the details of the minutia in the rules
later.
as suggestions to the code are made and agreed upon I will create
updates and circulate those if that is agreeable.
The reason I think this a more rational route is that I feel that the
current rules are so inadequate to the task on fast/wide systems that
a new approach is needed.
The basic logic of this should match to the textual description of
April 27, 2009 2:20:56 PM PDT where I list the problems and a broad
outline of the conceptual model of what I think needs to be done.
There ARE lots of details still missing. But the fundamentals are
there. the intent is to address the issues listed in that prior e-
mail, to wit:
1) We do it too often (event driven)
2) All currently running tasks are eligible for preemption
3) TSI is not respected as a limiting factor
4) TSI is used in calculating deadline peril
5) Work mix is not kept "interesting"
6) Resource Share is used in calculating run time allocations
7) Work "batches" (tasks with roughly similar deadlines) are not "bank
teller queued"
8) History of work scheduling is not preserved and all deadlines are
calculated fresh each invocation.
9) True deadline peril is rare, but "false positives" are common
10) Some of the sources of work peril may be caused by a defective
work fetch allocation
11) Other factors either obscured by the above, I forgot them, or
maybe nothing else ...
see the e-mail for the rest
{\rtf1\ansi\ansicpg1252\cocoartf949\cocoasubrtf430
{\fonttbl\f0\froman\fcharset0 TimesNewRomanPSMT;}
{\colortbl;\red255\green255\blue255;}
\margl1440\margr1440\vieww25340\viewh29740\viewkind0
\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\ql\qnatural\pardirnatural
\f0\fs48 \cf0 \
If I understand the logic this is the BOINC client in a nutshell.\
\
GLOBAL List\
task-list // list of all tasks, unknown order\
task-queue // tasks in ordered list, highest priority task first\
projects-with-tasks-running // list of projects with tasks running\
projects-with-tasks-queued // list of projects with tasks queued\
task-completion-interval // the running average of time between task completions\
task-checkpoint-interval // the running average of time between task checkpoints\
rs-scoreboard // scoreboard of time spent per project vs. time\
// that should be spent on a project on a per unit time basis\
// shortfalls would increase the priority of the tasks in the task-queue\
\
main ()\
begin\
initialize\
loop forever\
do what needs to be done\
detect-events;\
if events \
if number of resources < 4 then\
use old scheduler\
else\
schedule-resources (event list)\
do whatever else needs to be done\
sleep; // isn't this where the 60 second limit should occur?\
end loop; // forever\
end // main\
\
The new routines would look something like this:\
\
schedule-resources (event list)\
iterate through event list\
\
// note first for speed\
case checkpointing:\
if task.runtime < TSI\
do nothing\
else\
halt task\
update projects-with-tasks-running\
update rs-scoreboard\
do update-task-queue\
do schedule-task-to-resource\
update task-checkpoint-interval\
\
case task complete:\
update projects-with-tasks-running\
update task-completion-interval\
update rs-scoreboard\
do schedule-task-to-resource\
\
case download complete:\
do update-task-queue // did we create deadline peril?\
\
case project/task resume/suspend\
do update-task-queue // did we create deadline peril?\
\
case RPC complete:\
if server asks to drop WU\
if task not started \
drop task (this may logically belong in update-task-queue)\
do update-task-queue\
else\
do nothing\
\
case deadline-peril:\
// on a nominal system this case should never occur\
// because queues are reasonable and speed / width \
// allows deliberative scheduling\
// this event is raised in update-task-queue\
do determine-task-to-preempt\
preempt task // frees resource\
update projects-with-tasks-running\
do schedule-task-to-resource\
\
end iterate // event list\
end schedule-resources\
\
\
update-task-queue\
\
iterate through task-list\
update information using best available estimates and rs-scoreboard\
// establishes the baselines on each task for priority ranking later\
// I would not use rr-sim for rs-scoreboarding\
\
drop task-queue\
iterate through task-list\
if task is running mark task as queued\
// no need to queue already running tasks\
if task was preempted insert it into the task queue and mark as queued\
\
loop until all tasks queued\
iterate through task-list\
find the highest priority (deadline order essentially) task\
insert into task-queue\
mark task as queued\
end loop // all tasks queued\
\
test head task for deadline peril\
// test will calculate based on task-completion-interval and task-checkpoint-interval\
// estimated task run time, safety margin, etc. My current bad example of IBERCIVIS tasks\
// would not raise DP events, but would merely increase the task priority\
if deadline peril exists\
raise deadline-peril event\
\
end update-task-queue\
\
\
schedule-task-to-resource\
\
if deadline-peril\
assign task-queue.head-task to available resource\
update projects-with-tasks-running\
else\
// try to keep work mix "interesting"\
compare projects-with-tasks-running to projects-with-tasks-queued\
assign highest priority task where task.project not in projects-with-tasks-running to the available resource\
// this is the most likely route for tasks with long deadlines will be scheduled as \
// their rs-scoreboard allocation of time shows a shortfall\
else \
assign task-queue.head-task to available resource\
\
end schedule-task-to-resource\
\
\
determine-task-to-preempt\
\
iterate running tasks\
find task with longest time to completion and deadline\
mark as preempted\
end determine-task-to-preempt\
\
}
Pseudo code:
If I understand the logic this is the BOINC client in a nutshell.
GLOBAL List
task-list // list of all
tasks, unknown order
task-queue // tasks in ordered
list, highest priority task first
projects-with-tasks-running // list of projects with tasks running
projects-with-tasks-queued // list of projects with tasks queued
task-completion-interval // the running average of time between task
completions
task-checkpoint-interval // the running average of time between task
checkpoints
rs-scoreboard // scoreboard of time
spent per project vs. time
// that
should be spent on a project on a per unit time basis
// shortfalls would increase the priority of the tasks in the
task-queue
main ()
begin
initialize
loop forever
do what needs to be done
detect-events;
if events
if number of resources < 4 then
use old scheduler
else
schedule-resources (event list)
do whatever else needs to be done
sleep; // isn't this where the 60 second limit should
occur?
end loop; // forever
end // main
The new routines would look something like this:
schedule-resources (event list)
iterate through event list
// note first for speed
case checkpointing:
if task.runtime < TSI
do nothing
else
halt task
update projects-with-tasks-running
update rs-scoreboard
do update-task-queue
do schedule-task-to-resource
update task-checkpoint-interval
case task complete:
update projects-with-tasks-running
update task-completion-interval
update rs-scoreboard
do schedule-task-to-resource
case download complete:
do update-task-queue // did we create deadline peril?
case project/task resume/suspend
do update-task-queue // did we create deadline peril?
case RPC complete:
if server asks to drop WU
if task not started
drop task (this may logically belong in
update-task-queue)
do update-task-queue
else
do nothing
case deadline-peril:
// on a nominal system this case should never occur
// because queues are reasonable and speed / width
// allows deliberative scheduling
// this event is raised in update-task-queue
do determine-task-to-preempt
preempt task // frees resource
update projects-with-tasks-running
do schedule-task-to-resource
end iterate // event list
end schedule-resources
update-task-queue
iterate through task-list
update information using best available estimates and
rs-scoreboard
// establishes the baselines on each task for priority ranking
later
// I would not use rr-sim for rs-scoreboarding
drop task-queue
iterate through task-list
if task is running mark task as queued
// no need to queue already running tasks
if task was preempted insert it into the task queue and mark as
queued
loop until all tasks queued
iterate through task-list
find the highest priority (deadline order essentially)
task
insert into task-queue
mark task as queued
end loop // all tasks queued
test head task for deadline peril
// test will calculate based on task-completion-interval and task-
checkpoint-interval
// estimated task run time, safety margin, etc. My current bad
example of IBERCIVIS tasks
// would not raise DP events, but would merely increase the task
priority
if deadline peril exists
raise deadline-peril event
end update-task-queue
schedule-task-to-resource
if deadline-peril
assign task-queue.head-task to available resource
update projects-with-tasks-running
else
// try to keep work mix "interesting"
compare projects-with-tasks-running to
projects-with-tasks-queued
assign highest priority task where task.project not in projects-
with-tasks-running to the available resource
// this is the most likely route for tasks with long deadlines will
be scheduled as
// their rs-scoreboard allocation of time shows a
shortfall
else
assign task-queue.head-task to available resource
end schedule-task-to-resource
determine-task-to-preempt
iterate running tasks
find task with longest time to completion and deadline
mark as preempted
end determine-task-to-preempt
_______________________________________________
boinc_dev mailing list
[email protected]
http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
To unsubscribe, visit the above URL and
(near bottom of page) enter your email address.