Re: [boinc_dev] 6.6.20 and work scheduling

Paul D. Buck Tue, 28 Apr 2009 15:07:10 -0700


On Apr 28, 2009, at 12:14 PM, David Anderson wrote:

At this point I'm interested in reports of bad scheduling
or work-fetch decisions, or bad debt calculation.
When these are all fixed we can address the efficiency
of the scheduling calculations (if it's an issue).

Ok, I will append an RTF file with the pseudo-code below because themailing list seems to want to reformat everything and remove prettyprint.

It would be nice if we could start with talking to the high end logicfirst and then quibble over the details of the minutia in the ruleslater.

as suggestions to the code are made and agreed upon I will createupdates and circulate those if that is agreeable.

The reason I think this a more rational route is that I feel that thecurrent rules are so inadequate to the task on fast/wide systems thata new approach is needed.

The basic logic of this should match to the textual description ofApril 27, 2009 2:20:56 PM PDT where I list the problems and a broadoutline of the conceptual model of what I think needs to be done.

There ARE lots of details still missing. But the fundamentals arethere. the intent is to address the issues listed in that prior e-mail, to wit:


1) We do it too often (event driven)
2) All currently running tasks are eligible for preemption
3) TSI is not respected as a limiting factor
4) TSI is used in calculating deadline peril
5) Work mix is not kept "interesting"
6) Resource Share is used in calculating run time allocations
7) Work "batches" (tasks with roughly similar deadlines) are not "bank
teller queued"
8) History of work scheduling is not preserved and all deadlines are
calculated fresh each invocation.
9) True deadline peril is rare, but "false positives" are common
10) Some of the sources of work peril may be caused by a defective
work fetch allocation
11) Other factors either obscured by the above, I forgot them, or
maybe nothing else ...

see the e-mail for the rest

{\rtf1\ansi\ansicpg1252\cocoartf949\cocoasubrtf430
{\fonttbl\f0\froman\fcharset0 TimesNewRomanPSMT;}
{\colortbl;\red255\green255\blue255;}
\margl1440\margr1440\vieww25340\viewh29740\viewkind0
\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\ql\qnatural\pardirnatural

\f0\fs48 \cf0 \
If I understand the logic this is the BOINC client in a nutshell.\
\
GLOBAL List\
	task-list						// list of all tasks, unknown order\
	task-queue 					// tasks in ordered list, highest priority task first\
	projects-with-tasks-running 	// list of projects with tasks running\
	projects-with-tasks-queued	// list of projects with tasks queued\
	task-completion-interval		// the running average of time between task completions\
	task-checkpoint-interval		// the running average of time between task checkpoints\
	rs-scoreboard					// scoreboard of time spent per project vs. time\
									// that should be spent on a project on a per unit time basis\
									// shortfalls would increase the priority of the tasks in the task-queue\
\
main ()\
	begin\
		initialize\
		loop forever\
			do what needs to be done\
			detect-events;\
			if events \
				if number of resources < 4 then\
					use old scheduler\
				else\
					schedule-resources (event list)\
			do whatever else needs to be done\
			sleep; // isn't this where the 60 second limit should occur?\
		end loop; // forever\
	end // main\
\
The new routines would look something like this:\
\
schedule-resources (event list)\
	iterate through event list\
\
		// note first for speed\
		case checkpointing:\
			if task.runtime < TSI\
				do nothing\
			else\
				halt task\
				update projects-with-tasks-running\
				update rs-scoreboard\
				do update-task-queue\
				do schedule-task-to-resource\
				update task-checkpoint-interval\
\
		case task complete:\
			update projects-with-tasks-running\
			update task-completion-interval\
			update rs-scoreboard\
			do schedule-task-to-resource\
\
		case download complete:\
			do update-task-queue // did we create deadline peril?\
\
		case project/task resume/suspend\
			do update-task-queue // did we create deadline peril?\
\
		case RPC complete:\
			if server asks to drop WU\
				if task not started \
					drop task (this may logically belong in update-task-queue)\
					do update-task-queue\
				else\
					do nothing\
\
		case deadline-peril:\
			// on a nominal system this case should never occur\
			// because queues are reasonable and speed / width \
			// allows deliberative scheduling\
			// this event is raised in update-task-queue\
			do determine-task-to-preempt\
			preempt task  // frees resource\
			update projects-with-tasks-running\
			do schedule-task-to-resource\
			\
	end iterate // event list\
end schedule-resources\
\
\
update-task-queue\
\
	iterate through task-list\
		update information using best available estimates and rs-scoreboard\
		// establishes the baselines on each task for priority ranking later\
		// I would not use rr-sim for rs-scoreboarding\
\
	drop task-queue\
	iterate through task-list\
		if task is running mark task as queued\
		// no need to queue already running tasks\
		if task was preempted insert it into the task queue and mark as queued\
\
	loop until all tasks queued\
		iterate through task-list\
			find the highest priority (deadline order essentially) task\
		insert into task-queue\
		mark task as queued\
	end loop // all tasks queued\
\
	test head task for deadline peril\
	// test will calculate based on task-completion-interval and task-checkpoint-interval\
	// estimated task run time, safety margin, etc. My current bad example of IBERCIVIS tasks\
	// would not raise DP events, but would merely increase the task priority\
	if deadline peril exists\
		raise deadline-peril event\
\
end update-task-queue\
\
\
schedule-task-to-resource\
\
	if deadline-peril\
		assign task-queue.head-task to available resource\
		update projects-with-tasks-running\
	else\
		// try to keep work mix "interesting"\
		compare projects-with-tasks-running to projects-with-tasks-queued\
			assign highest priority task where task.project not in projects-with-tasks-running to the available resource\
			// this is the most likely route for tasks with long deadlines will be scheduled as \
			// their rs-scoreboard allocation of time shows a shortfall\
		else \
			assign task-queue.head-task to available resource\
				\
end schedule-task-to-resource\
\
\
determine-task-to-preempt\
\
	iterate running tasks\
		find task with longest time to completion and deadline\
		mark as preempted\
end determine-task-to-preempt\
\
}




Pseudo code:


If I understand the logic this is the BOINC client in a nutshell.

GLOBAL List
        task-list                                               // list of all 
tasks, unknown order
        task-queue                                      // tasks in ordered 
list, highest priority task first
        projects-with-tasks-running     // list of projects with tasks running
        projects-with-tasks-queued      // list of projects with tasks queued

task-completion-interval // the running average of time between taskcompletionstask-checkpoint-interval // the running average of time between taskcheckpoints

        rs-scoreboard                                   // scoreboard of time 
spent per project vs. time
                                                                        // that 
should be spent on a project on a per unit time basis

// shortfalls would increase the priority of the tasks in thetask-queue


main ()
        begin
                initialize
                loop forever
                        do what needs to be done
                        detect-events;
                        if events
                                if number of resources < 4 then
                                        use old scheduler
                                else
                                        schedule-resources (event list)
                        do whatever else needs to be done
                        sleep; // isn't this where the 60 second limit should 
occur?
                end loop; // forever
        end // main

The new routines would look something like this:

schedule-resources (event list)
        iterate through event list

                // note first for speed
                case checkpointing:
                        if task.runtime < TSI
                                do nothing
                        else
                                halt task
                                update projects-with-tasks-running
                                update rs-scoreboard
                                do update-task-queue
                                do schedule-task-to-resource
                                update task-checkpoint-interval

                case task complete:
                        update projects-with-tasks-running
                        update task-completion-interval
                        update rs-scoreboard
                        do schedule-task-to-resource

                case download complete:
                        do update-task-queue // did we create deadline peril?

                case project/task resume/suspend
                        do update-task-queue // did we create deadline peril?

                case RPC complete:
                        if server asks to drop WU
                                if task not started
                                        drop task (this may logically belong in 
update-task-queue)
                                        do update-task-queue
                                else
                                        do nothing

                case deadline-peril:
                        // on a nominal system this case should never occur
                        // because queues are reasonable and speed / width
                        // allows deliberative scheduling
                        // this event is raised in update-task-queue
                        do determine-task-to-preempt
                        preempt task  // frees resource
                        update projects-with-tasks-running
                        do schedule-task-to-resource
                        
        end iterate // event list
end schedule-resources


update-task-queue

        iterate through task-list
                update information using best available estimates and 
rs-scoreboard
                // establishes the baselines on each task for priority ranking 
later
                // I would not use rr-sim for rs-scoreboarding

        drop task-queue
        iterate through task-list
                if task is running mark task as queued
                // no need to queue already running tasks
                if task was preempted insert it into the task queue and mark as 
queued

        loop until all tasks queued
                iterate through task-list
                        find the highest priority (deadline order essentially) 
task
                insert into task-queue
                mark task as queued
        end loop // all tasks queued

        test head task for deadline peril

// test will calculate based on task-completion-interval and task-checkpoint-interval// estimated task run time, safety margin, etc. My current badexample of IBERCIVIS tasks// would not raise DP events, but would merely increase the taskpriority

        if deadline peril exists
                raise deadline-peril event

end update-task-queue


schedule-task-to-resource

        if deadline-peril
                assign task-queue.head-task to available resource
                update projects-with-tasks-running
        else
                // try to keep work mix "interesting"
                compare projects-with-tasks-running to 
projects-with-tasks-queued

assign highest priority task where task.project not in projects-with-tasks-running to the available resource// this is the most likely route for tasks with long deadlines willbe scheduled as

                        // their rs-scoreboard allocation of time shows a 
shortfall
                else
                        assign task-queue.head-task to available resource
                                
end schedule-task-to-resource


determine-task-to-preempt

        iterate running tasks
                find task with longest time to completion and deadline
                mark as preempted
end determine-task-to-preempt

_______________________________________________
boinc_dev mailing list
[email protected]
http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
To unsubscribe, visit the above URL and
(near bottom of page) enter your email address.

Re: [boinc_dev] 6.6.20 and work scheduling

Reply via email to