On Apr 28, 2009, at 5:41 AM, [email protected] wrote:

> Changing the time between checks WILL NOT FIX THE PROBLEM.

How many times do I have to agree with you?

I have never said that changing the frequency will fix the problem,  
not once, not ever.

> Log files with no comment, and no attempt at finding the problem, are not
> useful.  I did read through one of your log files in excruciating detail,
> only to discover that the only problem area was during the time where debug
> logs were turned off.  I am NOT spending a couple of hours doing that again.

Which is part of the problem.  I cannot leave the log flags on
forever.  And even when I do capture what I think is the essence of
the problem and provide tailored logs, it is never the right thing.
This is the nub you keep glossing over: if we do something more often
than is needed, and we log each and every one of those events, most
of which are not needed, then the events that are needed are obscured
by the trash.

Brushing your teeth is considered a good thing; so is drinking
water.  But drinking too much water can be fatal, and brushing too
much can erode the gums.  Too much of a good thing is not always a
good thing.

> Yes, you keep giving reasons for increasing the time between the tests.
> NOBODY ELSE CAN UNDERSTAND THEM.

This is a different objection, and the first time you have raised
it.  What is unclear about the explanation above?

If the error condition happens once every 1,000 invocations (for
example), and you trigger 10 times more often than necessary, then
the bad thing is going to happen ten times more often.
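
To make the arithmetic concrete, here is a back-of-the-envelope
sketch in C++; the 1-in-1,000 rate and the two check intervals are
purely illustrative numbers, not measurements from any host:

#include <cstdio>

int main() {
    const double failures_per_invocation = 1.0 / 1000.0;  // assumed example rate
    const double day_seconds = 24.0 * 60.0 * 60.0;
    const double intervals[] = {60.0, 600.0};  // compare two check intervals

    for (double interval : intervals) {
        double invocations_per_day = day_seconds / interval;
        double expected_bad_events = invocations_per_day * failures_per_invocation;
        std::printf("interval %4.0fs -> %6.0f invocations/day, %.2f bad events/day\n",
                    interval, invocations_per_day, expected_bad_events);
    }
    return 0;
}

The failure itself still needs to be fixed, but ten times the
invocations is ten times the exposure to it per day.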

Secondly, even if we fix all the rules that are creating problems
but keep doing the tests at this frenetic rate, you will continue to
have instability, because the basis on which you make your decisions
changes over time.  And since all tasks are fair game under every
scenario you have proposed, we will still see inappropriate choices
made at a far higher rate than expected or desired.

And for several of these bad choices both Richard and I have provided
examples in logs, images, and scenario descriptions.  The simplest:

1) A task is started because another task just completed
2) The task is suspended because an upload completes
3) The task is restarted because something else happens
4) The task is stopped again because something else happens, and it
is not restarted for 24-48 hours

The fundamental question in that scenario is this: if the task was
important enough to start two or more times, why is it not important
enough to be resumed and completed once the intervening events have
been processed?  And the answer is that the system does not consider
history; it recalculates the plan from a clean slate on each and
every invocation.

If you keep a high rate of invocations, you are going to get a high
rate of unneeded preemption, because every calculation starts from
scratch: a model which you have yet to reject.
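
To be concrete about what "considering history" could look like, here
is a minimal sketch of a hysteresis rule; the struct, the field names,
and the 10-minute hold-down are hypothetical, not anything in the
current client:

#include <ctime>

struct RunningTask {
    time_t started_at;    // when this task was last (re)started
    bool   checkpointed;  // has it checkpointed since then?
};

// A task that was only just started, and has not even checkpointed yet,
// is not a preemption candidate unless a deadline genuinely forces it.
bool may_preempt(const RunningTask& t, bool deadline_forces_it, time_t now) {
    const time_t hold_down = 10 * 60;  // assumed 10-minute grace period
    if (deadline_forces_it) return true;
    if (now - t.started_at < hold_down && !t.checkpointed) return false;
    return true;
}

That is all "history" has to mean here: remember what was done on the
last pass and be reluctant to undo it seconds later.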

> I have no idea what I am supposed to be seeing here.

What that table shows is the high rate of interruptions, and the
causes of those interruptions, across a variety of systems with a
fairly broad set of capabilities.  It also showed, to my surprise,
that a "light" load of projects does not necessarily equate to a
light load of scheduling events.

And, if I might return the favor again, YOU have yet to explain why
high rates of scheduling and re-scheduling make sense.

Again, even with your 5-minute-deadline "ticking time bomb" scenario,
the fundamental question is why this scheduling event has to occur at
this specific instant.  Seemingly there is no interval so short that
you would consider it excessive.  As systems continue to scale up,
the rate of these scheduling events is going to continue to rise.

The other point is that, because of the limited run time and the
current project mix, the numbers are actually on the low side.

Again, there are at least two legs to the instability stool here.
You deny that one of them has any effect.  I can see it happening
before my eyes, and all my experience tells me that this is a
problem.  It is a problem of scaling.  Just like checkpointing: if
you set an interval on a single-core system, the checkpoints will
happen at about that frequency ... on a multi-core system they will
happen at a rate that is n times faster, where n is the number of
cores.
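
A tiny sketch of that scaling effect, using an assumed 10-minute
per-core checkpoint interval rather than anything measured:

#include <cstdio>

int main() {
    const double per_core_interval_min = 10.0;  // assumed per-core interval
    const int core_counts[] = {1, 4, 12};

    // Host-wide checkpoint events scale linearly with the core count.
    for (int cores : core_counts) {
        double events_per_hour = cores * (60.0 / per_core_interval_min);
        std::printf("%2d cores -> roughly %.0f checkpoint events per hour host-wide\n",
                    cores, events_per_hour);
    }
    return 0;
}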

My fastest / widest system has effectively 12 processors ... my next
is going to be at least as wide, if not wider.  Which means the
problem is going to get worse.

Lastly, if I run this whole scheduling pass more often than I need
to, say 10 times more often than needed, then nine out of every ten
runs are wasted effort, with no net gain.  Waste, no matter how
small, is still waste.

And what is unclear about that?

> There is no guarantee at all what the server is going to hand us.  It may
> NOT be reasonable.  There is no check at the server to determine if the
> tasks sent to the client are going to be trouble.

And I don't think any one of us denies this.

Again, the question you avoid is this: why do we have to decide RIGHT
THIS SECOND whether we are in a reasonable or unreasonable situation?

It is one of the questions I keep asking in all the different ways I
can, because you seem to go into contortions to avoid addressing it.

If we are in a dire situation because of a download of work, how is
responding 30 seconds faster going to suddenly make things all
better?  Especially when the situation is that we have, say, 4 extra
hours of work to do quickly.

Again, what is the lower bound of reasonableness?  And why?

And if there is a lower bound of reasonableness, why are we always
scheduling the universe from scratch?  As long as you continue to do
this, you will have instability in the work flow.


> Enforcement is typically much faster than a re-schedule.  It is also
> called more frequently.

Unless I misread the code (always a possibility, given the spaghetti
code extant), there is no real difference in the approach the CPU
scheduler takes ...

> CPU scheduling is not FIFO if there is more than one project.  It is FIFO
> within a project and Round Robin between projects.  There are some slight
> differences in the actual code that starts the task based on whether it is
> in memory or not, but there is no difference in the test to see whether it
> should be running, or needs to preempt now.  I haven't figured out why
> there should be.  The start or re-start of the task does not really have a
> bearing on the scheduling of the task.

Because the logic of how to schedule on a single processor is
different from the logic for a multi-processor.  Or to put it another
way: if I am processing tasks and a resource comes free, on average,
once every 6 minutes, then there is likely little need to preempt
anything to meet a deadline.  You just wait until the next resource
comes free.
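
That test is easy to state; here is a sketch, where the six-minute
figure and every name in it are illustrative assumptions, not code
from the client:

#include <ctime>

// Would the waiting task still meet its deadline if we simply let it
// start when the next core frees up on its own (say, every ~6 minutes)?
bool must_preempt_now(time_t now,
                      double avg_seconds_until_core_frees,  // e.g. 360.0
                      double remaining_cpu_seconds,
                      time_t deadline) {
    double finish_if_we_wait = static_cast<double>(now)
        + avg_seconds_until_core_frees + remaining_cpu_seconds;
    // Preempt only if waiting for a natural opening would blow the deadline.
    return finish_if_we_wait > static_cast<double>(deadline);
}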

Which is why I wrote that last discussion that you ignored: in it I
address the rate question, among others.  I know you want to believe
that I believe (against all evidence) that I am proposing a silver
bullet; what I am saying is that the proposal addresses all of the
known issues I have.  Maybe I missed something else, which is where
your expertise would be so helpful.

> If we do NOT run enforce_schedule at every checkpoint, we have to remove
> the test for has the task checkpointed very recently, which means that
> normal task switches will lose work.

The problem is that, as near as I can tell from reading the code and
from watching the systems run, there is no substantial difference
between any of the invocations for any of the event triggers.  And
because the first step is to mark EVERYTHING running as preemptable,
we get the situation that I see all the time: 6-8 tasks get
preempted, and then, when the "crisis"-provoking tasks are processed,
the preempted tasks are not all resumed.  That is the consequence of
not keeping a history and of starting with the assumption that
everything is preemptable.

> I have no idea what you are trying to say here.  Are you trying to say
> that we should always start the next task in FIFO order?

Almost.  What I am actually proposing is that we use real queuing
theory to do our work.  And in the proposal that you refused to look
at, other than to yell at me once again that changing the frequency
will not solve the problem, I try to address this.  I wish you would
go back, reread that discussion, and make an honest effort to curb
your prejudices.

Queuing theory says that the most effective way to process work is
to have a group of resources serviced by a single queue.  I don't
particularly care how you order that queue or how often you re-juggle
its contents, and I have said so any number of times.  But unless a
task has hit its TSI or has ended, tasks running on processors should
not be preempted on multi-core systems unless there is no other way
to meet the need.
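
That rule is simple enough to write down directly; this is only a
sketch of the policy as I have stated it, with hypothetical names:

struct RunningSlot {
    bool   finished;               // the task has ended
    double seconds_on_core;        // time since this task got the core
    double task_switch_interval;   // the TSI setting, in seconds
};

// A running task becomes a preemption candidate only if it has ended,
// has used up its time slice, or there is provably no other way to
// meet a deadline.  Preemption is the last resort, not the first step.
bool preemption_candidate(const RunningSlot& s, bool no_other_way_to_meet_deadline) {
    if (s.finished) return true;
    if (s.seconds_on_core >= s.task_switch_interval) return true;
    return no_other_way_to_meet_deadline;
}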

Again, a typical real-world scenario:

1) Download 6-8 IBERCIVIS tasks with a 2-day deadline, each about 30
minutes long
2) The downloads complete; at some point BOINC panics and decides
that all of the tasks are in deadline peril
3) 6-8 cores are now running IBERCIVIS tasks even though the deadline
is still two days in the future
4) The tasks run off, and since the world has changed, many, if not
all, of the tasks that were running prior to this panic are now
suspended

On my multi-core systems I have little problem with BOINC deciding
that it wants to run the IBERCIVIS tasks ... NEXT ... and if it held
its horses and scheduled them one by one on a single CPU they would
all run off in plenty of time.  BUT, because their deadlines are
close, and the priority calculation for each task is done in
isolation from all other tasks, they are all scheduled as a panic
attack.  This violates the "keep interesting" rule and is an unneeded
panic by any stretch of the calculations.

If we used "bank teller" queuing the tasks would be fed into resources  
as they became available and though I might get a couple CPUs running  
them at the same time, it is likely that they would be burned off by  
one or two cores leaving a good mix on the remaining system.
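
Here is a minimal sketch of that "bank teller" dispatch, with a
made-up task mix and names: one shared queue, and cores pull from its
head only when they free up, so nothing already running is touched:

#include <cstdio>
#include <deque>
#include <string>
#include <vector>

struct Core {
    std::string running;   // empty string means the core is idle
};

// Feed the head of the single shared queue to whichever cores are idle.
void dispatch(std::deque<std::string>& queue, std::vector<Core>& cores) {
    for (Core& c : cores) {
        if (c.running.empty() && !queue.empty()) {
            c.running = queue.front();
            queue.pop_front();
            std::printf("started %s\n", c.running.c_str());
        }
    }
}

int main() {
    // Eight short-deadline tasks arrive, but only idle cores pick them up.
    std::deque<std::string> queue = {"I1", "I2", "I3", "I4", "I5", "I6", "I7", "I8"};
    std::vector<Core> cores = {{"A"}, {"B"}, {""}};  // A and B keep running
    dispatch(queue, cores);    // the one idle core starts I1
    cores[0].running.clear();  // later, A completes or hits TSI ...
    dispatch(queue, cores);    // ... and the next queued task starts there
    return 0;
}

With one or two cores burning through the 30-minute tasks in order,
the two-day deadline is never in danger, and nothing that was already
running gets suspended.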

> The current code.
>
> The list of tasks that should be running.
> Z (pre-empt), A, B,
> The list that IS running
> A, B, C.
> Which is the best target for stopping?  C.
> Stop C, run Z.

A just as likely and common situation (based on observation) is that
BOINC will decide:

The list of tasks that should be running
X, Y, Z
The list that is running
A, B, C

A trigger event happens: A, B, C are suspended and X, Y, Z start
running, regardless of TSI
Next trigger event: A, M, Q start running
Next trigger event: B, Y, M
Next trigger event: A, Y, Z

All because we score all the tasks based on the deadlines, RS, and
what have you, right now, this second.  Without consideration of
history or hysteresis effects you will get, and do get, this kind of
thrashing.

Using queuing theory, Z starts when A, B, or C completes or hits TSI...

I have explained elsewhere how the deadlines should be calculated.

RS is not critical for scheduling CPUs on multi-core systems; it is
far more important to use the "interesting work" rule to maximize
performance (particularly with HT machines).  RS should only be used
as a work-fetch rule.  Once the work is here, RS can be ignored, and
should be ignored, when scheduling resources to do work.

And, because preemption *IS* expensive (thanks for noting that), we
should make it more of a last resort instead of the first thing we do
when running the resource-scheduling algorithm.