The problem with ignoring possible triggers is that there is no way of
knowing when the next trigger will occur: it could be in a second, it
could be hours.
I know I proposed this before.
Normally, run in RR. If a deadline issue is detected by the rr_sim,
schedule some of the tasks based on EDF. However, do NOT necessarily
preempt immediately. If a deadline problem is also detected by EDF,
THEN we have to preempt immediately.
Note that the rr_sim has a bit of a hair trigger. EDF does not.
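A minimal sketch of that two-stage check (the names and structures
here are illustrative assumptions, not the actual client code):

#include <algorithm>
#include <vector>

struct Task {
    double deadline;    // seconds from now
    double remaining;   // estimated CPU seconds left
};

// Stage 1: the hair-trigger RR estimate. Under crude fair sharing,
// does anything look like it might miss its deadline?
bool rr_sim_flags_miss(const std::vector<Task>& tasks, int ncpus) {
    double total = 0;
    for (const Task& t : tasks) total += t.remaining;
    double finish_all = total / ncpus;   // everyone finishes together
    return std::any_of(tasks.begin(), tasks.end(),
        [&](const Task& t) { return finish_all > t.deadline; });
}

// Stage 2: the EDF check. Feed tasks to CPUs in deadline order, one
// per free CPU; only a miss HERE justifies preempting immediately.
bool edf_confirms_miss(std::vector<Task> tasks, int ncpus) {
    std::sort(tasks.begin(), tasks.end(),
        [](const Task& a, const Task& b) { return a.deadline < b.deadline; });
    std::vector<double> cpu_free(ncpus, 0.0);  // when each CPU frees up
    for (const Task& t : tasks) {
        auto cpu = std::min_element(cpu_free.begin(), cpu_free.end());
        *cpu += t.remaining;
        if (*cpu > t.deadline) return true;    // genuine miss: preempt NOW
    }
    return false;  // EDF copes without preemption; just reorder the queue
}

Schedule based on EDF only when the first test flags trouble, and
preempt immediately only when the second one confirms it.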
As to the checks taking longer on faster machines: won't the speed of
the machine speed up the test as well?
jm7
"Paul D. Buck"
<p.d.b...@comcast
.net> To
[email protected]
04/28/2009 02:30 cc
PM BOINC dev
<[email protected]>, David
Anderson <[email protected]>,
"Josef W. Segur"
<[email protected]>, Rom Walton
<[email protected]>, Richard
Haselgrove
<[email protected]>
Subject
Re: [boinc_dev] 6.6.20 and work
scheduling
On Apr 28, 2009, at 5:41 AM, [email protected] wrote:
> Changing the time between checks WILL NOT FIX THE PROBLEM.
How many times do I have to agree with you?
I have never said that changing the frequency will fix the problem,
not once, not ever.
> Log files with no comment, and no attempt at finding the problem are
> not useful. I did read through one of your log files in excruciating
> detail, only to discover that the only problem area was during the
> time where debug logs were turned off. I am NOT spending a couple of
> hours doing that again.
Which is part of the problem: I cannot leave the log flags on
forever. And even when I do capture what I think is the essence of
the problem and provide tailored logs, it is never the right thing.
This is part of the nub you keep glossing over. If we do something
more times than is needed, and we log each and every one of these
events, most of which are not needed, then the events that are needed
are obscured by the trash.
Brushing your teeth is considered a good thing; so is drinking water.
But drinking too much water can be fatal, and brushing too much can
erode the gums. Too much of a good thing is not always a good thing.
> Yes, you keep giving reasons for increasing the time between the
> tests. NOBODY ELSE CAN UNDERSTAND THEM.
This is a different objection, and the first time you have raised
it. What is unclear about the explanation above?
If the error condition happens once every 1,000 invocations (for
example), and you trigger needlessly 10 times more often than
necessary, then the bad thing is going to happen ten times more often.
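To put rough numbers on that (the rates here are illustrative
assumptions, not measurements):

  one pass every 10 s  -> 8,640 passes/day -> ~8.6 failures/day at 1-in-1,000
  one pass every 100 s ->   864 passes/day -> ~0.9 failures/day at 1-in-1,000

Same latent bug, ten times the pain, purely from the invocation rate.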
Secondly, even if we fix all the rules that are creating problems but
keep doing the tests at this frenetic rate, you will continue to have
instability, because the basis on which you make your decisions
changes over time. And since all tasks are fair game under all the
scenarios you have proposed, we will still see inappropriate choices
made at a far higher rate than expected or desired.
And for several of these bad choices both Richard and I have provided
examples in logs, images, and scenario descriptions. The simplest:
1) Task is started because a task just completed
2) Task is suspended because upload completes
3) Task is restarted because something else happens
4) Task is stopped again because something else happens and is not
restarted for 24-48 hours
The fundamental question in that scenario is: if the task was so
important that it was started two or more times, why is it not
important enough to be resumed and completed when the intervening
events are processed? And the answer is that the system does not
consider history; it calculates the plan on a clean slate each and
every invocation.
If you keep a high rate of invocations, you are going to get a high
rate of unneeded preemptions, because all calculations start from
scratch. That is a model you have yet to reject.
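A minimal sketch of what a little history would buy (the names and
the 10-minute window are illustrative assumptions):

#include <ctime>

struct RunningTask {
    std::time_t started_at;   // when this execution began
};

const double MIN_RUN_SECONDS = 600;   // hysteresis window; my number

// Consult history before preempting instead of starting from a clean
// slate: a task that only just started is left alone unless there is
// a genuine deadline emergency somewhere else.
bool may_preempt(const RunningTask& t, bool edf_emergency, std::time_t now) {
    if (edf_emergency) return true;   // real crisis: preemption allowed
    return std::difftime(now, t.started_at) >= MIN_RUN_SECONDS;
}

With a guard like that, the task in the scenario above could not be
started and abandoned over and over within minutes.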
> I have no idea what I am supposed to be seeing here.
What that table shows is the high rate of interruptions, and the
causes of those interruptions, across a variety of systems with a
fairly broad set of capabilities. It also showed, to my surprise,
that a "light" load of projects does not necessarily equate to a
light load of scheduling events.
And, if I might return the favor again, YOU have yet to explain why
high rates of scheduling and re-scheduling make sense.
Again, even with your 5-minute-deadline "ticking time bomb" scenario,
the fundamental question is why this scheduling event has to occur at
this specific instant. Seemingly there is no rate of rescheduling you
would consider excessive. As systems continue to scale up, the rate
of these scheduling events is going to continue to rise.
The other point is that, because of the limited run time and the
current project mix, the numbers are actually low.
Again, there are at least two legs to the instability stool here. You
deny one of them has any effect. I can see it happening before my
eyes, and all my experience tells me that this is a problem. It is a
problem of scaling. Just like checkpointing: if you set an interval
on a single-core system, the checkpoints will happen at about that
frequency ... on a multi-core system they will happen at a rate that
is n times faster, where n is the number of cores.
My fastest / widest system has effectively 12 processors ... my next
is going to be at least as wide, if not wider, meaning the problem is
going to get worse.
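To make the scaling concrete (the checkpoint interval is an
illustrative assumption):

  1 core,   checkpoint every 10 min -> ~6 trigger events per hour
  12 cores, checkpoint every 10 min -> ~72 trigger events per hour,
                                       i.e. one roughly every 50 seconds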
Lastly, if I run this whole scheduling system more times than I need,
say 10 times more often than needed, then nine out of every ten runs
were wasted effort to no net gain. Waste, no matter how small, is
still waste.
And what is unclear about that?
> There is no guarantee at all what the server is going to hand us. It
> may NOT be reasonable. There is no check at the server to determine
> if the tasks sent to the client are going to be trouble.
And I don't think any one of us denies this.
Again, the question you avoid is this: why do we have to decide RIGHT
THIS SECOND whether we are in a reasonable or unreasonable situation?
This is one of the questions I keep asking in all the different ways
I can, because you seem to go into contortions to avoid addressing it.
If we are in a dire situation because of a download of work, how is
responding 30 seconds faster going to suddenly make things all
better? Especially when the situation is that we have, say, 4 extra
hours of work to do quickly.
Again, what is the lower bound of reasonableness, and why?
And if there is a lower bound of reasonableness, why are we always
scheduling the universe from scratch? As long as you continue to do
this, you will have instability in the work flow.
> Enforcement is typically much faster than a re-schedule. It is also
> called more frequently.
Unless I misread the code (always a possibility with the spaghetti
code extant), there is no real difference in the approach the CPU
scheduler takes ...
> CPU scheduling is not FIFO if there is more than one project. It is
> FIFO within a project and Round Robin between projects. There are
> some slight differences in the actual code that starts the task
> based on whether it is in memory or not, but there is no difference
> in the test to see whether it should be running, or needs to preempt
> now. I haven't figured out why there should be. The start or
> re-start of the task does not really have a bearing on the
> scheduling of the task.
Because the logic of how to schedule on a single processor is
different from the logic for a multi-processor. Or to put it another
way: if I am processing tasks and a resource comes free once every 6
minutes, then there is likely little need to preempt anything to meet
a deadline. You just wait until the next resource comes free.
Which is why I wrote that last discussion, the one you ignored; in it
I address the rate question, among others. I know you want to believe
that I believe (against all evidence) that I am proposing a silver
bullet. That proposal addresses all of the known issues I have; maybe
I missed something else, which is where your expertise would be so
helpful.
> If we do NOT run enforce_schedule at every checkpoint, we have to
> remove the test for has the task checkpointed very recently, which
> means that normal task switches will lose work.
The problem is that, as near as I can tell from reading the code and
from watching the systems run, there is absolutely no substantial
difference between any of the invocations for any of the event
triggers. And because the first step is to mark EVERYTHING running as
preemptable, we get the situation that I see all the time: 6-8 tasks
get preempted, and then, when the "crisis"-provoking tasks are
processed, the preempted tasks are not all resumed. That is the
consequence of not keeping a history and of starting with the
assumption that everything is preemptable.
> I have no idea what you are trying to say here. Are you trying to
> say that we should always start the next task in FIFO order?
Almost. Actually, what I am proposing is that we use real queuing
theory to do our work. In the proposal that you refused to look at,
other than to yell at me once again that changing the frequency will
not solve the problem, I try to address this. I wish you would go
back and reread that discussion and make an honest effort to curb
your prejudices.
Queuing theory says that the most effective way to process work is to
have a group of resources fed from a single queue. I don't
particularly care how you order that queue, or how often you
re-juggle its contents, and I have said so any number of times. But
unless a task has hit TSI or has ended, tasks running on processors
should not be preempted on multi-core systems unless there is no
other way to meet the deadlines.
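A minimal sketch of that single-queue, "bank teller" rule (again, the
names are illustrative assumptions, not the existing client code):

#include <deque>
#include <optional>
#include <string>

struct QueuedTask {
    std::string name;
    double deadline;   // order and re-juggle the queue however you like
};

class SingleQueueDispatcher {
    std::deque<QueuedTask> queue_;
public:
    void enqueue(QueuedTask t) { queue_.push_back(std::move(t)); }

    // Called ONLY when a CPU genuinely frees up: a task completed, hit
    // TSI, or an EDF emergency forced a preemption. Never called just
    // because some unrelated trigger event fired.
    std::optional<QueuedTask> next_for_free_cpu() {
        if (queue_.empty()) return std::nullopt;
        QueuedTask t = std::move(queue_.front());
        queue_.pop_front();
        return t;
    }
};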
Again, a typical real-world scenario:
1) Download 6-8 IBERCIVIS tasks with a 2-day deadline, 30-minute tasks
2) Downloads complete; at some point BOINC panics and decides that all
8 tasks are in deadline peril
3) 6-8 cores are now running IBERCIVIS tasks even though the deadline
is still two days in the future
4) The tasks run off, and since the world has changed, many, if not
all, of the tasks running prior to the panic are now suspended
On my multi-core systems I have little problem with BOINC deciding
that it wants to run the IBERCIVIS tasks ... NEXT ... and if it held
its horses and scheduled the tasks one by one on one CPU they would
all run off in plenty of time. BUT, because their deadlines are
close, and the priority calculation for each task is done in
isolation from all the other tasks, they are all scheduled as a panic
attack. This violates the "keep interesting" rule and is an unneeded
panic by any stretch of the calculations.
If we used "bank teller" queuing the tasks would be fed into resources
as they became available and though I might get a couple CPUs running
them at the same time, it is likely that they would be burned off by
one or two cores leaving a good mix on the remaining system.
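Feeding the IBERCIVIS scenario above through that dispatcher sketch
(continuing the code from before; the task names are made up):

int main() {
    SingleQueueDispatcher d;
    for (int i = 0; i < 8; ++i)
        d.enqueue({"ibercivis_" + std::to_string(i), 2 * 86400.0});
    // Each core, as it finishes whatever it is already running, calls
    // d.next_for_free_cpu() and burns off one IBERCIVIS task at a
    // time. Nothing running is suspended; the 2-day deadline is met
    // with room to spare.
}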
> The current code.
>
> The list of tasks that should be running.
> Z (pre-empt), A, B,
> The list that IS running
> A, B, C.
> Which is the best target for stopping? C.
> Stop C, run Z.
A just-as-likely and common situation (based on observation) is that
BOINC will decide:
The list of tasks that should be running
X, Y, Z
The list that is running
A, B, C
A trigger event happens: A, B, and C are suspended and X, Y, Z start
running, regardless of TSI
Next trigger event: A, M, Q start running
Next trigger event: B, Y, M
Next trigger event: A, Y, Z
All because we score all the tasks based on the deadlines and RS and
what have you, right now, this second. Without consideration of
history or hysteresis effects you will get, and do get, this kind of
thrashing.
Using queuing theory, Z starts when A, B, or C completes or hits TSI...
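Stated as event handlers (continuing the same sketch, all names
illustrative): only a task exiting or hitting TSI hands a CPU to Z;
every other trigger may reorder the queue but never touches what is
running.

#include <vector>

void on_task_exit(SingleQueueDispatcher& d, std::vector<QueuedTask>& running) {
    if (auto next = d.next_for_free_cpu())
        running.push_back(std::move(*next));   // Z starts here, not before
}

void on_other_trigger(SingleQueueDispatcher& d) {
    // Re-score and re-sort d's queue here (deadlines, RS, whatever);
    // the tasks already on CPUs stay untouched.
}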
I have explained elsewhere how the deadlines should be calculated.
RS is not critical for scheduling CPUs on multi-core systems; it is
far more important to use the "interesting work" rule to maximize
performance (particularly with HT machines). RS should only be used
as a work-fetch rule. Once the work is here, RS can be ignored, and
should be ignored, for scheduling resources to do work.
And because preemption *IS* expensive (thanks for noting that), we
should be making it more of a last resort instead of the first thing
we do when running the resource-scheduling algorithms.