jm7

Jonathan Hoser <[email protected]> wrote on 04/28/2009 01:39:55 PM,
Subject: Re: [boinc_dev] 6.6.20 and work scheduling:
>
> I'm merging a few parts in the emails where we actually talk about the
> same stuff...
>
> [email protected] wrote:
> >
> > CPU scheduling is not FIFO if there is more than one project.  It is
> > FIFO within a project and Round Robin between projects.  There are
> > some slight differences in the actual code that starts the task based
> > on whether it is in memory or not, but there is no difference in the
> > test to see whether it should be running, or needs to preempt now.  I
> > haven't figured out why there should be.  The start or re-start of
> > the task does not really have a bearing on the scheduling of the task.
> >
> >
> >>> You need a list of tasks that would ideally be running now.
> >>>
> >>>
> >> Nope, because this list is implicitly given by the fifo-ordering mixed
> >> with 'queue-jumping' of soon-to-meet-the-deadline-jobs AND
> >> the resource-share selector.
> >>
> >
> > Except that CPU tasks are run round robin between projects - to give
> > variety.
> Yes, and thus ignoring our project priority/resource sharing value.
> That was one side-effect of my approach: to fix that 'by the way'.
>
Full description:  They run in a modified round robin that is weighted by
resource share, attempting to preserve the resource shares over a
moderately short period if possible.  If there are no detected deadline
problems, then a project with a 66% share would get 2 time slices in a row
followed by one time slice for the project with the 33% share.
>
>
> >> And yes, estimated runtime changes all the time, but do we need to
> >> care about that every second it reports back?
> >> Why not simply look at it when it comes to (re)scheduling events
> >> driven by the below stated points?
> > Remaining time may not change by just a little bit.  An example that
> > occurs sometimes:
> >
> > Project Z is added to the mix of projects on a computer.  The
> > estimate for all tasks is 5 minutes, and so 50 are downloaded.  After
> > the first task is run, it is noticed that the estimate is WAY off and
> > the tasks actually take 5 hours (and these numbers pretty much
> > describe what happened).  Now we really do have some serious deadline
> > problems.  What used to look completely reasonable no longer looks as
> > reasonable.
> >
> > The client cannot tell without testing whether something like this has
> > happened or not.
> >
> Well, if we really want to fix such a bug on the client side,
> we might add a reordering step (moving by deadline) in the fifo-queue.
> But even then,
> by my design, not all jobs may be completed before the deadline, because
> the deadline is not superior to anything else (in this case esp.
> resource share)
>
> I don't want to throw deadlines overboard entirely, don't get me wrong
> here - I'm most unhappy as a project admin when folks with a huge huge
> huge cache keep the SIMAP workunits well longer than 97% of their
> brethren... thus actually hindering us in completing our monthly batch
> as fast as possible...

Use a somewhat shorter deadline if possible.  This is, after all, what
deadlines are for.
>
> But in my opinion we needn't fix projects' misbehavior on the client
> side. Really not!

The client has to do SOMETHING with tasks that are sent to it...
> >
> >> 1. either we have time-slicing or we don't.
> >> If we really got a job with a deadline so close that waiting till
> >> the end of the current timeslice (with more cpu-cores a more regular
> >> event) will really mean its ultimate failure, then there's something
> >> wrong that we needn't fix; it shouldn't be the client's aim to fix
> >> shortcomings of supplying projects.
> >>
> >
> > We have time slicing if we can, on as many CPUs as we can.  Sometimes
> > an immediate preempt is needed.
> >
> > An example:
> >
> > Task A from Project A never checkpoints.  It was started before Task
> > B from Project B was downloaded.
> >
> > Task A has 48 hours remaining CPU time.
> > Task B has 24 hours till deadline, but only 1 hour of run time.
> >
> > Sometime during the next 23 hours, Task A will have to be pre-empted
> > NOT AT A CHECKPOINT in order to allow Task B to run before deadline.
> > Task A will, of course, have to remain in memory in order to make
> > progress.
> >
> > Believe it or not, there are projects that supply tasks that run for
> > a week without checkpointing.
> >
> All right, an example.
> We will catch it however, if we add 'reached TSI' as an event in our
> poll-loop, and the user's TSI is sufficient for this case.
> If it isn't - I'd say, tough luck for the project.
> And of course, always using the more and more pathological case of a
> single-CPU box.

However, the request that was implemented (a couple of years ago) was to
wait if possible for the next checkpoint AFTER the TSI to do a task
switch.  If the checkpoint never comes, then eventually we will have to
do a preemption anyway.
> >
> >> 2. yes it does have multiple cpus in mind. Or do you want to tell me,
> >> that every app is asked at the same time (over multiple cores) to
> >> checkpoint/
> >> does checkpoints in complete synchronisation with all other running
> >> apps? I think not.
> >> Thus this event will likely be triggered more often than the others,
> >> but will actually only do something if the timeslice/TSI of THAT app
> >> on THAT core is up.
> >>
> >
> > Checkpoints happen at random when the task is ready to checkpoint,
> > not when the host is ready for it to checkpoint.  It will happen that
> > more than one task will checkpoint in the same second (the polling
> > interval), not every time of course, but it is going to happen.
> >
> Yes, so what?
> Our poll-loop catches 3 checkpoints reached:
> Let's say - to keep things interesting - that all three have reached /
> are over our TSI:
> 1. Handle the first task:
> reschedule() operation, ultimately running the 'task complete' case;
> either preempt that task and launch another or keep running it, because
> the resource share/etc. demands it
> 2. Handle the second task:
> reschedule() operation, ultimately running the 'task complete' case;
> either preempt that task and launch another or keep running it, because
> the resource share/etc. demands it
> 3. Handle the third task:
> reschedule() operation, ultimately running the 'task complete' case;
> either preempt that task and launch another or keep running it, because
> the resource share/etc. demands it
>
Some of these may want to keep running, others may want to be suspended.
It can get to:

Task A has checkpointed.  Task Z needs to run now.  Stop A, run Z.
OH, Task B has checkpointed.  Task A has a higher STD than task B.  Stop B,
run A.
OH, Task C has checkpointed.  Task B has a higher STD than task C.  Stop C,
run B.

Starting and stopping tasks is fairly expensive.  Some tasks require
several minutes to spin up completely.  Without a global inspection, this
is going to happen some of the time.

The current code:

The list of tasks that should be running:
Z (pre-empt), A, B
The list that IS running:
A, B, C
Which is the best target for stopping?  C.
Stop C, run Z.

> All done. No?
> >>>> task complete:
> >>>> if task.complete
> >>>>    do rescheduling
> >>>>
> >>>> download complete:
> >>>>    do scheduling
> >>>>
> >>>> project/task resume/suspend
> >>>>    do rescheduling
> >>>>
> >>>> maybe (for sake of completeness):
> >>>> RPC complete:
> >>>> if server asks to drop WU
> >>>>    halt job;
> >>>>    do rescheduling (job)
> >>>>
> >>>> The trickier part of course are the
> >>>> scheduling /rescheduling calls, and I'm currently leafing through my
> >>>> notepad looking for the sketch...
> >>>> for my idea we'd need
> >>>>
> >>>> list of jobs run (wct, project)
> >>>> -> containing the wall-clock times for every job run during the
> >>>> last 24 hours.
> >>>
> >>> How does the last 24 hours help?  Some tasks can run for days or
> >>> weeks.
> >>>
> >>>
> >> Elongate it to a fitting period of time. 24h is an idea I picked up
> >> from Paul 'to keep the mix interesting' - an idea I like.
> >> So, if a task is running 24h 7days ... we needn't have it running a
> >> second, unless this is our ultimate high-priority project with a
> >> priority/resource share of 1000:1 or so.
> >>
> >
> > I am sorry, but I am having no luck figuring out what the problem or
> > the solution is with the above paragraph.
> >
> This is the fix to the non-bug 'annoyance' of not-respecting
> resource-shares/project priorities.
> And therefore a relatively short window over which we do our
> 'resource-share-enforcement' should be sufficient.
> >>> What about multiple reasons all at the same time?  Why do we
> >>> need to know the reason in the first place?
> >> Hm, some ordering could be chosen, I'll think about it; and the
> >> reason does have its place: not all events have the same effect, do
> >> they?
> >>
> >
> > Currently events set a few flags.  In the cases that we are
> > interested in, either a schedule of CPUs, or an enforcement of the
> > schedule, or both.  This discussion has focused on schedule and not
> > enforcement.
> >
> Hmhm. Correct me if I'm wrong, but my proposed scheme does not need
> enforcement, does it?
> > BTW, the code is based on a polling loop, not event-triggered
> > callbacks.  Events are picked up by the polling loop on each
> > iteration - and multiple events can be picked up at the same time
> > because they all happened since the last time through the polling
> > loop.
> >
> Yes, that would create the need to order all 'events' to collect all
> events for a certain job.
> Then we'd have to decide which event takes precedence over the others,
> and either handle them consecutively or discard them.
> (E.g. why would we be interested in a 'checkpoint reached + TSI met'
> event, if the other event is 'task complete'?)
>
> Best
> -Jonathan
>

_______________________________________________
boinc_dev mailing list
[email protected]
http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
To unsubscribe, visit the above URL and
(near bottom of page) enter your email address.