Re: [boinc_dev] 6.6.20 and work scheduling

John . McLeod Tue, 28 Apr 2009 10:01:17 -0700

jm7


[email protected] wrote on 04/28/2009 12:18:56 PM:

> Jonathan Hoser <[email protected]>
> Sent by: [email protected]
> 04/28/2009 12:18 PM
>
> To:
> cc: BOINC dev <[email protected]>
> Subject: Re: [boinc_dev] 6.6.20 and work scheduling
>
>
>
> [email protected] wrote:
>
> > Note that scheduling and enforcement are split.  They do not always run at
> > the same time.  Scheduling consists of finding the preferred set of tasks
> > that should be running now.  Enforcement consists of comparing the
> > currently running set of tasks with the preferred set of tasks and possibly
> > changing the set of running tasks.
>
> Yes, but my view is: "Why make it more complicated than it needs to be?"
>
Enforcement is typically much faster than a re-schedule.  It is also called
more frequently.
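A minimal sketch of why enforcement is cheap, assuming the scheduler has already produced the preferred set (the function and task names are hypothetical, not BOINC client code): enforcement only has to compare two sets of tasks.

```python
from typing import Set, Tuple


def enforce(running: Set[str], preferred: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Compare the running set with the preferred set.

    Returns (to_preempt, to_start): tasks that are running but no longer
    preferred, and tasks that are preferred but not yet running.
    """
    to_preempt = running - preferred
    to_start = preferred - running
    return to_preempt, to_start


# Task "b" keeps running, "a" is swapped out and "c" is started.
print(enforce({"a", "b"}, {"b", "c"}))  # -> ({'a'}, {'c'})
```

Scheduling, by contrast, has to re-estimate deadlines across every task on the host, which is why it is the more expensive step.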
>
> > I really don't understand the reasoning behind schedule(job) or reschedule.
> > There can never be a perfect understanding of what just changed in the
> > system, because one of the things that changes all of the time is the
> > estimated remaining runtime of tasks, and this is one of the items that
> > needs to drive the calculation of what is going to miss its deadline.
> > What is going to miss its deadline depends on all of the other tasks on
> > the host, and a single task cannot be isolated from the rest for this test.
>
>
> Well, I see a difference between scheduling a job for the first time (in
> my case, putting it into the queue at a suitable position)
> and rescheduling (i.e. moving a job whose TSI is up from the CPU to the
> 'not running' state, refilling the freed slot with work, etc.)

CPU scheduling is not FIFO if there is more than one project.  It is FIFO
within a project and Round Robin between projects.  There are some slight
differences in the actual code that starts the task based on whether it is
in memory or not, but there is no difference in the test to see whether it
should be running, or needs to preempt now.  I haven't figured out why
there should be.  The start or re-start of the task does not really have a
bearing on the scheduling of the task.
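A simplified sketch of that ordering, assuming plain alternation between projects; it ignores resource shares and deadline checks, which the real client also applies, and the names are illustrative only:

```python
from collections import deque
from typing import Dict, List


def run_order(queues: Dict[str, List[str]]) -> List[str]:
    """Order tasks FIFO within each project and round robin between projects.

    `queues` maps a project name to its tasks in arrival order.
    """
    pending = {project: deque(tasks) for project, tasks in queues.items()}
    order: List[str] = []
    while any(pending.values()):
        for project, tasks in pending.items():
            if tasks:
                order.append(tasks.popleft())  # oldest task of this project first
    return order


print(run_order({"A": ["a1", "a2", "a3"], "B": ["b1", "b2"]}))
# -> ['a1', 'b1', 'a2', 'b2', 'a3']
```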
>
> And yes, estimated runtime changes all the time, but do we need to care
> about that every second it reports back?
> Why not simply look at it when it comes to (re)scheduling events driven
> by the below stated points?
>
Remaining time does not always change by just a little bit.  An example that
occurs sometimes:

Project Z is added to the mix of projects on a computer.  The estimate for
all tasks is 5 minutes, and so 50 are downloaded.  After the first task is
run, it is noticed that the estimate is WAY off and the tasks actually take
5 hours (and these numbers pretty much describe what happened).  Now we
really do have some serious deadline problems.  What used to look
completely reasonable no longer looks as reasonable.

The client cannot tell without testing whether something like this has
happened or not.
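To make the point concrete, a minimal sketch: simulate earliest-deadline-first over all queued tasks on N identical CPUs and list the tasks that would finish late. The 4 CPUs and the 48-hour deadline below are hypothetical values (the example above gives neither), and the simulation is non-preemptive and ignores other projects' work.

```python
import heapq
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    name: str
    remaining_hours: float  # current estimate of remaining run time
    deadline_hours: float   # hours from now until the report deadline


def deadline_misses(tasks: List[Task], n_cpus: int) -> List[str]:
    """Run tasks earliest-deadline-first on n_cpus identical CPUs and
    return the names of tasks whose simulated finish is after their deadline."""
    cpu_free_at = [0.0] * n_cpus  # min-heap: when each CPU next becomes free
    missed: List[str] = []
    for task in sorted(tasks, key=lambda t: t.deadline_hours):
        start = heapq.heappop(cpu_free_at)  # earliest available CPU
        finish = start + task.remaining_hours
        heapq.heappush(cpu_free_at, finish)
        if finish > task.deadline_hours:
            missed.append(task.name)
    return missed


# Project Z: 50 tasks, first estimated at 5 minutes each, actually 5 hours.
estimated = [Task(f"Z{i}", 5 / 60, 48.0) for i in range(50)]
actual = [Task(f"Z{i}", 5.0, 48.0) for i in range(50)]
print(len(deadline_misses(estimated, 4)))  # -> 0
print(len(deadline_misses(actual, 4)))     # -> 14
```

Only the runtime estimate changed, yet 14 of the 50 tasks go from safe to late, and which ones are late depends on every other task in the queue.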



> > jm7
> >
> > Jonathan Hoser <[email protected]> wrote on 04/28/2009 10:25:02 AM:
> >
> >
> >> In my eyes this amounts to the following
> >>
> >> checkpointing:
> >> if task.runtime < TSI
> >>    do nothing
> >> else
> >>    halt job
> >>    do rescheduling (job)
> >>
> >
> > Does not capture the fact that some tasks may need to start before the task
> > they are pre-empting has completed its time slice.  Also does not seem to
> > have multi-CPU resources in mind.
> >
> 1. Either we have time-slicing or we don't.
> If we really get a job with a deadline so close that waiting until the
> end of the current time slice (a more frequent event with more CPU cores)
> would really mean its ultimate failure, then something is wrong that
> we needn't fix; it shouldn't be the client's aim to fix shortcomings of
> the supplying projects.

We have time slicing if we can, on as many CPUs as we can.  Sometimes an
immediate preempt is needed.

An example:

Task A from Project A never checkpoints.  It was started before Task B from
Project B was downloaded.

Task A has 48 hours remaining CPU time.
Task B has 24 hours till deadline, but only 1 hour of run time.

Sometime during the next 23 hours, Task A will have to be pre-empted NOT AT
A CHECKPOINT in order to allow Task B to run before deadline.  Task A will,
of course, have to remain in memory in order to make progress.

Believe it or not, there are projects that supply tasks that run for a week
without checkpointing.
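The arithmetic of this example as a small sketch, assuming a single CPU (the function names are hypothetical):

```python
import math


def latest_start(hours_to_deadline: float, runtime_hours: float) -> float:
    """Latest start (hours from now) that still finishes on time,
    assuming the task then runs without interruption on one CPU."""
    return hours_to_deadline - runtime_hours


def must_preempt_mid_task(hours_to_next_checkpoint: float, slack_hours: float) -> bool:
    """True if the running task will not reach a checkpoint before the
    urgent task has to start, so it must be preempted between checkpoints
    (and kept in memory so that its progress is not lost)."""
    return hours_to_next_checkpoint > slack_hours


slack = latest_start(24.0, 1.0)                 # Task B
print(slack)                                    # -> 23.0
print(must_preempt_mid_task(math.inf, slack))   # -> True: Task A never checkpoints
```

Waiting for Task A to finish would take 48 hours, well past Task B's 24-hour deadline, so the only choices are a preempt without a checkpoint or a missed deadline.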

> 2. Yes, it does have multiple CPUs in mind. Or are you telling me
> that every app is asked at the same time (across multiple cores) to
> checkpoint, or checkpoints in complete synchronisation with all other
> running apps? I think not.
> Thus this event will likely be triggered more often than the others, but
> will actually only do something if the time slice (TSI) of THAT app on
> THAT core is up.

Checkpoints happen at random, when the task is ready to checkpoint, not when
the host is ready for it to checkpoint.  More than one task will sometimes
checkpoint in the same second (the polling interval); not every time, of
course, but it is going to happen.

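A sketch of how a batch of checkpoints picked up in one polling interval could be checked per core, assuming each core records when its current task's time slice began; the names and the one-hour TSI in the example are hypothetical:

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CoreState:
    task: str
    slice_start: float  # seconds; when the current task got this core


def tasks_to_swap(cores: Dict[int, CoreState], checkpointed: List[str],
                  now: float, tsi_seconds: float) -> List[str]:
    """Return the tasks that checkpointed since the last poll AND whose own
    time slice on their own core has expired. Other tasks are left alone."""
    just_checkpointed = set(checkpointed)
    swap: List[str] = []
    for core_id in sorted(cores):
        state = cores[core_id]
        if state.task in just_checkpointed and now - state.slice_start >= tsi_seconds:
            swap.append(state.task)
    return swap


cores = {0: CoreState("A", 0.0), 1: CoreState("B", 1800.0), 2: CoreState("C", 0.0)}
# A and B both checkpointed in the same second; only A's slice is used up.
print(tasks_to_swap(cores, ["A", "B"], now=3600.0, tsi_seconds=3600.0))  # -> ['A']
```

Task C's slice is also up, but since it has not checkpointed it is not swapped here; that is exactly the case where an immediate preempt without a checkpoint may still be needed.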
> >> task complete:
> >> if task.complete
> >>    do rescheduling
> >>
> >> download complete:
> >>    do scheduling
> >>
> >> project/task resume/suspend
> >>    do rescheduling
> >>
> >> maybe (for the sake of completeness):
> >> RPC complete:
> >> if server asks to drop WU
> >>    halt job;
> >>    do rescheduling (job)
> >>
> >> The trickier part, of course, is the
> >> scheduling/rescheduling calls, and I'm currently leafing through my
> >> notepad looking for the sketch...
> >> for my idea we'd need
> >>
> >> list of jobs run (wct, project)
> >> -> containing the wall-clock times for every job run during the last 24
> >> hours.
> >
> > How does the last 24 hours help?  Some tasks can run for days or weeks.
> >
> Extend it to a suitable period of time. 24h is an idea I picked up from
> Paul, 'to keep the mix interesting', an idea I like.
> So, if a task is running 24h a day for 7 days ... we needn't have a second
> one running, unless this is our ultimate high-priority project with a
> priority/resource share of 1000:1 or so.

I am sorry, but I am having no luck figuring out what the problem or the
solution is with the above paragraph.
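One possible reading of the wall-clock list idea, sketched under stated assumptions (hypothetical names; this is not how the client currently chooses projects): keep (end time, project, wall-clock seconds) records, count only those inside a configurable window, and pick the project whose share of recent time is furthest below its resource share.

```python
from typing import Dict, List, Tuple

# (end_time_seconds, project, wall_clock_seconds) for each finished run
RunRecord = Tuple[float, str, float]


def pick_project(history: List[RunRecord], shares: Dict[str, float],
                 now: float, window_seconds: float) -> str:
    """Return the project whose fraction of recent wall-clock time is
    furthest below its fraction of the total resource share."""
    used = {project: 0.0 for project in shares}
    for end_time, project, wct in history:
        if now - end_time <= window_seconds and project in used:
            used[project] += wct
    total_used = sum(used.values())
    total_share = sum(shares.values())

    def deficit(project: str) -> float:
        used_fraction = used[project] / total_used if total_used else 0.0
        return shares[project] / total_share - used_fraction

    return max(shares, key=deficit)


history = [(86_000.0, "A", 3600.0)]
print(pick_project(history, {"A": 100.0, "B": 100.0}, now=86_400.0, window_seconds=86_400.0))
# -> B
```

Because a record is only added when a run ends, a task that runs for longer than the window contributes nothing until it finishes, which is the weakness raised above for tasks that run for days or weeks.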

> >> per resource (CPU / GPU Type1 / GPU Type2 / Coproc) two linked
> >> lists of jobs eligible to run on that resource
> >> -> linked list1 (CPU) (containing all jobs in the order they were received)
> >> -> linked list2 (CPU) (short, only jobs in 'working' state)
> >> -> linked list1 (GPU)
> >> -> linked list2 (GPU)
> >> -> linked list (...)
> >>
> >
> > Those already exist (sort of).  There is a vector of all jobs, and a vector
> > of all jobs that are started and not yet complete.  It is not broken out by
> > resource type, but the resource type is included in the data.  Current
> > methodology is to skip over the tasks you don't want in the current test.
> > Note:  I would have done it differently than the current implementation.
> >
> Thought so; it seems the most practical way of doing things.
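A sketch of the single-vector approach described above, assuming each job carries its resource type and state (the field names are hypothetical, not the client's actual data structures): the per-resource "lists" become filtered views over one vector.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class Job:
    name: str
    resource: str  # e.g. "CPU" or "GPU"
    started: bool
    complete: bool


def jobs_for(jobs: List[Job], resource: str) -> Iterator[Job]:
    """All jobs for one resource type, in the vector's original order."""
    return (job for job in jobs if job.resource == resource)


def active_jobs_for(jobs: List[Job], resource: str) -> Iterator[Job]:
    """Jobs for one resource type that are started but not yet complete."""
    return (job for job in jobs_for(jobs, resource) if job.started and not job.complete)


jobs = [Job("c1", "CPU", True, False), Job("g1", "GPU", True, False),
        Job("c2", "CPU", False, False), Job("c3", "CPU", True, True)]
print([j.name for j in jobs_for(jobs, "CPU")])         # -> ['c1', 'c2', 'c3']
print([j.name for j in active_jobs_for(jobs, "CPU")])  # -> ['c1']
```

The trade-off is a linear scan per query instead of keeping several lists in sync on every state change.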
> >> reschedule (job)
> >>    if reason: task/project suspend
> >>       mark job(s) as suspended in linkedlist1(resourcetype)
> >>       addlast to linkedlist2(resourcetype)
> >>
> >>    if reason: task/project resume
> >>       mark job(s) as runnable in linkedlist1(resourcetype)
> >>       if job(s) contained in linkedlist2(resourcetype)
> >>          order runnable jobs in linkedlist2(resourcetype) by order
> >> in linkedlist1(resourcetype) (or EDF)
> >>
> >>    if reason: drop WU
> >>       do cleanup
> >>       do reschedule reason 'task complete'
> >>
> >>    if reason: task complete
> >>       do cleanup
> >>       check (via wct-job-list and resourceshare/project priority)
> >> what project shall get resource now.
> >>       traverse linkedlist2(resourcetype) until end or eligible job found
> >>       if job is found, launch it now.
> >>       else
> >>       traverse linkedlist1(resourcetype) until eligible job found or end
> >>       if job is found, launch it now.
> >>       else
> >>       redo the above check and choose second/third/...-highest scoring
> >>       project
> >>
> > I thought linked list 2 only included tasks that have been started.
> > You need a list of tasks that would ideally be running now.
> >
> Nope, because this list is implicitly given by the FIFO ordering mixed
> with 'queue-jumping' of jobs close to their deadline AND
> the resource-share selector.

Except that CPU tasks are run round robin between projects - to give
variety.

> >>    if reason: checkpointing / TSI
> >>       addfirst (job) to linkedlist2(resourcetype)
> >>       do reschedule reason 'task complete'
> >>
> > But the task isn't complete.
> >
> Yes, but the logic to apply would be the same. Code reuse?
> > What about multiple reasons all at the same time?  Why do we need to know
> > the reason in the first place?
> >
> Hm, some ordering could be chosen, I'll think about it; and the reason
> does have its place: not all events have the same effect, do they?

Currently events set a few flags.  In the cases that we are interested in,
these request either a schedule of CPUs, an enforcement of the schedule, or
both.  This discussion has focused on scheduling and not enforcement.

> >> schedule(job)
> >>    if new job has Deadline earlier than
> >>       now + (sum of estimatedjobruntime of jobs in linkedlist1(resourcetype)
> >>          divided by resources available (e.g. CPUs))
> >>       then insert job in linkedlist1(resourcetype) so that
> >>          deadline will be met.
> >>
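A runnable sketch of schedule(job) under simplifying assumptions: runtimes and deadlines are in hours from now, the backlog ahead of a slot is shared evenly across CPUs, and the new job runs on a single CPU once started. It inserts the job as late in the FIFO queue as its deadline allows (appending when there is room) and, like the pseudocode, does not recheck the deadlines of the jobs it jumps ahead of. The names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Job:
    name: str
    est_runtime: float  # estimated run time, hours
    deadline: float     # hours from now


def schedule(queue: List[Job], job: Job, n_cpus: int, now: float = 0.0) -> int:
    """Insert `job` at the latest queue position whose crude projected finish
    still meets its deadline; fall back to the front. Returns the index used."""
    position = 0  # fallback: front of the queue if no slot meets the deadline
    ahead = 0.0   # total estimated runtime of jobs ahead of the candidate slot
    for i in range(len(queue) + 1):
        projected_finish = now + ahead / n_cpus + job.est_runtime
        if projected_finish <= job.deadline:
            position = i  # later slots are preferred, so keep overwriting
        if i < len(queue):
            ahead += queue[i].est_runtime
    queue.insert(position, job)
    return position


queue = [Job(f"j{i}", 1.0, 100.0) for i in range(8)]
print(schedule(queue, Job("urgent", 1.0, 3.0), n_cpus=2))     # -> 4
print(schedule(queue, Job("relaxed", 1.0, 100.0), n_cpus=2))  # -> 9 (appended)
```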
> >> So far, my idea in pseudocode.
> >> The second (temporary) list could be omitted by using certain flags
> >> for the jobs in the first list,
> >> but it adds some clarity for the first draft.
> >> The 'do cleanup' step would also need to add the runtime of the
> >> rescheduled job to the list of wall-clock times;
> >> other than that, I hope I have outlined my idea clearly.
> >>
> >>
> Best
>
> -Jonathan
>

BTW, the code is based on a polling loop, not event-triggered callbacks.
Events are picked up by the polling loop on each iteration, and multiple
events can be picked up at the same time because they all happened since
the last time through the polling loop.
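A sketch of one polling-loop iteration that drains every event queued since the previous pass and folds them into flags, so several checkpoints or completions arriving together trigger at most one reschedule and one enforcement. The event names, and which of them request a reschedule, are hypothetical rather than the client's actual list:

```python
from collections import deque
from typing import Deque, Dict

# Hypothetical: events that require recomputing the preferred set of tasks.
RESCHEDULE_EVENTS = {"task_complete", "download_complete", "suspend", "resume"}


def poll_once(events: Deque[str]) -> Dict[str, bool]:
    """Consume all pending events and return which actions this pass must run."""
    flags = {"reschedule": False, "enforce": False}
    while events:
        event = events.popleft()
        if event in RESCHEDULE_EVENTS:
            flags["reschedule"] = True
        flags["enforce"] = True  # every event here may change what should run
    return flags


pending = deque(["checkpoint", "checkpoint", "task_complete"])
print(poll_once(pending))  # -> {'reschedule': True, 'enforce': True}
print(len(pending))        # -> 0
```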

> _______________________________________________
> boinc_dev mailing list
> [email protected]
> http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
> To unsubscribe, visit the above URL and
> (near bottom of page) enter your email address.
>
