jm7

Jonathan Hoser <[email protected]> wrote on 04/28/2009 10:25:02 AM:

> In my eyes this amounts to the following
>
> checkpointing:
> if task.runtime < TSI?
>    do nothing
> else
>    halt job
>    do rescheduling (job)

This does not capture the fact that some tasks may need to start before the
task they are pre-empting has completed its time slice.  It also does not seem
to take multi-CPU resources into account.
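
As a rough illustration, the quoted checkpoint rule could be extended with an
early pre-emption case.  This is only a sketch: the Task fields, the
urgent_waiting flag and the return values are made-up names for illustration,
not BOINC client code, and TSI is taken to be the task switch interval.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    slice_runtime: float  # seconds run in the current time slice (hypothetical field)


def on_checkpoint(task: Task, tsi: float, urgent_waiting: bool = False) -> str:
    """Decide what to do when `task` reaches a checkpoint.

    tsi is the task switch interval in seconds.  urgent_waiting is an assumed
    flag meaning some other task must start before the slice is over.
    """
    if task.slice_runtime < tsi and not urgent_waiting:
        return "continue"    # slice not used up and nothing urgent is waiting
    return "reschedule"      # halt the job and let the scheduler pick again
```

On a multi-CPU host a check like this would have to run for each running task,
since any one of them might be the one to give way.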
>
> task complete:
> if task.complete
>    do rescheduling
>
> download complete:
>    do scheduling
>
> project/task resume/suspend
>    do rescheduling
>
> maybe (for sake of completeness):
> RPC complete:
> if server asks to drop WU
>    halt job;
>    do rescheduling (job)
>
> The trickier part of course are the
> scheduling /rescheduling calls, and I'm currently leafing through my
> notepad looking for the sketch...
> for my idea we'd need
>
> list of jobs run (wct, project)
> -> containing the wall-clock times for every job run during the last 24
> hours.

How does the last 24 hours help?  Some tasks can run for days or weeks.
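
To make the concern concrete, here is a small sketch of a 24-hour wall-clock
tally.  The record layout (project, start, end) and the function name are
assumptions for illustration.  A run longer than the window can never
contribute more than the window length, so a task that has been running for
days looks the same as one that ran for exactly 24 hours.

```python
WINDOW = 24 * 3600  # window length in seconds


def wall_clock_in_window(runs, now, window=WINDOW):
    """Sum wall-clock seconds per project, counting only time inside the window.

    runs: iterable of (project, start, end) tuples, all times in seconds.
    """
    cutoff = now - window
    totals = {}
    for project, start, end in runs:
        # Clip each run to the interval [cutoff, now].
        overlap = min(end, now) - max(start, cutoff)
        if overlap > 0:
            totals[project] = totals.get(project, 0) + overlap
    return totals
```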
>
> per resource (CPU/ GPU Type1 / GPU Type2 / Coproc) two(2) linked
> list of jobs eligible to run on that resource
> -> linked list1 (CPU) (containing all jobs in order of getting them)
> -> linked list2 (CPU) (short, only 'working' state)
> -> linked list1 (GPU)
> -> linked list2 (GPU)
> -> linked list (...)

Those already exist (sort of).  There is a vector of all jobs, and a vector
of all jobs that are started and not yet complete.  It is not broken out by
resource type, but the resource type is included in the data.  Current
methodology is to skip over the tasks you don't want in the current test.
Note:  I would have done it differently than the current implementation.
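
Roughly, the skip-over approach looks like the following.  This is a
simplified sketch, not the client's actual data structures; the Job fields
and the resource names are placeholders.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    resource: str  # placeholder resource tag, e.g. "cpu" or "gpu"


def jobs_for(jobs, resource):
    """Walk the single list of all jobs, skipping those not wanted in this pass."""
    selected = []
    for job in jobs:
        if job.resource != resource:
            continue  # wrong resource type for the current test
        selected.append(job)
    return selected
```

The order of the single list is preserved, so a per-resource view costs one
pass over all jobs instead of a separate list to keep in sync.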
>
> reschedule (job)
>    if reason: task/project suspend
>       mark job(s) as suspended in linkedlist1(resourcetype)
>       addlast to linkedlist2(resourcetype)
>
>    if reason: task/project resume
>       mark job(s) as runnable in linkedlist1(resourcetype)
>       if job(s) contained in linkedlist2(resourcetype)
>          order runnable jobs in linkedlist2(resourcetype) by order
> in linkedlist1(resourcetype) (or EDF)
>
>    if reason: drop WU
>       do cleanup
>       do reschedule reason 'task complete'
>
>    if reason: task complete
>       do cleanup
>       check (via wct-job-list and resourceshare/project priority)
> what project shall get resource now.
>       traverse linkedlist2(resourcetype) until end or eligible job found
>       if job is found, launch it now.
>       else
>       traverse linkedlist1(resourcetype) until eligible job found or end
>       if job is found, launch it now.
>       else
>       redo the above check and choose second/third/...-highest scoring
> project

I thought linked list 2 only included tasks that have been started.
You need a list of tasks that would ideally be running now.
>
>    if reason: checkpointing / TSI
>       addfirst (job) to linkedlist2(resourcetype)
>       do reschedule reason 'task complete'
But the task isn't complete.

What about multiple reasons all at the same time?  Why do we need to know
the reason in the first place?
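
One way to avoid reason codes is to recompute the set of tasks that should be
running and compare it with what is running.  The sketch below assumes
earliest-deadline-first as the ordering and uses invented names throughout; it
is meant to show the shape of the idea, not the client's scheduler.

```python
def reschedule(runnable, running, ncpus):
    """Return (to_start, to_preempt) without looking at why we were called.

    runnable: dict mapping task name -> deadline, for every task allowed to run.
    running:  set of task names currently executing.
    """
    # Tasks that would ideally be running now: the ncpus earliest deadlines.
    # The name is a tie-breaker so the result is deterministic.
    by_deadline = sorted(runnable, key=lambda name: (runnable[name], name))
    ideal = set(by_deadline[:ncpus])
    to_start = ideal - running
    to_preempt = running - ideal
    return to_start, to_preempt
```

A suspend, a completion or a finished download each change runnable or
running, and the same call handles all of them, including several at once.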
>
> schedule(job)
>    if new job has Deadline earlier than
>       now + (sum of estimatedjobruntime of jobs in
> linkedlist1(resourcetype)
>          divided by resources available (e.g. CPUs))
>       then insert job in linkedlist1(resourcetype) so that
>          deadline will be met.
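
The quoted schedule(job) step can be sketched as follows.  The Job fields are
hypothetical, and "insert so that deadline will be met" is interpreted here as
"insert ahead of the first queued job with a later deadline", which is an
assumption and does not by itself guarantee the deadline.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    deadline: float     # absolute time in seconds
    est_runtime: float  # estimated runtime in seconds


def schedule(queue, job, now, ncpus):
    """Add `job` to `queue` (a list kept in arrival order unless deadlines force a move)."""
    # Estimated time until the existing queue drains, spread over the CPUs.
    backlog = sum(j.est_runtime for j in queue) / ncpus
    if job.deadline < now + backlog:
        # Waiting at the tail would miss the deadline, so move the job forward.
        for i, queued in enumerate(queue):
            if queued.deadline > job.deadline:
                queue.insert(i, job)
                return
    queue.append(job)  # no deadline pressure, or nothing with a later deadline
```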
>
> So far... my idea in pseudocode...
> the second (temporary) list could be omitted if having certain flags
> for the jobs in the first list -
> but adds some clarity for the first draft.
> The 'do cleanup' thingy would also need to add the runtime for the
> rescheduled job to the list of wallclocktimes;
> other than that... I hope I could outline my idea in a clear fashion;
>
> Best
> -Jonathan
>
>
>

_______________________________________________
boinc_dev mailing list
[email protected]
http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
To unsubscribe, visit the above URL and
(near bottom of page) enter your email address.
