jm7
[email protected] wrote on 04/28/2009 11:37:32 AM:

> Martin <[email protected]>
> Sent by: [email protected]
>
> 04/28/2009 11:37 AM
>
> To
> BOINC dev <[email protected]>
> cc
> Subject
> Re: [boinc_dev] 6.6.20 and work scheduling
>
> [email protected] wrote:
> [...]
> > I really don't understand the reasoning of schedule(job) or reschedule.
> > There can never be a perfect understanding of what just changed in the
> > system, because one of the things that changes all the time is the
> > estimated remaining runtime of tasks, and this is one of the items that
> > needs to drive the calculation of what is going to miss deadline. What is
> > going to miss deadline depends on all of the other tasks on the host, and
> > a single task cannot be isolated from the rest for this test.
>
> There appears to be a phenomenal amount of effort both in programming
> and in scheduler CPU time in trying to meet exactly all deadlines down
> to the last millisecond and for all eventualities.

At most we try not to exceed the deadlines in as many cases as possible.

> We should never run a system to be so finely deadline critical. It
> becomes overly difficult and unstable.
>
> Perhaps the deadline exactness and trying to meet all scenarios
> precisely should be relaxed somewhat? "Chill out" on the scheduler and
> development effort?

We do not try to meet the deadlines to the exact millisecond. What gave
you that idea? Missing a deadline carries the risk that the work will be
useless, be discarded as such, and will not grant credit. Some people
care about credit. In other words, intentionally late work is
unacceptable.

> In other words, move to a KISS solution?

As long as it is not too simple. Sometimes the obvious simple solution
does not work.
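To illustrate why a single task cannot be tested for deadline trouble in isolation, here is a minimal sketch (hypothetical code, not the actual BOINC client; names are illustrative). Whether a task misses its deadline depends on every task scheduled ahead of it, so the check has to simulate the whole set, here in earliest-deadline-first order on one CPU:

```cpp
#include <algorithm>
#include <vector>

struct Task {
    double remaining;  // estimated remaining runtime, seconds
    double deadline;   // seconds from now
};

// Simulate running all tasks back to back in earliest-deadline-first
// order on a single CPU; count how many would finish past their deadline.
int count_deadline_misses(std::vector<Task> tasks) {
    std::sort(tasks.begin(), tasks.end(),
              [](const Task& a, const Task& b) {
                  return a.deadline < b.deadline;
              });
    double clock = 0;   // simulated elapsed time
    int misses = 0;
    for (const Task& t : tasks) {
        clock += t.remaining;            // this task finishes at 'clock'
        if (clock > t.deadline) ++misses;
    }
    return misses;
}
```

Note that adding one extra short-deadline task can push a previously safe task past its deadline, which is exactly why the estimated remaining runtimes of all tasks, which change constantly, have to drive the calculation.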
> New rule:
>
> If we are going to accept that the project servers are going to be, or
> can be, unreasonable, then the client must have the option to be equally
> obnoxious (but completely honest) and reject WUs that are unworkable,
> rather than attempting a critical futility (and failing days later).
>
> Add a bit of margin and then you can have only the TSI, and user
> suspend/release, as your scheduler trigger events. The work fetch and
> send just independently keeps the WU cache filled but not overfilled.

Tasks come in discrete units; they do not come in conveniently sized
packages. CPDN tasks can run for months on end, and because of this it
was decided that some variety on the host (if at all possible) would be
a good idea. This limits the ability to "not overfill".

> The next question is whether it is better/simpler to keep (a linked
> list) FIFO for all resource instances of the same type, or a FIFO per
> resource instance.
>
> Immediately junk unstarted WUs for resend elsewhere if deadline trouble
> ensues, rather than panic to try to scrape them in late.
>
> That will also implement a natural feedback loop for project admins to
> get in tune with what deadlines are reasonable for their WUs. They will
> see by their return rates whether they are being too aggressive
> compared to the capability and cache sizes/lengths of their volunteers.
>
> In my view, the present system is wide open for queue-jumping abuse by
> a project demanding short deadlines... (Just set a silly short
> deadline, put everyone into EDF, and accept all the results back
> regardless of when they come.)
>
> Simple?
>
> Regards,
> Martin

Servers treat aborted tasks as errors, and your daily quota is reduced
by one for each of them. This leads to the following problem:

Request a second of work from project A. Run into deadline trouble.
Abort the task. Since A is STILL the project with the highest LTD,
request a second of work from project A. Run into deadline trouble.
Abort the task. Repeat once every 10 seconds until the daily quota has
been met (at half of what it was at the beginning of the series). This
also tends to DOS the server. Pick up tomorrow at the same place and do
the same thing. The help desk gets panicked queries about why.

We have already seen this in s...@h, where there is a custom application
that cherry-picks CUDA tasks (rejecting the ones that are known to take
a very long time on CUDA). This has driven the daily quota of some
people well below what their computers can actually do. We do not want a
repeat of that built intentionally into the client.

> --
> --------------------
> Martin Lomas
> m_boincdev ml1 co uk.ddSPAM.dd
> --------------------
> _______________________________________________
> boinc_dev mailing list
> [email protected]
> http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
> To unsubscribe, visit the above URL and
> (near bottom of page) enter your email address.
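The quota-decay loop described above can be sketched as follows (hypothetical code, not the real BOINC scheduler; the struct and names are illustrative). Each abort counts as an error and cuts the host's daily quota by one, so a client that aborts everything it fetches drives the quota down until the sent count meets it:

```cpp
struct HostQuota {
    int daily_quota;     // max results the server will send per day
    int sent_today = 0;

    // Each error (including a client-side abort) cuts the quota;
    // each valid result would nudge it back up (typically capped).
    void on_result_error()   { if (daily_quota > 1) --daily_quota; }
    void on_result_success() { ++daily_quota; }

    bool can_send() const { return sent_today < daily_quota; }
};

// A client that aborts every task it fetches: request work, hit
// deadline trouble, abort, repeat. Returns how many tasks were fetched
// before the (shrinking) quota was met.
int abort_loop(HostQuota& h) {
    int fetched = 0;
    while (h.can_send()) {
        ++h.sent_today;       // server sends one task
        h.on_result_error();  // client aborts it; counted as an error
        ++fetched;
    }
    return fetched;
}
```

With the sent count climbing and the quota falling in lockstep, the two meet in the middle: the series stops after the quota has dropped to half its starting value, matching the scenario above.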
