jm7
[email protected] wrote on 04/28/2009 11:37:32 AM:

> Martin <[email protected]>
> Sent by: [email protected]
>
> 04/28/2009 11:37 AM
>
> To
> BOINC dev <[email protected]>
> cc
> Subject
> Re: [boinc_dev] 6.6.20 and work scheduling
>
> [email protected] wrote:
> [...]
> > I really don't understand the reasoning of schedule(job) or reschedule.
> > There can never be a perfect understanding of what just changed in the
> > system, because one of the things that changes all the time is the
> > estimated remaining runtime of tasks, and this is one of the items that
> > needs to drive the calculation of what is going to miss deadline. What is
> > going to miss deadline depends on all of the other tasks on the host, and
> > a single task cannot be isolated from the rest for this test.
>
> There appears to be a phenomenal amount of effort both in programming
> and in scheduler CPU time in trying to meet exactly all deadlines down
> to the last millisecond and for all eventualities.

At most we try not to exceed the deadlines in as many cases as possible.

> We should never run a system to be so finely deadline critical. It
> becomes overly difficult and unstable.
>
> Perhaps the deadline exactness and trying to meet all scenarios
> precisely should be relaxed somewhat? "Chill out" on the scheduler and
> development effort?

We do not try to meet the deadlines to the exact millisecond. What gave
you that idea? Missing a deadline carries the risk that the work will be
useless, be discarded as such, and will not grant credit. Some people
care about credit. In other words, intentionally late work is
unacceptable.

> In other words, move to a KISS solution?

As long as it is not too simple. Sometimes the obvious simple solution
does not work.
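To illustrate why a single task cannot be tested for deadline trouble in isolation, here is a minimal sketch (hypothetical code, not the actual BOINC client; names are illustrative). Whether a task misses its deadline depends on every task scheduled ahead of it, so the check has to simulate the whole set, here in earliest-deadline-first order on one CPU:

```cpp
#include <algorithm>
#include <vector>

struct Task {
    double remaining;  // estimated remaining runtime, seconds
    double deadline;   // seconds from now
};

// Simulate running all tasks back to back in earliest-deadline-first
// order on a single CPU; count how many would finish past their deadline.
int count_deadline_misses(std::vector<Task> tasks) {
    std::sort(tasks.begin(), tasks.end(),
              [](const Task& a, const Task& b) {
                  return a.deadline < b.deadline;
              });
    double clock = 0;   // simulated elapsed time
    int misses = 0;
    for (const Task& t : tasks) {
        clock += t.remaining;            // this task finishes at 'clock'
        if (clock > t.deadline) ++misses;
    }
    return misses;
}
```

Note that adding one extra short-deadline task can push a previously safe task past its deadline, which is exactly why the estimated remaining runtimes of all tasks, which change constantly, have to drive the calculation.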
> New rule:
>
> If we are going to accept that the project servers are going to be, or
> can be, unreasonable, then the client must have the option to be equally
> obnoxious (but completely honest) and reject WUs that are unworkable,
> rather than attempting a critical futility (and failing days later).
>
> Add a bit of margin and then you can have only the TSI, and user
> suspend/release, as your scheduler trigger events. The work fetch and
> send just independently keeps the WU cache filled but not overfilled.

Tasks come in discrete units; they do not come in conveniently sized
packages. CPDN tasks can run for months on end, and because of this it
was decided that some variety on the host (if at all possible) would be
a good idea. This limits the ability to "not overfill".

> The next question is whether it is better/simpler to keep (a linked
> list) FIFO for all resource instances of the same type, or a FIFO per
> resource instance.
>
> Immediately junk unstarted WUs for resend elsewhere if deadline trouble
> ensues, rather than panic to try to scrape them in late.
>
> That will also implement a natural feedback loop for project admins to
> get in tune with what deadlines are reasonable for their WUs. They will
> see by their return rates whether they are being too aggressive
> compared to the capability and cache sizes/lengths of their volunteers.
>
> In my view, the present system is wide open for queue-jumping abuse by
> a project demanding short deadlines... (Just set a silly short
> deadline, put everyone into EDF, and accept all the results back
> regardless of when they come.)
>
> Simple?
>
> Regards,
> Martin

Servers treat aborted tasks as errors, and your daily quota is reduced
by one for each of them. This leads to the following problem:

Request a second of work from project A. Run into deadline trouble.
Abort the task. Since A is STILL the project with the highest LTD,
request a second of work from project A. Run into deadline trouble.
Abort the task. Repeat once every 10 seconds until the daily quota has
been met (at half of what it was at the beginning of the series). This
also tends to DOS the server. Pick up tomorrow at the same place and do
the same thing. The help desk gets panicked queries about why.

We have already seen this in s...@h, where there is a custom application
that cherry-picks CUDA tasks (rejecting the ones that are known to take
a very long time on CUDA). This has driven the daily quota of some
people well below what their computers can actually do. We do not want a
repeat of that built intentionally into the client.

> --
> --------------------
> Martin Lomas
> m_boincdev ml1 co uk.ddSPAM.dd
> --------------------
> _______________________________________________
> boinc_dev mailing list
> [email protected]
> http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
> To unsubscribe, visit the above URL and
> (near bottom of page) enter your email address.
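The quota-decay loop described above can be sketched as follows (hypothetical code, not the real BOINC scheduler; the struct and names are illustrative). Each abort counts as an error and cuts the host's daily quota by one, so a client that aborts everything it fetches drives the quota down until the sent count meets it:

```cpp
struct HostQuota {
    int daily_quota;     // max results the server will send per day
    int sent_today = 0;

    // Each error (including a client-side abort) cuts the quota;
    // each valid result would nudge it back up (typically capped).
    void on_result_error()   { if (daily_quota > 1) --daily_quota; }
    void on_result_success() { ++daily_quota; }

    bool can_send() const { return sent_today < daily_quota; }
};

// A client that aborts every task it fetches: request work, hit
// deadline trouble, abort, repeat. Returns how many tasks were fetched
// before the (shrinking) quota was met.
int abort_loop(HostQuota& h) {
    int fetched = 0;
    while (h.can_send()) {
        ++h.sent_today;       // server sends one task
        h.on_result_error();  // client aborts it; counted as an error
        ++fetched;
    }
    return fetched;
}
```

With the sent count climbing and the quota falling in lockstep, the two meet in the middle: the series stops after the quota has dropped to half its starting value, matching the scenario above.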
