These are more than tweaks. In any case, as I said, properly handling 24-hour jobs that don't checkpoint requires more than client changes: we'd need a mechanism to avoid sending such jobs to clients that will never finish them (e.g. because they shut down every night).
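One way to picture the mechanism mentioned above is a check that compares a non-checkpointing job's estimated runtime against how long the host typically stays up without interruption. This is a hypothetical sketch, not BOINC's actual scheduler logic: `HostProfile`, `typical_session_hours`, `Job`, and `can_finish` are invented names, and it assumes the server has some estimate of each host's typical uninterrupted session length.

```python
from dataclasses import dataclass


@dataclass
class HostProfile:
    # Hypothetical per-host statistic: typical length (hours) of an
    # uninterrupted session before the host shuts down or suspends work.
    typical_session_hours: float


@dataclass
class Job:
    est_runtime_hours: float  # estimated wall time to finish on this host
    checkpoints: bool         # does the application write checkpoints?


def can_finish(job: Job, host: HostProfile) -> bool:
    """Return True if the host can plausibly complete the job.

    A checkpointing job resumes after each shutdown, so it is accepted.
    A non-checkpointing job loses all progress on every shutdown, so it
    must fit inside a single uninterrupted session.
    """
    if job.checkpoints:
        return True
    return job.est_runtime_hours <= host.typical_session_hours
```

Under these assumptions, a host that is switched off every night (say, roughly 10-hour sessions) would be refused a 24-hour job that never checkpoints, but would still get the same job if it checkpointed.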
So for now: BOINC doesn't handle long jobs that don't checkpoint.
The project in question can contact me for advice on how to add checkpointing.

-- David

[email protected] wrote:
> I believe that some client tweaks can mostly, if not entirely, fix the
> problem. The project in question is, I believe, using the wrapper, and the
> wrapped executable does not do its own checkpoints. This makes it
> extremely difficult for them to supply checkpoints.
>
> What does not get fixed with the following (what case am I missing?):
>
> 1) Track the active wall time between checkpoints (time spent actually
> running, not time spent swapped out).
> 2) Subtract the longest wall time between checkpoints for any project (or
> preferably app version) that has work assigned to the host that has yet to
> complete, from the computation deadline. Do not subtract the wall time
> since the last checkpoint unless there are no other tasks from this app
> version / project.
> 3) Add the shortest wall time between checkpoints for any actively running
> tasks to the estimated start delay as reported to the server. Slightly
> better would be to subtract the wall time since the last checkpoint while
> determining this.
>
> Addendum: For all new projects, assume some fairly long time between
> checkpoints until noted otherwise, say 2 days. The default would be set
> to the actual value for this app version / project as soon as the first
> checkpoint was reached.
>
> For all new app versions, defer to the project value.
>
> Item #1 is required for both #2 and #3 to work.
>
> Item #2 fixes the client so that tasks that are already on the host will
> start in EDF earlier if needed, based on a long duration between checkpoints
> for any task on the system.
>
> Item #3 fixes work requests so that tasks from very short deadline projects
> will not be sent to a system that is busy with all tasks with long
> durations between checkpoints.
>
> jm7
>
> David Anderson <[email protected]>
> Sent by: <boinc_alpha-boun[email protected]>
> 03/03/2010 06:04 PM
> To: [email protected]
> cc: BOINC Alpha list <[email protected]>
> Subject: Re: [boinc_alpha] 6.10.35 failure to start a task on time to meet
> deadline, and no start of task even after deadline.
>
> Dealing effectively with long jobs that don't checkpoint
> will require more than just some client tweaks.
> Maybe we'll take this on at some point, but not now.
> For now, let's encourage projects not to supply such jobs.
> -- David
>
> [email protected] wrote:
>> OK, this is now happening on a third computer (out of 11). That is,
>> almost 30% of my computers are exhibiting the behavior of returning work
>> late because we are neither accounting for time between benchmarks, nor
>> are we preempting when needed in order to meet other deadlines.
>
> _______________________________________________
> boinc_alpha mailing list
> [email protected]
> http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_alpha
> To unsubscribe, visit the above URL and
> (near bottom of page) enter your email address.

_______________________________________________
boinc_dev mailing list
[email protected]
http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
To unsubscribe, visit the above URL and
(near bottom of page) enter your email address.
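jm7's three items and the addendum can be sketched as a small bookkeeping structure. This is a hypothetical illustration of the proposal as written, not code from the BOINC client: `CheckpointTracker`, `record`, `interval`, `effective_deadline`, and `start_delay` are invented names; it assumes the caller passes only active (non-swapped-out) seconds, keeps the longest interval observed as a conservative choice, and omits the refinement about not subtracting the time since the last checkpoint when no other tasks from the same app version exist.

```python
from dataclasses import dataclass, field

# Addendum: assume 2 days between checkpoints until one is actually observed.
DEFAULT_INTERVAL_S = 2 * 24 * 3600.0


@dataclass
class CheckpointTracker:
    # Item 1: longest observed *active* wall time (seconds) between checkpoints.
    # Callers pass only time the task was actually running, not swapped out.
    by_app: dict = field(default_factory=dict)      # (project, app_version) -> s
    by_project: dict = field(default_factory=dict)  # project -> s

    def record(self, project: str, app_version: str, active_s: float) -> None:
        key = (project, app_version)
        # First observation replaces the default; later ones keep the maximum.
        self.by_app[key] = max(self.by_app.get(key, 0.0), active_s)
        self.by_project[project] = max(self.by_project.get(project, 0.0), active_s)

    def interval(self, project: str, app_version: str) -> float:
        # New app versions defer to the project value; new projects get the default.
        if (project, app_version) in self.by_app:
            return self.by_app[(project, app_version)]
        return self.by_project.get(project, DEFAULT_INTERVAL_S)

    def effective_deadline(self, deadline: float, pending: list) -> float:
        # Item 2: subtract the longest interval among (project, app_version)
        # pairs that still have unfinished work on this host.
        if not pending:
            return deadline
        return deadline - max(self.interval(p, a) for p, a in pending)

    def start_delay(self, base_delay: float, running: list) -> float:
        # Item 3: add the shortest remaining time until some running task
        # reaches its next checkpoint (interval minus time since last checkpoint).
        if not running:
            return base_delay
        return base_delay + min(
            max(0.0, self.interval(p, a) - since) for p, a, since in running
        )
```

With this structure, a project with no recorded checkpoints pulls every pending task's effective deadline 2 days earlier, which is the conservative behavior the addendum asks for until the first real checkpoint is seen.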
