These are more than tweaks.
In any case, like I said, properly handling 24-hour jobs
that don't checkpoint requires more than client changes:
we'd need a mechanism to avoid sending such jobs
to clients that will never finish them
(e.g. because they shut down every night).
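A toy model makes the point about nightly shutdowns concrete. The function below is hypothetical and not part of BOINC. It assumes the host runs the job for a fixed number of hours each day and does nothing else. It also assumes that a restart resumes from the last checkpoint, or from scratch if the job never checkpoints. It returns how many daily sessions the job needs, or None if the job can never finish.

```python
import math
from typing import Optional


def sessions_to_finish(runtime_h: float, daily_uptime_h: float,
                       checkpoint_interval_h: Optional[float] = None) -> Optional[int]:
    """Hypothetical model: daily sessions needed to complete a job, or None if never.

    Assumes a fixed uptime window per day and that, after each shutdown, the
    job restarts from its last checkpoint (or from zero if it never checkpoints).
    """
    if runtime_h <= daily_uptime_h:
        return 1  # fits in a single session, checkpointing is irrelevant
    if checkpoint_interval_h is None:
        return None  # all progress is lost at every shutdown
    # Progress that survives one session: the whole checkpoints reached in it.
    saved_per_session = math.floor(daily_uptime_h / checkpoint_interval_h) * checkpoint_interval_h
    if saved_per_session <= 0:
        return None  # the job never reaches a checkpoint before shutdown
    progress = 0.0
    sessions = 0
    while True:
        sessions += 1
        if progress + daily_uptime_h >= runtime_h:
            return sessions  # finishes before this session's shutdown
        progress += saved_per_session
```

With a 24-hour job and 8 hours of uptime per day, `sessions_to_finish(24, 8)` returns None, while a checkpoint every hour brings it down to 3 sessions.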

So for now: BOINC doesn't handle long jobs that don't checkpoint.
The project in question can contact me for advice on
how to add checkpointing.

-- David

[email protected] wrote:
> I believe that some client tweaks can mostly, if not entirely, fix the
> problem.  The project in question is, I believe, using the wrapper, and the
> wrapped executable does not do its own checkpoints.  This makes it
> extremely difficult for them to supply checkpoints.
> 
> What would not be fixed by the following (what case am I missing?):
> 
> 1)  Track the active wall time between checkpoints (time spent actually
> running, not time spent swapped out).
> 2)  From the computation deadline, subtract the longest wall time between
> checkpoints of any project (or, preferably, app version) that has
> unfinished work assigned to the host.  Do not subtract the wall time
> since the last checkpoint unless there are no other tasks from this app
> version / project.
> 3)  Add the shortest wall time between checkpoints of any actively running
> task to the estimated start delay reported to the server.  Slightly
> better would be to subtract the wall time since the last checkpoint when
> determining this.
> 
> Addendum:  For all new projects, assume a fairly long time between
> checkpoints (say, 2 days) until a measured value is available.  The
> default would be replaced by the actual value for this app version /
> project as soon as the first checkpoint was reached.
> 
> For all new app versions defer to the project value.
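The addendum's fallback order (app version, then project, then a 2-day default) might look like the sketch below. The function names and the dictionaries holding measured intervals are assumptions. Letting the first measured app version also set the project value is one possible reading of the addendum.

```python
from typing import Dict

# "Say 2 days" from the proposal, expressed in seconds.
NEW_PROJECT_DEFAULT_S = 2 * 24 * 3600


def checkpoint_interval_estimate(app_version: str, project: str,
                                 by_app_version: Dict[str, float],
                                 by_project: Dict[str, float]) -> float:
    """Hypothetical lookup: measured app version value, else project value, else default."""
    if app_version in by_app_version:
        return by_app_version[app_version]
    if project in by_project:
        return by_project[project]  # new app versions defer to the project value
    return NEW_PROJECT_DEFAULT_S


def record_measured_interval(app_version: str, project: str, interval_s: float,
                             by_app_version: Dict[str, float],
                             by_project: Dict[str, float]) -> None:
    """Hypothetical: replace the default once a real checkpoint interval is measured."""
    by_app_version[app_version] = interval_s
    by_project.setdefault(project, interval_s)  # assumption: first measurement sets the project value
```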
> 
> Item #1 is required for both #2 and #3 to work.
> 
> Item #2 fixes the client so that tasks already on the host will enter
> EDF (earliest deadline first) mode earlier when needed, based on a long
> duration between checkpoints for any task on the system.
> 
> Item #3 fixes work requests so that tasks from projects with very short
> deadlines will not be sent to a host that is busy only with tasks that
> have long durations between checkpoints.
> 
> jm7
> 
> 
> From:    David Anderson <[email protected]>
> Sent by: [email protected]
> To:      [email protected]
> cc:      BOINC Alpha list <[email protected]>
> Date:    03/03/2010 06:04 PM
> Subject: Re: [boinc_alpha] 6.10.35 failure to start a task on time to
>          meet deadline, and no start of task even after deadline.
>
> Dealing effectively with long jobs that don't checkpoint
> will require more than just some client tweaks.
> Maybe we'll take this on at some point, but not now.
> For now, let's encourage projects not to supply such jobs.
> -- David
> 
> [email protected] wrote:
>> OK, this is now happening on a third computer (out of 11).  That is almost
>> 30% of my computers exhibiting the behavior of returning work late
>> because we are neither accounting for time between benchmarks, nor are we
>> preempting when needed in order to meet other deadlines.
> 
> _______________________________________________
> boinc_alpha mailing list
> [email protected]
> http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_alpha
> To unsubscribe, visit the above URL and
> (near bottom of page) enter your email address.

_______________________________________________
boinc_dev mailing list
[email protected]
http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
To unsubscribe, visit the above URL and
(near bottom of page) enter your email address.
