John:
Can you put the following into a Trac ticket?
Thanks -- David

On 17-Jul-2012 6:30 AM, [email protected] wrote:
> There are two cases where we do not wait for a checkpoint when
> time-slicing.
>
> 1) When a task is being pre-empted by a high priority task.
> 2) When a task is being swapped out so that a multi-threaded task can
> be swapped in. (OK, we wait for one of the tasks we are pre-empting.)
>
> There are some projects that can write a checkpoint file every few seconds
> to a minute or two. If BOINC set the "OK to checkpoint" flag and then
> waited for a short time, we might get more projects to checkpoint. How much
> time might depend on which event is occurring. The rule of thumb is that
> UI lag of more than 7 seconds is usually unacceptable unless there is some
> warning and possibly some way out. Some of the operations have to happen
> instantly (hibernation). For others there can be a delay.
>
> I would propose that the following wait times might be appropriate:
> 1) When swapping out for a high priority task. Up to 1 minute.
> 2) When swapping out for a multi-threaded task. Up to the checkpoint
> interval. Suspend each task as it checkpoints.
> 3) Hibernation. Instant.
> 4) Shutting down. I believe we currently wait for 30 seconds. If so, we
> could set the checkpoint flag at the beginning of the 30 seconds.
> 5) User suspending activity. 7 seconds.
>
> These will catch a different number of processes in each case.
>
> jm7
>
> From:    David Anderson <[email protected]>
> To:      <[email protected]>
> Date:    07/16/2012 04:55 PM
> Subject: Re: [boinc_dev] [boinc_alpha] Tasks resume with same fraction done
> Sent by: <[email protected]>
>
> Many (most?) applications can checkpoint only at specific moments
> (e.g. completion of an outer loop) that may occur only every few minutes.
>
> When a job is ready to be preempted because of time-slicing,
> the scheduler waits until it checkpoints.
> So that's not an issue.
>
> The other cases are preempting because the user suspended activity,
> the client is exiting, or the system is hibernating.
> We could add a mechanism to request apps to checkpoint then,
> but it would benefit only those apps that can checkpoint at any time.
>
> -- David
>
> On 16-Jul-2012 12:45 PM, Jon Sonntag wrote:
>> Shouldn't time_to_checkpoint return true prior to BOINC suspending the task?
>> Then, only after checkpoint_completed is set, actually suspend the task?
>> Or, is it up to the application to do a checkpoint by checking the
>> boinc_status and doing a checkpoint even if not asked to do so when BOINC is
>> suspending the task? Otherwise, a couple of minutes of work would be lost on
>> average every time it suspends. (5 minutes per checkpoint and switching
>> between 2 projects every hour = 2.5 minutes lost on average. For long
>> tasks, that could add up to a lot of time.) If it is the developer's job to
>> checkpoint on suspend, I would suggest adding that code to the sample apps,
>> as startup projects often use the uppercase sample apps as a template for
>> their own code.
>>
>> Note: This topic started on BOINC_ALPHA, but I felt BOINC_DEV was a more
>> appropriate place to get more clarification and/or expand the discussion.
>>
>> Jon Sonntag
>> [email protected]
>>
>>> -----Original Message-----
>>> From: [email protected] [mailto:boinc_alpha-[email protected]]
>>> On Behalf Of David Anderson
>>> Sent: Monday, July 16, 2012 2:28 PM
>>> To: [email protected]
>>> Subject: Re: [boinc_alpha] Tasks resume with same fraction done
>>>
>>> Tasks resume from wherever they last checkpointed.
>>> This is true whether you install a new version, or simply stop/start the client.
>>>
>>> -- David
>>>
>>> On 16-Jul-2012 7:21 AM, Jan Pillár wrote:
>>>> Hi,
>>>>
>>>> I have a question about installing a new version of BOINC over an older
>>>> one - tasks should resume with the same fraction done. Should the tasks
>>>> start with exactly the same fraction done, or is there an acceptable
>>>> tolerance?
>>>>
>>>> For example, before installation of the new version my tasks were at 34 %,
>>>> after installation they were at 31 %. Is that OK? Does it have
>>>> anything to do with "Task checkpoint to disk" settings?
>>>>
>>>> Kind Regards,
>>>>
>>>> Jan
>>>> _______________________________________________
>>>> boinc_alpha mailing list
>>>> [email protected]
>>>> http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_alpha
>>>> To unsubscribe, visit the above URL and
>>>> (near bottom of page) enter your email address.
>>
>> _______________________________________________
>> boinc_dev mailing list
>> [email protected]
>> http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
>> To unsubscribe, visit the above URL and
>> (near bottom of page) enter your email address.
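The wait times proposed in the thread, together with Jon Sonntag's lost-work estimate, can be summarized in a small sketch. This is illustrative only: the names `SuspendReason`, `max_checkpoint_wait`, `checkpoint_caught`, and `expected_loss_minutes` are hypothetical and are not part of the BOINC client or API. The 30-second shutdown figure is the one jm7 believed to be current, and the loss estimate assumes a suspension falls at a uniformly random point within the checkpoint interval.

```python
from enum import Enum


class SuspendReason(Enum):
    """Events from the proposal that can stop a running task (hypothetical names)."""
    HIGH_PRIORITY_PREEMPT = 1
    MULTITHREAD_SWAP = 2
    HIBERNATE = 3
    SHUTDOWN = 4
    USER_SUSPEND = 5


def max_checkpoint_wait(reason, checkpoint_interval_s):
    """Longest time, in seconds, to wait for a checkpoint under the proposal."""
    waits = {
        SuspendReason.HIGH_PRIORITY_PREEMPT: 60.0,               # up to 1 minute
        SuspendReason.MULTITHREAD_SWAP: checkpoint_interval_s,   # up to the interval
        SuspendReason.HIBERNATE: 0.0,                            # must be instant
        SuspendReason.SHUTDOWN: 30.0,                            # believed current wait
        SuspendReason.USER_SUSPEND: 7.0,                         # UI lag rule of thumb
    }
    return waits[reason]


def checkpoint_caught(reason, seconds_to_next_checkpoint, checkpoint_interval_s):
    """True if the task's next checkpoint falls inside the allowed wait window."""
    return seconds_to_next_checkpoint <= max_checkpoint_wait(
        reason, checkpoint_interval_s
    )


def expected_loss_minutes(checkpoint_interval_min):
    """Average work lost per suspension when nothing waits for a checkpoint.

    Assumes the suspension time is uniform over the checkpoint interval,
    so the expected loss is half the interval.
    """
    return checkpoint_interval_min / 2.0


if __name__ == "__main__":
    # A task 45 s away from its next checkpoint, checkpointing every 5 minutes.
    for reason in SuspendReason:
        print(reason.name, checkpoint_caught(reason, 45.0, 300.0))
    print(expected_loss_minutes(5.0))
```

With a 5-minute checkpoint interval, `expected_loss_minutes(5.0)` gives the 2.5 minutes quoted in the thread. The lookup also shows why the proposal catches a different number of tasks per event: a task 45 seconds from its next checkpoint would be caught on a high-priority pre-emption but not on a user suspend.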
