I should have read this email before answering the other.

So opal_crs.checkpoint() is used both to checkpoint the process and to
restart it? I would have expected opal_crs.restart() to be used for
restart. I am confused: looking at CRS/BLCR, checkpoint() seems to only
checkpoint and restart() seems to only restart. The comment in
opal/mca/crs/crs.h says the same as you do.
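
One possible reading of the explanation quoted below is that the
checkpoint call behaves like a function that can be "emerged from"
twice: once in the original process (Continue) and once more when a
saved image is resumed (Restart). The sketch below only illustrates
that idea with setjmp()/longjmp() as an analogy. It is not how BLCR or
the CRS framework is implemented, and every demo_* / DEMO_* name is
hypothetical, not taken from opal/mca/crs/crs.h.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical stand-ins for the OPAL_CRS_* values discussed in this
 * thread; these are NOT the real definitions from opal/mca/crs/crs.h. */
typedef enum {
    DEMO_CRS_ERROR,
    DEMO_CRS_CONTINUE,
    DEMO_CRS_RESTART
} demo_crs_state_t;

/* Plays the role of the checkpoint image (assumption: in-memory only). */
static jmp_buf demo_image;

/* Hypothetical "restart" entry point: it does not return to its caller.
 * Execution resumes at the point where the image was captured. */
static void demo_restart(void)
{
    longjmp(demo_image, 1);
}

/* Captures the state once, then records how execution emerged from the
 * capture point each time. Returns the number of times we emerged. */
static int demo_run(demo_crs_state_t seen[2])
{
    volatile int emerged = 0;   /* volatile: modified between setjmp and longjmp */
    demo_crs_state_t state;

    assert(seen != NULL);

    if (setjmp(demo_image) == 0) {
        /* Direct return: still the original execution. */
        state = DEMO_CRS_CONTINUE;
    } else {
        /* Returned here via demo_restart(): resumed from the image. */
        state = DEMO_CRS_RESTART;
    }

    seen[emerged] = state;
    emerged++;

    if (state == DEMO_CRS_CONTINUE) {
        /* Simulate a later restart from the captured image. */
        demo_restart();
    }
    return emerged;
}
```

Calling demo_run() with a two-element array fills it with
DEMO_CRS_CONTINUE followed by DEMO_CRS_RESTART. Under this reading, a
separate restart entry point would only trigger the resume, while the
resulting state is still reported at the point where the checkpoint was
taken.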


On Mon, Feb 17, 2014 at 03:43:08PM -0600, Josh Hursey wrote:
> These values indicate the current state of the checkpointing lifecycle. In
> particular CONTINUE/RESTART are set by the checkpointer in the CRS (all
> others are used by the INC mechanism). In the opal_crs.checkpoint() call
> the checkpointer will capture the program state and it is possible to
> emerge from this function in one of two scenarios. Either we are continuing
> execution in the original process (Continue state), or we are resuming
> execution from a checkpointed state (Restart state).
> 
> So if the checkpoint was successful, and you are not restarting the process
> then you want OPAL_CRS_CONTINUE.
> 
> If the process is being restarted from a checkpoint file, then we should
> emerge from this function setting the state to OPAL_CRS_RESTART.
> 
> The OPAL_CR_CHECKPOINT state is used in the INC mechanism to notify all of
> the components to prepare for checkpoint (we probably should have called it
> OPAL_CR_PREPARE_FOR_CKPT). So not really used by the CRS mechanisms at all.
> You can see it used in the opal_cr_inc_core_prep() function in
> opal/runtime/opal_cr.c
> 
> -- Josh
> 
> 
> 
> On Mon, Feb 17, 2014 at 9:28 AM, Adrian Reber <adr...@lisas.de> wrote:
> 
> > This is probably for Josh. What is the meaning of the OPAL_CRS_* enums?
> >
> > They are probably used to communicate the state of the CRS modules.
> > OPAL_CRS_ERROR seems to be used in case an error happened. What is the
> > CRS module supposed to set this to if the checkpoint was successful:
> >
> > OPAL_CRS_CONTINUE or OPAL_CRS_CHECKPOINT?
> >
> >                 Adrian
> > _______________________________________________
> > devel mailing list
> > de...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >
> 
> 
> 
> -- 
> Joshua Hursey
> Assistant Professor of Computer Science
> University of Wisconsin-La Crosse
> http://cs.uwlax.edu/~jjhursey

