I should have read this email before answering the other one. So opal_crs.checkpoint() is used to checkpoint the process as well as to restart it? I would have expected opal_crs.restart() to be used for restart. I am confused. Looking at CRS/BLCR, checkpoint() seems to only checkpoint and restart() seems to only restart. The comment in opal/mca/crs/crs.h says the same as you say.

On Mon, Feb 17, 2014 at 03:43:08PM -0600, Josh Hursey wrote:
> These values indicate the current state of the checkpointing lifecycle. In
> particular CONTINUE/RESTART are set by the checkpointer in the CRS (all
> others are used by the INC mechanism). In the opal_crs.checkpoint() call
> the checkpointer will capture the program state and it is possible to
> emerge from this function in one of two scenarios. Either we are continuing
> execution in the original process (Continue state), or we are resuming
> execution from a checkpointed state (Restart state).
>
> So if the checkpoint was successful, and you are not restarting the process
> then you want OPAL_CRS_CONTINUE.
>
> If the process is being restarted from a checkpoint file, then we should
> emerge from this function setting the state to OPAL_CRS_RESTART.
>
> The OPAL_CR_CHECKPOINT state is used in the INC mechanism to notify all of
> the components to prepare for checkpoint (we probably should have called it
> OPAL_CR_PREPARE_FOR_CKPT). So not really used by the CRS mechanisms at all.
> You can see it used in the opal_cr_inc_core_prep() function in
> opal/runtime/opal_cr.c
>
> -- Josh
>
> On Mon, Feb 17, 2014 at 9:28 AM, Adrian Reber <adr...@lisas.de> wrote:
>
> > This is probably for Josh. What is the meaning of the OPAL_CRS_* enums?
> >
> > They are probably used to communicate the state of the CRS modules.
> > OPAL_CRS_ERROR seems to be used in case an error happened. What is the
> > CRS module supposed to set this to if the checkpoint was successful.
> >
> > OPAL_CRS_CONTINUE or OPAL_CRS_CHECKPOINT?
> >
> > Adrian
> > _______________________________________________
> > devel mailing list
> > de...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
> --
> Joshua Hursey
> Assistant Professor of Computer Science
> University of Wisconsin-La Crosse
> http://cs.uwlax.edu/~jjhursey
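
The two ways of emerging from checkpoint() described in the quoted message can be sketched in plain C. This is only a toy model under stated assumptions: every toy_* name is hypothetical and is not the Open MPI API (the real enum lives in opal/mca/crs/crs.h), and the woke_from_image flag merely stands in for whatever mechanism tells a real checkpointer that it is resuming from a checkpoint file.

```c
#include <stdio.h>

/* Hypothetical stand-ins for the OPAL_CRS_* values discussed above. */
typedef enum {
    TOY_CRS_CONTINUE, /* original process carries on after the checkpoint */
    TOY_CRS_RESTART,  /* process is resuming from a checkpointed state */
    TOY_CRS_ERROR     /* something went wrong */
} toy_crs_state_t;

/*
 * Toy checkpoint call. A real checkpointer would capture the program
 * state here. The caller emerges from this one function in one of two
 * scenarios, and the output state says which one it was.
 * Returns 0 on success, -1 on a bad argument.
 */
static int toy_checkpoint(int woke_from_image, toy_crs_state_t *state)
{
    if (state == NULL) {
        return -1;
    }
    *state = woke_from_image ? TOY_CRS_RESTART : TOY_CRS_CONTINUE;
    return 0;
}

/* Map a state to a printable name. */
static const char *toy_state_name(toy_crs_state_t state)
{
    switch (state) {
    case TOY_CRS_CONTINUE: return "CONTINUE";
    case TOY_CRS_RESTART:  return "RESTART";
    default:               return "ERROR";
    }
}

/* Walk through both scenarios and print the state reported by each. */
static void toy_demo(void)
{
    toy_crs_state_t state = TOY_CRS_ERROR;

    if (toy_checkpoint(0, &state) == 0) {
        printf("original process: %s\n", toy_state_name(state));
    }
    if (toy_checkpoint(1, &state) == 0) {
        printf("restarted process: %s\n", toy_state_name(state));
    }
}
```

Calling toy_demo() from a main() exercises both paths. The point of the sketch is that the same call site handles both outcomes, which is why the state has to be reported back to the caller.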