On Thu, Mar 06, 2014 at 07:47:22PM -0800, Ralph Castain wrote:
> >>>>> Sorry for the delay - yes, that looks like the right direction. I would 
> >>>>> suggest doing it via the current state machine, though, by simply 
> >>>>> defining another job or proc state in orte/mca/plm/plm_types.h, and 
> >>>>> then registering a callback function using the 
> >>>>> orte_state.add_job[proc]_state(state, function to be called, 
> >>>>> ORTE_ERR_PRI). Then you can activate it by calling 
> >>>>> ORTE_ACTIVATE_JOB[PROC]_STATE(NULL, state) and it will be handled in 
> >>>>> the proper order.
> >>>> 
> >>>> What is a job/proc in the Open MPI context?
> >>> 
> >>> A "job" is the entire application, while a "proc" is just one process in 
> >>> that application. In this case you could use either one as you are 
> >>> checkpointing the entire job, but all this activity is occurring inside 
> >>> each proc. So I'd suggest defining it as a proc state since it only 
> >>> really involves local actions.
> >>> 
> >>> If you like, I can define the required code in the trunk and let you fill 
> >>> in the event functionality.
> >> 
> >> That would be great.
> > 
> > Thanks for your changes. When using --with-ft there are a few compiler
> > errors which I tried to fix with the following patch:
> > 
> > https://lisas.de/git/?p=open-mpi.git;a=commitdiff;h=71521789ef9d248a7eef53030d2ec5de900faa4c
> 
> That looks okay, with the only caveat being that you wouldn't ordinarily pass 
> the state_caddy_t into a function. It's just there to pass along the job etc 
> in case the callback function needs to reference something. In this case, I 
> can't think of anything the FT event function would need to know - you just 
> want it to quiet all messaging.

I need to pass the type of state to the ft_event() functions:

enum opal_crs_state_type_t {
    OPAL_CRS_NONE        = 0,
    OPAL_CRS_CHECKPOINT  = 1,
    OPAL_CRS_RESTART_PRE = 2,
    OPAL_CRS_RESTART     = 3, /* RESTART_POST */
    /* ... */
};

so an int is all I need. So I probably need to encode it into *cbdata. Do I
just use an int directly in *cbdata or should it be part of a struct?

                Adrian
