On Mar 6, 2014, at 1:02 PM, Adrian Reber <adr...@lisas.de> wrote:

> On Tue, Feb 18, 2014 at 03:46:58PM +0100, Adrian Reber wrote:
>>>>>> I tried to implement something like you described. It is not yet event
>>>>>> driven, but before continuing I wanted to get some feedback if it is at
>>>>>> least the right start:
>>>>>>
>>>>>> https://lisas.de/git/?p=open-mpi.git;a=commitdiff;h=5048a9cec2cd0bc4867eadfd7e48412b73267706
>>>>>>
>>>>>> I looked at the other ORTE_OOB_* macros and tried to model my
>>>>>> functionality a bit after what I have seen there. Right now it is still
>>>>>> a simple function which just tries to call ft_event() on all oob
>>>>>> components. Does this look right so far?
>>>>>
>>>>> Sorry for delay - yes, that looks like the right direction. I would
>>>>> suggest doing it via the current state machine, though, by simply
>>>>> defining another job or proc state in orte/mca/plm/plm_types.h, and then
>>>>> registering a callback function using the
>>>>> orte_state.add_job[proc]_state(state, function to be called,
>>>>> ORTE_ERR_PRI). Then you can activate it by calling
>>>>> ORTE_ACTIVATE_JOB[PROC]_STATE(NULL, state) and it will be handled in the
>>>>> proper order.
>>>>
>>>> What is a job/proc in the Open MPI context.
>>>
>>> A "job" is the entire application, while a "proc" is just one process in
>>> that application. In this case you could use either one as you are
>>> checkpointing the entire job, but all this activity is occurring inside
>>> each proc. So I'd suggest defining it as a proc state since it only really
>>> involves local actions.
>>>
>>> If you like, I can define the required code in the trunk and let you fill
>>> in the event functionality.
>>
>> That would be great.
>
> Thanks for your changes. When using --with-ft there are a few compiler
> errors which I tried to fix with following patch:
>
> https://lisas.de/git/?p=open-mpi.git;a=commitdiff;h=71521789ef9d248a7eef53030d2ec5de900faa4c

That looks okay, with one caveat: you wouldn't ordinarily pass the
state_caddy_t into a function. It is only there to carry the job etc. to the
callback in case the callback needs to reference something. In this case, I
can't think of anything the FT event function would need to know; you just
want it to quiet all messaging.

>
> Adrian