On Mar 6, 2014, at 1:02 PM, Adrian Reber <adr...@lisas.de> wrote:

> On Tue, Feb 18, 2014 at 03:46:58PM +0100, Adrian Reber wrote:
>>>>>> I tried to implement something like what you described. It is not yet
>>>>>> event-driven, but before continuing I wanted to get some feedback on
>>>>>> whether it is at least the right start:
>>>>>> 
>>>>>> https://lisas.de/git/?p=open-mpi.git;a=commitdiff;h=5048a9cec2cd0bc4867eadfd7e48412b73267706
>>>>>> 
>>>>>> I looked at the other ORTE_OOB_* macros and tried to model my
>>>>>> functionality loosely on what I saw there. Right now it is still a
>>>>>> simple function that just tries to call ft_event() on all OOB
>>>>>> components. Does this look right so far?
>>>>> 
>>>>> Sorry for the delay - yes, that looks like the right direction. I would 
>>>>> suggest doing it via the current state machine, though, by simply 
>>>>> defining another job or proc state in orte/mca/plm/plm_types.h, and then 
>>>>> registering a callback function using the 
>>>>> orte_state.add_job[proc]_state(state, function to be called, 
>>>>> ORTE_ERR_PRI). Then you can activate it by calling 
>>>>> ORTE_ACTIVATE_JOB[PROC]_STATE(NULL, state) and it will be handled in the 
>>>>> proper order.
>>>> 
>>>> What is a job/proc in the Open MPI context?
>>> 
>>> A "job" is the entire application, while a "proc" is just one process in 
>>> that application. In this case you could use either one as you are 
>>> checkpointing the entire job, but all this activity is occurring inside 
>>> each proc. So I'd suggest defining it as a proc state since it only really 
>>> involves local actions.
>>> 
>>> If you like, I can define the required code in the trunk and let you fill 
>>> in the event functionality.
>> 
>> That would be great.
> 
> Thanks for your changes. When using --with-ft there are a few compiler
> errors, which I tried to fix with the following patch:
> 
> https://lisas.de/git/?p=open-mpi.git;a=commitdiff;h=71521789ef9d248a7eef53030d2ec5de900faa4c

That looks okay, with the only caveat being that you wouldn't ordinarily pass 
the state_caddy_t into a function. It's just there to pass along the job, etc., 
in case the callback function needs to reference something. In this case, I 
can't think of anything the FT event function would need to know; you just want 
it to quiet all messaging.
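For readers following the thread, here is a minimal, self-contained C sketch of the pattern discussed: a new proc state, a callback registered for it, and an activation step that hands the callback a caddy. Every `demo_*` name is hypothetical and only mimics the shape of the orte_state.add_job[proc]_state() / ORTE_ACTIVATE_JOB[PROC]_STATE() calls mentioned above; the real signatures should be checked in the trunk. It also illustrates the caveat: the state callback receives the caddy because the state machine always passes one, but the FT function it calls takes no arguments, since quieting messaging needs no job or proc context.

```c
#include <assert.h>
#include <stddef.h>

/* HYPOTHETICAL stand-ins for the ORTE pieces discussed in the thread;
 * none of these names are the real ORTE API. */
typedef int demo_proc_state_t;

/* A new proc state, analogous to adding one in orte/mca/plm/plm_types.h.
 * The value is arbitrary for this sketch. */
#define DEMO_PROC_STATE_FT_QUIESCE 100

/* Data handed to every callback (the "caddy"): context such as which
 * proc the state change concerns. */
typedef struct {
    demo_proc_state_t state;
    const char *proc;
} demo_state_caddy_t;

/* Callback shape modeled on an event-library handler: fd, flags, data. */
typedef void (*demo_state_cbfunc_t)(int fd, short args, void *cbdata);

#define DEMO_MAX_STATES 8

static struct {
    demo_proc_state_t state;
    demo_state_cbfunc_t cbfunc;
    int priority;
} demo_registry[DEMO_MAX_STATES];
static size_t demo_num_registered;

/* Register a callback for a proc state. The priority is recorded, but this
 * synchronous sketch does not use it for ordering. */
int demo_add_proc_state(demo_proc_state_t state, demo_state_cbfunc_t cbfunc,
                        int priority)
{
    if (demo_num_registered >= DEMO_MAX_STATES) {
        return -1;
    }
    demo_registry[demo_num_registered].state = state;
    demo_registry[demo_num_registered].cbfunc = cbfunc;
    demo_registry[demo_num_registered].priority = priority;
    demo_num_registered++;
    return 0;
}

/* Activate a state: build a caddy and invoke the registered callback
 * immediately (an event-driven version would queue it instead).
 * Returns -1 if no callback is registered for the state. */
int demo_activate_proc_state(const char *proc, demo_proc_state_t state)
{
    for (size_t i = 0; i < demo_num_registered; i++) {
        if (demo_registry[i].state == state) {
            demo_state_caddy_t caddy = { state, proc };
            demo_registry[i].cbfunc(-1, 0, &caddy);
            return 0;
        }
    }
    return -1;
}

/* Quiesced flag per hypothetical OOB component; set to 1 once quieted. */
#define DEMO_NUM_OOB 2
int demo_oob_quiesced[DEMO_NUM_OOB];

/* The FT function takes no caddy: it only needs to quiet all messaging. */
void demo_ft_quiesce_all_oob(void)
{
    for (int i = 0; i < DEMO_NUM_OOB; i++) {
        /* Stands in for calling each OOB component's ft_event(). */
        demo_oob_quiesced[i] = 1;
    }
}

/* Thin state callback: accepts the caddy but does not forward it. */
void demo_ft_state_cb(int fd, short args, void *cbdata)
{
    demo_state_caddy_t *caddy = (demo_state_caddy_t *)cbdata;
    (void)fd;
    (void)args;
    assert(caddy->state == DEMO_PROC_STATE_FT_QUIESCE);
    demo_ft_quiesce_all_oob();
}
```

In this sketch registration would happen once at startup, e.g. `demo_add_proc_state(DEMO_PROC_STATE_FT_QUIESCE, demo_ft_state_cb, 0)`, and `demo_activate_proc_state("proc0", DEMO_PROC_STATE_FT_QUIESCE)` would be called when a checkpoint is requested. Keeping the caddy out of the FT function leaves that function callable from code that has no state-machine data at hand.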


> 
>               Adrian
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/03/14309.php
