Re: [OMPI devel] RFC: Merge tmp fault recovery branch into trunk

2010-02-25 Thread Ralph Castain
Hi George et al

I have begun documenting the RecoS operation on the OMPI wiki:

https://svn.open-mpi.org/trac/ompi/wiki/RecoS

I'll continue to work on this over the next few days by adding a section 
explaining what was changed outside of the new framework to make it all work. 
In addition, I am revising the recos.h API documentation.

Hope to have all that done over the weekend.


On Feb 23, 2010, at 4:00 PM, Ralph Castain wrote:

> 
> On Feb 23, 2010, at 3:32 PM, George Bosilca wrote:
> 
>> Ralph, Josh,
>> 
>> We have some comments about the API of the new framework, mostly 
>> clarifications needed to better understand how this new framework is 
>> supposed to be used. And a request for a deadline extension, to delay the 
>> code merge from the Recos branch in the trunk by a week.
>> 
>> We have our own FT branch, with a totally different approach than what is 
>> described in your RFC. Unfortunately, it diverged from the trunk about a 
>> year ago, and merging back has proven to be quite a difficult task. Some of 
>> the functionality in the Recos framework is clearly beneficial for what we 
>> did, and has the potential to facilitate the porting of most of the features 
>> from our branch back into the trunk. We would like the deadline extension in order 
>> to deeply analyze the impact of the Recos framework on our work, and see how 
>> we can fit everything together back in the trunk of Open MPI.
> 
> No problem with the extension - feel free to suggest modifications to make 
> the merge easier. This is by no means cast in stone, but rather a starting 
> point.
> 
>> 
>> Here are some comments about the code:
>> 
>> 1. The documentation in recos.h is not very clear. Most of the functions use 
>> only IN arguments, and are not supposed to return any values. We don't see 
>> how the functions are supposed to be used, and what is supposed to be their 
>> impact on the ORTE framework data.
> 
> I'll try to clarify the comments tonight (I know Josh is occupied right now). 
> The recos APIs are called from two locations:
> 
> 1. The errmgr calls recos whenever it receives a report of an aborted process 
> (via the errmgr.proc_aborted API). The idea was for recos to determine what 
> (if anything) to do about the failed process. 
> 
> 2. The rmaps modules can call the recos "suggest_map_targets" API to get a 
> list of suggested nodes for the process that is to be restarted. At the 
> moment, only the resilient mapper module does this. However, Josh and I are 
> looking at reorganizing some functionality currently in that mapper module 
> and making all of the existing mappers be "resilient".
> 
> So basically, the recos modules determine the recovery procedure and execute 
> it. For example, in the "orcm" module, we actually update the various 
> proc/job objects to prep them for restart and call plm.spawn from within that 
> module. If instead you use the ignore module, it falls through to the recos 
> base functions which call "abort" to kill the job. Again, the action is taken 
> local to recos, so nothing need be returned.
> 
> The functions generally don't return values (other than success/error) 
> because we couldn't think of anything useful to return to the errmgr. 
> Whatever recos does about an aborted proc, the errmgr doesn't do anything 
> further - if you look in that code, you'll see that if recos is enabled, all 
> the errmgr does is call recos and return.
> 
> Again, this can be changed if desired.
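To make the call path just described concrete, here is a minimal sketch of the errmgr-to-RecoS hand-off. Only the flow follows the description above and the process_fault typedef quoted later in this thread; the wrapper signature, the recos_enabled flag, the orte_recos module handle, and the default_abort fallback are hypothetical placeholders, not the actual trunk code:

    /* Illustrative sketch only.  The errmgr hands the aborted-process report
     * to the active RecoS module and then simply returns, which is why
     * nothing beyond success/error needs to come back. */
    int errmgr_proc_aborted(orte_job_t *jdata, orte_process_name_t *proc,
                            orte_proc_state_t state)
    {
        int stack_state = 0;

        if (recos_enabled) {
            /* RecoS decides what (if anything) to do about the failed proc:
             * e.g. "orcm" preps the proc/job objects and calls plm.spawn,
             * while "ignore" falls through to the base abort behavior. */
            return orte_recos.process_fault(jdata, proc, state, &stack_state);
        }

        /* without RecoS, keep the pre-existing abort-the-job behavior */
        return default_abort(jdata, proc);   /* hypothetical fallback */
    }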
> 
>> 
>> 2. Why do we have all the char***? Why are they only declared as IN 
>> arguments?
> 
> I take it you mean in the predicted fault API? I believe Josh was including 
> that strictly as a placeholder. As you undoubtedly recall, I removed the fddp 
> framework from the trunk (devel continues off-line), so Josh wasn't sure what 
> I might want to input here. If you look at the modules themselves, you will 
> see the implementation is essentially empty at this time.
> 
> We had discussed simply removing that API for now until we determined if/when 
> fault prediction would return to the OMPI trunk. It was kind of a tossup - so 
> we left it for now. Could just as easily be removed until a later date - 
> either way is fine with us.
> 
>> 
>> 3. The orte_recos_base_process_fault_fn_t function uses the node_list as an 
>> IN/OUT argument. Why? If the list is modified, then we have a scalability 
>> problem, as the list will have to be rebuilt before each call.
> 
> Looking...looking...hmm.
> 
> typedef int (*orte_recos_base_process_fault_fn_t)
> (orte_job_t *jdata, orte_process_name_t *proc_name, orte_proc_state_t 
> state, int *stack_state);
> 
> There is no node list, or list of any type, going in or out of that function. 
> I suspect you meant the one below it:
> 
> typedef int (*orte_recos_base_suggest_map_targets_fn_t)
> (orte_proc_t *proc, orte_node_t *oldnode, opal_list_t *node_list);
> 
> I concur with your concern about scalability here. However, I believe the 
> idea was that we 
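As a concrete illustration of how a mapper might consume the suggest_map_targets API quoted above, here is a minimal sketch. The opal_list calls are the standard OPAL list routines; the orte_recos module handle and the map_proc_to_node() helper are hypothetical names used only for this example:

    /* Illustrative sketch only.  The mapper builds its usual candidate list,
     * lets the active RecoS module reorder/prune it (e.g. to steer the proc
     * away from the node that just failed), then takes the first suggestion. */
    int remap_failed_proc(orte_proc_t *proc, orte_node_t *oldnode,
                          opal_list_t *candidate_nodes)
    {
        orte_node_t *target;
        int rc;

        rc = orte_recos.suggest_map_targets(proc, oldnode, candidate_nodes);
        if (ORTE_SUCCESS != rc || opal_list_is_empty(candidate_nodes)) {
            return ORTE_ERR_OUT_OF_RESOURCE;   /* nowhere left to restart it */
        }

        target = (orte_node_t *)opal_list_get_first(candidate_nodes);
        return map_proc_to_node(proc, target); /* hypothetical placement helper */
    }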

Re: [OMPI devel] RFC: Merge tmp fault recovery branch into trunk

2010-02-25 Thread Leonardo Fialho
Hi Ralph,

The "composite framework" idea is very interesting. Regarding the schema 
represented by the picture, I didn't understand RecoS' behaviour in a node 
failure situation.

In this case, will mpirun consider the daemon failure as a normal proc failure? 
If so, should mpirun update the global procs state for all jobs 
running under the failed daemon?

Best regards,
Leonardo

On Feb 25, 2010, at 7:05 AM, Ralph Castain wrote:

> Hi George et al
> 
> I have begun documenting the RecoS operation on the OMPI wiki:
> 
> https://svn.open-mpi.org/trac/ompi/wiki/RecoS
> 
> I'll continue to work on this over the next few days by adding a section 
> explaining what was changed outside of the new framework to make it all work. 
> In addition, I am revising the recos.h API documentation.
> 
> Hope to have all that done over the weekend.
> 
> 
> On Feb 23, 2010, at 4:00 PM, Ralph Castain wrote:
> 
>> 
>> On Feb 23, 2010, at 3:32 PM, George Bosilca wrote:
>> 
>>> Ralph, Josh,
>>> 
>>> We have some comments about the API of the new framework, mostly 
>>> clarifications needed to better understand how this new framework is 
>>> supposed to be used. And a request for a deadline extension, to delay the 
>>> code merge from the Recos branch in the trunk by a week.
>>> 
>>> We have our own FT branch, with a totally different approach than what is 
>>> described in your RFC. Unfortunately, it diverged from the trunk about a 
>>> year ago, and merging back has proven to be quite a difficult task. Some of 
>>> the functionality in the Recos framework is clearly beneficial for what we 
>>> did, and has the potential to facilitate the porting of most of the 
>>> features from our branch back into the trunk. We would like the deadline extension 
>>> in order to deeply analyze the impact of the Recos framework on our work, 
>>> and see how we can fit everything together back in the trunk of Open MPI.
>> 
>> No problem with the extension - feel free to suggest modifications to make 
>> the merge easier. This is by no means cast in stone, but rather a starting 
>> point.
>> 
>>> 
>>> Here are some comments about the code:
>>> 
>>> 1. The documentation in recos.h is not very clear. Most of the functions 
>>> use only IN arguments, and are not supposed to return any values. We don't 
>>> see how the functions are supposed to be used, and what is supposed to be 
>>> their impact on the ORTE framework data.
>> 
>> I'll try to clarify the comments tonight (I know Josh is occupied right 
>> now). The recos APIs are called from two locations:
>> 
>> 1. The errmgr calls recos whenever it receives a report of an aborted 
>> process (via the errmgr.proc_aborted API). The idea was for recos to 
>> determine what (if anything) to do about the failed process. 
>> 
>> 2. The rmaps modules can call the recos "suggest_map_targets" API to get a 
>> list of suggested nodes for the process that is to be restarted. At the 
>> moment, only the resilient mapper module does this. However, Josh and I are 
>> looking at reorganizing some functionality currently in that mapper module 
>> and making all of the existing mappers be "resilient".
>> 
>> So basically, the recos modules determine the recovery procedure and execute 
>> it. For example, in the "orcm" module, we actually update the various 
>> proc/job objects to prep them for restart and call plm.spawn from within 
>> that module. If instead you use the ignore module, it falls through to the 
>> recos base functions which call "abort" to kill the job. Again, the action 
>> is taken local to recos, so nothing need be returned.
>> 
>> The functions generally don't return values (other than success/error) 
>> because we couldn't think of anything useful to return to the errmgr. 
>> Whatever recos does about an aborted proc, the errmgr doesn't do anything 
>> further - if you look in that code, you'll see that if recos is enabled, all 
>> the errmgr does is call recos and return.
>> 
>> Again, this can be changed if desired.
>> 
>>> 
>>> 2. Why do we have all the char***? Why are they only declared as IN 
>>> arguments?
>> 
>> I take it you mean in the predicted fault API? I believe Josh was including 
>> that strictly as a placeholder. As you undoubtedly recall, I removed the 
>> fddp framework from the trunk (devel continues off-line), so Josh wasn't 
>> sure what I might want to input here. If you look at the modules themselves, 
>> you will see the implementation is essentially empty at this time.
>> 
>> We had discussed simply removing that API for now until we determined 
>> if/when fault prediction would return to the OMPI trunk. It was kind of a 
>> tossup - so we left it for now. Could just as easily be removed until a 
>> later date - either way is fine with us.
>> 
>>> 
>>> 3. The orte_recos_base_process_fault_fn_t function uses the node_list as an 
>>> IN/OUT argument. Why? If the list is modified, then we have a scalability 
>>> problem, as the list will have to be rebuilt before 

Re: [OMPI devel] what's the relationship between proc, endpoint and btl?

2010-02-25 Thread hu yaohui
Thanks a lot! I got it. Could you suggest some more materials to help me
better understand the following functions:
(1):/ompi/mca/pml/ob1/pml_ob1.c/mca_pml_ob1_add_procs
(2):/ompi/mca/bml/r2/bml_r2.c/mca_bml_r2_add_procs
(3):/ompi/mca/btl/tcp/btl_tcp.c/mca_btl_tcp_add_procs
Especially the second function - it's really hard to fully understand these
functions.
Thanks & Regards
Yaohui Hu
On Thu, Feb 25, 2010 at 10:34 AM, Jeff Squyres  wrote:

> On Feb 24, 2010, at 12:16 PM, Aurélien Bouteiller wrote:
>
> > btl is the component responsible for a particular type of fabric.
> Endpoint is somewhat the instantiation of a btl to reach a particular
> destination on a particular fabric, proc is the generic name and properties
> of a destination.
>
> A few more words here...
>
> btl = Byte Transfer Layer.  It's our name for the framework that governs
> one flavor of point-to-point communications in the MPI layer.  Components in
> this framework are used by the ob1 and csum PMLs to effect MPI
> point-to-point communications (they're used in other ways, too, but let's
> start at the beginning here...).  There are several btl components: tcp, sm
> (shared memory), self (process loopback), openib (OpenFabrics), ...etc.
>  Each one of these effects communications over a different network type.
>  For purposes of this discussion, "component" == "plugin".
>
> The btl plugin is loaded into an MPI process and its component open/query
> functions are called.  If the btl component determines that it wants to run,
> it returns one or more modules.  Typically, btls return a module for every
> interface that they find.  For example, if the openib module finds 2
> OpenFabrics device ports, it'll return 2 modules.
>
> Hence, we typically describe components as analogous to a C++ class;
> modules are analogous to instances of that C++ class.
>
> Note that in many BTL component comments and variables/fields, they
> typically use shorthand language such as, "The btl then does this..."  Such
> language almost always refers to a specific module of that btl component.
>
> Modules are marshalled by the bml and ob1/csum to make an ordered list of
> who can talk to whom.
>
> Endpoints are data structures used to represent a module's connection to a
> remote MPI process (proc).  Hence, a BTL component can create multiple
> modules; each module can create lots of endpoints.  Each endpoint is tied to
> a specific remote proc.
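To put the class/instance analogy above in code form, here is a purely illustrative set of stand-in structures. These are simplified, hypothetical types (the example_ prefix marks them as such), not the real declarations in ompi/mca/btl/btl.h:

    /* Conceptual sketch: one component per plugin, one module per local
     * interface that component finds, one endpoint per (module, remote proc)
     * pair the module can reach. */
    typedef struct {
        const char *name;                    /* e.g. "tcp", "sm", "openib" */
    } example_btl_component_t;               /* the "class" */

    typedef struct {
        example_btl_component_t *component;  /* which plugin created it */
        int device_index;                    /* which NIC/port it drives */
    } example_btl_module_t;                  /* an "instance" of the class */

    typedef struct {
        example_btl_module_t *module;        /* local interface used */
        struct ompi_proc_t   *proc;          /* remote MPI process reached */
    } example_btl_endpoint_t;                /* per-peer connection state */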
>
> > Aurelien
> >
> > > On Feb 24, 2010, at 09:59, hu yaohui wrote:
> >
> > > Could someone tell me the relationship between proc,endpoint and btl?
> > >  thanks & regards
> > > ___
> > > devel mailing list
> > > de...@open-mpi.org
> > > http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >
> >
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>


Re: [OMPI devel] RFC: Merge tmp fault recovery branch into trunk

2010-02-25 Thread Ralph Castain

On Feb 25, 2010, at 1:41 AM, Leonardo Fialho wrote:

> Hi Ralph,
> 
> The "composite framework" idea is very interesting.

Josh is the force behind that idea :-)

> Regarding the schema represented by the picture, I didn't understand 
> RecoS' behaviour in a node failure situation.
> 
> In this case, will mpirun consider the daemon failure as a normal proc 
> failure? If so, should mpirun update the global procs state for 
> all jobs running under the failed daemon?

I haven't included the node failure case yet - still on my "to-do" list. In 
brief, the answer is yes/no. :-)

Daemon failure follows the same code path as shown in the flow chart. However, 
it is up to the individual modules to determine a response to that failure. The 
"orcm" RecoS module response is to (a) mark all procs on that node as having 
failed, (b) mark that node as "down" so it won't get reused, and (c) remap and 
restart all such procs on the remaining available nodes, starting new daemon(s) 
as required.

In the orcm environment, nodes that are replaced or rebooted automatically 
start their own daemon. This is detected by orcm, and the node state (if the 
node is rebooted) will automatically be updated to "up" - if it is a new node, 
it is automatically added to the available resources. This allows the node to 
be reused once the problem has been corrected. In other environments (ssh, 
slurm, etc), the node is simply left as "down" as there is no way to know 
if/when the node becomes available again.

If you aren't using the "orcm" module, then the default behavior will abort the 
job.
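A rough pseudocode sketch of that (a)/(b)/(c) response, purely for illustration: the remap_failed_procs() helper is hypothetical, and the ORTE field and constant names are recalled from structures of that era rather than copied from the branch, so treat the details as a sketch rather than the actual "orcm" module code:

    /* Illustrative only: daemon-failure handling in the spirit of the
     * response described above. */
    static int handle_daemon_failure(orte_job_t *jdata, orte_node_t *failed_node)
    {
        orte_proc_t *proc;
        int i;

        /* (a) mark every proc that was running on the failed node as failed */
        for (i = 0; i < failed_node->procs->size; i++) {
            proc = (orte_proc_t *)opal_pointer_array_get_item(failed_node->procs, i);
            if (NULL != proc) {
                proc->state = ORTE_PROC_STATE_ABORTED;
            }
        }

        /* (b) mark the node as down so the mappers will not reuse it */
        failed_node->state = ORTE_NODE_STATE_DOWN;

        /* (c) remap the affected procs onto the remaining nodes and respawn,
         *     which starts new daemons where needed */
        remap_failed_procs(jdata);           /* hypothetical helper */
        return orte_plm.spawn(jdata);
    }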


> 
> Best regards,
> Leonardo
> 
> On Feb 25, 2010, at 7:05 AM, Ralph Castain wrote:
> 
>> Hi George et al
>> 
>> I have begun documenting the RecoS operation on the OMPI wiki:
>> 
>> https://svn.open-mpi.org/trac/ompi/wiki/RecoS
>> 
>> I'll continue to work on this over the next few days by adding a section 
>> explaining what was changed outside of the new framework to make it all 
>> work. In addition, I am revising the recos.h API documentation.
>> 
>> Hope to have all that done over the weekend.
>> 
>> 
>> On Feb 23, 2010, at 4:00 PM, Ralph Castain wrote:
>> 
>>> 
>>> On Feb 23, 2010, at 3:32 PM, George Bosilca wrote:
>>> 
 Ralph, Josh,
 
 We have some comments about the API of the new framework, mostly 
 clarifications needed to better understand how this new framework is 
 supposed to be used. And a request for a deadline extension, to delay the 
 code merge from the Recos branch in the trunk by a week.
 
 We have our own FT branch, with a totally different approach than what is 
 described in your RFC. Unfortunately, it diverged from the trunk about a 
 year ago, and merging back has proven to be quite a difficult task. Some 
 of the functionality in the Recos framework is clearly beneficial for what 
 we did, and has the potential to facilitate the porting of most of the 
 features from our branch back into the trunk. We would like the deadline 
 extension in order to deeply analyze the impact of the Recos framework on 
 our work, and see how we can fit everything together back in the trunk of 
 Open MPI.
>>> 
>>> No problem with the extension - feel free to suggest modifications to make 
>>> the merge easier. This is by no means cast in stone, but rather a starting 
>>> point.
>>> 
 
 Here are some comments about the code:
 
 1. The documentation in recos.h is not very clear. Most of the functions 
 use only IN arguments, and are not supposed to return any values. We don't 
 see how the functions are supposed to be used, and what is supposed to be 
 their impact on the ORTE framework data.
>>> 
>>> I'll try to clarify the comments tonight (I know Josh is occupied right 
>>> now). The recos APIs are called from two locations:
>>> 
>>> 1. The errmgr calls recos whenever it receives a report of an aborted 
>>> process (via the errmgr.proc_aborted API). The idea was for recos to 
>>> determine what (if anything) to do about the failed process. 
>>> 
>>> 2. The rmaps modules can call the recos "suggest_map_targets" API to get a 
>>> list of suggested nodes for the process that is to be restarted. At the 
>>> moment, only the resilient mapper module does this. However, Josh and I are 
>>> looking at reorganizing some functionality currently in that mapper module 
>>> and making all of the existing mappers be "resilient".
>>> 
>>> So basically, the recos modules determine the recovery procedure and 
>>> execute it. For example, in the "orcm" module, we actually update the 
>>> various proc/job objects to prep them for restart and call plm.spawn from 
>>> within that module. If instead you use the ignore module, it falls through 
>>> to the recos base functions which call "abort" to kill the job. Again, the 
>>> action is taken local to recos, so nothing need be returned.
>>> 
>>> The functions generally don't return values (other than success/

Re: [OMPI devel] what's the relationship between proc, endpoint and btl?

2010-02-25 Thread Jeff Squyres
On Feb 25, 2010, at 7:14 AM, hu yaohui wrote:

> Thanks a lot! I got it. Could you suggest some more materials to help me 
> better understand the following functions:
> (1):/ompi/mca/pml/ob1/pml_ob1.c/mca_pml_ob1_add_procs

This is just the OB1 function to add new peer processes.  It's called by the 
MPI layer -- e.g., during MPI_INIT, MPI_COMM_SPAWN, etc.

> (2):/ompi/mca/bml/r2/bml_r2.c/mca_bml_r2_add_procs

The BML is the BTL Multiplexing Layer.  It's just a multiplexer for marshalling 
multiple BTL's together.  It has no message passing functionality in itself -- 
it just finds and dispatches to underlying BTL's.

> (3):/ompi/mca/btl/tcp/btl_tcp.c/mca_btl_tcp_add_procs

Check out the description of the BTL add_procs function in ompi/mca/btl/btl.h.  
This is the TCP BTL component's add_procs function.  Every BTL has one.
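For intuition only, here is a heavily simplified sketch of what that "multiplexing" amounts to. All of the types and helpers below are hypothetical stand-ins; this is not the real r2 code, and the real add_procs signature lives in ompi/mca/btl/btl.h:

    #include <stdbool.h>
    #include <stddef.h>

    typedef struct { int rank; }            peer_t;       /* stand-in for ompi_proc_t    */
    typedef struct { const char *ifname; }  btl_module_t; /* stand-in for a BTL module   */
    typedef struct { btl_module_t *via;
                     peer_t *peer; }        endpoint_t;   /* stand-in for a BTL endpoint */

    /* hypothetical per-BTL check: "can this module reach that peer?";
     * on success it fills in an endpoint for the pair */
    extern bool btl_try_add_proc(btl_module_t *m, peer_t *p, endpoint_t *ep);

    /* hypothetical BML bookkeeping: remember one more usable path to a peer */
    extern void bml_record_endpoint(peer_t *p, endpoint_t ep);

    /* the BML asks every BTL module about every peer and files the results,
     * so ob1/csum can later pick a path (or stripe) per message */
    void bml_add_procs(btl_module_t **btls, size_t nbtls,
                       peer_t **peers, size_t npeers)
    {
        for (size_t p = 0; p < npeers; p++) {
            for (size_t b = 0; b < nbtls; b++) {
                endpoint_t ep;
                if (btl_try_add_proc(btls[b], peers[p], &ep)) {
                    bml_record_endpoint(peers[p], ep);
                }
            }
        }
    }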

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI devel] RFC: Merge tmp fault recovery branch into trunk

2010-02-25 Thread Josh Hursey

On Feb 23, 2010, at 3:00 PM, Ralph Castain wrote:

> 
> On Feb 23, 2010, at 3:32 PM, George Bosilca wrote:
> 
>> Ralph, Josh,
>> 
>> We have some comments about the API of the new framework, mostly 
>> clarifications needed to better understand how this new framework is 
>> supposed to be used. And a request for a deadline extension, to delay the 
>> code merge from the Recos branch in the trunk by a week.
>> 
>> We have our own FT branch, with a totally different approach than what is 
>> described in your RFC. Unfortunately, it diverged from the trunk about a 
>> year ago, and merging back has proven to be quite a difficult task. Some of 
>> the functionality in the Recos framework is clearly beneficial for what we 
>> did, and has the potential to facilitate the porting of most of the features 
>> from our branch back into the trunk. We would like the deadline extension in order 
>> to deeply analyze the impact of the Recos framework on our work, and see how 
>> we can fit everything together back in the trunk of Open MPI.
> 
> No problem with the extension - feel free to suggest modifications to make 
> the merge easier. This is by no means cast in stone, but rather a starting 
> point.

Additionally, if you wanted to have a teleconf next week to increase the 
bandwidth of communication we can do that as well. Might help us negotiate some 
modifications that would be mutually beneficial. Unfortunately I am currently 
at a conference so cannot call in until Monday.

> 
>> 
>> Here are some comments about the code:
>> 
>> 1. The documentation in recos.h is not very clear. Most of the functions use 
>> only IN arguments, and are not supposed to return any values. We don't see 
>> how the functions are supposed to be used, and what is supposed to be their 
>> impact on the ORTE framework data.
> 
> I'll try to clarify the comments tonight (I know Josh is occupied right now). 
> The recos APIs are called from two locations:
> 
> 1. The errmgr calls recos whenever it receives a report of an aborted process 
> (via the errmgr.proc_aborted API). The idea was for recos to determine what 
> (if anything) to do about the failed process. 
> 
> 2. The rmaps modules can call the recos "suggest_map_targets" API to get a 
> list of suggested nodes for the process that is to be restarted. At the 
> moment, only the resilient mapper module does this. However, Josh and I are 
> looking at reorganizing some functionality currently in that mapper module 
> and making all of the existing mappers be "resilient".
> 
> So basically, the recos modules determine the recovery procedure and execute 
> it. For example, in the "orcm" module, we actually update the various 
> proc/job objects to prep them for restart and call plm.spawn from within that 
> module. If instead you use the ignore module, it falls through to the recos 
> base functions which call "abort" to kill the job. Again, the action is taken 
> local to recos, so nothing need be returned.
> 
> The functions generally don't return values (other than success/error) 
> because we couldn't think of anything useful to return to the errmgr. 
> Whatever recos does about an aborted proc, the errmgr doesn't do anything 
> further - if you look in that code, you'll see that if recos is enabled, all 
> the errmgr does is call recos and return.
> 
> Again, this can be changed if desired.
> 
>> 
>> 2. Why do we have all the char***? Why are they only declared as IN 
>> arguments?
> 
> I take it you mean in the predicted fault API? I believe Josh was including 
> that strictly as a placeholder. As you undoubtedly recall, I removed the fddp 
> framework from the trunk (devel continues off-line), so Josh wasn't sure what 
> I might want to input here. If you look at the modules themselves, you will 
> see the implementation is essentially empty at this time.
> 
> We had discussed simply removing that API for now until we determined if/when 
> fault prediction would return to the OMPI trunk. It was kind of a tossup - so 
> we left it for now. Could just as easily be removed until a later date - 
> either way is fine with us.

In this version of the components, none of them use the predicted_fault API. I 
have at least one component that will come in as a second step (so soon, but 
different RFC) that does use this interface to do some super nifty things (if I 
say so myself :).

We can remove the interface if people have heartburn about it being there, but 
we will want to add it back in soon enough.

As far as the 'char ***' parameters go, they really should just be IN parameters. 
They are not passed back to the suggestion/detection agent (though I guess they 
could be). In recognition of some of the broader uses of this interface, I am 
considering changing them to a list of RecoS-specific structures that would 
allow the caller of this function to pass additional information for each of 
the parameters (like an assurance level of the fault - 75% sure this proc has 
failed).

So we would cha
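A purely hypothetical sketch of what such a RecoS-specific report structure could look like; none of these names exist in the branch, and the assurance field simply carries the "75% sure this proc has failed" idea from the paragraph above:

    typedef struct {
        opal_list_item_t    super;      /* so reports can be chained on an opal_list_t */
        orte_process_name_t proc;       /* the process being reported on               */
        orte_proc_state_t   state;      /* reported/suspected state                    */
        float               assurance;  /* confidence in the report, e.g. 0.75         */
    } orte_recos_fault_report_t;        /* hypothetical name */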

Re: [OMPI devel] RFC: Merge tmp fault recovery branch into trunk

2010-02-25 Thread Josh Hursey

On Feb 25, 2010, at 4:38 AM, Ralph Castain wrote:

> 
> On Feb 25, 2010, at 1:41 AM, Leonardo Fialho wrote:
> 
>> Hi Ralph,
>> 
>> The "composite framework" idea is very interesting.
> 
> Josh is the force behind that idea :-)

It solves a pretty interesting little problem. Its utility will really shine 
when I move the new components into place in the coming weeks/month.

> 
>> Regarding the schema represented by the picture, I didn't understand 
>> RecoS' behaviour in a node failure situation.
>> 
>> In this case, will mpirun consider the daemon failure as a normal proc 
>> failure? If so, should mpirun update the global procs state for 
>> all jobs running under the failed daemon?
> 
> I haven't included the node failure case yet - still on my "to-do" list. In 
> brief, the answer is yes/no. :-)
> 
> Daemon failure follows the same code path as shown in the flow chart. 
> However, it is up to the individual modules to determine a response to that 
> failure. The "orcm" RecoS module response is to (a) mark all procs on that 
> node as having failed, (b) mark that node as "down" so it won't get reused, 
> and (c) remap and restart all such procs on the remaining available nodes, 
> starting new daemon(s) as required.
> 
> In the orcm environment, nodes that are replaced or rebooted automatically 
> start their own daemon. This is detected by orcm, and the node state (if the 
> node is rebooted) will automatically be updated to "up" - if it is a new 
> node, it is automatically added to the available resources. This allows the 
> node to be reused once the problem has been corrected. In other environments 
> (ssh, slurm, etc), the node is simply left as "down" as there is no way to 
> know if/when the node becomes available again.
> 
> If you aren't using the "orcm" module, then the default behavior will abort 
> the job.

Just to echo this response. The orted and process failures use the same error 
path, but can be easily differentiated by their jobids. The 'orcm' component is 
a good example of differentiating these two fault scenarios to correctly 
recover the ORTE job. Soon we may/should/will have the same ability with 
certain MPI jobs. :)
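A minimal sketch of the jobid test being described: ORTE_PROC_MY_NAME and the jobid field are the real ORTE names, but the wrapper function itself is hypothetical and only illustrates the distinction:

    /* Daemons (and mpirun) live in their own ORTE job, so in the HNP a failed
     * process whose jobid matches our own daemon jobid is an orted; any other
     * jobid identifies an application proc. */
    static bool failed_proc_is_daemon(const orte_process_name_t *failed)
    {
        return (failed->jobid == ORTE_PROC_MY_NAME->jobid);
    }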

-- Josh

> 
> 
>> 
>> Best regards,
>> Leonardo
>> 
>> On Feb 25, 2010, at 7:05 AM, Ralph Castain wrote:
>> 
>>> Hi George et al
>>> 
>>> I have begun documenting the RecoS operation on the OMPI wiki:
>>> 
>>> https://svn.open-mpi.org/trac/ompi/wiki/RecoS
>>> 
>>> I'll continue to work on this over the next few days by adding a section 
>>> explaining what was changed outside of the new framework to make it all 
>>> work. In addition, I am revising the recos.h API documentation.
>>> 
>>> Hope to have all that done over the weekend.
>>> 
>>> 
>>> On Feb 23, 2010, at 4:00 PM, Ralph Castain wrote:
>>> 
 
 On Feb 23, 2010, at 3:32 PM, George Bosilca wrote:
 
> Ralph, Josh,
> 
> We have some comments about the API of the new framework, mostly 
> clarifications needed to better understand how this new framework is 
> supposed to be used. And a request for a deadline extension, to delay the 
> code merge from the Recos branch in the trunk by a week.
> 
> We have our own FT branch, with a totally different approach than what is 
> described in your RFC. Unfortunately, it diverged from the trunk about a 
> year ago, and merging back has proven to be quite a difficult task. Some 
> of the functionality in the Recos framework is clearly beneficial for 
> what we did, and has the potential to facilitate the porting of most of 
> the features from our branch back into the trunk. We would like the deadline 
> extension in order to deeply analyze the impact of the Recos framework on 
> our work, and see how we can fit everything together back in the trunk of 
> Open MPI.
 
 No problem with the extension - feel free to suggest modifications to make 
 the merge easier. This is by no means cast in stone, but rather a starting 
 point.
 
> 
> Here are some comments about the code:
> 
> 1. The documentation in recos.h is not very clear. Most of the functions 
> use only IN arguments, and are not supposed to return any values. We 
> don't see how the functions are supposed to be used, and what is supposed 
> to be their impact on the ORTE framework data.
 
 I'll try to clarify the comments tonight (I know Josh is occupied right 
 now). The recos APIs are called from two locations:
 
 1. The errmgr calls recos whenever it receives a report of an aborted 
 process (via the errmgr.proc_aborted API). The idea was for recos to 
 determine what (if anything) to do about the failed process. 
 
 2. The rmaps modules can call the recos "suggest_map_targets" API to get a 
 list of suggested nodes for the process that is to be restarted. At the 
 moment, only the resilient mapper module does this. However, Josh and I 
>>>

Re: [OMPI devel] RFC: Merge tmp fault recovery branch into trunk

2010-02-25 Thread Leonardo Fialho
Hi Ralph and Josh,

>>> Regarding the schema represented by the picture, I didn't understand 
>>> RecoS' behaviour in a node failure situation.
>>> 
>>> In this case, will mpirun consider the daemon failure as a normal proc 
>>> failure? If so, should mpirun update the global procs state for 
>>> all jobs running under the failed daemon?
>> 
>> I haven't included the node failure case yet - still on my "to-do" list. In 
>> brief, the answer is yes/no. :-)
>> 
>> Daemon failure follows the same code path as shown in the flow chart. 
>> However, it is up to the individual modules to determine a response to that 
>> failure. The "orcm" RecoS module response is to (a) mark all procs on that 
>> node as having failed, (b) mark that node as "down" so it won't get reused, 
>> and (c) remap and restart all such procs on the remaining available nodes, 
>> starting new daemon(s) as required.
>> 
>> In the orcm environment, nodes that are replaced or rebooted automatically 
>> start their own daemon. This is detected by orcm, and the node state (if the 
>> node is rebooted) will automatically be updated to "up" - if it is a new 
>> node, it is automatically added to the available resources. This allows the 
>> node to be reused once the problem has been corrected. In other environments 
>> (ssh, slurm, etc), the node is simply left as "down" as there is no way to 
>> know if/when the node becomes available again.
>> 
>> If you aren't using the "orcm" module, then the default behavior will abort 
>> the job.
> 
> Just to echo this response. The orted and process failures use the same error 
> path, but can be easily differentiated by their jobids. The 'orcm' component 
> is a good example of differentiating these two fault scenarios to correctly 
> recover the ORTE job. Soon we may/should/will have the same ability with 
> certain MPI jobs. :)

Hum... I'm really worried about this. I understand your choice, since it is 
really a good solution for fail/stop/restart behaviour, but looking at it from the 
fail/recovery side, can you envision some alternative for the orted's 
reconfiguration on the fly?

Best regards,
Leonardo


Re: [OMPI devel] RFC: Merge tmp fault recovery branch into trunk

2010-02-25 Thread George Bosilca

On Feb 25, 2010, at 11:16 , Leonardo Fialho wrote:

> Hum... I'm really worried about this. I understand your choice, since it is 
> really a good solution for fail/stop/restart behaviour, but looking at it from the 
> fail/recovery side, can you envision some alternative for the orted's 
> reconfiguration on the fly?

Leonardo,

I don't see why the current code prohibits such behavior. However, I don't see 
right now in this branch how the remaining daemons (and MPI processes) 
reconstruct the communication topology, but this is just a technicality.

Anyway, this is the code that UT will bring in. All our work focuses on 
keeping the existing environment up and running instead of restarting 
everything. The orted will auto-heal (i.e., reshape the underlying topology, 
recreate the connections, and so on), and the fault is propagated to the MPI 
layer, which will decide what to do next.

  george.





Re: [OMPI devel] RFC: Merge tmp fault recovery branch into trunk

2010-02-25 Thread Leonardo Fialho
Hi George,

>> Hum... I'm really worried about this. I understand your choice, since it is 
>> really a good solution for fail/stop/restart behaviour, but looking at it from the 
>> fail/recovery side, can you envision some alternative for the orted's 
>> reconfiguration on the fly?
> 
> I don't see why the current code prohibits such behavior. However, I don't see 
> right now in this branch how the remaining daemons (and MPI processes) 
> reconstruct the communication topology, but this is just a technicality.
> 
> Anyway, this is the code that UT will bring in. All our work focuses on 
> keeping the existing environment up and running instead of restarting 
> everything. The orted will auto-heal (i.e., reshape the underlying topology, 
> recreate the connections, and so on), and the fault is propagated to the MPI 
> layer, which will decide what to do next.


When you say MPI layer, what exactly does that mean? The MPI interface or the network 
stack which supports the MPI communication (BTL, PML, etc.)?

In my mind I see an orted failure (and all procs running under this daemon) as 
an environment failure which leads to job failures. Thus, to use a 
fail/recovery strategy, this daemon should be recovered (possibly relaunching it 
and updating its procs/jobs structures), and after that all failed procs which 
were originally running under this daemon should be recovered as well (maybe from a 
checkpoint, optionally with a log). Of course, if available, a spare orted could be 
used.

Regarding the MPI application, this 'environment reconfiguration' probably 
requires updates/reconfiguration/whatever on the communication stack which 
supports the MPI communication (BTL, PML, etc.).

Are we thinking in the same direction, or have I missed something along the way?

Best regards,
Leonardo


Re: [OMPI devel] question about pids

2010-02-25 Thread Greg Watson
Ralph,

We'd like this to be able to support attaching a debugger to the application. 
Would it be difficult to provide? We don't need the information all at once, 
each PID could be sent as the process launches (as long as the XML is correctly 
formatted) if that makes it any easier.

Greg

On Feb 23, 2010, at 3:58 PM, Ralph Castain wrote:

> I don't see a way to currently do that - the rmaps display comes -before- 
> process launch, so the pid will not be displayed.
> 
> Do you need to see them? We'd have to add that output somewhere post-launch - 
> perhaps when debuggers are initialized.
> 
> On Feb 23, 2010, at 12:58 PM, Greg Watson wrote:
> 
>> Ralph,
>> 
>> I notice that you've got support in the XML output code to display the pids 
>> of the processes, but I can't see how to enable them. Can you give me any 
>> pointers?
>> 
>> Thanks,
>> Greg
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] RFC: Merge tmp fault recovery branch into trunk

2010-02-25 Thread Josh Hursey

On Feb 25, 2010, at 8:32 AM, George Bosilca wrote:

> 
> On Feb 25, 2010, at 11:16 , Leonardo Fialho wrote:
> 
>> Hum... I'm really worried about this. I understand your choice, since it is 
>> really a good solution for fail/stop/restart behaviour, but looking at it from the 
>> fail/recovery side, can you envision some alternative for the orted's 
>> reconfiguration on the fly?
> 
> Leonardo,
> 
> I don't see why the current code prohibits such behavior. However, I don't see 
> right now in this branch how the remaining daemons (and MPI processes) 
> reconstruct the communication topology, but this is just a technicality.

If you use the 'cm' routed component then the reconstruction of the ORTE level 
communication works for all but the loss of the HNP. Neither Ralph nor I have 
looked at supporting other routed components at this time. I know your group at 
UTK has done some work in this area, so we wanted to tackle additional support 
for more scalable routed components as a second step, hopefully with 
collaboration from your group.

As far as the MPI layer, I can't say much at this point on how that works. This 
RFC only handles recovery of the ORTE layer, MPI layer recovery is a second 
step and involves much longer discussions. I have a solution for a certain type 
of MPI application, and it sounds like you have something that can be applied 
more generally.

> 
> Anyway, this is the code that UT will bring in. All our work focuses on 
> keeping the existing environment up and running instead of restarting 
> everything. The orted will auto-heal (i.e., reshape the underlying topology, 
> recreate the connections, and so on), and the fault is propagated to the MPI 
> layer, which will decide what to do next.

Per my previous suggestion, would it be useful to chat on the phone early next 
week about our various strategies?

-- Josh


> 
>  george.
> 
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] question about pids

2010-02-25 Thread Ashley Pittman

Have you looked at orte-ps?  It contains all the information you'll need to 
attach a debugger to an already running application.

Ashley,

On 25 Feb 2010, at 17:43, Greg Watson wrote:

> Ralph,
> 
> We'd like this to be able to support attaching a debugger to the application. 
> Would it be difficult to provide? We don't need the information all at once, 
> each PID could be sent as the process launches (as long as the XML is 
> correctly formatted) if that makes it any easier.
> 
> Greg
> 
> On Feb 23, 2010, at 3:58 PM, Ralph Castain wrote:
> 
>> I don't see a way to currently do that - the rmaps display comes -before- 
>> process launch, so the pid will not be displayed.
>> 
>> Do you need to see them? We'd have to add that output somewhere post-launch 
>> - perhaps when debuggers are initialized.
>> 
>> On Feb 23, 2010, at 12:58 PM, Greg Watson wrote:
>> 
>>> Ralph,
>>> 
>>> I notice that you've got support in the XML output code to display the pids 
>>> of the processes, but I can't see how to enable them. Can you give me any 
>>> pointers?

-- 

Ashley Pittman, Bath, UK.

Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk




[OMPI devel] RFC: increase default AC/AM/LT requirements

2010-02-25 Thread Jeff Squyres
WHAT: Bump minimum required versions of GNU autotools up to modern versions.  I 
suggest the following, but could be talked down a version or two:
  Autoconf: 2.65
  Automake: 1.11.1
  Libtool: 2.2.6b

WHY: Stop carrying patches and workarounds for old versions.

WHERE: autogen.sh, make_dist_tarball, various Makefile.am's, configure.ac, *.m4.

WHEN: No real rush.  Somewhere in 1.5.x.

TIMEOUT: Friday March 5, 2010



I was debugging a complex Automake timestamp issue yesterday and discovered 
that it was caused by the fact that we are patching an old version of 
libtool.m4.  It took a little while to figure out both the problem and an 
acceptable workaround.  During this process, I noticed that autogen.sh still 
carries patches to fix bugs in some *really* old versions of Libtool (e.g., 
1.5.22).  Hence, I am sending this RFC to increase the minimum required versions.

Keep in mind:

1. This ONLY affects developers.  Those who build from tarballs don't even need 
to have the Autotools installed.
2. Autotool patches should always be pushed upstream.  We should only maintain 
patches for things that have been pushed upstream but have not yet been 
released.
3. We already have much more recent Autotools requirements for official 
distribution tarballs; see the chart here:

http://www.open-mpi.org/svn/building.php

Specifically: although official tarballs require recent Autotools, we allow 
developers to use much older versions.   Why are we still carrying around this 
old cruft?  Does some developer out there have a requirement to use older 
Autotools?

If not, this RFC proposes to only allow recent versions of the Autotools to 
build Open MPI.  I believe there's reasonable m4 these days that can make 
autogen/configure/whatever abort early if the versions are not new enough.  
This would allow us, at a minimum, to drop some of the libtool patches we're 
carrying.  There may be some Makefile.am workarounds that are no longer 
necessary, too.

There's no real rush on this; if this RFC passes, we can set a concrete, fixed 
date some point in the future where we switch over to requiring new versions.  
This should give everyone plenty of time to update if you need to, etc.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI devel] RFC: increase default AC/AM/LT requirements

2010-02-25 Thread Barrett, Brian W
I think our last set of minimums was based on being able to use RHEL4 out of 
the box.  Updating to whatever ships with RHEL5 probably makes sense, but I 
think that still leaves you at an LT 1.5.x release.  Being higher than that 
requires new Autotools, which seems like asking for trouble.

Brian

On Feb 25, 2010, at 4:47 PM, Jeff Squyres wrote:

> WHAT: Bump minimum required versions of GNU autotools up to modern versions.  
> I suggest the following, but could be talked down a version or two:
>  Autoconf: 2.65
>  Automake: 1.11.1
>  Libtool: 2.2.6b
> 
> WHY: Stop carrying patches and workarounds for old versions.
> 
> WHERE: autogen.sh, make_dist_tarball, various Makefile.am's, configure.ac, 
> *.m4.
> 
> WHEN: No real rush.  Somewhere in 1.5.x.
> 
> TIMEOUT: Friday March 5, 2010
> 
> 
> 
> I was debugging a complex Automake timestamp issue yesterday and discovered 
> that it was caused by the fact that we are patching an old version of 
> libtool.m4.  It took a little while to figure out both the problem and an 
> acceptable workaround.  During this process, I noticed that autogen.sh still 
> carries patches to fix bugs in some *really* old versions of Libtool (e.g., 
> 1.5.22).  Hence, I am sending this RFC to increase the minimum required versions.
> 
> Keep in mind:
> 
> 1. This ONLY affects developers.  Those who build from tarballs don't even 
> need to have the Autotools installed.
> 2. Autotool patches should always be pushed upstream.  We should only 
> maintain patches for things that have been pushed upstream but have not yet 
> been released.
> 3. We already have much more recent Autotools requirements for official 
> distribution tarballs; see the chart here:
> 
>http://www.open-mpi.org/svn/building.php
> 
> Specifically: although official tarballs require recent Autotools, we allow 
> developers to use much older versions.   Why are we still carrying around 
> this old cruft?  Does some developer out there have a requirement to use 
> older Autotools?
> 
> If not, this RFC proposes to only allow recent versions of the Autotools to 
> build Open MPI.  I believe there's reasonable m4 these days that can make 
> autogen/configure/whatever abort early if the versions are not new enough.  
> This would allow us, at a minimum, to drop some of the libtool patches we're 
> carrying.  There may be some Makefile.am workarounds that are no longer 
> necessary, too.
> 
> There's no real rush on this; if this RFC passes, we can set a concrete, 
> fixed date some point in the future where we switch over to requiring new 
> versions.  This should give everyone plenty of time to update if you need to, 
> etc.
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 

--
  Brian W. Barrett
  Dept. 1423: Scalable System Software
  Sandia National Laboratories







Re: [OMPI devel] RFC: Merge tmp fault recovery branch into trunk

2010-02-25 Thread Ralph Castain
I believe you are thinking in parallel with what Josh and I have been doing, and
slightly differently from the UTK approach. The "orcm" method follows what you
describe: we maintain operation on the current remaining nodes, see if we
can use another new node to replace the failed one, and redistribute the
affected procs (on the failed node) either to existing nodes or to new ones.

I believe UTK's approach focuses on retaining operation of the existing
nodes, redistributing procs across them. I suspect we will eventually
integrate some of these operations so that users can exploit the best of
both methods.

Josh hasn't exposed his MPI recovery work yet. As he mentioned in his
response, he has done some things in this area that are complementary to the
UTK method. Just needs to finish his thesis before making them public. :-)


On Thu, Feb 25, 2010 at 9:54 AM, Leonardo Fialho
wrote:

> Hi George,
>
> >> Hum... I'm really worried about this. I understand your choice, since it
> is really a good solution for fail/stop/restart behaviour, but looking at it from
> the fail/recovery side, can you envision some alternative for the orted's
> reconfiguration on the fly?
> >
> > I don't see why the current code prohibits such behavior. However, I don't
> see right now in this branch how the remaining daemons (and MPI processes)
> reconstruct the communication topology, but this is just a technicality.
> >
> > Anyway, this is the code that UT will bring in. All our work focuses on
> keeping the existing environment up and running instead of restarting
> everything. The orted will auto-heal (i.e., reshape the underlying topology,
> recreate the connections, and so on), and the fault is propagated to the MPI
> layer, which will decide what to do next.
>
>
> When you say MPI layer, what exactly does that mean? The MPI interface or the
> network stack which supports the MPI communication (BTL, PML, etc.)?
>
> In my mind I see an orted failure (and all procs running under this daemon)
> as an environment failure which leads to job failures. Thus, to use a
> fail/recovery strategy, this daemon should be recovered (possibly
> relaunching it and updating its procs/jobs structures), and after that all
> failed procs which were originally running under this daemon should be
> recovered as well (maybe from a checkpoint, optionally with a log). Of course, if
> available, a spare orted could be used.
>
> Regarding the MPI application, this 'environment reconfiguration' probably
> requires updates/reconfiguration/whatever on the
> communication stack which supports the MPI communication (BTL, PML, etc.).
>
> Are we thinking in the same direction, or have I missed something along the
> way?
>
> Best regards,
> Leonardo
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>


Re: [OMPI devel] RFC: Merge tmp fault recovery branch into trunk

2010-02-25 Thread Ralph Castain
Just to add to Josh's comment: I am working now on recovering from HNP
failure as well. Should have that in a month or so.


On Thu, Feb 25, 2010 at 10:46 AM, Josh Hursey  wrote:

>
> On Feb 25, 2010, at 8:32 AM, George Bosilca wrote:
>
> >
> > On Feb 25, 2010, at 11:16 , Leonardo Fialho wrote:
> >
> >> Hum... I'm really worried about this. I understand your choice, since it
> is really a good solution for fail/stop/restart behaviour, but looking at it from
> the fail/recovery side, can you envision some alternative for the orted's
> reconfiguration on the fly?
> >
> > Leonardo,
> >
> > I don't see why the current code prohibits such behavior. However, I don't
> see right now in this branch how the remaining daemons (and MPI processes)
> reconstruct the communication topology, but this is just a technicality.
>
> If you use the 'cm' routed component then the reconstruction of the ORTE
> level communication works for all but the loss of the HNP. Neither Ralph nor
> I have looked at supporting other routed components at this time. I know
> your group at UTK has done some work in this area, so we wanted to tackle
> additional support for more scalable routed components as a second step,
> hopefully with collaboration from your group.
>
> As far as the MPI layer, I can't say much at this point on how that works.
> This RFC only handles recovery of the ORTE layer, MPI layer recovery is a
> second step and involves much longer discussions. I have a solution for a
> certain type of MPI application, and it sounds like you have something that
> can be applied more generally.
>
> >
> > Anyway, this is the code that UT will bring in. All our work focuses on
> keeping the existing environment up and running instead of restarting
> everything. The orted will auto-heal (i.e., reshape the underlying topology,
> recreate the connections, and so on), and the fault is propagated to the MPI
> layer, which will decide what to do next.
>
> Per my previous suggestion, would it be useful to chat on the phone early
> next week about our various strategies?
>
> -- Josh
>
>
> >
> >  george.
> >
> >
> >
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
>
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>


Re: [OMPI devel] question about pids

2010-02-25 Thread Ralph Castain
Easy to do. I'll dump all the pids at the same time when the launch
completes - effectively, it will be at the same point used by other
debuggers to attach.

Have it for you in the trunk this weekend. Can you suggest an xml format you
would like? Otherwise, I'll just use the current proc output (used in the
map output) and add a "pid" field to it.

On Thu, Feb 25, 2010 at 10:43 AM, Greg Watson  wrote:

> Ralph,
>
> We'd like this to be able to support attaching a debugger to the
> application. Would it be difficult to provide? We don't need the information
> all at once, each PID could be sent as the process launches (as long as the
> XML is correctly formatted) if that makes it any easier.
>
> Greg
>
> On Feb 23, 2010, at 3:58 PM, Ralph Castain wrote:
>
> > I don't see a way to currently do that - the rmaps display comes -before-
> process launch, so the pid will not be displayed.
> >
> > Do you need to see them? We'd have to add that output somewhere
> post-launch - perhaps when debuggers are initialized.
> >
> > On Feb 23, 2010, at 12:58 PM, Greg Watson wrote:
> >
> >> Ralph,
> >>
> >> I notice that you've got support in the XML output code to display the
> pids of the processes, but I can't see how to enable them. Can you give me
> any pointers?
> >>
> >> Thanks,
> >> Greg
> >> ___
> >> devel mailing list
> >> de...@open-mpi.org
> >> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >
> >
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
>
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>


Re: [OMPI devel] RFC: Merge tmp fault recovery branch into trunk

2010-02-25 Thread George Bosilca
Josh,

Next week is a little bit too early, as we will need some time to figure out how to 
integrate with this new framework, and to what extent our code and requirements 
fit into it. Then the week after is the MPI Forum. How about Thursday, 11 March?

  Thanks,
george.

On Feb 25, 2010, at 12:46 , Josh Hursey wrote:

> Per my previous suggestion, would it be useful to chat on the phone early 
> next week about our various strategies?




Re: [OMPI devel] RFC: Merge tmp fault recovery branch into trunk

2010-02-25 Thread Ralph Castain
If Josh is going to be at the forum, perhaps you folks could chat there?
Might as well take advantage of being colocated, if possible.

Otherwise, I'm available pretty much any time. I can't contribute much about
the MPI recovery issues, but can contribute to the RTE issues if that helps.


On Thu, Feb 25, 2010 at 7:39 PM, George Bosilca wrote:

> Josh,
>
> Next week is a little bit too early, as we will need some time to figure out
> how to integrate with this new framework, and to what extent our code and
> requirements fit into it. Then the week after is the MPI Forum. How about
> Thursday, 11 March?
>
>  Thanks,
> george.
>
> On Feb 25, 2010, at 12:46 , Josh Hursey wrote:
>
> > Per my previous suggestion, would it be useful to chat on the phone early
> next week about our various strategies?
>
>
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>