2011/3/24 Ralph Castain <r...@open-mpi.org>

> You really don't want to do it that way - you'll create a major confusion
> in mpirun and the other daemons about who is where. Have you looked at the
> code in orte/mca/errmgr/hnp/errmgr_hnp.c, line 1573 and following?
>
I have not looked at that yet, but I will do so right now.
>
> The ability to relocate a failed child process is already in the trunk - it
> only requires turning "on" with an --enable-recovery flag at runtime if you
> don't need the checkpoint/restart support. If you do need C/R, you can use
> that too (just requires some configure flags).
>
I do need C/R support, because what I'm trying to do is restart a process on another node (as a child of the orted residing there) from a previous checkpoint. I will take a look at the relocation feature you mention and try to use it.

>
> At the least, the cited code should provide guidance on how to correctly
> restart procs if you need your own errmgr module for other reasons.
>
Again, thanks Ralph, you have been very helpful.

Best regards,
Hugo Meyer