2011/3/24 Ralph Castain <r...@open-mpi.org>

> You really don't want to do it that way - you'll create a major confusion
> in mpirun and the other daemons about who is where. Have you looked at the
> code in orte/mca/errmgr/hnp/errmgr_hnp.c, line 1573 and following?
>
I had not looked at that, but I will do so right now.

>
> The ability to relocate a failed child process is already in the trunk - it
> only requires turning "on" with an --enable-recovery flag at runtime if you
> don't need the checkpoint/restart support. If you do need C/R, you can use
> that too (just requires some configure flags).
>
I do need C/R support, because what I am trying to do is restart a process
on another node (as a child of the orted residing there) from a previous
checkpoint. I will take a look at the relocation feature that you mention
and try to use it.

>
> At the least, the cited code should provide guidance on how to correctly
> restart procs if you need your own errmgr module for other reasons.
>

Thanks again, Ralph. You have been very helpful.

Best regards.

Hugo Meyer
