BTW, just to be clear: you don't have to write any code to compute these
values, or to reset the job structures prior to restarting a process. This has
already been done.
Recomputing local and node ranks is done in
orte/mca/rmaps/base/rmaps_base_support_fns.c in a function called
orte_rmaps_base_update_local_ranks.
Resetting the job and proc structures for restarting a process is done in
orte/mca/plm/base/plm_base_rsh_support.c in a function called
orte_plm_base_reset_job.
The restart logic was in the orte/mca/errmgr/orcm module, but I moved that out
of the devel trunk recently as we needed to do some orcm-specific things in it.
However, I can (and probably should) restore it under a different name if that
would help.
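
For illustration only, here is a minimal standalone sketch of the ranking rule
described in the quoted message below. It is not the code in
orte_rmaps_base_update_local_ranks. The proc_t type and the ordered_before and
assign_ranks functions are hypothetical names made up for this example, and it
assumes that job ids increase in launch order, so that processes from an
earlier job on a node get lower node ranks than those of a later job.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for illustration; not an ORTE type. */
typedef struct {
    int32_t  job;         /* job id; assumed to increase in launch order */
    int32_t  rank;        /* MPI rank of the process within its job */
    int32_t  node;        /* index of the node hosting the process */
    uint16_t local_rank;  /* output: position among same-job procs on the node */
    uint16_t node_rank;   /* output: position among all procs on the node */
} proc_t;

/* Nonzero if a sorts before b: earlier job first, then lower MPI rank. */
static int ordered_before(const proc_t *a, const proc_t *b)
{
    if (a->job != b->job) {
        return a->job < b->job;
    }
    return a->rank < b->rank;
}

/* Fill in local_rank and node_rank for every entry of procs[0..n-1].
 * Each rank is the count of co-located processes that sort before it:
 * same-job processes only for local_rank, all jobs for node_rank. */
void assign_ranks(proc_t *procs, size_t n)
{
    assert(procs != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        uint16_t local = 0;
        uint16_t node = 0;

        for (size_t j = 0; j < n; j++) {
            if (procs[j].node != procs[i].node) {
                continue;               /* different node: irrelevant */
            }
            if (!ordered_before(&procs[j], &procs[i])) {
                continue;               /* itself, or sorts after it */
            }
            node++;                     /* counts across all jobs */
            if (procs[j].job == procs[i].job) {
                local++;                /* counts within the same job only */
            }
        }
        procs[i].local_rank = local;
        procs[i].node_rank = node;
    }
}
```

Feeding this the eight processes from the quoted example (job 0 with ranks
0,2,4 on node0 and 1,3,5 on node1, plus a spawned job 1 with ranks 0,1 on
node0) should reproduce the numbers given there: local ranks 0,1 and node
ranks 3,4 for the spawned processes.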
Ralph
On Apr 7, 2010, at 10:15 PM, Ralph Castain wrote:
> The local rank of a process is computed by looking at all processes on a node
> from that job. The lowest MPI rank process on that node from that job is
> given local-rank=0. The remaining processes from that job on the node are
> given local-ranks in ascending order according to their MPI rank.
>
> The node rank is computed the same way, except that we look at all processes
> on the node, spanning all MPI jobs.
>
> Consider this example. Suppose we have an MPI application that launches 3
> processes on each of two nodes, with ranks assigned on a bynode round-robin
> basis. Thus, the MPI rank mapping looks like this:
>
> node0: rank 0, 2, 4
> node1: rank 1, 3, 5
>
> The local ranks would look like this:
>
> Node     MPI Rank    Local Rank
> node0    0           0
> node0    2           1
> node0    4           2
>
> node1    1           0
> node1    3           1
> node1    5           2
>
> Since we only have one job, the node rank of each process would be identical
> to its local rank. Now suppose that application does a comm_spawn that
> launches two processes on node0. The local ranks of the new processes would
> be 0,1 reflecting their relative position within that job. However, their
> node ranks would be 3,4 because of the processes already on the node.
>
> We use these values when assigning static ports and processor affinity. Other
> than that, they have no meaning.
>
> HTH
> Ralph
>
>
>
> On Apr 7, 2010, at 7:16 PM, luyang dong wrote:
>
>> dear teachers:
>> In orte_globals.h, there is a data structure.
>> typedef struct {
>>     /* index to node */
>>     int32_t node;
>>     /* local rank */
>>     orte_local_rank_t local_rank;
>>     /* node rank */
>>     orte_node_rank_t node_rank;
>> } orte_pmap_t;
>> I do not understand exactly what local_rank and node_rank mean. Is
>> local_rank similar to the rank in the MPI specification? Can you help me? My
>> motivation is to implement process migration in Open MPI, so I urgently want
>> to understand the procedure for launching a process.
>>
>