BTW, just to be clear: you don't have to write any code to compute these 
values or to reset the job structures prior to restarting a process. This has 
already been done.

Recomputing local and node ranks is done in 
orte/mca/rmaps/base/rmaps_base_support_fns.c in a function called 
orte_rmaps_base_update_local_ranks.

Resetting the job and proc structures for restarting a process is done in 
orte/mca/plm/base/plm_base_rsh_support.c in a function called 
orte_plm_base_reset_job.

The restart logic was in the orte/mca/errmgr/orcm module, but I moved that out 
of the devel trunk recently as we needed to do some orcm-specific things in it. 
However, I can (and probably should) restore it under a different name if that 
would help.

Ralph


On Apr 7, 2010, at 10:15 PM, Ralph Castain wrote:

> The local rank of a process is computed by looking at all processes on a node 
> from that job. The lowest MPI rank process on that node from that job is 
> given local-rank=0. The remaining processes from that job on that node are 
> given local ranks in ascending order of their MPI rank.
> 
> The node rank is computed the same way, except that we look at all processes 
> on the node, spanning all MPI jobs.
> 
> Consider this example. Suppose we have an MPI application that launches 3 
> processes on each of two nodes, with ranks assigned on a bynode round-robin 
> basis. Thus, the MPI rank mapping looks like this:
> 
> node0: rank 0, 2, 4
> node1: rank 1, 3, 5
> 
> The local ranks would look like this:
> 
> Node     MPI Rank   Local Rank
> node0    0          0
> node0    2          1
> node0    4          2
> 
> node1    1          0
> node1    3          1
> node1    5          2
> 
> Since we only have one job, the node rank of each process would be identical 
> to its local rank.  Now suppose that application does a comm_spawn that 
> launches two processes on node0. The local ranks of the new processes would 
> be 0,1 reflecting their relative position within that job. However, their 
> node ranks would be 3,4 because of the processes already on the node.
> 
> We use these values when assigning static ports and processor affinity. Other 
> than that, they have no meaning.
> 
> HTH
> Ralph
> 
> 
> 
> On Apr 7, 2010, at 7:16 PM, luyang dong wrote:
> 
>> Dear teachers,
>> In orte_globals.h, there is a data structure:
>> typedef struct {
>>     /* index to node */
>>     int32_t node;
>>     /* local rank */
>>     orte_local_rank_t local_rank;
>>     /* node rank */
>>     orte_node_rank_t node_rank;
>> } orte_pmap_t;
>> I do not understand what local_rank and node_rank exactly mean. Is 
>> local_rank similar to the rank in the MPI specification? Can you help me? My 
>> motivation is to implement process migration in Open MPI, so I urgently want 
>> to understand the procedure for launching a process.
>> 
>>  _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
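
For illustration, the local-rank and node-rank rules described in the quoted 
message above can be sketched in a few lines of C. This is not the code in 
orte_rmaps_base_update_local_ranks; proc_t, compute_ranks and print_ranks are 
hypothetical names made up for this sketch. It also assumes that a job with a 
lower job id was launched earlier and is therefore counted first on a node, 
which matches the comm_spawn example (node ranks 3,4) but is otherwise an 
assumption.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical, simplified process record; not an ORTE type. */
typedef struct {
    int job;        /* job id; assumed: lower id = launched earlier */
    int rank;       /* MPI rank within the job */
    int node;       /* index of the node the process is mapped to */
    int local_rank; /* computed: position among same-job procs on the node */
    int node_rank;  /* computed: position among all procs on the node */
} proc_t;

/* Fill in local_rank and node_rank for every entry.
 * procs[] must be sorted by (job, rank), so that every process that
 * ranks ahead of procs[i] on its node appears before index i. */
static void compute_ranks(proc_t *procs, int nprocs)
{
    for (int i = 0; i < nprocs; i++) {
        int local = 0; /* earlier procs on this node from the same job */
        int node = 0;  /* earlier procs on this node from any job */

        /* Check the sort-order precondition. */
        assert(i == 0 ||
               procs[i - 1].job < procs[i].job ||
               (procs[i - 1].job == procs[i].job &&
                procs[i - 1].rank < procs[i].rank));

        for (int j = 0; j < i; j++) {
            if (procs[j].node != procs[i].node) {
                continue; /* different node: does not affect either rank */
            }
            node++;
            if (procs[j].job == procs[i].job) {
                local++;
            }
        }
        procs[i].local_rank = local;
        procs[i].node_rank = node;
    }
}

/* Print one line per process. */
static void print_ranks(const proc_t *procs, int nprocs)
{
    printf("job  rank  node  local_rank  node_rank\n");
    for (int i = 0; i < nprocs; i++) {
        printf("%3d  %4d  %4d  %10d  %9d\n",
               procs[i].job, procs[i].rank, procs[i].node,
               procs[i].local_rank, procs[i].node_rank);
    }
}
```

To reproduce the example from the thread, fill an array with job 1 ranks 0-5 
mapped bynode (rank r on node r % 2), followed by job 2 ranks 0 and 1 on 
node0, then call compute_ranks and print_ranks on it. The quadratic loop is 
kept for clarity only.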
