Since not everyone may be familiar with them, and because things have evolved,
I thought it might help to provide a brief explanation of the ranks we assign
to processes in OMPI.

Each process has four "ranks" assigned to it at launch:

1. vpid - equivalent to its MPI rank within the job. You can access the vpid 
with ORTE_PROC_MY_NAME->vpid.

2. local_rank - the relative rank of the process, within its own job, on the 
local node. For example, if there are three processes from this job on the 
node, then the lowest vpid process would have local_rank=0, the next highest 
vpid process would have local_rank=1, etc. The local_rank is typically used by 
the shared memory subsystem to decide which proc will create the backing file.

Note that processes from dynamically spawned jobs on the node will have
overlapping local_ranks. For example, if a process in the above job were to
comm_spawn two more procs onto the node, the lower-vpid proc of those two would
also have local_rank=0, because the spawned procs belong to a different jobid.

Every process has full knowledge of the local_rank for every other process 
executing within that mpirun AND for any proc that connected to it via MPI 
connect/accept or comm_spawn (the info is included in the modex during the 
connect/accept procedure). You can obtain the local_rank of any process using

orte_local_rank_t orte_ess.get_local_rank(proc_name)

This will return ORTE_LOCAL_RANK_INVALID if the info isn't known.
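
To make the rule concrete, here is a minimal standalone sketch of how the
local_rank values described above fall out of the jobid/vpid pairs on a node.
It is only a model of the description, not ORTE code: demo_proc_t and
demo_local_rank are hypothetical names invented for the illustration, and the
real values are computed at launch and read back through orte_ess.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one process on a node; NOT an ORTE type. */
typedef struct {
    uint32_t jobid;
    uint32_t vpid;
} demo_proc_t;

/*
 * Model of the local_rank rule: count the procs on this node that belong
 * to the same job as procs[self] and have a lower vpid. Procs from other
 * jobs are ignored, which is why spawned jobs start again at 0.
 */
uint32_t demo_local_rank(const demo_proc_t *procs, size_t n, size_t self)
{
    assert(self < n);
    uint32_t rank = 0;
    for (size_t i = 0; i < n; i++) {
        if (procs[i].jobid == procs[self].jobid &&
            procs[i].vpid < procs[self].vpid) {
            rank++;
        }
    }
    return rank;
}
```

With three procs from job 1 and two spawned procs from job 2 on the same node,
the lowest-vpid proc of each job gets 0 from this model, which is the overlap
described above.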

3. node_rank - the relative rank of the process, spanning all jobs under this 
mpirun, on the local node. The node_rank is typically used by the OOB to select 
a static port from the given range, thus ensuring that each proc on the node - 
regardless of job - takes a unique port. For example, if there are three 
processes from this job on the node, then the lowest vpid process would have 
node_rank=0, the next highest vpid process would have node_rank=1, etc. If one
of them then comm_spawns another process onto the node, that process will have
node_rank=3, since the computation spans -all- jobs.

Every process has full knowledge of the node_rank for every other process 
executing within that mpirun AND for any proc that connected to it via MPI 
connect/accept or comm_spawn (the info is included in the modex during the 
connect/accept procedure). You can obtain the node_rank of any process using

orte_node_rank_t orte_ess.get_node_rank(proc_name)

This will return ORTE_NODE_RANK_INVALID if the info isn't known.
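
As a rough illustration of why a per-node rank spanning all jobs is handy for
static ports, here is a small standalone sketch. Everything in it is an
assumption made for the example: demo_node_t, demo_assign_node_rank and
demo_static_port are hypothetical names, the "next free rank" counter is just
one way to model the numbering described above, and the base-plus-node_rank
mapping is not taken from the actual OOB code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-node bookkeeping; NOT an ORTE type. */
typedef struct {
    uint32_t next_node_rank;   /* next unused node_rank on this node */
} demo_node_t;

/* Hand out node_ranks in launch order, regardless of which job asks. */
uint32_t demo_assign_node_rank(demo_node_t *node)
{
    return node->next_node_rank++;
}

/*
 * Assumed mapping from node_rank to a port in [first_port, last_port]:
 * each proc on the node takes first_port + node_rank. Returns -1 when
 * the range is too small for this node_rank.
 */
int demo_static_port(int first_port, int last_port, uint32_t node_rank)
{
    assert(first_port <= last_port);
    uint32_t span = (uint32_t)(last_port - first_port) + 1u;
    if (node_rank >= span) {
        return -1;
    }
    return first_port + (int)node_rank;
}
```

Because the counter is shared by every job on the node, a comm_spawned proc
picks up the next value after the original procs, so no two procs on the node
map to the same port in this model.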

4. app_rank - the relative rank of the process within its app_context. This 
equates to the vpid for a job that contains only one app_context. However, for 
jobs with multiple app_contexts, this value provides a way of determining a 
proc's rank solely within its own app_context. Each process only has access to 
its own app_rank in orte_process_info - it doesn't have any knowledge of the 
app_rank for other processes.
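
For a feel of how app_rank relates to vpid, here is a minimal standalone
sketch. It ASSUMES that vpids are handed out contiguously, one app_context
after another, in the order the app_contexts were given; that layout is an
assumption of the example, not something stated above. demo_app_rank is a
hypothetical helper, not an ORTE function. A real process should simply read
its own app_rank from orte_process_info.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Given the number of procs in each app_context (app_sizes) and a vpid,
 * find which app_context the vpid falls in and its rank within that
 * app_context. ASSUMES contiguous vpid blocks per app_context.
 * Returns 0 on success, -1 if the vpid is beyond the end of the job.
 */
int demo_app_rank(const uint32_t *app_sizes, size_t num_apps, uint32_t vpid,
                  size_t *app_idx, uint32_t *app_rank)
{
    assert(app_sizes != NULL && app_idx != NULL && app_rank != NULL);
    uint32_t first = 0;   /* vpid of the first proc in the current app */
    for (size_t i = 0; i < num_apps; i++) {
        if (vpid < first + app_sizes[i]) {
            *app_idx = i;
            *app_rank = vpid - first;
            return 0;
        }
        first += app_sizes[i];
    }
    return -1;
}
```

With a single app_context the result is just the vpid, as noted above; with
two app_contexts of 2 and 3 procs, vpid 3 comes out as app_rank 1 of the second
app_context under this assumed layout.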

HTH
Ralph

