On Oct 19, 2011, at 5:05 PM, George Bosilca wrote:

> Wonderful!!! We've been waiting for such functionality for a while.

My pleasure :-)

> 
> I do have some questions/remarks related to this patch.
> 
> What is the my_node_rank in the orte_proc_info_t structure?

The node rank is a local ranking of the procs on a node: the lowest vpid on the 
node gets rank 0, the next gets 1, and so on. It was previously passed in the 
environment and picked up in the ess components so it could be used to select a 
static port during oob init, if static ports were specified.
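
FWIW, here's roughly the kind of use it gets - a minimal sketch only, assuming 
the global orte_process_info instance carries the my_node_rank field as 
described above; the base_port parameter is purely illustrative:

#include "orte/util/proc_info.h"   /* orte_proc_info_t / orte_process_info */

/* Illustrative sketch: offset a static base port by the node rank, the
 * kind of thing the oob does with this value.  base_port is hypothetical. */
static int select_static_port(int base_port)
{
    /* lowest vpid on the node is node rank 0, the next is 1, and so on,
     * so each local app proc ends up on a distinct port */
    return base_port + (int) orte_process_info.my_node_rank;
}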

I moved it to a more general place solely because I wanted to consolidate a bunch 
of replicated code into the ess/base instead of having it in nearly every module. 
I debated putting it in ess/base.h instead, but since other places in the code 
might also want it, I figured I'd make it more globally available.

If it turns out nobody needs it, we can move it back into just the ess.

> Is there any difference between using the field my_node_rank or the vpid part 
> of the my_daemon?

Yes - my_daemon refers to the local daemon. The node rank refers solely to the 
relative ranking of application procs on the node.

> What is the correct way of finding that two processes are on the same remote 
> location, comparing their daemon vpid or their node_rank?

Daemon vpid
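
Something along these lines - a sketch only, assuming the ess exposes a 
proc_get_daemon() entry point that returns ORTE_VPID_INVALID for an unknown 
proc (check ess.h for the exact signature and header paths):

#include <stdbool.h>
#include "orte/types.h"            /* orte_process_name_t, orte_vpid_t */
#include "orte/mca/ess/ess.h"      /* orte_ess entry points (assumed path) */

/* Sketch: decide co-location by comparing the vpid of each proc's daemon. */
static bool procs_share_node(orte_process_name_t *a, orte_process_name_t *b)
{
    orte_vpid_t da = orte_ess.proc_get_daemon(a);
    orte_vpid_t db = orte_ess.proc_get_daemon(b);

    if (ORTE_VPID_INVALID == da || ORTE_VPID_INVALID == db) {
        return false;   /* unknown proc - treat as not co-located */
    }
    return da == db;    /* same daemon => same node */
}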

> How the node_rank change with respect to dynamic process management when new 
> daemons are joining?

This is where node_rank comes into play. The mapper has visibility across all 
jobs sharing a node, so it is currently responsible for computing the node_rank 
of each proc. That info is transmitted to all daemons, including newly started 
dynamic ones, in the launch message, so everyone always has a current picture 
of the node_rank of every proc.

> 
> The flag OPAL_PROC_ON_L*CACHE is only set for local processes if I understand 
> correctly your last email?

Yes - all the locality flags refer only to the location of another process 
relative to you (an app process). As I said, though, this could easily be 
extended to return the relative locality of two procs on a remote node, if 
that would be of use.
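
For example - a sketch under the flag names discussed in this thread 
(OPAL_PROC_ON_NODE, OPAL_PROC_ON_L2CACHE, etc.); the exact spellings and 
header locations are assumptions, so check the tree:

#include "opal/util/output.h"
#include "opal/mca/paffinity/paffinity.h"  /* opal_paffinity_locality_t (assumed) */
#include "orte/types.h"                    /* orte_process_name_t */
#include "orte/mca/ess/ess.h"              /* orte_ess.proc_get_locality() */

/* Sketch: report how a peer proc in the job relates to me. */
static void report_locality(orte_process_name_t *peer)
{
    opal_paffinity_locality_t loc = orte_ess.proc_get_locality(peer);

    if (loc & OPAL_PROC_ON_NODE) {
        opal_output(0, "peer shares my node");
        if (loc & OPAL_PROC_ON_L2CACHE) {
            opal_output(0, "...and we share an L2 cache");
        }
    } else {
        opal_output(0, "peer is on a remote node");
    }
}

If the peer is local and unbound, you'd expect all of those bits to be set, per 
the original mail below.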

> 
> I guess proc_flags in proc.h should be opal_paffinity_locality_t to match the 
> flags on the ORTE level?

My bad - I thought I had changed it? If not, it certainly needs to be...

> 
> A more high level remark. The fact that the locality information is 
> automatically packed and exchanged during the grpcomm modex call seems a 
> little bit weird (do the upper level have a saying on it?). I would not have 
> thought that the grpcomm (which based on the grpcomm.h header file is a 
> framework providing communication services that span entire jobs or 
> collections of processes) is the place to put it.

I agree - I wasn't entirely sure where to put it, frankly. It needs to be 
somewhere that both direct-launched and mpirun-launched apps can see it. It 
could go in the MPI layer, I suppose.

Suggestions welcome!


> 
> Thanks,
>  george.
> 
> 
> On Oct 19, 2011, at 16:28 , Ralph Castain wrote:
> 
>> Hi folks
>> 
>> For those of you who don't follow the commits...
>> 
>> I just committed (r25323) an extension of the orte_ess.proc_get_locality 
>> function that allows a process to get its relative resource usage with any 
>> other proc in the job. In other words, you can provide a process name to the 
>> function, and the returned bitmask tells you if you share a node, numa, 
>> socket, caches (by level), core, and hyperthread with that process.
>> 
>> If you are on the same node and unbound, of course, you share all of those. 
>> However, if you are bound, then this can help tell you if you are on a 
>> common numa node, sharing an L1 cache, etc. Might be handy.
>> 
>> I implemented the underlying functionality so that we can further extend it 
>> to tell you the relative resource location of two procs on a remote node. If 
>> that someday becomes of interest, it would be relatively easy to do - but 
>> would require passing more info around. Hence, I've allowed for it, but not 
>> implemented it until there is some identified need.
>> 
>> Locality info is available any time after the modex completes during 
>> MPI_Init, and is supported regardless of launch environment (minus cnos, for 
>> now), whether launched by mpirun or direct-launched - in other words, pretty 
>> much always.
>> 
>> Hope it proves helpful in your work
>> Ralph
>> 
>> 