As a follow-up to the developers' meeting in Stuttgart, here is an RFC for our topology detection framework.

WHAT: Add a framework for hardware topology detection that any other part of Open MPI can use to guide optimizations.

WHY: Collective operations and shared memory algorithms, among others, can be optimized according to the hardware relationship between two MPI processes. HiTopo is an attempt to provide this information in a unified manner.

WHERE: ompi/mca/hitopo/

WHEN: When wanted.

==========================================================================
We developed the HiTopo framework for our collective operation component, but it may be useful for other parts of Open MPI, so we'd like to contribute it.

A wiki page has been set up:
https://svn.open-mpi.org/trac/ompi/wiki/HiTopo

and a Bitbucket repository:
http://bitbucket.org/jeaugeys/hitopo/

In a few words, HiTopo works in 3 steps:

 - Detection: each MPI process detects its topology at various levels:
    - core/socket: through the cpuid component
    - node: through gethostname
    - switch/island: through openib (mad) or slurm
      [Other topology detection components may be added for other
       resource managers, specific hardware, or anything else.]

 - Collection: an allgather is performed so that each process has the addresses of all other processes.

 - Renumbering: "string" addresses are converted to numbers starting at 0 (example: node names "foo" and "bar" are renumbered 0 and 1).
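
To illustrate the renumbering step, here is a minimal sketch in plain C. It is not the HiTopo code: the function name renumber_names and its signature are hypothetical, and it assumes ids are handed out in order of first appearance in the gathered list, which is consistent with the "foo"/"bar" example above but may differ from what the framework actually does.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the HiTopo API): give each distinct
 * name an id starting at 0, in order of first appearance.
 *   names: array of `count` NUL-terminated strings, e.g. the node names
 *          gathered from all processes
 *   ids:   output array of `count` ints, ids[i] is the number for names[i]
 * Returns the number of distinct names.
 * The search is O(count^2), which is enough for a sketch. */
int renumber_names(const char *const *names, size_t count, int *ids)
{
    int next_id = 0;

    for (size_t i = 0; i < count; i++) {
        int found = -1;

        /* Reuse the id of an earlier identical name, if there is one. */
        for (size_t j = 0; j < i; j++) {
            if (strcmp(names[i], names[j]) == 0) {
                found = ids[j];
                break;
            }
        }

        /* Otherwise this is a new name: assign the next free number. */
        ids[i] = (found >= 0) ? found : next_id++;
    }

    return next_id;
}
```

Since every process holds the same gathered list after the collection step, each one can run this locally and obtain the same numbering without further communication.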

Any comments welcome,
Sylvain
