As a follow-up to the Stuttgart developers' meeting, here is an RFC for our
topology detection framework.
WHAT: Add a framework for hardware topology detection to be used by any
other part of Open MPI to help optimization.
WHY: Collective operations and shared memory algorithms, among others, may
be optimized depending on the hardware relationship between two MPI
processes. HiTopo is an attempt to provide this information in a unified
manner.
WHERE: ompi/mca/hitopo/
WHEN: When wanted.
==========================================================================
We developed the HiTopo framework for our collective operation component,
but it may be useful for other parts of Open MPI, so we'd like to
contribute it.
A wiki page has been set up:
https://svn.open-mpi.org/trac/ompi/wiki/HiTopo
and a Bitbucket repository:
http://bitbucket.org/jeaugeys/hitopo/
In a few words, HiTopo works in 3 steps:
 - Detection: each MPI process detects its topology at various levels:
    - core/socket: through the cpuid component
    - node: through gethostname
    - switch/island: through openib (mad) or slurm
   [ Other topology detection components may be added for other
     resource managers, specific hardware or whatever we want ...]
 - Collection: an allgather is performed so that each process has all
   other processes' addresses
 - Renumbering: "string" addresses are converted to numbers starting at 0
   (Example: nodenames "foo" and "bar" are renumbered 0 and 1).
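
To make the renumbering step concrete, here is a minimal sketch in C. It is
not taken from the HiTopo code: the function name renumber_addresses and its
signature are hypothetical, and assigning numbers in order of first
appearance is an assumption (the actual ordering may differ). The input is
the array of string addresses as it would look after the allgather, one
entry per rank.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of HiTopo: convert string addresses into
 * dense integer IDs starting at 0.
 *
 *   addrs[i]  string address of rank i (e.g. its node name)
 *   n         number of ranks
 *   ids[i]    receives the number assigned to addrs[i]
 *
 * Assumption: IDs are handed out in order of first appearance.
 * Returns the number of distinct addresses found.
 * Quadratic in n, which is acceptable for a sketch. */
int renumber_addresses(const char *const *addrs, size_t n, int *ids)
{
    int next_id = 0;

    for (size_t i = 0; i < n; i++) {
        size_t j;

        /* Look for an earlier rank with the same address. */
        for (j = 0; j < i; j++) {
            if (strcmp(addrs[i], addrs[j]) == 0) {
                break;
            }
        }

        /* Reuse the earlier ID if found, otherwise allocate a new one. */
        ids[i] = (j < i) ? ids[j] : next_id++;
    }

    return next_id;
}
```

With the node names { "foo", "bar", "foo" }, this sketch fills ids with
{ 0, 1, 0 } and returns 2, which matches the "foo"/"bar" example above. Two
processes then share a level (here, the node) exactly when their numbers at
that level are equal.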
Any comments are welcome,
Sylvain