On Mon, 2011-02-28 at 22:17 +0100, Brice Goglin wrote:
> On 28/02/2011 22:04, Jeff Squyres wrote:
>> That being said, someone cited on this list a long time ago that running
>> the hwloc detection on very large machines (e.g., SGI machines with 1000+
>> cores) takes on the order of seconds (because it traverses /sys, etc.). So
>> if you want your tool to be used on machines like that, then it might be
>> better to do the discovery once and share that data among your threads.
>
> People running on such large machines should really export the machine
> topology to XML once and reload from there all the time.
Btw. lstopo on such a large machine (64 NUMA nodes, 1024 logical CPUs) takes
about 0.6 seconds at our site. This is acceptable for scripts that run only
infrequently. It is also acceptable for executables that need the topology
info at start time (e.g. the pbs_mom of Torque). For calculating
topology-based pinning schemes and doing process pinning (as done e.g. by
OpenMPI or MVAPICH2) this is too long, because every process (MPI task) or
thread loads the topology in parallel. But exporting an XML topology and
using it for this purpose is unacceptable when Linux cpusets are in use,
because one then needs the topology of a subset of the machine, depending on
the caller's context. What we currently do is let only one process per
machine load the topology, and distribute the essentials needed for pinning
to the remaining processes.

BK

> Brice
>
> _______________________________________________
> hwloc-devel mailing list
> [email protected]
> http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-devel

--
Dr. Bernd Kallies
Konrad-Zuse-Zentrum für Informationstechnik Berlin
Takustr. 7
14195 Berlin
Tel: +49-30-84185-270
Fax: +49-30-84185-311
e-mail: [email protected]
