Aside from the idea of saving the topology to an XML file before running the job, you could also:
* have rank 0 load the topology as usual
* have rank 0 export it to an XML buffer (hwloc_topology_export_xmlbuffer()) and MPI_Bcast() that buffer to the other ranks
* have those ranks load their hwloc topology from the received XML buffer (hwloc_topology_set_xmlbuffer()).
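A minimal, untested sketch of that scheme (assuming the hwloc 1.x API, where hwloc_topology_export_xmlbuffer() takes no flags argument; error handling omitted):

#include <stdlib.h>
#include <mpi.h>
#include <hwloc.h>

int main(int argc, char *argv[])
{
    hwloc_topology_t topo;
    char *xmlbuf = NULL;
    int xmllen = 0, rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    hwloc_topology_init(&topo);

    if (rank == 0) {
        /* Only rank 0 pays for the actual hardware discovery... */
        hwloc_topology_load(topo);
        /* ...and serializes it into an in-memory XML buffer. */
        hwloc_topology_export_xmlbuffer(topo, &xmlbuf, &xmllen);
    }

    /* Ship the buffer length first so the other ranks can allocate. */
    MPI_Bcast(&xmllen, 1, MPI_INT, 0, MPI_COMM_WORLD);
    if (rank != 0)
        xmlbuf = malloc(xmllen);
    MPI_Bcast(xmlbuf, xmllen, MPI_CHAR, 0, MPI_COMM_WORLD);

    if (rank != 0) {
        /* Rebuild the topology from XML instead of rescanning /proc and /sys. */
        hwloc_topology_set_xmlbuffer(topo, xmlbuf, xmllen);
        hwloc_topology_load(topo);
        free(xmlbuf);
    } else {
        hwloc_free_xmlbuffer(topo, xmlbuf);
    }

    /* ... query topo as usual ... */

    hwloc_topology_destroy(topo);
    MPI_Finalize();
    return 0;
}

Note that a topology loaded from XML is not considered "this system" by default, so ranks that need to bind should set the HWLOC_TOPOLOGY_FLAG_IS_THISSYSTEM topology flag (or HWLOC_THISSYSTEM=1 in the environment) before calling hwloc_topology_load().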
Brice

On 06/03/2013 03:53, Hammond, Simon David (-EXP) wrote:
> Hey Jeff,
>
> It's not in OpenMPI or MPICH :(. It's a custom library which is not
> MPI-aware, making it difficult to share the topology query. I'll see if
> we can get a standalone piece of code.
>
> From earlier posts it sounds like OpenMPI queries once per physical
> node, so it probably won't have this problem. I'm guessing MPICH would do
> something similar?
>
> S.
>
> -----Original Message-----
> From: Jeff Hammond [jhamm...@alcf.anl.gov]
> Sent: Tuesday, March 05, 2013 07:17 PM Mountain Standard Time
> To: Hardware locality user list
> Subject: [EXTERNAL] Re: [hwloc-users] Many queries creating slow performance
>
> Si - Is your code that calls hwloc part of MPICH or OpenMPI, or
> something that can be made standalone and shared?
>
> Brice - Do you have access to a MIC system for testing? Write me
> offline if you don't and I'll see what I can do to help.
>
> If this affects MPICH, i.e. Hydra, then I'm sure Intel will be
> committed to helping fix it, since Intel MPI is using Hydra as the
> launcher on systems like Stampede.
>
> Best,
>
> Jeff
>
> On Tue, Mar 5, 2013 at 3:05 PM, Brice Goglin <brice.gog...@inria.fr> wrote:
> > Just tested on a 96-core shared-memory machine, running "mpiexec lstopo"
> > with OpenMPI 1.6. Here's the execution time (mpiexec launch time is
> > 0.2-0.4s):
> >
> >  1 rank : 0.2s
> >  8 ranks: 0.3-0.5s depending on binding (packed or scatter)
> > 24 ranks: 0.8-3.7s depending on binding
> > 48 ranks: 2.8-8.0s depending on binding
> > 96 ranks: 14.2s
> >
> > 96 ranks from a single XML file: 0.4s (negligible against the mpiexec
> > launch time)
> >
> > Brice
> >
> > On 05/03/2013 20:23, Simon Hammond wrote:
> >
> > Hi hwloc users,
> >
> > We are seeing some significant performance problems using hwloc 1.6.2 on
> > Intel's MIC products. In one of our configurations we create 56 MPI ranks;
> > each rank then queries the topology of the MIC card before creating
> > threads. We are noticing that if we run 56 MPI ranks as opposed to one,
> > the calls to query the topology in hwloc are very slow: runtime goes from
> > seconds to minutes (and upwards).
> >
> > We guessed that this might be caused by the kernel serializing access to
> > the /proc filesystem, but this is just a hunch.
> >
> > Has anyone had this problem and found an easy way to change the library /
> > calls to hwloc so that the slowdown is not experienced? Would you describe
> > this as a bug?
> >
> > Thanks for your help.
> >
> > --
> > Simon Hammond
> >
> > 1-(505)-845-7897 / MS-1319
> > Scalable Computer Architectures
> > Sandia National Laboratories, NM
>
> --
> Jeff Hammond
> Argonne Leadership Computing Facility
> University of Chicago Computation Institute
> jhamm...@alcf.anl.gov / (630) 252-5381
> http://www.linkedin.com/in/jeffhammond
> https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond