On 08/09/2016 19:17, Brice Goglin wrote:
>
>> By the way, is it expected that binding will be slow on it? hwloc-bind
>> is ~10 times slower (~1s) than on two-socket sandybridge, and ~3 times
>> slower than on a 128-core, 16-socket system.
> Binding itself shouldn't be slower. But hwloc's topology discovery
> (which is performed by hwloc-bind before actual binding) is slower on
> KNL than on "normal" nodes. The overhead is basically linear with the
> number of hyperthreads, and KNL sequential perf is lower than your other
> nodes.
>
> The easy fix is to export the topology to XML with lstopo foo.xml and
> then tell all hwloc users to load from XML:
>     export HWLOC_XMLFILE=foo.xml
>     export HWLOC_THISSYSTEM=1
> https://www.open-mpi.org/projects/hwloc/doc/v1.11.4/a00030.php#faq_xml
>
> For hwloc 2.0, I am trying to make sure we don't perform useless
> discovery steps. hwloc-bind (and many applications) don't require all
> topology details. v1.x gathers everything and filters things out later.
> For 2.0, the plan is rather to directly just gather what we need. What
> you can try for fun is:
>     export HWLOC_COMPONENTS=-x86
> (without the above XML env vars)
> It disables the x86-specific discovery which is useless for most cases
> on Linux.
>

Interesting, this last idea doesn't help. XML is much faster (0.14s),
but normal discovery is still 1s without the x86-specific code. So
what's really slow is reading sysfs and/or inserting all hwloc objects
in the tree. I need to do some profiling. And I am moving the item
"parallelize the discovery" higher in the TODO list :)

Brice

_______________________________________________
hwloc-users mailing list
hwloc-users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/hwloc-users
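
For reference, the XML workaround quoted above could be wrapped in a few shell helpers along these lines. This is only a sketch: the function names and the cache path in the usage comments are made up for illustration, and only lstopo, hwloc-bind, HWLOC_XMLFILE and HWLOC_THISSYSTEM come from the message itself.

```shell
#!/bin/sh
# Sketch only: helper names below are hypothetical, not part of hwloc.

# Export the topology to XML once; skip the (slow) discovery if the
# cache file already exists.
hwloc_cache_topology() {
    xml=$1
    [ -e "$xml" ] || lstopo "$xml"
}

# Make hwloc-based tools started from this shell load the cached XML.
# HWLOC_THISSYSTEM=1 asserts that the XML describes the current machine,
# so that binding still works.
hwloc_use_xml() {
    HWLOC_XMLFILE=$1
    HWLOC_THISSYSTEM=1
    export HWLOC_XMLFILE HWLOC_THISSYSTEM
}

# Go back to normal (native) discovery.
hwloc_use_native() {
    unset HWLOC_XMLFILE HWLOC_THISSYSTEM
}

# Possible usage (path is an example):
#   hwloc_cache_topology "$HOME/knl.xml" && hwloc_use_xml "$HOME/knl.xml"
#   time hwloc-bind core:0 -- true     # should now skip native discovery
#   hwloc_use_native
#   time hwloc-bind core:0 -- true     # native discovery again, for comparison
```

The cached file has to be regenerated whenever the hardware or its configuration changes, since hwloc will trust whatever the XML says.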