I guess the next step would be to look at how these tasks are placed on the machine. There are 8 NUMA nodes on the machine. Maybe np=9 is where it starts placing a second task on a NUMA node? For Open MPI, passing --report-bindings to mpirun may help, since it reports where each process is bound. I am not sure about MPICH.
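To make that guess concrete, here is a minimal Python sketch that puts the speedups quoted below next to the number of tasks the busiest NUMA node would hold. It assumes round-robin placement across the 8 NUMA nodes, which is only a hypothesis; the real placement depends on the MPI launcher and its binding options. The helper names (`max_tasks_per_node`, `efficiency`) are made up for illustration.

```python
# Speedups reported in the quoted message below (np -> speedup).
SPEEDUP = {
    1: 1.0, 2: 1.99, 3: 2.98, 4: 3.98, 5: 4.89,
    6: 5.9, 7: 6.89, 8: 7.87, 9: 5.44, 10: 6.04,
    11: 6.55, 12: 7.0, 13: 7.75, 14: 8.24, 15: 8.41,
    16: 9.4, 17: 7.33, 18: 7.16, 19: 8.05, 20: 8.39,
}

NUMA_NODES = 8  # number of NUMA nodes stated in this thread


def max_tasks_per_node(nprocs, nodes=NUMA_NODES):
    """Tasks on the busiest node, ASSUMING round-robin placement."""
    return -(-nprocs // nodes)  # ceiling division


def efficiency(nprocs):
    """Parallel efficiency: measured speedup divided by process count."""
    return SPEEDUP[nprocs] / nprocs


if __name__ == "__main__":
    print("np  tasks/node  speedup  efficiency")
    for nprocs in sorted(SPEEDUP):
        print(f"{nprocs:2d}  {max_tasks_per_node(nprocs):10d}  "
              f"{SPEEDUP[nprocs]:7.2f}  {efficiency(nprocs):10.2f}")
```

In the reported numbers, the two drops (8 to 9 and 16 to 17) fall exactly where this assumed placement would add another task to one node. That is consistent with the guess but does not prove it; checking the actual bindings is still needed.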
Brice

On 27/10/2015 15:52, Fabian Wein wrote:
> On 10/27/2015 03:42 PM, Brice Goglin wrote:
>> I guess the problem is that your OMPI uses an old hwloc internally. That
>> one may be too old to understand recent XML exports.
>> Try replacing "Package" with "Socket" everywhere in the XML file.
>
> Thanks! That was it.
>
> I now get almost perfectly reproducible results.
>
> np  speedup
>  1  1.0
>  2  1.99
>  3  2.98
>  4  3.98
>  5  4.89
>  6  5.9
>  7  6.89
>  8  7.87
>  9  5.44
> 10  6.04
> 11  6.55
> 12  7.0
> 13  7.75
> 14  8.24
> 15  8.41
> 16  9.4
> 17  7.33
> 18  7.16
> 19  8.05
> 20  8.39
>
> What still puzzles me is the almost perfect speedup up to eight and
> then the drop down. But for the beginning 8 is already good!
>
> Thanks again,
>
> Fabian