On 29/12/2017 at 23:15, Bill Broadley wrote:
> Very interesting, I was running parallel finite element code and was seeing
> great performance compared to Intel in most cases, but on larger runs it was
> 20x slower. This would explain it.
>
> Do you know which commit, or anything else that might help find any related
> discussion? I tried a few google searches without luck.
>
> Is it specific to the 24-core? The slowdown I described happened on a 32 core
> Epyc single socket as well as a dual socket 24 core AMD Epyc system.

Hello

Yes, it's 24-core specific (that's the only core count that doesn't have
8 cores per Zeppelin module).

The commit in Linux git master is 2b83809a5e6d619a780876fcaf68cdc42b50d28c

Brice


commit 2b83809a5e6d619a780876fcaf68cdc42b50d28c
Author: Suravee Suthikulpanit <suravee.suthikulpa...@amd.com>
Date:   Mon Jul 31 10:51:59 2017 +0200

    x86/cpu/amd: Derive L3 shared_cpu_map from cpu_llc_shared_mask

    For systems with X86_FEATURE_TOPOEXT, current logic uses the APIC ID
    to calculate shared_cpu_map. However, APIC IDs are not guaranteed to
    be contiguous for cores across different L3s (e.g. family17h system
    w/ downcore configuration). This breaks the logic, and results in an
    incorrect L3 shared_cpu_map.

    Instead, always use the previously calculated cpu_llc_shared_mask of
    each CPU to derive the L3 shared_cpu_map.

_______________________________________________
hwloc-users mailing list
hwloc-users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/hwloc-users
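
To see what L3 sharing the running kernel actually reports, the shared_cpu_map that the commit above refers to is exposed through sysfs. The following is only a rough sketch, not something from this thread: it assumes the usual Linux layout /sys/devices/system/cpu/cpuN/cache/indexM/{level,shared_cpu_list}, and the helper names (parse_cpulist, group_by_l3, read_l3_shared_lists) are made up for illustration. It groups CPUs by the L3 sharing set they report, so the result can be compared against what the hardware is expected to look like.

```python
import glob
import os
import re


def parse_cpulist(text):
    """Expand a kernel cpulist string such as "0-2,24-26" into a sorted list."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def group_by_l3(shared_lists):
    """Map each distinct L3 sharing set to the CPUs that report it.

    shared_lists is a dict: CPU number -> contents of its shared_cpu_list.
    """
    groups = {}
    for cpu in sorted(shared_lists):
        key = tuple(parse_cpulist(shared_lists[cpu]))
        groups.setdefault(key, []).append(cpu)
    return groups


def read_l3_shared_lists(sysfs_root="/sys/devices/system/cpu"):
    """Collect shared_cpu_list for every level-3 cache entry found in sysfs."""
    result = {}
    pattern = os.path.join(sysfs_root, "cpu[0-9]*", "cache", "index*", "level")
    for level_path in glob.glob(pattern):
        with open(level_path) as f:
            if f.read().strip() != "3":
                continue  # not an L3 entry
        cpu = int(re.search(r"cpu(\d+)", level_path).group(1))
        shared_path = os.path.join(os.path.dirname(level_path), "shared_cpu_list")
        with open(shared_path) as f:
            result[cpu] = f.read()
    return result


if __name__ == "__main__":
    # Prints nothing on systems that do not expose this sysfs layout.
    for members, reporters in sorted(group_by_l3(read_l3_shared_lists()).items()):
        print(f"L3 shared by {len(members)} CPUs: {list(members)}")
```

On a kernel without the fix, the groups printed on a 24-core Epyc may not match the real L3 boundaries. Groups of unequal size would be one possible hint, but comparing against the expected topology is the more reliable check.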