On 09/05/2016 23:58, Mehmet Belgin wrote:
> Greetings!
>
> We've been receiving this error for a while on our 64-core Interlagos
> AMD machines:
>
> ****************************************************************************
> * hwloc has encountered what looks like an error from the operating system.
> *
> * Socket (P#2 cpuset 0x0000ffff,0x0) intersects with NUMANode (P#3
> * cpuset 0x0000ff00,0xff000000) without inclusion!
> * Error occurred in topology.c line 940
> *
> * Please report this error message to the hwloc user's mailing list,
> * along with the output+tarball generated by the hwloc-gather-topology script.
> ****************************************************************************
>
> I've found some information in the hwloc list archives mentioning this
> is due to buggy AMD platform and the impact should be limited to hwloc
> missing L3 cache info (thanks Brice). If that's the case and processor
> representation is correct then I am sure we can live with this, but I
> still wanted to check with the list to confirm that (1) this is really
> harmless and (2) are there any known solutions other than upgrading
> BIOS/kernel?

Hello

The L3 bug only applies to 12-core Opteron 62xx/63xx, while you have
16-core Opterons. Your L3 locality is correct, but your NUMA locality
is wrong:

  $ cat sys/devices/system/node/node*/cpumap
  00000000,00ffffff
  0000ff00,ff000000
  000000ff,00000000
  ffff0000,00000000

You should have something like this instead:

  00000000,0000ffff
  00000000,ffff0000
  0000ffff,00000000
  ffff0000,00000000

This bug is not harmless since memory buffers have a good chance of
being physically allocated far away from your cores.

This is more likely a BIOS bug. Try upgrading.

Regards
Brice
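
For reference, the condition hwloc complains about can be checked by hand from the cpumap values above. The following is a minimal sketch in Python, not hwloc's own code. It assumes each cpumap line is a comma-separated list of 32-bit hexadecimal words with the most significant word first, and it takes the socket cpuset from the error message. The helper names (parse_cpumap, intersects_without_inclusion, conflicting_nodes) are hypothetical.

```python
def parse_cpumap(text):
    """Turn a cpumap string such as '0000ff00,ff000000' into an int bitmask.

    Assumption: comma-separated 32-bit hex words, most significant first.
    """
    mask = 0
    for word in text.strip().split(","):
        mask = (mask << 32) | int(word, 16)
    return mask


def intersects_without_inclusion(a, b):
    """True if the two CPU sets overlap but neither contains the other."""
    common = a & b
    return common != 0 and common != a and common != b


def conflicting_nodes(socket_mask, node_maps):
    """Return the list positions of node cpumaps that conflict with the socket.

    Positions are indexes into node_maps, not necessarily the physical
    (P#) numbers that hwloc prints.
    """
    return [
        i
        for i, text in enumerate(node_maps)
        if intersects_without_inclusion(socket_mask, parse_cpumap(text))
    ]


# Socket P#2 cpuset from the error message: 0x0000ffff,0x0 (CPUs 32-47).
socket = parse_cpumap("0000ffff,00000000")

# NUMA node cpumaps as reported by the machine.
reported = [
    "00000000,00ffffff",
    "0000ff00,ff000000",
    "000000ff,00000000",
    "ffff0000,00000000",
]

# NUMA node cpumaps as they should look.
expected = [
    "00000000,0000ffff",
    "00000000,ffff0000",
    "0000ffff,00000000",
    "ffff0000,00000000",
]

reported_conflicts = conflicting_nodes(socket, reported)
expected_conflicts = conflicting_nodes(socket, expected)

print("reported:", reported_conflicts)
print("expected:", expected_conflicts)
```

With the reported values, the node whose cpumap is 0000ff00,ff000000 shares CPUs 40-47 with the socket but also covers CPUs 24-31 outside it, which is the "intersects without inclusion" case. With the expected values, every node is either identical to the socket or disjoint from it, so no conflict is found.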