The hwloc version is unlikely to matter for this hardware bug. Since your hardware/BIOS looks buggy, there isn't much we can do about it, except create a valid XML topology that you could force hwloc to use instead of the normal hardware-based discovery.
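The XML-override workaround suggested above can be sketched roughly as follows. `HWLOC_XMLFILE` and `HWLOC_THISSYSTEM` are real hwloc environment variables; the file path and the hand-editing step are just an illustration, not a tested recipe for this machine:

```shell
# Export the (broken) discovered topology to XML.
lstopo /tmp/topo.xml

# ... hand-edit /tmp/topo.xml so each L3 cpuset matches its NUMA node ...

# Tell hwloc to load the fixed XML instead of re-discovering the hardware,
# and that the XML really describes this machine (so binding still works).
export HWLOC_XMLFILE=/tmp/topo.xml
export HWLOC_THISSYSTEM=1
lstopo    # should now show the corrected topology
```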
Brice

On 11/06/2014 21:16, Yury Vorobyov wrote:
> I do not see a big difference... This time I used the upstream release
> of hwloc (not the git live version).
>
> $ lstopo
> ****************************************************************************
> * hwloc has encountered what looks like an error from the operating system.
> *
> * L3 (P#6 cpuset 0x000003f0) intersects with NUMANode (P#0 cpuset
> * 0x0000003f) without inclusion!
> * Error occurred in topology.c line 940
> *
> * Please report this error message to the hwloc user's mailing list,
> * along with the output from the hwloc-gather-topology script.
> ****************************************************************************
> Machine
>   Socket L#0
>     NUMANode L#0 (P#0)
>       L3 L#0 (6144KB)
>         L2 L#0 (2048KB) + L1i L#0 (64KB)
>           L1d L#0 (16KB) + Core L#0 + PU L#0 (P#0)
>           L1d L#1 (16KB) + Core L#1 + PU L#1 (P#1)
>         L2 L#1 (2048KB) + L1i L#1 (64KB)
>           L1d L#2 (16KB) + Core L#2 + PU L#2 (P#2)
>           L1d L#3 (16KB) + Core L#3 + PU L#3 (P#3)
>         L2 L#2 (2048KB) + L1i L#2 (64KB)
>           L1d L#4 (16KB) + Core L#4 + PU L#4 (P#4)
>           L1d L#5 (16KB) + Core L#5 + PU L#5 (P#5)
>     NUMANode L#1 (P#1)
>       L2 L#3 (2048KB) + L1i L#3 (64KB)
>         L1d L#6 (16KB) + Core L#6 + PU L#6 (P#6)
>         L1d L#7 (16KB) + Core L#7 + PU L#7 (P#7)
>       L2 L#4 (2048KB) + L1i L#4 (64KB)
>         L1d L#8 (16KB) + Core L#8 + PU L#8 (P#8)
>         L1d L#9 (16KB) + Core L#9 + PU L#9 (P#9)
>       L3 L#1 (6144KB) + L2 L#5 (2048KB) + L1i L#5 (64KB)
>         L1d L#10 (16KB) + Core L#10 + PU L#10 (P#10)
>         L1d L#11 (16KB) + Core L#11 + PU L#11 (P#11)
>   Socket L#1
>     NUMANode L#2 (P#2)
>       L3 L#2 (6144KB) + L2 L#6 (2048KB) + L1i L#6 (64KB)
>         L1d L#12 (16KB) + Core L#12 + PU L#12 (P#12)
>         L1d L#13 (16KB) + Core L#13 + PU L#13 (P#13)
>       L2 L#7 (2048KB) + L1i L#7 (64KB)
>         L1d L#14 (16KB) + Core L#14 + PU L#14 (P#14)
>         L1d L#15 (16KB) + Core L#15 + PU L#15 (P#15)
>       L2 L#8 (2048KB) + L1i L#8 (64KB)
>         L1d L#16 (16KB) + Core L#16 + PU L#16 (P#16)
>         L1d L#17 (16KB) + Core L#17 + PU L#17 (P#17)
>     NUMANode L#3 (P#3)
>       L2 L#9 (2048KB) + L1i L#9 (64KB)
>         L1d L#18 (16KB) + Core L#18 + PU L#18 (P#18)
>         L1d L#19 (16KB) + Core L#19 + PU L#19 (P#19)
>       L3 L#3 (6144KB)
>         L2 L#10 (2048KB) + L1i L#10 (64KB)
>           L1d L#20 (16KB) + Core L#20 + PU L#20 (P#20)
>           L1d L#21 (16KB) + Core L#21 + PU L#21 (P#21)
>         L2 L#11 (2048KB) + L1i L#11 (64KB)
>           L1d L#22 (16KB) + Core L#22 + PU L#22 (P#22)
>           L1d L#23 (16KB) + Core L#23 + PU L#23 (P#23)
>   HostBridge L#0
>     PCIBridge
>       PCI 10de:0f00
>     PCIBridge
>       PCI 8086:10d3
>     PCIBridge
>       PCI 8086:10d3
>     PCIBridge
>       PCI 1002:6889
>     PCI 1002:4390
>     PCI 1002:439c
>
>
> On Tue, Apr 1, 2014 at 1:47 PM, Yury Vorobyov <teupol...@gmail.com> wrote:
>
>     The current BIOS version could be detecting the CPUs improperly;
>     they are engineering samples of the 6348 (all characteristics are
>     the same).
>
>
> On Tue, Apr 1, 2014 at 6:59 PM, Yury Vorobyov <teupol...@gmail.com> wrote:
>
>     The BIOS is the latest version. If I should check some BIOS
>     information, I have access to the hardware. Which variables from
>     SMBIOS do you want to see?
>
>
> On Fri, Jan 31, 2014 at 1:07 PM, Brice Goglin <brice.gog...@inria.fr> wrote:
>
>     Hello,
>
>     Your BIOS reports invalid L3 cache information. On these
>     processors the L3 is shared by 6 cores: it covers the 6 cores of
>     an entire half-socket NUMA node. But the BIOS says that some L3s
>     are shared by 4 cores and others by 6 cores. Worse, it says that
>     one L3 is shared by some cores from one NUMA node and some from
>     another, which causes the error message (such L3s cannot be
>     inserted into the topology).
>
>     I see "AMD Eng Sample, ZS268145TCG54_32/26/20_2/16" in the
>     processor type, which might explain why your BIOS is somewhat
>     experimental. See if you can upgrade it.
>
>     Also make sure your kernel isn't too old, in case it misses the
>     L3 info for these processors. At least 3.3 should be OK, IIRC.
>     NUMA node sharing info:
>     $ cat sys/devices/system/node/node*/cpumap
>     00000000,0000003f
>     00000000,00000fc0
>     00000000,0003f000
>     00000000,00fc0000
>     $ cat sys/devices/system/cpu/cpu{?,??}/cache/index3/shared_cpu_map
>     00000000,0000000f  << wrong, should be 003f
>     00000000,0000000f  << wrong, should be 003f
>     00000000,0000000f  << wrong, should be 003f
>     00000000,0000000f  << wrong, should be 003f
>     00000000,000003f0  << impossible, should be 003f
>     00000000,000003f0  << impossible, should be 003f
>     00000000,000003f0  << impossible, should be 0fc0
>     00000000,000003f0  << impossible, should be 0fc0
>     00000000,000003f0  << impossible, should be 0fc0
>     00000000,000003f0  << impossible, should be 0fc0
>     00000000,00000c00  << wrong, should be 0fc0
>     00000000,00000c00  << wrong, should be 0fc0
>     00000000,00003000  << wrong, should be 003f000
>     00000000,00003000  << wrong, should be 003f000
>     00000000,000fc000  << impossible, should be 003f000
>     00000000,000fc000  << impossible, should be 003f000
>     00000000,000fc000  << impossible, should be 003f000
>     00000000,000fc000  << impossible, should be 003f000
>     00000000,000fc000  << impossible, should be 0fc0000
>     00000000,000fc000  << impossible, should be 0fc0000
>     00000000,00f00000  << wrong, should be 0fc0000
>     00000000,00f00000  << wrong, should be 0fc0000
>     00000000,00f00000  << wrong, should be 0fc0000
>     00000000,00f00000  << wrong, should be 0fc0000
>
>     Brice
>
>
>     On 31/01/2014 03:46, Yury Vorobyov wrote:
>     >> I got an error about "intersecting caches".
>     >>
>     >> Info from hwloc is in the attachments.
>     >>
>     >> I never got this before. I use "live" builds of OpenMPI
>     >> directly from the repo.
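The inclusion rule that hwloc enforces can be checked by hand from the masks above. A minimal Python sketch (the helper name is made up; the mask values are transcribed from the sysfs dump in this thread):

```python
def inclusion_conflicts(l3_masks, node_masks):
    """Return (l3, node) pairs where an L3 shared_cpu_map overlaps a
    NUMA node cpumap without being fully contained in it -- the
    "intersects without inclusion" condition lstopo complains about."""
    bad = []
    for l3 in l3_masks:
        for node in node_masks:
            common = l3 & node
            # A valid L3 mask is either disjoint from the node (common == 0)
            # or a subset of it (common == l3); anything else is a conflict.
            if common and common != l3:
                bad.append((l3, node))
    return bad

# NUMA node cpumaps from /sys/devices/system/node/node*/cpumap
nodes = [0x3f, 0xfc0, 0x3f000, 0xfc0000]

# A few distinct L3 masks reported by this machine's broken BIOS
broken_l3 = [0x0f, 0x3f0, 0xc00]

for l3, node in inclusion_conflicts(broken_l3, nodes):
    print(f"L3 {l3:#x} intersects node {node:#x} without inclusion")
```

Running this flags exactly the 0x3f0 mask from the error message: it straddles nodes 0x3f and 0xfc0, while 0x0f and 0xc00 are merely too small but still contained in a single node.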
_______________________________________________
hwloc-users mailing list
hwloc-us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users
Link to this post: http://www.open-mpi.org/community/lists/hwloc-users/2014/06/1039.php