Switching hwloc versions will likely not change much regarding this hardware
bug. Since your hardware/BIOS looks buggy, we can't do much about it except
create a valid XML topology that you could force hwloc to load instead of
performing the normal hardware-based discovery.
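Here is a minimal sketch of forcing such an override from a script. It assumes a corrected XML file already exists, for example one exported on this machine with `lstopo fixed-topology.xml` and then edited by hand so that each L3 sits under its NUMA node (the file name is hypothetical). hwloc loads its topology from the file named in the `HWLOC_XML` environment variable instead of probing the hardware, and `HWLOC_THISSYSTEM=1` tells it that the XML describes the running machine, so binding still happens. The helper names are made up for illustration.

```python
import os
import subprocess


def env_with_xml_topology(xml_path, base_env=None):
    """Return a copy of the environment that makes hwloc load its topology
    from xml_path instead of performing hardware-based discovery."""
    env = dict(os.environ if base_env is None else base_env)
    # hwloc reads the topology from this XML file instead of probing the machine.
    env["HWLOC_XML"] = xml_path
    # Assert that the XML describes the machine we are running on,
    # so that CPU/memory binding is still performed.
    env["HWLOC_THISSYSTEM"] = "1"
    return env


def run_with_xml_topology(cmd, xml_path):
    """Run cmd (a list of arguments) with the XML override in its environment."""
    return subprocess.run(cmd, env=env_with_xml_topology(xml_path), check=True)
```

For instance, `run_with_xml_topology(["lstopo"], "fixed-topology.xml")` should display the corrected tree. Exporting the same two variables in a shell before launching a local application has the same effect.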

Brice



On 11/06/2014 21:16, Yury Vorobyov wrote:
> I do not see a big difference... This time I used the upstream version of
> hwloc (not the live git build).
>
> $ lstopo
> ****************************************************************************
> * hwloc has encountered what looks like an error from the operating system.
> *
> * L3 (P#6 cpuset 0x000003f0) intersects with NUMANode (P#0 cpuset 0x0000003f) without inclusion!
> * Error occurred in topology.c line 940
> *
> * Please report this error message to the hwloc user's mailing list,
> * along with the output from the hwloc-gather-topology script.
> ****************************************************************************
> Machine
>   Socket L#0
>     NUMANode L#0 (P#0)
>       L3 L#0 (6144KB)
>         L2 L#0 (2048KB) + L1i L#0 (64KB)
>           L1d L#0 (16KB) + Core L#0 + PU L#0 (P#0)
>           L1d L#1 (16KB) + Core L#1 + PU L#1 (P#1)
>         L2 L#1 (2048KB) + L1i L#1 (64KB)
>           L1d L#2 (16KB) + Core L#2 + PU L#2 (P#2)
>           L1d L#3 (16KB) + Core L#3 + PU L#3 (P#3)
>       L2 L#2 (2048KB) + L1i L#2 (64KB)
>         L1d L#4 (16KB) + Core L#4 + PU L#4 (P#4)
>         L1d L#5 (16KB) + Core L#5 + PU L#5 (P#5)
>     NUMANode L#1 (P#1)
>       L2 L#3 (2048KB) + L1i L#3 (64KB)
>         L1d L#6 (16KB) + Core L#6 + PU L#6 (P#6)
>         L1d L#7 (16KB) + Core L#7 + PU L#7 (P#7)
>       L2 L#4 (2048KB) + L1i L#4 (64KB)
>         L1d L#8 (16KB) + Core L#8 + PU L#8 (P#8)
>         L1d L#9 (16KB) + Core L#9 + PU L#9 (P#9)
>       L3 L#1 (6144KB) + L2 L#5 (2048KB) + L1i L#5 (64KB)
>         L1d L#10 (16KB) + Core L#10 + PU L#10 (P#10)
>         L1d L#11 (16KB) + Core L#11 + PU L#11 (P#11)
>   Socket L#1
>     NUMANode L#2 (P#2)
>       L3 L#2 (6144KB) + L2 L#6 (2048KB) + L1i L#6 (64KB)
>         L1d L#12 (16KB) + Core L#12 + PU L#12 (P#12)
>         L1d L#13 (16KB) + Core L#13 + PU L#13 (P#13)
>       L2 L#7 (2048KB) + L1i L#7 (64KB)
>         L1d L#14 (16KB) + Core L#14 + PU L#14 (P#14)
>         L1d L#15 (16KB) + Core L#15 + PU L#15 (P#15)
>       L2 L#8 (2048KB) + L1i L#8 (64KB)
>         L1d L#16 (16KB) + Core L#16 + PU L#16 (P#16)
>         L1d L#17 (16KB) + Core L#17 + PU L#17 (P#17)
>     NUMANode L#3 (P#3)
>       L2 L#9 (2048KB) + L1i L#9 (64KB)
>         L1d L#18 (16KB) + Core L#18 + PU L#18 (P#18)
>         L1d L#19 (16KB) + Core L#19 + PU L#19 (P#19)
>       L3 L#3 (6144KB)
>         L2 L#10 (2048KB) + L1i L#10 (64KB)
>           L1d L#20 (16KB) + Core L#20 + PU L#20 (P#20)
>           L1d L#21 (16KB) + Core L#21 + PU L#21 (P#21)
>         L2 L#11 (2048KB) + L1i L#11 (64KB)
>           L1d L#22 (16KB) + Core L#22 + PU L#22 (P#22)
>           L1d L#23 (16KB) + Core L#23 + PU L#23 (P#23)
>   HostBridge L#0
>     PCIBridge
>       PCI 10de:0f00
>     PCIBridge
>       PCI 8086:10d3
>     PCIBridge
>       PCI 8086:10d3
>     PCIBridge
>       PCI 1002:6889
>     PCI 1002:4390
>     PCI 1002:439c
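The condition behind the error message above can be illustrated with plain bitmasks. This is a simplified sketch, not hwloc's actual code in topology.c: two cpusets that are disjoint, or where one contains the other, can be nested in the tree, while a cache that only partially overlaps a NUMA node cannot. The function name is hypothetical; the two masks come from the error message.

```python
def cpuset_relation(a: int, b: int) -> str:
    """Classify how two cpusets (bitmasks of PU indexes) relate."""
    common = a & b
    if common == 0:
        return "disjoint"
    if common == a or common == b:
        # One cpuset contains the other, so one object can be nested in the other.
        return "included"
    # Some PUs are shared, but neither cpuset contains the other.
    return "intersects without inclusion"


l3 = 0x000003F0     # L3 (P#6): PUs 4-9
node0 = 0x0000003F  # NUMANode (P#0): PUs 0-5
print(cpuset_relation(l3, node0))  # intersects without inclusion
```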
>
>
>
> On Tue, Apr 1, 2014 at 1:47 PM, Yury Vorobyov <teupol...@gmail.com> wrote:
>
>     The current BIOS version could be detecting the CPUs improperly; they
>     are engineering samples of the 6348 (all characteristics are the same).
>
>
>     On Tue, Apr 1, 2014 at 6:59 PM, Yury Vorobyov <teupol...@gmail.com> wrote:
>
>         The BIOS is already at the latest version. If I should check some
>         BIOS information, I have access to the hardware. Which SMBIOS
>         variables would you like to see?
>
>
>         On Fri, Jan 31, 2014 at 1:07 PM, Brice Goglin
>         <brice.gog...@inria.fr> wrote:
>
>             Hello,
>
>             Your BIOS reports invalid L3 cache information. On these
>             processors, each L3 cache is shared by 6 cores: it covers all
>             6 cores of a NUMA node (half a socket). But the BIOS says that
>             some L3 caches are shared by 2 or 4 cores, and others by 6.
>             Worse, it says that some L3 caches are shared by cores from
>             one NUMA node and cores from another NUMA node, which causes
>             the error message (these L3 caches cannot be inserted into the
>             topology).
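On this machine the NUMA cpumaps listed below number the PUs consecutively, 6 per node, so the expected sharing can be written as a formula: node n covers the mask 0x3f shifted left by 6n bits, and the L3 of CPU c should be the mask of node c // 6. A small sketch under that assumption (the names are hypothetical and the layout is specific to this machine):

```python
CORES_PER_NODE = 6  # consecutive PUs per NUMA node on this machine


def node_mask(node: int) -> int:
    """Cpuset of NUMA node `node`: CORES_PER_NODE PUs starting at CORES_PER_NODE * node."""
    return ((1 << CORES_PER_NODE) - 1) << (CORES_PER_NODE * node)


def expected_l3_mask(cpu: int) -> int:
    """The L3 of `cpu` should cover exactly the NUMA node containing it."""
    return node_mask(cpu // CORES_PER_NODE)


print([hex(node_mask(n)) for n in range(4)])
# ['0x3f', '0xfc0', '0x3f000', '0xfc0000']
```

For example, `expected_l3_mask(4)` is 0x3f, whereas the BIOS reports 0x3f0 for CPU 4.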
>
>             I see "AMD Eng Sample, ZS268145TCG54_32/26/20_2/16" in the
>             processor type, which might explain why your BIOS is somewhat
>             experimental. See if you can upgrade it.
>
>             Also make sure your kernel isn't too old, in case it lacks
>             L3 information for these processors. At least 3.3 should be
>             OK, if I recall correctly.
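If you want to script that check, here is a minimal sketch. The 3.3 threshold is only the recollection above, not a verified requirement, and the helper name is hypothetical. It compares the major and minor numbers of a kernel release string such as the one returned by Python's `platform.release()`.

```python
import re


def kernel_at_least(release: str, minimum: tuple = (3, 3)) -> bool:
    """Return True if a Linux release string such as '3.13.0-24-generic'
    has a (major, minor) version of at least `minimum`."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return (int(match.group(1)), int(match.group(2))) >= minimum
```

On the machine itself, `kernel_at_least(platform.release())` answers the question.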
>
>             NUMA node sharing info:
>             $ cat sys/devices/system/node/node*/cpumap
>             00000000,0000003f
>             00000000,00000fc0
>             00000000,0003f000
>             00000000,00fc0000
>             $ cat sys/devices/system/cpu/cpu{?,??}/cache/index3/shared_cpu_map
>             00000000,0000000f << wrong, should be 003f
>             00000000,0000000f << wrong, should be 003f
>             00000000,0000000f << wrong, should be 003f
>             00000000,0000000f << wrong, should be 003f
>             00000000,000003f0 << impossible, should be 003f
>             00000000,000003f0 << impossible, should be 003f
>             00000000,000003f0 << impossible, should be 0fc0
>             00000000,000003f0 << impossible, should be 0fc0
>             00000000,000003f0 << impossible, should be 0fc0
>             00000000,000003f0 << impossible, should be 0fc0
>             00000000,00000c00 << wrong, should be 0fc0
>             00000000,00000c00 << wrong, should be 0fc0
>             00000000,00003000 << wrong, should be 003f000
>             00000000,00003000 << wrong, should be 003f000
>             00000000,000fc000 << impossible, should be 003f000
>             00000000,000fc000 << impossible, should be 003f000
>             00000000,000fc000 << impossible, should be 003f000
>             00000000,000fc000 << impossible, should be 003f000
>             00000000,000fc000 << impossible, should be 0fc0000
>             00000000,000fc000 << impossible, should be 0fc0000
>             00000000,00f00000 << wrong, should be 0fc0000
>             00000000,00f00000 << wrong, should be 0fc0000
>             00000000,00f00000 << wrong, should be 0fc0000
>             00000000,00f00000 << wrong, should be 0fc0000
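The annotations above follow a simple rule that can be reproduced mechanically. The sketch below (helper names are hypothetical) parses the sysfs cpumap format, where each comma-separated field is a 32-bit hex word with the most significant word first, and labels an L3 map "ok" if it equals one NUMA node's cpumap, "wrong" if it lies inside a single node without covering all of it, and "impossible" if it is not contained in a single node. It assumes the node cpumaps cover every CPU, as they do here.

```python
from typing import Sequence


def parse_cpumap(text: str) -> int:
    """Parse a Linux sysfs cpumap such as '00000000,0000003f' into an int.

    Each comma-separated field is a 32-bit hex word, most significant first.
    """
    value = 0
    for word in text.strip().split(","):
        value = (value << 32) | int(word, 16)
    return value


def classify_l3(l3: int, nodes: Sequence[int]) -> str:
    """Label an L3 shared_cpu_map the way the annotations above do."""
    overlapping = [node for node in nodes if node & l3]
    if len(overlapping) != 1 or l3 & ~overlapping[0]:
        # Not contained in a single NUMA node: cannot be placed in the tree.
        return "impossible"
    if l3 == overlapping[0]:
        # Covers exactly one whole NUMA node.
        return "ok"
    # Inside one NUMA node, but missing some of its cores.
    return "wrong"


# NUMA node cpumaps from the listing above.
nodes = [parse_cpumap(s) for s in (
    "00000000,0000003f", "00000000,00000fc0",
    "00000000,0003f000", "00000000,00fc0000")]

for cpu, text in [(0, "00000000,0000000f"),
                  (4, "00000000,000003f0"),
                  (10, "00000000,00000c00")]:
    print(cpu, classify_l3(parse_cpumap(text), nodes))
# 0 wrong
# 4 impossible
# 10 wrong
```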
>
>             Brice
>
>
>
>             On 31/01/2014 03:46, Yury Vorobyov wrote:
>>             I got an error about "intersecting caches".
>>
>>             The hwloc information is attached.
>>
>>             I never got this before. I use "live" builds of Open MPI
>>             directly from the repository.
>>
>>
>>             _______________________________________________
>>             hwloc-users mailing list
>>             hwloc-us...@open-mpi.org
>>             http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users
>
>
>
>
>
>
>
>
> Link to this post: 
> http://www.open-mpi.org/community/lists/hwloc-users/2014/06/1039.php
