Hello

This is a kernel bug affecting 12-core AMD Bulldozer/Piledriver (Opteron
62xx/63xx) processors. hwloc is just complaining about the buggy L3
information reported by the kernel. lstopo should report one L3 covering
each set of 6 cores within each NUMA node. Instead, you get strange L3s
spanning 2, 4 or 6 cores.
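
To illustrate what hwloc is checking, here is a minimal Python sketch of the "intersects without inclusion" test. It is not hwloc's actual code from topology.c, and the function name is made up for illustration. It is applied to the two cpuset masks from the error message quoted below, where each cpuset is a bitmask with one bit per processor:

```python
def intersects_without_inclusion(a: int, b: int) -> bool:
    """True if bitmasks a and b overlap but neither one contains the other.

    Hypothetical helper for illustration only; not part of hwloc.
    """
    overlap = a & b
    a_in_b = (a & ~b) == 0  # every bit of a is also set in b
    b_in_a = (b & ~a) == 0  # every bit of b is also set in a
    return overlap != 0 and not a_in_b and not b_in_a


# Cpuset masks taken from the error message quoted below.
l3 = 0x000003F0     # processors 4-9
numa0 = 0x0000003F  # processors 0-5

print(bin(l3 & numa0))                          # processors 4 and 5 are shared
print(intersects_without_inclusion(l3, numa0))  # the broken layout hwloc reports
print(intersects_without_inclusion(0x3F, numa0))  # an L3 matching node 0 exactly
```

With a correct topology, node 0's L3 would have the same cpuset as the node (0x0000003f), so the L3 would be included in the NUMA node and the check would not fire.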

If you're not binding tasks based on L3 locality and your applications
don't care about the L3, you can set HWLOC_HIDE_ERRORS=1 in the
environment to hide the message.

AMD was working on a kernel patch, but it doesn't seem to be in upstream
Linux yet. In hwloc v1.11.2, you can work around the problem by setting
HWLOC_COMPONENTS=x86 in the environment.
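
Both workarounds are plain environment variables. As a minimal sketch, assuming you launch your tools from Python, the helper below builds an environment with either variable set and runs hwloc-info with it if the tool is installed. The name hwloc_env is hypothetical and not part of hwloc:

```python
import os
import shutil
import subprocess


def hwloc_env(hide_errors=False, x86_only=False, base=None):
    """Return a copy of `base` (default: os.environ) with hwloc workaround variables set.

    hwloc_env is a hypothetical helper for illustration, not part of hwloc.
    """
    env = dict(os.environ if base is None else base)
    if hide_errors:
        # Only hides the message; hwloc still uses the kernel's buggy L3 information.
        env["HWLOC_HIDE_ERRORS"] = "1"
    if x86_only:
        # Ask hwloc to use its x86 (CPUID-based) discovery component;
        # per the reply above, this works around the bug in hwloc v1.11.2.
        env["HWLOC_COMPONENTS"] = "x86"
    return env


if __name__ == "__main__":
    # Run hwloc-info with the x86 workaround, but only if the tool is installed.
    if shutil.which("hwloc-info"):
        subprocess.run(["hwloc-info"], env=hwloc_env(x86_only=True), check=False)
```

From a shell, the equivalent is simply prefixing the command, e.g. `HWLOC_COMPONENTS=x86 hwloc-info`.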

I am not sure why CentOS 6.5 didn't complain. That 2.6.32 kernel should
be buggy too, and old hwloc releases already complained about such bugs.

thanks
Brice

On 07/01/2016 04:10, David Winslow wrote:
> I upgraded our servers from CentOS 6.5 to CentOS 7.2. Since then, when I run
> mpirun, I get the following error, but the software continues to run and
> appears to work fine.
>
> * hwloc 1.11.0rc3-git has encountered what looks like an error from the 
> operating system.
> *
> * L3 (cpuset 0x000003f0) intersects with NUMANode (P#0 cpuset 0x0000003f) 
> without inclusion!
> * Error occurred in topology.c line 983
> *
> * The following FAQ entry in the hwloc documentation may help:
> *   What should I do when hwloc reports "operating system" warnings?
> * Otherwise please report this error message to the hwloc user's mailing list,
> * along with the output+tarball generated by the hwloc-gather-topology script.
>
> I can replicate the error by simply running hwloc-info.
>
> The version of hwloc used by mpirun is 1.9. The version installed on the
> server, which is the one I ran directly, is 1.7, as shipped with CentOS 7.
> Both give the error, with the minor differences shown below.
>
> With hwloc 1.7
> * object (L3 cpuset 0x000003f0) intersection without inclusion!
> * Error occurred in topology.c line 753
>
> With hwloc 1.9
> * L3 (cpuset 0x000003f0) intersects with NUMANode (P#0 cpuset 0x0000003f) 
> without inclusion!
> * Error occurred in topology.c line 983
>
> The current kernel is 3.10.0-327.el7.x86_64. I’ve tried a minor-release
> kernel update and even installed kernel v4.4.3. None of these kernels fixed
> the error. Again, hwloc works fine on CentOS 6.5 with kernel
> 2.6.32-431.29.2.el6.x86_64.
>
> I’ve attached the files generated by hwloc-gather-topology.sh. I compared
> what the script says is the expected output with the actual output and, as
> far as I can tell, they look the same. Maybe I’m missing something after
> staring at the information all day.
>
> I did a clean install of the OS to perform the upgrade from 6.5.
>
> Any help will be greatly appreciated.
>
> _______________________________________________
> hwloc-users mailing list
> hwloc-us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users
> Link to this post: 
> http://www.open-mpi.org/community/lists/hwloc-users/2016/01/1238.php