Dear hwloc experts,
With hwloc 1.11.13, I get an "incorrect PCI locality information"
error message. The complete message is attached as the file
"lstopo_1.11.13.err".
I get this error on a dual socket Xeon Platinum 9242 system running
CentOS 7.8.
I don't see this error on a dual socket Xeon Gold 6148 system running
the same CentOS release (7.8).
And if I remember correctly, I also did not see that error earlier with
our dual socket Xeon Platinum 9242 system before it was updated to
version 7.8 of CentOS.
So it seems to me that the combination of that specific CentOS release
(7.8) and that particular CPU type (Xeon Platinum 9242) triggers the
error in hwloc 1.11.13.
With hwloc 2.1.0, however, I do not see any error message. For your
reference, I am attaching the XML output files obtained from hwloc
1.11.13 and 2.1.0.
Unfortunately, I cannot switch from hwloc 1.x to 2.x because I need to
compile Open MPI 3.x, which requires hwloc 1.x. And simply setting
HWLOC_HIDE_ERRORS only silences the message; it is not a real solution.
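As a side note on the workaround the error message itself suggests: the sketch below (Python, purely illustrative) collects the HWLOC_PCI_..._LOCALCPUS variable names exactly as they are printed in the message and maps each one to an empty value, which is what the message says disables the fixup. The function name `fixup_overrides` and the `SAMPLE` text are assumptions made for this example and are not part of hwloc.

```python
import re

# Variable names are taken verbatim from the error text, not constructed,
# so they match whatever hwloc printed for each PCI bus.
VAR_RE = re.compile(r"\b(HWLOC_PCI_\w+_LOCALCPUS)=")

# Hypothetical excerpt in the same shape as the attached error file.
SAMPLE = """\
* HWLOC_PCI__40_LOCALCPUS= (empty value), and report the problem
* HWLOC_PCI__44_LOCALCPUS= (empty value), and report the problem
"""

def fixup_overrides(err_text):
    """Map each override variable named in the message to an empty value."""
    names = VAR_RE.findall(err_text)
    # dict.fromkeys removes repeated names while keeping first-seen order.
    return {name: "" for name in dict.fromkeys(names)}
```

The resulting dictionary could be merged into the environment of an lstopo run, for example with `subprocess.run(..., env={**os.environ, **overrides})`. This only disables the fixup for the listed buses; it does not correct the locality information itself.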
Could someone please provide a fix for this particular problem in hwloc 1.x?
Thank you in advance,
Christian Tuma
--
Dr. Christian Tuma
Consultant, Supercomputing
Zuse Institute Berlin, Takustr. 7, 14195 Berlin, Germany
+49 30 84185132 | t...@zib.de | www.zib.de
* hwloc 1.11.13 has encountered an incorrect PCI locality information.
* PCI bus :40 is supposedly close to 2nd NUMA node of 1st package,
* however hwloc believes this is impossible on this architecture.
* Therefore the PCI bus will be moved to 1st NUMA node of 2nd package.
*
* If you feel this fixup is wrong, disable it by setting in your environment
* HWLOC_PCI__40_LOCALCPUS= (empty value), and report the problem
* to the hwloc's user mailing list together with the XML output of lstopo.
*
* You may silence this message by setting HWLOC_HIDE_ERRORS=1 in your environment.
* hwloc 1.11.13 has encountered an incorrect PCI locality information.
* PCI bus :44 is supposedly close to 2nd NUMA node of 1st package,
* however hwloc believes this is impossible on this architecture.
* Therefore the PCI bus will be moved to 1st NUMA node of 2nd package.
*
* If you feel this fixup is wrong, disable it by setting in your environment
* HWLOC_PCI__44_LOCALCPUS= (empty value), and report the problem
* to the hwloc's user mailing list together with the XML output of lstopo.
*
* You may silence this message by setting HWLOC_HIDE_ERRORS=1 in your environment.
* hwloc 1.11.13 has encountered an incorrect PCI locality information.
* PCI bus :53 is supposedly close to 2nd NUMA node of 1st package,
* however hwloc believes this is impossible on this architecture.
* Therefore the PCI bus will be moved to 1st NUMA node of 2nd package.
*
* If you feel this fixup is wrong, disable it by setting in your environment
* HWLOC_PCI__53_LOCALCPUS= (empty value), and report the problem
* to the hwloc's user mailing list together with the XML output of lstopo.
*
* You may silence this message by setting HWLOC_HIDE_ERRORS=1 in your environment.
* hwloc 1.11.13 has encountered an incorrect PCI locality information.
* PCI bus :62 is supposedly close to 2nd NUMA node of 1st package,
* however hwloc believes this is impossible on this architecture.
* Therefore the PCI bus will be moved to 1st NUMA node of 2nd package.
*
* If you feel this fixup is wrong, disable it by setting in your environment
* HWLOC_PCI__62_LOCALCPUS= (empty value), and report the problem
* to the hwloc's user mailing list together with the XML output of lstopo.
*
* You may silence this message by setting HWLOC_HIDE_ERRORS=1 in your environment.
* hwloc 1.11.13 has encountered an incorrect PCI locality information.
* PCI bus :71 is supposedly close to 2nd NUMA node of 1st package,
* however hwloc believes this is impossible on this architecture.
* Therefore the PCI bus will be moved to 1st NUMA node of 2nd package.
*
* If you feel this fixup is wrong, disable it by setting in your environment
* HWLOC_PCI__71_LOCALCPUS= (empty value), and report the problem
* to the hwloc's user mailing list together with the XML output of lstopo.
*
* You may silence this message by setting HWLOC_HIDE_ERRORS=1