Hello

I don't have access to an MI300A, but several months ago I worked with AMD to solve a very similar issue. It was caused by a buggy ACPI HMAT (Heterogeneous Memory Attribute Table) in the BIOS.
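The "intersection without inclusion" failure in the quoted warning means hwloc tried to insert a Group0 whose cpuset overlaps an existing Group0 without either one containing the other, so the two cannot be arranged as parent and child. The sketch below decodes both cpusets from the warning to make that concrete. It assumes hwloc's cpuset string format (comma-separated 32-bit hex words, most significant word first, an empty field meaning zero); the helper names are hypothetical and not part of the hwloc API.

```python
"""Decode the two cpusets from the hwloc warning and classify their overlap.

Assumes hwloc's cpuset string format: comma-separated 32-bit hex words,
most significant word first, where an empty field stands for 0x0.
All function names here are hypothetical helpers, not hwloc API.
"""

WORD_BITS = 32

# Cpusets copied verbatim from the warning quoted in this thread.
NEW_GROUP = "0x000000ff,0xffff0000,0x00ffffff,0x000000ff,0xffff0000,0x00ffffff"
EXISTING_GROUP = "0x0000ffff,0xffffffff,,0x0000ffff,0xffffffff"


def parse_hwloc_cpuset(text: str) -> set[int]:
    """Return the set of CPU indices whose bits are set in `text`."""
    cpus: set[int] = set()
    # The last word holds CPUs 0-31, so walk the words from the right.
    for pos, word in enumerate(reversed(text.split(","))):
        value = int(word, 16) if word else 0  # empty field means 0x0
        for bit in range(WORD_BITS):
            if (value >> bit) & 1:
                cpus.add(pos * WORD_BITS + bit)
    return cpus


def to_ranges(cpus: set[int]) -> str:
    """Format CPU indices as compact ranges, e.g. {0, 1, 2, 5} -> '0-2,5'."""
    parts: list[str] = []
    ordered = sorted(cpus)
    i = 0
    while i < len(ordered):
        j = i
        # Extend the run while the next index is consecutive.
        while j + 1 < len(ordered) and ordered[j + 1] == ordered[j] + 1:
            j += 1
        parts.append(str(ordered[i]) if i == j else f"{ordered[i]}-{ordered[j]}")
        i = j + 1
    return ",".join(parts)


def relation(a: set[int], b: set[int]) -> str:
    """Classify two cpusets; only disjoint or nested sets fit in a tree."""
    if not a & b:
        return "disjoint"
    if a <= b or b <= a:
        return "nested"
    # Overlapping, but neither contains the other.
    return "intersection without inclusion"


if __name__ == "__main__":
    new = parse_hwloc_cpuset(NEW_GROUP)
    existing = parse_hwloc_cpuset(EXISTING_GROUP)
    print("new group:     ", to_ranges(new))
    print("existing group:", to_ranges(existing))
    print("shared CPUs:   ", to_ranges(new & existing))
    print("relation:      ", relation(new, existing))
```

With the values from the warning, the new group covers CPUs 0-23, 48-71, 96-119 and 144-167 while the existing one covers 0-47 and 96-143: they share 0-23 and 96-119, but each contains CPUs the other lacks.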

Try setting HWLOC_USE_NUMA_DISTANCES=0 in the environment to disable the hwloc code that uses this HMAT info. If the warning goes away, you need more recent firmware. That said, it would be annoying if this old, buggy firmware were still in the wild 4 months later.
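The check described above can be scripted. This is a minimal sketch, assuming an hwloc-based tool such as `lstopo-no-graphics` is on PATH and that the warning is written to stderr: it runs the tool once with the default environment and once with HWLOC_USE_NUMA_DISTANCES=0, then compares. The function names and return labels are hypothetical.

```python
"""Run an hwloc tool with and without HWLOC_USE_NUMA_DISTANCES=0 and compare.

Assumptions: the hwloc warning is written to stderr, and lstopo-no-graphics
(or another hwloc-based command passed in) is available. Function names and
return labels are hypothetical.
"""

import os
import shutil
import subprocess
import sys
from collections.abc import Sequence

# Phrase taken from the warning box quoted in this thread.
WARNING_MARKER = "received invalid information from the operating system"


def hwloc_warns(cmd: Sequence[str], disable_distances: bool) -> bool:
    """Run `cmd` once and report whether the hwloc OS-error warning appeared."""
    env = dict(os.environ)
    if disable_distances:
        env["HWLOC_USE_NUMA_DISTANCES"] = "0"
    else:
        # Make sure the baseline run uses hwloc's default behavior.
        env.pop("HWLOC_USE_NUMA_DISTANCES", None)
    result = subprocess.run(list(cmd), env=env, capture_output=True, text=True)
    return WARNING_MARKER in result.stderr


def diagnose(cmd: Sequence[str]) -> str:
    """Return 'no-warning', 'firmware' or 'report' per the advice in this thread."""
    if not hwloc_warns(cmd, disable_distances=False):
        return "no-warning"
    if not hwloc_warns(cmd, disable_distances=True):
        # Warning disappears without the distance code: suspect the firmware.
        return "firmware"
    # Still failing: report to hwloc-users with hwloc-gather-topology output.
    return "report"


if __name__ == "__main__":
    tool = shutil.which("lstopo-no-graphics")
    if tool is None:
        print("lstopo-no-graphics not found on PATH", file=sys.stderr)
    else:
        print(diagnose([tool]))
```

A result of `firmware` corresponds to the case where a firmware update is needed; `report` means the problem is not in the distance-based code and the warning's own instructions apply.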

Brice


On 08/02/2024 at 18:23, Hartman, John wrote:

Is there a timeline for hwloc to support the MI300A? Currently, hwloc isn’t happy when it encounters one:

****************************************************************************
* hwloc 2.9.0 received invalid information from the operating system.
*
* Failed with: intersection without inclusion
* while inserting Group0 (cpuset 0x000000ff,0xffff0000,0x00ffffff,0x000000ff,0xffff0000,0x00ffffff) at Group0 (cpuset 0x0000ffff,0xffffffff,,0x0000ffff,0xffffffff)
* coming from: linux:sysfs:numa
*
* The following FAQ entry in the hwloc documentation may help:
* What should I do when hwloc reports "operating system" warnings?
* Otherwise please report this error message to the hwloc user's mailing list,
* along with the files generated by the hwloc-gather-topology script.
*
* hwloc will now ignore this invalid topology information and continue.
****************************************************************************

John


_______________________________________________
hwloc-users mailing list
hwloc-users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/hwloc-users