Hello

I don't have access to an MI300A, but I worked with AMD several months ago to solve a very similar issue. It was caused by a buggy ACPI HMAT in the BIOS.
Try setting HWLOC_USE_NUMA_DISTANCES=0 in the environment to disable the hwloc code that uses this HMAT info. If the warning goes away, then you need to get a more recent firmware. That said, it would be annoying if this old buggy firmware were still in the wild 4 months later.
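If it helps to script that check, here is a minimal sketch in Python. It runs a command with HWLOC_USE_NUMA_DISTANCES=0 and reports whether the hwloc warning banner still shows up in the output. The helper name run_without_numa_distances is hypothetical, and the example assumes an hwloc-based tool such as lstopo is on the PATH and that the banner text matches the one quoted in this thread.

```python
import os
import subprocess
import sys

# Text taken from the warning banner quoted in this thread.
WARNING_MARKER = "received invalid information from the operating system"


def run_without_numa_distances(cmd):
    """Run cmd with HWLOC_USE_NUMA_DISTANCES=0 in its environment.

    Returns (warned, result): warned is True if the hwloc warning banner
    appears in the command's stdout or stderr, and result is the
    subprocess.CompletedProcess for further inspection.
    """
    # Copy the current environment and override only the one variable.
    env = dict(os.environ, HWLOC_USE_NUMA_DISTANCES="0")
    result = subprocess.run(cmd, env=env, capture_output=True, text=True)
    # Check both streams so the result does not depend on where the
    # banner is printed.
    warned = WARNING_MARKER in result.stderr or WARNING_MARKER in result.stdout
    return warned, result


# Example call (assumption: lstopo is installed and on the PATH):
#   warned, _ = run_without_numa_distances(["lstopo", "--of", "console"])
#   print("warning still present" if warned else "warning gone")
```

If the warning is gone with the variable set but present without it, that points at the firmware rather than at hwloc.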
Brice

On 08/02/2024 at 18:23, Hartman, John wrote:
Is there a timeline for hwloc to support the MI300A? Currently, hwloc isn’t happy when it encounters one:

****************************************************************************
* hwloc 2.9.0 received invalid information from the operating system.
*
* Failed with: intersection without inclusion
* while inserting Group0 (cpuset 0x000000ff,0xffff0000,0x00ffffff,0x000000ff,0xffff0000,0x00ffffff) at Group0 (cpuset 0x0000ffff,0xffffffff,,0x0000ffff,0xffffffff)
* coming from: linux:sysfs:numa
*
* The following FAQ entry in the hwloc documentation may help:
*   What should I do when hwloc reports "operating system" warnings?
* Otherwise please report this error message to the hwloc user's mailing list,
* along with the files generated by the hwloc-gather-topology script.
*
* hwloc will now ignore this invalid topology information and continue.
****************************************************************************

John
_______________________________________________
hwloc-users mailing list
hwloc-users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/hwloc-users