The cgroup information under /sys/fs/cgroup/ should be fixed: cpuset.cpus should contain 0-3 and cpuset.mems should contain 0. In the meantime, you can make hwloc ignore this cgroup info by setting HWLOC_ALLOW=all in the environment.
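For reference, a quick shell sketch of both the check and the workaround. The paths below are assumptions: on cgroup v2 the cpuset files sit directly under /sys/fs/cgroup/, on cgroup v1 under /sys/fs/cgroup/cpuset/ instead.

```shell
#!/bin/sh
# Inspect the cpuset the cgroup actually grants. Try the cgroup v2
# location first, then fall back to the v1 hierarchy (assumed paths).
for f in /sys/fs/cgroup/cpuset.cpus.effective \
         /sys/fs/cgroup/cpuset/cpuset.cpus; do
    [ -r "$f" ] && { echo "cpus: $(cat "$f")"; break; }  # should be 0-3 here
done
for f in /sys/fs/cgroup/cpuset.mems.effective \
         /sys/fs/cgroup/cpuset/cpuset.mems; do
    [ -r "$f" ] && { echo "mems: $(cat "$f")"; break; }  # should be 0 here
done

# Workaround: HWLOC_ALLOW=all tells hwloc to ignore the cgroup restrictions.
[ -x ./lstopo ] && HWLOC_ALLOW=all ./lstopo
```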

The x86 CPUID information is also wrong on this machine. All 4 cores report the same "APIC id" (a sort of hardware core ID); I guess your 4 cores are virtualized on top of a single hardware core and the hypervisor doesn't bother emulating topology information correctly.
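If you want to see the duplicated APIC ids yourself, one way (assuming an x86 Linux guest) is to read them from /proc/cpuinfo; on a correctly virtualized 4-core machine the four apicid values would all differ:

```shell
# Print each logical processor number together with its APIC id.
# On this VPS all four apicid lines reportedly show the same value.
grep -E '^(processor|apicid)' /proc/cpuinfo
```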

Brice



On 02/08/2023 at 15:23, Max R. Dechantsreiter wrote:
Hi Brice,

Well, the VPS gives me a 4-core slice of an Intel(R) Xeon(R)
CPU E5-2620 node, which is Sandy Bridge EP with 6 physical
cores per socket, so probably 12 cores on the node.  The
numbering does seem wacky: it describes a node with two
8-core CPUs.

This is the VPS on which I host my Web site; I use its shell
account for sundry testing, mostly of build procedures.

Is there anything I could do to get hwloc to work?

Regards,

Max
---


On Wed, Aug 02, 2023 at 03:12:27PM +0200, Brice Goglin wrote:
Hello

There's something wrong with this machine. It exposes 4 cores (numbered 0 to 3)
and no NUMA node, but says the only allowed resources are cores 8-15,24-31
and NUMA node 1. That's why hwloc says the topology is empty (running lstopo
--disallowed shows NUMA 0 and cores 0-3 in red, which means they aren't
allowed). How did this get configured so badly?
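A minimal sketch of how to compare the two views yourself (the file paths assume Linux with cgroup v2, and lstopo comes from your hwloc build):

```shell
# Cores the kernel exposes vs. cores the cgroup allows; on this machine
# the two sets reportedly don't intersect, which is why hwloc ends up
# with an empty topology.
cat /sys/devices/system/cpu/online             # exposed: 0-3 here
[ -r /sys/fs/cgroup/cpuset.cpus.effective ] && \
    cat /sys/fs/cgroup/cpuset.cpus.effective   # allowed: 8-15,24-31 here
command -v lstopo >/dev/null && lstopo --disallowed  # disallowed objects in red
```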

Brice



On 02/08/2023 at 14:54, Max R. Dechantsreiter wrote:
Hello,

On my VPS I tested my build of hwloc-2.9.2 by running lstopo:

./lstopo
hwloc: Topology became empty, aborting!
Segmentation fault

On a GCP n1-standard-2 a similar build (GCC 12.2 vs. 13.2) seemed to work:

./lstopo
hwloc/nvml: Failed to initialize with nvmlInit(): Driver Not Loaded
Machine (7430MB total)
     Package L#0
       NUMANode L#0 (P#0 7430MB)
     L3 L#0 (45MB) + L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0
         PU L#0 (P#0)
         PU L#1 (P#1)
     HostBridge
       PCI 00:03.0 (Other)
         Block(Disk) "sda"
       PCI 00:04.0 (Ethernet)
         Net "ens4"
       PCI 00:05.0 (Other)

(from which I conclude my build procedure is correct).

At the suggestion of Brice Goglin (in response to my post of the same
issue to Open MPI Users), I rebuilt with '--enable-debug' and ran lstopo;
then I also ran

hwloc-gather-topology hwloc-gather-topology

The resulting lstopo.tar.gz and hwloc-gather-topology.tar.gz are attached,
as I was unable to identify the underlying problem myself.  I suspect it
is a system issue, since my builds of Open MPI on this VPS used to work
before a new OS image was installed.

Max



_______________________________________________
hwloc-users mailing list
hwloc-users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/hwloc-users
