Hi Brice, thanks a lot for the quick response!
I have tested the patch and it works just fine :-) [1]

> I am trying to release hwloc 2.5 "soon". If that's too slow, please let me
> know, I'll see if I can do a 2.4.1 earlier.

There is no rush, 2.5 sounds great.

Merci beaucoup!
Jirka

[1]
$ ./utils/lstopo/lstopo-no-graphics
Machine (7615MB total)
  Package L#0
    NUMANode L#0 (P#2 7615MB)
    L3 L#0 (4096KB) + L2 L#0 (1024KB) + Core L#0
      L1d L#0 (32KB) + L1i L#0 (48KB)
        PU L#0 (P#0)
        PU L#1 (P#2)
        PU L#2 (P#4)
        PU L#3 (P#6)
      L1d L#1 (32KB) + L1i L#1 (48KB)
        PU L#4 (P#1)
        PU L#5 (P#3)
        PU L#6 (P#5)
        PU L#7 (P#7)
  Block(Disk) "sda"
  Net "env2"

On Mon, Apr 26, 2021 at 8:43 PM Brice Goglin <brice.gog...@inria.fr> wrote:
> This patch should fix the issue. We had to fix the same issue for CPU#0
> being offline recently but I didn't know it could be needed for NUMA node#0
> being offline too.
>
> I am trying to release hwloc 2.5 "soon". If that's too slow, please let me
> know, I'll see if I can do a 2.4.1 earlier.
>
> Brice
>
>
> commit 7c159d723432e461b4e48cc2d38212913d2ba7c7
> Author: Brice Goglin <brice.gog...@inria.fr>
> Date:   Mon Apr 26 20:35:42 2021 +0200
>
>     linux: fix support for NUMA node0 being offline
>
>     Just like we didn't support offline CPU#0 until commit
>     7bcc273efd50536961ba16d474efca4ae163229b, we need to
>     support node0 being offline as well.
>     It's not clear whether it's a new Linux feature or not,
>     this was reported on a POWER LPAR VM.
>
>     We opportunistically assume node0 is online to avoid
>     the overhead in the vast majority of cases. If node0
>     is missing, we parse "online" to find the first node.
>
>     Thanks to Jirka Hladky for the report.
>
>     Signed-off-by: Brice Goglin <brice.gog...@inria.fr>
>
> diff --git a/hwloc/topology-linux.c b/hwloc/topology-linux.c
> index 94b242dd0..10e038e64 100644
> --- a/hwloc/topology-linux.c
> +++ b/hwloc/topology-linux.c
> @@ -5264,6 +5264,9 @@ static const char *find_sysfs_cpu_path(int root_fd, int *old_filenames)
>
>  static const char *find_sysfs_node_path(int root_fd)
>  {
> +  unsigned first;
> +  int err;
> +
>    if (!hwloc_access("/sys/bus/node/devices", R_OK|X_OK, root_fd)
>        && !hwloc_access("/sys/bus/node/devices/node0/cpumap", R_OK, root_fd))
>      return "/sys/bus/node/devices";
> @@ -5272,6 +5275,28 @@ static const char *find_sysfs_node_path(int root_fd)
>        && !hwloc_access("/sys/devices/system/node/node0/cpumap", R_OK, root_fd))
>      return "/sys/devices/system/node";
>
> +  /* node0 might be offline, fallback to looking at the first online node.
> +   * online contains comma-separated ranges, just read the first number.
> +   */
> +  hwloc_debug("Failed to find sysfs node files using node0, looking at online nodes...\n");
> +  err = hwloc_read_path_as_uint("/sys/devices/system/node/online", &first, root_fd);
> +  if (err) {
> +    hwloc_debug("Failed to read /sys/devices/system/node/online.\n");
> +  } else {
> +    char path[PATH_MAX];
> +    hwloc_debug("Found node#%u as first online node\n", first);
> +
> +    snprintf(path, sizeof(path), "/sys/bus/node/devices/node%u/cpumap", first);
> +    if (!hwloc_access("/sys/bus/node/devices", R_OK|X_OK, root_fd)
> +        && !hwloc_access(path, R_OK, root_fd))
> +      return "/sys/bus/node/devices";
> +
> +    snprintf(path, sizeof(path), "/sys/devices/system/node/node%u/cpumap", first);
> +    if (!hwloc_access("/sys/devices/system/node", R_OK|X_OK, root_fd)
> +        && !hwloc_access(path, R_OK, root_fd))
> +      return "/sys/devices/system/node";
> +  }
> +
>    return NULL;
>  }
>
>
> On 26/04/2021 at 16:48, Brice Goglin wrote:
>> Hello,
>>
>> Maybe we have something that assumes that the first NUMA node on Linux is
>> #0. And something is wrong in the disallowed case anyway, since the NUMA
>> node physical number is 0 instead of 2 there.
>>
>> Can you run "hwloc-gather-topology lpar" and send the resulting
>> lpar.tar.bz2? (Send it only to me if it's too big or somehow confidential.)
>>
>> Thanks
>> Brice
>>
>>
>> On 26/04/2021 at 16:40, Jirka Hladky wrote:
>>> Hi Brice,
>>>
>>> how are you doing? I hope you are fine. We are all well and safe.
>>>
>>> I have been running hwloc on an IBM Power LPAR VM with only 1 CPU core
>>> and 8 PUs [1]. There is only one NUMA node. The numbering is however
>>> quite strange: the NUMA node number is "2". See [2].
>>>
>>> hwloc reports "Topology does not contain any NUMA node, aborting!"
>>>
>>> $ lstopo
>>> Topology does not contain any NUMA node, aborting!
>>> hwloc_topology_load() failed (No such file or directory).
>>>
>>> Could you please double-check if this behavior is correct? I believe
>>> hwloc should work on this HW setup.
>>>
>>> FYI, we can get it working with the --disallowed option [3] (but I think
>>> it should work without this option as well).
>>>
>>> Thanks a lot!
> Jirka > > > [1] $ lscpu > Architecture: ppc64le > Byte Order: Little Endian > CPU(s): 8 > On-line CPU(s) list: 0-7 > Thread(s) per core: 8 > Core(s) per socket: 1 > Socket(s): 1 > NUMA node(s): 1 > > [2] There is ONE NUMA node with the number "2": > $ numactl -H > available: 1 nodes (2) > node 2 cpus: 0 1 2 3 4 5 6 7 > node 2 size: 7614 MB > node 2 free: 1098 MB > node distances: > node 2 > 2: 10 > > [3] > $ lstopo --disallowed > > Machine (7615MB total) > Package L#0 > NUMANode L#0 (P#0 7615MB) > L3 L#0 (4096KB) + L2 L#0 (1024KB) + Core L#0 > L1d L#0 (32KB) + L1i L#0 (48KB) > Die L#0 + PU L#0 (P#0) > PU L#1 (P#2) > PU L#2 (P#4) > PU L#3 (P#6) > L1d L#1 (32KB) + L1i L#1 (48KB) > PU L#4 (P#1) > PU L#5 (P#3) > PU L#6 (P#5) > PU L#7 (P#7) > Block(Disk) "sda" > Net "env2" > > > > > _______________________________________________ > hwloc-devel mailing > listhwloc-de...@lists.open-mpi.orghttps://lists.open-mpi.org/mailman/listinfo/hwloc-devel > > > _______________________________________________ > hwloc-devel mailing > listhwloc-de...@lists.open-mpi.orghttps://lists.open-mpi.org/mailman/listinfo/hwloc-devel > > _______________________________________________ > hwloc-devel mailing list > hwloc-devel@lists.open-mpi.org > https://lists.open-mpi.org/mailman/listinfo/hwloc-devel -- -Jirka
_______________________________________________
hwloc-devel mailing list
hwloc-devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/hwloc-devel