Hi Brice,

Thanks a lot for the quick response!

I have tested the patch and it works just fine :-) [1]

> I am trying to release hwloc 2.5 "soon". If that's too slow, please let me
> know, I'll see if I can do a 2.4.1 earlier.


There is no rush; 2.5 sounds great.

Thank you very much!
Jirka


[1]
$ ./utils/lstopo/lstopo-no-graphics
Machine (7615MB total)
 Package L#0
   NUMANode L#0 (P#2 7615MB)
   L3 L#0 (4096KB) + L2 L#0 (1024KB) + Core L#0
     L1d L#0 (32KB) + L1i L#0 (48KB)
       PU L#0 (P#0)
       PU L#1 (P#2)
       PU L#2 (P#4)
       PU L#3 (P#6)
     L1d L#1 (32KB) + L1i L#1 (48KB)
       PU L#4 (P#1)
       PU L#5 (P#3)
       PU L#6 (P#5)
       PU L#7 (P#7)
 Block(Disk) "sda"
 Net "env2"


On Mon, Apr 26, 2021 at 8:43 PM Brice Goglin <brice.gog...@inria.fr> wrote:

> This patch should fix the issue. We had to fix the same issue for CPU#0
> being offline recently, but I didn't know it could be needed for NUMA node#0
> being offline too.
>
> I am trying to release hwloc 2.5 "soon". If that's too slow, please let me
> know, I'll see if I can do a 2.4.1 earlier.
>
> Brice
>
>
>
>
> commit 7c159d723432e461b4e48cc2d38212913d2ba7c7
> Author: Brice Goglin <brice.gog...@inria.fr>
> Date:   Mon Apr 26 20:35:42 2021 +0200
>
>     linux: fix support for NUMA node0 being offline
>
>     Just like we didn't support offline CPU#0 until commit
>     7bcc273efd50536961ba16d474efca4ae163229b, we need to
>     support node0 being offline as well.
>     It's not clear whether it's a new Linux feature or not;
>     this was reported on a POWER LPAR VM.
>
>     We opportunistically assume node0 is online to avoid
>     the overhead in the vast majority of cases. If node0
>     is missing, we parse "online" to find the first node.
>
>     Thanks to Jirka Hladky for the report.
>
>     Signed-off-by: Brice Goglin <brice.gog...@inria.fr>
>
> diff --git a/hwloc/topology-linux.c b/hwloc/topology-linux.c
> index 94b242dd0..10e038e64 100644
> --- a/hwloc/topology-linux.c
> +++ b/hwloc/topology-linux.c
> @@ -5264,6 +5264,9 @@ static const char *find_sysfs_cpu_path(int root_fd, int *old_filenames)
>
>  static const char *find_sysfs_node_path(int root_fd)
>  {
> +  unsigned first;
> +  int err;
> +
>    if (!hwloc_access("/sys/bus/node/devices", R_OK|X_OK, root_fd)
>        && !hwloc_access("/sys/bus/node/devices/node0/cpumap", R_OK, root_fd))
>      return "/sys/bus/node/devices";
> @@ -5272,6 +5275,28 @@ static const char *find_sysfs_node_path(int root_fd)
>        && !hwloc_access("/sys/devices/system/node/node0/cpumap", R_OK, root_fd))
>      return "/sys/devices/system/node";
>
> +  /* node0 might be offline, fall back to looking at the first online node.
> +   * online contains comma-separated ranges, just read the first number.
> +   */
> +  hwloc_debug("Failed to find sysfs node files using node0, looking at online nodes...\n");
> +  err = hwloc_read_path_as_uint("/sys/devices/system/node/online", &first, root_fd);
> +  if (err) {
> +    hwloc_debug("Failed to read /sys/devices/system/node/online.\n");
> +  } else {
> +    char path[PATH_MAX];
> +    hwloc_debug("Found node#%u as first online node\n", first);
> +
> +    snprintf(path, sizeof(path), "/sys/bus/node/devices/node%u/cpumap", first);
> +    if (!hwloc_access("/sys/bus/node/devices", R_OK|X_OK, root_fd)
> +        && !hwloc_access(path, R_OK, root_fd))
> +      return "/sys/bus/node/devices";
> +
> +    snprintf(path, sizeof(path), "/sys/devices/system/node/node%u/cpumap", first);
> +    if (!hwloc_access("/sys/devices/system/node", R_OK|X_OK, root_fd)
> +        && !hwloc_access(path, R_OK, root_fd))
> +      return "/sys/devices/system/node";
> +  }
> +
>    return NULL;
>  }
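>
> In case it helps review, the fallback boils down to something like this
> standalone sketch (plain stdio and access() instead of our internal
> hwloc_read_path_as_uint/hwloc_access helpers, so it ignores the custom
> fsroot; illustrative only):
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
> #include <limits.h>
>
> /* Return the first online NUMA node number, or -1 on error.
>  * "online" contains comma-separated ranges such as "2" or "0-3,8-11",
>  * so reading the first unsigned integer is enough here. */
> static int first_online_node(void)
> {
>   FILE *f = fopen("/sys/devices/system/node/online", "r");
>   unsigned first;
>   if (!f)
>     return -1;
>   if (fscanf(f, "%u", &first) != 1) {
>     fclose(f);
>     return -1;
>   }
>   fclose(f);
>   return (int)first;
> }
>
> int main(void)
> {
>   char path[PATH_MAX];
>   int first = first_online_node();
>   if (first < 0)
>     return EXIT_FAILURE;
>   /* Same sysfs check as the patch, with the first online node instead of node0. */
>   snprintf(path, sizeof(path), "/sys/devices/system/node/node%d/cpumap", first);
>   printf("first online node: %d, %s is %s\n", first, path,
>          access(path, R_OK) ? "not readable" : "readable");
>   return EXIT_SUCCESS;
> }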
>
>
>
>
>
>
>
> On 26/04/2021 at 16:48, Brice Goglin wrote:
>
> Hello,
>
> Maybe we have something that assumes that the first NUMA node on Linux is
> #0. And something is wrong in the disallowed case anyway, since the NUMA
> node physical number is 0 instead of 2 there.
>
> Can you run "hwloc-gather-topology lpar" and send the resulting
> lpar.tar.bz2? (send it only to me if it's too big or somehow confidential).
>
> Thanks
>
> Brice
>
>
>
> On 26/04/2021 at 16:40, Jirka Hladky wrote:
>
> Hi Brice,
>
> How are you doing? I hope you are well. We are all well and safe.
>
> I have been running hwloc on an IBM Power LPAR VM with only 1 CPU core and 8
> PUs [1]. There is only one NUMA node. The numbering is, however, quite
> strange: the NUMA node number is "2". See [2].
>
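> The same node numbering is visible directly in sysfs; assuming the usual
> layout, the "online" file should contain just node 2 here:
>
> $ cat /sys/devices/system/node/online
> 2
>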
> hwloc reports "Topology does not contain any NUMA node, aborting!"
>
> $ lstopo
> Topology does not contain any NUMA node, aborting!
> hwloc_topology_load() failed (No such file or directory).
>
> Could you please double-check whether this behavior is correct? I believe
> hwloc should work on this HW setup.
>
> FYI, we can get it working with the --disallowed option [3] (but I think it
> should work without this option as well).
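>
> For reference, the programmatic equivalent of lstopo's --disallowed should
> be setting the HWLOC_TOPOLOGY_FLAG_INCLUDE_DISALLOWED topology flag before
> loading, roughly like this (minimal sketch, error checking omitted):
>
> #include <stdio.h>
> #include <hwloc.h>
>
> int main(void)
> {
>   hwloc_topology_t topology;
>   hwloc_topology_init(&topology);
>   /* Include disallowed resources in the topology, like lstopo --disallowed. */
>   hwloc_topology_set_flags(topology, HWLOC_TOPOLOGY_FLAG_INCLUDE_DISALLOWED);
>   hwloc_topology_load(topology);
>   printf("%d NUMA node(s)\n",
>          hwloc_get_nbobjs_by_type(topology, HWLOC_OBJ_NUMANODE));
>   hwloc_topology_destroy(topology);
>   return 0;
> }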
>
> Thanks a lot!
> Jirka
>
>
> [1] $ lscpu
> Architecture:        ppc64le
> Byte Order:          Little Endian
> CPU(s):              8
> On-line CPU(s) list: 0-7
> Thread(s) per core:  8
> Core(s) per socket:  1
> Socket(s):           1
> NUMA node(s):        1
>
> [2] There is ONE NUMA node with the number "2":
> $ numactl -H
> available: 1 nodes (2)
> node 2 cpus: 0 1 2 3 4 5 6 7
> node 2 size: 7614 MB
> node 2 free: 1098 MB
> node distances:
> node   2
>  2:  10
>
> [3]
> $ lstopo --disallowed
>
> Machine (7615MB total)
>  Package L#0
>    NUMANode L#0 (P#0 7615MB)
>    L3 L#0 (4096KB) + L2 L#0 (1024KB) + Core L#0
>      L1d L#0 (32KB) + L1i L#0 (48KB)
>        Die L#0 + PU L#0 (P#0)
>        PU L#1 (P#2)
>        PU L#2 (P#4)
>        PU L#3 (P#6)
>      L1d L#1 (32KB) + L1i L#1 (48KB)
>        PU L#4 (P#1)
>        PU L#5 (P#3)
>        PU L#6 (P#5)
>        PU L#7 (P#7)
>  Block(Disk) "sda"
>  Net "env2"
>
>
>
>



-- 
-Jirka
_______________________________________________
hwloc-devel mailing list
hwloc-devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/hwloc-devel
