Hello,

> If you print the set that is built before calling set_area_membind, you
> should only see 4 bits in there, right? (since threadcount=4 in your code)
>
> I'd say 0xf for rank0, 0xf0 for rank1, etc.
>
> set_area_membind() will translate that into a single NUMA node, before
> asking the kernel to bind. Later, get_area_membind translates the single
> NUMA node back into a set that contains all PUs of the NUMA node.
>
> That said, I am not sure I understand what threadcount means in your code.
> Are you calling the allocate function multiple times with many different
> ranks? (MPI ranks?)

The allocator function is called once for every MPI rank, and threadcount is the number of threads that run on one MPI rank. I build the set so that only one bit is set before calling set_area_membind, so that the memory can only be bound to (the NUMA node of) the specified hardware core. Essentially, I call set_area_membind once for every thread on an MPI rank. After the allocation I call hwloc_set_cpubind with a set that again has a single bit set, so that (if all works properly) a memory area and a software thread end up bound to the same specific hardware core.
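In code, the pattern looks roughly like this (a simplified sketch against hwloc 2.x; the function name, "pu_index", and the buffer arguments are illustrative, not my actual code):

#include <hwloc.h>
#include <stdio.h>
#include <stdlib.h>

/* Simplified sketch of the allocator's binding steps (hwloc 2.x).
 * "topology" is assumed to be loaded already; "pu_index" is the logical
 * index of the target PU for one thread. Names are illustrative. */
static int bind_area_and_thread(hwloc_topology_t topology,
                                void *buf, size_t len, unsigned pu_index)
{
    hwloc_obj_t pu = hwloc_get_obj_by_type(topology, HWLOC_OBJ_PU, pu_index);
    if (!pu)
        return -1;

    /* Single-bit set: only this PU. */
    hwloc_bitmap_t set = hwloc_bitmap_dup(pu->cpuset);

    /* Print the set before binding, as suggested above. */
    char *str;
    hwloc_bitmap_asprintf(&str, set);
    printf("set before set_area_membind: %s\n", str);
    free(str);

    /* hwloc translates the single PU into its NUMA node before binding. */
    int err = hwloc_set_area_membind(topology, buf, len, set,
                                     HWLOC_MEMBIND_BIND, 0);

    /* Called from the thread itself: pin it to the same PU. */
    if (!err)
        err = hwloc_set_cpubind(topology, set, HWLOC_CPUBIND_THREAD);

    hwloc_bitmap_free(set);
    return err;
}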
Mike

On Wed, 2 Mar 2022 at 10:50, Brice Goglin <brice.gog...@inria.fr> wrote:

> On 02/03/2022 at 10:09, Mike wrote:
>
>>> Ok then your mask 0xffffffff,0xffffffff,,,0xffffffff,0xffffffff
>>> corresponds exactly to NUMA node 0 (socket 0). Object cpusets can be
>>> displayed on the command line with "lstopo --cpuset" or
>>> "hwloc-calc numa:0".
>>>
>>> This would be OK if you're only spawning threads to the first socket.
>>> Do you see the same mask for threads on the other socket?
>>
>> Yes, I do.
>
> Brice
>
>> Mike
>>
>> On Wed, 2 Mar 2022 at 09:53, Brice Goglin <brice.gog...@inria.fr> wrote:
>>
>>> On 02/03/2022 at 09:39, Mike wrote:
>>>
>>>> Hello,
>>>>
>>>>> Please run "lstopo -.synthetic" to compress the output a lot. I will
>>>>> be able to reuse it from here and understand your binding mask.
>>>>
>>>> Package:2 [NUMANode(memory=270369247232)] L3Cache:8(size=33554432)
>>>> L2Cache:8(size=524288) L1dCache:1(size=32768) L1iCache:1(size=32768)
>>>> Core:1 PU:2(indexes=2*128:1*2)
>>>
>>> Brice
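For reference, the round trip Brice describes can be observed by querying the binding back: the cpuset returned by get_area_membind covers every PU of the NUMA node, not just the single PU that was passed in. A minimal sketch (hwloc 2.x; "buf"/"len" as in the allocator above, names illustrative):

#include <hwloc.h>
#include <stdio.h>
#include <stdlib.h>

/* Sketch: query an area's binding back. hwloc returns the full cpuset of
 * the NUMA node the area is bound to, e.g. the 0xffffffff,... mask seen
 * above for node 0. */
static void print_area_membind(hwloc_topology_t topology,
                               const void *buf, size_t len)
{
    hwloc_bitmap_t set = hwloc_bitmap_alloc();
    hwloc_membind_policy_t policy;

    if (!hwloc_get_area_membind(topology, buf, len, set, &policy, 0)) {
        char *str;
        hwloc_bitmap_asprintf(&str, set);
        printf("area bound to cpuset %s (policy %d)\n", str, (int)policy);
        free(str);
    }
    hwloc_bitmap_free(set);
}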
_______________________________________________ hwloc-users mailing list hwloc-users@lists.open-mpi.org https://lists.open-mpi.org/mailman/listinfo/hwloc-users