Hello,

> Can you display both mask before set_area_membind and after
> get_area_membind and send the entire output of all processes and threads?
> If you can prefix the line with the PID, it'd help a lot :)

What do you mean by the output of all processes and threads?
If I run with 1 MPI rank and 4 threads on that rank, I get the following masks (all allocation is done on one thread, so there is only one PID):

pid=1799039
mask before set_area_membind: 0x00000001
mask after get_area_membind: 0xffffffff,0xffffffff,,,0xffffffff,0xffffffff
mask before set_area_membind: 0x00000002
mask after get_area_membind: 0xffffffff,0xffffffff,,,0xffffffff,0xffffffff
mask before set_area_membind: 0x00000004
mask after get_area_membind: 0xffffffff,0xffffffff,,,0xffffffff,0xffffffff
mask before set_area_membind: 0x00000008
mask after get_area_membind: 0xffffffff,0xffffffff,,,0xffffffff,0xffffffff

Mike

On Wed, 2 Mar 2022 at 11:58, Brice Goglin <brice.gog...@inria.fr> wrote:

> On 02/03/2022 at 11:38, Mike wrote:
>
> Hello,
>
>> If you print the set that is built before calling set_area_membind, you
>> should only see 4 bits in there, right? (since threadcount=4 in your code)
>>
>> I'd say 0xf for rank0, 0xf0 for rank1, etc.
>>
>> set_area_membind() will translate that into a single NUMA node, before
>> asking the kernel to bind. Later get_area_membind translate the single NUMA
>> node back into a set that contains all PUs of the NUMA node.
>>
>> That said, I am not sure I understand what threadcount means in your
>> code. Are you calling the allocate function multiple times with many
>> different ranks? (MPI ranks?)
>>
> The allocator function is called once for every MPI rank and threadcount
> is the number of threads that run on one MPI rank.
> I build the set so that only 1 bit is set before calling set_area_membind,
> so that the memory can only be bound to the specified hardware core.
> Basically, I call set_area_membind once for every thread on a MPI rank.
> After the allocation I will call hwloc_set_cpubind with a set that has
> again 1 bit set, so that (if all works properly) I bound an area of memory
> and a software thread to one specific hardware core.
>
>
> Can you display both mask before set_area_membind and after
> get_area_membind and send the entire output of all processes and threads?
> If you can prefix the line with the PID, it'd help a lot :)
>
> Brice
>
>
> _______________________________________________
> hwloc-users mailing list
> hwloc-users@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/hwloc-users
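
To make the round trip described in the quoted text concrete (the set is translated to a NUMA node on bind, and the node is translated back to all of its PUs on query), here is a toy model in plain C. It does not call hwloc and is not how hwloc is implemented. The 256-PU, two-node layout and every toy_* name are assumptions invented for illustration. The layout is chosen so that node 0 holds PUs 0-63 and 128-191, which is how the "after" mask reads if the string is taken as 32-bit chunks, most significant first, with zero chunks left empty.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define NUM_PUS   256             /* hypothetical machine size */
#define NUM_WORDS (NUM_PUS / 32)  /* one 32-bit chunk per 32 PUs */

/* Toy CPU set: w[0] holds PUs 0..31, w[1] holds PUs 32..63, and so on. */
typedef struct { uint32_t w[NUM_WORDS]; } toy_set;

/* Hypothetical layout: node 0 owns PUs 0-63 and 128-191, node 1 the rest. */
static int toy_node_of_pu(unsigned pu) {
    assert(pu < NUM_PUS);
    return (int)((pu / 64) % 2);
}

/* Forward step: bitmask of every node that owns at least one PU in the set. */
static unsigned toy_cpuset_to_nodeset(const toy_set *cpus) {
    unsigned nodes = 0;
    for (unsigned pu = 0; pu < NUM_PUS; pu++)
        if ((cpus->w[pu / 32] >> (pu % 32)) & 1u)
            nodes |= 1u << toy_node_of_pu(pu);
    return nodes;
}

/* Backward step: every PU that belongs to one of the given nodes. */
static toy_set toy_nodeset_to_cpuset(unsigned nodes) {
    toy_set cpus;
    memset(&cpus, 0, sizeof cpus);
    for (unsigned pu = 0; pu < NUM_PUS; pu++)
        if ((nodes >> toy_node_of_pu(pu)) & 1u)
            cpus.w[pu / 32] |= UINT32_C(1) << (pu % 32);
    return cpus;
}

/* What a "bind, then query" sequence would report in this toy model. */
static toy_set toy_bind_and_query(const toy_set *requested) {
    return toy_nodeset_to_cpuset(toy_cpuset_to_nodeset(requested));
}

/* Format as comma-separated 32-bit chunks, most significant first.
 * Leading zero chunks are skipped, inner zero chunks are left empty,
 * and a zero final chunk is written as 0x0. */
static void toy_format(const toy_set *s, char *buf, size_t len) {
    int top = NUM_WORDS - 1;
    while (top > 0 && s->w[top] == 0)
        top--;
    size_t used = 0;
    if (len > 0)
        buf[0] = '\0';
    for (int i = top; i >= 0 && used < len; i--) {
        const char *sep = (i == top) ? "" : ",";
        int n;
        if (s->w[i] != 0)
            n = snprintf(buf + used, len - used, "%s0x%08x", sep, (unsigned)s->w[i]);
        else if (i == 0)
            n = snprintf(buf + used, len - used, "%s0x0", sep);
        else
            n = snprintf(buf + used, len - used, "%s", sep);
        if (n < 0)
            break;
        used += (size_t)n;
    }
}
```

In this model, passing a set with only bit 3 set (0x00000008) through toy_bind_and_query and then toy_format gives the same string as the "mask after get_area_membind" lines above. That is consistent with the binding being recorded per NUMA node rather than per core.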