Hello,

> Can you display both mask before set_area_membind and after
> get_area_membind and send the entire output of all processes and threads?
> If you can prefix the line with the PID, it'd help a lot :)
>
What do you mean by the output of all processes and threads?
If I run with 1 MPI rank and 4 threads on that rank, I get the following
masks (all allocation is done on one thread, so there is only one PID):

pid=1799039
mask before set_area_membind: 0x00000001
mask after get_area_membind: 0xffffffff,0xffffffff,,,0xffffffff,0xffffffff
mask before set_area_membind: 0x00000002
mask after get_area_membind: 0xffffffff,0xffffffff,,,0xffffffff,0xffffffff
mask before set_area_membind: 0x00000004
mask after get_area_membind: 0xffffffff,0xffffffff,,,0xffffffff,0xffffffff
mask before set_area_membind: 0x00000008
mask after get_area_membind: 0xffffffff,0xffffffff,,,0xffffffff,0xffffffff
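
The identical "after" mask for all four calls would be consistent with the
translation described in the quoted message below: the single-PU set is
reduced to the NUMA node that contains it, and get_area_membind then reports
every PU of that node. As a rough illustration only, here is a toy model of
that round trip in plain C. It does not call hwloc; the topology (2 NUMA
nodes with 8 PUs each) and all function names are hypothetical and do not
describe the machine that produced the output above.

```c
#include <stdint.h>

/* Hypothetical topology for illustration: 2 NUMA nodes, 8 PUs per node. */
#define NUM_NODES    2
#define PUS_PER_NODE 8

/* Bitmask of all PUs that belong to the given NUMA node. */
uint32_t node_cpuset(int node)
{
    return ((1u << PUS_PER_NODE) - 1u) << (node * PUS_PER_NODE);
}

/* Models the "set" direction: a PU mask is reduced to the NUMA nodes
 * it touches, because memory is bound per node, not per PU. */
uint32_t cpuset_to_nodeset(uint32_t cpuset)
{
    uint32_t nodeset = 0;
    for (int n = 0; n < NUM_NODES; n++)
        if (cpuset & node_cpuset(n))
            nodeset |= 1u << n;
    return nodeset;
}

/* Models the "get" direction: each bound node is expanded back into
 * the full mask of PUs local to that node. */
uint32_t nodeset_to_cpuset(uint32_t nodeset)
{
    uint32_t cpuset = 0;
    for (int n = 0; n < NUM_NODES; n++)
        if (nodeset & (1u << n))
            cpuset |= node_cpuset(n);
    return cpuset;
}

/* Mask you would read back after binding with the given PU mask. */
uint32_t membind_round_trip(uint32_t cpuset)
{
    return nodeset_to_cpuset(cpuset_to_nodeset(cpuset));
}
```

In this model, the inputs 0x1, 0x2, 0x4 and 0x8 all lie in node 0, so
membind_round_trip returns the same full-node mask for each of them, which
mirrors the pattern in the output above.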

 Mike

On Wed, 2 March 2022 at 11:58, Brice Goglin <
brice.gog...@inria.fr> wrote:

> On 02/03/2022 at 11:38, Mike wrote:
>
> Hello,
>
>> If you print the set that is built before calling set_area_membind, you
>> should only see 4 bits in there, right? (since threadcount=4 in your code)
>>
>> I'd say 0xf for rank0, 0xf0 for rank1, etc.
>>
>> set_area_membind() will translate that into a single NUMA node, before
>> asking the kernel to bind. Later get_area_membind translates the single NUMA
>> node back into a set that contains all PUs of the NUMA node.
>>
>> That said, I am not sure I understand what threadcount means in your
>> code. Are you calling the allocate function multiple times with many
>> different ranks? (MPI ranks?)
>>
> The allocator function is called once for every MPI rank and threadcount
> is the number of threads that run on one MPI rank.
> I build the set so that only 1 bit is set before calling set_area_membind,
> so that the memory can only be bound to the specified hardware core.
> Basically, I call set_area_membind once for every thread on an MPI rank.
> After the allocation I call hwloc_set_cpubind with a set that again has
> 1 bit set, so that (if all works properly) an area of memory and a
> software thread are both bound to one specific hardware core.
>
>
> Can you display both mask before set_area_membind and after
> get_area_membind and send the entire output of all processes and threads?
> If you can prefix the line with the PID, it'd help a lot :)
>
> Brice
>
>
> _______________________________________________
> hwloc-users mailing list
> hwloc-users@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/hwloc-users
