Hello,

> If you print the set that is built before calling set_area_membind, you
> should only see 4 bits in there, right? (since threadcount=4 in your code)
>
> I'd say 0xf for rank0, 0xf0 for rank1, etc.
>
> set_area_membind() will translate that into a single NUMA node, before
> asking the kernel to bind. Later, get_area_membind translates the single NUMA
> node back into a set that contains all PUs of the NUMA node.
>
> That said, I am not sure I understand what threadcount means in your code.
> Are you calling the allocate function multiple times with many different
> ranks? (MPI ranks?)
>
The allocator function is called once for every MPI rank, and threadcount is
the number of threads that run on one MPI rank.
I build the set so that only one bit is set before calling set_area_membind,
so that the memory can only be bound to the specified hardware core.
Basically, I call set_area_membind once for every thread on an MPI rank.
After the allocation I call hwloc_set_cpubind with a set that again has one
bit set, so that (if everything works properly) I have bound an area of memory
and a software thread to one specific hardware core.
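
To make that concrete, the logic roughly looks like the sketch below (a
minimal sketch assuming hwloc 2.x; the function name and the pu_index
parameter are made up for illustration, this is not my actual allocator code):

#include <hwloc.h>
#include <stdio.h>
#include <stdlib.h>

/* Sketch: bind one memory area and the calling thread to one PU.
 * In the real code the thread binding happens later, in each thread;
 * it is combined here only to keep the sketch short. */
static int bind_area_and_thread(hwloc_topology_t topology,
                                void *area, size_t len,
                                unsigned pu_index)
{
    hwloc_obj_t pu = hwloc_get_obj_by_type(topology, HWLOC_OBJ_PU, pu_index);
    if (!pu)
        return -1;

    /* Build a set with exactly one bit: the target PU. */
    hwloc_bitmap_t set = hwloc_bitmap_alloc();
    hwloc_bitmap_only(set, pu->os_index);

    /* Bind the memory area; hwloc turns the single-PU set into the
     * NUMA node containing that PU before asking the kernel. */
    if (hwloc_set_area_membind(topology, area, len, set,
                               HWLOC_MEMBIND_BIND, 0) < 0)
        perror("hwloc_set_area_membind");

    /* Bind the calling thread to the same PU. */
    if (hwloc_set_cpubind(topology, set, HWLOC_CPUBIND_THREAD) < 0)
        perror("hwloc_set_cpubind");

    /* Reading the binding back returns a cpuset covering all PUs of the
     * NUMA node, not the single PU that was requested, which matches
     * the 0xffffffff,... masks discussed above. */
    hwloc_bitmap_t check = hwloc_bitmap_alloc();
    hwloc_membind_policy_t policy;
    if (hwloc_get_area_membind(topology, area, len, check, &policy, 0) == 0) {
        char *s;
        hwloc_bitmap_asprintf(&s, check);
        printf("area bound to %s\n", s);
        free(s);
    }

    hwloc_bitmap_free(check);
    hwloc_bitmap_free(set);
    return 0;
}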

Mike


On Wed, Mar 2, 2022 at 10:50 AM Brice Goglin <brice.gog...@inria.fr> wrote:

>
> On 02/03/2022 at 10:09, Mike wrote:
>
> Ok then your mask 0xffffffff,0xffffffff,,,0xffffffff,0xffffffff
>> corresponds exactly to NUMA node 0 (socket 0). Object cpusets can be
>> displayed on the command-line with "lstopo --cpuset" or "hwloc-calc numa:0".
>>
>> This would be OK if you're only spawning threads to the first socket. Do
>> you see the same mask for threads on the other socket?
>>
> Yes, I do.
>
>
> If you print the set that is built before calling set_area_membind, you
> should only see 4 bits in there, right? (since threadcount=4 in your code)
>
> I'd say 0xf for rank0, 0xf0 for rank1, etc.
>
> set_area_membind() will translate that into a single NUMA node, before
> asking the kernel to bind. Later, get_area_membind translates the single NUMA
> node back into a set that contains all PUs of the NUMA node.
>
> That said, I am not sure I understand what threadcount means in your code.
> Are you calling the allocate function multiple times with many different
> ranks? (MPI ranks?)
>
> Brice
>
>
>
>
> Mike
>
> On Wed, Mar 2, 2022 at 9:53 AM Brice Goglin <brice.gog...@inria.fr> wrote:
>
>> On 02/03/2022 at 09:39, Mike wrote:
>>
>> Hello,
>>
>> Please run "lstopo -.synthetic" to compress the output a lot. I will be
>>> able to reuse it from here and understand your binding mask.
>>>
>> Package:2 [NUMANode(memory=270369247232)] L3Cache:8(size=33554432)
>> L2Cache:8(size=524288) L1dCache:1(size=32768) L1iCache:1(size=32768) Core:1
>> PU:2(indexes=2*128:1*2)
>>
>>
>> Ok then your mask 0xffffffff,0xffffffff,,,0xffffffff,0xffffffff
>> corresponds exactly to NUMA node 0 (socket 0). Object cpusets can be
>> displayed on the command-line with "lstopo --cpuset" or "hwloc-calc numa:0".
>>
>> This would be OK if you're only spawning threads to the first socket. Do
>> you see the same mask for threads on the other socket?
>>
>> Brice
>>
