On 02/03/2022 at 12:31, Mike wrote:
Hello,

    Can you display both masks, before set_area_membind and after
    get_area_membind, and send the entire output of all processes and
    threads? If you can prefix each line with the PID, it'd help a lot :)

What do you mean by the output of all processes and threads?
If I run with 1 MPI rank and 4 threads on that rank, I get the following masks (all allocation is done on one thread, so there is only one PID):

pid=1799039
mask before set_area_membind: 0x00000001
mask after get_area_membind: 0xffffffff,0xffffffff,,,0xffffffff,0xffffffff
mask before set_area_membind: 0x00000002
mask after get_area_membind: 0xffffffff,0xffffffff,,,0xffffffff,0xffffffff
mask before set_area_membind: 0x00000004
mask after get_area_membind: 0xffffffff,0xffffffff,,,0xffffffff,0xffffffff
mask before set_area_membind: 0x00000008
mask after get_area_membind: 0xffffffff,0xffffffff,,,0xffffffff,0xffffffff


Everything looks normal here. With a single rank and 4 threads, your 4 threads go on the first 4 cores. All of them are inside the first NUMA node. It's normal that all memory goes there.

Your code won't use any core of the second socket/NUMA node unless you have more than 64 threads. With 4 threads per rank, that means more than 16 MPI ranks. Ranks 17 and above will allocate memory on the second socket/NUMA node.

Brice

