https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108037

            Bug ID: 108037
           Summary: prefer for affinity with OMP_PROC_BIND=true to match
                    "spread" instead of "close"
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: yhe at lbl dot gov
  Target Milestone: ---

With gcc version 11.2.0 (and a few earlier versions), OMP_PROC_BIND=true
appears to produce the same affinity as OMP_PROC_BIND=close. When there are
multiple hyperthreads per physical core and OMP_PLACES=threads is set,
multiple threads end up bound to the same physical core, which is not optimal.

I would like to propose that OMP_PROC_BIND=true use the same affinity as
OMP_PROC_BIND=spread, as a few other compilers (nvidia, cray, for example)
already do.

Below is some sample affinity output from an Intel Haswell node (32 physical
cores, 2 hyperthreads each).

% numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46
48 50 52 54 56 58 60 62
node 0 size: 257592 MB
node 0 free: 188196 MB
node 1 cpus: 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47
49 51 53 55 57 59 61 63
node 1 size: 257527 MB
node 1 free: 173862 MB
node distances:
node   0   1 
  0:  10  21 
  1:  21  10 

% gcc --version
gcc (GCC) 11.2.0 20210728 (Cray Inc.)
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

% more hello-omp.c
#include <omp.h>
#include <stdio.h>
int main ()  
{
#pragma omp parallel
   printf("Hello World... from thread = %d\n", omp_get_thread_num());
}

% gcc -fopenmp hello-omp.c -o hello

% export OMP_NUM_THREADS=4
% export OMP_PLACES=threads
% export OMP_DISPLAY_AFFINITY=true

% export OMP_PROC_BIND=true
% ./hello |sort -k4n
level 1 thread 0x1555554f3d80 affinity 0
level 1 thread 0x15555490b700 affinity 32
level 1 thread 0x15555470a700 affinity 1
level 1 thread 0x155554509700 affinity 33
Hello World... from thread = 0
Hello World... from thread = 1
Hello World... from thread = 2
Hello World... from thread = 3

% export OMP_PROC_BIND=close
% ./hello |sort -k4n
level 1 thread 0x1555554f3d80 affinity 0
level 1 thread 0x15555490b700 affinity 32
level 1 thread 0x15555470a700 affinity 1
level 1 thread 0x155554509700 affinity 33
Hello World... from thread = 0
Hello World... from thread = 1

% export OMP_PROC_BIND=spread
% ./hello |sort -k4n
level 1 thread 0x1555554f3d80 affinity 0
level 1 thread 0x15555490b700 affinity 8
level 1 thread 0x15555470a700 affinity 16
level 1 thread 0x155554509700 affinity 24
Hello World... from thread = 0
Hello World... from thread = 1
Hello World... from thread = 2
Hello World... from thread = 3
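
The three outputs above are consistent with the usual description of the two
policies: "close" hands out consecutive places, while "spread" splits the place
list into one subpartition per thread and uses the first place of each. The
following is only a rough sketch of that arithmetic, not libgomp's actual code.
The helper names (place_to_cpu, close_place, spread_place, print_model) are
hypothetical, the place-to-CPU mapping is inferred from the sample output on
this node, the primary thread is assumed to sit on place 0, and the spread case
is simplified to a thread count that divides the number of places.

```c
#include <assert.h>
#include <stdio.h>

/* Assumption (inferred from the sample output, not from libgomp source):
 * with OMP_PLACES=threads on this node, place p holds logical CPU
 * p/2 + 32*(p%2), so places 0 and 1 are the two hyperthreads of core 0. */
enum {
    NUM_CORES = 32,
    THREADS_PER_CORE = 2,
    NUM_PLACES = NUM_CORES * THREADS_PER_CORE
};

/* Map a place index to the logical CPU number it is assumed to contain. */
static int place_to_cpu(int place)
{
    return place / THREADS_PER_CORE + NUM_CORES * (place % THREADS_PER_CORE);
}

/* "close": thread t takes the t-th place after the primary thread's place,
 * which this sketch takes to be place 0. */
static int close_place(int t, int nthreads, int nplaces)
{
    assert(nthreads > 0 && nthreads <= nplaces && t >= 0 && t < nthreads);
    return t % nplaces;
}

/* "spread": the place list is cut into nthreads equal subpartitions and
 * thread t takes the first place of subpartition t.
 * Simplification: nthreads must divide nplaces evenly. */
static int spread_place(int t, int nthreads, int nplaces)
{
    assert(nthreads > 0 && nplaces % nthreads == 0 && t >= 0 && t < nthreads);
    return t * (nplaces / nthreads);
}

/* Print the modelled CPU of every thread under both policies. */
static void print_model(int nthreads)
{
    for (int t = 0; t < nthreads; t++)
        printf("thread %d: close -> cpu %d, spread -> cpu %d\n", t,
               place_to_cpu(close_place(t, nthreads, NUM_PLACES)),
               place_to_cpu(spread_place(t, nthreads, NUM_PLACES)));
}
```

Calling print_model(4) from a main() reproduces the CPU numbers reported above
for OMP_NUM_THREADS=4: under "close" the four threads share two physical cores,
while under "spread" each thread gets a core of its own.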

With OMP_PLACES=cores, OMP_PROC_BIND=true still behaves the same as
OMP_PROC_BIND=close, but the resulting affinity is fine for pure OpenMP codes.
However, it would still be preferable for OMP_PROC_BIND=true to use the same
affinity as OMP_PROC_BIND=spread, for optimal process and thread affinity in
hybrid MPI/OpenMP codes.

% export OMP_PLACES=cores  

% export OMP_PROC_BIND=true
% ./hello |sort -k4n
level 1 thread 0x1555554f3d80 affinity 0,32
level 1 thread 0x15555490b700 affinity 1,33
level 1 thread 0x15555470a700 affinity 2,34
level 1 thread 0x155554509700 affinity 3,35
Hello World... from thread = 0
Hello World... from thread = 1
Hello World... from thread = 2
Hello World... from thread = 3

% export OMP_PROC_BIND=close 
% ./hello |sort -k4n
level 1 thread 0x1555554f3d80 affinity 0,32
level 1 thread 0x15555490b700 affinity 1,33
level 1 thread 0x15555470a700 affinity 2,34
level 1 thread 0x155554509700 affinity 3,35
Hello World... from thread = 0
Hello World... from thread = 1
Hello World... from thread = 2
Hello World... from thread = 3

% export OMP_PROC_BIND=spread
% ./hello |sort -k4n
level 1 thread 0x1555554f3d80 affinity 0,32
level 1 thread 0x15555490b700 affinity 8,40
level 1 thread 0x15555470a700 affinity 16,48
level 1 thread 0x155554509700 affinity 24,56
Hello World... from thread = 0
Hello World... from thread = 1
Hello World... from thread = 2
Hello World... from thread = 3
