https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108037
Bug ID: 108037
Summary: prefer for affinity with OMP_PROC_BIND=true to match "spread" instead of "close"
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: yhe at lbl dot gov
Target Milestone: ---

With gcc version 11.2.0 (and a few earlier versions), OMP_PROC_BIND=true appears to produce the same affinity as OMP_PROC_BIND=close. When there are multiple hyperthreads per physical core and OMP_PLACES=threads is set, multiple threads end up bound to the same physical core, which is not optimal.

I would like to propose that OMP_PROC_BIND=true use the same affinity as OMP_PROC_BIND=spread, as a few other compilers (nvidia, cray, for example) already do.

Below is some sample affinity output on an Intel Haswell node (32 physical cores, 2 hyperthreads each).

% numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58 60 62
node 0 size: 257592 MB
node 0 free: 188196 MB
node 1 cpus: 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 55 57 59 61 63
node 1 size: 257527 MB
node 1 free: 173862 MB
node distances:
node   0   1
  0:  10  21
  1:  21  10

% gcc --version
gcc (GCC) 11.2.0 20210728 (Cray Inc.)
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

% more hello-omp.c
#include <omp.h>
#include <stdio.h>
int main ()
{
#pragma omp parallel
  printf("Hello World... from thread = %d\n", omp_get_thread_num());
}

% gcc -fopenmp hello-omp.c -o hello
% export OMP_NUM_THREADS=4
% export OMP_PLACES=threads
% export OMP_DISPLAY_AFFINITY=true

% export OMP_PROC_BIND=true
% ./hello |sort -k4n
level 1 thread 0x1555554f3d80 affinity 0
level 1 thread 0x15555490b700 affinity 32
level 1 thread 0x15555470a700 affinity 1
level 1 thread 0x155554509700 affinity 33
Hello World... from thread = 0
Hello World... from thread = 1
Hello World... from thread = 2
Hello World... from thread = 3

% export OMP_PROC_BIND=close
% ./hello |sort -k4n
level 1 thread 0x1555554f3d80 affinity 0
level 1 thread 0x15555490b700 affinity 32
level 1 thread 0x15555470a700 affinity 1
level 1 thread 0x155554509700 affinity 33
Hello World... from thread = 0
Hello World... from thread = 1

% export OMP_PROC_BIND=spread
% ./hello |sort -k4n
level 1 thread 0x1555554f3d80 affinity 0
level 1 thread 0x15555490b700 affinity 8
level 1 thread 0x15555470a700 affinity 16
level 1 thread 0x155554509700 affinity 24
Hello World... from thread = 0
Hello World... from thread = 1
Hello World... from thread = 2
Hello World... from thread = 3

With OMP_PLACES=cores, OMP_PROC_BIND=true still behaves the same as OMP_PROC_BIND=close, but the resulting affinity is fine for pure OpenMP codes. However, it would still be preferable for OMP_PROC_BIND=true to use the same affinity as OMP_PROC_BIND=spread, for optimal process and thread affinity in hybrid MPI/OpenMP codes.

% export OMP_PLACES=cores

% export OMP_PROC_BIND=true
% ./hello |sort -k4n
level 1 thread 0x1555554f3d80 affinity 0,32
level 1 thread 0x15555490b700 affinity 1,33
level 1 thread 0x15555470a700 affinity 2,34
level 1 thread 0x155554509700 affinity 3,35
Hello World... from thread = 0
Hello World... from thread = 1
Hello World... from thread = 2
Hello World... from thread = 3

% export OMP_PROC_BIND=close
% ./hello |sort -k4n
level 1 thread 0x1555554f3d80 affinity 0,32
level 1 thread 0x15555490b700 affinity 1,33
level 1 thread 0x15555470a700 affinity 2,34
level 1 thread 0x155554509700 affinity 3,35
Hello World... from thread = 0
Hello World... from thread = 1
Hello World... from thread = 2
Hello World... from thread = 3

% export OMP_PROC_BIND=spread
% ./hello |sort -k4n
level 1 thread 0x1555554f3d80 affinity 0,32
level 1 thread 0x15555490b700 affinity 8,40
level 1 thread 0x15555470a700 affinity 16,48
level 1 thread 0x155554509700 affinity 24,56
Hello World... from thread = 0
Hello World... from thread = 1
Hello World... from thread = 2
Hello World... from thread = 3