https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110319
Bug ID: 110319
Summary: Performance slowdown using a pointer to perform a
reduction vs. using a normal variable
Product: gcc
Version: 11.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: libgomp
Assignee: unassigned at gcc dot gnu.org
Reporter: lorien.lopez at unizar dot es
CC: jakub at gcc dot gnu.org
Target Milestone: ---
When performing an OpenMP sum reduction into a regular variable, GCC emits a
"lock cmpxchg" instruction. In contrast, when the reduction is performed through
a pointer, it uses an OpenMP atomic region. The second version is several times
slower on an Intel Skylake CPU.
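For illustration, the two variants might look like the sketch below. This is not
the code from the original report: the function names, the element type
(double), and the use of a one-element array section "out[:1]" to express a
reduction through a pointer are all assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Variant 1: sum reduction into an ordinary local variable. */
double sum_scalar(const double *a, int n)
{
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Variant 2: sum reduction into storage reached through a pointer.
   The reduction target is written as a one-element array section,
   which requires OpenMP 4.5 or later (assumed form, see above). */
void sum_pointer(const double *a, int n, double *out)
{
    assert(out != NULL);
    *out = 0.0;
    #pragma omp parallel for reduction(+:out[:1])
    for (int i = 0; i < n; i++)
        out[0] += a[i];
}
```

Building with "gcc -O2 -fopenmp -S" and comparing the assembly generated for the
two functions is one way to check which combining strategy the compiler chose in
each case.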
The original report can be found on Stack Overflow:
https://stackoverflow.com/questions/76480632/performance-slowdown-using-a-pointer-to-perform-a-reduction-vs-using-a-normal-v