https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114635
Bug ID: 114635
Summary: OpenMP reductions fail dependency analysis
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: tnfchris at gcc dot gnu.org
Target Milestone: ---

The following testcase, reduced from an HPC workload:

#include <math.h>

#define RESTRICT restrict

void work(int n, float *RESTRICT x, float *RESTRICT y, float *RESTRICT z,
          float *RESTRICT mass, float x0, float y0, float z0,
          float *RESTRICT ax, float *RESTRICT ay, float *RESTRICT az)
{
  float lax = 0.0f, lay = 0.0f, laz = 0.0f;
#if _OPENMP >= 201307
#pragma omp simd reduction(+:lax,lay,laz)
#endif
  for (int i = 0; i < n; ++i) {
    float dx = x[i] - x0;
    float dy = y[i] - y0;
    float dz = z[i] - z0;
    float r2 = dx + dy + dz;
    if (r2 == 0.0f)
      continue;
    float f = (1.0f / (r2 * sqrtf(r2))) * mass[i];
    lax += f * dx;
    lay += f * dy;
    laz += f * dz;
  }
  *ax += lax;
  *ay += lay;
  *az += laz;
}

vectorizes as expected when compiled with -Ofast -march=armv9-a -fopenmp-simd, but when the pragma is in effect, e.g. with -Ofast -march=armv9-a -fopenmp, the main loop fails to vectorize with:

(compute_affine_dependence
  ref_a: D.5962[_33], stmt_a: _69 = D.5962[_33];
  ref_b: D.5962[_33], stmt_b: D.5962[_33] = _ifc__147;
) -> dependence analysis failed
/app/example.c:16:17: missed: bad data dependence.
/app/example.c:16:17: note: ***** Analysis failed with vector mode VNx4SF

This doesn't seem to happen with just two reductions, but with three, dependency analysis seems to fail.

I don't know much about OpenMP, but my understanding is that this pragma is intended for architectures that don't have masking support. It works by splitting the loop and removing the reductions from the main loop, creating OpenMP "workers" that each work on one thread. The reduction values are turned into local arrays, and these threads then write to their own slots in these arrays. The reduction itself is then done as a final post step.
It looks like the only thing we can vectorize is the post step. I wonder, since the compiler is the one introducing these local arrays, can we not mark them as safe from interdependencies?
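
To make the description above concrete, here is a hand-written sketch of the general shape of such a lowering: each reduction variable becomes a small local array, the main loop updates only the slot belonging to the current lane, and a post step folds the slots into the final result. This is an illustration under stated assumptions, not GCC's actual output. The function name sum3_privatized, the constant VF, and the simplified loop body (plain sums rather than the force computation from the testcase) are all hypothetical.

```c
/* Hypothetical vectorization factor, chosen only for illustration. */
#define VF 4

/* Hand-written model of a privatized three-way '+' reduction.
   Not GCC output; names and structure are assumptions. */
void sum3_privatized(int n, const float *x, const float *y, const float *z,
                     float *sx, float *sy, float *sz)
{
    /* One private slot per lane, initialized to the identity of '+'. */
    float px[VF] = {0}, py[VF] = {0}, pz[VF] = {0};

    /* Main loop: lane (i % VF) reads and writes only its own slot,
       so accesses made by different lanes never touch the same element. */
    for (int i = 0; i < n; ++i) {
        int lane = i % VF;
        px[lane] += x[i];
        py[lane] += y[i];
        pz[lane] += z[i];
    }

    /* Post step: fold the private slots into the scalar results. */
    for (int lane = 0; lane < VF; ++lane) {
        *sx += px[lane];
        *sy += py[lane];
        *sz += pz[lane];
    }
}
```

In this model the load and store of px[lane] within one iteration refer to the same slot, which matches the shape of the ref_a/ref_b pair in the dump (both are D.5962[_33]). The question raised in this report is whether the compiler can assume such compiler-introduced arrays carry no dependence between different lanes.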