On 19/03/2026 08:25, Richard Biener wrote:
> That said - the *scatter* optabs assume naive vectorization of the
> first loop works, even when b[] = { 0, 1, 2, 0 }, so if GCN is not able
> to guarantee this their "vector address store" are not scatters in
> terms of what GCC assumes.  The documentation for the optabs
> does not mention this constraint.

The primary use-case for the GCN port is OpenMP/OpenACC, in which loop iterations are considered to be "independent" and therefore all such considerations can be ignored. Not only is vectorization in play, but also two levels of threading, so there is absolutely no guarantee of the order in which operations happen. If the user writes code that is not, in fact, "independent" then that's on them.

There have indeed been a few occasions where GCC has refused to optimize because it would not preserve "correctness" even though all hope of that correctness has already gone.

We "fixed" the floating-point reduction case by implementing "fold_left" optabs that actually do not strictly fold left, albeit only when -fopenmp is active. Consequently, the result of floating-point vector reductions is stable, but it's not the same stable you'd get from the unvectorized loop. (The result of the outer OpenMP reduction loop, as a whole, is unstable, because the threads complete out of order.) Basically -fopenmp implies -fassociative-math, in this case.
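
To make the "stable, but not the same stable" point concrete, here is a minimal C sketch. It is not the GCN implementation: the function names, the 4-lane width, and the input data are all made up for illustration, and it assumes IEEE single precision with no excess precision. It compares a strict left-to-right sum with a sum that accumulates per lane and combines the lanes at the end, which is roughly what a reassociated vector reduction does.

```c
#include <stddef.h>

#define LANES 4  /* hypothetical vector width, chosen for illustration */

/* Hypothetical input: large values that cancel, plus small ones. */
static const float demo_input[8] = {
    1.0e8f, 1.0f, 1.0f, 1.0f,
    -1.0e8f, 1.0f, 1.0f, 1.0f
};

/* Strict fold-left: the order the unvectorized loop would use. */
static float sum_fold_left(const float *x, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += x[i];
    return acc;
}

/* Reassociated sum: one partial accumulator per lane, combined at the
   end.  n is assumed to be a multiple of LANES. */
static float sum_lanewise(const float *x, size_t n)
{
    float lane[LANES] = { 0.0f };
    for (size_t i = 0; i < n; i += LANES)
        for (size_t l = 0; l < LANES; l++)
            lane[l] += x[i + l];

    float acc = 0.0f;
    for (size_t l = 0; l < LANES; l++)
        acc += lane[l];
    return acc;
}
```

With this input, sum_fold_left gives 3 (the first three 1.0f values are absorbed by 1.0e8f before it cancels), while sum_lanewise gives 6. Both are repeatable from run to run; they just disagree with each other.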

If necessary, we'd do the same thing for scatter_store.
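
For reference, the duplicate-index hazard under discussion can be sketched in plain C. The "first loop" is not shown in this message, so the a[b[i]] = c[i] form below is an assumption, as are the function names. The second function is only a model of a scatter that makes no promise about lane order (it happens to write the lanes in reverse); it does not describe what the GCN hardware actually does.

```c
#include <stddef.h>

/* Scalar reference: iterations run in order, so when an index is
   repeated the value from the last iteration is the one that remains. */
static void scatter_in_order(int *a, const int *b, const int *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[b[i]] = c[i];
}

/* Model of a scatter with no lane ordering guarantee.  Writing the
   lanes in reverse is just one of the orders such a store could use. */
static void scatter_reverse_lanes(int *a, const int *b, const int *c, size_t n)
{
    for (size_t i = n; i-- > 0; )
        a[b[i]] = c[i];
}
```

With b[] = { 0, 1, 2, 0 } and c[] = { 10, 11, 12, 13 }, the in-order version leaves a[0] == 13, whereas the reversed model leaves a[0] == 10. The elements with unique indices come out the same either way.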

Andrew
