https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123603
Richard Biener <rguenth at gcc dot gnu.org> changed:
What            |Removed                      |Added
----------------------------------------------------------------------------
Assignee        |unassigned at gcc dot gnu.org|rguenth at gcc dot gnu.org
Status          |UNCONFIRMED                  |ASSIGNED
Last reconfirmed|                             |2026-01-15
Ever confirmed  |0                            |1
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
The following is the opt-info difference without PGO and LTO, a configuration
where I cannot reproduce a difference in runtime.
+exchange2.fppized.f90:1003:32: optimized: basic block part vectorized using 8 byte vectors
+exchange2.fppized.f90:1003:32: optimized: basic block part vectorized using 8 byte vectors
+exchange2.fppized.f90:1003:32: optimized: basic block part vectorized using 8 byte vectors
+exchange2.fppized.f90:1003:32: optimized: basic block part vectorized using 8 byte vectors
The reported location looks like a fallback (function-scope) one.
I can't reproduce with -march=x86-64-v3 {-O2, -O3, -O3 -flto}, so it seems
PGO is required :/
The opt-info difference for -O2 -march=x86-64-v3 -flto with PGO (trained on the
train set) is
-exchange2.fppized.f90:1207:71: optimized: loop vectorized using 8 byte vectors and unroll factor 1
+exchange2.fppized.f90:1207:71: optimized: loop vectorized using 16 byte vectors and unroll factor 2
+exchange2.fppized.f90:1207:71: optimized: epilogue loop vectorized using 8 byte vectors and unroll factor 1
...
This is in the notoriously critical digits_2 subroutine. Let me see
if there's something obviously wrong now (I fear not).
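
For anyone wanting to repeat the comparison, opt-info differences like the
ones above can be computed by treating each log as a multiset of lines, so
that repeated messages for the same location are counted rather than
collapsed. The following is only a minimal sketch, assuming the -fopt-info
output of each build was saved to a text file; the function names are
hypothetical and this is not necessarily how the diffs in this comment were
produced.

```python
from collections import Counter


def optinfo_diff(old_lines, new_lines):
    """Return (removed, added) as Counters of opt-info messages.

    Lines are compared as a multiset: a message that appears four times
    in one log and once in the other yields a difference of three.
    """
    old = Counter(line.strip() for line in old_lines if line.strip())
    new = Counter(line.strip() for line in new_lines if line.strip())
    return old - new, new - old


def format_diff(removed, added):
    """Render the difference with '-' and '+' prefixes, one line per message."""
    out = []
    for line, count in sorted(removed.items()):
        out.extend(["-" + line] * count)
    for line, count in sorted(added.items()):
        out.extend(["+" + line] * count)
    return out
```

Feeding it the lines of the two saved logs and printing the result of
format_diff gives output in the same -/+ form as quoted above. Input that was
wrapped by a mail client would need to be rejoined first.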