https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118297
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |WAITING
   Last reconfirmed|                            |2025-01-07
     Ever confirmed|0                           |1
--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
Oh, and yes - we fail to "thread" the inner (what you identify as outer) loop
j == 0 check, so we fail to realize the inner loop body rolls only once. We're
doing this later. Possibly loop header copying could realize this - we had
improvements to catch these kinds of cases there, but possibly number of
iterations analysis needs to be improved here.
We also refuse to loop-header copy this because there's a pow() call in the
block.
The thread2 pass after the loop passes finally elides one of the loops, but the
j == 0 check remains and is only elided by threadfull2 which has all loops
removed. We do apply SLP vectorization with -march=znver3 so I wonder
what you think we are missing (apart from the confusing -fopt-info-missed
messages)?