https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107718
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
It seems to split the reduction, performing many 0.99 ** n computations in parallel, which is stupid in itself as those all compute the same result ... I'd say the benchmark is stupid, and with -ffast-math we could optimize it to pow (0.99, LEN_1D/2), aka const-fold the inner loop in final value replacement.