https://gcc.gnu.org/bugzilla/show_bug.cgi?id=36127

--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #8)
> So what seems to be happening is that PRE is pulling the following out of the loop:
> 
>   pretmp_250 = MEM[(float *)_2 + 4294933760B + ivtmp.159_57 * 1];
>   _22 = (void *) ivtmp.140_79;
>   pretmp_253 = MEM[(float *)_22 + 4294934276B];
>   pretmp_257 = MEM[(float *)_22 + 4294900220B];
>   pretmp_259 = MEM[(float *)_22 + 4294933244B];
>   pretmp_261 = MEM[(float *)_22 + 4294933760B];

I don't see any of that for the original testcase. In fact, the originally
reported issue, that -O2/-O3 -fno-vectorize is slower than -O/-Os
-fno-vectorize, is no longer present.

Vectorizing also provides a nice speedup for me.
