https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61338
--- Comment #1 from vincenzo Innocente <vincenzo.innocente at cern dot ch> ---
If I write it "reverse"

void foo2() {
  for (int i=511; i>=0; --i)
    x[1023-i] += y[1023-i]*z[512-i];
}

it's OK:

__Z4foo2v:
LFB1:
        leaq    2048+_x(%rip), %rdx
        xorl    %eax, %eax
        leaq    4+_z(%rip), %rsi
        leaq    2048+_y(%rip), %rcx
        .align 4,0x90
L6:
        vmovaps (%rdx,%rax), %ymm1
        vmovups (%rsi,%rax), %ymm0
        vfmadd132ps     (%rcx,%rax), %ymm1, %ymm0
        vmovaps %ymm0, (%rdx,%rax)
        addq    $32, %rax
        cmpq    $2048, %rax
        jne     L6
        vzeroupper
        ret
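To make the link between the source loop and the addresses in the assembly easier to follow, here is a minimal sketch. The array declarations are an assumption (the comment does not show them); float elements are inferred from the packed single-precision instructions. foo2_ascending and loops_agree are hypothetical helpers, not part of the report. With k = 511 - i, the loop touches x[512..1023], y[512..1023] and z[1..512], which would correspond to the 2048+_x, 2048+_y and 4+_z base addresses and the 2048-byte trip count above.

```cpp
#include <cassert>

// Assumed declarations: not shown in the comment. float is inferred from
// vmovaps/vfmadd132ps; the 32-byte alignment is likewise an assumption.
alignas(32) float x[1024];
alignas(32) float y[1024];
alignas(32) float z[1024];

// Loop as written in the comment: i counts down, so every index counts up.
void foo2() {
  for (int i = 511; i >= 0; --i)
    x[1023 - i] += y[1023 - i] * z[512 - i];
}

// Hypothetical rewrite with an ascending counter k = 511 - i.
// x and y start at element 512 (byte offset 2048), z at element 1 (byte offset 4).
void foo2_ascending(float *xo, const float *yi, const float *zi) {
  for (int k = 0; k < 512; ++k)
    xo[512 + k] += yi[512 + k] * zi[1 + k];
}

// Runs both versions on the same input and compares every element.
// Small integer values keep the arithmetic exact with or without FMA contraction.
bool loops_agree() {
  static float ref[1024];
  for (int i = 0; i < 1024; ++i) {
    x[i] = ref[i] = static_cast<float>(i % 11);
    y[i] = static_cast<float>(i % 7);
    z[i] = static_cast<float>(i % 5);
  }
  foo2();                      // updates the global x
  foo2_ascending(ref, y, z);   // updates the reference copy
  for (int i = 0; i < 1024; ++i)
    if (x[i] != ref[i])
      return false;
  return true;
}
```

The z accesses start one float past the array base, so they cannot share the 32-byte alignment of x and y; that would be consistent with the unaligned vmovups load from z next to the aligned vmovaps accesses to x.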