https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61338
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2014-05-28
             Blocks|                            |53947
     Ever confirmed|0                           |1
           Severity|normal                      |enhancement

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed.  We fail to detect that all DRs are accessed "reverse" which is
the case where we can drop the permutes.  We also fail to reverse the
positive vectors if they happen to be lower in number:

float x[1024];
float y[1024];
float z[1024];

void foo()
{
  for (int i=0; i<512; ++i)
    x[i] += y[1023-i]*z[512-i];
}

produces

.L2:
        vpermd  (%rdx), %ymm1, %ymm0
        subq    $32, %rdx
        vpermd  (%rcx), %ymm1, %ymm2
        addq    $32, %rax
        vfmadd213ps     -32(%rax), %ymm2, %ymm0
        subq    $32, %rcx
        vmovaps %ymm0, -32(%rax)
        cmpq    $z-28, %rdx
        jne     .L2

instead of permuting the result before storing it.
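
To illustrate why fewer permutes can suffice for the loop in the comment above, here is a minimal scalar sketch. It is not GCC's actual transformation, only one possible reading of the idea. Because y and z are both walked backwards, each block of lanes can be multiplied in plain memory order and the product reversed once, rather than reversing both inputs. The names foo_ref, foo_blocked, count_mismatches and the lane count VF = 8 are assumptions made for this example; the test data is limited to small integers so that the comparison is exact regardless of FMA contraction.

```c
#define N    1024
#define HALF 512
#define VF   8   /* assumed lane count: 8 floats per 256-bit vector */

/* Reference: the loop from the report, on caller-provided arrays. */
static void foo_ref(float *x, const float *y, const float *z)
{
    for (int i = 0; i < HALF; ++i)
        x[i] += y[N - 1 - i] * z[HALF - i];
}

/* Blocked form: multiply y and z in memory order, then apply a single
   reversal to the product before accumulating into x. */
static void foo_blocked(float *x, const float *y, const float *z)
{
    for (int i = 0; i < HALF; i += VF) {
        /* Lowest addresses touched by this block of VF iterations. */
        const float *yb = &y[N - 1 - i - (VF - 1)];
        const float *zb = &z[HALF - i - (VF - 1)];
        float prod[VF];

        for (int l = 0; l < VF; ++l)
            prod[l] = yb[l] * zb[l];          /* contiguous, no permute */

        for (int l = 0; l < VF; ++l)
            x[i + l] += prod[VF - 1 - l];     /* one reversal per block */
    }
}

/* Runs both forms on identical small-integer data and returns the
   number of elements of x that differ. */
static int count_mismatches(void)
{
    static float xa[N], xb[N], y[N], z[N];
    int bad = 0;

    for (int i = 0; i < N; ++i) {
        xa[i] = xb[i] = (float)(i % 3);
        y[i] = (float)(i % 7);
        z[i] = (float)(i % 5);
    }

    foo_ref(xa, y, z);
    foo_blocked(xb, y, z);

    for (int i = 0; i < N; ++i)
        if (xa[i] != xb[i])
            ++bad;
    return bad;
}
```

In vector terms the inner multiply corresponds to two unpermuted loads and the reversal to a single permute, compared with the two vpermd instructions in the generated loop above.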