------- Comment #3 from rguenth at gcc dot gnu dot org 2009-07-04 12:36 -------

Tuned for Core2 I get for the innermost loop

.L19:
        leal    (%eax,%ebx), %edx
        movsd   (%eax,%ecx), %xmm1
        movsd   (%edx), %xmm7
        movhpd  8(%eax,%ecx), %xmm1
        movhpd  8(%edx), %xmm7
        movapd  %xmm1, %xmm0
        incl    %esi
        mulpd   %xmm3, %xmm0
        addl    $16, %eax
        addpd   %xmm7, %xmm0
        cmpl    %edi, %esi
        movlpd  %xmm0, (%edx)
        movhpd  %xmm0, 8(%edx)
        jb      .L19

which is slower than with vectorization disabled (which is what happened
before the patch?).

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40648
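
For readers following the assembly, here is a rough C sketch of what one trip through .L19 appears to compute. It is a reading of the listing, not the testcase from the report. The function name, the parameter names, and the assumption that %xmm3 holds the same scalar in both lanes are all hypothetical. Each iteration loads two doubles from each of two arrays with a movsd/movhpd pair (so no alignment is assumed), multiplies one pair by the loop-invariant %xmm3, adds the other pair, and stores the result back in two halves.

```c
#include <stddef.h>

/* Hypothetical scalar equivalent of the .L19 loop body.
 * Assumptions (not taken from the bug report):
 *   - x corresponds to the (%eax,%ecx) operand, y to (%eax,%ebx)
 *   - %xmm3 holds the scalar a duplicated in both lanes
 *   - npairs is the trip count held in %edi
 * The assembly is a bottom-tested loop, so it runs at least once;
 * this sketch also accepts npairs == 0. */
void scale_add_pairs(double *y, const double *x, double a, size_t npairs)
{
    for (size_t i = 0; i < npairs; i++) {
        /* mulpd %xmm3, %xmm0: both lanes of x times a */
        double lo = x[2 * i] * a;
        double hi = x[2 * i + 1] * a;
        /* addpd %xmm7, %xmm0 followed by movlpd/movhpd stores to y */
        y[2 * i] += lo;
        y[2 * i + 1] += hi;
    }
}
```

Seen this way, the vector loop spends four instructions on split loads and two on split stores for every mulpd/addpd pair, which is one plausible reason it loses to the scalar version on this target.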