http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54000

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2
            Summary|[4.6/4.7/4.8 Regression]    |[4.6/4.7/4.8
                   |Performance breakdown for   |Regression][IVOPTS]
                   |gcc-4.{6,7} vs. gcc-4.5     |Performance breakdown for
                   |using std::vector in matrix |gcc-4.{6,7} vs. gcc-4.5
                   |vector multiplication       |using std::vector in matrix
                   |                            |vector multiplication
      Known to fail|                            |4.7.1, 4.8.0

--- Comment #8 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-09-07 10:00:27 UTC ---
Thanks for the reduced testcase.  The innermost loops compare as follows:

4.5:

.L7:
        movsd   (%rbx,%rcx), %xmm0
        addq    $8, %rcx
        mulsd   0(%rbp,%rdx), %xmm0
        addq    $8, %rdx
        cmpq    $24, %rdx
        addsd   %xmm0, %xmm1
        movsd   %xmm1, (%rsi)
        jne     .L7

4.7:

.L13:
        movq    64(%rsp), %rdi
        movq    80(%rsp), %rdx
        addq    %rcx, %rdi
        addq    %r8, %rdx
        movsd   -8(%rax,%rdi), %xmm0
        mulsd   (%rsi,%rax), %xmm0
        addq    $8, %rax
        cmpq    $24, %rax
        addsd   (%rdx), %xmm0
        movsd   %xmm0, (%rdx)
        jne     .L13

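so the 4.7 loop reloads two values from the stack (64(%rsp) and 80(%rsp))
and recomputes both addresses on every iteration, and it accumulates
through memory instead of keeping the sum in a register.  We seem to have
a register allocation / spilling issue here as well as a bad induction
variable choice.  GCC 4.8 is not any better here.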
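
For readers without the attachment at hand, here is a minimal sketch of the
kind of kernel that produces inner loops like the two above.  It is not the
reduced testcase from this report: the function name mat_vec_accumulate, the
row-major layout and the fixed column count of 3 are assumptions, the last
one chosen only because 3 doubles * 8 bytes matches the "cmpq $24" bound in
both listings.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical kernel, NOT the testcase attached to the bug report:
// computes dst += M * v for a row-major matrix with 3 columns, with all
// operands held in std::vector<double>.
void mat_vec_accumulate(std::vector<double>& dst,
                        const std::vector<double>& m,
                        const std::vector<double>& v)
{
    const std::size_t cols = 3;  // assumed trip count of the inner loop
    assert(v.size() == cols);
    assert(m.size() == dst.size() * cols);

    for (std::size_t i = 0; i < dst.size(); ++i) {
        for (std::size_t j = 0; j < cols; ++j) {
            // The source updates dst[i] on every inner iteration.  Both
            // listings above also store to memory inside the loop.
            dst[i] += m[i * cols + j] * v[j];
        }
    }
}
```

In a loop of this shape one would hope for a single induction variable
stepping by 8 bytes and an accumulator kept in a register, which is roughly
what the 4.5 code does.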
