http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53844

Edward Rosten <ed at edrosten dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|tree-optimization           |middle-end

--- Comment #8 from Edward Rosten <ed at edrosten dot com> 2012-07-04 13:36:28 UTC ---
(In reply to comment #7)
> Fixed on trunk sofar, watching for fallout.

I pulled the latest change from SVN and tried it on the test code, with
success.

I'm using the shortened test function:


void test(const Vector<>& in, Vector<>& out, int i)
{
    out = in*1*1*1*1;
}

If I change the test function to:


void test(const Vector<>& in, Vector<>& out, int i)
{
    const Vector<ScalarMulExpr<ScalarMulExpr<ScalarMulExpr<ScalarMulExpr<VBase> > > > >& v = in*1*1*1*1;
    out = v;
}

Then the results become almost identical to gcc 4.7 (and much worse than with
the first test function). The asm code (compiled with -fno-tree-vectorize to
avoid all the extra asm code that deals with alignment etc.) is:

_Z4testRK6VectorI5VBaseERS1_i:
.LFB8:
    .cfi_startproc
    movq    (%rsi), %rcx
    xorl    %esi, %esi
    movq    -16(%rsp), %rax
    cvtsi2sd    %esi, %xmm1
    movq    8(%rax), %rdx
    movq    (%rax), %rax
    cvtsi2sd    (%rax), %xmm3
    movq    -24(%rsp), %rax
    movq    (%rdx), %rdx
    cvtsi2sd    (%rax), %xmm2
    xorl    %eax, %eax
    .p2align 4,,10
    .p2align 3
.L3:
    movsd    (%rdx,%rax), %xmm0
    mulsd    %xmm3, %xmm0
    mulsd    %xmm2, %xmm0
    mulsd    %xmm1, %xmm0
    movsd    %xmm0, (%rcx,%rax)
    addq    $8, %rax
    cmpq    $800, %rax
    jne    .L3
    rep; ret
    .cfi_endproc


In this case, it's clearly converting those 1's to floating point (the
cvtsi2sd instructions) and then multiplying by all but the first one. Note
that if the following test function is used:

void test(const Vector<>& in, Vector<>& out, int i)
{
    const Vector<ScalarMulExpr<ScalarMulExpr<VBase> > >& v = in*1*1;
    out = v;
}

Then suboptimal code isn't produced. Further investigation shows that this also
applies to the previous symptom with unnecessary pushes in gcc 4.7. 

If I change the mul member in ScalarMulExpr to int rather than int&, the
compiler can optimize away the first two multiplications, rather than only the
first one.
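
For illustration, here is a minimal sketch of what a by-value scalar member
looks like in an expression-template node. It is not the test case attached to
this PR: Vec, VecRef, ScalarMul, assign and test_sketch are hypothetical
stand-ins for Vector<>, VBase and ScalarMulExpr, and the size of 100 doubles is
only inferred from the 800-byte loop bound in the asm above.

```cpp
#include <cstddef>

// Hypothetical stand-ins, not the definitions from the attached test case.
const std::size_t N = 100;  // assumed: 100 doubles = 800 bytes, as in the asm

struct Vec {
    double data[N];
    double operator[](std::size_t i) const { return data[i]; }
};

// Leaf node: refers to an existing vector without copying it.
struct VecRef {
    const Vec* v;
    explicit VecRef(const Vec& vec) : v(&vec) {}
    double operator[](std::size_t i) const { return (*v)[i]; }
};

// Expression node: sub-expression and scalar are both stored by value,
// so the node stays valid after the full expression that built it ends.
template <class E>
struct ScalarMul {
    E   expr;
    int mul;  // by value; the variant discussed above stores int& instead
    ScalarMul(const E& e, int m) : expr(e), mul(m) {}
    double operator[](std::size_t i) const { return expr[i] * mul; }
};

inline ScalarMul<VecRef> operator*(const Vec& v, int m)
{
    return ScalarMul<VecRef>(VecRef(v), m);
}

template <class E>
ScalarMul<ScalarMul<E> > operator*(const ScalarMul<E>& e, int m)
{
    return ScalarMul<ScalarMul<E> >(e, m);
}

// Evaluate an expression element by element into out.
template <class E>
void assign(Vec& out, const E& e)
{
    for (std::size_t i = 0; i < N; ++i)
        out.data[i] = e[i];
}

void test_sketch(const Vec& in, Vec& out)
{
    // Named intermediate, mirroring the shape of the second test function.
    const ScalarMul<ScalarMul<ScalarMul<ScalarMul<VecRef> > > > v = in*1*1*1*1;
    assign(out, v);
}
```

Whether GCC folds all four multiplications for this sketch is not something I
have measured; it only shows the member layout being compared.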

GCC version is:

gcc-4.8-svn -v
Using built-in specs.
COLLECT_GCC=gcc-4.8-svn
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/x86_64-unknown-linux-gnu/4.8.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../gcc/configure --program-suffix=-4.8-svn
--enable-languages=c,c++
Thread model: posix
gcc version 4.8.0 20120704 (experimental) (GCC) 

but similar results are reported on 4.7 as well.


Is this a continuation of the same bug, or should I refile this as a new bug?
