http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53844
Edward Rosten <ed at edrosten dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|tree-optimization           |middle-end

--- Comment #8 from Edward Rosten <ed at edrosten dot com> 2012-07-04 13:36:28 UTC ---
(In reply to comment #7)
> Fixed on trunk sofar, watching for fallout.

I pulled the latest change from SVN and tried it on the test code, with
success. I'm using the shortened test function:

void test(const Vector<>& in, Vector<>& out, int i)
{
    out = in*1*1*1*1;
}

If I change the test function to:

void test(const Vector<>& in, Vector<>& out, int i)
{
    const Vector<ScalarMulExpr<ScalarMulExpr<ScalarMulExpr<ScalarMulExpr<VBase> > > > >& v = in*1*1*1*1;
    out = v;
}

Then the results become almost identical to those from gcc 4.7 (and much worse
than with the first test function). The asm code (compiled with
-fno-tree-vectorize to avoid all the extra asm code that deals with alignment
etc.) is:

_Z4testRK6VectorI5VBaseERS1_i:
.LFB8:
    .cfi_startproc
    movq    (%rsi), %rcx
    xorl    %esi, %esi
    movq    -16(%rsp), %rax
    cvtsi2sd    %esi, %xmm1
    movq    8(%rax), %rdx
    movq    (%rax), %rax
    cvtsi2sd    (%rax), %xmm3
    movq    -24(%rsp), %rax
    movq    (%rdx), %rdx
    cvtsi2sd    (%rax), %xmm2
    xorl    %eax, %eax
    .p2align 4,,10
    .p2align 3
.L3:
    movsd   (%rdx,%rax), %xmm0
    mulsd   %xmm3, %xmm0
    mulsd   %xmm2, %xmm0
    mulsd   %xmm1, %xmm0
    movsd   %xmm0, (%rcx,%rax)
    addq    $8, %rax
    cmpq    $800, %rax
    jne .L3
    rep; ret
    .cfi_endproc

In this case, it's clearly converting all those 1's to doubles and then
multiplying by all but the first one.

Note that if the following test function is used:

void test(const Vector<>& in, Vector<>& out, int i)
{
    const Vector<ScalarMulExpr<ScalarMulExpr<VBase> > >& v = in*1*1;
    out = v;
}

Then suboptimal code isn't produced.

Further investigation shows that this also applies to the previous symptom
with unnecessary pushes in gcc 4.7.

If I change the mul member in ScalarMulExpr to int rather than int&, the
compiler can optimize away the first two multiplications, rather than just the
first one.

GCC version is:

gcc-4.8-svn -v
Using built-in specs.
COLLECT_GCC=gcc-4.8-svn
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/x86_64-unknown-linux-gnu/4.8.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../gcc/configure --program-suffix=-4.8-svn --enable-languages=c,c++
Thread model: posix
gcc version 4.8.0 20120704 (experimental) (GCC)

Similar results are reported on 4.7 as well.

Is this a continuation of the same bug, or should I refile this as a new bug?
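
For readers without the attached test case at hand, the following is a rough sketch of the kind of expression-template code the two test functions exercise. It is not the attachment from the bug: the bodies of VBase, ScalarMulExpr and Vector, the eval member, the constant kSize (100, guessed from the 800-byte loop bound in the asm) and the names test_direct and test_named are all assumptions made for illustration. The sketch stores the operand and the int factor by value, which corresponds to the "int rather than int&" variant mentioned above and keeps the named-reference form well-defined.

```cpp
#include <cstddef>

// Hypothetical element count: 800 bytes / 8 bytes per double in the asm loop.
constexpr std::size_t kSize = 100;

// Leaf node: a view onto caller-owned storage (assumed layout).
struct VBase {
    double* data;
    explicit VBase(double* d) : data(d) {}
    double eval(std::size_t i) const { return data[i]; }
};

// Expression node: "expr * mul", evaluated lazily per element.
// Both members are held by value here (an assumption, see lead-in).
template <class E>
struct ScalarMulExpr {
    E expr;
    int mul;
    ScalarMulExpr(const E& e, int m) : expr(e), mul(m) {}
    // The int factor is converted to double at each multiplication.
    double eval(std::size_t i) const { return expr.eval(i) * mul; }
};

// Wrapper that gives every expression node the same Vector interface.
template <class E = VBase>
struct Vector : E {
    explicit Vector(const E& e) : E(e) {}

    // Evaluate an arbitrary expression into this vector's storage.
    // Only meaningful when E is VBase, i.e. for Vector<>.
    template <class E2>
    Vector& operator=(const Vector<E2>& rhs) {
        for (std::size_t i = 0; i < kSize; ++i)
            this->data[i] = rhs.eval(i);
        return *this;
    }
};

// Each multiplication wraps the operand in one more ScalarMulExpr layer.
template <class E>
Vector<ScalarMulExpr<E> > operator*(const Vector<E>& v, int m) {
    return Vector<ScalarMulExpr<E> >(ScalarMulExpr<E>(v, m));
}

// First form from the comment: the expression is consumed immediately.
void test_direct(const Vector<>& in, Vector<>& out) {
    out = in * 1 * 1 * 1 * 1;
}

// Second form: the expression is first bound to a named const reference.
void test_named(const Vector<>& in, Vector<>& out) {
    const Vector<ScalarMulExpr<ScalarMulExpr<ScalarMulExpr<
        ScalarMulExpr<VBase> > > > >& v = in * 1 * 1 * 1 * 1;
    out = v;
}
```

Both functions compute the same result, so any difference in the generated loop (for example when comparing the output of -S with -fno-tree-vectorize) comes from how the optimizer sees through the temporaries, not from the arithmetic itself.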