[Bug rtl-optimization/48064] New: Optimizer produces suboptimal code for e.g. x = x ^ (x >> 1)

jasper.neumann at web dot de Thu, 10 Mar 2011 11:30:57 -0800

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48064


           Summary: Optimizer produces suboptimal code for e.g. x = x ^ (x
                    >> 1)
           Product: gcc
           Version: 4.5.2
            Status: UNCONFIRMED
          Severity: minor
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: jasper.neum...@web.de
            Target: windows 32


When I compile the following OPT.CPP with gcc 4.5.2 (mingw) under Windows-32...
===
int test(int x)
{
   x = x ^ (x >> 1);

   int x1=x;
   x = x >> 2;
   x = x ^ x1;

   return x;
}
===

...a call to
gpp -O3 -S OPT.CPP
produces this OPT.s:
===
         .file   "OPT.CPP"
         .text
         .p2align 2,,3
.globl __Z4testi
         .def    __Z4testi;      .scl    2;      .type   32;     .endef
__Z4testi:
LFB0:
         pushl   %ebp
LCFI0:
         movl    %esp, %ebp
LCFI1:
         movl    8(%ebp), %eax
         movl    %eax, %edx
         sarl    %edx
         xorl    %eax, %edx
         movl    %edx, %eax
         sarl    $2, %eax
         xorl    %edx, %eax
         leave
LCFI2:
         ret
LFE0:
===

The problem I see is that in
         movl    %eax, %edx
         sarl    %edx
         xorl    %eax, %edx

         movl    %edx, %eax
         sarl    $2, %eax
         xorl    %edx, %eax
gcc produces code which presumably costs 6 cycles
(edx and then eax is modified 3 times in a row)
whereas the equivalent statements
         movl    %eax, %edx
         sarl    %eax
         xorl    %eax, %edx

         movl    %edx, %eax
         sarl    $2, %edx
         xorl    %edx, %eax
cost only 4 cycles since the mov and the shift can go in parallel.
I would have expected this at least for explicit form in
   int x1=x;
   x = x >> 2;
   x = x ^ x1;
I found no way to get gcc to output my version.

A speed test reveals that the proposed form only costs about
2/3 of the time on Intel Atom N450 and 3/4 of the time on Intel i7.

Have I missed something?


By the way: If I produce an output in Intel syntax
the statement "sar eax" should be "sar eax,1".
Otherwise some assemblers will complain.

[Bug rtl-optimization/48064] New: Optimizer produces suboptimal code for e.g. x = x ^ (x >> 1)

Reply via email to