http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54593

             Bug #: 54593
           Summary: [missed-optimization] Move from SSE to integer
                    register goes through the stack without -march=native
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: sgunder...@bigfoot.com


Hi,

I have reproduced this on 4.4, 4.6, 4.7 and 4.8 (Debian 20120820-1, trunk
version 190537). Given the following code:

  #include <x86intrin.h>

  int test1(__m128i v) {
     return _mm_cvtsi128_si32(v);
  }

GCC generates

   0:    66 0f 7e 44 24 f4        movd   %xmm0,-0xc(%rsp)
   6:    8b 44 24 f4              mov    -0xc(%rsp),%eax
   a:    c3                       retq   

Shouldn't the value go directly to %eax instead of through the stack? Granted,
on Netburst the direct move takes ten cycles or so, but this is x86-64. It
appears to be some sort of tuning issue, since if I use -mtune=native (I am on
an Atom) I get:

   0:    66 0f 7e c0              movd   %xmm0,%eax
   4:    90                       nop
   5:    90                       nop
   6:    90                       nop
   7:    90                       nop
   8:    90                       nop
   9:    90                       nop
   a:    c3                       retq   

which is sort of what I expect. Well, the NOPs are a bit weird, but... :-)
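
For anyone who wants to reproduce this, here is a minimal sketch of a
self-contained test file. test1 is the function from above; roundtrip is a
hypothetical helper added only so there is something to call and check. The
commands in the comment are an assumption about a typical way to inspect the
output (the optimization level in particular is a guess), not necessarily the
exact invocation behind the listings above.

```c
/*
 * One possible way to inspect the generated code (assumed, not taken from
 * the report):
 *   gcc -O2 -c test.c && objdump -d test.o
 *   gcc -O2 -mtune=native -c test.c && objdump -d test.o
 */
#include <emmintrin.h>  /* SSE2: __m128i, _mm_cvtsi128_si32, _mm_cvtsi32_si128 */

/* Function from the report: return the low 32 bits of the vector. */
int test1(__m128i v) {
    return _mm_cvtsi128_si32(v);
}

/* Hypothetical helper: place x in the low lane (upper lanes zeroed),
 * then read it back through test1. */
int roundtrip(int x) {
    return test1(_mm_cvtsi32_si128(x));
}
```

roundtrip(x) should return x for any int, so the interesting part is only
whether the disassembly of test1 goes through the stack or not.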
