http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54593

             Bug #: 54593
           Summary: [missed-optimization] Move from SSE to integer register
                    goes through the stack without -march=native
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: sgunder...@bigfoot.com

Hi,

I have reproduced this on 4.4, 4.6, 4.7 and 4.8 (Debian 20120820-1, trunk
version 190537). Given the following code:

  #include <x86intrin.h>

  int test1(__m128i v) {
          return _mm_cvtsi128_si32(v);
  }

GCC generates

   0:   66 0f 7e 44 24 f4       movd   %xmm0,-0xc(%rsp)
   6:   8b 44 24 f4             mov    -0xc(%rsp),%eax
   a:   c3                      retq

Shouldn't it go directly to %eax instead of through the stack? Granted, on
Netburst this takes ten cycles or so, but this is x86-64.

It appears to be some sort of tuning issue, since if I use -mtune=native (I am
on an Atom) I get:

   0:   66 0f 7e c0             movd   %xmm0,%eax
   4:   90                      nop
   5:   90                      nop
   6:   90                      nop
   7:   90                      nop
   8:   90                      nop
   9:   90                      nop
   a:   c3                      retq

which is sort-of what I expect. Well, the NOPs are a bit weird, but... :-)
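
For anyone who wants to reproduce this, here is a minimal sketch of a standalone test file. Only test1 comes from the report above; test1_via_memory and check_same_result are hypothetical helpers added for illustration. The first spells out in C what the stack round trip computes, the second checks that both paths return the same value. The file name repro.c is likewise an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <x86intrin.h>

/* From the report: return the low 32-bit lane of v.
 * Ideally this is a single register-to-register movd. */
int test1(__m128i v)
{
    return _mm_cvtsi128_si32(v);
}

/* Hypothetical helper (not in the report): the same value obtained by
 * storing the vector to memory and loading the first lane back, which is
 * roughly what the movd/mov pair through -0xc(%rsp) does. */
int test1_via_memory(__m128i v)
{
    int32_t lanes[4];
    _mm_storeu_si128((__m128i *)lanes, v);
    return lanes[0];
}

/* Hypothetical check: both paths must agree for any input vector. */
void check_same_result(__m128i v)
{
    assert(test1(v) == test1_via_memory(v));
}
```

Building this with "gcc -O2 -c repro.c" and running "objdump -d repro.o", once with default tuning and once with -mtune=native, should make it possible to compare the code generated for test1 in the two cases. The exact output will depend on the GCC version and the host CPU.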