gcc seems allergic to movq in the context of mmx:

% cat movq.c
#include <inttypes.h>
#include <mmintrin.h>

__m64 x;
__m64 y;

uint64_t foo(__m64 m)
{
  return _mm_cvtm64_si64(_mm_add_pi32(x, y));
}
% gcc -g -O3 -Wall -std=gnu99 -c -o movq.o movq.c
% objdump -dr movq.o

movq.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <foo>:
   0:   48 8b 05 00 00 00 00    mov    0(%rip),%rax        # 7 <foo+0x7>
                        3: R_X86_64_PC32        x+0xfffffffffffffffc
   7:   48 89 44 24 f8          mov    %rax,0xfffffffffffffff8(%rsp)
   c:   0f 6f 44 24 f8          movq   0xfffffffffffffff8(%rsp),%mm0
  11:   0f fe 05 00 00 00 00    paddd  0(%rip),%mm0        # 18 <foo+0x18>
                        14: R_X86_64_PC32       y+0xfffffffffffffffc
  18:   0f 7f 44 24 f8          movq   %mm0,0xfffffffffffffff8(%rsp)
  1d:   48 8b 44 24 f8          mov    0xfffffffffffffff8(%rsp),%rax
  22:   c3                      retq

the load of x should use "movq m64,mm".  this is true in i386 targets
as well.

the transfer of %mm0 to %rax has the option of "movq %mm0,%rax" on
x86_64, but should possibly be passed through memory depending on
-mtune= settings:

for intel core2 always use movq directly between the registers, no
matter which direction.

for AMD k8 family 15 always pass through mem

for AMD k8 family 16+, for gpr->xmm/mmx pass through memory and for
xmm/mmx -> gpr always use movd/movq direct between the registers.

-dean

p.s. gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../gcc/configure --prefix=/home/odo/gcc --disable-multilib --disable-biarch x86_64-unknown-linux-gnu --enable-languages=c
Thread model: posix
gcc version 4.3.0 20071128 (experimental) (GCC)

-- 
           Summary: mmx and movd/movq on x86_64
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: dean at arctic dot org
 GCC build triplet: x86_64-unknown-linux-gnu
  GCC host triplet: x86_64-unknown-linux-gnu
GCC target triplet: x86_64-unknown-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34256
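
For reference, the -mtune= suggestion in the report can be read as a small decision table. The sketch below only restates that table in C. The enum names and the function `use_direct_move` are hypothetical and made up for illustration. They are not gcc internals and say nothing about how the backend actually models move costs. The fallback for an unrecognised tuning is also an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tuning targets, named after the cases in the report. */
enum tune { TUNE_CORE2, TUNE_AMD_FAM15, TUNE_AMD_FAM16 };

/* Direction of the 64-bit transfer between register files. */
enum xfer_dir { GPR_TO_MMX, MMX_TO_GPR };

/*
 * Returns true when the transfer should be a direct register-to-register
 * movd/movq, false when it should be passed through memory.
 */
static bool use_direct_move(enum tune t, enum xfer_dir d)
{
    assert(d == GPR_TO_MMX || d == MMX_TO_GPR);

    switch (t) {
    case TUNE_CORE2:
        /* core2: always direct, no matter which direction */
        return true;
    case TUNE_AMD_FAM15:
        /* AMD family 15: always through memory */
        return false;
    case TUNE_AMD_FAM16:
        /* AMD family 16+: direct only for xmm/mmx -> gpr */
        return d == MMX_TO_GPR;
    }
    /* Assumption: unknown tuning falls back to going through memory. */
    return false;
}
```

Under this reading, the `movq %mm0,-8(%rsp)` / `mov -8(%rsp),%rax` pair in the disassembly would be kept only when tuning for AMD family 15, and would become a single `movq %mm0,%rax` for the other two cases.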