The following example generates wrong code when compiled with -O2 on
amd64. It does not seem to happen when compiling in 32-bit mode, though.

    #include <mmintrin.h>
    #include <assert.h>

    typedef unsigned int CARD32;

    static void
    mmxCombineAddU (CARD32 *dest, const CARD32 *src, int width)
    {
        const __m64 mmx_0 = _mm_setzero_si64();
        const CARD32 *end = dest + width;

        while (dest < end) {
            __m64 s, d;
            s = _mm_unpacklo_pi8 (_mm_cvtsi32_si64(*src), mmx_0);
            d = _mm_unpacklo_pi8 (_mm_cvtsi32_si64(*dest), mmx_0);
            s = _mm_add_pi16(s, d);
            *dest = (CARD32)_mm_cvtsi64_si32(_mm_packs_pu16(s, mmx_0));
            ++dest;
            ++src;
        }
        _mm_empty();
    }

    int main()
    {
        CARD32 a = 0xffffffff;
        CARD32 b = 0x10101010;
        mmxCombineAddU(&a, &b, 1);
        assert(a == 0xffffffff);
        return 0;
    }

The generated assembly for the MMX instructions looks like this:

    400555:  0f 6e 00    movd       (%rax),%mm0
    400558:  0f 60 ca    punpcklbw  %mm2,%mm1
    40055b:  0f 60 c2    punpcklbw  %mm2,%mm0
    40055e:  0f fc c1    paddb      %mm1,%mm0
    400561:  0f 67 c3    packuswb   %mm3,%mm0
    400564:  0f 7e 00    movd       %mm0,(%rax)

It should use paddw instead of paddb here, since the source calls
_mm_add_pi16.

best regards,
Lars
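To make the difference between the two instructions concrete, here is a rough scalar model of one pixel going through the unpack / add / pack sequence. It is only a sketch: the helper names (add_u8x4_paddw, add_u8x4_paddb, check_report_values) are hypothetical and not part of the test case, and the model assumes the documented behaviour of paddw (16-bit lane add), paddb (byte add with wraparound) and packuswb (saturate to 0..255).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of punpcklbw + paddw + packuswb for one 32-bit pixel.
   This is what the intrinsics in the test case ask for. */
static uint32_t add_u8x4_paddw(uint32_t s, uint32_t d)
{
    uint32_t r = 0;
    for (int i = 0; i < 4; ++i) {
        uint32_t sb = (s >> (8 * i)) & 0xff;
        uint32_t db = (d >> (8 * i)) & 0xff;
        /* After unpacking against zero each byte has its own 16-bit lane,
           so the sum (at most 0x1fe) does not overflow the lane. */
        uint32_t sum = sb + db;
        /* packuswb saturates each word to the range 0..255. */
        uint32_t packed = sum > 0xff ? 0xff : sum;
        r |= packed << (8 * i);
    }
    return r;
}

/* Hypothetical model of the same sequence with paddb in place of paddw.
   The high byte of each lane is zero in both operands, so only the low
   byte matters, and it wraps around instead of carrying. */
static uint32_t add_u8x4_paddb(uint32_t s, uint32_t d)
{
    uint32_t r = 0;
    for (int i = 0; i < 4; ++i) {
        uint32_t sb = (s >> (8 * i)) & 0xff;
        uint32_t db = (d >> (8 * i)) & 0xff;
        uint32_t lo = (sb + db) & 0xff;   /* carry out of the byte is lost */
        r |= lo << (8 * i);               /* packuswb sees a value <= 0xff */
    }
    return r;
}

/* Checks the two models against the values used in the report. */
static void check_report_values(void)
{
    assert(add_u8x4_paddw(0x10101010u, 0xffffffffu) == 0xffffffffu);
    assert(add_u8x4_paddb(0x10101010u, 0xffffffffu) == 0x0f0f0f0fu);
}
```

Under this model the paddw variant saturates to 0xffffffff, which is what the assert in the test case expects, while the paddb variant wraps every channel to 0x0f.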
--
           Summary: Wrong code generation using MMX intrinsics on amd64
           Product: gcc
           Version: 4.0.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: lars at trolltech dot com
                CC: gcc-bugs at gcc dot gnu dot org
  GCC host triplet: linux-amd64

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=22432