The following example is miscompiled when built with -O2 on amd64. The
problem does not seem to occur when compiling in 32-bit mode.
 
#include <mmintrin.h> 
#include <assert.h> 
 
typedef unsigned int CARD32; 
 
static void 
mmxCombineAddU (CARD32 *dest, const CARD32 *src, int width) 
{ 
    const __m64 mmx_0 = _mm_setzero_si64(); 
 
    const CARD32 *end = dest + width; 
    while (dest < end) { 
        __m64 s, d; 
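        /* Widen the four 8-bit channels of each pixel to 16 bits. */
        s = _mm_unpacklo_pi8 (_mm_cvtsi32_si64(*src), mmx_0);
        d = _mm_unpacklo_pi8 (_mm_cvtsi32_si64(*dest), mmx_0);
        /* 16-bit add (paddw), so the per-channel sums cannot wrap. */
        s = _mm_add_pi16(s, d);
        /* Pack back to 8 bits with unsigned saturation (packuswb). */
        *dest = (CARD32)_mm_cvtsi64_si32(_mm_packs_pu16(s, mmx_0));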
        ++dest; 
        ++src; 
    } 
    _mm_empty(); 
} 
 
int main() 
{ 
    CARD32 a = 0xffffffff; 
    CARD32 b = 0x10101010; 
 
    mmxCombineAddU(&a,  &b, 1); 
    assert(a == 0xffffffff); 
    return 0; 
} 
 
The disassembly of the generated MMX instructions looks like this:
 
  400555:       0f 6e 00                movd   (%rax),%mm0 
  400558:       0f 60 ca                punpcklbw %mm2,%mm1 
  40055b:       0f 60 c2                punpcklbw %mm2,%mm0 
  40055e:       0f fc c1                paddb  %mm1,%mm0 
  400561:       0f 67 c3                packuswb %mm3,%mm0 
  400564:       0f 7e 00                movd   %mm0,(%rax) 
 
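The addition should be a paddw, not a paddb: _mm_add_pi16 adds 16-bit
lanes, whereas paddb adds each byte separately, so the low bytes wrap
around instead of saturating in the following pack, and the assert fails.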
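
To make the difference concrete, here is a rough scalar sketch of the two
variants. It is not taken from GCC or from the original code; the helper
names sat_u8, combine_paddw and combine_paddb are made up for illustration,
and the paddb model assumes the only difference is the width of the add.

```c
#include <stdint.h>

/* Saturate a signed 16-bit lane to 0..255, as packuswb does. */
static uint8_t sat_u8(int16_t v)
{
    if (v < 0) return 0;
    if (v > 255) return 255;
    return (uint8_t)v;
}

/* Intended behaviour: unpack to 16 bits, paddw, packuswb. */
static uint32_t combine_paddw(uint32_t dest, uint32_t src)
{
    uint32_t out = 0;
    for (int i = 0; i < 4; ++i) {
        uint16_t s = (src >> (8 * i)) & 0xff;
        uint16_t d = (dest >> (8 * i)) & 0xff;
        int16_t sum = (int16_t)(s + d);          /* at most 510, no wrap */
        out |= (uint32_t)sat_u8(sum) << (8 * i);
    }
    return out;
}

/* Assumed model of the miscompiled code: same lanes, but added with paddb. */
static uint32_t combine_paddb(uint32_t dest, uint32_t src)
{
    uint32_t out = 0;
    for (int i = 0; i < 4; ++i) {
        uint16_t s = (src >> (8 * i)) & 0xff;
        uint16_t d = (dest >> (8 * i)) & 0xff;
        uint8_t lo = (uint8_t)((s & 0xff) + (d & 0xff)); /* wraps mod 256 */
        uint8_t hi = (uint8_t)((s >> 8) + (d >> 8));     /* both zero here */
        int16_t lane = (int16_t)(((uint16_t)hi << 8) | lo);
        out |= (uint32_t)sat_u8(lane) << (8 * i);
    }
    return out;
}
```

With the values from main() above, combine_paddw(0xffffffff, 0x10101010)
gives 0xffffffff, while combine_paddb gives 0x0f0f0f0f, which is why the
assert would trip under this model.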
 
best regards, 
Lars

-- 
           Summary: Wrong code generation using MMX intrinsics on amd64
           Product: gcc
           Version: 4.0.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: lars at trolltech dot com
                CC: gcc-bugs at gcc dot gnu dot org
  GCC host triplet: linux-amd64


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=22432
