On GCC-trunk/Cygwin/Core2 I observe the following behaviour.

g++ -std=gnu++0x -O2 -m32 -march=native -msse -msse2 -msse3 -Wall
-Werror -Wno-unused -Wno-strict-aliasing -march=native
-fomit-frame-pointer -Wno-pmf-conversions -g main.cpp

-----------8<--------------

#include <x86intrin.h>

int test1(__m128i v) {

    return _mm_cvtsi128_si32(v);
}

-----------8<--------------

emits:

004012e0 <__Z5test1U8__vectorx>:
  4012e0:       83 ec 0c                sub    $0xc,%esp
  4012e3:       66 0f 7e c0             movd   %xmm0,%eax
  4012e7:       83 c4 0c                add    $0xc,%esp
  4012ea:       c3                      ret

which shows that the stack pointer is being adjusted (sub/add $0xc,%esp)
even though the function does not use any stack space.

GCC also happens to lose track of the condition codes,
as shown here:

  4011a0:       66 0f df 01             pandn  (%ecx),%xmm0
  4011a4:       39 d9                   cmp    %ebx,%ecx
  4011a6:       66 0f 7f 0c 24          movdqa %xmm1,(%esp)
  4011ab:       75 04                   jne    4011b1 <__Z8popcountPKU8__vectorxjj+0x61>
  4011ad:       66 0f db c1             pand   %xmm1,%xmm0
  4011b1:       66 0f 6f 1d 90 28 40    movdqa 0x402890,%xmm3
  4011b8:       00
  4011b9:       66 0f 6f 15 a0 28 40    movdqa 0x4028a0,%xmm2
  4011c0:       00
  4011c1:       66 0f 6f f3             movdqa %xmm3,%xmm6
  4011c5:       66 0f 6f fb             movdqa %xmm3,%xmm7
  4011c9:       66 0f db f0             pand   %xmm0,%xmm6
  4011cd:       66 0f df f8             pandn  %xmm0,%xmm7
  4011d1:       66 0f 6f ca             movdqa %xmm2,%xmm1
  4011d5:       66 0f 6f c7             movdqa %xmm7,%xmm0
  4011d9:       66 0f 38 00 ce          pshufb %xmm6,%xmm1
  4011de:       66 0f 71 d0 04          psrlw  $0x4,%xmm0
  4011e3:       66 0f 6f f1             movdqa %xmm1,%xmm6
  4011e7:       66 0f 6f fa             movdqa %xmm2,%xmm7
  4011eb:       39 d9                   cmp    %ebx,%ecx
  4011ed:       66 0f 38 00 f8          pshufb %xmm0,%xmm7
  4011f2:       66 0f fc f7             paddb  %xmm7,%xmm6
  4011f6:       66 0f ef ff             pxor   %xmm7,%xmm7
  4011fa:       66 0f f6 f7             psadbw %xmm7,%xmm6
  4011fe:       0f 84 be 00 00 00       je     4012c2 <__Z8popcountPKU8__vectorxjj+0x172>

The second cmp (at 4011eb) is superfluous: neither %ebx nor %ecx is
written after the first cmp, and the SSE instructions in between
do not modify the condition codes.
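
A reduced test case along the following lines might exercise the same
pattern. This is only a sketch and not the original popcount source: the
function name twice_tested and its parameters are made up for
illustration, and whether it reproduces the duplicated cmp will depend
on the compiler revision and flags.

```cpp
// Hypothetical reduced test case (not the original popcount code).
#include <emmintrin.h>  // SSE2 intrinsics

// The same comparison (p == end) guards two branches, and only SSE2
// integer operations sit between them. Those instructions leave the
// condition codes untouched, so a single cmp could in principle feed
// both conditional jumps.
int twice_tested(const __m128i* p, const __m128i* end,
                 __m128i acc, __m128i mask) {
    acc = _mm_andnot_si128(acc, _mm_loadu_si128(p));  // pandn: ~acc & *p
    if (p == end)                                     // first test
        acc = _mm_and_si128(acc, mask);               // pand
    acc = _mm_add_epi8(acc, mask);                    // paddb
    acc = _mm_sad_epu8(acc, _mm_setzero_si128());     // psadbw
    if (p == end)                                     // same test again
        return _mm_cvtsi128_si32(acc);
    return _mm_cvtsi128_si32(acc) + 1;
}
```

Compiling this with the flags above and inspecting the objdump output
should show whether cmp is emitted once or twice.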

Are these known issues?

Best regards
Piotr Wyderski
