https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104946

            Bug ID: 104946
           Summary: [12 regression] Suboptimal gimple folding for blendvpd
                    under sse4.1
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: crazylht at gmail dot com
  Target Milestone: ---

When working on PR104666, I found

cat test.c

typedef double __m128d __attribute__((__vector_size__(16), __may_alias__));
__m128d sse4_1_blendvpd (__m128d a, __m128d b, __m128d c)
__attribute__((__target__("sse4.1")));

__m128d
generic_blendvpd (__m128d a, __m128d b, __m128d c)
{
  return __builtin_ia32_blendvpd (a, b, c);
}

gcc -O2 -msse4.1 -mno-sse4.2

generic_blendvpd:
        movq    rax, xmm2
        movapd  xmm3, xmm0
        test    rax, rax
        jns     .L3
        movapd  xmm0, xmm1
.L3:
        pextrq  rax, xmm2, 1
        unpckhpd        xmm3, xmm3
        test    rax, rax
        jns     .L5
        unpckhpd        xmm1, xmm1
        movapd  xmm3, xmm1
.L5:
        unpcklpd        xmm0, xmm3
        ret

This is because pcmpgtq is only available with SSE4.2; without it, vec_cmpv2di
is lowered to scalar operations, which are not combined back.
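
For reference, here is a minimal scalar sketch of what the blend computes per lane, and why a signed 64-bit compare shows up at all. The helper name blendvpd_model is hypothetical and only models the instruction's documented behaviour (select b where the mask lane's sign bit is set, otherwise a); it is not the actual gimple emitted by the folder.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of blendvpd on two double lanes.
   For each lane, pick b when bit 63 of the mask lane c is set, else a.  */
void
blendvpd_model (const double a[2], const double b[2], const double c[2],
                double out[2])
{
  for (int i = 0; i < 2; i++)
    {
      int64_t bits;
      /* Reinterpret the mask lane as a signed 64-bit integer.  */
      memcpy (&bits, &c[i], sizeof bits);
      /* "bits < 0" tests the sign bit; vectorized over V2DI this is the
         signed compare that needs pcmpgtq.  */
      out[i] = bits < 0 ? b[i] : a[i];
    }
}
```

Expressed on whole vectors, the "bits < 0" test is a V2DI signed comparison against zero, which is presumably where vec_cmpv2di comes from in the folded form.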

With SSE4.2, GCC can generate optimal code:

generic_blendvpd:
        movapd  xmm3, xmm0
        movdqa  xmm0, xmm2
        blendvpd        xmm3, xmm1, xmm0
        movapd  xmm0, xmm3
        ret
