https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113090

            Bug ID: 113090
           Summary: Suboptimal vector permutation for 64-bit vector.
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: liuhongt at gcc dot gnu.org
  Target Milestone: ---

While working on PR113079, I found that the loop vectorizer tries to reduce a
sum of v2si with a permutation. For the testcase below, the x86 backend generates:

typedef int v2si __attribute__((vector_size(8)));

v2si
foo (v2si a, v2si b)
{
    return __builtin_shufflevector (a, b, 1, 2);
}

foo(int __vector(2), int __vector(2)):
        vpshufb xmm0, xmm0, XMMWORD PTR .LC0[rip]
        vpshufb xmm1, xmm1, XMMWORD PTR .LC1[rip]
        vpor    xmm0, xmm0, xmm1

But it could be better with:

        .cfi_startproc
        punpcklqdq      %xmm1, %xmm0
        pshufd  $153, %xmm0, %xmm0
        ret
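
As a sanity check on the sequence above, here is a minimal scalar model in C of
the two instructions. It is only a sketch and not GCC code: the xmm_model type
and every helper name are hypothetical, and the upper lanes of the 64-bit
arguments are assumed to be zero purely to keep the model deterministic. The
immediate 153 is 0b10011001, so pshufd picks lanes 1, 2, 1, 2, and the low
64 bits end up holding { a[1], b[0] }, which matches
__builtin_shufflevector (a, b, 1, 2).

```c
#include <stdint.h>

/* Hypothetical scalar model of an XMM register as four 32-bit lanes. */
typedef struct { int32_t lane[4]; } xmm_model;

/* Place a 64-bit v2si value in the low half of a register.
   Assumption: upper lanes are zeroed only to keep the model deterministic. */
static xmm_model load_v2si(const int32_t v[2])
{
    xmm_model r = { { v[0], v[1], 0, 0 } };
    return r;
}

/* punpcklqdq %src, %dst: result = { low 64 bits of dst, low 64 bits of src }. */
static xmm_model model_punpcklqdq(xmm_model dst, xmm_model src)
{
    xmm_model r = { { dst.lane[0], dst.lane[1], src.lane[0], src.lane[1] } };
    return r;
}

/* pshufd $imm, %src, %dst: lane i of the result is the source lane
   selected by bits [2*i+1 : 2*i] of the immediate. */
static xmm_model model_pshufd(xmm_model src, unsigned imm)
{
    xmm_model r;
    for (int i = 0; i < 4; i++)
        r.lane[i] = src.lane[(imm >> (2 * i)) & 3];
    return r;
}

/* Reference semantics of __builtin_shufflevector (a, b, 1, 2) on v2si:
   the indices address the concatenation of a and b. */
static void reference_shuffle(const int32_t a[2], const int32_t b[2],
                              int32_t out[2])
{
    int32_t cat[4] = { a[0], a[1], b[0], b[1] };
    out[0] = cat[1];
    out[1] = cat[2];
}

/* The two-instruction sequence: punpcklqdq, then pshufd $153. */
static void proposed_sequence(const int32_t a[2], const int32_t b[2],
                              int32_t out[2])
{
    xmm_model x0 = load_v2si(a);
    xmm_model x1 = load_v2si(b);
    x0 = model_punpcklqdq(x0, x1);   /* { a0, a1, b0, b1 } */
    x0 = model_pshufd(x0, 153);      /* { a1, b0, a1, b0 } */
    out[0] = x0.lane[0];
    out[1] = x0.lane[1];
}
```

Calling reference_shuffle and proposed_sequence on the same inputs and
comparing the two output lanes is enough to confirm that the models agree.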
