On Fri, Aug 22, 2025 at 11:26 PM Andi Kleen <a...@linux.intel.com> wrote:
>
> > > +  else if (TARGET_GFNI && TARGET_AVX512F && CONST_INT_P (operands[2]))
> > I don't think we need AVX512F here, and let's exclude the >>7 case here,
> > so it'd better be:
> > else if (TARGET_GFNI
> >             && CONST_INT_P (operands[2])
> >             /* It's just vpcmpgtb against 0.  */
> >             && !(INTVAL (operands[2]) == 7 && <CODE> == ASHIFTRT))
>
> With current gcc, 7 is not special cased and generates the same code as the
> other shift counts.
> So I didn't exclude it.
I didn't quite follow what you meant by "7 is not special cased in current GCC".

cat test.c

typedef char v16qi __attribute__((vector_size(16)));

v16qi
foo (v16qi a)
{
  return a >> 7;
}

with -march=x86-64-v4 -mgfni -O2

originally, it generated
        vmovdqa %xmm0, %xmm1
        vpxor   %xmm0, %xmm0, %xmm0
        vpcmpgtb        %xmm1, %xmm0, %xmm0

with your patch, it generates

        movl    $-2139062144, %eax
        vmovd   %eax, %xmm1
        vpbroadcastd    %xmm1, %xmm1
        vgf2p8affineqb  $0, %xmm1, %xmm0, %xmm0
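
For reference, the broadcast constant -2139062144 is 0x80808080, so every
byte of the 64-bit matrix operand is 0x80, and with imm8 == 0 each result
bit is just the source sign bit, i.e. the byte arithmetic shift right by 7.
Below is a scalar sketch of that, based on my reading of the GF2P8AFFINEQB
definition (not anything taken from the patch itself):

#include <assert.h>
#include <stdint.h>

/* Model of the per-byte GF(2) affine transform: result bit i is the
   parity of (matrix byte [7-i] AND source byte), XORed with imm8 bit i.  */
static uint8_t
gf2p8affine_byte (uint64_t matrix, uint8_t src, uint8_t imm8)
{
  uint8_t ret = 0;
  for (int i = 0; i < 8; i++)
    {
      uint8_t row = (matrix >> (8 * (7 - i))) & 0xff;
      uint8_t bit = __builtin_parity (row & src) ^ ((imm8 >> i) & 1);
      ret |= bit << i;
    }
  return ret;
}

int
main (void)
{
  /* All matrix bytes 0x80: every result bit selects the source sign bit.  */
  uint64_t matrix = 0x8080808080808080ULL;
  for (int i = -128; i < 128; i++)
    {
      signed char x = i;
      assert (gf2p8affine_byte (matrix, (uint8_t) x, 0) == (uint8_t) (x >> 7));
    }
  return 0;
}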


After adding the code below to the condition:

  && (<MODE_SIZE> == 64
      || !(INTVAL (operands[2]) == 7 && <CODE> == ASHIFTRT)))
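
That is, the whole guard becomes roughly (just restating your version plus
the size check above):

  else if (TARGET_GFNI
           && CONST_INT_P (operands[2])
           && (<MODE_SIZE> == 64
               || !(INTVAL (operands[2]) == 7 && <CODE> == ASHIFTRT)))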

it generates the original code again

        vmovdqa %xmm0, %xmm1
        vpxor   %xmm0, %xmm0, %xmm0
        vpcmpgtb        %xmm1, %xmm0, %xmm0

I think for a 512-bit vector, vgf2p8affineqb is better than the
original codegen, but for a 128/256-bit vector, shouldn't vpcmpgtb be
better than vgf2p8affineqb?

The original codegen for the 512-bit >> 7 is like below

        vpmovb2m        %zmm0, %k0
        vpmovm2b        %k0, %zmm0
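
FWIW the reason vpcmpgtb works for >> 7: for signed bytes the arithmetic
shift by 7 yields 0 or all-ones per lane, which is exactly what comparing
0 > x produces. A quick scalar sanity check of that equivalence (relying on
GCC's arithmetic-right-shift behavior for signed values):

#include <assert.h>

int
main (void)
{
  for (int i = -128; i < 128; i++)
    {
      signed char x = i;
      /* Per-lane result of the arithmetic >> 7.  */
      signed char shifted = x >> 7;
      /* Per-lane result of vpcmpgtb with an all-zero first operand:
         all-ones if x is negative, zero otherwise.  */
      signed char cmp = (0 > x) ? -1 : 0;
      assert (shifted == cmp);
    }
  return 0;
}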


>
> -Andi



-- 
BR,
Hongtao
