On Fri, Aug 22, 2025 at 11:26 PM Andi Kleen <a...@linux.intel.com> wrote:
>
> > > +  else if (TARGET_GFNI && TARGET_AVX512F && CONST_INT_P (operands[2]))
> > I don't think we need AVX512F here, and let's exclude >>7 cases here,
> > so better be
> > else if (TARGET_GFNI
> >          && CONST_INT_P (operands[2])
> >          /* It's just vpcmpgtb against 0.  */
> >          && !(INTVAL (operands[2]) == 7 && <CODE> == ASHIFTRT))
>
> With current gcc 7 is not special cased and generates the same code as the
> others.
> So I didn't exclude it.

I didn't quite follow the meaning of "7 is not special cased in current GCC".
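For context, my earlier "It's just vpcmpgtb against 0" comment refers to the
per-element equivalence sketched below (a minimal scalar illustration with a
made-up helper name, not part of the patch):

  /* An arithmetic right shift of a signed byte by 7 replicates the sign
     bit, so the result is -1 for negative inputs and 0 otherwise, which
     is what vpcmpgtb of the input against an all-zero vector produces
     per element.  */
  static inline signed char
  sar7 (signed char a)
  {
    return a < 0 ? -1 : 0;  /* same result as a >> 7  */
  }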
cat test.c
typedef char v16qi __attribute__((vector_size(16)));
v16qi
foo (v16qi a)
{
  return a >> 7;
}

with -march=x86-64-v4 -mgfni -O2

Originally, it generated

        vmovdqa %xmm0, %xmm1
        vpxor   %xmm0, %xmm0, %xmm0
        vpcmpgtb        %xmm1, %xmm0, %xmm0

With your patch, it generates

        movl    $-2139062144, %eax
        vmovd   %eax, %xmm1
        vpbroadcastd    %xmm1, %xmm1
        vgf2p8affineqb  $0, %xmm1, %xmm0, %xmm0

After adding the below code to the condition

        && (<MODE_SIZE> == 64
            || !(INTVAL (operands[2]) == 7 && <CODE> == ASHIFTRT)))

it generates the original code again

        vmovdqa %xmm0, %xmm1
        vpxor   %xmm0, %xmm0, %xmm0
        vpcmpgtb        %xmm1, %xmm0, %xmm0

I think for a 512-bit vector vgf2p8affineqb is better than the original
codegen, but for a 128/256-bit vector, shouldn't vpcmpgtb be better than
vgf2p8affineqb?

The original codegen for the 512-bit >> 7 is like below

        vpmovb2m        %zmm0, %k0
        vpmovm2b        %k0, %zmm0

>
> -Andi

--
BR,
Hongtao