On Fri, May 10, 2024 at 6:26 AM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
> The following one line patch improves the code generated for V8QI and V4QI
> shifts when AVX512BW and AVX512VL functionality is available.

+         /* With AVX512 its cheaper to do vpmovsxbw/op/vpmovwb.  */
+         && !(TARGET_AVX512BW && TARGET_AVX512VL && TARGET_SSE4_1)
          && ix86_expand_vec_shift_qihi_constant (code, qdest, qop1, qop2))

I think checking TARGET_SSE4_1 alone is enough: with SSE4.1 and above
it's always better not to go into ix86_expand_vec_shift_qihi_constant.
The rest LGTM.

>
> For the testcase (from gcc.target/i386/vect-shiftv8qi.c):
>
> typedef signed char v8qi __attribute__ ((__vector_size__ (8)));
> v8qi foo (v8qi x) { return x >> 5; }
>
> GCC with -O2 -march=cascadelake currently generates:
>
> foo:    movl    $67372036, %eax
>         vpsraw  $5, %xmm0, %xmm2
>         vpbroadcastd    %eax, %xmm1
>         movl    $117901063, %eax
>         vpbroadcastd    %eax, %xmm3
>         vmovdqa %xmm1, %xmm0
>         vmovdqa %xmm3, -24(%rsp)
>         vpternlogd      $120, -24(%rsp), %xmm2, %xmm0

The spill of %xmm3 to the stack here looks like a missed optimization
under AVX512, but it's a separate issue.

>         vpsubb  %xmm1, %xmm0, %xmm0
>         ret
>
> with this patch we now generate the much improved:
>
> foo:    vpmovsxbw       %xmm0, %xmm0
>         vpsraw  $5, %xmm0, %xmm0
>         vpmovwb %xmm0, %xmm0
>         ret
>
> This patch also fixes the FAILs of gcc.target/i386/vect-shiftv[48]qi.c
> when run with the additional -march=cascadelake flag, by splitting these
> tests into two; one form testing code generation with -msse2 (and
> -mno-avx512vl) as originally intended, and the other testing AVX512
> code generation with an explicit -march=cascadelake.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
>
>
> 2024-05-09  Roger Sayle  <ro...@nextmovesoftware.com>
>
> gcc/ChangeLog
>         * config/i386/i386-expand.cc (ix86_expand_vecop_qihi_partial):
>         Don't attempt ix86_expand_vec_shift_qihi_constant on AVX512.
>
> gcc/testsuite/ChangeLog
>         * gcc.target/i386/vect-shiftv4qi.c: Specify -mno-avx512vl.
>         * gcc.target/i386/vect-shiftv8qi.c: Likewise.
>         * gcc.target/i386/vect-shiftv4qi-2.c: New test case.
>         * gcc.target/i386/vect-shiftv8qi-2.c: Likewise.
>
>
> Thanks in advance,
> Roger
> --
>
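
For reference, here is a small scalar C sketch of the two strategies
being compared, in case it helps when reading the assembly.  It is only
a model, not GCC code: shift_ref, shift_widen, shift_mask_fix and
count_mismatches are hypothetical names made up for illustration.  As
far as I can tell, the immediates 67372036 and 117901063 in the current
output are 0x04040404 and 0x07070707, and vpternlogd $120 computes
A ^ (B & C), so the existing sequence is a word shift, a per-byte mask,
and an xor/subtract sign fix-up.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 8  /* v8qi: eight signed bytes */

/* Reference: element-wise arithmetic shift.  Right-shifting a negative
   value is implementation-defined in ISO C; GCC shifts arithmetically.  */
static void shift_ref (const int8_t *x, int8_t *out, int count)
{
  for (int i = 0; i < LANES; i++)
    out[i] = (int8_t) (x[i] >> count);
}

/* Widening strategy: sign-extend each byte to 16 bits (vpmovsxbw),
   shift the word (vpsraw), truncate back to a byte (vpmovwb).  */
static void shift_widen (const int8_t *x, int8_t *out, int count)
{
  for (int i = 0; i < LANES; i++)
    {
      int16_t w = x[i];
      w = (int16_t) (w >> count);
      out[i] = (int8_t) w;
    }
}

/* Mask-and-fix strategy: shift pairs of bytes as one 16-bit word, mask
   off the bits that leaked in from the neighbouring byte, then restore
   the sign with xor/subtract.  For count == 5 the mask is 0x07 and the
   sign bit is 0x04.  */
static void shift_mask_fix (const int8_t *x, int8_t *out, int count)
{
  uint8_t mask = (uint8_t) (0xff >> count);
  uint8_t sign = (uint8_t) (0x80 >> count);
  for (int i = 0; i < LANES; i += 2)
    {
      uint16_t w = (uint16_t) ((uint8_t) x[i] | ((uint8_t) x[i + 1] << 8));
      uint16_t s = (uint16_t) (w >> count);
      for (int k = 0; k < 2; k++)
        {
          uint8_t b = (uint8_t) (s >> (8 * k)) & mask;
          out[i + k] = (int8_t) (uint8_t) ((b ^ sign) - sign);
        }
    }
}

/* Count lanes where either strategy disagrees with the reference, over
   every pair of neighbouring byte values.  */
int count_mismatches (int count)
{
  assert (count >= 0 && count <= 7);
  int bad = 0;
  for (int a = 0; a < 256; a++)
    for (int b = 0; b < 256; b++)
      {
        int8_t x[LANES], r[LANES], w[LANES], m[LANES];
        for (int i = 0; i < LANES; i++)
          x[i] = (int8_t) (uint8_t) ((i & 1) ? b : a);
        shift_ref (x, r, count);
        shift_widen (x, w, count);
        shift_mask_fix (x, m, count);
        for (int i = 0; i < LANES; i++)
          if (w[i] != r[i] || m[i] != r[i])
            bad++;
      }
  return bad;
}
```

Calling count_mismatches (5) from a small driver should report no
mismatching lanes if the model is right.  Both strategies give the same
result; the difference is only in instruction count.
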
-- 
BR,
Hongtao