On Fri, May 10, 2024 at 3:41 PM Roger Sayle <ro...@nextmovesoftware.com> wrote: > > > Many thanks for the speedy review and correction/improvement. > It's interesting that you spotted the ternlog "spill"... > I have a patch that rewrites ternlog handling that's been > waiting for stage 1, that would also fix this mem operand > issue. I hope to submit it for review this weekend. I opened a PR for that. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115021 > > Thanks again, > Roger > > > From: Hongtao Liu <crazy...@gmail.com> > > On Fri, May 10, 2024 at 6:26 AM Roger Sayle <ro...@nextmovesoftware.com> > > wrote: > > > > > > > > > The following one line patch improves the code generated for V8QI and > > > V4QI shifts when AV512BW and AVX512VL functionality is available. > > + /* With AVX512 its cheaper to do vpmovsxbw/op/vpmovwb. */ > > + && !(TARGET_AVX512BW && TARGET_AVX512VL && TARGET_SSE4_1) > > && ix86_expand_vec_shift_qihi_constant (code, qdest, qop1, qop2)) I > > think > > TARGET_SSE4_1 is enough, it's always better w/ sse4.1 and above when not > > going > > into ix86_expand_vec_shift_qihi_constant. > > Others LGTM. > > > > > > For the testcase (from gcc.target/i386/vect-shiftv8qi.c): > > > > > > typedef signed char v8qi __attribute__ ((__vector_size__ (8))); v8qi > > > foo (v8qi x) { return x >> 5; } > > > > > > GCC with -O2 -march=cascadelake currently generates: > > > > > > foo: movl $67372036, %eax > > > vpsraw $5, %xmm0, %xmm2 > > > vpbroadcastd %eax, %xmm1 > > > movl $117901063, %eax > > > vpbroadcastd %eax, %xmm3 > > > vmovdqa %xmm1, %xmm0 > > > vmovdqa %xmm3, -24(%rsp) > > > vpternlogd $120, -24(%rsp), %xmm2, %xmm0 > > It looks like a miss-optimization under AVX512, but it's a separate issue. > > > vpsubb %xmm1, %xmm0, %xmm0 > > > ret > > > > > > with this patch we now generate the much improved: > > > > > > foo: vpmovsxbw %xmm0, %xmm0 > > > vpsraw $5, %xmm0, %xmm0 > > > vpmovwb %xmm0, %xmm0 > > > ret > > > > > > This patch also fixes the FAILs of gcc.target/i386/vect-shiftv[48]qi.c > > > when run with the additional -march=cascadelake flag, by splitting > > > these tests into two; one form testing code generation with -msse2 > > > (and > > > -mno-avx512vl) as originally intended, and the other testing AVX512 > > > code generation with an explicit -march=cascadelake. > > > > > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap > > > and make -k check, both with and without --target_board=unix{-m32} > > > with no new failures. Ok for mainline? > > > > > > > > > 2024-05-09 Roger Sayle <ro...@nextmovesoftware.com> > > > > > > gcc/ChangeLog > > > * config/i386/i386-expand.cc (ix86_expand_vecop_qihi_partial): > > > Don't attempt ix86_expand_vec_shift_qihi_constant on AVX512. > > > > > > gcc/testsuite/ChangeLog > > > * gcc.target/i386/vect-shiftv4qi.c: Specify -mno-avx512vl. > > > * gcc.target/i386/vect-shiftv8qi.c: Likewise. > > > * gcc.target/i386/vect-shiftv4qi-2.c: New test case. > > > * gcc.target/i386/vect-shiftv8qi-2.c: Likewise. > > > > > > > > > Thanks in advance, > > > Roger > > > -- > > > > > -- > > BR, > > Hongtao >
-- BR, Hongtao