On Fri, May 10, 2024 at 6:26 AM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
> The following one line patch improves the code generated for V8QI and V4QI
> shifts when AV512BW and AVX512VL functionality is available.
+      /* With AVX512 its cheaper to do vpmovsxbw/op/vpmovwb.  */
+      && !(TARGET_AVX512BW && TARGET_AVX512VL && TARGET_SSE4_1)
       && ix86_expand_vec_shift_qihi_constant (code, qdest, qop1, qop2))
I think TARGET_SSE4_1 is enough, it's always better w/ sse4.1 and
above when not going into ix86_expand_vec_shift_qihi_constant.
Others LGTM.
>
> For the testcase (from gcc.target/i386/vect-shiftv8qi.c):
>
> typedef signed char v8qi __attribute__ ((__vector_size__ (8)));
> v8qi foo (v8qi x) { return x >> 5; }
>
> GCC with -O2 -march=cascadelake currently generates:
>
> foo:    movl    $67372036, %eax
>         vpsraw  $5, %xmm0, %xmm2
>         vpbroadcastd    %eax, %xmm1
>         movl    $117901063, %eax
>         vpbroadcastd    %eax, %xmm3
>         vmovdqa %xmm1, %xmm0
>         vmovdqa %xmm3, -24(%rsp)
>         vpternlogd      $120, -24(%rsp), %xmm2, %xmm0
It looks like a miss-optimization under AVX512, but it's a separate issue.
>         vpsubb  %xmm1, %xmm0, %xmm0
>         ret
>
> with this patch we now generate the much improved:
>
> foo:    vpmovsxbw       %xmm0, %xmm0
>         vpsraw  $5, %xmm0, %xmm0
>         vpmovwb %xmm0, %xmm0
>         ret
>
> This patch also fixes the FAILs of gcc.target/i386/vect-shiftv[48]qi.c
> when run with the additional -march=cascadelake flag, by splitting these
> tests into two; one form testing code generation with -msse2 (and
> -mno-avx512vl) as originally intended, and the other testing AVX512
> code generation with an explicit -march=cascadelake.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
>
>
> 2024-05-09  Roger Sayle  <ro...@nextmovesoftware.com>
>
> gcc/ChangeLog
>         * config/i386/i386-expand.cc (ix86_expand_vecop_qihi_partial):
>         Don't attempt ix86_expand_vec_shift_qihi_constant on AVX512.
>
> gcc/testsuite/ChangeLog
>         * gcc.target/i386/vect-shiftv4qi.c: Specify -mno-avx512vl.
>         * gcc.target/i386/vect-shiftv8qi.c: Likewise.
>         * gcc.target/i386/vect-shiftv4qi-2.c: New test case.
>         * gcc.target/i386/vect-shiftv8qi-2.c: Likewise.
>
>
> Thanks in advance,
> Roger
> --
>


-- 
BR,
Hongtao

Reply via email to