On Wed, Aug 11, 2021 at 02:43:06PM +0800, liuhongt wrote:
>   Add define_insn_and_split to combine avx_vec_concatv16si/2 and
> avx512f_zero_extendv16hiv16si2_1, since the latter already zero-extends
> the upper bits; similarly for the other patterns related to
> pmovzx{bw,wd,dq}.
> 
> It will do optimization like
> 
> -       vmovdqa %ymm0, %ymm0    # 7     [c=4 l=6]  avx_vec_concatv16si/2
>         vpmovzxwd       %ymm0, %zmm0    # 22    [c=4 l=6]  avx512f_zero_extendv16hiv16si2
>         ret             # 25    [c=0 l=1]  simple_return_internal
> 
>   Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
>   Ok for trunk?
> 
> gcc/ChangeLog:
> 
>       PR target/101846
>       * config/i386/sse.md (*avx2_zero_extendv16qiv16hi2_2): New
>       post_reload define_insn_and_split.

The ChangeLog doesn't mention the newly added mode iterators; perhaps it
should.

>       (*avx512bw_zero_extendv32qiv32hi2_2): Ditto.
>       (*sse4_1_zero_extendv8qiv8hi2_4): Ditto.
>       (*avx512f_zero_extendv16hiv16si2_2): Ditto.
>       (*avx2_zero_extendv8hiv8si2_2): Ditto.
>       (*sse4_1_zero_extendv4hiv4si2_4): Ditto.
>       (*avx512f_zero_extendv8siv8di2_2): Ditto.
>       (*avx2_zero_extendv4siv4di2_2): Ditto.
>       (*sse4_1_zero_extendv2siv2di2_4): Ditto.
> 
> gcc/testsuite/ChangeLog:
> 
>       PR target/101846
>       * gcc.target/i386/pr101846-1.c: New test.
> ---
>  gcc/config/i386/sse.md                     | 220 +++++++++++++++++++++
>  gcc/testsuite/gcc.target/i386/pr101846-1.c |  95 +++++++++
>  2 files changed, 315 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr101846-1.c
> 
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index a46a2373547..6450c058458 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -673,8 +673,14 @@ (define_mode_iterator VI12_128 [V16QI V8HI])
>  (define_mode_iterator VI14_128 [V16QI V4SI])
>  (define_mode_iterator VI124_128 [V16QI V8HI V4SI])
>  (define_mode_iterator VI24_128 [V8HI V4SI])
> +(define_mode_iterator VI128_128 [V16QI V8HI V2DI])

And this mode iterator doesn't seem to be used anywhere in the patch.

Otherwise LGTM, although it fixes only particular, though perhaps very
important, cases. Detecting in general that some operations on a vector
aren't needed, because the following permutation that uses it never reads
those elements, is something that would need to be done on GIMPLE.
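
To make the point about unread elements concrete, here is a minimal scalar
sketch in plain C. It is not the RTL from the patch: the helper names
(zext_wd, concat_zero, concat_is_redundant) and the array model of the
registers are assumptions made up for illustration. It models the 256-bit
source as 16 uint16_t lanes and shows that widening through a zero-padded
copy gives the same result as widening the original directly, because the
widening step only ever reads the low lanes.

```c
#include <stdint.h>
#include <string.h>

enum { N = 16 };  /* 16 x 16-bit lanes, i.e. one 256-bit vector */

/* Hypothetical model of the widening step: each of the N low 16-bit
   lanes is zero-extended into a 32-bit lane.  Only src[0..N-1] is read.  */
static void zext_wd(const uint16_t *src, uint32_t dst[N])
{
    for (int i = 0; i < N; i++)
        dst[i] = src[i];
}

/* Hypothetical model of the concat-with-zero step: copy the N lanes into
   the low half of a 2*N lane value and clear the upper half.  */
static void concat_zero(const uint16_t lo[N], uint16_t out[2 * N])
{
    memcpy(out, lo, N * sizeof(uint16_t));
    memset(out + N, 0, N * sizeof(uint16_t));
}

/* Returns 1 when widening the zero-padded copy matches widening the
   original value directly, i.e. when the concat step is redundant.  */
int concat_is_redundant(const uint16_t x[N])
{
    uint16_t padded[2 * N];
    uint32_t via_concat[N], direct[N];

    concat_zero(x, padded);
    zext_wd(padded, via_concat);  /* upper half of padded is never read */
    zext_wd(x, direct);

    return memcmp(via_concat, direct, sizeof via_concat) == 0;
}
```

In this model concat_is_redundant returns 1 for any input, which is the
property a general GIMPLE-level analysis would have to prove for arbitrary
permutations rather than for the specific patterns matched here.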

Would it be possible to handle also the similar pmovzx{bd,wq,bq} cases?
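
For reference, the data shape of the bd case can be sketched with GCC's
generic vector extensions as below. The typedef and function names are
hypothetical, and the sketch does not claim that the compiler selects
pmovzxbd for it; it only shows that just the low four bytes of the source
are read, which is the same situation as in the wd case.

```c
#include <stdint.h>

/* Hypothetical typedefs using the vector_size extension (GCC and Clang).  */
typedef uint8_t  v16qi_u __attribute__((vector_size(16)));
typedef uint32_t v4si_u  __attribute__((vector_size(16)));

/* Zero-extend the low four bytes of x into four 32-bit lanes.
   Bytes 4..15 of x are never read.  */
v4si_u zext_bd(v16qi_u x)
{
    v4si_u r = { x[0], x[1], x[2], x[3] };
    return r;
}
```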

        Jakub
