> -----Original Message-----
> From: Tamar Christina <tamar.christ...@arm.com>
> Sent: Wednesday, September 29, 2021 5:20 PM
> To: gcc-patches@gcc.gnu.org
> Cc: nd <n...@arm.com>; Richard Earnshaw <richard.earns...@arm.com>;
> Marcus Shawcroft <marcus.shawcr...@arm.com>; Kyrylo Tkachov
> <kyrylo.tkac...@arm.com>; Richard Sandiford
> <richard.sandif...@arm.com>
> Subject: [PATCH 3/7]AArch64 Add pattern for sshr to cmlt
> 
> Hi All,
> 
> This optimizes a signed right shift by BITSIZE-1 into a cmlt operation, which is
> preferable because compares generally have higher throughput than
> shifts.
> 
> On AArch64 the result of such a shift is either -1 or 0, which is
> exactly the result of the compare.
> 
> e.g.
> 
> void e (int * restrict a, int *b, int n)
> {
>     for (int i = 0; i < n; i++)
>       b[i] = a[i] >> 31;
> }
> 
> now generates:
> 
> .L4:
>         ldr     q0, [x0, x3]
>         cmlt    v0.4s, v0.4s, #0
>         str     q0, [x1, x3]
>         add     x3, x3, 16
>         cmp     x4, x3
>         bne     .L4
> 
> instead of:
> 
> .L4:
>         ldr     q0, [x0, x3]
>         sshr    v0.4s, v0.4s, 31
>         str     q0, [x1, x3]
>         add     x3, x3, 16
>         cmp     x4, x3
>         bne     .L4
> 
> Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
> 
> Ok for master?
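
For reference, the equivalence the patch relies on can be checked per lane in
scalar C. This is only an illustrative sketch, not part of the patch: the
helper names (sshr31, cmlt0, count_mismatches) are made up here, and it assumes
that >> on a negative signed value is an arithmetic shift, which ISO C leaves
implementation-defined but GCC guarantees.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of one 32-bit lane of "sshr v.4s, v.4s, 31".
   Assumes arithmetic right shift of negative values (true for GCC).  */
static int32_t sshr31(int32_t x)
{
    return x >> 31;
}

/* Hypothetical scalar model of one 32-bit lane of "cmlt v.4s, v.4s, #0":
   all bits set when the lane is negative, zero otherwise.  */
static int32_t cmlt0(int32_t x)
{
    return x < 0 ? -1 : 0;
}

/* Returns the number of sample inputs on which the two models disagree.  */
static size_t count_mismatches(void)
{
    static const int32_t samples[] = {
        INT32_MIN, -2, -1, 0, 1, 2, INT32_MAX
    };
    size_t mismatches = 0;

    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
        if (sshr31(samples[i]) != cmlt0(samples[i]))
            mismatches++;
    return mismatches;
}
```

Calling count_mismatches () from a small driver should report no
disagreements; the same argument applies to the other element sizes with
BITSIZE-1 in place of 31.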

This should be okay (either a win or neutral) for Arm Cortex and Neoverse cores,
so I'm tempted not to ask for a CPU-specific tunable to guard it, to keep the
code clean.
Andrew, would this change be okay from a ThunderX line perspective?
Thanks,
Kyrill

> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>       * config/aarch64/aarch64-simd.md (aarch64_simd_ashr<mode>): Add cmp
>       case.
>       * config/aarch64/constraints.md (D1): New.
> 
> gcc/testsuite/ChangeLog:
> 
>       * gcc.target/aarch64/shl-combine-2.c: New test.
> 
> --- inline copy of patch --
> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> index 300bf001b59ca7fa197c580b10adb7f70f20d1e0..19b2d0ad4dab4d5742698297ded861228ee22007 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -1127,12 +1127,14 @@ (define_insn "aarch64_simd_lshr<mode>"
>  )
> 
>  (define_insn "aarch64_simd_ashr<mode>"
> - [(set (match_operand:VDQ_I 0 "register_operand" "=w")
> -       (ashiftrt:VDQ_I (match_operand:VDQ_I 1 "register_operand" "w")
> -                  (match_operand:VDQ_I  2 "aarch64_simd_rshift_imm" "Dr")))]
> + [(set (match_operand:VDQ_I 0 "register_operand" "=w,w")
> +       (ashiftrt:VDQ_I (match_operand:VDQ_I 1 "register_operand" "w,w")
> +                  (match_operand:VDQ_I  2 "aarch64_simd_rshift_imm" "D1,Dr")))]
>   "TARGET_SIMD"
> - "sshr\t%0.<Vtype>, %1.<Vtype>, %2"
> -  [(set_attr "type" "neon_shift_imm<q>")]
> + "@
> +  cmlt\t%0.<Vtype>, %1.<Vtype>, #0
> +  sshr\t%0.<Vtype>, %1.<Vtype>, %2"
> +  [(set_attr "type" "neon_compare<q>,neon_shift_imm<q>")]
>  )
> 
>  (define_insn "*aarch64_simd_sra<mode>"
> diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md
> index 3b49b452119c49320020fa9183314d9a25b92491..18630815ffc13f2168300a899db69fd428dfb0d6 100644
> --- a/gcc/config/aarch64/constraints.md
> +++ b/gcc/config/aarch64/constraints.md
> @@ -437,6 +437,14 @@ (define_constraint "Dl"
>        (match_test "aarch64_simd_shift_imm_p (op, GET_MODE (op),
>                                                true)")))
> 
> +(define_constraint "D1"
> +  "@internal
> + A constraint that matches vector of immediates that is bits(mode)-1."
> + (and (match_code "const,const_vector")
> +      (match_test "aarch64_const_vec_all_same_in_range_p (op,
> +                     GET_MODE_UNIT_BITSIZE (mode) - 1,
> +                     GET_MODE_UNIT_BITSIZE (mode) - 1)")))
> +
>  (define_constraint "Dr"
>    "@internal
>   A constraint that matches vector of immediates for right shifts."
> diff --git a/gcc/testsuite/gcc.target/aarch64/shl-combine-2.c b/gcc/testsuite/gcc.target/aarch64/shl-combine-2.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..bdfe35d09ffccc7928947c9e57f1034f7ca2c798
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/shl-combine-2.c
> @@ -0,0 +1,12 @@
> +/* { dg-do assemble } */
> +/* { dg-options "-O3 --save-temps --param=vect-epilogues-nomask=0" } */
> +
> +void e (int * restrict a, int *b, int n)
> +{
> +    for (int i = 0; i < n; i++)
> +      b[i] = a[i] >> 31;
> +}
> +
> +/* { dg-final { scan-assembler-times {\tcmlt\t} 1 } } */
> +/* { dg-final { scan-assembler-not {\tsshr\t} } } */
> +
> 
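
As a rough illustration of how the two alternatives in the pattern divide up
the shift amounts, here is a simplified C model. It is a sketch under stated
assumptions, not GCC code: matches_d1, matches_dr and select_mnemonic are
hypothetical names, the valid sshr immediate range is assumed to be
1..BITSIZE, and picking the first matching alternative is a simplification
(the real selection of alternatives in GCC is cost-based).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the new "D1" constraint: every element of the
   vector immediate equals unit_bits - 1.  */
static bool matches_d1(const int *elts, size_t n, int unit_bits)
{
    if (n == 0)
        return false;
    for (size_t i = 0; i < n; i++)
        if (elts[i] != unit_bits - 1)
            return false;
    return true;
}

/* Hypothetical model of "Dr": all elements are the same value and that
   value is in 1..unit_bits (assumed range for right-shift immediates).  */
static bool matches_dr(const int *elts, size_t n, int unit_bits)
{
    if (n == 0 || elts[0] < 1 || elts[0] > unit_bits)
        return false;
    for (size_t i = 1; i < n; i++)
        if (elts[i] != elts[0])
            return false;
    return true;
}

/* Simplified selection: prefer the cmlt alternative when D1 matches,
   fall back to sshr when only Dr matches, NULL when neither does.  */
static const char *select_mnemonic(const int *elts, size_t n, int unit_bits)
{
    if (matches_d1(elts, n, unit_bits))
        return "cmlt";
    if (matches_dr(elts, n, unit_bits))
        return "sshr";
    return NULL;
}
```

In this model a V4SI shift by 31 selects cmlt, while a shift by any other
valid amount still selects sshr, which is the split the "D1,Dr" alternatives
are meant to express.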
> 
> --
