Yuliang Wang <yuliang.w...@arm.com> writes:
> +;; Unpredicated arithmetic right shift for division by power-of-2.
> +(define_expand "sdiv_pow2<mode>3"
> +  [(set (match_operand:SVE_I 0 "register_operand" "")
> +     (unspec:SVE_I
> +       [(match_dup 3)
> +        (unspec:SVE_I
> +          [(match_operand:SVE_I 1 "register_operand" "")
> +           (match_operand 2 "aarch64_simd_rshift_imm")]
> +         UNSPEC_ASRD)]
> +      UNSPEC_PRED_X))]
> +  "TARGET_SVE"
> +  {
> +    operands[3] = aarch64_ptrue_reg (<VPRED>mode);
> +  }
> +)

Sorry for not noticing last time, but: define_expands shouldn't have
constraints, so it's better to drop the empty "" from the match_operands.
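i.e. something like this (the same pattern as above, just with the empty constraint strings dropped):

```
(define_expand "sdiv_pow2<mode>3"
  [(set (match_operand:SVE_I 0 "register_operand")
	(unspec:SVE_I
	  [(match_dup 3)
	   (unspec:SVE_I
	     [(match_operand:SVE_I 1 "register_operand")
	      (match_operand 2 "aarch64_simd_rshift_imm")]
	     UNSPEC_ASRD)]
	  UNSPEC_PRED_X))]
  "TARGET_SVE"
  {
    operands[3] = aarch64_ptrue_reg (<VPRED>mode);
  }
)
```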

> diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
> index 4ace224a8ff5ed4fafed10a69ef00ffb2d7d8c39..009b8f8db74c7a3bef996ceaba58123f6558221c 100644
> --- a/gcc/doc/sourcebuild.texi
> +++ b/gcc/doc/sourcebuild.texi
> @@ -1446,6 +1446,10 @@ of bytes.
>  Target supports both signed and unsigned multiply-high-with-round-and-scale
>  operations on vectors of half-words.
>  
> +@item vect_sdiv_pow2_si
> +Target supports signed division by constant power-of-2 operations
> +on vectors of words.

"4 bytes" is more accurate than "words".  The problem with "word" is
that it depends on context; e.g. although the AArch64 ISA uses "word"
to mean 32 bits, its words are really 64 bits as far as GCC is concerned.
(It's also worth noting that "4 bytes" isn't necessarily 32 bits, since GCC
supports 16-bit and 32-bit bytes.)

All a bit pedantic, sorry.

> +/* { dg-final { scan-assembler-not {\tand\t%} } } */
> +

Stray newline at end of file.

> diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
> index 414bf80003b9192806f79afed9393f9ef4750a7d..dedee87ced3d4cbed18fe4144282e5b4330a113d 100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -6192,6 +6192,14 @@ proc check_effective_target_vect_mulhrs_hi {} {
>                  && [check_effective_target_aarch64_sve2] }]
>  }
>  
> +# Return 1 if the target plus current options supports signed division
> +# by power-of-2 operations on vectors of half-words.

"words" rather than "half-words", although as above 4 bytes is more
accurate.

> +
> +proc check_effective_target_vect_sdiv_pow2_si {} {
> +    return [expr { [istarget aarch64*-*-*]
> +                && [check_effective_target_aarch64_sve] }]
> +}
> +
>  # Return 1 if the target plus current options supports a vector
>  # demotion (packing) of shorts (to chars) and ints (to shorts) 
>  # using modulo arithmetic, 0 otherwise.
> diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
> index 2f86f9e4fc7039add1b1d7b82574cb8262eb4ba4..f09e9d54701ebee0382742d20d4f5a0db84110de 100644
> --- a/gcc/tree-vect-patterns.c
> +++ b/gcc/tree-vect-patterns.c
> @@ -2925,6 +2925,38 @@ vect_recog_divmod_pattern (stmt_vec_info stmt_vinfo, tree *type_out)
>        /* Pattern detected.  */
>        vect_pattern_detected ("vect_recog_divmod_pattern", last_stmt);
>  
> +      *type_out = vectype;
> +
> +      /* Check if the target supports this internal function.  */
> +      internal_fn ifn = IFN_DIV_POW2;
> +      if (direct_internal_fn_supported_p (ifn, vectype, OPTIMIZE_FOR_SPEED))
> +     {
> +       tree shift = build_int_cst (itype, tree_log2 (oprnd1));
> +
> +       tree var_div = vect_recog_temp_ssa_var (itype, NULL);
> +       gimple *div_stmt = gimple_build_call_internal (ifn, 2, oprnd0, shift);
> +       gimple_call_set_lhs (div_stmt, var_div);
> +
> +       if (rhs_code == TRUNC_MOD_EXPR)
> +         {
> +           append_pattern_def_seq (stmt_vinfo, div_stmt);
> +           def_stmt
> +             = gimple_build_assign (vect_recog_temp_ssa_var (itype, NULL),
> +                                    LSHIFT_EXPR, var_div, shift);
> +           append_pattern_def_seq (stmt_vinfo, def_stmt);
> +           pattern_stmt
> +             = gimple_build_assign (vect_recog_temp_ssa_var (itype, NULL),
> +                                    MINUS_EXPR, oprnd0,
> +                                    gimple_assign_lhs (def_stmt));
> +         }
> +       else
> +         {
> +           pattern_stmt = div_stmt;
> +           gimple_set_location (pattern_stmt, gimple_location (last_stmt));
> +         }

Again, sorry for not noticing last time, but we should set the location
outside the "else", so that the TRUNC_MOD_EXPR case gets it too.
Although the TRUNC_MOD_EXPR is no longer being calculated as a modulus
operation, the final MINUS_EXPR is still giving the same result.

> +          return pattern_stmt;

The indentation of this line uses 8 spaces instead of tabs.

OK otherwise, thanks.

Richard
