Hi Victor,

> -----Original Message-----
> From: Victor Do Nascimento <victor.donascime...@arm.com>
> Sent: Thursday, May 16, 2024 3:39 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Richard Sandiford <richard.sandif...@arm.com>; Richard Earnshaw
> <richard.earns...@arm.com>; Victor Do Nascimento
> <vicdo...@e133397.arm.com>
> Subject: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer
> 
> From: Victor Do Nascimento <vicdo...@e133397.arm.com>
> 
> At present, the compiler offers the `{u|s|us}dot_prod_optab' direct
> optabs for dealing with vectorizable dot product code sequences.  The
> consequence of using a direct optab for this is that backend-pattern
> selection is only ever able to match against one datatype - Either
> that of the operands or of the accumulated value, never both.
> 
> With the introduction of the 2-way (un)signed dot-product insn [1][2]
> in AArch64 SVE2, the existing direct opcode approach is no longer
> sufficient for full specification of all the possible dot product
> machine instructions to be matched to the code sequence; a dot product
> resulting in VNx4SI may result from either dot products on VNx16QI or
> VNx8HI values for the 4- and 2-way dot product operations, respectively.
> 
> This means that the following example fails autovectorization:
> 
> uint32_t foo(int n, uint16_t* data) {
>   uint32_t sum = 0;
>   for (int i=0; i<n; i+=1) {
>     sum += data[i] * data[i];
>   }
>   return sum;
> }
> 
> To remedy the issue a new optab is added, tentatively named
> `udot_prod_twoway_optab', whose selection is dependent upon checking
> of both input and output types involved in the operation.
> 
> In order to minimize changes to the existing codebase,
> `optab_for_tree_code' is renamed `optab_for_tree_code_1' and a new
> argument is added to its signature - `const_tree otype', allowing type
> information to be specified for both input and output types.  The
> existing nterface is retained by defining a new `optab_for_tree_code',
> which serves as a shim to `optab_for_tree_code_1', passing old
> parameters as-is and setting the new `optype' argument to `NULL_TREE'.
> 
> For DOT_PROD_EXPR tree codes, we can call `optab_for_tree_code_1'
> directly, passing it both types, adding the internal logic to the
> function to distinguish between competing optabs.
> 
> Finally, necessary changes are made to `expand_widen_pattern_expr' to
> ensure the new icode can be correctly selected, given the new optab.
> 
> [1] https://developer.arm.com/documentation/ddi0602/2024-03/SVE-
> Instructions/UDOT--2-way--vectors---Unsigned-integer-dot-product-
> [2] https://developer.arm.com/documentation/ddi0602/2024-03/SVE-
> Instructions/SDOT--2-way--vectors---Signed-integer-dot-product-
> 
> gcc/ChangeLog:
> 
>       * config/aarch64/aarch64-sve2.md
> (@aarch64_sve_<sur>dotvnx4sivnx8hi):
>       renamed to `<sur>dot_prod_twoway_vnx8hi'.
>       * config/aarch64/aarch64-sve-builtins-base.cc (svdot_impl.expand):
>       update icodes used in line with above rename.

Please split the target specific bits from the target agnostic parts.
I.e. this patch series should be split in two.

>       * optabs-tree.cc (optab_for_tree_code_1): Renamed
>       `optab_for_tree_code' and added new argument.
>       (optab_for_tree_code): Now a call to `optab_for_tree_code_1'.
>       * optabs-tree.h (optab_for_tree_code_1): New.
>       * optabs.cc (expand_widen_pattern_expr): Expand support for
>       DOT_PROD_EXPR patterns.
>       * optabs.def (udot_prod_twoway_optab): New.
>       (sdot_prod_twoway_optab): Likewise.
>       * tree-vect-patterns.cc (vect_supportable_direct_optab_p): Add
>       support for misc optabs that use two modes.
> 
> gcc/testsuite/ChangeLog:
> 
>       * gcc.dg/vect/vect-dotprod-twoway.c: New.
> ---
>  .../aarch64/aarch64-sve-builtins-base.cc      |  4 ++--
>  gcc/config/aarch64/aarch64-sve2.md            |  2 +-
>  gcc/optabs-tree.cc                            | 23 ++++++++++++++++--
>  gcc/optabs-tree.h                             |  2 ++
>  gcc/optabs.cc                                 |  2 +-
>  gcc/optabs.def                                |  2 ++
>  .../gcc.dg/vect/vect-dotprod-twoway.c         | 24 +++++++++++++++++++
>  gcc/tree-vect-patterns.cc                     |  2 +-
>  8 files changed, 54 insertions(+), 7 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
> 
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> index 0d2edf3f19e..e457db09f66 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> @@ -764,8 +764,8 @@ public:
>        icode = (e.type_suffix (0).float_p
>              ? CODE_FOR_aarch64_sve_fdotvnx4sfvnx8hf
>              : e.type_suffix (0).unsigned_p
> -            ? CODE_FOR_aarch64_sve_udotvnx4sivnx8hi
> -            : CODE_FOR_aarch64_sve_sdotvnx4sivnx8hi);
> +            ? CODE_FOR_udot_prod_twoway_vnx8hi
> +            : CODE_FOR_sdot_prod_twoway_vnx8hi);
>      return e.use_unpred_insn (icode);
>    }
>  };
> diff --git a/gcc/config/aarch64/aarch64-sve2.md b/gcc/config/aarch64/aarch64-
> sve2.md
> index 934e57055d3..5677de7108d 100644
> --- a/gcc/config/aarch64/aarch64-sve2.md
> +++ b/gcc/config/aarch64/aarch64-sve2.md
> @@ -2021,7 +2021,7 @@ (define_insn
> "@aarch64_sve_qsub_<sve_int_op>_lane_<mode>"
>  )
> 
>  ;; Two-way dot-product.
> -(define_insn "@aarch64_sve_<sur>dotvnx4sivnx8hi"
> +(define_insn "<sur>dot_prod_twoway_vnx8hi"

This drops the @ modifier, did it have no callers?

>    [(set (match_operand:VNx4SI 0 "register_operand")
>       (plus:VNx4SI
>         (unspec:VNx4SI
> diff --git a/gcc/optabs-tree.cc b/gcc/optabs-tree.cc
> index b69a5bc3676..e3c5a618ea2 100644
> --- a/gcc/optabs-tree.cc
> +++ b/gcc/optabs-tree.cc
> @@ -35,8 +35,8 @@ along with GCC; see the file COPYING3.  If not see
>     cannot give complete results for multiplication or division) but probably
>     ought to be relied on more widely throughout the expander.  */
>  optab
> -optab_for_tree_code (enum tree_code code, const_tree type,
> -                  enum optab_subtype subtype)
> +optab_for_tree_code_1 (enum tree_code code, const_tree type,
> +                    const_tree otype, enum optab_subtype subtype)
>  {

Since we're C++ now, why not just overload optab_for_tree_code.
Both functions are useful in their own right so it seems better to overload 
them.
By overloading it also means you don't have to change the existing call sites.

I'd also make the optab_for_tree_code with one type just call 
optab_for_tree_code
with the same type twice.

>    bool trapv;
>    switch (code)
> @@ -149,6 +149,14 @@ optab_for_tree_code (enum tree_code code, const_tree
> type,
> 
>      case DOT_PROD_EXPR:
>        {
> +     if (otype && (TYPE_PRECISION (TREE_TYPE (type)) * 2
> +                   == TYPE_PRECISION (TREE_TYPE (otype))))
> +       {

Because then you can remove the check for otype here,
And store TREE_TYPE (type) and TREE_TYPE (otype) safely into a variable so
the if becomes
if (otype != type
    && TYPE_PRECISION ...
   && TYPE_UNSIGNED (type) == TYPE_UNSIGNED (otype)

> +         if (TYPE_UNSIGNED (type) && TYPE_UNSIGNED (otype))
> +           return udot_prod_twoway_optab;
> +         if (!TYPE_UNSIGNED (type) && !TYPE_UNSIGNED (otype))
> +           return sdot_prod_twoway_optab;
> +       }

And you can then simplify this into

return (TYPE_UNSIGNED (otype) ? udot_prod_twoway_optab : sdot_prod_twoway_optab;

>       if (subtype == optab_vector_mixed_sign)
>         return usdot_prod_optab;
> 
> @@ -285,6 +293,17 @@ optab_for_tree_code (enum tree_code code, const_tree
> type,
>      }
>  }
> 
> +/* Return the optab used for computing the operation given by the tree code,
> +   CODE and the tree EXP.  This function is not always usable (for example, 
> it
> +   cannot give complete results for multiplication or division) but probably
> +   ought to be relied on more widely throughout the expander.  */
> +optab
> +optab_for_tree_code (enum tree_code code, const_tree type,
> +                  enum optab_subtype subtype)
> +{
> +  return optab_for_tree_code_1 (code, type, NULL_TREE, subtype);
> +}
> +
>  /* Check whether an operation represented by CODE is a 'half' widening 
> operation
>     in which the input vector type has half the number of bits of the output
>     vector type e.g. V8QI->V8HI.
> diff --git a/gcc/optabs-tree.h b/gcc/optabs-tree.h
> index f2b49991462..13ff7ca2e4b 100644
> --- a/gcc/optabs-tree.h
> +++ b/gcc/optabs-tree.h
> @@ -36,6 +36,8 @@ enum optab_subtype
>  /* Return the optab used for computing the given operation on the type given 
> by
>     the second argument.  The third argument distinguishes between the types 
> of
>     vector shifts and rotates.  */
> +optab optab_for_tree_code_1 (enum tree_code, const_tree, const_tree
> __attribute__((unused)),
> +                         enum optab_subtype );
>  optab optab_for_tree_code (enum tree_code, const_tree, enum optab_subtype);
>  bool
>  supportable_half_widening_operation (enum tree_code, tree, tree,
> diff --git a/gcc/optabs.cc b/gcc/optabs.cc
> index ce91f94ed43..3a1c6c7b90e 100644
> --- a/gcc/optabs.cc
> +++ b/gcc/optabs.cc
> @@ -311,7 +311,7 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx
> op1, rtx wide_op,
>       gcc_unreachable ();
> 
>        widen_pattern_optab
> -     = optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), subtype);
> +     = optab_for_tree_code_1 (ops->code, TREE_TYPE (oprnd0), TREE_TYPE
> (oprnd2), subtype);
>      }
>    else
>      widen_pattern_optab
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index ad14f9328b9..cf1a6e7a7dc 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -408,6 +408,8 @@ OPTAB_D (sdot_prod_optab, "sdot_prod$I$a")
>  OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3")
>  OPTAB_D (udot_prod_optab, "udot_prod$I$a")
>  OPTAB_D (usdot_prod_optab, "usdot_prod$I$a")
> +OPTAB_D (sdot_prod_twoway_optab, "sdot_prod_twoway_$I$a")
> +OPTAB_D (udot_prod_twoway_optab, "udot_prod_twoway_$I$a")
>  OPTAB_D (usum_widen_optab, "widen_usum$I$a3")
>  OPTAB_D (usad_optab, "usad$I$a")
>  OPTAB_D (ssad_optab, "ssad$I$a")
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
> b/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
> new file mode 100644
> index 00000000000..cba2aadbec8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
> @@ -0,0 +1,24 @@
> +/* { dg-do compile { target { aarch64*-*-* } } } */
> +/* { dg-additional-options "-march=-O3 -march=armv9.2-a+sme2 -fdump-tree-
> vect-details" { target { aarch64*-*-* } } } */

Does this work? -march=-O3? I am guessing you're getting lucky that the
later -march overrides it?

For tests in the vectorizer testsuite you shouldn't give optimization options 
as the tests
are ran with -fno-vect-cost-model.  I.e. we'll vectorize even if unprofitable.

You also shouldn't add the -fdump-tree-vect-details, the vectorizer testsuite 
adds it
automatically.

> +
> +#include <stdint.h>
> +
> +uint32_t udot2(int n, uint16_t* data) {
> +  uint32_t sum = 0;
> +  for (int i=0; i<n; i+=1) {
> +    sum += data[i] * data[i];
> +  }
> +  return sum;
> +}
> +
> +int32_t sdot2(int n, int16_t* data) {
> +  int32_t sum = 0;
> +  for (int i=0; i<n; i+=1) {
> +    sum += data[i] * data[i];
> +  }
> +  return sum;
> +}

So you have an AArch64 only test and place it in generic folder.
You should either make it aarch64 only or make the test a generic test.

Normally for these I add a generic test that scans for the vectorizer scans as 
you've done below
And I then add an aach64 specific test with check-functions-bodies that checks 
that we actually
produced the right instructions.

Your test at the moment do not actually guarantee that the instruction gets 
generated.

Look at how the tests for 4 way dot product are written for inspiration here.

> +
> +/* { dg-final { scan-tree-dump-times "vect_recog_dot_prod_pattern: 
> detected:" 2
> "vect" } } */
> +/* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern: 
> detected"
> 4 "vect" } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" } } */
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index dfb7d800526..0760f25d94d 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -233,7 +233,7 @@ vect_supportable_direct_optab_p (vec_info *vinfo, tree
> otype, tree_code code,
>    if (!vecotype)
>      return false;
> 
> -  optab optab = optab_for_tree_code (code, vecitype, subtype);
> +  optab optab = optab_for_tree_code_1 (code, vecitype, vecotype, subtype);

If you overload the function then these changes go away.

Thanks for working on this,
Tamar

>    if (!optab)
>      return false;
> 
> --
> 2.34.1

Reply via email to