On Thu, May 16, 2024, 7:46 PM Tamar Christina <tamar.christ...@arm.com> wrote:
> Hi Victor, > > > -----Original Message----- > > From: Victor Do Nascimento <victor.donascime...@arm.com> > > Sent: Thursday, May 16, 2024 3:39 PM > > To: gcc-patches@gcc.gnu.org > > Cc: Richard Sandiford <richard.sandif...@arm.com>; Richard Earnshaw > > <richard.earns...@arm.com>; Victor Do Nascimento > > <vicdo...@e133397.arm.com> > > Subject: [PATCH] middle-end: Expand {u|s}dot product support in > autovectorizer > > > > From: Victor Do Nascimento <vicdo...@e133397.arm.com> > > > > At present, the compiler offers the `{u|s|us}dot_prod_optab' direct > > optabs for dealing with vectorizable dot product code sequences. The > > consequence of using a direct optab for this is that backend-pattern > > selection is only ever able to match against one datatype - Either > > that of the operands or of the accumulated value, never both. > > > > With the introduction of the 2-way (un)signed dot-product insn [1][2] > > in AArch64 SVE2, the existing direct opcode approach is no longer > > sufficient for full specification of all the possible dot product > > machine instructions to be matched to the code sequence; a dot product > > resulting in VNx4SI may result from either dot products on VNx16QI or > > VNx8HI values for the 4- and 2-way dot product operations, respectively. > > > > This means that the following example fails autovectorization: > > > > uint32_t foo(int n, uint16_t* data) { > > uint32_t sum = 0; > > for (int i=0; i<n; i+=1) { > > sum += data[i] * data[i]; > > } > > return sum; > > } > > > > To remedy the issue a new optab is added, tentatively named > > `udot_prod_twoway_optab', whose selection is dependent upon checking > > of both input and output types involved in the operation. > > > > In order to minimize changes to the existing codebase, > > `optab_for_tree_code' is renamed `optab_for_tree_code_1' and a new > > argument is added to its signature - `const_tree otype', allowing type > > information to be specified for both input and output types. The > > existing nterface is retained by defining a new `optab_for_tree_code', > > which serves as a shim to `optab_for_tree_code_1', passing old > > parameters as-is and setting the new `optype' argument to `NULL_TREE'. > > > > For DOT_PROD_EXPR tree codes, we can call `optab_for_tree_code_1' > > directly, passing it both types, adding the internal logic to the > > function to distinguish between competing optabs. > > > > Finally, necessary changes are made to `expand_widen_pattern_expr' to > > ensure the new icode can be correctly selected, given the new optab. > > > > [1] https://developer.arm.com/documentation/ddi0602/2024-03/SVE- > > Instructions/UDOT--2-way--vectors---Unsigned-integer-dot-product- > > [2] https://developer.arm.com/documentation/ddi0602/2024-03/SVE- > > Instructions/SDOT--2-way--vectors---Signed-integer-dot-product- > > > > gcc/ChangeLog: > > > > * config/aarch64/aarch64-sve2.md > > (@aarch64_sve_<sur>dotvnx4sivnx8hi): > > renamed to `<sur>dot_prod_twoway_vnx8hi'. > > * config/aarch64/aarch64-sve-builtins-base.cc (svdot_impl.expand): > > update icodes used in line with above rename. > > Please split the target specific bits from the target agnostic parts. > I.e. this patch series should be split in two. > > > * optabs-tree.cc (optab_for_tree_code_1): Renamed > > `optab_for_tree_code' and added new argument. > > (optab_for_tree_code): Now a call to `optab_for_tree_code_1'. > > * optabs-tree.h (optab_for_tree_code_1): New. > > * optabs.cc (expand_widen_pattern_expr): Expand support for > > DOT_PROD_EXPR patterns. > > * optabs.def (udot_prod_twoway_optab): New. > > (sdot_prod_twoway_optab): Likewise. > > * tree-vect-patterns.cc (vect_supportable_direct_optab_p): Add > > support for misc optabs that use two modes. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.dg/vect/vect-dotprod-twoway.c: New. > > --- > > .../aarch64/aarch64-sve-builtins-base.cc | 4 ++-- > > gcc/config/aarch64/aarch64-sve2.md | 2 +- > > gcc/optabs-tree.cc | 23 ++++++++++++++++-- > > gcc/optabs-tree.h | 2 ++ > > gcc/optabs.cc | 2 +- > > gcc/optabs.def | 2 ++ > > .../gcc.dg/vect/vect-dotprod-twoway.c | 24 +++++++++++++++++++ > > gcc/tree-vect-patterns.cc | 2 +- > > 8 files changed, 54 insertions(+), 7 deletions(-) > > create mode 100644 gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c > > > > diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc > > b/gcc/config/aarch64/aarch64-sve-builtins-base.cc > > index 0d2edf3f19e..e457db09f66 100644 > > --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc > > +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc > > @@ -764,8 +764,8 @@ public: > > icode = (e.type_suffix (0).float_p > > ? CODE_FOR_aarch64_sve_fdotvnx4sfvnx8hf > > : e.type_suffix (0).unsigned_p > > - ? CODE_FOR_aarch64_sve_udotvnx4sivnx8hi > > - : CODE_FOR_aarch64_sve_sdotvnx4sivnx8hi); > > + ? CODE_FOR_udot_prod_twoway_vnx8hi > > + : CODE_FOR_sdot_prod_twoway_vnx8hi); > > return e.use_unpred_insn (icode); > > } > > }; > > diff --git a/gcc/config/aarch64/aarch64-sve2.md > b/gcc/config/aarch64/aarch64- > > sve2.md > > index 934e57055d3..5677de7108d 100644 > > --- a/gcc/config/aarch64/aarch64-sve2.md > > +++ b/gcc/config/aarch64/aarch64-sve2.md > > @@ -2021,7 +2021,7 @@ (define_insn > > "@aarch64_sve_qsub_<sve_int_op>_lane_<mode>" > > ) > > > > ;; Two-way dot-product. > > -(define_insn "@aarch64_sve_<sur>dotvnx4sivnx8hi" > > +(define_insn "<sur>dot_prod_twoway_vnx8hi" > > This drops the @ modifier, did it have no callers? > > > [(set (match_operand:VNx4SI 0 "register_operand") > > (plus:VNx4SI > > (unspec:VNx4SI > > diff --git a/gcc/optabs-tree.cc b/gcc/optabs-tree.cc > > index b69a5bc3676..e3c5a618ea2 100644 > > --- a/gcc/optabs-tree.cc > > +++ b/gcc/optabs-tree.cc > > @@ -35,8 +35,8 @@ along with GCC; see the file COPYING3. If not see > > cannot give complete results for multiplication or division) but > probably > > ought to be relied on more widely throughout the expander. */ > > optab > > -optab_for_tree_code (enum tree_code code, const_tree type, > > - enum optab_subtype subtype) > > +optab_for_tree_code_1 (enum tree_code code, const_tree type, > > + const_tree otype, enum optab_subtype subtype) > > { > > Since we're C++ now, why not just overload optab_for_tree_code. > Both functions are useful in their own right so it seems better to > overload them. > By overloading it also means you don't have to change the existing call > sites. > > I'd also make the optab_for_tree_code with one type just call > optab_for_tree_code > with the same type twice. > > > bool trapv; > > switch (code) > > @@ -149,6 +149,14 @@ optab_for_tree_code (enum tree_code code, const_tree > > type, > > > > case DOT_PROD_EXPR: > > { > > + if (otype && (TYPE_PRECISION (TREE_TYPE (type)) * 2 > > + == TYPE_PRECISION (TREE_TYPE (otype)))) > > + { > > Because then you can remove the check for otype here, > And store TREE_TYPE (type) and TREE_TYPE (otype) safely into a variable so > the if becomes > if (otype != type > && TYPE_PRECISION ... > && TYPE_UNSIGNED (type) == TYPE_UNSIGNED (otype) > > > + if (TYPE_UNSIGNED (type) && TYPE_UNSIGNED (otype)) > > + return udot_prod_twoway_optab; > > + if (!TYPE_UNSIGNED (type) && !TYPE_UNSIGNED (otype)) > > + return sdot_prod_twoway_optab; > > + } > > And you can then simplify this into > > return (TYPE_UNSIGNED (otype) ? udot_prod_twoway_optab : > sdot_prod_twoway_optab; > > > if (subtype == optab_vector_mixed_sign) > > return usdot_prod_optab; > > > > @@ -285,6 +293,17 @@ optab_for_tree_code (enum tree_code code, const_tree > > type, > > } > > } > > > > +/* Return the optab used for computing the operation given by the tree > code, > > + CODE and the tree EXP. This function is not always usable (for > example, it > > + cannot give complete results for multiplication or division) but > probably > > + ought to be relied on more widely throughout the expander. */ > > +optab > > +optab_for_tree_code (enum tree_code code, const_tree type, > > + enum optab_subtype subtype) > > +{ > > + return optab_for_tree_code_1 (code, type, NULL_TREE, subtype); > > +} > > + > > /* Check whether an operation represented by CODE is a 'half' widening > operation > > in which the input vector type has half the number of bits of the > output > > vector type e.g. V8QI->V8HI. > > diff --git a/gcc/optabs-tree.h b/gcc/optabs-tree.h > > index f2b49991462..13ff7ca2e4b 100644 > > --- a/gcc/optabs-tree.h > > +++ b/gcc/optabs-tree.h > > @@ -36,6 +36,8 @@ enum optab_subtype > > /* Return the optab used for computing the given operation on the type > given by > > the second argument. The third argument distinguishes between the > types of > > vector shifts and rotates. */ > > +optab optab_for_tree_code_1 (enum tree_code, const_tree, const_tree > > __attribute__((unused)), > > + enum optab_subtype ); > > optab optab_for_tree_code (enum tree_code, const_tree, enum > optab_subtype); > > bool > > supportable_half_widening_operation (enum tree_code, tree, tree, > > diff --git a/gcc/optabs.cc b/gcc/optabs.cc > > index ce91f94ed43..3a1c6c7b90e 100644 > > --- a/gcc/optabs.cc > > +++ b/gcc/optabs.cc > > @@ -311,7 +311,7 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx > > op1, rtx wide_op, > > gcc_unreachable (); > > > > widen_pattern_optab > > - = optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), subtype); > > + = optab_for_tree_code_1 (ops->code, TREE_TYPE (oprnd0), TREE_TYPE > > (oprnd2), subtype); > > } > > else > > widen_pattern_optab > > diff --git a/gcc/optabs.def b/gcc/optabs.def > > index ad14f9328b9..cf1a6e7a7dc 100644 > > --- a/gcc/optabs.def > > +++ b/gcc/optabs.def > > @@ -408,6 +408,8 @@ OPTAB_D (sdot_prod_optab, "sdot_prod$I$a") > > OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3") > > OPTAB_D (udot_prod_optab, "udot_prod$I$a") > > OPTAB_D (usdot_prod_optab, "usdot_prod$I$a") > > +OPTAB_D (sdot_prod_twoway_optab, "sdot_prod_twoway_$I$a") > > +OPTAB_D (udot_prod_twoway_optab, "udot_prod_twoway_$I$a") > > OPTAB_D (usum_widen_optab, "widen_usum$I$a3") > > OPTAB_D (usad_optab, "usad$I$a") > > OPTAB_D (ssad_optab, "ssad$I$a") > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c > > b/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c > > new file mode 100644 > > index 00000000000..cba2aadbec8 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c > > @@ -0,0 +1,24 @@ > > +/* { dg-do compile { target { aarch64*-*-* } } } */ > > +/* { dg-additional-options "-march=-O3 -march=armv9.2-a+sme2 > -fdump-tree- > > vect-details" { target { aarch64*-*-* } } } */ > > Does this work? -march=-O3? I am guessing you're getting lucky that the > later -march overrides it? > Yes the later overrides the first one. I did report that as a bug a few years back, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103422 . Thanks, Andrew > For tests in the vectorizer testsuite you shouldn't give optimization > options as the tests > are ran with -fno-vect-cost-model. I.e. we'll vectorize even if > unprofitable. > > You also shouldn't add the -fdump-tree-vect-details, the vectorizer > testsuite adds it > automatically. > > > + > > +#include <stdint.h> > > + > > +uint32_t udot2(int n, uint16_t* data) { > > + uint32_t sum = 0; > > + for (int i=0; i<n; i+=1) { > > + sum += data[i] * data[i]; > > + } > > + return sum; > > +} > > + > > +int32_t sdot2(int n, int16_t* data) { > > + int32_t sum = 0; > > + for (int i=0; i<n; i+=1) { > > + sum += data[i] * data[i]; > > + } > > + return sum; > > +} > > So you have an AArch64 only test and place it in generic folder. > You should either make it aarch64 only or make the test a generic test. > > Normally for these I add a generic test that scans for the vectorizer > scans as you've done below > And I then add an aach64 specific test with check-functions-bodies that > checks that we actually > produced the right instructions. > > Your test at the moment do not actually guarantee that the instruction > gets generated. > > Look at how the tests for 4 way dot product are written for inspiration > here. > > > + > > +/* { dg-final { scan-tree-dump-times "vect_recog_dot_prod_pattern: > detected:" 2 > > "vect" } } */ > > +/* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern: > detected" > > 4 "vect" } } */ > > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" } } > */ > > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc > > index dfb7d800526..0760f25d94d 100644 > > --- a/gcc/tree-vect-patterns.cc > > +++ b/gcc/tree-vect-patterns.cc > > @@ -233,7 +233,7 @@ vect_supportable_direct_optab_p (vec_info *vinfo, > tree > > otype, tree_code code, > > if (!vecotype) > > return false; > > > > - optab optab = optab_for_tree_code (code, vecitype, subtype); > > + optab optab = optab_for_tree_code_1 (code, vecitype, vecotype, > subtype); > > If you overload the function then these changes go away. > > Thanks for working on this, > Tamar > > > if (!optab) > > return false; > > > > -- > > 2.34.1 > >