Re: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer

2024-05-20 Thread Richard Sandiford
Richard Biener  writes:
> On Fri, May 17, 2024 at 11:56 AM Tamar Christina
>  wrote:
>>
>> > -Original Message-
>> > From: Richard Biener 
>> > Sent: Friday, May 17, 2024 10:46 AM
>> > To: Tamar Christina 
>> > Cc: Victor Do Nascimento ; gcc-
>> > patc...@gcc.gnu.org; Richard Sandiford ; Richard
>> > Earnshaw ; Victor Do Nascimento
>> > 
>> > Subject: Re: [PATCH] middle-end: Expand {u|s}dot product support in
>> > autovectorizer
>> >
>> > On Fri, May 17, 2024 at 11:05 AM Tamar Christina
>> >  wrote:
>> > >
>> > > > -Original Message-
>> > > > From: Richard Biener 
>> > > > Sent: Friday, May 17, 2024 6:51 AM
>> > > > To: Victor Do Nascimento 
>> > > > Cc: gcc-patches@gcc.gnu.org; Richard Sandiford
>> > ;
>> > > > Richard Earnshaw ; Victor Do Nascimento
>> > > > 
>> > > > Subject: Re: [PATCH] middle-end: Expand {u|s}dot product support in
>> > > > autovectorizer
>> > > >
>> > > > On Thu, May 16, 2024 at 4:40 PM Victor Do Nascimento
>> > > >  wrote:
>> > > > >
>> > > > > From: Victor Do Nascimento 
>> > > > >
>> > > > > At present, the compiler offers the `{u|s|us}dot_prod_optab' direct
>> > > > > optabs for dealing with vectorizable dot product code sequences.  The
>> > > > > consequence of using a direct optab for this is that backend-pattern
>> > > > > selection is only ever able to match against one datatype - Either
>> > > > > that of the operands or of the accumulated value, never both.
>> > > > >
>> > > > > With the introduction of the 2-way (un)signed dot-product insn [1][2]
>> > > > > in AArch64 SVE2, the existing direct opcode approach is no longer
>> > > > > sufficient for full specification of all the possible dot product
>> > > > > machine instructions to be matched to the code sequence; a dot 
>> > > > > product
>> > > > > resulting in VNx4SI may result from either dot products on VNx16QI or
>> > > > > VNx8HI values for the 4- and 2-way dot product operations, 
>> > > > > respectively.
>> > > > >
>> > > > > This means that the following example fails autovectorization:
>> > > > >
>> > > > > uint32_t foo(int n, uint16_t* data) {
>> > > > >   uint32_t sum = 0;
>> > > > >   for (int i=0; i> > > > > sum += data[i] * data[i];
>> > > > >   }
>> > > > >   return sum;
>> > > > > }
>> > > > >
>> > > > > To remedy the issue a new optab is added, tentatively named
>> > > > > `udot_prod_twoway_optab', whose selection is dependent upon checking
>> > > > > of both input and output types involved in the operation.
>> > > >
>> > > > I don't like this too much.  I'll note we document dot_prod as
>> > > >
>> > > > @cindex @code{sdot_prod@var{m}} instruction pattern
>> > > > @item @samp{sdot_prod@var{m}}
>> > > >
>> > > > Compute the sum of the products of two signed elements.
>> > > > Operand 1 and operand 2 are of the same mode. Their
>> > > > product, which is of a wider mode, is computed and added to operand 3.
>> > > > Operand 3 is of a mode equal or wider than the mode of the product. The
>> > > > result is placed in operand 0, which is of the same mode as operand 3.
>> > > > @var{m} is the mode of operand 1 and operand 2.
>> > > >
>> > > > with no restriction on the wider mode but we don't specify it which is
>> > > > bad design.  This should have been a convert optab with two modes
>> > > > from the start - adding a _twoway variant is just a hack.
>> > >
>> > > We did discuss this at the time we started implementing it.  There was 
>> > > two
>> > > options, one was indeed to change it to a convert dot_prod optab, but 
>> > > doing
>> > > this means we have to update every target that uses it.
>> > >
>> > > Now that means 3 ISAs for AArch64, Arm, Arc, c6x, 2 for x86, loongson and
>> > altivec.
>> > >
>> > > Which sure could be possible, but there's also every use in the backends 
>> > > that
>> > need
>> > > to be updated, and tested, which for some targets we don't even know how 
>> > > to
>> > begin.
>> > >
>> > > So it seems very hard to correct dotprod to a convert optab now.
>> >
>> > It's still the correct way to go.  At _least_ your new pattern should
>> > have been this,
>> > otherwise what do you do when you have two-way, four-way and eight-way
>> > variants?
>> > Add yet another optab?
>>
>> I guess that's fair, but having the new optab only be convert resulted in 
>> messy
>> code as everywhere you must check for both variants.
>>
>> Additionally that optab would then overlap with the existing optabs as, as 
>> you
>> Say, the documentation only says it's of a wider type and doesn't indicate
>> precision.
>>
>> So to avoid issues down the line then If the new optab isn't acceptable then
>> we'll have to do a wholesale conversion then..
>
> Yep.  It shouldn't be difficult though.

Still catching up, but FWIW, I agree this is the way to go.  (Convert all
existing dot_prods to convert optabs first, and then add the new AArch64
ones.)  Having two mechanisms feels like storing up trouble for later. :)

Richard


Re: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer

2024-05-17 Thread Victor Do Nascimento

Dear Richard and Tamar,

Thanks to the both of you for the various bits of feedback.
I've implemented all the more straightforward bits of feedback given, 
leaving "only" the merging of the two- and four-way dot product optabs 
into one, together with the necessary changes to the various backends 
which, though a little time-consuming, should be rather mechanical.


I had originally implemented the new two-way dotprod optab as a covert 
optab anyway, so going back to the work on that local branch will give 
me a good starting point from which to do this.


And Tamar, thanks very much for the feedback regarding the unit-tests. 
I knew my testing as it currently is was rather anaemic and was eager to 
get the relevant feedback on it.  Rest assured it's all been taken on board.


Cheers,
Victor


On 5/17/24 11:13, Richard Biener wrote:

On Fri, May 17, 2024 at 11:56 AM Tamar Christina
 wrote:



-Original Message-
From: Richard Biener 
Sent: Friday, May 17, 2024 10:46 AM
To: Tamar Christina 
Cc: Victor Do Nascimento ; gcc-
patc...@gcc.gnu.org; Richard Sandiford ; Richard
Earnshaw ; Victor Do Nascimento

Subject: Re: [PATCH] middle-end: Expand {u|s}dot product support in
autovectorizer

On Fri, May 17, 2024 at 11:05 AM Tamar Christina
 wrote:



-Original Message-
From: Richard Biener 
Sent: Friday, May 17, 2024 6:51 AM
To: Victor Do Nascimento 
Cc: gcc-patches@gcc.gnu.org; Richard Sandiford

;

Richard Earnshaw ; Victor Do Nascimento

Subject: Re: [PATCH] middle-end: Expand {u|s}dot product support in
autovectorizer

On Thu, May 16, 2024 at 4:40 PM Victor Do Nascimento
 wrote:


From: Victor Do Nascimento 

At present, the compiler offers the `{u|s|us}dot_prod_optab' direct
optabs for dealing with vectorizable dot product code sequences.  The
consequence of using a direct optab for this is that backend-pattern
selection is only ever able to match against one datatype - Either
that of the operands or of the accumulated value, never both.

With the introduction of the 2-way (un)signed dot-product insn [1][2]
in AArch64 SVE2, the existing direct opcode approach is no longer
sufficient for full specification of all the possible dot product
machine instructions to be matched to the code sequence; a dot product
resulting in VNx4SI may result from either dot products on VNx16QI or
VNx8HI values for the 4- and 2-way dot product operations, respectively.

This means that the following example fails autovectorization:

uint32_t foo(int n, uint16_t* data) {
   uint32_t sum = 0;
   for (int i=0; i

I don't like this too much.  I'll note we document dot_prod as

@cindex @code{sdot_prod@var{m}} instruction pattern
@item @samp{sdot_prod@var{m}}

Compute the sum of the products of two signed elements.
Operand 1 and operand 2 are of the same mode. Their
product, which is of a wider mode, is computed and added to operand 3.
Operand 3 is of a mode equal or wider than the mode of the product. The
result is placed in operand 0, which is of the same mode as operand 3.
@var{m} is the mode of operand 1 and operand 2.

with no restriction on the wider mode but we don't specify it which is
bad design.  This should have been a convert optab with two modes
from the start - adding a _twoway variant is just a hack.


We did discuss this at the time we started implementing it.  There was two
options, one was indeed to change it to a convert dot_prod optab, but doing
this means we have to update every target that uses it.

Now that means 3 ISAs for AArch64, Arm, Arc, c6x, 2 for x86, loongson and

altivec.


Which sure could be possible, but there's also every use in the backends that

need

to be updated, and tested, which for some targets we don't even know how to

begin.


So it seems very hard to correct dotprod to a convert optab now.


It's still the correct way to go.  At _least_ your new pattern should
have been this,
otherwise what do you do when you have two-way, four-way and eight-way
variants?
Add yet another optab?


I guess that's fair, but having the new optab only be convert resulted in messy
code as everywhere you must check for both variants.

Additionally that optab would then overlap with the existing optabs as, as you
Say, the documentation only says it's of a wider type and doesn't indicate
precision.

So to avoid issues down the line then If the new optab isn't acceptable then
we'll have to do a wholesale conversion then..


Yep.  It shouldn't be difficult though.



Another thing is that when you do it your way you should fix the existing optab
to be two-way by documenting how the second mode derives from the first.

And sure, it's not the only optab suffering from this issue.


Sure, all the zero and sign extending optabs for instance 


But for example the scalar ones are correct:

OPTAB_CL(sext_optab, "extend$b$a2", SIGN_EXTEND, "extend",
gen_extend_conv_libfunc)

Richard.


Tamar



Richard.


Tamar



Richard.


In order to minimize changes to the existing 

Re: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer

2024-05-17 Thread Richard Biener
On Fri, May 17, 2024 at 11:56 AM Tamar Christina
 wrote:
>
> > -Original Message-
> > From: Richard Biener 
> > Sent: Friday, May 17, 2024 10:46 AM
> > To: Tamar Christina 
> > Cc: Victor Do Nascimento ; gcc-
> > patc...@gcc.gnu.org; Richard Sandiford ; Richard
> > Earnshaw ; Victor Do Nascimento
> > 
> > Subject: Re: [PATCH] middle-end: Expand {u|s}dot product support in
> > autovectorizer
> >
> > On Fri, May 17, 2024 at 11:05 AM Tamar Christina
> >  wrote:
> > >
> > > > -Original Message-
> > > > From: Richard Biener 
> > > > Sent: Friday, May 17, 2024 6:51 AM
> > > > To: Victor Do Nascimento 
> > > > Cc: gcc-patches@gcc.gnu.org; Richard Sandiford
> > ;
> > > > Richard Earnshaw ; Victor Do Nascimento
> > > > 
> > > > Subject: Re: [PATCH] middle-end: Expand {u|s}dot product support in
> > > > autovectorizer
> > > >
> > > > On Thu, May 16, 2024 at 4:40 PM Victor Do Nascimento
> > > >  wrote:
> > > > >
> > > > > From: Victor Do Nascimento 
> > > > >
> > > > > At present, the compiler offers the `{u|s|us}dot_prod_optab' direct
> > > > > optabs for dealing with vectorizable dot product code sequences.  The
> > > > > consequence of using a direct optab for this is that backend-pattern
> > > > > selection is only ever able to match against one datatype - Either
> > > > > that of the operands or of the accumulated value, never both.
> > > > >
> > > > > With the introduction of the 2-way (un)signed dot-product insn [1][2]
> > > > > in AArch64 SVE2, the existing direct opcode approach is no longer
> > > > > sufficient for full specification of all the possible dot product
> > > > > machine instructions to be matched to the code sequence; a dot product
> > > > > resulting in VNx4SI may result from either dot products on VNx16QI or
> > > > > VNx8HI values for the 4- and 2-way dot product operations, 
> > > > > respectively.
> > > > >
> > > > > This means that the following example fails autovectorization:
> > > > >
> > > > > uint32_t foo(int n, uint16_t* data) {
> > > > >   uint32_t sum = 0;
> > > > >   for (int i=0; i > > > > sum += data[i] * data[i];
> > > > >   }
> > > > >   return sum;
> > > > > }
> > > > >
> > > > > To remedy the issue a new optab is added, tentatively named
> > > > > `udot_prod_twoway_optab', whose selection is dependent upon checking
> > > > > of both input and output types involved in the operation.
> > > >
> > > > I don't like this too much.  I'll note we document dot_prod as
> > > >
> > > > @cindex @code{sdot_prod@var{m}} instruction pattern
> > > > @item @samp{sdot_prod@var{m}}
> > > >
> > > > Compute the sum of the products of two signed elements.
> > > > Operand 1 and operand 2 are of the same mode. Their
> > > > product, which is of a wider mode, is computed and added to operand 3.
> > > > Operand 3 is of a mode equal or wider than the mode of the product. The
> > > > result is placed in operand 0, which is of the same mode as operand 3.
> > > > @var{m} is the mode of operand 1 and operand 2.
> > > >
> > > > with no restriction on the wider mode but we don't specify it which is
> > > > bad design.  This should have been a convert optab with two modes
> > > > from the start - adding a _twoway variant is just a hack.
> > >
> > > We did discuss this at the time we started implementing it.  There was two
> > > options, one was indeed to change it to a convert dot_prod optab, but 
> > > doing
> > > this means we have to update every target that uses it.
> > >
> > > Now that means 3 ISAs for AArch64, Arm, Arc, c6x, 2 for x86, loongson and
> > altivec.
> > >
> > > Which sure could be possible, but there's also every use in the backends 
> > > that
> > need
> > > to be updated, and tested, which for some targets we don't even know how 
> > > to
> > begin.
> > >
> > > So it seems very hard to correct dotprod to a convert optab now.
> >
> > It's still the correct way to go.  At _least_ your new pattern should
> > have been this,
> > othe

RE: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer

2024-05-17 Thread Tamar Christina
> -Original Message-
> From: Richard Biener 
> Sent: Friday, May 17, 2024 10:46 AM
> To: Tamar Christina 
> Cc: Victor Do Nascimento ; gcc-
> patc...@gcc.gnu.org; Richard Sandiford ; Richard
> Earnshaw ; Victor Do Nascimento
> 
> Subject: Re: [PATCH] middle-end: Expand {u|s}dot product support in
> autovectorizer
> 
> On Fri, May 17, 2024 at 11:05 AM Tamar Christina
>  wrote:
> >
> > > -Original Message-
> > > From: Richard Biener 
> > > Sent: Friday, May 17, 2024 6:51 AM
> > > To: Victor Do Nascimento 
> > > Cc: gcc-patches@gcc.gnu.org; Richard Sandiford
> ;
> > > Richard Earnshaw ; Victor Do Nascimento
> > > 
> > > Subject: Re: [PATCH] middle-end: Expand {u|s}dot product support in
> > > autovectorizer
> > >
> > > On Thu, May 16, 2024 at 4:40 PM Victor Do Nascimento
> > >  wrote:
> > > >
> > > > From: Victor Do Nascimento 
> > > >
> > > > At present, the compiler offers the `{u|s|us}dot_prod_optab' direct
> > > > optabs for dealing with vectorizable dot product code sequences.  The
> > > > consequence of using a direct optab for this is that backend-pattern
> > > > selection is only ever able to match against one datatype - Either
> > > > that of the operands or of the accumulated value, never both.
> > > >
> > > > With the introduction of the 2-way (un)signed dot-product insn [1][2]
> > > > in AArch64 SVE2, the existing direct opcode approach is no longer
> > > > sufficient for full specification of all the possible dot product
> > > > machine instructions to be matched to the code sequence; a dot product
> > > > resulting in VNx4SI may result from either dot products on VNx16QI or
> > > > VNx8HI values for the 4- and 2-way dot product operations, respectively.
> > > >
> > > > This means that the following example fails autovectorization:
> > > >
> > > > uint32_t foo(int n, uint16_t* data) {
> > > >   uint32_t sum = 0;
> > > >   for (int i=0; i > > > sum += data[i] * data[i];
> > > >   }
> > > >   return sum;
> > > > }
> > > >
> > > > To remedy the issue a new optab is added, tentatively named
> > > > `udot_prod_twoway_optab', whose selection is dependent upon checking
> > > > of both input and output types involved in the operation.
> > >
> > > I don't like this too much.  I'll note we document dot_prod as
> > >
> > > @cindex @code{sdot_prod@var{m}} instruction pattern
> > > @item @samp{sdot_prod@var{m}}
> > >
> > > Compute the sum of the products of two signed elements.
> > > Operand 1 and operand 2 are of the same mode. Their
> > > product, which is of a wider mode, is computed and added to operand 3.
> > > Operand 3 is of a mode equal or wider than the mode of the product. The
> > > result is placed in operand 0, which is of the same mode as operand 3.
> > > @var{m} is the mode of operand 1 and operand 2.
> > >
> > > with no restriction on the wider mode but we don't specify it which is
> > > bad design.  This should have been a convert optab with two modes
> > > from the start - adding a _twoway variant is just a hack.
> >
> > We did discuss this at the time we started implementing it.  There was two
> > options, one was indeed to change it to a convert dot_prod optab, but doing
> > this means we have to update every target that uses it.
> >
> > Now that means 3 ISAs for AArch64, Arm, Arc, c6x, 2 for x86, loongson and
> altivec.
> >
> > Which sure could be possible, but there's also every use in the backends 
> > that
> need
> > to be updated, and tested, which for some targets we don't even know how to
> begin.
> >
> > So it seems very hard to correct dotprod to a convert optab now.
> 
> It's still the correct way to go.  At _least_ your new pattern should
> have been this,
> otherwise what do you do when you have two-way, four-way and eight-way
> variants?
> Add yet another optab?

I guess that's fair, but having the new optab only be convert resulted in messy
code as everywhere you must check for both variants.

Additionally that optab would then overlap with the existing optabs as, as you
Say, the documentation only says it's of a wider type and doesn't indicate
precision.

So to avoid issues down the line then If the new optab isn't acceptable then
we'll have to do a wholesale conversion then..

> 
> Another thing is that when you do it 

Re: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer

2024-05-17 Thread Richard Biener
On Fri, May 17, 2024 at 11:05 AM Tamar Christina
 wrote:
>
> > -Original Message-
> > From: Richard Biener 
> > Sent: Friday, May 17, 2024 6:51 AM
> > To: Victor Do Nascimento 
> > Cc: gcc-patches@gcc.gnu.org; Richard Sandiford ;
> > Richard Earnshaw ; Victor Do Nascimento
> > 
> > Subject: Re: [PATCH] middle-end: Expand {u|s}dot product support in
> > autovectorizer
> >
> > On Thu, May 16, 2024 at 4:40 PM Victor Do Nascimento
> >  wrote:
> > >
> > > From: Victor Do Nascimento 
> > >
> > > At present, the compiler offers the `{u|s|us}dot_prod_optab' direct
> > > optabs for dealing with vectorizable dot product code sequences.  The
> > > consequence of using a direct optab for this is that backend-pattern
> > > selection is only ever able to match against one datatype - Either
> > > that of the operands or of the accumulated value, never both.
> > >
> > > With the introduction of the 2-way (un)signed dot-product insn [1][2]
> > > in AArch64 SVE2, the existing direct opcode approach is no longer
> > > sufficient for full specification of all the possible dot product
> > > machine instructions to be matched to the code sequence; a dot product
> > > resulting in VNx4SI may result from either dot products on VNx16QI or
> > > VNx8HI values for the 4- and 2-way dot product operations, respectively.
> > >
> > > This means that the following example fails autovectorization:
> > >
> > > uint32_t foo(int n, uint16_t* data) {
> > >   uint32_t sum = 0;
> > >   for (int i=0; i > > sum += data[i] * data[i];
> > >   }
> > >   return sum;
> > > }
> > >
> > > To remedy the issue a new optab is added, tentatively named
> > > `udot_prod_twoway_optab', whose selection is dependent upon checking
> > > of both input and output types involved in the operation.
> >
> > I don't like this too much.  I'll note we document dot_prod as
> >
> > @cindex @code{sdot_prod@var{m}} instruction pattern
> > @item @samp{sdot_prod@var{m}}
> >
> > Compute the sum of the products of two signed elements.
> > Operand 1 and operand 2 are of the same mode. Their
> > product, which is of a wider mode, is computed and added to operand 3.
> > Operand 3 is of a mode equal or wider than the mode of the product. The
> > result is placed in operand 0, which is of the same mode as operand 3.
> > @var{m} is the mode of operand 1 and operand 2.
> >
> > with no restriction on the wider mode but we don't specify it which is
> > bad design.  This should have been a convert optab with two modes
> > from the start - adding a _twoway variant is just a hack.
>
> We did discuss this at the time we started implementing it.  There was two
> options, one was indeed to change it to a convert dot_prod optab, but doing
> this means we have to update every target that uses it.
>
> Now that means 3 ISAs for AArch64, Arm, Arc, c6x, 2 for x86, loongson and 
> altivec.
>
> Which sure could be possible, but there's also every use in the backends that 
> need
> to be updated, and tested, which for some targets we don't even know how to 
> begin.
>
> So it seems very hard to correct dotprod to a convert optab now.

It's still the correct way to go.  At _least_ your new pattern should
have been this,
otherwise what do you do when you have two-way, four-way and eight-way variants?
Add yet another optab?

Another thing is that when you do it your way you should fix the existing optab
to be two-way by documenting how the second mode derives from the first.

And sure, it's not the only optab suffering from this issue.

Richard.

> Tamar
>
> >
> > Richard.
> >
> > > In order to minimize changes to the existing codebase,
> > > `optab_for_tree_code' is renamed `optab_for_tree_code_1' and a new
> > > argument is added to its signature - `const_tree otype', allowing type
> > > information to be specified for both input and output types.  The
> > > existing nterface is retained by defining a new `optab_for_tree_code',
> > > which serves as a shim to `optab_for_tree_code_1', passing old
> > > parameters as-is and setting the new `optype' argument to `NULL_TREE'.
> > >
> > > For DOT_PROD_EXPR tree codes, we can call `optab_for_tree_code_1'
> > > directly, passing it both types, adding the internal logic to the
> > > function to distinguish between competing optabs.
> > >
> > > Finally, necessary changes are made to `expand_widen_pattern_expr' to
> > > ensure the new ic

RE: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer

2024-05-17 Thread Tamar Christina
> -Original Message-
> From: Hongtao Liu 
> Sent: Friday, May 17, 2024 3:14 AM
> To: Victor Do Nascimento 
> Cc: gcc-patches@gcc.gnu.org; Richard Sandiford ;
> Richard Earnshaw ; Victor Do Nascimento
> 
> Subject: Re: [PATCH] middle-end: Expand {u|s}dot product support in
> autovectorizer
> 
> > >
> > Sorry to chime in, for x86 backend, we defined usdot_prodv16hi, and
> > 2-way dot_prod operations can be generated
> >
> This is the link https://godbolt.org/z/hcWr64vx3, x86 define
> udot_prodv16qi/udot_prod8hi and both 2-way and 4-way dot_prod
> instructions are generated
> 

That's not the same, the 2-way vs 4-way dot_prod here is that
e.g. udot_prod8hi can reduce to either DImode or SImode.
udot_prod8hi does not have enough information to distinguish the two and in RTL
you can't overload the names.  So this is about the ISA having instructions 
that overlap
on the source mode of the instruction.

Tamar

> 
> --
> BR,
> Hongtao


RE: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer

2024-05-17 Thread Tamar Christina
> -Original Message-
> From: Richard Biener 
> Sent: Friday, May 17, 2024 6:51 AM
> To: Victor Do Nascimento 
> Cc: gcc-patches@gcc.gnu.org; Richard Sandiford ;
> Richard Earnshaw ; Victor Do Nascimento
> 
> Subject: Re: [PATCH] middle-end: Expand {u|s}dot product support in
> autovectorizer
> 
> On Thu, May 16, 2024 at 4:40 PM Victor Do Nascimento
>  wrote:
> >
> > From: Victor Do Nascimento 
> >
> > At present, the compiler offers the `{u|s|us}dot_prod_optab' direct
> > optabs for dealing with vectorizable dot product code sequences.  The
> > consequence of using a direct optab for this is that backend-pattern
> > selection is only ever able to match against one datatype - Either
> > that of the operands or of the accumulated value, never both.
> >
> > With the introduction of the 2-way (un)signed dot-product insn [1][2]
> > in AArch64 SVE2, the existing direct opcode approach is no longer
> > sufficient for full specification of all the possible dot product
> > machine instructions to be matched to the code sequence; a dot product
> > resulting in VNx4SI may result from either dot products on VNx16QI or
> > VNx8HI values for the 4- and 2-way dot product operations, respectively.
> >
> > This means that the following example fails autovectorization:
> >
> > uint32_t foo(int n, uint16_t* data) {
> >   uint32_t sum = 0;
> >   for (int i=0; i > sum += data[i] * data[i];
> >   }
> >   return sum;
> > }
> >
> > To remedy the issue a new optab is added, tentatively named
> > `udot_prod_twoway_optab', whose selection is dependent upon checking
> > of both input and output types involved in the operation.
> 
> I don't like this too much.  I'll note we document dot_prod as
> 
> @cindex @code{sdot_prod@var{m}} instruction pattern
> @item @samp{sdot_prod@var{m}}
> 
> Compute the sum of the products of two signed elements.
> Operand 1 and operand 2 are of the same mode. Their
> product, which is of a wider mode, is computed and added to operand 3.
> Operand 3 is of a mode equal or wider than the mode of the product. The
> result is placed in operand 0, which is of the same mode as operand 3.
> @var{m} is the mode of operand 1 and operand 2.
> 
> with no restriction on the wider mode but we don't specify it which is
> bad design.  This should have been a convert optab with two modes
> from the start - adding a _twoway variant is just a hack.

We did discuss this at the time we started implementing it.  There was two
options, one was indeed to change it to a convert dot_prod optab, but doing
this means we have to update every target that uses it.

Now that means 3 ISAs for AArch64, Arm, Arc, c6x, 2 for x86, loongson and 
altivec.

Which sure could be possible, but there's also every use in the backends that 
need
to be updated, and tested, which for some targets we don't even know how to 
begin.

So it seems very hard to correct dotprod to a convert optab now.

Tamar

> 
> Richard.
> 
> > In order to minimize changes to the existing codebase,
> > `optab_for_tree_code' is renamed `optab_for_tree_code_1' and a new
> > argument is added to its signature - `const_tree otype', allowing type
> > information to be specified for both input and output types.  The
> > existing nterface is retained by defining a new `optab_for_tree_code',
> > which serves as a shim to `optab_for_tree_code_1', passing old
> > parameters as-is and setting the new `optype' argument to `NULL_TREE'.
> >
> > For DOT_PROD_EXPR tree codes, we can call `optab_for_tree_code_1'
> > directly, passing it both types, adding the internal logic to the
> > function to distinguish between competing optabs.
> >
> > Finally, necessary changes are made to `expand_widen_pattern_expr' to
> > ensure the new icode can be correctly selected, given the new optab.
> >
> > [1] https://developer.arm.com/documentation/ddi0602/2024-03/SVE-
> Instructions/UDOT--2-way--vectors---Unsigned-integer-dot-product-
> > [2] https://developer.arm.com/documentation/ddi0602/2024-03/SVE-
> Instructions/SDOT--2-way--vectors---Signed-integer-dot-product-
> >
> > gcc/ChangeLog:
> >
> > * config/aarch64/aarch64-sve2.md 
> > (@aarch64_sve_dotvnx4sivnx8hi):
> > renamed to `dot_prod_twoway_vnx8hi'.
> > * config/aarch64/aarch64-sve-builtins-base.cc (svdot_impl.expand):
> > update icodes used in line with above rename.
> > * optabs-tree.cc (optab_for_tree_code_1): Renamed
> > `optab_for_tree_code' and added new argument.
> > (optab_for_tree_code): Now a call to `optab_for_tree_cod

Re: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer

2024-05-16 Thread Richard Biener
On Thu, May 16, 2024 at 4:40 PM Victor Do Nascimento
 wrote:
>
> From: Victor Do Nascimento 
>
> At present, the compiler offers the `{u|s|us}dot_prod_optab' direct
> optabs for dealing with vectorizable dot product code sequences.  The
> consequence of using a direct optab for this is that backend-pattern
> selection is only ever able to match against one datatype - Either
> that of the operands or of the accumulated value, never both.
>
> With the introduction of the 2-way (un)signed dot-product insn [1][2]
> in AArch64 SVE2, the existing direct opcode approach is no longer
> sufficient for full specification of all the possible dot product
> machine instructions to be matched to the code sequence; a dot product
> resulting in VNx4SI may result from either dot products on VNx16QI or
> VNx8HI values for the 4- and 2-way dot product operations, respectively.
>
> This means that the following example fails autovectorization:
>
> uint32_t foo(int n, uint16_t* data) {
>   uint32_t sum = 0;
>   for (int i=0; i sum += data[i] * data[i];
>   }
>   return sum;
> }
>
> To remedy the issue a new optab is added, tentatively named
> `udot_prod_twoway_optab', whose selection is dependent upon checking
> of both input and output types involved in the operation.

I don't like this too much.  I'll note we document dot_prod as

@cindex @code{sdot_prod@var{m}} instruction pattern
@item @samp{sdot_prod@var{m}}

Compute the sum of the products of two signed elements.
Operand 1 and operand 2 are of the same mode. Their
product, which is of a wider mode, is computed and added to operand 3.
Operand 3 is of a mode equal or wider than the mode of the product. The
result is placed in operand 0, which is of the same mode as operand 3.
@var{m} is the mode of operand 1 and operand 2.

with no restriction on the wider mode but we don't specify it which is
bad design.  This should have been a convert optab with two modes
from the start - adding a _twoway variant is just a hack.

Richard.

> In order to minimize changes to the existing codebase,
> `optab_for_tree_code' is renamed `optab_for_tree_code_1' and a new
> argument is added to its signature - `const_tree otype', allowing type
> information to be specified for both input and output types.  The
> existing nterface is retained by defining a new `optab_for_tree_code',
> which serves as a shim to `optab_for_tree_code_1', passing old
> parameters as-is and setting the new `optype' argument to `NULL_TREE'.
>
> For DOT_PROD_EXPR tree codes, we can call `optab_for_tree_code_1'
> directly, passing it both types, adding the internal logic to the
> function to distinguish between competing optabs.
>
> Finally, necessary changes are made to `expand_widen_pattern_expr' to
> ensure the new icode can be correctly selected, given the new optab.
>
> [1] 
> https://developer.arm.com/documentation/ddi0602/2024-03/SVE-Instructions/UDOT--2-way--vectors---Unsigned-integer-dot-product-
> [2] 
> https://developer.arm.com/documentation/ddi0602/2024-03/SVE-Instructions/SDOT--2-way--vectors---Signed-integer-dot-product-
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64-sve2.md (@aarch64_sve_dotvnx4sivnx8hi):
> renamed to `dot_prod_twoway_vnx8hi'.
> * config/aarch64/aarch64-sve-builtins-base.cc (svdot_impl.expand):
> update icodes used in line with above rename.
> * optabs-tree.cc (optab_for_tree_code_1): Renamed
> `optab_for_tree_code' and added new argument.
> (optab_for_tree_code): Now a call to `optab_for_tree_code_1'.
> * optabs-tree.h (optab_for_tree_code_1): New.
> * optabs.cc (expand_widen_pattern_expr): Expand support for
> DOT_PROD_EXPR patterns.
> * optabs.def (udot_prod_twoway_optab): New.
> (sdot_prod_twoway_optab): Likewise.
> * tree-vect-patterns.cc (vect_supportable_direct_optab_p): Add
> support for misc optabs that use two modes.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/vect/vect-dotprod-twoway.c: New.
> ---
>  .../aarch64/aarch64-sve-builtins-base.cc  |  4 ++--
>  gcc/config/aarch64/aarch64-sve2.md|  2 +-
>  gcc/optabs-tree.cc| 23 --
>  gcc/optabs-tree.h |  2 ++
>  gcc/optabs.cc |  2 +-
>  gcc/optabs.def|  2 ++
>  .../gcc.dg/vect/vect-dotprod-twoway.c | 24 +++
>  gcc/tree-vect-patterns.cc |  2 +-
>  8 files changed, 54 insertions(+), 7 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
>
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc 
> b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> index 0d2edf3f19e..e457db09f66 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> @@ -764,8 +764,8 @@ public:
>icode = (e.type_suffix (0).float_p
>? 

Re: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer

2024-05-16 Thread Hongtao Liu
> >
> Sorry to chime in, for x86 backend, we defined usdot_prodv16hi, and
> 2-way dot_prod operations can be generated
>
This is the link https://godbolt.org/z/hcWr64vx3, x86 define
udot_prodv16qi/udot_prod8hi and both 2-way and 4-way dot_prod
instructions are generated


-- 
BR,
Hongtao


Re: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer

2024-05-16 Thread Hongtao Liu
On Thu, May 16, 2024 at 10:40 PM Victor Do Nascimento
 wrote:
>
> From: Victor Do Nascimento 
>
> At present, the compiler offers the `{u|s|us}dot_prod_optab' direct
> optabs for dealing with vectorizable dot product code sequences.  The
> consequence of using a direct optab for this is that backend-pattern
> selection is only ever able to match against one datatype - Either
> that of the operands or of the accumulated value, never both.
>
> With the introduction of the 2-way (un)signed dot-product insn [1][2]
> in AArch64 SVE2, the existing direct opcode approach is no longer
> sufficient for full specification of all the possible dot product
> machine instructions to be matched to the code sequence; a dot product
> resulting in VNx4SI may result from either dot products on VNx16QI or
> VNx8HI values for the 4- and 2-way dot product operations, respectively.
>
> This means that the following example fails autovectorization:
>
> uint32_t foo(int n, uint16_t* data) {
>   uint32_t sum = 0;
>   for (int i=0; i sum += data[i] * data[i];
>   }
>   return sum;
> }
>
Sorry to chime in, for x86 backend, we defined usdot_prodv16hi, and
2-way dot_prod operations can be generated

> To remedy the issue a new optab is added, tentatively named
> `udot_prod_twoway_optab', whose selection is dependent upon checking
> of both input and output types involved in the operation.
>
> In order to minimize changes to the existing codebase,
> `optab_for_tree_code' is renamed `optab_for_tree_code_1' and a new
> argument is added to its signature - `const_tree otype', allowing type
> information to be specified for both input and output types.  The
> existing nterface is retained by defining a new `optab_for_tree_code',
> which serves as a shim to `optab_for_tree_code_1', passing old
> parameters as-is and setting the new `optype' argument to `NULL_TREE'.
>
> For DOT_PROD_EXPR tree codes, we can call `optab_for_tree_code_1'
> directly, passing it both types, adding the internal logic to the
> function to distinguish between competing optabs.
>
> Finally, necessary changes are made to `expand_widen_pattern_expr' to
> ensure the new icode can be correctly selected, given the new optab.
>
> [1] 
> https://developer.arm.com/documentation/ddi0602/2024-03/SVE-Instructions/UDOT--2-way--vectors---Unsigned-integer-dot-product-
> [2] 
> https://developer.arm.com/documentation/ddi0602/2024-03/SVE-Instructions/SDOT--2-way--vectors---Signed-integer-dot-product-
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64-sve2.md (@aarch64_sve_dotvnx4sivnx8hi):
> renamed to `dot_prod_twoway_vnx8hi'.
> * config/aarch64/aarch64-sve-builtins-base.cc (svdot_impl.expand):
> update icodes used in line with above rename.
> * optabs-tree.cc (optab_for_tree_code_1): Renamed
> `optab_for_tree_code' and added new argument.
> (optab_for_tree_code): Now a call to `optab_for_tree_code_1'.
> * optabs-tree.h (optab_for_tree_code_1): New.
> * optabs.cc (expand_widen_pattern_expr): Expand support for
> DOT_PROD_EXPR patterns.
> * optabs.def (udot_prod_twoway_optab): New.
> (sdot_prod_twoway_optab): Likewise.
> * tree-vect-patterns.cc (vect_supportable_direct_optab_p): Add
> support for misc optabs that use two modes.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/vect/vect-dotprod-twoway.c: New.
> ---
>  .../aarch64/aarch64-sve-builtins-base.cc  |  4 ++--
>  gcc/config/aarch64/aarch64-sve2.md|  2 +-
>  gcc/optabs-tree.cc| 23 --
>  gcc/optabs-tree.h |  2 ++
>  gcc/optabs.cc |  2 +-
>  gcc/optabs.def|  2 ++
>  .../gcc.dg/vect/vect-dotprod-twoway.c | 24 +++
>  gcc/tree-vect-patterns.cc |  2 +-
>  8 files changed, 54 insertions(+), 7 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
>
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc 
> b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> index 0d2edf3f19e..e457db09f66 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> @@ -764,8 +764,8 @@ public:
>icode = (e.type_suffix (0).float_p
>? CODE_FOR_aarch64_sve_fdotvnx4sfvnx8hf
>: e.type_suffix (0).unsigned_p
> -  ? CODE_FOR_aarch64_sve_udotvnx4sivnx8hi
> -  : CODE_FOR_aarch64_sve_sdotvnx4sivnx8hi);
> +  ? CODE_FOR_udot_prod_twoway_vnx8hi
> +  : CODE_FOR_sdot_prod_twoway_vnx8hi);
>  return e.use_unpred_insn (icode);
>}
>  };
> diff --git a/gcc/config/aarch64/aarch64-sve2.md 
> b/gcc/config/aarch64/aarch64-sve2.md
> index 934e57055d3..5677de7108d 100644
> --- a/gcc/config/aarch64/aarch64-sve2.md
> +++ b/gcc/config/aarch64/aarch64-sve2.md
> @@ -2021,7 +2021,7 @@ 

Re: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer

2024-05-16 Thread Andrew Pinski
On Thu, May 16, 2024, 7:46 PM Tamar Christina 
wrote:

> Hi Victor,
>
> > -Original Message-
> > From: Victor Do Nascimento 
> > Sent: Thursday, May 16, 2024 3:39 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: Richard Sandiford ; Richard Earnshaw
> > ; Victor Do Nascimento
> > 
> > Subject: [PATCH] middle-end: Expand {u|s}dot product support in
> autovectorizer
> >
> > From: Victor Do Nascimento 
> >
> > At present, the compiler offers the `{u|s|us}dot_prod_optab' direct
> > optabs for dealing with vectorizable dot product code sequences.  The
> > consequence of using a direct optab for this is that backend-pattern
> > selection is only ever able to match against one datatype - Either
> > that of the operands or of the accumulated value, never both.
> >
> > With the introduction of the 2-way (un)signed dot-product insn [1][2]
> > in AArch64 SVE2, the existing direct opcode approach is no longer
> > sufficient for full specification of all the possible dot product
> > machine instructions to be matched to the code sequence; a dot product
> > resulting in VNx4SI may result from either dot products on VNx16QI or
> > VNx8HI values for the 4- and 2-way dot product operations, respectively.
> >
> > This means that the following example fails autovectorization:
> >
> > uint32_t foo(int n, uint16_t* data) {
> >   uint32_t sum = 0;
> >   for (int i=0; i > sum += data[i] * data[i];
> >   }
> >   return sum;
> > }
> >
> > To remedy the issue a new optab is added, tentatively named
> > `udot_prod_twoway_optab', whose selection is dependent upon checking
> > of both input and output types involved in the operation.
> >
> > In order to minimize changes to the existing codebase,
> > `optab_for_tree_code' is renamed `optab_for_tree_code_1' and a new
> > argument is added to its signature - `const_tree otype', allowing type
> > information to be specified for both input and output types.  The
> > existing nterface is retained by defining a new `optab_for_tree_code',
> > which serves as a shim to `optab_for_tree_code_1', passing old
> > parameters as-is and setting the new `optype' argument to `NULL_TREE'.
> >
> > For DOT_PROD_EXPR tree codes, we can call `optab_for_tree_code_1'
> > directly, passing it both types, adding the internal logic to the
> > function to distinguish between competing optabs.
> >
> > Finally, necessary changes are made to `expand_widen_pattern_expr' to
> > ensure the new icode can be correctly selected, given the new optab.
> >
> > [1] https://developer.arm.com/documentation/ddi0602/2024-03/SVE-
> > Instructions/UDOT--2-way--vectors---Unsigned-integer-dot-product-
> > [2] https://developer.arm.com/documentation/ddi0602/2024-03/SVE-
> > Instructions/SDOT--2-way--vectors---Signed-integer-dot-product-
> >
> > gcc/ChangeLog:
> >
> >   * config/aarch64/aarch64-sve2.md
> > (@aarch64_sve_dotvnx4sivnx8hi):
> >   renamed to `dot_prod_twoway_vnx8hi'.
> >   * config/aarch64/aarch64-sve-builtins-base.cc (svdot_impl.expand):
> >   update icodes used in line with above rename.
>
> Please split the target specific bits from the target agnostic parts.
> I.e. this patch series should be split in two.
>
> >   * optabs-tree.cc (optab_for_tree_code_1): Renamed
> >   `optab_for_tree_code' and added new argument.
> >   (optab_for_tree_code): Now a call to `optab_for_tree_code_1'.
> >   * optabs-tree.h (optab_for_tree_code_1): New.
> >   * optabs.cc (expand_widen_pattern_expr): Expand support for
> >   DOT_PROD_EXPR patterns.
> >   * optabs.def (udot_prod_twoway_optab): New.
> >   (sdot_prod_twoway_optab): Likewise.
> >   * tree-vect-patterns.cc (vect_supportable_direct_optab_p): Add
> >   support for misc optabs that use two modes.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.dg/vect/vect-dotprod-twoway.c: New.
> > ---
> >  .../aarch64/aarch64-sve-builtins-base.cc  |  4 ++--
> >  gcc/config/aarch64/aarch64-sve2.md|  2 +-
> >  gcc/optabs-tree.cc| 23 --
> >  gcc/optabs-tree.h |  2 ++
> >  gcc/optabs.cc |  2 +-
> >  gcc/optabs.def|  2 ++
> >  .../gcc.dg/vect/vect-dotprod-twoway.c | 24 +++
> >  gcc/tree-vect-patterns.cc |  2 +-
> >  8 files changed, 54 insertions(+), 7 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
> >
> > diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> > b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> > index 0d2edf3f19e..e457db09f66 100644
> > --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> > +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> > @@ -764,8 +764,8 @@ public:
> >icode = (e.type_suffix (0).float_p
> >  ? CODE_FOR_aarch64_sve_fdotvnx4sfvnx8hf
> >  : e.type_suffix (0).unsigned_p
> > -? 

RE: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer

2024-05-16 Thread Tamar Christina
Hi Victor,

> -Original Message-
> From: Victor Do Nascimento 
> Sent: Thursday, May 16, 2024 3:39 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Richard Sandiford ; Richard Earnshaw
> ; Victor Do Nascimento
> 
> Subject: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer
> 
> From: Victor Do Nascimento 
> 
> At present, the compiler offers the `{u|s|us}dot_prod_optab' direct
> optabs for dealing with vectorizable dot product code sequences.  The
> consequence of using a direct optab for this is that backend-pattern
> selection is only ever able to match against one datatype - Either
> that of the operands or of the accumulated value, never both.
> 
> With the introduction of the 2-way (un)signed dot-product insn [1][2]
> in AArch64 SVE2, the existing direct opcode approach is no longer
> sufficient for full specification of all the possible dot product
> machine instructions to be matched to the code sequence; a dot product
> resulting in VNx4SI may result from either dot products on VNx16QI or
> VNx8HI values for the 4- and 2-way dot product operations, respectively.
> 
> This means that the following example fails autovectorization:
> 
> uint32_t foo(int n, uint16_t* data) {
>   uint32_t sum = 0;
>   for (int i=0; i sum += data[i] * data[i];
>   }
>   return sum;
> }
> 
> To remedy the issue a new optab is added, tentatively named
> `udot_prod_twoway_optab', whose selection is dependent upon checking
> of both input and output types involved in the operation.
> 
> In order to minimize changes to the existing codebase,
> `optab_for_tree_code' is renamed `optab_for_tree_code_1' and a new
> argument is added to its signature - `const_tree otype', allowing type
> information to be specified for both input and output types.  The
> existing nterface is retained by defining a new `optab_for_tree_code',
> which serves as a shim to `optab_for_tree_code_1', passing old
> parameters as-is and setting the new `optype' argument to `NULL_TREE'.
> 
> For DOT_PROD_EXPR tree codes, we can call `optab_for_tree_code_1'
> directly, passing it both types, adding the internal logic to the
> function to distinguish between competing optabs.
> 
> Finally, necessary changes are made to `expand_widen_pattern_expr' to
> ensure the new icode can be correctly selected, given the new optab.
> 
> [1] https://developer.arm.com/documentation/ddi0602/2024-03/SVE-
> Instructions/UDOT--2-way--vectors---Unsigned-integer-dot-product-
> [2] https://developer.arm.com/documentation/ddi0602/2024-03/SVE-
> Instructions/SDOT--2-way--vectors---Signed-integer-dot-product-
> 
> gcc/ChangeLog:
> 
>   * config/aarch64/aarch64-sve2.md
> (@aarch64_sve_dotvnx4sivnx8hi):
>   renamed to `dot_prod_twoway_vnx8hi'.
>   * config/aarch64/aarch64-sve-builtins-base.cc (svdot_impl.expand):
>   update icodes used in line with above rename.

Please split the target specific bits from the target agnostic parts.
I.e. this patch series should be split in two.

>   * optabs-tree.cc (optab_for_tree_code_1): Renamed
>   `optab_for_tree_code' and added new argument.
>   (optab_for_tree_code): Now a call to `optab_for_tree_code_1'.
>   * optabs-tree.h (optab_for_tree_code_1): New.
>   * optabs.cc (expand_widen_pattern_expr): Expand support for
>   DOT_PROD_EXPR patterns.
>   * optabs.def (udot_prod_twoway_optab): New.
>   (sdot_prod_twoway_optab): Likewise.
>   * tree-vect-patterns.cc (vect_supportable_direct_optab_p): Add
>   support for misc optabs that use two modes.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/vect/vect-dotprod-twoway.c: New.
> ---
>  .../aarch64/aarch64-sve-builtins-base.cc  |  4 ++--
>  gcc/config/aarch64/aarch64-sve2.md|  2 +-
>  gcc/optabs-tree.cc| 23 --
>  gcc/optabs-tree.h |  2 ++
>  gcc/optabs.cc |  2 +-
>  gcc/optabs.def|  2 ++
>  .../gcc.dg/vect/vect-dotprod-twoway.c | 24 +++
>  gcc/tree-vect-patterns.cc |  2 +-
>  8 files changed, 54 insertions(+), 7 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
> 
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> index 0d2edf3f19e..e457db09f66 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> @@ -764,8 +764,8 @@ public:
>icode = (e.type_suffix (0).float_p
>  ? CODE_FOR_aarch64_sve_fdotvnx4sfvnx8hf
>  : e.type_suffix (0).unsigned_p
> -? CODE_FOR_aarch64_sve_udotvnx4sivnx8hi
> -: CODE_FOR_aarch64_sve_sdotvnx4sivnx8hi);
> +? CODE_FOR_udot_prod_twoway_vnx8hi
> +: CODE_FOR_sdot_prod_twoway_vnx8hi);
>  return e.use_unpred_insn (icode);
>}
>  };
> diff --git 

Re: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer

2024-05-16 Thread Andrew Pinski
On Thu, May 16, 2024, 4:40 PM Victor Do Nascimento <
victor.donascime...@arm.com> wrote:

> From: Victor Do Nascimento 
>
> At present, the compiler offers the `{u|s|us}dot_prod_optab' direct
> optabs for dealing with vectorizable dot product code sequences.  The
> consequence of using a direct optab for this is that backend-pattern
> selection is only ever able to match against one datatype - Either
> that of the operands or of the accumulated value, never both.
>
> With the introduction of the 2-way (un)signed dot-product insn [1][2]
> in AArch64 SVE2, the existing direct opcode approach is no longer
> sufficient for full specification of all the possible dot product
> machine instructions to be matched to the code sequence; a dot product
> resulting in VNx4SI may result from either dot products on VNx16QI or
> VNx8HI values for the 4- and 2-way dot product operations, respectively.
>
> This means that the following example fails autovectorization:
>
> uint32_t foo(int n, uint16_t* data) {
>   uint32_t sum = 0;
>   for (int i=0; i sum += data[i] * data[i];
>   }
>   return sum;
> }
>
> To remedy the issue a new optab is added, tentatively named
> `udot_prod_twoway_optab', whose selection is dependent upon checking
> of both input and output types involved in the operation.
>
> In order to minimize changes to the existing codebase,
> `optab_for_tree_code' is renamed `optab_for_tree_code_1' and a new
> argument is added to its signature - `const_tree otype', allowing type
> information to be specified for both input and output types.  The
> existing nterface is retained by defining a new `optab_for_tree_code',
> which serves as a shim to `optab_for_tree_code_1', passing old
> parameters as-is and setting the new `optype' argument to `NULL_TREE'.
>
> For DOT_PROD_EXPR tree codes, we can call `optab_for_tree_code_1'
> directly, passing it both types, adding the internal logic to the
> function to distinguish between competing optabs.
>
> Finally, necessary changes are made to `expand_widen_pattern_expr' to
> ensure the new icode can be correctly selected, given the new optab.
>

Since you are adding an optab, please update the internals manual with the
documentation of the optab (the standard pattern names section).

Thanks,
Andrew


> [1]
> https://developer.arm.com/documentation/ddi0602/2024-03/SVE-Instructions/UDOT--2-way--vectors---Unsigned-integer-dot-product-
> [2]
> https://developer.arm.com/documentation/ddi0602/2024-03/SVE-Instructions/SDOT--2-way--vectors---Signed-integer-dot-product-
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64-sve2.md
> (@aarch64_sve_dotvnx4sivnx8hi):
> renamed to `dot_prod_twoway_vnx8hi'.
> * config/aarch64/aarch64-sve-builtins-base.cc (svdot_impl.expand):
> update icodes used in line with above rename.
> * optabs-tree.cc (optab_for_tree_code_1): Renamed
> `optab_for_tree_code' and added new argument.
> (optab_for_tree_code): Now a call to `optab_for_tree_code_1'.
> * optabs-tree.h (optab_for_tree_code_1): New.
> * optabs.cc (expand_widen_pattern_expr): Expand support for
> DOT_PROD_EXPR patterns.
> * optabs.def (udot_prod_twoway_optab): New.
> (sdot_prod_twoway_optab): Likewise.
> * tree-vect-patterns.cc (vect_supportable_direct_optab_p): Add
> support for misc optabs that use two modes.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/vect/vect-dotprod-twoway.c: New.
> ---
>  .../aarch64/aarch64-sve-builtins-base.cc  |  4 ++--
>  gcc/config/aarch64/aarch64-sve2.md|  2 +-
>  gcc/optabs-tree.cc| 23 --
>  gcc/optabs-tree.h |  2 ++
>  gcc/optabs.cc |  2 +-
>  gcc/optabs.def|  2 ++
>  .../gcc.dg/vect/vect-dotprod-twoway.c | 24 +++
>  gcc/tree-vect-patterns.cc |  2 +-
>  8 files changed, 54 insertions(+), 7 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
>
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> index 0d2edf3f19e..e457db09f66 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> @@ -764,8 +764,8 @@ public:
>icode = (e.type_suffix (0).float_p
>? CODE_FOR_aarch64_sve_fdotvnx4sfvnx8hf
>: e.type_suffix (0).unsigned_p
> -  ? CODE_FOR_aarch64_sve_udotvnx4sivnx8hi
> -  : CODE_FOR_aarch64_sve_sdotvnx4sivnx8hi);
> +  ? CODE_FOR_udot_prod_twoway_vnx8hi
> +  : CODE_FOR_sdot_prod_twoway_vnx8hi);
>  return e.use_unpred_insn (icode);
>}
>  };
> diff --git a/gcc/config/aarch64/aarch64-sve2.md
> b/gcc/config/aarch64/aarch64-sve2.md
> index 934e57055d3..5677de7108d 100644
> --- a/gcc/config/aarch64/aarch64-sve2.md
>