Richard Biener <richard.guent...@gmail.com> writes:
> On Fri, May 17, 2024 at 11:56 AM Tamar Christina
> <tamar.christ...@arm.com> wrote:
>>
>> > -----Original Message-----
>> > From: Richard Biener <richard.guent...@gmail.com>
>> > Sent: Friday, May 17, 2024 10:46 AM
>> > To: Tamar Christina <tamar.christ...@arm.com>
>> > Cc: Victor Do Nascimento <victor.donascime...@arm.com>; gcc-
>> > patc...@gcc.gnu.org; Richard Sandiford <richard.sandif...@arm.com>; Richard
>> > Earnshaw <richard.earns...@arm.com>; Victor Do Nascimento
>> > <vicdo...@e133397.arm.com>
>> > Subject: Re: [PATCH] middle-end: Expand {u|s}dot product support in
>> > autovectorizer
>> >
>> > On Fri, May 17, 2024 at 11:05 AM Tamar Christina
>> > <tamar.christ...@arm.com> wrote:
>> > >
>> > > > -----Original Message-----
>> > > > From: Richard Biener <richard.guent...@gmail.com>
>> > > > Sent: Friday, May 17, 2024 6:51 AM
>> > > > To: Victor Do Nascimento <victor.donascime...@arm.com>
>> > > > Cc: gcc-patches@gcc.gnu.org; Richard Sandiford
>> > <richard.sandif...@arm.com>;
>> > > > Richard Earnshaw <richard.earns...@arm.com>; Victor Do Nascimento
>> > > > <vicdo...@e133397.arm.com>
>> > > > Subject: Re: [PATCH] middle-end: Expand {u|s}dot product support in
>> > > > autovectorizer
>> > > >
>> > > > On Thu, May 16, 2024 at 4:40 PM Victor Do Nascimento
>> > > > <victor.donascime...@arm.com> wrote:
>> > > > >
>> > > > > From: Victor Do Nascimento <vicdo...@e133397.arm.com>
>> > > > >
>> > > > > At present, the compiler offers the `{u|s|us}dot_prod_optab' direct
>> > > > > optabs for dealing with vectorizable dot product code sequences.  The
>> > > > > consequence of using a direct optab for this is that backend-pattern
>> > > > > selection is only ever able to match against one datatype - Either
>> > > > > that of the operands or of the accumulated value, never both.
>> > > > >
>> > > > > With the introduction of the 2-way (un)signed dot-product insn [1][2]
>> > > > > in AArch64 SVE2, the existing direct opcode approach is no longer
>> > > > > sufficient for full specification of all the possible dot product
>> > > > > machine instructions to be matched to the code sequence; a dot 
>> > > > > product
>> > > > > resulting in VNx4SI may result from either dot products on VNx16QI or
>> > > > > VNx8HI values for the 4- and 2-way dot product operations, 
>> > > > > respectively.
>> > > > >
>> > > > > This means that the following example fails autovectorization:
>> > > > >
>> > > > > uint32_t foo(int n, uint16_t* data) {
>> > > > >   uint32_t sum = 0;
>> > > > >   for (int i=0; i<n; i+=1) {
>> > > > >     sum += data[i] * data[i];
>> > > > >   }
>> > > > >   return sum;
>> > > > > }
>> > > > >
>> > > > > To remedy the issue a new optab is added, tentatively named
>> > > > > `udot_prod_twoway_optab', whose selection is dependent upon checking
>> > > > > of both input and output types involved in the operation.
>> > > >
>> > > > I don't like this too much.  I'll note we document dot_prod as
>> > > >
>> > > > @cindex @code{sdot_prod@var{m}} instruction pattern
>> > > > @item @samp{sdot_prod@var{m}}
>> > > >
>> > > > Compute the sum of the products of two signed elements.
>> > > > Operand 1 and operand 2 are of the same mode. Their
>> > > > product, which is of a wider mode, is computed and added to operand 3.
>> > > > Operand 3 is of a mode equal or wider than the mode of the product. The
>> > > > result is placed in operand 0, which is of the same mode as operand 3.
>> > > > @var{m} is the mode of operand 1 and operand 2.
>> > > >
>> > > > with no restriction on the wider mode but we don't specify it which is
>> > > > bad design.  This should have been a convert optab with two modes
>> > > > from the start - adding a _twoway variant is just a hack.
>> > >
>> > > We did discuss this at the time we started implementing it.  There was 
>> > > two
>> > > options, one was indeed to change it to a convert dot_prod optab, but 
>> > > doing
>> > > this means we have to update every target that uses it.
>> > >
>> > > Now that means 3 ISAs for AArch64, Arm, Arc, c6x, 2 for x86, loongson and
>> > altivec.
>> > >
>> > > Which sure could be possible, but there's also every use in the backends 
>> > > that
>> > need
>> > > to be updated, and tested, which for some targets we don't even know how 
>> > > to
>> > begin.
>> > >
>> > > So it seems very hard to correct dotprod to a convert optab now.
>> >
>> > It's still the correct way to go.  At _least_ your new pattern should
>> > have been this,
>> > otherwise what do you do when you have two-way, four-way and eight-way
>> > variants?
>> > Add yet another optab?
>>
>> I guess that's fair, but having the new optab only be convert resulted in 
>> messy
>> code as everywhere you must check for both variants.
>>
>> Additionally that optab would then overlap with the existing optabs as, as 
>> you
>> Say, the documentation only says it's of a wider type and doesn't indicate
>> precision.
>>
>> So to avoid issues down the line then If the new optab isn't acceptable then
>> we'll have to do a wholesale conversion then..
>
> Yep.  It shouldn't be difficult though.

Still catching up, but FWIW, I agree this is the way to go.  (Convert all
existing dot_prods to convert optabs first, and then add the new AArch64
ones.)  Having two mechanisms feels like storing up trouble for later. :)

Richard

Reply via email to