Richard Biener <richard.guent...@gmail.com> writes: > On Fri, May 17, 2024 at 11:56 AM Tamar Christina > <tamar.christ...@arm.com> wrote: >> >> > -----Original Message----- >> > From: Richard Biener <richard.guent...@gmail.com> >> > Sent: Friday, May 17, 2024 10:46 AM >> > To: Tamar Christina <tamar.christ...@arm.com> >> > Cc: Victor Do Nascimento <victor.donascime...@arm.com>; gcc- >> > patc...@gcc.gnu.org; Richard Sandiford <richard.sandif...@arm.com>; Richard >> > Earnshaw <richard.earns...@arm.com>; Victor Do Nascimento >> > <vicdo...@e133397.arm.com> >> > Subject: Re: [PATCH] middle-end: Expand {u|s}dot product support in >> > autovectorizer >> > >> > On Fri, May 17, 2024 at 11:05 AM Tamar Christina >> > <tamar.christ...@arm.com> wrote: >> > > >> > > > -----Original Message----- >> > > > From: Richard Biener <richard.guent...@gmail.com> >> > > > Sent: Friday, May 17, 2024 6:51 AM >> > > > To: Victor Do Nascimento <victor.donascime...@arm.com> >> > > > Cc: gcc-patches@gcc.gnu.org; Richard Sandiford >> > <richard.sandif...@arm.com>; >> > > > Richard Earnshaw <richard.earns...@arm.com>; Victor Do Nascimento >> > > > <vicdo...@e133397.arm.com> >> > > > Subject: Re: [PATCH] middle-end: Expand {u|s}dot product support in >> > > > autovectorizer >> > > > >> > > > On Thu, May 16, 2024 at 4:40 PM Victor Do Nascimento >> > > > <victor.donascime...@arm.com> wrote: >> > > > > >> > > > > From: Victor Do Nascimento <vicdo...@e133397.arm.com> >> > > > > >> > > > > At present, the compiler offers the `{u|s|us}dot_prod_optab' direct >> > > > > optabs for dealing with vectorizable dot product code sequences. The >> > > > > consequence of using a direct optab for this is that backend-pattern >> > > > > selection is only ever able to match against one datatype - Either >> > > > > that of the operands or of the accumulated value, never both. >> > > > > >> > > > > With the introduction of the 2-way (un)signed dot-product insn [1][2] >> > > > > in AArch64 SVE2, the existing direct opcode approach is no longer >> > > > > sufficient for full specification of all the possible dot product >> > > > > machine instructions to be matched to the code sequence; a dot >> > > > > product >> > > > > resulting in VNx4SI may result from either dot products on VNx16QI or >> > > > > VNx8HI values for the 4- and 2-way dot product operations, >> > > > > respectively. >> > > > > >> > > > > This means that the following example fails autovectorization: >> > > > > >> > > > > uint32_t foo(int n, uint16_t* data) { >> > > > > uint32_t sum = 0; >> > > > > for (int i=0; i<n; i+=1) { >> > > > > sum += data[i] * data[i]; >> > > > > } >> > > > > return sum; >> > > > > } >> > > > > >> > > > > To remedy the issue a new optab is added, tentatively named >> > > > > `udot_prod_twoway_optab', whose selection is dependent upon checking >> > > > > of both input and output types involved in the operation. >> > > > >> > > > I don't like this too much. I'll note we document dot_prod as >> > > > >> > > > @cindex @code{sdot_prod@var{m}} instruction pattern >> > > > @item @samp{sdot_prod@var{m}} >> > > > >> > > > Compute the sum of the products of two signed elements. >> > > > Operand 1 and operand 2 are of the same mode. Their >> > > > product, which is of a wider mode, is computed and added to operand 3. >> > > > Operand 3 is of a mode equal or wider than the mode of the product. The >> > > > result is placed in operand 0, which is of the same mode as operand 3. >> > > > @var{m} is the mode of operand 1 and operand 2. >> > > > >> > > > with no restriction on the wider mode but we don't specify it which is >> > > > bad design. This should have been a convert optab with two modes >> > > > from the start - adding a _twoway variant is just a hack. >> > > >> > > We did discuss this at the time we started implementing it. There was >> > > two >> > > options, one was indeed to change it to a convert dot_prod optab, but >> > > doing >> > > this means we have to update every target that uses it. >> > > >> > > Now that means 3 ISAs for AArch64, Arm, Arc, c6x, 2 for x86, loongson and >> > altivec. >> > > >> > > Which sure could be possible, but there's also every use in the backends >> > > that >> > need >> > > to be updated, and tested, which for some targets we don't even know how >> > > to >> > begin. >> > > >> > > So it seems very hard to correct dotprod to a convert optab now. >> > >> > It's still the correct way to go. At _least_ your new pattern should >> > have been this, >> > otherwise what do you do when you have two-way, four-way and eight-way >> > variants? >> > Add yet another optab? >> >> I guess that's fair, but having the new optab only be convert resulted in >> messy >> code as everywhere you must check for both variants. >> >> Additionally that optab would then overlap with the existing optabs as, as >> you >> Say, the documentation only says it's of a wider type and doesn't indicate >> precision. >> >> So to avoid issues down the line then If the new optab isn't acceptable then >> we'll have to do a wholesale conversion then.. > > Yep. It shouldn't be difficult though.
Still catching up, but FWIW, I agree this is the way to go. (Convert all existing dot_prods to convert optabs first, and then add the new AArch64 ones.) Having two mechanisms feels like storing up trouble for later. :) Richard