Re: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer
Richard Biener writes: > On Fri, May 17, 2024 at 11:56 AM Tamar Christina > wrote: >> >> > -Original Message- >> > From: Richard Biener >> > Sent: Friday, May 17, 2024 10:46 AM >> > To: Tamar Christina >> > Cc: Victor Do Nascimento ; gcc- >> > patc...@gcc.gnu.org; Richard Sandiford ; Richard >> > Earnshaw ; Victor Do Nascimento >> > >> > Subject: Re: [PATCH] middle-end: Expand {u|s}dot product support in >> > autovectorizer >> > >> > On Fri, May 17, 2024 at 11:05 AM Tamar Christina >> > wrote: >> > > >> > > > -Original Message- >> > > > From: Richard Biener >> > > > Sent: Friday, May 17, 2024 6:51 AM >> > > > To: Victor Do Nascimento >> > > > Cc: gcc-patches@gcc.gnu.org; Richard Sandiford >> > ; >> > > > Richard Earnshaw ; Victor Do Nascimento >> > > > >> > > > Subject: Re: [PATCH] middle-end: Expand {u|s}dot product support in >> > > > autovectorizer >> > > > >> > > > On Thu, May 16, 2024 at 4:40 PM Victor Do Nascimento >> > > > wrote: >> > > > > >> > > > > From: Victor Do Nascimento >> > > > > >> > > > > At present, the compiler offers the `{u|s|us}dot_prod_optab' direct >> > > > > optabs for dealing with vectorizable dot product code sequences. The >> > > > > consequence of using a direct optab for this is that backend-pattern >> > > > > selection is only ever able to match against one datatype - Either >> > > > > that of the operands or of the accumulated value, never both. >> > > > > >> > > > > With the introduction of the 2-way (un)signed dot-product insn [1][2] >> > > > > in AArch64 SVE2, the existing direct opcode approach is no longer >> > > > > sufficient for full specification of all the possible dot product >> > > > > machine instructions to be matched to the code sequence; a dot >> > > > > product >> > > > > resulting in VNx4SI may result from either dot products on VNx16QI or >> > > > > VNx8HI values for the 4- and 2-way dot product operations, >> > > > > respectively. >> > > > > >> > > > > This means that the following example fails autovectorization: >> > > > > >> > > > > uint32_t foo(int n, uint16_t* data) { >> > > > > uint32_t sum = 0; >> > > > > for (int i=0; i> > > > > sum += data[i] * data[i]; >> > > > > } >> > > > > return sum; >> > > > > } >> > > > > >> > > > > To remedy the issue a new optab is added, tentatively named >> > > > > `udot_prod_twoway_optab', whose selection is dependent upon checking >> > > > > of both input and output types involved in the operation. >> > > > >> > > > I don't like this too much. I'll note we document dot_prod as >> > > > >> > > > @cindex @code{sdot_prod@var{m}} instruction pattern >> > > > @item @samp{sdot_prod@var{m}} >> > > > >> > > > Compute the sum of the products of two signed elements. >> > > > Operand 1 and operand 2 are of the same mode. Their >> > > > product, which is of a wider mode, is computed and added to operand 3. >> > > > Operand 3 is of a mode equal or wider than the mode of the product. The >> > > > result is placed in operand 0, which is of the same mode as operand 3. >> > > > @var{m} is the mode of operand 1 and operand 2. >> > > > >> > > > with no restriction on the wider mode but we don't specify it which is >> > > > bad design. This should have been a convert optab with two modes >> > > > from the start - adding a _twoway variant is just a hack. >> > > >> > > We did discuss this at the time we started implementing it. There was >> > > two >> > > options, one was indeed to change it to a convert dot_prod optab, but >> > > doing >> > > this means we have to update every target that uses it. >> > > >> > > Now that means 3 ISAs for AArch64, Arm, Arc, c6x, 2 for x86, loongson and >> > altivec. >> > > >> > > Which sure could be possible, but there's also every use in the backends >> > > that >> > need >> > > to be updated, and tested, which for some targets we don't even know how >> > > to >> > begin. >> > > >> > > So it seems very hard to correct dotprod to a convert optab now. >> > >> > It's still the correct way to go. At _least_ your new pattern should >> > have been this, >> > otherwise what do you do when you have two-way, four-way and eight-way >> > variants? >> > Add yet another optab? >> >> I guess that's fair, but having the new optab only be convert resulted in >> messy >> code as everywhere you must check for both variants. >> >> Additionally that optab would then overlap with the existing optabs as, as >> you >> Say, the documentation only says it's of a wider type and doesn't indicate >> precision. >> >> So to avoid issues down the line then If the new optab isn't acceptable then >> we'll have to do a wholesale conversion then.. > > Yep. It shouldn't be difficult though. Still catching up, but FWIW, I agree this is the way to go. (Convert all existing dot_prods to convert optabs first, and then add the new AArch64 ones.) Having two mechanisms feels like storing up trouble for later. :) Richard
Re: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer
Dear Richard and Tamar, Thanks to the both of you for the various bits of feedback. I've implemented all the more straightforward bits of feedback given, leaving "only" the merging of the two- and four-way dot product optabs into one, together with the necessary changes to the various backends which, though a little time-consuming, should be rather mechanical. I had originally implemented the new two-way dotprod optab as a covert optab anyway, so going back to the work on that local branch will give me a good starting point from which to do this. And Tamar, thanks very much for the feedback regarding the unit-tests. I knew my testing as it currently is was rather anaemic and was eager to get the relevant feedback on it. Rest assured it's all been taken on board. Cheers, Victor On 5/17/24 11:13, Richard Biener wrote: On Fri, May 17, 2024 at 11:56 AM Tamar Christina wrote: -Original Message- From: Richard Biener Sent: Friday, May 17, 2024 10:46 AM To: Tamar Christina Cc: Victor Do Nascimento ; gcc- patc...@gcc.gnu.org; Richard Sandiford ; Richard Earnshaw ; Victor Do Nascimento Subject: Re: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer On Fri, May 17, 2024 at 11:05 AM Tamar Christina wrote: -Original Message- From: Richard Biener Sent: Friday, May 17, 2024 6:51 AM To: Victor Do Nascimento Cc: gcc-patches@gcc.gnu.org; Richard Sandiford ; Richard Earnshaw ; Victor Do Nascimento Subject: Re: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer On Thu, May 16, 2024 at 4:40 PM Victor Do Nascimento wrote: From: Victor Do Nascimento At present, the compiler offers the `{u|s|us}dot_prod_optab' direct optabs for dealing with vectorizable dot product code sequences. The consequence of using a direct optab for this is that backend-pattern selection is only ever able to match against one datatype - Either that of the operands or of the accumulated value, never both. With the introduction of the 2-way (un)signed dot-product insn [1][2] in AArch64 SVE2, the existing direct opcode approach is no longer sufficient for full specification of all the possible dot product machine instructions to be matched to the code sequence; a dot product resulting in VNx4SI may result from either dot products on VNx16QI or VNx8HI values for the 4- and 2-way dot product operations, respectively. This means that the following example fails autovectorization: uint32_t foo(int n, uint16_t* data) { uint32_t sum = 0; for (int i=0; i I don't like this too much. I'll note we document dot_prod as @cindex @code{sdot_prod@var{m}} instruction pattern @item @samp{sdot_prod@var{m}} Compute the sum of the products of two signed elements. Operand 1 and operand 2 are of the same mode. Their product, which is of a wider mode, is computed and added to operand 3. Operand 3 is of a mode equal or wider than the mode of the product. The result is placed in operand 0, which is of the same mode as operand 3. @var{m} is the mode of operand 1 and operand 2. with no restriction on the wider mode but we don't specify it which is bad design. This should have been a convert optab with two modes from the start - adding a _twoway variant is just a hack. We did discuss this at the time we started implementing it. There was two options, one was indeed to change it to a convert dot_prod optab, but doing this means we have to update every target that uses it. Now that means 3 ISAs for AArch64, Arm, Arc, c6x, 2 for x86, loongson and altivec. Which sure could be possible, but there's also every use in the backends that need to be updated, and tested, which for some targets we don't even know how to begin. So it seems very hard to correct dotprod to a convert optab now. It's still the correct way to go. At _least_ your new pattern should have been this, otherwise what do you do when you have two-way, four-way and eight-way variants? Add yet another optab? I guess that's fair, but having the new optab only be convert resulted in messy code as everywhere you must check for both variants. Additionally that optab would then overlap with the existing optabs as, as you Say, the documentation only says it's of a wider type and doesn't indicate precision. So to avoid issues down the line then If the new optab isn't acceptable then we'll have to do a wholesale conversion then.. Yep. It shouldn't be difficult though. Another thing is that when you do it your way you should fix the existing optab to be two-way by documenting how the second mode derives from the first. And sure, it's not the only optab suffering from this issue. Sure, all the zero and sign extending optabs for instance But for example the scalar ones are correct: OPTAB_CL(sext_optab, "extend$b$a2", SIGN_EXTEND, "extend", gen_extend_conv_libfunc) Richard. Tamar Richard. Tamar Richard. In order to minimize changes to the existing
Re: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer
On Fri, May 17, 2024 at 11:56 AM Tamar Christina wrote: > > > -Original Message- > > From: Richard Biener > > Sent: Friday, May 17, 2024 10:46 AM > > To: Tamar Christina > > Cc: Victor Do Nascimento ; gcc- > > patc...@gcc.gnu.org; Richard Sandiford ; Richard > > Earnshaw ; Victor Do Nascimento > > > > Subject: Re: [PATCH] middle-end: Expand {u|s}dot product support in > > autovectorizer > > > > On Fri, May 17, 2024 at 11:05 AM Tamar Christina > > wrote: > > > > > > > -Original Message- > > > > From: Richard Biener > > > > Sent: Friday, May 17, 2024 6:51 AM > > > > To: Victor Do Nascimento > > > > Cc: gcc-patches@gcc.gnu.org; Richard Sandiford > > ; > > > > Richard Earnshaw ; Victor Do Nascimento > > > > > > > > Subject: Re: [PATCH] middle-end: Expand {u|s}dot product support in > > > > autovectorizer > > > > > > > > On Thu, May 16, 2024 at 4:40 PM Victor Do Nascimento > > > > wrote: > > > > > > > > > > From: Victor Do Nascimento > > > > > > > > > > At present, the compiler offers the `{u|s|us}dot_prod_optab' direct > > > > > optabs for dealing with vectorizable dot product code sequences. The > > > > > consequence of using a direct optab for this is that backend-pattern > > > > > selection is only ever able to match against one datatype - Either > > > > > that of the operands or of the accumulated value, never both. > > > > > > > > > > With the introduction of the 2-way (un)signed dot-product insn [1][2] > > > > > in AArch64 SVE2, the existing direct opcode approach is no longer > > > > > sufficient for full specification of all the possible dot product > > > > > machine instructions to be matched to the code sequence; a dot product > > > > > resulting in VNx4SI may result from either dot products on VNx16QI or > > > > > VNx8HI values for the 4- and 2-way dot product operations, > > > > > respectively. > > > > > > > > > > This means that the following example fails autovectorization: > > > > > > > > > > uint32_t foo(int n, uint16_t* data) { > > > > > uint32_t sum = 0; > > > > > for (int i=0; i > > > > sum += data[i] * data[i]; > > > > > } > > > > > return sum; > > > > > } > > > > > > > > > > To remedy the issue a new optab is added, tentatively named > > > > > `udot_prod_twoway_optab', whose selection is dependent upon checking > > > > > of both input and output types involved in the operation. > > > > > > > > I don't like this too much. I'll note we document dot_prod as > > > > > > > > @cindex @code{sdot_prod@var{m}} instruction pattern > > > > @item @samp{sdot_prod@var{m}} > > > > > > > > Compute the sum of the products of two signed elements. > > > > Operand 1 and operand 2 are of the same mode. Their > > > > product, which is of a wider mode, is computed and added to operand 3. > > > > Operand 3 is of a mode equal or wider than the mode of the product. The > > > > result is placed in operand 0, which is of the same mode as operand 3. > > > > @var{m} is the mode of operand 1 and operand 2. > > > > > > > > with no restriction on the wider mode but we don't specify it which is > > > > bad design. This should have been a convert optab with two modes > > > > from the start - adding a _twoway variant is just a hack. > > > > > > We did discuss this at the time we started implementing it. There was two > > > options, one was indeed to change it to a convert dot_prod optab, but > > > doing > > > this means we have to update every target that uses it. > > > > > > Now that means 3 ISAs for AArch64, Arm, Arc, c6x, 2 for x86, loongson and > > altivec. > > > > > > Which sure could be possible, but there's also every use in the backends > > > that > > need > > > to be updated, and tested, which for some targets we don't even know how > > > to > > begin. > > > > > > So it seems very hard to correct dotprod to a convert optab now. > > > > It's still the correct way to go. At _least_ your new pattern should > > have been this, > > othe
RE: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer
> -Original Message- > From: Richard Biener > Sent: Friday, May 17, 2024 10:46 AM > To: Tamar Christina > Cc: Victor Do Nascimento ; gcc- > patc...@gcc.gnu.org; Richard Sandiford ; Richard > Earnshaw ; Victor Do Nascimento > > Subject: Re: [PATCH] middle-end: Expand {u|s}dot product support in > autovectorizer > > On Fri, May 17, 2024 at 11:05 AM Tamar Christina > wrote: > > > > > -Original Message- > > > From: Richard Biener > > > Sent: Friday, May 17, 2024 6:51 AM > > > To: Victor Do Nascimento > > > Cc: gcc-patches@gcc.gnu.org; Richard Sandiford > ; > > > Richard Earnshaw ; Victor Do Nascimento > > > > > > Subject: Re: [PATCH] middle-end: Expand {u|s}dot product support in > > > autovectorizer > > > > > > On Thu, May 16, 2024 at 4:40 PM Victor Do Nascimento > > > wrote: > > > > > > > > From: Victor Do Nascimento > > > > > > > > At present, the compiler offers the `{u|s|us}dot_prod_optab' direct > > > > optabs for dealing with vectorizable dot product code sequences. The > > > > consequence of using a direct optab for this is that backend-pattern > > > > selection is only ever able to match against one datatype - Either > > > > that of the operands or of the accumulated value, never both. > > > > > > > > With the introduction of the 2-way (un)signed dot-product insn [1][2] > > > > in AArch64 SVE2, the existing direct opcode approach is no longer > > > > sufficient for full specification of all the possible dot product > > > > machine instructions to be matched to the code sequence; a dot product > > > > resulting in VNx4SI may result from either dot products on VNx16QI or > > > > VNx8HI values for the 4- and 2-way dot product operations, respectively. > > > > > > > > This means that the following example fails autovectorization: > > > > > > > > uint32_t foo(int n, uint16_t* data) { > > > > uint32_t sum = 0; > > > > for (int i=0; i > > > sum += data[i] * data[i]; > > > > } > > > > return sum; > > > > } > > > > > > > > To remedy the issue a new optab is added, tentatively named > > > > `udot_prod_twoway_optab', whose selection is dependent upon checking > > > > of both input and output types involved in the operation. > > > > > > I don't like this too much. I'll note we document dot_prod as > > > > > > @cindex @code{sdot_prod@var{m}} instruction pattern > > > @item @samp{sdot_prod@var{m}} > > > > > > Compute the sum of the products of two signed elements. > > > Operand 1 and operand 2 are of the same mode. Their > > > product, which is of a wider mode, is computed and added to operand 3. > > > Operand 3 is of a mode equal or wider than the mode of the product. The > > > result is placed in operand 0, which is of the same mode as operand 3. > > > @var{m} is the mode of operand 1 and operand 2. > > > > > > with no restriction on the wider mode but we don't specify it which is > > > bad design. This should have been a convert optab with two modes > > > from the start - adding a _twoway variant is just a hack. > > > > We did discuss this at the time we started implementing it. There was two > > options, one was indeed to change it to a convert dot_prod optab, but doing > > this means we have to update every target that uses it. > > > > Now that means 3 ISAs for AArch64, Arm, Arc, c6x, 2 for x86, loongson and > altivec. > > > > Which sure could be possible, but there's also every use in the backends > > that > need > > to be updated, and tested, which for some targets we don't even know how to > begin. > > > > So it seems very hard to correct dotprod to a convert optab now. > > It's still the correct way to go. At _least_ your new pattern should > have been this, > otherwise what do you do when you have two-way, four-way and eight-way > variants? > Add yet another optab? I guess that's fair, but having the new optab only be convert resulted in messy code as everywhere you must check for both variants. Additionally that optab would then overlap with the existing optabs as, as you Say, the documentation only says it's of a wider type and doesn't indicate precision. So to avoid issues down the line then If the new optab isn't acceptable then we'll have to do a wholesale conversion then.. > > Another thing is that when you do it
Re: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer
On Fri, May 17, 2024 at 11:05 AM Tamar Christina wrote: > > > -Original Message- > > From: Richard Biener > > Sent: Friday, May 17, 2024 6:51 AM > > To: Victor Do Nascimento > > Cc: gcc-patches@gcc.gnu.org; Richard Sandiford ; > > Richard Earnshaw ; Victor Do Nascimento > > > > Subject: Re: [PATCH] middle-end: Expand {u|s}dot product support in > > autovectorizer > > > > On Thu, May 16, 2024 at 4:40 PM Victor Do Nascimento > > wrote: > > > > > > From: Victor Do Nascimento > > > > > > At present, the compiler offers the `{u|s|us}dot_prod_optab' direct > > > optabs for dealing with vectorizable dot product code sequences. The > > > consequence of using a direct optab for this is that backend-pattern > > > selection is only ever able to match against one datatype - Either > > > that of the operands or of the accumulated value, never both. > > > > > > With the introduction of the 2-way (un)signed dot-product insn [1][2] > > > in AArch64 SVE2, the existing direct opcode approach is no longer > > > sufficient for full specification of all the possible dot product > > > machine instructions to be matched to the code sequence; a dot product > > > resulting in VNx4SI may result from either dot products on VNx16QI or > > > VNx8HI values for the 4- and 2-way dot product operations, respectively. > > > > > > This means that the following example fails autovectorization: > > > > > > uint32_t foo(int n, uint16_t* data) { > > > uint32_t sum = 0; > > > for (int i=0; i > > sum += data[i] * data[i]; > > > } > > > return sum; > > > } > > > > > > To remedy the issue a new optab is added, tentatively named > > > `udot_prod_twoway_optab', whose selection is dependent upon checking > > > of both input and output types involved in the operation. > > > > I don't like this too much. I'll note we document dot_prod as > > > > @cindex @code{sdot_prod@var{m}} instruction pattern > > @item @samp{sdot_prod@var{m}} > > > > Compute the sum of the products of two signed elements. > > Operand 1 and operand 2 are of the same mode. Their > > product, which is of a wider mode, is computed and added to operand 3. > > Operand 3 is of a mode equal or wider than the mode of the product. The > > result is placed in operand 0, which is of the same mode as operand 3. > > @var{m} is the mode of operand 1 and operand 2. > > > > with no restriction on the wider mode but we don't specify it which is > > bad design. This should have been a convert optab with two modes > > from the start - adding a _twoway variant is just a hack. > > We did discuss this at the time we started implementing it. There was two > options, one was indeed to change it to a convert dot_prod optab, but doing > this means we have to update every target that uses it. > > Now that means 3 ISAs for AArch64, Arm, Arc, c6x, 2 for x86, loongson and > altivec. > > Which sure could be possible, but there's also every use in the backends that > need > to be updated, and tested, which for some targets we don't even know how to > begin. > > So it seems very hard to correct dotprod to a convert optab now. It's still the correct way to go. At _least_ your new pattern should have been this, otherwise what do you do when you have two-way, four-way and eight-way variants? Add yet another optab? Another thing is that when you do it your way you should fix the existing optab to be two-way by documenting how the second mode derives from the first. And sure, it's not the only optab suffering from this issue. Richard. > Tamar > > > > > Richard. > > > > > In order to minimize changes to the existing codebase, > > > `optab_for_tree_code' is renamed `optab_for_tree_code_1' and a new > > > argument is added to its signature - `const_tree otype', allowing type > > > information to be specified for both input and output types. The > > > existing nterface is retained by defining a new `optab_for_tree_code', > > > which serves as a shim to `optab_for_tree_code_1', passing old > > > parameters as-is and setting the new `optype' argument to `NULL_TREE'. > > > > > > For DOT_PROD_EXPR tree codes, we can call `optab_for_tree_code_1' > > > directly, passing it both types, adding the internal logic to the > > > function to distinguish between competing optabs. > > > > > > Finally, necessary changes are made to `expand_widen_pattern_expr' to > > > ensure the new ic
RE: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer
> -Original Message- > From: Hongtao Liu > Sent: Friday, May 17, 2024 3:14 AM > To: Victor Do Nascimento > Cc: gcc-patches@gcc.gnu.org; Richard Sandiford ; > Richard Earnshaw ; Victor Do Nascimento > > Subject: Re: [PATCH] middle-end: Expand {u|s}dot product support in > autovectorizer > > > > > > Sorry to chime in, for x86 backend, we defined usdot_prodv16hi, and > > 2-way dot_prod operations can be generated > > > This is the link https://godbolt.org/z/hcWr64vx3, x86 define > udot_prodv16qi/udot_prod8hi and both 2-way and 4-way dot_prod > instructions are generated > That's not the same, the 2-way vs 4-way dot_prod here is that e.g. udot_prod8hi can reduce to either DImode or SImode. udot_prod8hi does not have enough information to distinguish the two and in RTL you can't overload the names. So this is about the ISA having instructions that overlap on the source mode of the instruction. Tamar > > -- > BR, > Hongtao
RE: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer
> -Original Message- > From: Richard Biener > Sent: Friday, May 17, 2024 6:51 AM > To: Victor Do Nascimento > Cc: gcc-patches@gcc.gnu.org; Richard Sandiford ; > Richard Earnshaw ; Victor Do Nascimento > > Subject: Re: [PATCH] middle-end: Expand {u|s}dot product support in > autovectorizer > > On Thu, May 16, 2024 at 4:40 PM Victor Do Nascimento > wrote: > > > > From: Victor Do Nascimento > > > > At present, the compiler offers the `{u|s|us}dot_prod_optab' direct > > optabs for dealing with vectorizable dot product code sequences. The > > consequence of using a direct optab for this is that backend-pattern > > selection is only ever able to match against one datatype - Either > > that of the operands or of the accumulated value, never both. > > > > With the introduction of the 2-way (un)signed dot-product insn [1][2] > > in AArch64 SVE2, the existing direct opcode approach is no longer > > sufficient for full specification of all the possible dot product > > machine instructions to be matched to the code sequence; a dot product > > resulting in VNx4SI may result from either dot products on VNx16QI or > > VNx8HI values for the 4- and 2-way dot product operations, respectively. > > > > This means that the following example fails autovectorization: > > > > uint32_t foo(int n, uint16_t* data) { > > uint32_t sum = 0; > > for (int i=0; i > sum += data[i] * data[i]; > > } > > return sum; > > } > > > > To remedy the issue a new optab is added, tentatively named > > `udot_prod_twoway_optab', whose selection is dependent upon checking > > of both input and output types involved in the operation. > > I don't like this too much. I'll note we document dot_prod as > > @cindex @code{sdot_prod@var{m}} instruction pattern > @item @samp{sdot_prod@var{m}} > > Compute the sum of the products of two signed elements. > Operand 1 and operand 2 are of the same mode. Their > product, which is of a wider mode, is computed and added to operand 3. > Operand 3 is of a mode equal or wider than the mode of the product. The > result is placed in operand 0, which is of the same mode as operand 3. > @var{m} is the mode of operand 1 and operand 2. > > with no restriction on the wider mode but we don't specify it which is > bad design. This should have been a convert optab with two modes > from the start - adding a _twoway variant is just a hack. We did discuss this at the time we started implementing it. There was two options, one was indeed to change it to a convert dot_prod optab, but doing this means we have to update every target that uses it. Now that means 3 ISAs for AArch64, Arm, Arc, c6x, 2 for x86, loongson and altivec. Which sure could be possible, but there's also every use in the backends that need to be updated, and tested, which for some targets we don't even know how to begin. So it seems very hard to correct dotprod to a convert optab now. Tamar > > Richard. > > > In order to minimize changes to the existing codebase, > > `optab_for_tree_code' is renamed `optab_for_tree_code_1' and a new > > argument is added to its signature - `const_tree otype', allowing type > > information to be specified for both input and output types. The > > existing nterface is retained by defining a new `optab_for_tree_code', > > which serves as a shim to `optab_for_tree_code_1', passing old > > parameters as-is and setting the new `optype' argument to `NULL_TREE'. > > > > For DOT_PROD_EXPR tree codes, we can call `optab_for_tree_code_1' > > directly, passing it both types, adding the internal logic to the > > function to distinguish between competing optabs. > > > > Finally, necessary changes are made to `expand_widen_pattern_expr' to > > ensure the new icode can be correctly selected, given the new optab. > > > > [1] https://developer.arm.com/documentation/ddi0602/2024-03/SVE- > Instructions/UDOT--2-way--vectors---Unsigned-integer-dot-product- > > [2] https://developer.arm.com/documentation/ddi0602/2024-03/SVE- > Instructions/SDOT--2-way--vectors---Signed-integer-dot-product- > > > > gcc/ChangeLog: > > > > * config/aarch64/aarch64-sve2.md > > (@aarch64_sve_dotvnx4sivnx8hi): > > renamed to `dot_prod_twoway_vnx8hi'. > > * config/aarch64/aarch64-sve-builtins-base.cc (svdot_impl.expand): > > update icodes used in line with above rename. > > * optabs-tree.cc (optab_for_tree_code_1): Renamed > > `optab_for_tree_code' and added new argument. > > (optab_for_tree_code): Now a call to `optab_for_tree_cod
Re: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer
On Thu, May 16, 2024 at 4:40 PM Victor Do Nascimento wrote: > > From: Victor Do Nascimento > > At present, the compiler offers the `{u|s|us}dot_prod_optab' direct > optabs for dealing with vectorizable dot product code sequences. The > consequence of using a direct optab for this is that backend-pattern > selection is only ever able to match against one datatype - Either > that of the operands or of the accumulated value, never both. > > With the introduction of the 2-way (un)signed dot-product insn [1][2] > in AArch64 SVE2, the existing direct opcode approach is no longer > sufficient for full specification of all the possible dot product > machine instructions to be matched to the code sequence; a dot product > resulting in VNx4SI may result from either dot products on VNx16QI or > VNx8HI values for the 4- and 2-way dot product operations, respectively. > > This means that the following example fails autovectorization: > > uint32_t foo(int n, uint16_t* data) { > uint32_t sum = 0; > for (int i=0; i sum += data[i] * data[i]; > } > return sum; > } > > To remedy the issue a new optab is added, tentatively named > `udot_prod_twoway_optab', whose selection is dependent upon checking > of both input and output types involved in the operation. I don't like this too much. I'll note we document dot_prod as @cindex @code{sdot_prod@var{m}} instruction pattern @item @samp{sdot_prod@var{m}} Compute the sum of the products of two signed elements. Operand 1 and operand 2 are of the same mode. Their product, which is of a wider mode, is computed and added to operand 3. Operand 3 is of a mode equal or wider than the mode of the product. The result is placed in operand 0, which is of the same mode as operand 3. @var{m} is the mode of operand 1 and operand 2. with no restriction on the wider mode but we don't specify it which is bad design. This should have been a convert optab with two modes from the start - adding a _twoway variant is just a hack. Richard. > In order to minimize changes to the existing codebase, > `optab_for_tree_code' is renamed `optab_for_tree_code_1' and a new > argument is added to its signature - `const_tree otype', allowing type > information to be specified for both input and output types. The > existing nterface is retained by defining a new `optab_for_tree_code', > which serves as a shim to `optab_for_tree_code_1', passing old > parameters as-is and setting the new `optype' argument to `NULL_TREE'. > > For DOT_PROD_EXPR tree codes, we can call `optab_for_tree_code_1' > directly, passing it both types, adding the internal logic to the > function to distinguish between competing optabs. > > Finally, necessary changes are made to `expand_widen_pattern_expr' to > ensure the new icode can be correctly selected, given the new optab. > > [1] > https://developer.arm.com/documentation/ddi0602/2024-03/SVE-Instructions/UDOT--2-way--vectors---Unsigned-integer-dot-product- > [2] > https://developer.arm.com/documentation/ddi0602/2024-03/SVE-Instructions/SDOT--2-way--vectors---Signed-integer-dot-product- > > gcc/ChangeLog: > > * config/aarch64/aarch64-sve2.md (@aarch64_sve_dotvnx4sivnx8hi): > renamed to `dot_prod_twoway_vnx8hi'. > * config/aarch64/aarch64-sve-builtins-base.cc (svdot_impl.expand): > update icodes used in line with above rename. > * optabs-tree.cc (optab_for_tree_code_1): Renamed > `optab_for_tree_code' and added new argument. > (optab_for_tree_code): Now a call to `optab_for_tree_code_1'. > * optabs-tree.h (optab_for_tree_code_1): New. > * optabs.cc (expand_widen_pattern_expr): Expand support for > DOT_PROD_EXPR patterns. > * optabs.def (udot_prod_twoway_optab): New. > (sdot_prod_twoway_optab): Likewise. > * tree-vect-patterns.cc (vect_supportable_direct_optab_p): Add > support for misc optabs that use two modes. > > gcc/testsuite/ChangeLog: > > * gcc.dg/vect/vect-dotprod-twoway.c: New. > --- > .../aarch64/aarch64-sve-builtins-base.cc | 4 ++-- > gcc/config/aarch64/aarch64-sve2.md| 2 +- > gcc/optabs-tree.cc| 23 -- > gcc/optabs-tree.h | 2 ++ > gcc/optabs.cc | 2 +- > gcc/optabs.def| 2 ++ > .../gcc.dg/vect/vect-dotprod-twoway.c | 24 +++ > gcc/tree-vect-patterns.cc | 2 +- > 8 files changed, 54 insertions(+), 7 deletions(-) > create mode 100644 gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c > > diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc > b/gcc/config/aarch64/aarch64-sve-builtins-base.cc > index 0d2edf3f19e..e457db09f66 100644 > --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc > +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc > @@ -764,8 +764,8 @@ public: >icode = (e.type_suffix (0).float_p >?
Re: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer
> > > Sorry to chime in, for x86 backend, we defined usdot_prodv16hi, and > 2-way dot_prod operations can be generated > This is the link https://godbolt.org/z/hcWr64vx3, x86 define udot_prodv16qi/udot_prod8hi and both 2-way and 4-way dot_prod instructions are generated -- BR, Hongtao
Re: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer
On Thu, May 16, 2024 at 10:40 PM Victor Do Nascimento wrote: > > From: Victor Do Nascimento > > At present, the compiler offers the `{u|s|us}dot_prod_optab' direct > optabs for dealing with vectorizable dot product code sequences. The > consequence of using a direct optab for this is that backend-pattern > selection is only ever able to match against one datatype - Either > that of the operands or of the accumulated value, never both. > > With the introduction of the 2-way (un)signed dot-product insn [1][2] > in AArch64 SVE2, the existing direct opcode approach is no longer > sufficient for full specification of all the possible dot product > machine instructions to be matched to the code sequence; a dot product > resulting in VNx4SI may result from either dot products on VNx16QI or > VNx8HI values for the 4- and 2-way dot product operations, respectively. > > This means that the following example fails autovectorization: > > uint32_t foo(int n, uint16_t* data) { > uint32_t sum = 0; > for (int i=0; i sum += data[i] * data[i]; > } > return sum; > } > Sorry to chime in, for x86 backend, we defined usdot_prodv16hi, and 2-way dot_prod operations can be generated > To remedy the issue a new optab is added, tentatively named > `udot_prod_twoway_optab', whose selection is dependent upon checking > of both input and output types involved in the operation. > > In order to minimize changes to the existing codebase, > `optab_for_tree_code' is renamed `optab_for_tree_code_1' and a new > argument is added to its signature - `const_tree otype', allowing type > information to be specified for both input and output types. The > existing nterface is retained by defining a new `optab_for_tree_code', > which serves as a shim to `optab_for_tree_code_1', passing old > parameters as-is and setting the new `optype' argument to `NULL_TREE'. > > For DOT_PROD_EXPR tree codes, we can call `optab_for_tree_code_1' > directly, passing it both types, adding the internal logic to the > function to distinguish between competing optabs. > > Finally, necessary changes are made to `expand_widen_pattern_expr' to > ensure the new icode can be correctly selected, given the new optab. > > [1] > https://developer.arm.com/documentation/ddi0602/2024-03/SVE-Instructions/UDOT--2-way--vectors---Unsigned-integer-dot-product- > [2] > https://developer.arm.com/documentation/ddi0602/2024-03/SVE-Instructions/SDOT--2-way--vectors---Signed-integer-dot-product- > > gcc/ChangeLog: > > * config/aarch64/aarch64-sve2.md (@aarch64_sve_dotvnx4sivnx8hi): > renamed to `dot_prod_twoway_vnx8hi'. > * config/aarch64/aarch64-sve-builtins-base.cc (svdot_impl.expand): > update icodes used in line with above rename. > * optabs-tree.cc (optab_for_tree_code_1): Renamed > `optab_for_tree_code' and added new argument. > (optab_for_tree_code): Now a call to `optab_for_tree_code_1'. > * optabs-tree.h (optab_for_tree_code_1): New. > * optabs.cc (expand_widen_pattern_expr): Expand support for > DOT_PROD_EXPR patterns. > * optabs.def (udot_prod_twoway_optab): New. > (sdot_prod_twoway_optab): Likewise. > * tree-vect-patterns.cc (vect_supportable_direct_optab_p): Add > support for misc optabs that use two modes. > > gcc/testsuite/ChangeLog: > > * gcc.dg/vect/vect-dotprod-twoway.c: New. > --- > .../aarch64/aarch64-sve-builtins-base.cc | 4 ++-- > gcc/config/aarch64/aarch64-sve2.md| 2 +- > gcc/optabs-tree.cc| 23 -- > gcc/optabs-tree.h | 2 ++ > gcc/optabs.cc | 2 +- > gcc/optabs.def| 2 ++ > .../gcc.dg/vect/vect-dotprod-twoway.c | 24 +++ > gcc/tree-vect-patterns.cc | 2 +- > 8 files changed, 54 insertions(+), 7 deletions(-) > create mode 100644 gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c > > diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc > b/gcc/config/aarch64/aarch64-sve-builtins-base.cc > index 0d2edf3f19e..e457db09f66 100644 > --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc > +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc > @@ -764,8 +764,8 @@ public: >icode = (e.type_suffix (0).float_p >? CODE_FOR_aarch64_sve_fdotvnx4sfvnx8hf >: e.type_suffix (0).unsigned_p > - ? CODE_FOR_aarch64_sve_udotvnx4sivnx8hi > - : CODE_FOR_aarch64_sve_sdotvnx4sivnx8hi); > + ? CODE_FOR_udot_prod_twoway_vnx8hi > + : CODE_FOR_sdot_prod_twoway_vnx8hi); > return e.use_unpred_insn (icode); >} > }; > diff --git a/gcc/config/aarch64/aarch64-sve2.md > b/gcc/config/aarch64/aarch64-sve2.md > index 934e57055d3..5677de7108d 100644 > --- a/gcc/config/aarch64/aarch64-sve2.md > +++ b/gcc/config/aarch64/aarch64-sve2.md > @@ -2021,7 +2021,7 @@
Re: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer
On Thu, May 16, 2024, 7:46 PM Tamar Christina wrote: > Hi Victor, > > > -Original Message- > > From: Victor Do Nascimento > > Sent: Thursday, May 16, 2024 3:39 PM > > To: gcc-patches@gcc.gnu.org > > Cc: Richard Sandiford ; Richard Earnshaw > > ; Victor Do Nascimento > > > > Subject: [PATCH] middle-end: Expand {u|s}dot product support in > autovectorizer > > > > From: Victor Do Nascimento > > > > At present, the compiler offers the `{u|s|us}dot_prod_optab' direct > > optabs for dealing with vectorizable dot product code sequences. The > > consequence of using a direct optab for this is that backend-pattern > > selection is only ever able to match against one datatype - Either > > that of the operands or of the accumulated value, never both. > > > > With the introduction of the 2-way (un)signed dot-product insn [1][2] > > in AArch64 SVE2, the existing direct opcode approach is no longer > > sufficient for full specification of all the possible dot product > > machine instructions to be matched to the code sequence; a dot product > > resulting in VNx4SI may result from either dot products on VNx16QI or > > VNx8HI values for the 4- and 2-way dot product operations, respectively. > > > > This means that the following example fails autovectorization: > > > > uint32_t foo(int n, uint16_t* data) { > > uint32_t sum = 0; > > for (int i=0; i > sum += data[i] * data[i]; > > } > > return sum; > > } > > > > To remedy the issue a new optab is added, tentatively named > > `udot_prod_twoway_optab', whose selection is dependent upon checking > > of both input and output types involved in the operation. > > > > In order to minimize changes to the existing codebase, > > `optab_for_tree_code' is renamed `optab_for_tree_code_1' and a new > > argument is added to its signature - `const_tree otype', allowing type > > information to be specified for both input and output types. The > > existing nterface is retained by defining a new `optab_for_tree_code', > > which serves as a shim to `optab_for_tree_code_1', passing old > > parameters as-is and setting the new `optype' argument to `NULL_TREE'. > > > > For DOT_PROD_EXPR tree codes, we can call `optab_for_tree_code_1' > > directly, passing it both types, adding the internal logic to the > > function to distinguish between competing optabs. > > > > Finally, necessary changes are made to `expand_widen_pattern_expr' to > > ensure the new icode can be correctly selected, given the new optab. > > > > [1] https://developer.arm.com/documentation/ddi0602/2024-03/SVE- > > Instructions/UDOT--2-way--vectors---Unsigned-integer-dot-product- > > [2] https://developer.arm.com/documentation/ddi0602/2024-03/SVE- > > Instructions/SDOT--2-way--vectors---Signed-integer-dot-product- > > > > gcc/ChangeLog: > > > > * config/aarch64/aarch64-sve2.md > > (@aarch64_sve_dotvnx4sivnx8hi): > > renamed to `dot_prod_twoway_vnx8hi'. > > * config/aarch64/aarch64-sve-builtins-base.cc (svdot_impl.expand): > > update icodes used in line with above rename. > > Please split the target specific bits from the target agnostic parts. > I.e. this patch series should be split in two. > > > * optabs-tree.cc (optab_for_tree_code_1): Renamed > > `optab_for_tree_code' and added new argument. > > (optab_for_tree_code): Now a call to `optab_for_tree_code_1'. > > * optabs-tree.h (optab_for_tree_code_1): New. > > * optabs.cc (expand_widen_pattern_expr): Expand support for > > DOT_PROD_EXPR patterns. > > * optabs.def (udot_prod_twoway_optab): New. > > (sdot_prod_twoway_optab): Likewise. > > * tree-vect-patterns.cc (vect_supportable_direct_optab_p): Add > > support for misc optabs that use two modes. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.dg/vect/vect-dotprod-twoway.c: New. > > --- > > .../aarch64/aarch64-sve-builtins-base.cc | 4 ++-- > > gcc/config/aarch64/aarch64-sve2.md| 2 +- > > gcc/optabs-tree.cc| 23 -- > > gcc/optabs-tree.h | 2 ++ > > gcc/optabs.cc | 2 +- > > gcc/optabs.def| 2 ++ > > .../gcc.dg/vect/vect-dotprod-twoway.c | 24 +++ > > gcc/tree-vect-patterns.cc | 2 +- > > 8 files changed, 54 insertions(+), 7 deletions(-) > > create mode 100644 gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c > > > > diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc > > b/gcc/config/aarch64/aarch64-sve-builtins-base.cc > > index 0d2edf3f19e..e457db09f66 100644 > > --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc > > +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc > > @@ -764,8 +764,8 @@ public: > >icode = (e.type_suffix (0).float_p > > ? CODE_FOR_aarch64_sve_fdotvnx4sfvnx8hf > > : e.type_suffix (0).unsigned_p > > -?
RE: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer
Hi Victor, > -Original Message- > From: Victor Do Nascimento > Sent: Thursday, May 16, 2024 3:39 PM > To: gcc-patches@gcc.gnu.org > Cc: Richard Sandiford ; Richard Earnshaw > ; Victor Do Nascimento > > Subject: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer > > From: Victor Do Nascimento > > At present, the compiler offers the `{u|s|us}dot_prod_optab' direct > optabs for dealing with vectorizable dot product code sequences. The > consequence of using a direct optab for this is that backend-pattern > selection is only ever able to match against one datatype - Either > that of the operands or of the accumulated value, never both. > > With the introduction of the 2-way (un)signed dot-product insn [1][2] > in AArch64 SVE2, the existing direct opcode approach is no longer > sufficient for full specification of all the possible dot product > machine instructions to be matched to the code sequence; a dot product > resulting in VNx4SI may result from either dot products on VNx16QI or > VNx8HI values for the 4- and 2-way dot product operations, respectively. > > This means that the following example fails autovectorization: > > uint32_t foo(int n, uint16_t* data) { > uint32_t sum = 0; > for (int i=0; i sum += data[i] * data[i]; > } > return sum; > } > > To remedy the issue a new optab is added, tentatively named > `udot_prod_twoway_optab', whose selection is dependent upon checking > of both input and output types involved in the operation. > > In order to minimize changes to the existing codebase, > `optab_for_tree_code' is renamed `optab_for_tree_code_1' and a new > argument is added to its signature - `const_tree otype', allowing type > information to be specified for both input and output types. The > existing nterface is retained by defining a new `optab_for_tree_code', > which serves as a shim to `optab_for_tree_code_1', passing old > parameters as-is and setting the new `optype' argument to `NULL_TREE'. > > For DOT_PROD_EXPR tree codes, we can call `optab_for_tree_code_1' > directly, passing it both types, adding the internal logic to the > function to distinguish between competing optabs. > > Finally, necessary changes are made to `expand_widen_pattern_expr' to > ensure the new icode can be correctly selected, given the new optab. > > [1] https://developer.arm.com/documentation/ddi0602/2024-03/SVE- > Instructions/UDOT--2-way--vectors---Unsigned-integer-dot-product- > [2] https://developer.arm.com/documentation/ddi0602/2024-03/SVE- > Instructions/SDOT--2-way--vectors---Signed-integer-dot-product- > > gcc/ChangeLog: > > * config/aarch64/aarch64-sve2.md > (@aarch64_sve_dotvnx4sivnx8hi): > renamed to `dot_prod_twoway_vnx8hi'. > * config/aarch64/aarch64-sve-builtins-base.cc (svdot_impl.expand): > update icodes used in line with above rename. Please split the target specific bits from the target agnostic parts. I.e. this patch series should be split in two. > * optabs-tree.cc (optab_for_tree_code_1): Renamed > `optab_for_tree_code' and added new argument. > (optab_for_tree_code): Now a call to `optab_for_tree_code_1'. > * optabs-tree.h (optab_for_tree_code_1): New. > * optabs.cc (expand_widen_pattern_expr): Expand support for > DOT_PROD_EXPR patterns. > * optabs.def (udot_prod_twoway_optab): New. > (sdot_prod_twoway_optab): Likewise. > * tree-vect-patterns.cc (vect_supportable_direct_optab_p): Add > support for misc optabs that use two modes. > > gcc/testsuite/ChangeLog: > > * gcc.dg/vect/vect-dotprod-twoway.c: New. > --- > .../aarch64/aarch64-sve-builtins-base.cc | 4 ++-- > gcc/config/aarch64/aarch64-sve2.md| 2 +- > gcc/optabs-tree.cc| 23 -- > gcc/optabs-tree.h | 2 ++ > gcc/optabs.cc | 2 +- > gcc/optabs.def| 2 ++ > .../gcc.dg/vect/vect-dotprod-twoway.c | 24 +++ > gcc/tree-vect-patterns.cc | 2 +- > 8 files changed, 54 insertions(+), 7 deletions(-) > create mode 100644 gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c > > diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc > b/gcc/config/aarch64/aarch64-sve-builtins-base.cc > index 0d2edf3f19e..e457db09f66 100644 > --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc > +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc > @@ -764,8 +764,8 @@ public: >icode = (e.type_suffix (0).float_p > ? CODE_FOR_aarch64_sve_fdotvnx4sfvnx8hf > : e.type_suffix (0).unsigned_p > -? CODE_FOR_aarch64_sve_udotvnx4sivnx8hi > -: CODE_FOR_aarch64_sve_sdotvnx4sivnx8hi); > +? CODE_FOR_udot_prod_twoway_vnx8hi > +: CODE_FOR_sdot_prod_twoway_vnx8hi); > return e.use_unpred_insn (icode); >} > }; > diff --git
Re: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer
On Thu, May 16, 2024, 4:40 PM Victor Do Nascimento < victor.donascime...@arm.com> wrote: > From: Victor Do Nascimento > > At present, the compiler offers the `{u|s|us}dot_prod_optab' direct > optabs for dealing with vectorizable dot product code sequences. The > consequence of using a direct optab for this is that backend-pattern > selection is only ever able to match against one datatype - Either > that of the operands or of the accumulated value, never both. > > With the introduction of the 2-way (un)signed dot-product insn [1][2] > in AArch64 SVE2, the existing direct opcode approach is no longer > sufficient for full specification of all the possible dot product > machine instructions to be matched to the code sequence; a dot product > resulting in VNx4SI may result from either dot products on VNx16QI or > VNx8HI values for the 4- and 2-way dot product operations, respectively. > > This means that the following example fails autovectorization: > > uint32_t foo(int n, uint16_t* data) { > uint32_t sum = 0; > for (int i=0; i sum += data[i] * data[i]; > } > return sum; > } > > To remedy the issue a new optab is added, tentatively named > `udot_prod_twoway_optab', whose selection is dependent upon checking > of both input and output types involved in the operation. > > In order to minimize changes to the existing codebase, > `optab_for_tree_code' is renamed `optab_for_tree_code_1' and a new > argument is added to its signature - `const_tree otype', allowing type > information to be specified for both input and output types. The > existing nterface is retained by defining a new `optab_for_tree_code', > which serves as a shim to `optab_for_tree_code_1', passing old > parameters as-is and setting the new `optype' argument to `NULL_TREE'. > > For DOT_PROD_EXPR tree codes, we can call `optab_for_tree_code_1' > directly, passing it both types, adding the internal logic to the > function to distinguish between competing optabs. > > Finally, necessary changes are made to `expand_widen_pattern_expr' to > ensure the new icode can be correctly selected, given the new optab. > Since you are adding an optab, please update the internals manual with the documentation of the optab (the standard pattern names section). Thanks, Andrew > [1] > https://developer.arm.com/documentation/ddi0602/2024-03/SVE-Instructions/UDOT--2-way--vectors---Unsigned-integer-dot-product- > [2] > https://developer.arm.com/documentation/ddi0602/2024-03/SVE-Instructions/SDOT--2-way--vectors---Signed-integer-dot-product- > > gcc/ChangeLog: > > * config/aarch64/aarch64-sve2.md > (@aarch64_sve_dotvnx4sivnx8hi): > renamed to `dot_prod_twoway_vnx8hi'. > * config/aarch64/aarch64-sve-builtins-base.cc (svdot_impl.expand): > update icodes used in line with above rename. > * optabs-tree.cc (optab_for_tree_code_1): Renamed > `optab_for_tree_code' and added new argument. > (optab_for_tree_code): Now a call to `optab_for_tree_code_1'. > * optabs-tree.h (optab_for_tree_code_1): New. > * optabs.cc (expand_widen_pattern_expr): Expand support for > DOT_PROD_EXPR patterns. > * optabs.def (udot_prod_twoway_optab): New. > (sdot_prod_twoway_optab): Likewise. > * tree-vect-patterns.cc (vect_supportable_direct_optab_p): Add > support for misc optabs that use two modes. > > gcc/testsuite/ChangeLog: > > * gcc.dg/vect/vect-dotprod-twoway.c: New. > --- > .../aarch64/aarch64-sve-builtins-base.cc | 4 ++-- > gcc/config/aarch64/aarch64-sve2.md| 2 +- > gcc/optabs-tree.cc| 23 -- > gcc/optabs-tree.h | 2 ++ > gcc/optabs.cc | 2 +- > gcc/optabs.def| 2 ++ > .../gcc.dg/vect/vect-dotprod-twoway.c | 24 +++ > gcc/tree-vect-patterns.cc | 2 +- > 8 files changed, 54 insertions(+), 7 deletions(-) > create mode 100644 gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c > > diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc > b/gcc/config/aarch64/aarch64-sve-builtins-base.cc > index 0d2edf3f19e..e457db09f66 100644 > --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc > +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc > @@ -764,8 +764,8 @@ public: >icode = (e.type_suffix (0).float_p >? CODE_FOR_aarch64_sve_fdotvnx4sfvnx8hf >: e.type_suffix (0).unsigned_p > - ? CODE_FOR_aarch64_sve_udotvnx4sivnx8hi > - : CODE_FOR_aarch64_sve_sdotvnx4sivnx8hi); > + ? CODE_FOR_udot_prod_twoway_vnx8hi > + : CODE_FOR_sdot_prod_twoway_vnx8hi); > return e.use_unpred_insn (icode); >} > }; > diff --git a/gcc/config/aarch64/aarch64-sve2.md > b/gcc/config/aarch64/aarch64-sve2.md > index 934e57055d3..5677de7108d 100644 > --- a/gcc/config/aarch64/aarch64-sve2.md >