Re: [PATCH] aarch64: Fix several *_ashl3 related regressions [PR100056]
Richard Sandiford writes: > Jakub Jelinek writes: >> On Wed, Apr 14, 2021 at 05:31:23PM +0100, Richard Sandiford wrote: >>> > +(define_split >>> > + [(set (match_operand:GPI 0 "register_operand") >>> > + (LOGICAL:GPI >>> > + (and:GPI (ashift:GPI (match_operand:GPI 1 "register_operand") >>> > +(match_operand:QI 2 "aarch64_shift_imm_")) >>> > +(match_operand:GPI 4 "const_int_operand")) >>> > + (zero_extend:GPI (match_operand 3 "register_operand"] >>> > + "can_create_pseudo_p () >>> > + && REG_P (operands[1]) >>> > + && REG_P (operands[3]) >>> > + && REGNO (operands[1]) == REGNO (operands[3]) >>> > + && ((unsigned HOST_WIDE_INT) >>> > + trunc_int_for_mode (GET_MODE_MASK (GET_MODE (operands[3])) >>> > +<< INTVAL (operands[2]), mode) >>> > + == UINTVAL (operands[4]))" >>> >>> IMO this would be easier to understand as: >>> >>>&& (trunc_int_for_mode (GET_MODE_MASK (GET_MODE (operands[3])) >>><< INTVAL (operands[2]), mode) >>>== INTVAL (operands[4])) >>> >>> (At first I thought the cast and UINTVAL were trying to escape the >>> sign-extension canonicalisation.) >> >> It is ok to write it that way, you're right, I wrote it with >> UINTVAL etc. because I initially didn't use trunc_int_for_mode >> but that is wrong for SImode if the mask is shifted into bit 31. >> >>> I'm not sure about this one though. The REGNO checks mean that this is >>> effectively for hard registers only. I thought one of the reasons for >>> make_more_copies was to avoid combining hard registers like this, so I'm >>> not sure we should have a pattern that specifically targets them. >>> >>> Segher, have I misunderstood? >> >> Yes, this one works only with the hard regs, the problem is that when >> the hard regs are there, combiner doesn't try anything else, so without >> such splitter it punts on that. >> If I add yet another testcase which doesn't have hard registers, like: >> unsigned >> or_shift2 (void) >> { >> unsigned char i = 0; >> asm volatile ("" : "+r" (i)); >> return i | (i << 11); >> } >> then my patch doesn't handle that case, and the only splitter that would >> help would need to deal with: >> (set (reg/i:SI 0 x0) >> (ior:SI (and:SI (ashift:SI (subreg:SI (reg:QI 97 [ i ]) 0) >> (const_int 11 [0xb])) >> (const_int 522240 [0x7f800])) >> (zero_extend:SI (reg:QI 97 [ i ] >> I have added another combine splitter for this below. But as you can >> see, what combiner simplification comes with isn't really consistent >> and orthogonal, different operations in there look quite differently :(. > > Hmm, OK. Still, the above looks reasonable on first principles. > >>> These two look good to me apart from the cast nit. The last one feels >>> like it's more general than just sign_extends though. I guess it would >>> work for any duplicated operation that can be performed in a single >>> instruction. >> >> True, but only very small portion of them can actually make it through, >> it needs something that combine has been able to propagate into another >> instruction. So if we know about other insns that would look the same >> and would actually be ever matched, we can e.g. define an operator predicate >> for it, but until we have testcases for that, not sure it is worth it. >> >> Here is an updated patch that handles also the zero extends without hard >> registers and doesn't have the UHWI casts (but untested for now except >> for the testcase): >> >> 2021-04-14 Jakub Jelinek >> >> PR target/100056 >> * config/aarch64/aarch64.md (*_3): >> Add combine splitters for *_ashl3 with >> ZERO_EXTEND, SIGN_EXTEND or AND. >> >> * gcc.target/aarch64/pr100056.c: New test. >> >> --- gcc/config/aarch64/aarch64.md.jj 2021-04-13 20:41:45.030040848 +0200 >> +++ gcc/config/aarch64/aarch64.md2021-04-14 19:07:41.641623978 +0200 >> @@ -4431,6 +4431,75 @@ (define_insn "*_>[(set_attr "type" "logic_shift_imm")] >> ) >> >> +(define_split >> + [(set (match_operand:GPI 0 "register_operand") >> +(LOGICAL:GPI >> + (and:GPI (ashift:GPI (match_operand:GPI 1 "register_operand") >> + (match_operand:QI 2 "aarch64_shift_imm_")) >> + (match_operand:GPI 4 "const_int_operand")) >> + (zero_extend:GPI (match_operand 3 "register_operand"] >> + "can_create_pseudo_p () >> + && REG_P (operands[1]) >> + && REG_P (operands[3]) >> + && REGNO (operands[1]) == REGNO (operands[3]) >> + && (trunc_int_for_mode (GET_MODE_MASK (GET_MODE (operands[3])) >> + << INTVAL (operands[2]), mode) >> + == INTVAL (operands[4]))" >> + [(set (match_dup 4) (zero_extend:GPI (match_dup 3))) >> + (set (match_dup 0) (LOGICAL:GPI (ashift:GPI (match_dup 4) (match_dup 2)) >> + (match_dup 4)))] >> + "operands[4] = gen_reg_rtx (mode);" >> +) >> + >> +(define_split >> + [(set
Re: [PATCH] aarch64: Fix several *_ashl3 related regressions [PR100056]
On Wed, Apr 14, 2021 at 02:47:59PM -0500, Segher Boessenkool wrote: > On Wed, Apr 14, 2021 at 09:45:35PM +0200, Jakub Jelinek wrote: > > On Wed, Apr 14, 2021 at 02:42:54PM -0500, Segher Boessenkool wrote: > > > > provably doesn't (that is from the splitter I wrote for the non-hard > > > > regs), > > > > nor > > > > [(set (match_operand:GPI 0 "register_operand") > > > > (LOGICAL:GPI > > > > (and:GPI (ashift:GPI (match_operand:GPI 1 "register_operand") > > > >(match_operand:QI 2 > > > > "aarch64_shift_imm_")) > > > >(match_operand:GPI 3 "const_int_operand")) > > > > (zero_extend:GPI (subreg (match_dup 1) 0] > > > > works (and it is unclear how I'd find out the mode of the subreg even > > > > if it > > > > worked). > > > > > > Just > > > (subreg:QI (match_dup 1) 0) > > > should work? > > > > That doesn't work either. > > Why not? What goes wrong with that? It just doesn't match and therefore doesn't split it. Jakub
Re: [PATCH] aarch64: Fix several *_ashl3 related regressions [PR100056]
On Wed, Apr 14, 2021 at 09:45:35PM +0200, Jakub Jelinek wrote: > On Wed, Apr 14, 2021 at 02:42:54PM -0500, Segher Boessenkool wrote: > > > provably doesn't (that is from the splitter I wrote for the non-hard > > > regs), > > > nor > > > [(set (match_operand:GPI 0 "register_operand") > > > (LOGICAL:GPI > > > (and:GPI (ashift:GPI (match_operand:GPI 1 "register_operand") > > >(match_operand:QI 2 > > > "aarch64_shift_imm_")) > > >(match_operand:GPI 3 "const_int_operand")) > > > (zero_extend:GPI (subreg (match_dup 1) 0] > > > works (and it is unclear how I'd find out the mode of the subreg even if > > > it > > > worked). > > > > Just > > (subreg:QI (match_dup 1) 0) > > should work? > > That doesn't work either. Why not? What goes wrong with that? Segher
Re: [PATCH] aarch64: Fix several *_ashl3 related regressions [PR100056]
On Wed, Apr 14, 2021 at 09:25:46PM +0200, Jakub Jelinek wrote: > On Wed, Apr 14, 2021 at 01:47:04PM -0500, Segher Boessenkool wrote: > > > and I must say I don't know if make_more_copies was meant to > > > split insn 2 into (set (reg:QI pseudo) (reg:QI 0 x0)) and > > > (set (reg/v:SI 96) (zero_extend:SI (reg:QI pseudo))) > > > or not. > > > > It makes > > > > (set (reg:QI new) (reg:QI x0)) > > (set (reg:SI 96) (zero_extend:SI (reg:QI new))) > > > > The point is it keeps exactly the same form, but no hard regs anymore. > > It doesn't, as make_more_copies does: > rtx dest = SET_DEST (set); > if (!(REG_P (dest) && !HARD_REGISTER_P (dest))) > continue; > > rtx src = SET_SRC (set); > if (!(REG_P (src) && HARD_REGISTER_P (src))) > continue; > but in this case the hard reg is wrapped into the zero_extend already > and so it will continue; Ah, I see. That could/should be improved then. But, GCC 12 :-) Segher
Re: [PATCH] aarch64: Fix several *_ashl3 related regressions [PR100056]
On Wed, Apr 14, 2021 at 02:42:54PM -0500, Segher Boessenkool wrote: > > provably doesn't (that is from the splitter I wrote for the non-hard regs), > > nor > > [(set (match_operand:GPI 0 "register_operand") > > (LOGICAL:GPI > > (and:GPI (ashift:GPI (match_operand:GPI 1 "register_operand") > >(match_operand:QI 2 > > "aarch64_shift_imm_")) > >(match_operand:GPI 3 "const_int_operand")) > > (zero_extend:GPI (subreg (match_dup 1) 0] > > works (and it is unclear how I'd find out the mode of the subreg even if it > > worked). > > Just > (subreg:QI (match_dup 1) 0) > should work? That doesn't work either. Jakub
Re: [PATCH] aarch64: Fix several *_ashl3 related regressions [PR100056]
On Wed, Apr 14, 2021 at 09:20:35PM +0200, Jakub Jelinek wrote: > The question is what to write in the splitter pattern so that > they would match. > [(set (match_operand:GPI 0 "register_operand") > (LOGICAL:GPI > (and:GPI (ashift:GPI (match_operator:GPI 4 "subreg_lowpart_operator" > [(match_operand 1 "register_operand")]) >(match_operand:QI 2 > "aarch64_shift_imm_")) >(match_operand:GPI 3 "const_int_operand")) > (zero_extend:GPI (match_dup 1] > provably doesn't (that is from the splitter I wrote for the non-hard regs), > nor > [(set (match_operand:GPI 0 "register_operand") > (LOGICAL:GPI > (and:GPI (ashift:GPI (match_operand:GPI 1 "register_operand") >(match_operand:QI 2 > "aarch64_shift_imm_")) >(match_operand:GPI 3 "const_int_operand")) > (zero_extend:GPI (subreg (match_dup 1) 0] > works (and it is unclear how I'd find out the mode of the subreg even if it > worked). What seems to work and handles both the pseudos (in which case there is a (subreg:GPI (reg:whatever xyz) 0) and (reg:whatever xyz) in the operands) and the hard registers (in which case there is (reg:GPI xyz) and (reg:whatever xyz) in the operands) is: (define_split [(set (match_operand:GPI 0 "register_operand") (LOGICAL:GPI (and:GPI (ashift:GPI (match_operand:GPI 1 "register_operand") (match_operand:QI 2 "aarch64_shift_imm_")) (match_operand:GPI 3 "const_int_operand")) (zero_extend:GPI (match_operand 4 "register_operand"] "can_create_pseudo_p () && ((paradoxical_subreg_p (operands[1]) && rtx_equal_p (SUBREG_REG (operands[1]), operands[4])) || (REG_P (operands[1]) && REG_P (operands[4]) && REGNO (operands[1]) == REGNO (operands[4]))) && (trunc_int_for_mode (GET_MODE_MASK (GET_MODE (operands[4])) << INTVAL (operands[2]), mode) == INTVAL (operands[3]))" [(set (match_dup 5) (zero_extend:GPI (match_dup 4))) (set (match_dup 0) (LOGICAL:GPI (ashift:GPI (match_dup 5) (match_dup 2)) (match_dup 5)))] "operands[5] = gen_reg_rtx (mode);" ) While it is one pattern, it needs different handling for the two cases. Is that acceptable? Jakub
Re: [PATCH] aarch64: Fix several *_ashl3 related regressions [PR100056]
On Wed, Apr 14, 2021 at 09:20:33PM +0200, Jakub Jelinek wrote: > On Wed, Apr 14, 2021 at 01:47:04PM -0500, Segher Boessenkool wrote: > > A subreg:QI of the match_dup should match fine. You can use a subreg > > wherever GCC tries to match a reg. > > > > > match_dup means insn-recog.c calls rtx_equal_p and that returns false if > > > the > > > mode is not the same. > > > > Yes, but they are the same :-) > > In the end sure. > > > > (reg:SI whatever) and (subreg:QI (reg:SI whatever) 0) > > The question is what to write in the splitter pattern so that > they would match. > [(set (match_operand:GPI 0 "register_operand") > (LOGICAL:GPI > (and:GPI (ashift:GPI (match_operator:GPI 4 "subreg_lowpart_operator" > [(match_operand 1 "register_operand")]) >(match_operand:QI 2 > "aarch64_shift_imm_")) >(match_operand:GPI 3 "const_int_operand")) > (zero_extend:GPI (match_dup 1] > provably doesn't (that is from the splitter I wrote for the non-hard regs), > nor > [(set (match_operand:GPI 0 "register_operand") > (LOGICAL:GPI > (and:GPI (ashift:GPI (match_operand:GPI 1 "register_operand") >(match_operand:QI 2 > "aarch64_shift_imm_")) >(match_operand:GPI 3 "const_int_operand")) > (zero_extend:GPI (subreg (match_dup 1) 0] > works (and it is unclear how I'd find out the mode of the subreg even if it > worked). Just (subreg:QI (match_dup 1) 0) should work? Segher
Re: [PATCH] aarch64: Fix several *_ashl3 related regressions [PR100056]
On Wed, Apr 14, 2021 at 01:47:04PM -0500, Segher Boessenkool wrote: > > and I must say I don't know if make_more_copies was meant to > > split insn 2 into (set (reg:QI pseudo) (reg:QI 0 x0)) and > > (set (reg/v:SI 96) (zero_extend:SI (reg:QI pseudo))) > > or not. > > It makes > > (set (reg:QI new) (reg:QI x0)) > (set (reg:SI 96) (zero_extend:SI (reg:QI new))) > > The point is it keeps exactly the same form, but no hard regs anymore. It doesn't, as make_more_copies does: rtx dest = SET_DEST (set); if (!(REG_P (dest) && !HARD_REGISTER_P (dest))) continue; rtx src = SET_SRC (set); if (!(REG_P (src) && HARD_REGISTER_P (src))) continue; but in this case the hard reg is wrapped into the zero_extend already and so it will continue; Jakub
Re: [PATCH] aarch64: Fix several *_ashl3 related regressions [PR100056]
On Wed, Apr 14, 2021 at 01:47:04PM -0500, Segher Boessenkool wrote: > A subreg:QI of the match_dup should match fine. You can use a subreg > wherever GCC tries to match a reg. > > > match_dup means insn-recog.c calls rtx_equal_p and that returns false if the > > mode is not the same. > > Yes, but they are the same :-) In the end sure. > > (reg:SI whatever) and (subreg:QI (reg:SI whatever) 0) The question is what to write in the splitter pattern so that they would match. [(set (match_operand:GPI 0 "register_operand") (LOGICAL:GPI (and:GPI (ashift:GPI (match_operator:GPI 4 "subreg_lowpart_operator" [(match_operand 1 "register_operand")]) (match_operand:QI 2 "aarch64_shift_imm_")) (match_operand:GPI 3 "const_int_operand")) (zero_extend:GPI (match_dup 1] provably doesn't (that is from the splitter I wrote for the non-hard regs), nor [(set (match_operand:GPI 0 "register_operand") (LOGICAL:GPI (and:GPI (ashift:GPI (match_operand:GPI 1 "register_operand") (match_operand:QI 2 "aarch64_shift_imm_")) (match_operand:GPI 3 "const_int_operand")) (zero_extend:GPI (subreg (match_dup 1) 0] works (and it is unclear how I'd find out the mode of the subreg even if it worked). Jakub
Re: [PATCH] aarch64: Fix several *_ashl3 related regressions [PR100056]
On Wed, Apr 14, 2021 at 08:01:11PM +0200, Jakub Jelinek wrote: > On Wed, Apr 14, 2021 at 12:46:43PM -0500, Segher Boessenkool wrote: > > The REGNO checks work fine for pseudos as well. But, why does it do > > this at all, instead of using match_dup? That should be clearer. > > Because with the hard regs it has different modes, so match_dup > wouldn't work. A subreg:QI of the match_dup should match fine. You can use a subreg wherever GCC tries to match a reg. > match_dup means insn-recog.c calls rtx_equal_p and that returns false if the > mode is not the same. Yes, but they are the same :-) (reg:SI whatever) and (subreg:QI (reg:SI whatever) 0) > Before combine the 3 insns are: > (insn 2 4 3 2 (set (reg/v:SI 96 [ i ]) > (zero_extend:SI (reg:QI 0 x0 [ i ]))) "pr100056.c":10:1 114 > {*zero_extendqisi2_aarch64} > (expr_list:REG_DEAD (reg:QI 0 x0 [ i ]) > (nil))) > and I must say I don't know if make_more_copies was meant to > split insn 2 into (set (reg:QI pseudo) (reg:QI 0 x0)) and > (set (reg/v:SI 96) (zero_extend:SI (reg:QI pseudo))) > or not. It makes (set (reg:QI new) (reg:QI x0)) (set (reg:SI 96) (zero_extend:SI (reg:QI new))) The point is it keeps exactly the same form, but no hard regs anymore. Segher
Re: [PATCH] aarch64: Fix several *_ashl3 related regressions [PR100056]
On Wed, Apr 14, 2021 at 06:55:56PM +0100, Richard Sandiford wrote: > Segher Boessenkool writes: > > The REGNO checks work fine for pseudos as well. But, why does it do > > this at all, instead of using match_dup? That should be clearer. > > The register is appearing in two different modes: GPI for operand 1 > and something smaller than GPI for operand 3 (otherwise the extension > would be ill-formed). That's what makes it specific to hard registers. You write your patterns with subregs for that? That works fine for a mode change on hard regs as well, afaik. > > The point of make_more_copies is that the hard registers from function > > arguments are not pushed down by combine into actual instructions. This > > can be done by RA if it thinks that is a good idea, and not done if it > > thinks it is a bad idea. Having combine usurp part of the register > > allocators role is not a good idea. > > > > There are other reasons hard regs can still end up in RTL insns in > > earlier RTL passes of course, but the other changes that went together > > with make_more_copies stop combine from doing that a lot (the function > > itself makes sure every hard reg is copied to a new pseudo, because > > combining that trivial move (from that new pseudo to the pseudo it was > > copying it to already!) can still be beneficial for other reasons, all > > strange and pretty unhappy, but important on many targets). > > What would your recommendation be for this pattern? Is matching > hard registers a bad idea, or should we go with it? I would just use a subreg, and match_dup. But maybe I am missing something? Segher
Re: [PATCH] aarch64: Fix several *_ashl3 related regressions [PR100056]
On Wed, Apr 14, 2021 at 12:46:43PM -0500, Segher Boessenkool wrote: > The REGNO checks work fine for pseudos as well. But, why does it do > this at all, instead of using match_dup? That should be clearer. Because with the hard regs it has different modes, so match_dup wouldn't work. We are talking here about: Trying 7, 2 -> 8: 7: r98:SI=x0:SI<<0xb&0x7f800 REG_DEAD x0:QI 2: r96:SI=zero_extend(x0:QI) 8: r97:SI=r98:SI|r96:SI REG_DEAD r98:SI REG_DEAD r96:SI Failed to match this instruction: (set (reg:SI 97) (ior:SI (and:SI (ashift:SI (reg:SI 0 x0 [ i ]) (const_int 11 [0xb])) (const_int 522240 [0x7f800])) (zero_extend:SI (reg:QI 0 x0 [ i ] Failed to match this instruction: (set (reg:SI 97) (ior:SI (and:SI (ashift:SI (reg:SI 0 x0 [ i ]) (const_int 11 [0xb])) (const_int 522240 [0x7f800])) (and:SI (reg:SI 0 x0) (const_int 255 [0xff] Splitting with gen_split_28 (aarch64.md:4434) Successfully matched this instruction: (set (reg:SI 99) (zero_extend:SI (reg:QI 0 x0 [ i ]))) Successfully matched this instruction: (set (reg:SI 97) (ior:SI (ashift:SI (reg:SI 99) (const_int 11 [0xb])) (reg:SI 99))) match_dup means insn-recog.c calls rtx_equal_p and that returns false if the mode is not the same. Before combine the 3 insns are: (insn 2 4 3 2 (set (reg/v:SI 96 [ i ]) (zero_extend:SI (reg:QI 0 x0 [ i ]))) "pr100056.c":10:1 114 {*zero_extendqisi2_aarch64} (expr_list:REG_DEAD (reg:QI 0 x0 [ i ]) (nil))) (note 3 2 7 2 NOTE_INSN_FUNCTION_BEG) (insn 7 3 8 2 (set (reg:SI 98) (ashift:SI (reg/v:SI 96 [ i ]) (const_int 11 [0xb]))) "pr100056.c":11:17 691 {*aarch64_ashl_sisd_or_int_si3} (nil)) (insn 8 7 13 2 (set (reg:SI 97) (ior:SI (reg:SI 98) (reg/v:SI 96 [ i ]))) "pr100056.c":11:12 488 {iorsi3} (expr_list:REG_DEAD (reg:SI 98) (expr_list:REG_DEAD (reg/v:SI 96 [ i ]) (nil and I must say I don't know if make_more_copies was meant to split insn 2 into (set (reg:QI pseudo) (reg:QI 0 x0)) and (set (reg/v:SI 96) (zero_extend:SI (reg:QI pseudo))) or not. > The point of make_more_copies is that the hard registers from function > arguments are not pushed down by combine into actual instructions. This > can be done by RA if it thinks that is a good idea, and not done if it > thinks it is a bad idea. Having combine usurp part of the register > allocators role is not a good idea. > > There are other reasons hard regs can still end up in RTL insns in > earlier RTL passes of course, but the other changes that went together > with make_more_copies stop combine from doing that a lot (the function > itself makes sure every hard reg is copied to a new pseudo, because > combining that trivial move (from that new pseudo to the pseudo it was > copying it to already!) can still be beneficial for other reasons, all > strange and pretty unhappy, but important on many targets). Jakub
Re: [PATCH] aarch64: Fix several *_ashl3 related regressions [PR100056]
Segher Boessenkool writes: > On Wed, Apr 14, 2021 at 05:31:23PM +0100, Richard Sandiford wrote: >> Jakub Jelinek writes: >> > +(define_split >> > + [(set (match_operand:GPI 0 "register_operand") >> > + (LOGICAL:GPI >> > +(and:GPI (ashift:GPI (match_operand:GPI 1 "register_operand") >> > + (match_operand:QI 2 "aarch64_shift_imm_")) >> > + (match_operand:GPI 4 "const_int_operand")) >> > +(zero_extend:GPI (match_operand 3 "register_operand"] >> > + "can_create_pseudo_p () >> > + && REG_P (operands[1]) >> > + && REG_P (operands[3]) >> > + && REGNO (operands[1]) == REGNO (operands[3]) >> > + && ((unsigned HOST_WIDE_INT) >> > + trunc_int_for_mode (GET_MODE_MASK (GET_MODE (operands[3])) >> > + << INTVAL (operands[2]), mode) >> > + == UINTVAL (operands[4]))" >> >> IMO this would be easier to understand as: >> >>&& (trunc_int_for_mode (GET_MODE_MASK (GET_MODE (operands[3])) >> << INTVAL (operands[2]), mode) >>== INTVAL (operands[4])) >> >> (At first I thought the cast and UINTVAL were trying to escape the >> sign-extension canonicalisation.) > > Yeah. Or just that with UINTVAL, the implicit conversion is fine (but > maybe that warns?) But INTVAL is simpler for sure. > >> I'm not sure about this one though. The REGNO checks mean that this is >> effectively for hard registers only. I thought one of the reasons for >> make_more_copies was to avoid combining hard registers like this, so I'm >> not sure we should have a pattern that specifically targets them. >> >> Segher, have I misunderstood? > > The REGNO checks work fine for pseudos as well. But, why does it do > this at all, instead of using match_dup? That should be clearer. The register is appearing in two different modes: GPI for operand 1 and something smaller than GPI for operand 3 (otherwise the extension would be ill-formed). That's what makes it specific to hard registers. > The point of make_more_copies is that the hard registers from function > arguments are not pushed down by combine into actual instructions. This > can be done by RA if it thinks that is a good idea, and not done if it > thinks it is a bad idea. Having combine usurp part of the register > allocators role is not a good idea. > > There are other reasons hard regs can still end up in RTL insns in > earlier RTL passes of course, but the other changes that went together > with make_more_copies stop combine from doing that a lot (the function > itself makes sure every hard reg is copied to a new pseudo, because > combining that trivial move (from that new pseudo to the pseudo it was > copying it to already!) can still be beneficial for other reasons, all > strange and pretty unhappy, but important on many targets). What would your recommendation be for this pattern? Is matching hard registers a bad idea, or should we go with it? Thanks, Richard
Re: [PATCH] aarch64: Fix several *_ashl3 related regressions [PR100056]
On Wed, Apr 14, 2021 at 06:42:33PM +0100, Richard Sandiford wrote: > Otherwise this looks good apart from the open question about whether > we should be doing complex combinations involving hard regs. Let's see > what Segher says about that side. It works find with pseudos as well, but then you need to use the same pseudo twice in here. Before RA you cannot know this will be the same hard reg (and it is a bad idea in general to force that). Disallowing forwarding hard regs (from function args) in combine helps quite a bit on average, but there are cases like this one where the balance swings the other way :-( Segher
Re: [PATCH] aarch64: Fix several *_ashl3 related regressions [PR100056]
Hi! On Wed, Apr 14, 2021 at 05:31:23PM +0100, Richard Sandiford wrote: > Jakub Jelinek writes: > > +(define_split > > + [(set (match_operand:GPI 0 "register_operand") > > + (LOGICAL:GPI > > + (and:GPI (ashift:GPI (match_operand:GPI 1 "register_operand") > > + (match_operand:QI 2 "aarch64_shift_imm_")) > > + (match_operand:GPI 4 "const_int_operand")) > > + (zero_extend:GPI (match_operand 3 "register_operand"] > > + "can_create_pseudo_p () > > + && REG_P (operands[1]) > > + && REG_P (operands[3]) > > + && REGNO (operands[1]) == REGNO (operands[3]) > > + && ((unsigned HOST_WIDE_INT) > > + trunc_int_for_mode (GET_MODE_MASK (GET_MODE (operands[3])) > > + << INTVAL (operands[2]), mode) > > + == UINTVAL (operands[4]))" > > IMO this would be easier to understand as: > >&& (trunc_int_for_mode (GET_MODE_MASK (GET_MODE (operands[3])) > << INTVAL (operands[2]), mode) >== INTVAL (operands[4])) > > (At first I thought the cast and UINTVAL were trying to escape the > sign-extension canonicalisation.) Yeah. Or just that with UINTVAL, the implicit conversion is fine (but maybe that warns?) But INTVAL is simpler for sure. > I'm not sure about this one though. The REGNO checks mean that this is > effectively for hard registers only. I thought one of the reasons for > make_more_copies was to avoid combining hard registers like this, so I'm > not sure we should have a pattern that specifically targets them. > > Segher, have I misunderstood? The REGNO checks work fine for pseudos as well. But, why does it do this at all, instead of using match_dup? That should be clearer. The point of make_more_copies is that the hard registers from function arguments are not pushed down by combine into actual instructions. This can be done by RA if it thinks that is a good idea, and not done if it thinks it is a bad idea. Having combine usurp part of the register allocators role is not a good idea. There are other reasons hard regs can still end up in RTL insns in earlier RTL passes of course, but the other changes that went together with make_more_copies stop combine from doing that a lot (the function itself makes sure every hard reg is copied to a new pseudo, because combining that trivial move (from that new pseudo to the pseudo it was copying it to already!) can still be beneficial for other reasons, all strange and pretty unhappy, but important on many targets). Segher
Re: [PATCH] aarch64: Fix several *_ashl3 related regressions [PR100056]
Jakub Jelinek writes: > On Wed, Apr 14, 2021 at 05:31:23PM +0100, Richard Sandiford wrote: >> > +(define_split >> > + [(set (match_operand:GPI 0 "register_operand") >> > + (LOGICAL:GPI >> > +(and:GPI (ashift:GPI (match_operand:GPI 1 "register_operand") >> > + (match_operand:QI 2 "aarch64_shift_imm_")) >> > + (match_operand:GPI 4 "const_int_operand")) >> > +(zero_extend:GPI (match_operand 3 "register_operand"] >> > + "can_create_pseudo_p () >> > + && REG_P (operands[1]) >> > + && REG_P (operands[3]) >> > + && REGNO (operands[1]) == REGNO (operands[3]) >> > + && ((unsigned HOST_WIDE_INT) >> > + trunc_int_for_mode (GET_MODE_MASK (GET_MODE (operands[3])) >> > + << INTVAL (operands[2]), mode) >> > + == UINTVAL (operands[4]))" >> >> IMO this would be easier to understand as: >> >>&& (trunc_int_for_mode (GET_MODE_MASK (GET_MODE (operands[3])) >> << INTVAL (operands[2]), mode) >>== INTVAL (operands[4])) >> >> (At first I thought the cast and UINTVAL were trying to escape the >> sign-extension canonicalisation.) > > It is ok to write it that way, you're right, I wrote it with > UINTVAL etc. because I initially didn't use trunc_int_for_mode > but that is wrong for SImode if the mask is shifted into bit 31. > >> I'm not sure about this one though. The REGNO checks mean that this is >> effectively for hard registers only. I thought one of the reasons for >> make_more_copies was to avoid combining hard registers like this, so I'm >> not sure we should have a pattern that specifically targets them. >> >> Segher, have I misunderstood? > > Yes, this one works only with the hard regs, the problem is that when > the hard regs are there, combiner doesn't try anything else, so without > such splitter it punts on that. > If I add yet another testcase which doesn't have hard registers, like: > unsigned > or_shift2 (void) > { > unsigned char i = 0; > asm volatile ("" : "+r" (i)); > return i | (i << 11); > } > then my patch doesn't handle that case, and the only splitter that would > help would need to deal with: > (set (reg/i:SI 0 x0) > (ior:SI (and:SI (ashift:SI (subreg:SI (reg:QI 97 [ i ]) 0) > (const_int 11 [0xb])) > (const_int 522240 [0x7f800])) > (zero_extend:SI (reg:QI 97 [ i ] > I have added another combine splitter for this below. But as you can > see, what combiner simplification comes with isn't really consistent > and orthogonal, different operations in there look quite differently :(. Hmm, OK. Still, the above looks reasonable on first principles. >> These two look good to me apart from the cast nit. The last one feels >> like it's more general than just sign_extends though. I guess it would >> work for any duplicated operation that can be performed in a single >> instruction. > > True, but only very small portion of them can actually make it through, > it needs something that combine has been able to propagate into another > instruction. So if we know about other insns that would look the same > and would actually be ever matched, we can e.g. define an operator predicate > for it, but until we have testcases for that, not sure it is worth it. > > Here is an updated patch that handles also the zero extends without hard > registers and doesn't have the UHWI casts (but untested for now except > for the testcase): > > 2021-04-14 Jakub Jelinek > > PR target/100056 > * config/aarch64/aarch64.md (*_3): > Add combine splitters for *_ashl3 with > ZERO_EXTEND, SIGN_EXTEND or AND. > > * gcc.target/aarch64/pr100056.c: New test. > > --- gcc/config/aarch64/aarch64.md.jj 2021-04-13 20:41:45.030040848 +0200 > +++ gcc/config/aarch64/aarch64.md 2021-04-14 19:07:41.641623978 +0200 > @@ -4431,6 +4431,75 @@ (define_insn "*_[(set_attr "type" "logic_shift_imm")] > ) > > +(define_split > + [(set (match_operand:GPI 0 "register_operand") > + (LOGICAL:GPI > + (and:GPI (ashift:GPI (match_operand:GPI 1 "register_operand") > +(match_operand:QI 2 "aarch64_shift_imm_")) > +(match_operand:GPI 4 "const_int_operand")) > + (zero_extend:GPI (match_operand 3 "register_operand"] > + "can_create_pseudo_p () > + && REG_P (operands[1]) > + && REG_P (operands[3]) > + && REGNO (operands[1]) == REGNO (operands[3]) > + && (trunc_int_for_mode (GET_MODE_MASK (GET_MODE (operands[3])) > +<< INTVAL (operands[2]), mode) > + == INTVAL (operands[4]))" > + [(set (match_dup 4) (zero_extend:GPI (match_dup 3))) > + (set (match_dup 0) (LOGICAL:GPI (ashift:GPI (match_dup 4) (match_dup 2)) > +(match_dup 4)))] > + "operands[4] = gen_reg_rtx (mode);" > +) > + > +(define_split > + [(set (match_operand:GPI 0 "register_operand") > + (LOGICAL:GPI > + (and:GPI (ashift:GPI (match_operator:GPI 4
Re: [PATCH] aarch64: Fix several *_ashl3 related regressions [PR100056]
On Wed, Apr 14, 2021 at 05:31:23PM +0100, Richard Sandiford wrote: > > +(define_split > > + [(set (match_operand:GPI 0 "register_operand") > > + (LOGICAL:GPI > > + (and:GPI (ashift:GPI (match_operand:GPI 1 "register_operand") > > + (match_operand:QI 2 "aarch64_shift_imm_")) > > + (match_operand:GPI 4 "const_int_operand")) > > + (zero_extend:GPI (match_operand 3 "register_operand"] > > + "can_create_pseudo_p () > > + && REG_P (operands[1]) > > + && REG_P (operands[3]) > > + && REGNO (operands[1]) == REGNO (operands[3]) > > + && ((unsigned HOST_WIDE_INT) > > + trunc_int_for_mode (GET_MODE_MASK (GET_MODE (operands[3])) > > + << INTVAL (operands[2]), mode) > > + == UINTVAL (operands[4]))" > > IMO this would be easier to understand as: > >&& (trunc_int_for_mode (GET_MODE_MASK (GET_MODE (operands[3])) > << INTVAL (operands[2]), mode) >== INTVAL (operands[4])) > > (At first I thought the cast and UINTVAL were trying to escape the > sign-extension canonicalisation.) It is ok to write it that way, you're right, I wrote it with UINTVAL etc. because I initially didn't use trunc_int_for_mode but that is wrong for SImode if the mask is shifted into bit 31. > I'm not sure about this one though. The REGNO checks mean that this is > effectively for hard registers only. I thought one of the reasons for > make_more_copies was to avoid combining hard registers like this, so I'm > not sure we should have a pattern that specifically targets them. > > Segher, have I misunderstood? Yes, this one works only with the hard regs, the problem is that when the hard regs are there, combiner doesn't try anything else, so without such splitter it punts on that. If I add yet another testcase which doesn't have hard registers, like: unsigned or_shift2 (void) { unsigned char i = 0; asm volatile ("" : "+r" (i)); return i | (i << 11); } then my patch doesn't handle that case, and the only splitter that would help would need to deal with: (set (reg/i:SI 0 x0) (ior:SI (and:SI (ashift:SI (subreg:SI (reg:QI 97 [ i ]) 0) (const_int 11 [0xb])) (const_int 522240 [0x7f800])) (zero_extend:SI (reg:QI 97 [ i ] I have added another combine splitter for this below. But as you can see, what combiner simplification comes with isn't really consistent and orthogonal, different operations in there look quite differently :(. > These two look good to me apart from the cast nit. The last one feels > like it's more general than just sign_extends though. I guess it would > work for any duplicated operation that can be performed in a single > instruction. True, but only very small portion of them can actually make it through, it needs something that combine has been able to propagate into another instruction. So if we know about other insns that would look the same and would actually be ever matched, we can e.g. define an operator predicate for it, but until we have testcases for that, not sure it is worth it. Here is an updated patch that handles also the zero extends without hard registers and doesn't have the UHWI casts (but untested for now except for the testcase): 2021-04-14 Jakub Jelinek PR target/100056 * config/aarch64/aarch64.md (*_3): Add combine splitters for *_ashl3 with ZERO_EXTEND, SIGN_EXTEND or AND. * gcc.target/aarch64/pr100056.c: New test. --- gcc/config/aarch64/aarch64.md.jj2021-04-13 20:41:45.030040848 +0200 +++ gcc/config/aarch64/aarch64.md 2021-04-14 19:07:41.641623978 +0200 @@ -4431,6 +4431,75 @@ (define_insn "*_")) + (match_operand:GPI 4 "const_int_operand")) + (zero_extend:GPI (match_operand 3 "register_operand"] + "can_create_pseudo_p () + && REG_P (operands[1]) + && REG_P (operands[3]) + && REGNO (operands[1]) == REGNO (operands[3]) + && (trunc_int_for_mode (GET_MODE_MASK (GET_MODE (operands[3])) + << INTVAL (operands[2]), mode) + == INTVAL (operands[4]))" + [(set (match_dup 4) (zero_extend:GPI (match_dup 3))) + (set (match_dup 0) (LOGICAL:GPI (ashift:GPI (match_dup 4) (match_dup 2)) + (match_dup 4)))] + "operands[4] = gen_reg_rtx (mode);" +) + +(define_split + [(set (match_operand:GPI 0 "register_operand") + (LOGICAL:GPI + (and:GPI (ashift:GPI (match_operator:GPI 4 "subreg_lowpart_operator" +[(match_operand 1 "register_operand")]) + (match_operand:QI 2 "aarch64_shift_imm_")) + (match_operand:GPI 3 "const_int_operand")) + (zero_extend:GPI (match_dup 1] + "can_create_pseudo_p () + && (trunc_int_for_mode (GET_MODE_MASK (GET_MODE (operands[1])) + << INTVAL (operands[2]), mode) + == INTVAL (operands[3]))" + [(set (match_dup 4) (zero_extend:GPI
Re: [PATCH] aarch64: Fix several *_ashl3 related regressions [PR100056]
Jakub Jelinek writes: > Hi! > > Before combiner added 2 to 2 combinations, the following testcase functions > have been all compiled into 2 instructions, zero/sign extensions or and > followed by orr with lsl, e.g. for the first function > Trying 7 -> 8: > 7: r96:SI=r94:SI<<0xb > 8: r95:SI=r96:SI|r94:SI > REG_DEAD r96:SI > REG_DEAD r94:SI > Successfully matched this instruction: > (set (reg:SI 95) > (ior:SI (ashift:SI (reg/v:SI 94 [ i ]) > (const_int 11 [0xb])) > (reg/v:SI 94 [ i ]))) > is the important successful try_combine and so we end up with > and w0, w0, 255 > orr w0, w0, w0, lsl 11 > in the body. > With 2 to 2 combination, before that can trigger, another successful > combination: > Trying 2 -> 7: > 2: r94:SI=zero_extend(x0:QI) > REG_DEAD x0:QI > 7: r96:SI=r94:SI<<0xb > is replaced with: > (set (reg/v:SI 94 [ i ]) > (zero_extend:SI (reg:QI 0 x0 [ i ]))) > and > (set (reg:SI 96) > (and:SI (ashift:SI (reg:SI 0 x0 [ i ]) > (const_int 11 [0xb])) > (const_int 522240 [0x7f800]))) > and in the end results in 3 instructions in the body: > and w1, w0, 255 > ubfiz w0, w0, 11, 8 > orr w0, w0, w1 > The following combine splitters help undo that when combiner tries to > combine 3 instructions - the zero/sign extend or and, the other insn > from the 2 to 2 combination ([us]bfiz) and the logical op, the CPUs > don't have an insn to do everything in one op, but we can split it > back into the zero/sign extend or and followed by logical with lsl. > > Bootstrapped/regtested on aarch64-linux, ok for trunk? > > 2021-04-14 Jakub Jelinek > > PR target/100056 > * config/aarch64/aarch64.md (*_3): > Add combine splitters for *_ashl3 with > ZERO_EXTEND, SIGN_EXTEND or AND. > > * gcc.target/aarch64/pr100056.c: New test. > > --- gcc/config/aarch64/aarch64.md.jj 2021-04-13 12:40:57.0 +0200 > +++ gcc/config/aarch64/aarch64.md 2021-04-13 19:54:17.015764651 +0200 > @@ -4431,6 +4431,59 @@ (define_insn "*_[(set_attr "type" "logic_shift_imm")] > ) > > +(define_split > + [(set (match_operand:GPI 0 "register_operand") > + (LOGICAL:GPI > + (and:GPI (ashift:GPI (match_operand:GPI 1 "register_operand") > +(match_operand:QI 2 "aarch64_shift_imm_")) > +(match_operand:GPI 4 "const_int_operand")) > + (zero_extend:GPI (match_operand 3 "register_operand"] > + "can_create_pseudo_p () > + && REG_P (operands[1]) > + && REG_P (operands[3]) > + && REGNO (operands[1]) == REGNO (operands[3]) > + && ((unsigned HOST_WIDE_INT) > + trunc_int_for_mode (GET_MODE_MASK (GET_MODE (operands[3])) > +<< INTVAL (operands[2]), mode) > + == UINTVAL (operands[4]))" IMO this would be easier to understand as: && (trunc_int_for_mode (GET_MODE_MASK (GET_MODE (operands[3])) << INTVAL (operands[2]), mode) == INTVAL (operands[4])) (At first I thought the cast and UINTVAL were trying to escape the sign-extension canonicalisation.) I'm not sure about this one though. The REGNO checks mean that this is effectively for hard registers only. I thought one of the reasons for make_more_copies was to avoid combining hard registers like this, so I'm not sure we should have a pattern that specifically targets them. Segher, have I misunderstood? > + [(set (match_dup 4) (zero_extend:GPI (match_dup 3))) > + (set (match_dup 0) (LOGICAL:GPI (ashift:GPI (match_dup 4) (match_dup 2)) > +(match_dup 4)))] > + "operands[4] = gen_reg_rtx (mode);" > +) > + > +(define_split > + [(set (match_operand:GPI 0 "register_operand") > + (LOGICAL:GPI > + (and:GPI (ashift:GPI (match_operand:GPI 1 "register_operand") > +(match_operand:QI 2 "aarch64_shift_imm_")) > +(match_operand:GPI 4 "const_int_operand")) > + (and:GPI (match_dup 1) (match_operand:GPI 3 "const_int_operand"] > + "can_create_pseudo_p () > + && pow2_or_zerop (UINTVAL (operands[3]) + 1) > + && ((unsigned HOST_WIDE_INT) > + trunc_int_for_mode (UINTVAL (operands[3]) > +<< INTVAL (operands[2]), mode) > + == UINTVAL (operands[4]))" > + [(set (match_dup 4) (and:GPI (match_dup 1) (match_dup 3))) > + (set (match_dup 0) (LOGICAL:GPI (ashift:GPI (match_dup 4) (match_dup 2)) > +(match_dup 4)))] > + "operands[4] = gen_reg_rtx (mode);" > +) > + > +(define_split > + [(set (match_operand:GPI 0 "register_operand") > + (LOGICAL:GPI > + (ashift:GPI (sign_extend:GPI (match_operand 1 "register_operand")) > + (match_operand:QI 2 "aarch64_shift_imm_")) > + (sign_extend:GPI (match_dup 1] > + "can_create_pseudo_p ()" > + [(set (match_dup 4) (sign_extend:GPI (match_dup 1))) > + (set (match_dup 0)
[PATCH] aarch64: Fix several *_ashl3 related regressions [PR100056]
Hi! Before combiner added 2 to 2 combinations, the following testcase functions have been all compiled into 2 instructions, zero/sign extensions or and followed by orr with lsl, e.g. for the first function Trying 7 -> 8: 7: r96:SI=r94:SI<<0xb 8: r95:SI=r96:SI|r94:SI REG_DEAD r96:SI REG_DEAD r94:SI Successfully matched this instruction: (set (reg:SI 95) (ior:SI (ashift:SI (reg/v:SI 94 [ i ]) (const_int 11 [0xb])) (reg/v:SI 94 [ i ]))) is the important successful try_combine and so we end up with and w0, w0, 255 orr w0, w0, w0, lsl 11 in the body. With 2 to 2 combination, before that can trigger, another successful combination: Trying 2 -> 7: 2: r94:SI=zero_extend(x0:QI) REG_DEAD x0:QI 7: r96:SI=r94:SI<<0xb is replaced with: (set (reg/v:SI 94 [ i ]) (zero_extend:SI (reg:QI 0 x0 [ i ]))) and (set (reg:SI 96) (and:SI (ashift:SI (reg:SI 0 x0 [ i ]) (const_int 11 [0xb])) (const_int 522240 [0x7f800]))) and in the end results in 3 instructions in the body: and w1, w0, 255 ubfiz w0, w0, 11, 8 orr w0, w0, w1 The following combine splitters help undo that when combiner tries to combine 3 instructions - the zero/sign extend or and, the other insn from the 2 to 2 combination ([us]bfiz) and the logical op, the CPUs don't have an insn to do everything in one op, but we can split it back into the zero/sign extend or and followed by logical with lsl. Bootstrapped/regtested on aarch64-linux, ok for trunk? 2021-04-14 Jakub Jelinek PR target/100056 * config/aarch64/aarch64.md (*_3): Add combine splitters for *_ashl3 with ZERO_EXTEND, SIGN_EXTEND or AND. * gcc.target/aarch64/pr100056.c: New test. --- gcc/config/aarch64/aarch64.md.jj2021-04-13 12:40:57.0 +0200 +++ gcc/config/aarch64/aarch64.md 2021-04-13 19:54:17.015764651 +0200 @@ -4431,6 +4431,59 @@ (define_insn "*_")) + (match_operand:GPI 4 "const_int_operand")) + (zero_extend:GPI (match_operand 3 "register_operand"] + "can_create_pseudo_p () + && REG_P (operands[1]) + && REG_P (operands[3]) + && REGNO (operands[1]) == REGNO (operands[3]) + && ((unsigned HOST_WIDE_INT) + trunc_int_for_mode (GET_MODE_MASK (GET_MODE (operands[3])) + << INTVAL (operands[2]), mode) + == UINTVAL (operands[4]))" + [(set (match_dup 4) (zero_extend:GPI (match_dup 3))) + (set (match_dup 0) (LOGICAL:GPI (ashift:GPI (match_dup 4) (match_dup 2)) + (match_dup 4)))] + "operands[4] = gen_reg_rtx (mode);" +) + +(define_split + [(set (match_operand:GPI 0 "register_operand") + (LOGICAL:GPI + (and:GPI (ashift:GPI (match_operand:GPI 1 "register_operand") + (match_operand:QI 2 "aarch64_shift_imm_")) + (match_operand:GPI 4 "const_int_operand")) + (and:GPI (match_dup 1) (match_operand:GPI 3 "const_int_operand"] + "can_create_pseudo_p () + && pow2_or_zerop (UINTVAL (operands[3]) + 1) + && ((unsigned HOST_WIDE_INT) + trunc_int_for_mode (UINTVAL (operands[3]) + << INTVAL (operands[2]), mode) + == UINTVAL (operands[4]))" + [(set (match_dup 4) (and:GPI (match_dup 1) (match_dup 3))) + (set (match_dup 0) (LOGICAL:GPI (ashift:GPI (match_dup 4) (match_dup 2)) + (match_dup 4)))] + "operands[4] = gen_reg_rtx (mode);" +) + +(define_split + [(set (match_operand:GPI 0 "register_operand") + (LOGICAL:GPI + (ashift:GPI (sign_extend:GPI (match_operand 1 "register_operand")) + (match_operand:QI 2 "aarch64_shift_imm_")) + (sign_extend:GPI (match_dup 1] + "can_create_pseudo_p ()" + [(set (match_dup 4) (sign_extend:GPI (match_dup 1))) + (set (match_dup 0) (LOGICAL:GPI (ashift:GPI (match_dup 4) (match_dup 2)) + (match_dup 4)))] + "operands[4] = gen_reg_rtx (mode);" +) + (define_insn "*_rol3" [(set (match_operand:GPI 0 "register_operand" "=r") (LOGICAL:GPI (rotate:GPI --- gcc/testsuite/gcc.target/aarch64/pr100056.c.jj 2021-04-13 14:20:53.334784184 +0200 +++ gcc/testsuite/gcc.target/aarch64/pr100056.c 2021-04-13 19:44:09.358529648 +0200 @@ -0,0 +1,50 @@ +/* PR target/100056 */ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ +/* { dg-final { scan-assembler-not {\t[us]bfiz\tw[0-9]+, w[0-9]+, 11} } } */ + +int +or_shift_u8 (unsigned char i) +{ + return i | (i << 11); +} + +int +or_shift_u3a (unsigned i) +{ + i &= 7; + return i | (i << 11); +} + +int +or_shift_u3b (unsigned i) +{ + i = (i << 29) >> 29; + return i | (i << 11); +} + +int +or_shift_s16 (signed short i) +{ + return i | (i << 11); +} + +int +or_shift_s8 (signed char i) +{ + return i | (i << 11); +} + +int +or_shift_s13 (int i) +{ + i = (i << 19) >> 19; + return i | (i << 11); +} + +int +or_shift_s3