Re: Truncate optimisation question
> Therefore, I think that the best thing to do is to go back to Uros' original > idea of distributing the SUBREG only within a PLUS or a MINUS: > http://gcc.gnu.org/ml/gcc-patches/2012-09/msg01682.html > The outer operation sort of guarantees that operating in this mode is OK for > the target, so the transformation can be done unconditionally. It could be > also done in simplify-rtx.c instead of combine.c. It turns out that it's too late, Uros immediately after installed a couple of patches in the x86 back-end: http://gcc.gnu.org/ml/gcc-cvs/2012-10/msg01032.html http://gcc.gnu.org/ml/gcc-cvs/2012-10/msg01074.html that (very likely) rely on simplify_gen_subreg distributing the SUBREG. OK, at this point I'm going to propose the minimal kludge... -- Eric Botcazou
Re: Truncate optimisation question
Eric Botcazou writes: > Therefore, I think that the best thing to do is to go back to Uros' original > idea of distributing the SUBREG only within a PLUS or a MINUS: > http://gcc.gnu.org/ml/gcc-patches/2012-09/msg01682.html > The outer operation sort of guarantees that operating in this mode is OK for > the target, so the transformation can be done unconditionally. It could be > also done in simplify-rtx.c instead of combine.c. Please do you whatever you think is best. Clearly I screwed this up, sorry. Richard
Re: Truncate optimisation question
> No, the second follow-up was about moving the simplification to combine.c > and guarding it with !WORD_REGISTER_OPERATIONS, or whatever other check > you prefer, so that no undoing is necessary. In fact the mere distribution of the SUBREG seems to be questionable, whatever the target. For the testcase of PR58295, the redundant masking is eliminated because: (insn 9 7 10 2 (set (reg:SI 120) (plus:SI (reg:SI 119) (const_int -48 [0xffd0]))) pr58295.c:8 5 {*arm_addsi3} (expr_list:REG_DEAD (reg:SI 119) (nil))) (insn 10 9 11 2 (set (reg:SI 121) (and:SI (reg:SI 120) (const_int 255 [0xff]))) pr58295.c:8 80 {*arm_andsi3_insn} (expr_list:REG_DEAD (reg:SI 120) (nil))) (insn 11 10 12 2 (set (reg:CC 100 cc) (compare:CC (reg:SI 121) (const_int 9 [0x9]))) pr58295.c:8 227 {*arm_cmpsi_insn} (expr_list:REG_DEAD (reg:SI 121) (nil))) is combined into: (set (reg:CC 100 cc) (compare:CC (plus:SI (reg:SI 119) (const_int -48 [0xffd0])) (const_int 9 [0x9]))) and then split, thanks to: case SUBREG: /* Check for the case where we are comparing A - C1 with C2, that is (subreg:MODE (plus (A) (-C1))) op (C2) with C1 a constant, and try to lift the SUBREG, i.e. to do the comparison in the wider mode. One of the following two conditions must be true in order for this to be valid: 1. The mode extension results in the same bit pattern being added on both sides and the comparison is equality or unsigned. As C2 has been truncated to fit in MODE, the pattern can only be all 0s or all 1s. 2. The mode extension results in the sign bit being copied on each side. The difficulty here is that we have predicates for A but not for (A - C1) so we need to check that C1 is within proper bounds so as to perturbate A as little as possible. */ in simplify_comparison (and the SUBREG is generated from the AND by another case of simplify_comparison). Distributing the SUBREG disables the code. Therefore, I think that the best thing to do is to go back to Uros' original idea of distributing the SUBREG only within a PLUS or a MINUS: http://gcc.gnu.org/ml/gcc-patches/2012-09/msg01682.html The outer operation sort of guarantees that operating in this mode is OK for the target, so the transformation can be done unconditionally. It could be also done in simplify-rtx.c instead of combine.c. -- Eric Botcazou
Re: Truncate optimisation question
Eric Botcazou writes: >> But that's the problem with trying to do the optimisation in this way. >> We first simplify a truncation of an SImode addition X. Then we simplify >> a zero extension of that truncation. Then we have the opportunity to >> realise that the zero extension wasn't necessary after all, so we actually >> want to undo both simplifications and go back to the original addition >> pattern. So undoing the simplifications is precisely what we're aiming for >> here, again regardless of RISCness. > > In other words, you want to add the reverse transformation in combine.c to > undo the effects of the simplication in simplify-rtx.c? And move the latter > simplification back to combine.c in the process? No, the second follow-up was about moving the simplification to combine.c and guarding it with !WORD_REGISTER_OPERATIONS, or whatever other check you prefer, so that no undoing is necessary. To be clear, that isn't what I want. :-) It just seems less bad than guarding a simplify-rtx.c operation based on the target. What I really want is to do redundancy elimination as a separate pass, without tying it combine and without doing it the way that combine does. But it's not like I've got time to do that. The quote above was just me griping about the way that, although combine is asking simplify-rtx.c to simplify i1-combined-with-i2, it's actually relying on the simplifications being no-ops, so that the original i1 is still available when i1-with-i2 is further combined into i3 and the redundancy of i2 becomes known, giving i1-with-i3. It just seems like a fragile way to do redundancy elimination. Thanks, Richard
Re: Truncate optimisation question
> But that's the problem with trying to do the optimisation in this way. > We first simplify a truncation of an SImode addition X. Then we simplify > a zero extension of that truncation. Then we have the opportunity to > realise that the zero extension wasn't necessary after all, so we actually > want to undo both simplifications and go back to the original addition > pattern. So undoing the simplifications is precisely what we're aiming for > here, again regardless of RISCness. In other words, you want to add the reverse transformation in combine.c to undo the effects of the simplication in simplify-rtx.c? And move the latter simplification back to combine.c in the process? -- Eric Botcazou
Re: Truncate optimisation question
On Fri, Dec 6, 2013 at 9:42 AM, Richard Sandiford wrote: > Eric Botcazou writes: >>> Well, I think making the simplify-rtx code conditional on the target >>> would be the wrong way to go. If we really can't live with it being >>> unconditional then I think we should revert it. But like I say I think >>> it would be better to make combine recognise the redundancy even with >>> the new form. (Or as I say, longer term, not to rely on combine to >>> eliminate redundant extensions.) But I don't have time to do that myself... >> >> It helps x86 so we won't revert it. My fear is that we'll need to add code >> in >> other places to RISCify back the result of this "simplification". > > Sorry, realised I didn't respond to this yesterday. I wasn't suggesting > we just revert and walk away. ISTR the original suggestion was to patch > combine instead of simplify-rtx.c, so we could back to that. I think that looks most sensible. Richard. > Thanks, > Richard
Re: Truncate optimisation question
Eric Botcazou writes: >> Well, I think making the simplify-rtx code conditional on the target >> would be the wrong way to go. If we really can't live with it being >> unconditional then I think we should revert it. But like I say I think >> it would be better to make combine recognise the redundancy even with >> the new form. (Or as I say, longer term, not to rely on combine to >> eliminate redundant extensions.) But I don't have time to do that myself... > > It helps x86 so we won't revert it. My fear is that we'll need to add code > in > other places to RISCify back the result of this "simplification". Sorry, realised I didn't respond to this yesterday. I wasn't suggesting we just revert and walk away. ISTR the original suggestion was to patch combine instead of simplify-rtx.c, so we could back to that. Thanks, Richard
Re: Truncate optimisation question
Eric Botcazou writes: >> The comment says that we're trying to match: >> >> 1. (set (reg:SI) (zero_extend:SI (plus:QI (mem:QI) (const_int >> 2. (set (reg:QI) (plus:QI (mem:QI) (const_int))) >> 3. (set (reg:QI) (plus:QI (subreg:QI) (const_int))) >> 4. (set (reg:CC) (compare:CC (subreg:QI) (const_int))) >> 5. (set (reg:CC) (compare:CC (plus:QI (mem:QI) (const_int >> 6. (set (reg:SI) (leu:SI (subreg:QI) (const_int))) >> 7. (set (reg:SI) (leu:SI (subreg:QI) (const_int))) >> 8. (set (reg:SI) (leu:SI (plus:QI ...))) >> >> And I think that's what we should be matching in cases where the >> extension isn't redundant, even on RISC targets. > > Which one(s) exactly? Most of the RISC targets we have are parameterized > (WORD_REGISTER_OPERATIONS, PROMOTE_MODE, etc) to avoid operations in modes > smaller than the word mode. The first one, sorry. >> The problem here isn't really about which mode is on the plus, >> but whether we recognise that the extension instruction is redundant. >> I.e. we start with: >> >> (insn 9 8 10 2 (set (reg:SI 120) >> (plus:SI (subreg:SI (reg:QI 118) 0) >> (const_int -48 [0xffd0]))) test.c:6 -1 >> (nil)) >> (insn 10 9 11 2 (set (reg:SI 121) >> (and:SI (reg:SI 120) >> (const_int 255 [0xff]))) test.c:6 -1 >> (nil)) >> (insn 11 10 12 2 (set (reg:CC 100 cc) >> (compare:CC (reg:SI 121) >> (const_int 9 [0x9]))) test.c:6 -1 >> (nil)) >> >> and what we want combine to do is to recognise that insn 10 is redundant >> and reduce the sequence to: >> >> (insn 9 8 10 2 (set (reg:SI 120) >> (plus:SI (subreg:SI (reg:QI 118) 0) >> (const_int -48 [0xffd0]))) test.c:6 -1 >> (nil)) >> (insn 11 10 12 2 (set (reg:CC 100 cc) >> (compare:CC (reg:SI 120) >> (const_int 9 [0x9]))) test.c:6 -1 >> (nil)) >> >> But insn 11 is redundant on all targets, not just RISC ones. >> It isn't about whether the target has a QImode addition or not. > > That's theoritical though since, on x86 for example, the redundant > instruction > isn't even generated because of the QImode addition... Not for this testcase, sure, but we use an SImode addition and keep the equivalent redundant extension until combine for: int foo (unsigned char *x) { return (((unsigned int) *x - 48) & 0xff) < 10; } immediately before combine: (insn 7 6 8 2 (parallel [ (set (reg:SI 93 [ D.1753 ]) (plus:SI (reg:SI 92 [ D.1753 ]) (const_int -48 [0xffd0]))) (clobber (reg:CC 17 flags)) ]) /tmp/foo.c:3 261 {*addsi_1} (expr_list:REG_DEAD (reg:SI 92 [ D.1753 ]) (expr_list:REG_UNUSED (reg:CC 17 flags) (nil (insn 8 7 9 2 (set (reg:SI 94 [ D.1753 ]) (zero_extend:SI (subreg:QI (reg:SI 93 [ D.1753 ]) 0))) /tmp/foo.c:3 133 {*zero_extendqisi2} (expr_list:REG_DEAD (reg:SI 93 [ D.1753 ]) (nil))) (insn 9 8 10 2 (set (reg:CC 17 flags) (compare:CC (reg:SI 94 [ D.1753 ]) (const_int 9 [0x9]))) /tmp/foo.c:3 7 {*cmpsi_1} (expr_list:REG_DEAD (reg:SI 94 [ D.1753 ]) (nil))) What saves us isn't QImode addition but QImode comparison: combine: (insn 7 6 8 2 (parallel [ (set (reg:SI 93 [ D.1753 ]) (plus:SI (reg:SI 92 [ D.1753 ]) (const_int -48 [0xffd0]))) (clobber (reg:CC 17 flags)) ]) /tmp/foo.c:3 261 {*addsi_1} (expr_list:REG_DEAD (reg:SI 92 [ D.1753 ]) (expr_list:REG_UNUSED (reg:CC 17 flags) (nil (note 8 7 9 2 NOTE_INSN_DELETED) (insn 9 8 10 2 (set (reg:CC 17 flags) (compare:CC (subreg:QI (reg:SI 93 [ D.1753 ]) 0) (const_int 9 [0x9]))) /tmp/foo.c:3 5 {*cmpqi_1} (expr_list:REG_DEAD (reg:SI 93 [ D.1753 ]) (nil))) movzbl (%rdi), %eax subl$48, %eax cmpb$9, %al setbe %al movzbl %al, %eax ret (The patch didn't affect things here.) FWIW, change the testcase to: int foo (unsigned char *x) { return (((unsigned int) *x - 48) & 0x1ff) < 10; } and we keep the redundant AND, again regardless of whether the patch is applied. >> Well, I think making the simplify-rtx code conditional on the target >> would be the wrong way to go. If we really can't live with it being >> unconditional then I think we should revert it. But like I say I think >> it would be better to make combine recognise the redundancy even with >> the new form. (Or as I say, longer term, not to rely on combine to >> eliminate redundant extensions.) But I don't have time to do that myself... > > It helps x86 so we won't revert it. My fear is that we'll need to add > code in other places to RISCify back the result of this > "simplification". But that's the problem with trying to do the optimisation in this way. We first simplify a truncation of an SImode addition
Re: Truncate optimisation question
> The comment says that we're trying to match: > > 1. (set (reg:SI) (zero_extend:SI (plus:QI (mem:QI) (const_int > 2. (set (reg:QI) (plus:QI (mem:QI) (const_int))) > 3. (set (reg:QI) (plus:QI (subreg:QI) (const_int))) > 4. (set (reg:CC) (compare:CC (subreg:QI) (const_int))) > 5. (set (reg:CC) (compare:CC (plus:QI (mem:QI) (const_int > 6. (set (reg:SI) (leu:SI (subreg:QI) (const_int))) > 7. (set (reg:SI) (leu:SI (subreg:QI) (const_int))) > 8. (set (reg:SI) (leu:SI (plus:QI ...))) > > And I think that's what we should be matching in cases where the > extension isn't redundant, even on RISC targets. Which one(s) exactly? Most of the RISC targets we have are parameterized (WORD_REGISTER_OPERATIONS, PROMOTE_MODE, etc) to avoid operations in modes smaller than the word mode. > The problem here isn't really about which mode is on the plus, > but whether we recognise that the extension instruction is redundant. > I.e. we start with: > > (insn 9 8 10 2 (set (reg:SI 120) > (plus:SI (subreg:SI (reg:QI 118) 0) > (const_int -48 [0xffd0]))) test.c:6 -1 > (nil)) > (insn 10 9 11 2 (set (reg:SI 121) > (and:SI (reg:SI 120) > (const_int 255 [0xff]))) test.c:6 -1 > (nil)) > (insn 11 10 12 2 (set (reg:CC 100 cc) > (compare:CC (reg:SI 121) > (const_int 9 [0x9]))) test.c:6 -1 > (nil)) > > and what we want combine to do is to recognise that insn 10 is redundant > and reduce the sequence to: > > (insn 9 8 10 2 (set (reg:SI 120) > (plus:SI (subreg:SI (reg:QI 118) 0) > (const_int -48 [0xffd0]))) test.c:6 -1 > (nil)) > (insn 11 10 12 2 (set (reg:CC 100 cc) > (compare:CC (reg:SI 120) > (const_int 9 [0x9]))) test.c:6 -1 > (nil)) > > But insn 11 is redundant on all targets, not just RISC ones. > It isn't about whether the target has a QImode addition or not. That's theoritical though since, on x86 for example, the redundant instruction isn't even generated because of the QImode addition... > Well, I think making the simplify-rtx code conditional on the target > would be the wrong way to go. If we really can't live with it being > unconditional then I think we should revert it. But like I say I think > it would be better to make combine recognise the redundancy even with > the new form. (Or as I say, longer term, not to rely on combine to > eliminate redundant extensions.) But I don't have time to do that myself... It helps x86 so we won't revert it. My fear is that we'll need to add code in other places to RISCify back the result of this "simplification". -- Eric Botcazou
Re: Truncate optimisation question
Eric Botcazou writes: >> Combine is asking simplify-rtx.c to truncate an addition to QImode >> and simplify-rtx.c is providing a reasonable representation of that. >> It's the representation we should use when matching against .md patterns, >> for example. The problem is that combine doesn't want to keep the >> truncation in this case, but doesn't know that yet. > > I disagree, I don't find it reasonable to turn an addition in SImode into an > addition in QImode when the machine supports the former but not the latter. > I agree that it may help in some contexts, but then the transformation should > be restricted to these contexts. > >> Right, but the only complaint I know of is about its effect on combine. >> And my point is that that complaint isn't about combine failing to combine >> instructions per se. It's that combine is failing to remove a redundant >> operation. With the right input, the same rtl sequence could conceivably >> be generated on a CISC target like x86_64, since it defines all the required >> patterns (SImode addition, QI->SI zero extension, SImode comparison). It >> could also occur with a sequence that starts out as a QImode addition. So >> trying to make the simplification depend on CISCness seems like papering >> over the problem. > > The problem is that this changes the combinations tried by the combiner in a > way that is adverse to most RISC targets. Sure, we could change all the > affected back-ends but what's the point exactly? What do we gain here? > > Look for example at comment #4 in PR rtl-optimization/58295. The comment says that we're trying to match: 1. (set (reg:SI) (zero_extend:SI (plus:QI (mem:QI) (const_int 2. (set (reg:QI) (plus:QI (mem:QI) (const_int))) 3. (set (reg:QI) (plus:QI (subreg:QI) (const_int))) 4. (set (reg:CC) (compare:CC (subreg:QI) (const_int))) 5. (set (reg:CC) (compare:CC (plus:QI (mem:QI) (const_int 6. (set (reg:SI) (leu:SI (subreg:QI) (const_int))) 7. (set (reg:SI) (leu:SI (subreg:QI) (const_int))) 8. (set (reg:SI) (leu:SI (plus:QI ...))) And I think that's what we should be matching in cases where the extension isn't redundant, even on RISC targets. The problem here isn't really about which mode is on the plus, but whether we recognise that the extension instruction is redundant. I.e. we start with: (insn 9 8 10 2 (set (reg:SI 120) (plus:SI (subreg:SI (reg:QI 118) 0) (const_int -48 [0xffd0]))) test.c:6 -1 (nil)) (insn 10 9 11 2 (set (reg:SI 121) (and:SI (reg:SI 120) (const_int 255 [0xff]))) test.c:6 -1 (nil)) (insn 11 10 12 2 (set (reg:CC 100 cc) (compare:CC (reg:SI 121) (const_int 9 [0x9]))) test.c:6 -1 (nil)) and what we want combine to do is to recognise that insn 10 is redundant and reduce the sequence to: (insn 9 8 10 2 (set (reg:SI 120) (plus:SI (subreg:SI (reg:QI 118) 0) (const_int -48 [0xffd0]))) test.c:6 -1 (nil)) (insn 11 10 12 2 (set (reg:CC 100 cc) (compare:CC (reg:SI 120) (const_int 9 [0x9]))) test.c:6 -1 (nil)) But insn 11 is redundant on all targets, not just RISC ones. It isn't about whether the target has a QImode addition or not. >> If you think the patch was wrong or if you feel the fallout is too great >> then please feel free to revert it. > > I think that the fallout is too great for RISC targets, yes, so I'm trying to > find a reasonable compromise. Well, I think making the simplify-rtx code conditional on the target would be the wrong way to go. If we really can't live with it being unconditional then I think we should revert it. But like I say I think it would be better to make combine recognise the redundancy even with the new form. (Or as I say, longer term, not to rely on combine to eliminate redundant extensions.) But I don't have time to do that myself... Thanks, Richard
Re: Truncate optimisation question
> Combine is asking simplify-rtx.c to truncate an addition to QImode > and simplify-rtx.c is providing a reasonable representation of that. > It's the representation we should use when matching against .md patterns, > for example. The problem is that combine doesn't want to keep the > truncation in this case, but doesn't know that yet. I disagree, I don't find it reasonable to turn an addition in SImode into an addition in QImode when the machine supports the former but not the latter. I agree that it may help in some contexts, but then the transformation should be restricted to these contexts. > Right, but the only complaint I know of is about its effect on combine. > And my point is that that complaint isn't about combine failing to combine > instructions per se. It's that combine is failing to remove a redundant > operation. With the right input, the same rtl sequence could conceivably > be generated on a CISC target like x86_64, since it defines all the required > patterns (SImode addition, QI->SI zero extension, SImode comparison). It > could also occur with a sequence that starts out as a QImode addition. So > trying to make the simplification depend on CISCness seems like papering > over the problem. The problem is that this changes the combinations tried by the combiner in a way that is adverse to most RISC targets. Sure, we could change all the affected back-ends but what's the point exactly? What do we gain here? Look for example at comment #4 in PR rtl-optimization/58295. > If you think the patch was wrong or if you feel the fallout is too great > then please feel free to revert it. I think that the fallout is too great for RISC targets, yes, so I'm trying to find a reasonable compromise. -- Eric Botcazou
Re: Truncate optimisation question
Eric Botcazou writes: >> I don't think this is the way to go. AIUI the problem here isn't that >> RISC architectures don't have QImode adds as such. If we were going >> to combine insn 6 and insn 7 _in isolation_ then we would have either: >> >>(zero_extend:SI (subreg:QI (plus:SI (subreg:QI (reg:SI R)) >>(const_int X >> >> before the patch or: >> >>(zero_extend:SI (plus:QI (reg:QI R) >> (const_int X))) >> >> after the patch. And IMO the second form is a simplification over the >> first. (Despite being a RISC port, MIPS Octeon does have a pattern for >> zero-extending byte addition, so this isn't entirely academic.) > > Well, if you look at the transformation in isolation, you cannot > reasonably say that it's a simplification for most RISC architectures > either. That it happens to help zero-extensions on MIPS is fine, but > it pessimizes other cases on other architectures (and maybe on MIPS as > well). I think we're talking about different meanings of "simplification". simplify-rtx.c can't and IMO shouldn't be making decisions about simplicity in the sense of "X is cheaper on the target than Y" or even "the target has an optab for X but not Y". But it should try wherever possible to (a) allow recursive simplification and (b) reduce the number of ways that the same thing can be represented. And the patch did that. E.g. the old representation quoted above allowed the constant to have redundant upper bits. The new representation causes them to be removed, so we have a predictable constant. Combine is asking simplify-rtx.c to truncate an addition to QImode and simplify-rtx.c is providing a reasonable representation of that. It's the representation we should use when matching against .md patterns, for example. The problem is that combine doesn't want to keep the truncation in this case, but doesn't know that yet. >> It sounds like the problem is instead that we're relying on combine >> to remove redundant zero extensions and that combine's no longer able >> to do that when folding insns 6 and 7 into the comparison. Is that right? >> I think it would better to use a dedicated global pass to remove redundant >> extensions instead. IIRC there were various attempts to do that. > > The regressions affect 4.8 and mainline and it's probably too late to add new > passes in order to mitigate them. > >> IMO combine should just be about instruction selection. The patch >> still seems good to me from that POV. > > The patch is in simplify-rtx.c though, not in combine.c, so it's more general. Right, but the only complaint I know of is about its effect on combine. And my point is that that complaint isn't about combine failing to combine instructions per se. It's that combine is failing to remove a redundant operation. With the right input, the same rtl sequence could conceivably be generated on a CISC target like x86_64, since it defines all the required patterns (SImode addition, QI->SI zero extension, SImode comparison). It could also occur with a sequence that starts out as a QImode addition. So trying to make the simplification depend on CISCness seems like papering over the problem. If we want to keep this as a combine optimisation for 4.9 then I think we should teach it to handle zero-extended arithmetic too. But there again, I stood down as an rtl maintainer for a reason. :-) If you think the patch was wrong or if you feel the fallout is too great then please feel free to revert it. Thanks, Richard
Re: Truncate optimisation question
> To me promote_mode sounds like the best fit. But doesn't combine do > instruction validation? So in this case the target claims to support the > narrow operation? Part of the problem is that it's not in the combiner, it's in simplify-rtx.c, so it's applied liberally when you're manipulating the RTL. -- Eric Botcazou
Re: Truncate optimisation question
> I don't think this is the way to go. AIUI the problem here isn't that > RISC architectures don't have QImode adds as such. If we were going > to combine insn 6 and insn 7 _in isolation_ then we would have either: > >(zero_extend:SI (subreg:QI (plus:SI (subreg:QI (reg:SI R)) >(const_int X > > before the patch or: > >(zero_extend:SI (plus:QI (reg:QI R) > (const_int X))) > > after the patch. And IMO the second form is a simplification over the > first. (Despite being a RISC port, MIPS Octeon does have a pattern for > zero-extending byte addition, so this isn't entirely academic.) Well, if you look at the transformation in isolation, you cannot reasonably say that it's a simplification for most RISC architectures either. That it happens to help zero-extensions on MIPS is fine, but it pessimizes other cases on other architectures (and maybe on MIPS as well). > It sounds like the problem is instead that we're relying on combine > to remove redundant zero extensions and that combine's no longer able > to do that when folding insns 6 and 7 into the comparison. Is that right? > I think it would better to use a dedicated global pass to remove redundant > extensions instead. IIRC there were various attempts to do that. The regressions affect 4.8 and mainline and it's probably too late to add new passes in order to mitigate them. > IMO combine should just be about instruction selection. The patch > still seems good to me from that POV. The patch is in simplify-rtx.c though, not in combine.c, so it's more general. -- Eric Botcazou
Re: Truncate optimisation question
Eric Botcazou wrote: >> For code: >> >> unsigned char foo (unsigned char c) >> { >>return (c >= '0') && (c <= '9'); >> } >> >> we end up generating: >> >> sub r0, r0, #48 >> uxtbr0, r0 >> cmp r0, #9 >> movhi r0, #0 >> movls r0, #1 >> bx lr >> >> The extra uxtb (extend) is causing the test to fail. We started >generating >> the extra extend when a particular optimisation went in with >(revision >> r191928). > >That's PR rtl-optimization/58295. We have pessimizations on the SPARC >too. > >> The comment in simplify-rtx.c says it transforms >> (truncate:SI (op:DI (x:DI) (y:DI))) >> >> into >> >> (op:SI (truncate:SI (x:DI)) (truncate:SI (x:DI))) >> >> but from what I can see it also transforms >> >> (truncate:QI (op:SI (x:SI) (y:SI))) >> >> into >> >> (op:QI (truncate:QI (x:SI)) (truncate:QI (x:SI))) >> >> From the combine dump I see that the sub and extend operations come >from >> the RTL: >> >> (insn 6 3 7 2 (set (reg:SI 116) >> (plus:SI (reg:SI 0 r0 [ c ]) >> (const_int -48 [0xffd0]))) >> >> (insn 7 6 8 2 (set (reg:SI 117) >> (zero_extend:SI (subreg:QI (reg:SI 116) 0))) >> >> >> If I add a QImode compare pattern to the arm backend it gets matched >and the >> extra extend goes away, but it seems to me that that's not the >correct >> solution. Ideally, if a QImode operation is performed as an SImode >> operation on a target (like the sub and compare operations on arm) >then we >> should not be doing this optimisation? > >Yes, that's my opinion as well, but Richard (CCed) seemed to disagree. >In any >case, that's certainly not a simplification. > >> My question is, how does one express that information in the >simplify-rtx.c >> code? It seems that the PR that optimisation fixed (54457) only cared >about >> DI -> SI truncations, so perhaps we should disable it for conversions >> between other modes where it's not beneficial altogether? > >I can think of 3 possible solutions: > - WORD_REGISTER_OPERATIONS, > - promote_mode, > - optabs. > >The 3rd solution seems to be the most straightforward, but this would >be the >first time that we test optabs from simplify-rtx.c. To me promote_mode sounds like the best fit. But doesn't combine do instruction validation? So in this case the target claims to support the narrow operation? Richard.
Re: Truncate optimisation question
Eric Botcazou writes: >> For code: >> >> unsigned char foo (unsigned char c) >> { >>return (c >= '0') && (c <= '9'); >> } >> >> we end up generating: >> >> sub r0, r0, #48 >> uxtbr0, r0 >> cmp r0, #9 >> movhi r0, #0 >> movls r0, #1 >> bx lr >> >> The extra uxtb (extend) is causing the test to fail. We started generating >> the extra extend when a particular optimisation went in with (revision >> r191928). > > That's PR rtl-optimization/58295. We have pessimizations on the SPARC too. > >> The comment in simplify-rtx.c says it transforms >> (truncate:SI (op:DI (x:DI) (y:DI))) >> >> into >> >> (op:SI (truncate:SI (x:DI)) (truncate:SI (x:DI))) >> >> but from what I can see it also transforms >> >> (truncate:QI (op:SI (x:SI) (y:SI))) >> >> into >> >> (op:QI (truncate:QI (x:SI)) (truncate:QI (x:SI))) >> >> From the combine dump I see that the sub and extend operations come from >> the RTL: >> >> (insn 6 3 7 2 (set (reg:SI 116) >> (plus:SI (reg:SI 0 r0 [ c ]) >> (const_int -48 [0xffd0]))) >> >> (insn 7 6 8 2 (set (reg:SI 117) >> (zero_extend:SI (subreg:QI (reg:SI 116) 0))) >> >> >> If I add a QImode compare pattern to the arm backend it gets matched and the >> extra extend goes away, but it seems to me that that's not the correct >> solution. Ideally, if a QImode operation is performed as an SImode >> operation on a target (like the sub and compare operations on arm) then we >> should not be doing this optimisation? > > Yes, that's my opinion as well, but Richard (CCed) seemed to disagree. In > any > case, that's certainly not a simplification. > >> My question is, how does one express that information in the simplify-rtx.c >> code? It seems that the PR that optimisation fixed (54457) only cared about >> DI -> SI truncations, so perhaps we should disable it for conversions >> between other modes where it's not beneficial altogether? > > I can think of 3 possible solutions: > - WORD_REGISTER_OPERATIONS, > - promote_mode, > - optabs. > > The 3rd solution seems to be the most straightforward, but this would be the > first time that we test optabs from simplify-rtx.c. I don't think this is the way to go. AIUI the problem here isn't that RISC architectures don't have QImode adds as such. If we were going to combine insn 6 and insn 7 _in isolation_ then we would have either: (zero_extend:SI (subreg:QI (plus:SI (subreg:QI (reg:SI R)) (const_int X before the patch or: (zero_extend:SI (plus:QI (reg:QI R) (const_int X))) after the patch. And IMO the second form is a simplification over the first. (Despite being a RISC port, MIPS Octeon does have a pattern for zero-extending byte addition, so this isn't entirely academic.) It sounds like the problem is instead that we're relying on combine to remove redundant zero extensions and that combine's no longer able to do that when folding insns 6 and 7 into the comparison. Is that right? I think it would better to use a dedicated global pass to remove redundant extensions instead. IIRC there were various attempts to do that. IMO combine should just be about instruction selection. The patch still seems good to me from that POV. Thanks, Richard
Re: Truncate optimisation question
> For code: > > unsigned char foo (unsigned char c) > { >return (c >= '0') && (c <= '9'); > } > > we end up generating: > > sub r0, r0, #48 > uxtbr0, r0 > cmp r0, #9 > movhi r0, #0 > movls r0, #1 > bx lr > > The extra uxtb (extend) is causing the test to fail. We started generating > the extra extend when a particular optimisation went in with (revision > r191928). That's PR rtl-optimization/58295. We have pessimizations on the SPARC too. > The comment in simplify-rtx.c says it transforms > (truncate:SI (op:DI (x:DI) (y:DI))) > > into > > (op:SI (truncate:SI (x:DI)) (truncate:SI (x:DI))) > > but from what I can see it also transforms > > (truncate:QI (op:SI (x:SI) (y:SI))) > > into > > (op:QI (truncate:QI (x:SI)) (truncate:QI (x:SI))) > > From the combine dump I see that the sub and extend operations come from > the RTL: > > (insn 6 3 7 2 (set (reg:SI 116) > (plus:SI (reg:SI 0 r0 [ c ]) > (const_int -48 [0xffd0]))) > > (insn 7 6 8 2 (set (reg:SI 117) > (zero_extend:SI (subreg:QI (reg:SI 116) 0))) > > > If I add a QImode compare pattern to the arm backend it gets matched and the > extra extend goes away, but it seems to me that that's not the correct > solution. Ideally, if a QImode operation is performed as an SImode > operation on a target (like the sub and compare operations on arm) then we > should not be doing this optimisation? Yes, that's my opinion as well, but Richard (CCed) seemed to disagree. In any case, that's certainly not a simplification. > My question is, how does one express that information in the simplify-rtx.c > code? It seems that the PR that optimisation fixed (54457) only cared about > DI -> SI truncations, so perhaps we should disable it for conversions > between other modes where it's not beneficial altogether? I can think of 3 possible solutions: - WORD_REGISTER_OPERATIONS, - promote_mode, - optabs. The 3rd solution seems to be the most straightforward, but this would be the first time that we test optabs from simplify-rtx.c. -- Eric Botcazou
Truncate optimisation question
Hi all, I'm investigating a testsuite failure on arm: gcc.target/arm/unsigned-extend-1.c For code: unsigned char foo (unsigned char c) { return (c >= '0') && (c <= '9'); } we end up generating: sub r0, r0, #48 uxtbr0, r0 cmp r0, #9 movhi r0, #0 movls r0, #1 bx lr The extra uxtb (extend) is causing the test to fail. We started generating the extra extend when a particular optimisation went in with (revision r191928). The comment in simplify-rtx.c says it transforms (truncate:SI (op:DI (x:DI) (y:DI))) into (op:SI (truncate:SI (x:DI)) (truncate:SI (x:DI))) but from what I can see it also transforms (truncate:QI (op:SI (x:SI) (y:SI))) into (op:QI (truncate:QI (x:SI)) (truncate:QI (x:SI))) From the combine dump I see that the sub and extend operations come from the RTL: (insn 6 3 7 2 (set (reg:SI 116) (plus:SI (reg:SI 0 r0 [ c ]) (const_int -48 [0xffd0]))) (insn 7 6 8 2 (set (reg:SI 117) (zero_extend:SI (subreg:QI (reg:SI 116) 0))) If I add a QImode compare pattern to the arm backend it gets matched and the extra extend goes away, but it seems to me that that's not the correct solution. Ideally, if a QImode operation is performed as an SImode operation on a target (like the sub and compare operations on arm) then we should not be doing this optimisation? My question is, how does one express that information in the simplify-rtx.c code? It seems that the PR that optimisation fixed (54457) only cared about DI -> SI truncations, so perhaps we should disable it for conversions between other modes where it's not beneficial altogether? Thanks, Kyrill