Re: [RFA] [PR rtl-optimization/90275] Handle nop reg->reg copies in cse
On Fri, 2020-03-13 at 09:09 +0100, Christophe Lyon wrote: > Hi, > > > On Thu, 12 Mar 2020 at 23:12, Jeff Law via Gcc-patches > wrote: > > On Thu, 2020-03-12 at 13:23 -0500, Segher Boessenkool wrote: > > > On Thu, Mar 12, 2020 at 12:03:08PM -0600, Jeff Law wrote: > > > > On Sat, 2020-02-08 at 10:41 -0600, Segher Boessenkool wrote: > > > > > I don't think each stanza of code should use it's own "noop-ness", no. > > > > Richard's patch is actually better than mine in that regard as it > > > > handles > > > > mem > > > > and > > > > reg nop moves in an indentical way. I don't think refactoring the cse.c > > > > code > > > > is > > > > advisable now though -- but I do want to fix the multiply-reported ICE > > > > on > > > > ARM > > > > and > > > > Richard's cse changes are the cleanest way to do that that I can see. > > > > > > It looks pretty simple, yeah... All of CSE is hopelessly fragile, but > > > this patch does not make things worse. > > > > > > > > I don't know if this patch makes matters worse or not. It doesn't > > > > > seem > > > > > suitable for stage 4 though. And Richard said the cse.c part breaks > > > > > rs6000, if that is true, yes I do object ;-) > > > > The rs6000 port breakage is trivial to fix. In fact, I did so and ran > > > > it > > > > through > > > > my tester, which includes ppc64le and ppc64 a slew of *-elf targets x86 > > > > native > > > > and more. > > > > > > I don't see anything rs6000 below? Is it just this generic code? > > > > > > > @@ -5324,9 +5324,11 @@ cse_insn (rtx_insn *insn) > > > > } > > > > > > > > /* Similarly, lots of targets don't allow no-op > > > > -(set (mem x) (mem x)) moves. */ > > > > +(set (mem x) (mem x)) moves. Even (set (reg x) (reg x)) > > > > +might be impossible for certain registers (like CC > > > > registers). */ > > > > else if (n_sets == 1 > > > > - && MEM_P (trial) > > > > + && ! CALL_P (insn) > > > > + && (MEM_P (trial) || REG_P (trial)) > > > >&& MEM_P (dest) > > > >&& rtx_equal_p (trial, dest) > > > >&& !side_effects_p (dest) > > > > > > This adds the !CALL_P (no space btw) condition, why is that? > > > > > > (Is that CCmode reg-reg move comment about rs6000? Huh, we *do* have > > > patterns for that, or *should* have at least!) > > I fixed the extraneous whitespace and committed the change. > > > > The new test fails on ARM: THanks. I see what's happening, though I'm not sure *how* it happened. Anyway, doing some testing on the fix now. jeff >
Re: [RFA] [PR rtl-optimization/90275] Handle nop reg->reg copies in cse
Jeff, thanks for picking this up. Jeff Law writes: > On Thu, 2020-03-12 at 15:26 -0500, Segher Boessenkool wrote: >> On Thu, Mar 12, 2020 at 12:47:04PM -0600, Jeff Law wrote: >> > On Thu, 2020-03-12 at 13:23 -0500, Segher Boessenkool wrote: >> > > > else if (n_sets == 1 >> > > > - && MEM_P (trial) >> > > > + && ! CALL_P (insn) >> > > > + && (MEM_P (trial) || REG_P (trial)) >> > > > && MEM_P (dest) >> > > > && rtx_equal_p (trial, dest) >> > > > && !side_effects_p (dest) >> > > >> > > This adds the !CALL_P (no space btw) condition, why is that? >> > Because n_sets is not valid for CALL_P insns which resulted in a failure on >> > ppc. >> > See find_sets_in_insn which ignores the set on the LHS of a call. So >> > imagine >> > if >> > we had a nop register set in parallel with a (set (reg) (call ...)). We'd >> > end up >> > deleting the entire PARALLEL which is obviously wrong. >> >> Ah, I see. So this is exposed on Power by the TOC stuff, I guess? CSE >> sees a TOC set parallel with a call as a no-op because it is set to the >> same value (an unspec, not an unspec_volatile) that GCC can derive is >> already in the TOC reg? Or is this some other case? > Not entirely sure. Richard's message didn't include the precise details. Yeah, that was exactly it. On the bright side, removing many calls as dead made for an easy-to-debug bootstrap failure :-) Richard
Re: [RFA] [PR rtl-optimization/90275] Handle nop reg->reg copies in cse
Hi, On Thu, 12 Mar 2020 at 23:12, Jeff Law via Gcc-patches wrote: > > On Thu, 2020-03-12 at 13:23 -0500, Segher Boessenkool wrote: > > On Thu, Mar 12, 2020 at 12:03:08PM -0600, Jeff Law wrote: > > > On Sat, 2020-02-08 at 10:41 -0600, Segher Boessenkool wrote: > > > > I don't think each stanza of code should use it's own "noop-ness", no. > > > Richard's patch is actually better than mine in that regard as it handles > > > mem > > > and > > > reg nop moves in an indentical way. I don't think refactoring the cse.c > > > code > > > is > > > advisable now though -- but I do want to fix the multiply-reported ICE on > > > ARM > > > and > > > Richard's cse changes are the cleanest way to do that that I can see. > > > > It looks pretty simple, yeah... All of CSE is hopelessly fragile, but > > this patch does not make things worse. > > > > > > I don't know if this patch makes matters worse or not. It doesn't seem > > > > suitable for stage 4 though. And Richard said the cse.c part breaks > > > > rs6000, if that is true, yes I do object ;-) > > > The rs6000 port breakage is trivial to fix. In fact, I did so and ran it > > > through > > > my tester, which includes ppc64le and ppc64 a slew of *-elf targets x86 > > > native > > > and more. > > > > I don't see anything rs6000 below? Is it just this generic code? > > > > > @@ -5324,9 +5324,11 @@ cse_insn (rtx_insn *insn) > > > } > > > > > > /* Similarly, lots of targets don't allow no-op > > > -(set (mem x) (mem x)) moves. */ > > > +(set (mem x) (mem x)) moves. Even (set (reg x) (reg x)) > > > +might be impossible for certain registers (like CC registers). > > > */ > > > else if (n_sets == 1 > > > - && MEM_P (trial) > > > + && ! CALL_P (insn) > > > + && (MEM_P (trial) || REG_P (trial)) > > >&& MEM_P (dest) > > >&& rtx_equal_p (trial, dest) > > >&& !side_effects_p (dest) > > > > This adds the !CALL_P (no space btw) condition, why is that? > > > > (Is that CCmode reg-reg move comment about rs6000? Huh, we *do* have > > patterns for that, or *should* have at least!) > I fixed the extraneous whitespace and committed the change. > The new test fails on ARM: FAIL: gcc.c-torture/compile/pr90275.c -O3 -g (internal compiler error) during RTL pass: cse_local /gcc/testsuite/gcc.c-torture/compile/pr90275.c: In function 'e': /gcc/testsuite/gcc.c-torture/compile/pr90275.c:25:1: internal compiler error: in insert_regs, at cse.c:1128 0x15725bd insert_regs /gcc/cse.c:1128 0x1579731 cse_insn /gcc/cse.c:5957 0x157aff6 cse_extended_basic_block /gcc/cse.c:6615 0x157aff6 cse_main /gcc/cse.c:6794 0x157bc0d rest_of_handle_cse_after_global_opts /gcc/cse.c:7766 0x157bc0d execute /gcc/cse.c:7817 Please submit a full bug report, Is the patch supposed to fix all the ICEs on ARM? I see this with cross-compilers, it seems OK on native builds? (I can't find error reports about this on gcc-testresults) Christophe > THanks, > jeff > > >
Re: [RFA] [PR rtl-optimization/90275] Handle nop reg->reg copies in cse
On Thu, 2020-03-12 at 13:23 -0500, Segher Boessenkool wrote: > On Thu, Mar 12, 2020 at 12:03:08PM -0600, Jeff Law wrote: > > On Sat, 2020-02-08 at 10:41 -0600, Segher Boessenkool wrote: > > > I don't think each stanza of code should use it's own "noop-ness", no. > > Richard's patch is actually better than mine in that regard as it handles > > mem > > and > > reg nop moves in an indentical way. I don't think refactoring the cse.c > > code > > is > > advisable now though -- but I do want to fix the multiply-reported ICE on > > ARM > > and > > Richard's cse changes are the cleanest way to do that that I can see. > > It looks pretty simple, yeah... All of CSE is hopelessly fragile, but > this patch does not make things worse. > > > > I don't know if this patch makes matters worse or not. It doesn't seem > > > suitable for stage 4 though. And Richard said the cse.c part breaks > > > rs6000, if that is true, yes I do object ;-) > > The rs6000 port breakage is trivial to fix. In fact, I did so and ran it > > through > > my tester, which includes ppc64le and ppc64 a slew of *-elf targets x86 > > native > > and more. > > I don't see anything rs6000 below? Is it just this generic code? > > > @@ -5324,9 +5324,11 @@ cse_insn (rtx_insn *insn) > > } > > > > /* Similarly, lots of targets don't allow no-op > > -(set (mem x) (mem x)) moves. */ > > +(set (mem x) (mem x)) moves. Even (set (reg x) (reg x)) > > +might be impossible for certain registers (like CC registers). */ > > else if (n_sets == 1 > > - && MEM_P (trial) > > + && ! CALL_P (insn) > > + && (MEM_P (trial) || REG_P (trial)) > >&& MEM_P (dest) > >&& rtx_equal_p (trial, dest) > >&& !side_effects_p (dest) > > This adds the !CALL_P (no space btw) condition, why is that? > > (Is that CCmode reg-reg move comment about rs6000? Huh, we *do* have > patterns for that, or *should* have at least!) I fixed the extraneous whitespace and committed the change. THanks, jeff >
Re: [RFA] [PR rtl-optimization/90275] Handle nop reg->reg copies in cse
On Thu, 2020-03-12 at 15:26 -0500, Segher Boessenkool wrote: > On Thu, Mar 12, 2020 at 12:47:04PM -0600, Jeff Law wrote: > > On Thu, 2020-03-12 at 13:23 -0500, Segher Boessenkool wrote: > > > > else if (n_sets == 1 > > > > - && MEM_P (trial) > > > > + && ! CALL_P (insn) > > > > + && (MEM_P (trial) || REG_P (trial)) > > > >&& MEM_P (dest) > > > >&& rtx_equal_p (trial, dest) > > > >&& !side_effects_p (dest) > > > > > > This adds the !CALL_P (no space btw) condition, why is that? > > Because n_sets is not valid for CALL_P insns which resulted in a failure on > > ppc. > > See find_sets_in_insn which ignores the set on the LHS of a call. So > > imagine > > if > > we had a nop register set in parallel with a (set (reg) (call ...)). We'd > > end up > > deleting the entire PARALLEL which is obviously wrong. > > Ah, I see. So this is exposed on Power by the TOC stuff, I guess? CSE > sees a TOC set parallel with a call as a no-op because it is set to the > same value (an unspec, not an unspec_volatile) that GCC can derive is > already in the TOC reg? Or is this some other case? Not entirely sure. Richard's message didn't include the precise details. > > The change sounds fine, fwiw. > > > One could argue that find_sets_in_insn should be fixed as well. I'd be > > worried > > about fallout from that. > > Should it ignore all SETs in parallel with a call? Or what do you want > to fix there? I was thinking the other way. Make it include sets even those for the function return value. Having n_sets's meaning be different for CALL_INSNs vs other INSNs just seems like a poor implementation decision because of the inconsistency. jeff
Re: [RFA] [PR rtl-optimization/90275] Handle nop reg->reg copies in cse
On Thu, Mar 12, 2020 at 12:47:04PM -0600, Jeff Law wrote: > On Thu, 2020-03-12 at 13:23 -0500, Segher Boessenkool wrote: > > > else if (n_sets == 1 > > > -&& MEM_P (trial) > > > +&& ! CALL_P (insn) > > > +&& (MEM_P (trial) || REG_P (trial)) > > > && MEM_P (dest) > > > && rtx_equal_p (trial, dest) > > > && !side_effects_p (dest) > > > > This adds the !CALL_P (no space btw) condition, why is that? > Because n_sets is not valid for CALL_P insns which resulted in a failure on > ppc. > See find_sets_in_insn which ignores the set on the LHS of a call. So imagine > if > we had a nop register set in parallel with a (set (reg) (call ...)). We'd > end up > deleting the entire PARALLEL which is obviously wrong. Ah, I see. So this is exposed on Power by the TOC stuff, I guess? CSE sees a TOC set parallel with a call as a no-op because it is set to the same value (an unspec, not an unspec_volatile) that GCC can derive is already in the TOC reg? Or is this some other case? The change sounds fine, fwiw. > One could argue that find_sets_in_insn should be fixed as well. I'd be > worried > about fallout from that. Should it ignore all SETs in parallel with a call? Or what do you want to fix there? Segher
Re: [RFA] [PR rtl-optimization/90275] Handle nop reg->reg copies in cse
On Thu, 2020-03-12 at 13:23 -0500, Segher Boessenkool wrote: > > > > I don't know if this patch makes matters worse or not. It doesn't seem > > > suitable for stage 4 though. And Richard said the cse.c part breaks > > > rs6000, if that is true, yes I do object ;-) > > The rs6000 port breakage is trivial to fix. In fact, I did so and ran it > > through > > my tester, which includes ppc64le and ppc64 a slew of *-elf targets x86 > > native > > and more. > > I don't see anything rs6000 below? Is it just this generic code? It's just generic code. THe rs6000 issue is fixed by the !CALL_P condition. > > > @@ -5324,9 +5324,11 @@ cse_insn (rtx_insn *insn) > > } > > > > /* Similarly, lots of targets don't allow no-op > > -(set (mem x) (mem x)) moves. */ > > +(set (mem x) (mem x)) moves. Even (set (reg x) (reg x)) > > +might be impossible for certain registers (like CC registers). */ > > else if (n_sets == 1 > > - && MEM_P (trial) > > + && ! CALL_P (insn) > > + && (MEM_P (trial) || REG_P (trial)) > >&& MEM_P (dest) > >&& rtx_equal_p (trial, dest) > >&& !side_effects_p (dest) > > This adds the !CALL_P (no space btw) condition, why is that? Because n_sets is not valid for CALL_P insns which resulted in a failure on ppc. See find_sets_in_insn which ignores the set on the LHS of a call. So imagine if we had a nop register set in parallel with a (set (reg) (call ...)). We'd end up deleting the entire PARALLEL which is obviously wrong. One could argue that find_sets_in_insn should be fixed as well. I'd be worried about fallout from that. jeff
Re: [RFA] [PR rtl-optimization/90275] Handle nop reg->reg copies in cse
On Thu, Mar 12, 2020 at 12:03:08PM -0600, Jeff Law wrote: > On Sat, 2020-02-08 at 10:41 -0600, Segher Boessenkool wrote: > > I don't think each stanza of code should use it's own "noop-ness", no. > Richard's patch is actually better than mine in that regard as it handles mem > and > reg nop moves in an indentical way. I don't think refactoring the cse.c code > is > advisable now though -- but I do want to fix the multiply-reported ICE on ARM > and > Richard's cse changes are the cleanest way to do that that I can see. It looks pretty simple, yeah... All of CSE is hopelessly fragile, but this patch does not make things worse. > > I don't know if this patch makes matters worse or not. It doesn't seem > > suitable for stage 4 though. And Richard said the cse.c part breaks > > rs6000, if that is true, yes I do object ;-) > The rs6000 port breakage is trivial to fix. In fact, I did so and ran it > through > my tester, which includes ppc64le and ppc64 a slew of *-elf targets x86 native > and more. I don't see anything rs6000 below? Is it just this generic code? > @@ -5324,9 +5324,11 @@ cse_insn (rtx_insn *insn) > } > > /* Similarly, lots of targets don't allow no-op > - (set (mem x) (mem x)) moves. */ > + (set (mem x) (mem x)) moves. Even (set (reg x) (reg x)) > + might be impossible for certain registers (like CC registers). */ > else if (n_sets == 1 > -&& MEM_P (trial) > +&& ! CALL_P (insn) > +&& (MEM_P (trial) || REG_P (trial)) > && MEM_P (dest) > && rtx_equal_p (trial, dest) > && !side_effects_p (dest) This adds the !CALL_P (no space btw) condition, why is that? (Is that CCmode reg-reg move comment about rs6000? Huh, we *do* have patterns for that, or *should* have at least!) Segher
Re: [RFA] [PR rtl-optimization/90275] Handle nop reg->reg copies in cse
On Sat, 2020-02-08 at 10:41 -0600, Segher Boessenkool wrote: > On Fri, Feb 07, 2020 at 09:00:40AM -0700, Jeff Law wrote: > > On Thu, 2020-02-06 at 07:56 -0600, Segher Boessenkool wrote: > > > On Wed, Feb 05, 2020 at 11:48:23AM -0700, Jeff Law wrote: > > > > Yea, it's closely related. In your case you need to effectively ignore > > > > the nop insn to get the optimization you want. In mine that nop insn > > > > causes an ICE. > > > > > > > > I think we could take your cse bits + adding a !CALL_P separately from > > > > the simplify-rtx stuff which Segher objected to. THat'd likely solve > > > > the ARM ICEs and take you a tiny step forward on optimizing that SVE > > > > case. Thoughts? > > > > > > CSE should consistently keep track of what insns are no-op moves (in its > > > definition, all passes have a slightly different definition of this), > > > and use that everywhere consistently. > > So does that mean you object to the cse.c portion of Richard's patch? > > It's more a "what we need to do in the future" thing, it is stage 4 now, > it is too big a change to do now. I suspect you're referring to the simplify-rtx bits from his patch which I agree are not appropriate for stage4. The cse bits from that patch are are simple. > > What patch? The "34" patch? https://gcc.gnu.org/r278411 . > > I don't think each stanza of code should use it's own "noop-ness", no. Richard's patch is actually better than mine in that regard as it handles mem and reg nop moves in an indentical way. I don't think refactoring the cse.c code is advisable now though -- but I do want to fix the multiply-reported ICE on ARM and Richard's cse changes are the cleanest way to do that that I can see. > > I don't know if this patch makes matters worse or not. It doesn't seem > suitable for stage 4 though. And Richard said the cse.c part breaks > rs6000, if that is true, yes I do object ;-) The rs6000 port breakage is trivial to fix. In fact, I did so and ran it through my tester, which includes ppc64le and ppc64 a slew of *-elf targets x86 native and more. Concretely I'm proposing the following patch to address 90275 and its duplicates. PR rtl-optimization/90275 * cse.c (cse_insn): Delete no-op register moves too. PR rtl-optimization/90275 * gcc.c-torture/compile/pr90275.c: New test. diff --git a/gcc/cse.c b/gcc/cse.c index 79ee0ce80e3..1779bb9a333 100644 --- a/gcc/cse.c +++ b/gcc/cse.c @@ -4625,7 +4625,7 @@ cse_insn (rtx_insn *insn) for (i = 0; i < n_sets; i++) { bool repeat = false; - bool mem_noop_insn = false; + bool noop_insn = false; rtx src, dest; rtx src_folded; struct table_elt *elt = 0, *p; @@ -5324,9 +5324,11 @@ cse_insn (rtx_insn *insn) } /* Similarly, lots of targets don't allow no-op -(set (mem x) (mem x)) moves. */ +(set (mem x) (mem x)) moves. Even (set (reg x) (reg x)) +might be impossible for certain registers (like CC registers). */ else if (n_sets == 1 - && MEM_P (trial) + && ! CALL_P (insn) + && (MEM_P (trial) || REG_P (trial)) && MEM_P (dest) && rtx_equal_p (trial, dest) && !side_effects_p (dest) @@ -5334,7 +5336,7 @@ cse_insn (rtx_insn *insn) || insn_nothrow_p (insn))) { SET_SRC (sets[i].rtl) = trial; - mem_noop_insn = true; + noop_insn = true; break; } @@ -5562,8 +5564,8 @@ cse_insn (rtx_insn *insn) sets[i].rtl = 0; } - /* Similarly for no-op MEM moves. */ - else if (mem_noop_insn) + /* Similarly for no-op moves. */ + else if (noop_insn) { if (cfun->can_throw_non_call_exceptions && can_throw_internal (insn)) cse_cfg_altered = true; diff --git a/gcc/testsuite/gcc.c-torture/compile/pr90275.c b/gcc/testsuite/gcc.c-torture/compile/pr90275.c new file mode 100644 index 000..83e0df77226 --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/compile/pr90275.c @@ -0,0 +1,27 @@ +a, b, c; + +long long d; + +e() { + + char f; + + for (;;) { + +c = a = c ? 5 : 0; + +if (f) { + + b = a; + + f = d; + +} + +(d || b) < (a > e) ?: (b ? 0 : f) || (d -= f); + + } + +} + +
Re: [RFA] [PR rtl-optimization/90275] Handle nop reg->reg copies in cse
On Fri, Feb 07, 2020 at 09:00:40AM -0700, Jeff Law wrote: > On Thu, 2020-02-06 at 07:56 -0600, Segher Boessenkool wrote: > > On Wed, Feb 05, 2020 at 11:48:23AM -0700, Jeff Law wrote: > > > Yea, it's closely related. In your case you need to effectively ignore > > > the nop insn to get the optimization you want. In mine that nop insn > > > causes an ICE. > > > > > > I think we could take your cse bits + adding a !CALL_P separately from > > > the simplify-rtx stuff which Segher objected to. THat'd likely solve > > > the ARM ICEs and take you a tiny step forward on optimizing that SVE > > > case. Thoughts? > > > > CSE should consistently keep track of what insns are no-op moves (in its > > definition, all passes have a slightly different definition of this), > > and use that everywhere consistently. > So does that mean you object to the cse.c portion of Richard's patch? It's more a "what we need to do in the future" thing, it is stage 4 now, it is too big a change to do now. What patch? The "34" patch? https://gcc.gnu.org/r278411 . I don't think each stanza of code should use it's own "noop-ness", no. I don't know if this patch makes matters worse or not. It doesn't seem suitable for stage 4 though. And Richard said the cse.c part breaks rs6000, if that is true, yes I do object ;-) Segher
Re: [RFA] [PR rtl-optimization/90275] Handle nop reg->reg copies in cse
On Thu, 2020-02-06 at 07:56 -0600, Segher Boessenkool wrote: > On Wed, Feb 05, 2020 at 11:48:23AM -0700, Jeff Law wrote: > > Yea, it's closely related. In your case you need to effectively ignore > > the nop insn to get the optimization you want. In mine that nop insn > > causes an ICE. > > > > I think we could take your cse bits + adding a !CALL_P separately from > > the simplify-rtx stuff which Segher objected to. THat'd likely solve > > the ARM ICEs and take you a tiny step forward on optimizing that SVE > > case. Thoughts? > > CSE should consistently keep track of what insns are no-op moves (in its > definition, all passes have a slightly different definition of this), > and use that everywhere consistently. So does that mean you object to the cse.c portion of Richard's patch? Jeff >
Re: [RFA] [PR rtl-optimization/90275] Handle nop reg->reg copies in cse
On Wed, Feb 05, 2020 at 11:48:23AM -0700, Jeff Law wrote: > Yea, it's closely related. In your case you need to effectively ignore > the nop insn to get the optimization you want. In mine that nop insn > causes an ICE. > > I think we could take your cse bits + adding a !CALL_P separately from > the simplify-rtx stuff which Segher objected to. THat'd likely solve > the ARM ICEs and take you a tiny step forward on optimizing that SVE > case. Thoughts? CSE should consistently keep track of what insns are no-op moves (in its definition, all passes have a slightly different definition of this), and use that everywhere consistently. (Or we should rewrite CSE). Segher
Re: [RFA] [PR rtl-optimization/90275] Handle nop reg->reg copies in cse
On Wed, 2020-02-05 at 13:30 +, Richard Sandiford wrote: > Jeff Law writes: > > Richard & Segher, if y'all could check my analysis here, it'd be > > appreciated. > > > > pr90275 is a P2 regression that is only triggering on ARM. David's > > testcase in c#1 is the best for this problem as it doesn't require > > magic flags like -fno-dce to trigger. > > > > The block in question: > > > > > (code_label 89 88 90 24 15 (nil) [0 uses]) > > > (note 90 89 97 24 [bb 24] NOTE_INSN_BASIC_BLOCK) > > > (insn 97 90 98 24 (parallel [ > > > (set (reg:CC 100 cc) > > > (compare:CC (reg:SI 131 [ d_lsm.21 ]) > > > (const_int 0 [0]))) > > > (set (reg:SI 135 [ d_lsm.21 ]) > > > (reg:SI 131 [ d_lsm.21 ])) > > > ]) "pr90275.c":21:45 248 {*movsi_compare0} > > > (expr_list:REG_DEAD (reg:SI 131 [ d_lsm.21 ]) > > > (nil))) > > > (insn 98 97 151 24 (set (reg:SI 136 [+4 ]) > > > (reg:SI 132 [ d_lsm.21+4 ])) "pr90275.c":21:45 241 > > > {*arm_movsi_insn} > > > (expr_list:REG_DEAD (reg:SI 132 [ d_lsm.21+4 ]) > > > (expr_list:REG_DEAD (reg:CC 100 cc) > > > (nil > > > (insn 151 98 152 24 (set (reg:SI 131 [ d_lsm.21 ]) > > > (reg:SI 131 [ d_lsm.21 ])) "pr90275.c":21:45 241 {*arm_movsi_insn} > > > (expr_list:REG_DEAD (reg:SI 135 [ d_lsm.21 ]) > > > (nil))) > > > (insn 152 151 103 24 (set (reg:SI 132 [ d_lsm.21+4 ]) > > > (reg:SI 136 [+4 ])) "pr90275.c":21:45 241 {*arm_movsi_insn} > > > (expr_list:REG_DEAD (reg:SI 136 [+4 ]) > > > (nil))) > > > > > insns 97 and 151 are the most important. > > > > We process insn 97 which creates an equivalency between r135 and r131. > > This is expressed by putting both on on the "same_value" chain > > (table_elt->{next,prev}_same_value). > > > > When we put the REGs on the chain we'll set REG_QTY to a positive > > number which indicates their values are valid. > > > > We continue processing insns forward and run into insn 151 which is a > > self-copy. > > > > First CSE will invalidate r131 (because its set). Invalidation is > > accomplished by setting REG_QTY for r131 to a negative value. It does > > not remove r131 from the same value chains. > > > > Then CSE will call insert_regs for r131. The qty is not valid, so we > > get into this code: > > > > > if (modified || ! qty_valid) > > > { > > > if (classp) > > > for (classp = classp->first_same_value; > > > classp != 0; > > > classp = classp->next_same_value) > > > if (REG_P (classp->exp) > > > && GET_MODE (classp->exp) == GET_MODE (x)) > > > { > > > unsigned c_regno = REGNO (classp->exp); > > > > > > gcc_assert (REGNO_QTY_VALID_P (c_regno)); > > > [ ... ] > > > > So we walk the chain of same values for r131. WHen walking we run into > > r131 itself. Since r131 has been invalidated we trip the assert. > > > > > > The fix is pretty simple. We just arrange to stop processing insns > > that are nop reg->reg copies much like we already do for mem->mem > > copies and (set (pc) (pc)). > > > > This has bootstrapped and regression tested on x86_64. I've also > > verified it fixes the testcase in c#1 of pr90275, the test in pr93125 > > and pr92388. Interestingly enough I couldn't trigger the original > > testcase in 90275, but I'm confident this is ultimately all the same > > problem. > > This looks similar to the infamous (to me): > >https://gcc.gnu.org/ml/gcc-patches/2019-11/msg01581.html > > which had to be reverted because it broke powerpc64 bootstrap. > The problem was that n_sets is misleading for calls: > >https://gcc.gnu.org/ml/gcc-patches/2019-11/msg01858.html > > That's easy to fix (and I have a fix). But given the damage this caused > last time, I think it's probably best left to GCC 11. Yea, it's closely related. In your case you need to effectively ignore the nop insn to get the optimization you want. In mine that nop insn causes an ICE. I think we could take your cse bits + adding a !CALL_P separately from the simplify-rtx stuff which Segher objected to. THat'd likely solve the ARM ICEs and take you a tiny step forward on optimizing that SVE case. Thoughts? Jeff
Re: [RFA] [PR rtl-optimization/90275] Handle nop reg->reg copies in cse
Jeff Law writes: > Richard & Segher, if y'all could check my analysis here, it'd be > appreciated. > > pr90275 is a P2 regression that is only triggering on ARM. David's > testcase in c#1 is the best for this problem as it doesn't require > magic flags like -fno-dce to trigger. > > The block in question: > >> (code_label 89 88 90 24 15 (nil) [0 uses]) >> (note 90 89 97 24 [bb 24] NOTE_INSN_BASIC_BLOCK) >> (insn 97 90 98 24 (parallel [ >> (set (reg:CC 100 cc) >> (compare:CC (reg:SI 131 [ d_lsm.21 ]) >> (const_int 0 [0]))) >> (set (reg:SI 135 [ d_lsm.21 ]) >> (reg:SI 131 [ d_lsm.21 ])) >> ]) "pr90275.c":21:45 248 {*movsi_compare0} >> (expr_list:REG_DEAD (reg:SI 131 [ d_lsm.21 ]) >> (nil))) >> (insn 98 97 151 24 (set (reg:SI 136 [+4 ]) >> (reg:SI 132 [ d_lsm.21+4 ])) "pr90275.c":21:45 241 {*arm_movsi_insn} >> (expr_list:REG_DEAD (reg:SI 132 [ d_lsm.21+4 ]) >> (expr_list:REG_DEAD (reg:CC 100 cc) >> (nil >> (insn 151 98 152 24 (set (reg:SI 131 [ d_lsm.21 ]) >> (reg:SI 131 [ d_lsm.21 ])) "pr90275.c":21:45 241 {*arm_movsi_insn} >> (expr_list:REG_DEAD (reg:SI 135 [ d_lsm.21 ]) >> (nil))) >> (insn 152 151 103 24 (set (reg:SI 132 [ d_lsm.21+4 ]) >> (reg:SI 136 [+4 ])) "pr90275.c":21:45 241 {*arm_movsi_insn} >> (expr_list:REG_DEAD (reg:SI 136 [+4 ]) >> (nil))) >> > insns 97 and 151 are the most important. > > We process insn 97 which creates an equivalency between r135 and r131. > This is expressed by putting both on on the "same_value" chain > (table_elt->{next,prev}_same_value). > > When we put the REGs on the chain we'll set REG_QTY to a positive > number which indicates their values are valid. > > We continue processing insns forward and run into insn 151 which is a > self-copy. > > First CSE will invalidate r131 (because its set). Invalidation is > accomplished by setting REG_QTY for r131 to a negative value. It does > not remove r131 from the same value chains. > > Then CSE will call insert_regs for r131. The qty is not valid, so we > get into this code: > >> if (modified || ! qty_valid) >> { >> if (classp) >> for (classp = classp->first_same_value; >> classp != 0; >> classp = classp->next_same_value) >> if (REG_P (classp->exp) >> && GET_MODE (classp->exp) == GET_MODE (x)) >> { >> unsigned c_regno = REGNO (classp->exp); >> >> gcc_assert (REGNO_QTY_VALID_P (c_regno)); >> [ ... ] > > So we walk the chain of same values for r131. WHen walking we run into > r131 itself. Since r131 has been invalidated we trip the assert. > > > The fix is pretty simple. We just arrange to stop processing insns > that are nop reg->reg copies much like we already do for mem->mem > copies and (set (pc) (pc)). > > This has bootstrapped and regression tested on x86_64. I've also > verified it fixes the testcase in c#1 of pr90275, the test in pr93125 > and pr92388. Interestingly enough I couldn't trigger the original > testcase in 90275, but I'm confident this is ultimately all the same > problem. This looks similar to the infamous (to me): https://gcc.gnu.org/ml/gcc-patches/2019-11/msg01581.html which had to be reverted because it broke powerpc64 bootstrap. The problem was that n_sets is misleading for calls: https://gcc.gnu.org/ml/gcc-patches/2019-11/msg01858.html That's easy to fix (and I have a fix). But given the damage this caused last time, I think it's probably best left to GCC 11. Thanks, Richard
Re: [RFA] [PR rtl-optimization/90275] Handle nop reg->reg copies in cse
Hi all, On Wed, Feb 05, 2020 at 07:26:03AM +0100, Jakub Jelinek wrote: > On Tue, Feb 04, 2020 at 06:04:09PM -0700, Jeff Law wrote: > > --- a/gcc/cse.c > > +++ b/gcc/cse.c > > @@ -5572,6 +5572,16 @@ cse_insn (rtx_insn *insn) > > sets[i].rtl = 0; > > } > > > > + /* Similarly for no-op moves. */ It says "no-op MEM moves" right above, so it should say "no-op REG moves" here? > > + if (n_sets == 1 > > + && GET_CODE (src) == REG > > Just nits: > REG_P (src) ? Hey that is my nit! Find your own! ;-) > > + && src == dest) > > Is pointer comparison ok? It isn't, it doesn't work for hard registers. > I mean, shouldn't we instead do > rtx_equal_p (src, dest), This does not see e.g. (set (reg:SI 123) (subreg:SI (reg:DI 123) 0)) as no-op move. > set_noop_p (sets[i].rtl) This doesn't catch all such cases either. > or noop_move_p (insn)? And this one is plain wrong (it should be called something with "maybe" in the name, it returns false if it thinks that may lead to better optimisation, see the REG_EQUAL handling). What we need here is a test whether CSE can ignore this insn, and we will run into this problem if we don't (Jeff's analysis). Does CSE already have everything it uses to make that decision scribbled away somewhere, when we get here? It would be good if we could use the exact same condition (same predicate function for example) as what would lead to the problem later, or we'll be playing whack-a-mole for a while (or CSE is completely rewritten soon, my preferred option, but "soon" on a GCC timescale will take way too long for the PR). Segher
Re: [RFA] [PR rtl-optimization/90275] Handle nop reg->reg copies in cse
On Tue, Feb 04, 2020 at 06:04:09PM -0700, Jeff Law wrote: > --- a/gcc/cse.c > +++ b/gcc/cse.c > @@ -5572,6 +5572,16 @@ cse_insn (rtx_insn *insn) > sets[i].rtl = 0; > } > > + /* Similarly for no-op moves. */ > + if (n_sets == 1 > + && GET_CODE (src) == REG Just nits: REG_P (src) ? > + && src == dest) Is pointer comparison ok? I mean, shouldn't we instead do rtx_equal_p (src, dest), set_noop_p (sets[i].rtl) or noop_move_p (insn)? > + * gcc.c-torture/compile/pr90275.c: New test Missing full stop. > +++ b/gcc/testsuite/gcc.c-torture/compile/pr90275.c > @@ -0,0 +1,27 @@ > +a, b, c; int > + > +long long d; > + > +e() { void (unless the ommission of those makes it not reproduce anymore, which I doubt). > + > + char f; > + > + for (;;) { > + > +c = a = c ? 5 : 0; > + > +if (f) { > + > + b = a; > + > + f = d; > + > +} > + > +(d || b) < (a > e) ?: (b ? 0 : f) || (d -= f); > + > + } > + > +} Jakub