[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721

Bug 19721 depends on bug 19790, which changed state.

Bug 19790 Summary: equality not noticed when signedness differs.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=19790

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED
[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721

Bug 19721 depends on bug 72712, which changed state.

Bug 72712 Summary: [7 Regression] Tenfold compile time regression
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72712

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED
[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721

--- Comment #28 from Steven Bosscher ---
(In reply to comment #25)
FWIW this case is handled at the GIMPLE level since at least GCC 4.3.
[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721

--- Comment #27 from Steven Bosscher 2013-05-09 10:39:57 UTC ---
(In reply to comment #26)
> With TARGET_LEGITIMATE_ADDRESS_P rejecting (costly) symbol_refs inside
> memory references, cse_local brings the number of __malloc_av references
> down when compiling newlib's mallocr.c:

Can you please open a fresh PR for this, with the information necessary to
reproduce this problem, and make the new PR block this one? This PR is a
meta-bug; reporting specific problems is best done in new PRs.
[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721

--- Comment #26 from Jorn Wolfgang Rennecke 2013-05-09 00:32:28 UTC ---
The tree optimizers have become extremely aggressive on constant propagation,
so cse is needed more than ever to undo the damage.

With TARGET_LEGITIMATE_ADDRESS_P rejecting (costly) symbol_refs inside
memory references, cse_local brings the number of __malloc_av references
down when compiling newlib's mallocr.c:

$ grep -c 'symbol_ref.*__malloc_av_' mallocr-4.4.i.*
mallocr-4.4.i.165r.expand:70
mallocr-4.4.i.166r.vregs:35
mallocr-4.4.i.167r.into_cfglayout:35
mallocr-4.4.i.168r.jump:70
mallocr-4.4.i.169r.subreg1:35
mallocr-4.4.i.170r.dfinit:35
mallocr-4.4.i.171r.cse1:70
mallocr-4.4.i.172r.fwprop1:41
mallocr-4.4.i.173r.cprop1:54
mallocr-4.4.i.175r.hoist:42
mallocr-4.4.i.176r.cprop2:30
mallocr-4.4.i.178r.cse_local:26
[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches
--
Bug 19721 depends on bug 23911, which changed state.

Bug 23911 Summary: Failure to propagate constants from a const initializer for _Complex
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23911

           What    |Old Value                   |New Value
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721
[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches
--
           What    |Removed                     |Added
----------------------------------------------------------------------------
  BugsThisDependsOn|                            |23911

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721
[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches
--
           What    |Removed                     |Added
----------------------------------------------------------------------------
  BugsThisDependsOn|                            |23455

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721
[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches
--- Additional Comments From paolo dot bonzini at lu dot unisi dot ch 2005-08-17 20:07 ---
Subject: Re: [meta-bug] optimizations that CSE still catches

>> unsigned outcnt;
>> extern void flush_outbuf(void);
>>
>> void
>> bi_windup(unsigned char *outbuf, unsigned char bi_buf)
>> {
>>   outbuf[outcnt] = bi_buf;
>>   if (outcnt == 16384)
>>     flush_outbuf();
>>   outbuf[outcnt] = bi_buf;
>> }
>
> Presumably the store into outbuf prevents the SSA optimizers from
> commonizing the first two loads of outcnt and the call to flush_outbuf
> prevents the SSA optimizers from commonizing the last load of outcnt on
> the path which bypasses the call to flush_outbuf. Right?

Not really. First of all, as stevenb pointed out on IRC, this is quite
specific to powerpc-apple-darwin and other targets where programs are
compiled as PIC by default. Steven's SPEC testing under Linux has not
shown this behavior, but shared libraries there *will* suffer from the
same problem!

We'd want the code to become

void
bi_windup(unsigned char *outbuf, unsigned char bi_buf)
{
  int t1 = outcnt;
  outbuf[t1] = bi_buf;
  int t2 = outcnt, t3;
  if (t2 == 16384)
    {
      flush_outbuf();
      t3 = outcnt;
    }
  else
    t3 = t2;
  outbuf[t3] = bi_buf;
}

If we disable CSE path following and keep only one GCSE pass, we "waste"
the opportunity to do this optimization, because we generate temporaries
for the partially redundant address of outcnt. With two GCSE passes, the
second is able to eliminate the partially redundant load.

Of course what we really miss is load PRE on the tree level, but it is
good that --param max-gcse-passes=2 can be a replacement for
-fcse-skip-blocks -fcse-follow-jumps.

Testing mainline GCC against a patch including no path following + 2 GCSE
passes + my forward propagation pass, I'm seeing SPEC improvements of +2%
to +8% on powerpc-apple-darwin.

Paolo

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721
[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches
--- Additional Comments From law at redhat dot com 2005-08-17 19:31 ---
Subject: Re: [meta-bug] optimizations that CSE still catches

On Wed, 2005-08-17 at 08:03 +, bonzini at gcc dot gnu dot org wrote:
> --- Additional Comments From bonzini at gcc dot gnu dot org 2005-08-17 08:03 ---
> This small testcase is a typical case of the optimizations that CSE path
> following catches on PowerPC:
>
> unsigned outcnt;
> extern void flush_outbuf(void);
>
> void
> bi_windup(unsigned char *outbuf, unsigned char bi_buf)
> {
>   outbuf[outcnt] = bi_buf;
>   if (outcnt == 16384)
>     flush_outbuf();
>   outbuf[outcnt] = bi_buf;
> }
>
> Loading outcnt takes *three* insns: one to load the high part of the
> address, one to load the low part, one to load from memory. CSE reduces
> them to two by combining the loading of the low part with the load from
> memory. With CSE path following, in addition, CSE is able to factor the
> loads of the high part of the address, and do just one of them.
>
> Now here comes GCSE. If CSE path following is on, GCSE sees that the
> third occurrence of outcnt is the same as the second, and eliminates it.
> If it is off, GCSE is wasted to factor the loading of the address high
> parts.
>
> So, if we remove a pseudo-global cse pass by disabling path following,
> it would make sense to bump the default max-gcse-passes to 2.

Presumably the store into outbuf prevents the SSA optimizers from
commonizing the first two loads of outcnt and the call to flush_outbuf
prevents the SSA optimizers from commonizing the last load of outcnt on
the path which bypasses the call to flush_outbuf. Right?

Jeff

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721
[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches
--- Additional Comments From bonzini at gcc dot gnu dot org 2005-08-17 08:03 ---
This small testcase is a typical case of the optimizations that CSE path
following catches on PowerPC:

unsigned outcnt;
extern void flush_outbuf(void);

void
bi_windup(unsigned char *outbuf, unsigned char bi_buf)
{
  outbuf[outcnt] = bi_buf;
  if (outcnt == 16384)
    flush_outbuf();
  outbuf[outcnt] = bi_buf;
}

Loading outcnt takes *three* insns: one to load the high part of the
address, one to load the low part, one to load from memory. CSE reduces
them to two by combining the loading of the low part with the load from
memory. With CSE path following, in addition, CSE is able to factor the
loads of the high part of the address, and do just one of them.

Now here comes GCSE. If CSE path following is on, GCSE sees that the third
occurrence of outcnt is the same as the second, and eliminates it. If it is
off, GCSE is wasted to factor the loading of the address high parts.

So, if we remove a pseudo-global cse pass by disabling path following, it
would make sense to bump the default max-gcse-passes to 2.

Paolo

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721
[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches
--
Bug 19721 depends on bug 16961, which changed state.

Bug 16961 Summary: Poor x86-64 performance with 128bit ints
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=16961

           What    |Old Value                   |New Value
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721
[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches
--
Bug 19721 depends on bug 19791, which changed state.

Bug 19791 Summary: [tcb] A constant not fully propagated
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19791

           What    |Old Value                   |New Value
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721
[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches
--- Additional Comments From bonzini at gcc dot gnu dot org 2005-04-26 17:35 --- Another thing that CSE does is promoting paradoxical subregs to regs. On PowerPC at least, recursive calls of fold_rtx are almost ineffective except for this. Such promotion helps because equiv_constant does not look into subregs. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721
[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches
--- Additional Comments From steven at gcc dot gnu dot org 2005-04-16 15:15 ---
It is apparently not possible to convince people that any optimizations in
CSE can be removed, so working on this is pointless for me. See
http://gcc.gnu.org/ml/gcc-patches/2005-04/msg01498.html.

--
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |SUSPENDED

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721
[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches
--
Bug 19721 depends on bug 19789, which changed state.

Bug 19789 Summary: tree optimizers do not know that constant global variables do not change
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19789

           What    |Old Value                   |New Value
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721
[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches
--
Bug 19721 depends on bug 19659, which changed state.

Bug 19659 Summary: GCC does not remove an "if" statement that never triggers.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19659

           What    |Old Value                   |New Value
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721
[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches
--
           What    |Removed                     |Added
----------------------------------------------------------------------------
  BugsThisDependsOn|                            |20514

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721
[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches
--
Bug 19721 depends on bug 19788, which changed state.

Bug 19788 Summary: Inconsistent handling of -1.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19788

           What    |Old Value                   |New Value
----------------------------------------------------------------------------
             Status|WAITING                     |RESOLVED
         Resolution|                            |FIXED

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721
[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches
--
Bug 19721 depends on bug 20132, which changed state.

Bug 20132 Summary: Pessimization of induction variable and missed hoisting opportunity
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20132

           What    |Old Value                   |New Value
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721
[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches
--- Additional Comments From amylaar at gcc dot gnu dot org 2005-03-11 19:43 ---
(In reply to comment #18)
> IMHO. One of the tricks with the mult and divmod expanders is precisely
> when we should expand them into their component operations. We clearly
> don't want to do it at the very start or the very end of the SSA path,
> but somewhere in the middle.

One of the sh64 patches that I intend to merge expands signed integer
division into calculating the inverse of the divisor (at runtime) and then
multiplying the dividend with that inverse. It's broken up into operations
that assign to one pseudo register each, so this stuff gets full exposure
to cse, gcse and rtl loop optimizations. If the inverse calculation and
the multiply with the dividend end up in the same basic block and the
inverse is only used once, some combiner patterns combine & split this
stuff again to get a more scheduler-friendly data flow.

Do you think we should have machine-dependent tree expanders so that such
details can already be exposed to (g)cse and loop optimizations at the
tree level?

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721
[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches
--
Bug 19721 depends on bug 20130, which changed state.

Bug 20130 Summary: Fold a * -1 - 1 into ~a;
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20130

           What    |Old Value                   |New Value
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721
[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches
--- Additional Comments From steven at gcc dot gnu dot org 2005-03-06 22:14 ---
Just to give people an idea of how close we are to optimizing well enough
that the calls to fold_rtx in CSE are almost all no-ops, here are some
numbers taken over all cc1-i files on amd64:

Number of times fold_rtx is called: 13882333
Number of times it returns something other than the incoming rtx x: 70001
Number of times fold_rtx is called by other functions than itself: 9323647
Number of times it returns something other than x: 8526

A few rtxes that fold_rtx handles:

Loads from constant pool:
Trying to fold rtx:
(float_extend:DF (mem/u/i:SF (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [2 S4 A32]))
Trying to fold rtx:
(mem/u/i:SF (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [2 S4 A32])
Trying to fold rtx:
(symbol_ref/u:DI ("*.LC0") [flags 0x2])
Returning X unchanged.
Returning new rtx:
(const_double:SF 1.0e+0 [0x0.8p+1])
Returning new rtx:
(const_double:DF 1.0e+0 [0x0.8p+1])

Folded jumps:
Trying to fold rtx:
(if_then_else (eq (reg:CCZ 17 flags) (const_int 0 [0x0])) (label_ref 73) (pc))
Trying to fold rtx:
(pc)
Returning X unchanged.
Trying to fold rtx:
(eq (reg:CCZ 17 flags) (const_int 0 [0x0]))
Trying to fold rtx:
(reg:SI 66 [ D.10402 ])
Returning X unchanged.
Trying to fold rtx:
(const_int 4 [0x4])
Returning X unchanged.
Returning new rtx:
(const_int 1 [0x1])
Returning new rtx:
(label_ref 73)

Apparently an equivalent expression with lower cost:
Trying to fold rtx:
(plus:QI (subreg:QI (reg:SI 251) 0) (subreg:QI (reg:SI 251) 0))
Trying to fold rtx:
(subreg:QI (reg:SI 251) 0)
Trying to fold rtx:
(reg:SI 251)
Returning X unchanged.
Returning X unchanged.
Trying to fold rtx:
(subreg:QI (reg:SI 251) 0)
Trying to fold rtx:
(reg:SI 251)
Returning X unchanged.
Returning X unchanged.
Returning new rtx:
(ashift:QI (subreg:QI (reg:SI 251) 0) (const_int 1 [0x1]))

Likewise:
Trying to fold rtx:
(mult:DI (reg:DI 63 [ .comb_vect.length ]) (const_int 4 [0x4]))
Returning new rtx:
(ashift:DI (reg:DI 63 [ .comb_vect.length ]) (const_int 2 [0x2]))

It'd be interesting to find out how many of these things combine and later
CSE passes would catch (or miss), and how the tree-cleanup-branch compares.
I will look at the latter first.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721
[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches
--- Additional Comments From law at redhat dot com 2005-03-06 19:56 ---
Subject: Re: [meta-bug] optimizations that CSE still catches

On Sun, 2005-03-06 at 09:30 +, stevenb at suse dot de wrote:
> --- Additional Comments From stevenb at suse dot de 2005-03-06 09:30 ---
> Subject: Re: [meta-bug] optimizations that CSE still catches
>
> On Sunday 06 March 2005 06:59, law at redhat dot com wrote:
> > Ah. Yes. What did it look like in the tree dumps? Unless
> > one of the expanders is creating the negation I would think this
> > would be pretty easy to catch in fold-const.c
>
> This is PR20130. We don't fold -1*x to -x, ie. we never
> fold the MULT_EXPR to a NEGATE_EXPR. PR20130 has a patch.

Ok. That should be pretty easy to fix.

> > expand_mult? Sigh. That's been in the back of my mind for a couple
> > years now -- it's probably one of the largest RTL expanders which
> > needs to have a lot of its functionality moved into trees.
>
> That'd be nice.
>
> In this case, Roger found out that for DImode negative constants
> it completely bypasses expand_mult_const. Fixing that would help
> for now.

expand_mult, expand_divmod and the switch expanders are the biggies IMHO.
One of the tricks with the mult and divmod expanders is precisely when we
should expand them into their component operations. We clearly don't want
to do it at the very start or the very end of the SSA path, but somewhere
in the middle.

jeff

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721
[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches
--- Additional Comments From stevenb at suse dot de 2005-03-06 09:30 ---
Subject: Re: [meta-bug] optimizations that CSE still catches

On Sunday 06 March 2005 06:59, law at redhat dot com wrote:
> Ah. Yes. What did it look like in the tree dumps? Unless
> one of the expanders is creating the negation I would think this
> would be pretty easy to catch in fold-const.c

This is PR20130. We don't fold -1*x to -x, i.e. we never
fold the MULT_EXPR to a NEGATE_EXPR. PR20130 has a patch.

> expand_mult? Sigh. That's been in the back of my mind for a couple
> years now -- it's probably one of the largest RTL expanders which
> needs to have a lot of its functionality moved into trees.

That'd be nice.

In this case, Roger found out that for DImode negative constants
it completely bypasses expand_mult_const. Fixing that would help
for now.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721
[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches
--- Additional Comments From law at redhat dot com 2005-03-06 05:59 ---
Subject: Re: [meta-bug] optimizations that CSE still catches

On Sat, 2005-03-05 at 10:39 +, stevenb at suse dot de wrote:
> > Am I missing something here? I guess I'm not sure what point you're
> > trying to make.
>
> It just seems that we could do better on initial RTL generation, e.g.
>
> ;; j = k * -1
> (insn 23 21 0 (parallel [
>         (set (reg/v:DI 64 [ j ])
>             (mult:DI (reg/v:DI 67 [ k ])
>                 (const_int -1 [0x])))
>         (clobber (reg:CC 17 flags))
>     ]) -1 (nil)
>     (nil))
>
> which we later simplify in CSE:
>
> Working on insn:
> (insn 23 21 24 0 (parallel [
>         (set (reg/v:DI 64 [ j ])
>             (mult:DI (reg/v:DI 67 [ k ])
>                 (const_int -1 [0x])))
>         (clobber (reg:CC 17 flags))
>     ]) 243 {*muldi3_1_rex64} (nil)
>     (nil))
> Trying to fold rtx:
> (mult:DI (reg/v:DI 67 [ k ])
>     (const_int -1 [0x]))
> Returning new rtx:
> (neg:DI (reg/v:DI 67 [ k ]))

Ah. Yes. What did it look like in the tree dumps? Unless one of the
expanders is creating the negation I would think this would be pretty
easy to catch in fold-const.c

[ ... ]

> These are from the detailed .expand dump
> (i.e. "cc1 t.c -O2 -fdump-rtl-expand-details -fdump-rtl-cse")
>
> So it seems to come from the MULT_EXPR expander in this case, but
> we'll have to study expand a bit closer to be sure.

expand_mult? Sigh. That's been in the back of my mind for a couple
years now -- it's probably one of the largest RTL expanders which
needs to have a lot of its functionality moved into trees.

jeff

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721
[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches
--- Additional Comments From steven at gcc dot gnu dot org 2005-03-06 00:32 --- The first case of comment #14 turns out to be PR20130. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721
[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches
--- Additional Comments From stevenb at suse dot de 2005-03-05 10:39 ---
Subject: Re: [meta-bug] optimizations that CSE still catches

> Am I missing something here? I guess I'm not sure what point you're
> trying to make.

It just seems that we could do better on initial RTL generation, e.g.

;; j = k * -1
(insn 23 21 0 (parallel [
        (set (reg/v:DI 64 [ j ])
            (mult:DI (reg/v:DI 67 [ k ])
                (const_int -1 [0x])))
        (clobber (reg:CC 17 flags))
    ]) -1 (nil)
    (nil))

which we later simplify in CSE:

Working on insn:
(insn 23 21 24 0 (parallel [
        (set (reg/v:DI 64 [ j ])
            (mult:DI (reg/v:DI 67 [ k ])
                (const_int -1 [0x])))
        (clobber (reg:CC 17 flags))
    ]) 243 {*muldi3_1_rex64} (nil)
    (nil))
Trying to fold rtx:
(mult:DI (reg/v:DI 67 [ k ])
    (const_int -1 [0x]))
Returning new rtx:
(neg:DI (reg/v:DI 67 [ k ]))

Similarly, on a 64-bits host:

;; j = k * 4294967295
(insn 15 13 16 (set (reg:DI 63)
        (reg/v:DI 62 [ k ])) -1 (nil)
    (nil))
(insn 16 15 17 (parallel [
        (set (reg:DI 64)
            (ashift:DI (reg:DI 63)
                (const_int 32 [0x20])))
        (clobber (reg:CC 17 flags))
    ]) -1 (nil)
    (expr_list:REG_EQUAL (mult:DI (reg/v:DI 62 [ k ])
            (const_int 4294967296 [0x1]))
        (nil)))
(insn 17 16 18 (parallel [
        (set (reg:DI 65)
            (minus:DI (reg:DI 64)
                (reg/v:DI 62 [ k ])))
        (clobber (reg:CC 17 flags))
    ]) -1 (nil)
    (expr_list:REG_EQUAL (mult:DI (reg/v:DI 62 [ k ])
            (const_int 4294967295 [0x]))
        (nil)))
(insn 18 17 0 (set (reg/v:DI 59 [ j ])
        (reg:DI 65)) -1 (nil)
    (nil))

which CSE turns into:

Working on insn:
(insn 15 13 16 0 (set (reg:DI 63 [ k ])
        (reg/v:DI 62 [ k ])) 81 {*movdi_1_rex64} (nil)
    (nil))
Trying to fold rtx:
(reg/v:DI 62 [ k ])
Returning X unchanged.

Working on insn:
(insn 16 15 17 0 (parallel [
        (set (reg:DI 64)
            (ashift:DI (reg:DI 63 [ k ])
                (const_int 32 [0x20])))
        (clobber (reg:CC 17 flags))
    ]) -1 (nil)
    (expr_list:REG_EQUAL (mult:DI (reg/v:DI 62 [ k ])
            (const_int 4294967296 [0x1]))
        (nil)))
Trying to fold rtx:
(mult:DI (reg/v:DI 62 [ k ])
    (const_int 4294967296 [0x1]))
Returning new rtx:
(ashift:DI (reg/v:DI 62 [ k ])
    (const_int 32 [0x20]))

Working on insn:
(insn 17 16 18 0 (parallel [
        (set (reg:DI 65)
            (minus:DI (reg:DI 64)
                (reg/v:DI 62 [ k ])))
        (clobber (reg:CC 17 flags))
    ]) 223 {*subdi_1_rex64} (nil)
    (expr_list:REG_EQUAL (mult:DI (reg/v:DI 62 [ k ])
            (const_int 4294967295 [0x]))
        (nil)))
Trying to fold rtx:
(minus:DI (reg:DI 64)
    (reg/v:DI 62 [ k ]))
Returning X unchanged.

Working on insn:
(insn 18 17 19 0 (set (reg/v:DI 59 [ j ])
        (reg:DI 65)) 81 {*movdi_1_rex64} (nil)
    (nil))
Trying to fold rtx:
(reg:DI 65)
Returning X unchanged.

These are from the detailed .expand dump
(i.e. "cc1 t.c -O2 -fdump-rtl-expand-details -fdump-rtl-cse")

So it seems to come from the MULT_EXPR expander in this case, but
we'll have to study expand a bit closer to be sure.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721
[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches
--- Additional Comments From law at redhat dot com 2005-03-02 18:23 ---
Subject: Re: [meta-bug] optimizations that CSE still catches

On Wed, 2005-03-02 at 11:50 +, steven at gcc dot gnu dot org wrote:
> Here is a nice one:
>
> Working on insn:
> (insn 215 214 216 15 (parallel [
>         (set (reg:DI 176)
>             (ashift:DI (reg:DI 175)
>                 (const_int 3 [0x3])))
>         (clobber (reg:CC 17 flags))
>     ]) -1 (nil)
>     (expr_list:REG_EQUAL (mult:DI (reg:DI 174)
>             (const_int 8 [0x8]))
>         (nil)))
> Trying to fold rtx:
> (mult:DI (reg:DI 174)
>     (const_int 8 [0x8]))
> Returning new rtx:
> (ashift:DI (reg:DI 174)
>     (const_int 3 [0x3]))
>
> Sometimes I just hate REG_EQUAL notes...

Am I missing something here? I guess I'm not sure what point you're
trying to make.

It seems to me that (reg 174) must be equal to (reg 175) for the
REG_EQUAL note to be valid. Which means they must either be set from
equivalent expressions or we must have a copy insn between them.

In the former case (set from equivalent expressions) we should figure
out why DOM or PRE didn't catch the redundancy. In the latter case we'd
want to see why we didn't copy propagate the copy.

[ It's possible the copy occurs due to tree->rtl expansion -- there's
still a fair number of ways to get silly copies at that phase. In which
case we need to look into ways to eliminate the silly copies. IIRC some
come from lameness in the API for some of our conversion routines. ]

jeff

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721
[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches
--- Additional Comments From steven at gcc dot gnu dot org 2005-03-02 11:50 ---
Here is a nice one:

Working on insn:
(insn 215 214 216 15 (parallel [
        (set (reg:DI 176)
            (ashift:DI (reg:DI 175)
                (const_int 3 [0x3])))
        (clobber (reg:CC 17 flags))
    ]) -1 (nil)
    (expr_list:REG_EQUAL (mult:DI (reg:DI 174)
            (const_int 8 [0x8]))
        (nil)))
Trying to fold rtx:
(mult:DI (reg:DI 174)
    (const_int 8 [0x8]))
Returning new rtx:
(ashift:DI (reg:DI 174)
    (const_int 3 [0x3]))

Sometimes I just hate REG_EQUAL notes...

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721
[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches
--
Bug 19721 depends on bug 19938, which changed state.

Bug 19938 Summary: Missed jump threading opportunity due to signedness difference
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19938

           What    |Old Value                   |New Value
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721
[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches
--
           What    |Removed                     |Added
----------------------------------------------------------------------------
  BugsThisDependsOn|                            |20132

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721
[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches
--
           What    |Removed                     |Added
----------------------------------------------------------------------------
  BugsThisDependsOn|                            |20130

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721
[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches
--
           What    |Removed                     |Added
----------------------------------------------------------------------------
  BugsThisDependsOn|                            |19938

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721
[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches
--- Additional Comments From steven at gcc dot gnu dot org 2005-02-06 17:41 ---
Arguably, PR16961 is not directly related. But if we fix that bug and the
similar "long long" issues on 32 bits hosts, then the "64 bits arith on
32 bits hosts" thing should be a non-issue (assuming the tree optimizers
do well).

--
           What    |Removed                     |Added
----------------------------------------------------------------------------
  BugsThisDependsOn|                            |16961

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721
[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches
--- Additional Comments From hubicka at ucw dot cz 2005-02-02 11:50 ---
Subject: Re: [meta-bug] optimizations that CSE still catches

> --- Additional Comments From stevenb at suse dot de 2005-02-02 09:21 ---
> Subject: Re: [meta-bug] optimizations that CSE still catches
>
> On Monday 31 January 2005 22:35, law at redhat dot com wrote:
> > Note I would _STRONGLY_ recommend people look at more than just the
> > compiler when evaluating the old CSE code. In particular it is
> > important that we look at things like 64bit arithmetic on 32bit
> > hosts (which happens often in kernels, but not nearly as often
> > in user level benchmarks).
>
> I was told crafty has a lot of 64bits arithmetic, so the -m32
> numbers for crafty should be an indication of possible regressions
> in that area. And those numbers look OK to me.

Crafty is special in that it uses 64bit values as bitmaps rather than
numbers, so it doesn't do much of the addition/multiplication that
produces the worst artefacts.

Honza

> If I can find some time, I'll try another benchmark suite to see
> whether the effects of CSE path following are significant enough to
> still be worth its cost.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721
[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches
--- Additional Comments From stevenb at suse dot de 2005-02-02 09:21 ---
Subject: Re: [meta-bug] optimizations that CSE still catches

On Monday 31 January 2005 22:35, law at redhat dot com wrote:
> Note I would _STRONGLY_ recommend people look at more than just the
> compiler when evaluating the old CSE code. In particular it is
> important that we look at things like 64bit arithmetic on 32bit
> hosts (which happens often in kernels, but not nearly as often
> in user level benchmarks).

I was told crafty has a lot of 64bits arithmetic, so the -m32
numbers for crafty should be an indication of possible regressions
in that area. And those numbers look OK to me.

If I can find some time, I'll try another benchmark suite to see
whether the effects of CSE path following are significant enough to
still be worth its cost.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721
[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches
--- Additional Comments From law at redhat dot com 2005-01-31 21:35 ---
Subject: Re: [meta-bug] optimizations that CSE still catches

On Mon, 2005-01-31 at 20:14 +, stevenb at suse dot de wrote:
> My numbers for not disabling CSE completely but disabling path following
> are a lot less pessimistic. This was on an AMD Opteron at 1600MHz:

Right. That's what I'd focus on first -- that's what I was looking at
when I realized, eons ago, that if we don't do a good job at jump
threading, then we have little hope of ever drastically simplifying CSE.
I've been stuck in jump threading hell ever since :-)

Note I would _STRONGLY_ recommend people look at more than just the
compiler when evaluating the old CSE code. In particular it is important
that we look at things like 64bit arithmetic on 32bit hosts (which
happens often in kernels, but not nearly as often in user level
benchmarks).

jeff

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721
[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches
--- Additional Comments From stevenb at suse dot de 2005-01-31 20:14 ---
Subject: Re: [meta-bug] optimizations that CSE still catches

My numbers for not disabling CSE completely but disabling path following
are a lot less pessimistic. This was on an AMD Opteron at 1600MHz.

GCC was configured as:
configure --enable-threads=posix --enable-languages="c,c++,f95"

GCC bootstrap times for 'make -j1 bootstrap && make install':
Bootstrap time base compiler: 2208 s
Bootstrap time peak compiler: 2150 s (-2.6%)

SPECint 64 bits
Total time for base compilation: 192 s
Total time for peak compilation: 180 s (-6.7%)

              base    peak    peak/base
164.gzip       794     799     +0.63%
175.vpr        729     715     -1.92%
176.gcc        958     963     +0.52%
181.mcf        410     411     +0.24%
186.crafty    1362    1380     +1.32%
197.parser     558     558     =
252.eon          X       X
253.perlbmk    962     964     +0.21%
254.gap        774     776     +0.26%
255.vortex    1159    1162     +0.26%
256.bzip2      779     772     -0.90%
300.twolf      836     876     +4.78%

SPECfp 64 bits
Total time for base compilation: 212 s
Total time for peak compilation: 208 s (-1.9%)

              base    peak    peak/base
168.wupwise    781     793     +1.53%
171.swim       690     687     -0.43%
172.mgrid      513     514     +0.02%
173.applu      624     624     =
177.mesa      1000     998     -0.20%
178.galgel       X       X
179.art        941     953     +1.28%
183.equake     817     820     +0.37%
187.facerec    674     677     +0.44%
188.ammp       859     859     =
189.lucas      858     858     =
191.fma3d      699     698     -0.14%
200.sixtrack   382     382     =
301.apsi       770     771     +0.12%

SPECint 32 bits
Total time for base compilation: 257 s
Total time for peak compilation: 246 s (-4.5%)

              base    peak    peak/base
164.gzip       696     700     +0.57%
175.vpr        691     710     +2.74%
176.gcc        884     875     -1.02%
181.mcf        528     530     +0.38%
186.crafty     920     922     +0.22%
197.parser     629     634     +0.79%
252.eon        970     963     -0.72%
253.perlbmk    935     938     +0.32%
254.gap          X       X
255.vortex       X       X
256.bzip2      678     681     +0.04%
300.twolf      974     966     -0.82%

SPECfp 32 bits
Total time for base compilation: 210 s
Total time for peak compilation: 204 s (-2.9%)

              base    peak    peak/base
168.wupwise    672     658     -2.08%
171.swim       692     696     +0.58%
172.mgrid      370     370     =
173.applu      580     580     =
177.mesa       678     655     -3.39%
178.galgel       X       X
179.art        484     483     -0.21%
183.equake     822     821     -0.12%
187.facerec    616     617     +0.16%
188.ammp       712     713     +0.14%
189.lucas      693     695     +0.20%
191.fma3d      716     716     =
200.sixtrack   422     422     =
301.apsi       685     685     =

The SPEC numbers are the mean of three runs, so that's pretty solid.

Index: params.def
===================================================================
RCS file: /cvs/gcc/gcc/gcc/params.def,v
retrieving revision 1.53
diff -u -3 -p -r1.53 params.def
--- params.def  20 Jan 2005 12:45:12 -  1.53
+++ params.def  31 Jan 2005 17:09:21 -
@@ -321,7 +321,7 @@ DEFPARAM(PARAM_MIN_CROSSJUMP_INSNS,
 DEFPARAM(PARAM_MAX_CSE_PATH_LENGTH,
         "max-cse-path-length",
         "The maximum length of path considered in cse",
-        10, 0, 0)
+        1, 0, 0)

 /* The cost of expression in loop invariant motion
    that is considered expensive.  */

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721
[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches
--- Additional Comments From dnovillo at gcc dot gnu dot org 2005-01-31 15:26 ---
Similarly for em64t.  Build times for SPECint were reduced by 9.2%.  Build
times for SPECfp were reduced by 7.5%.  Compiler bootstrap times were reduced
by 4.4%.

Comparison between 20050127/spec-20050127.stats and
20050130/spec-20050130.stats (base)

Compiler used in 20050127/spec-20050127.stats (Before)
Compiler:   gcc version 4.0.0 20050127 (experimental)
Base flags: -O2 -march=nocona
Peak flags: -O3 -march=nocona
Processor:  Genuine Intel(R) CPU 2.40GHz (1866.740 Mhz)
Memory:     4064772 kB
Cache:      1024 KB

Compiler used in 20050130/spec-20050130.stats (After)
Compiler:   gcc version 4.0.0 20050130 (experimental)
Base flags: -O2 -march=nocona
Peak flags: -O3 -march=nocona
Processor:  Genuine Intel(R) CPU 2.40GHz (1866.740 Mhz)
Memory:     4064772 kB
Cache:      1024 KB

SPECint results for base
Benchmark        Before     After    % diff
164.gzip         508.55    482.96   - 5.03%
175.vpr          443.91    445.11   + 0.27%
176.gcc          760.30    762.51   + 0.29%
181.mcf          397.27    397.01   - 0.07%
186.crafty       768.76    745.77   - 2.99%
197.parser       458.55    458.83   + 0.06%
252.eon            0.00      0.00     INF
253.perlbmk      758.73    767.23   + 1.12%
254.gap          836.44    834.91   - 0.18%
255.vortex         0.00    850.80     INF
256.bzip2        557.58    557.19   - 0.07%
300.twolf        576.46    557.28   - 3.33%
mean             587.55    602.07   + 2.47%

SPECfp results for base
Benchmark        Before     After    % diff
168.wupwise        0.00      0.00     INF
171.swim           0.00      0.00     INF
172.mgrid          0.00      0.00     INF
173.applu          0.00      0.00     INF
177.mesa         767.75    765.15   - 0.34%
178.galgel         0.00      0.00     INF
179.art          735.31    741.52   + 0.84%
183.equake      1043.34   1007.83   - 3.40%
187.facerec        0.00      0.00     INF
188.ammp         558.49    536.86   - 3.87%
189.lucas          0.00      0.00     INF
191.fma3d          0.00      0.00     INF
200.sixtrack       0.00      0.00     INF
301.apsi           0.00      0.00     INF
mean             757.33    744.36   - 1.71%

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721
[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches
--- Additional Comments From dnovillo at gcc dot gnu dot org 2005-01-31 15:23 ---
SPEC comparisons for i686 before/after kazu's patch to completely disable
CSE.  The 20050127 compiler has CSE enabled.  The 20050129 compiler has CSE
disabled.  Compile times for SPECint were reduced by 9%.  Compile times for
SPECfp were reduced by 7.1%.  Bootstrap times were reduced by 4.5%.

Comparison between 20050127/spec-20050127.stats and
20050129/spec-20050129.stats (base)

Compiler used in 20050127/spec-20050127.stats (Before)
Compiler:   gcc version 4.0.0 20050127 (experimental)
Base flags: -O2 -march=i686
Peak flags: -O3 -march=i686
Processor:  Intel(R) Pentium(R) 4 CPU 2.26GHz (2259.264 Mhz)
Memory:     1034472 kB
Cache:      512 KB

Compiler used in 20050129/spec-20050129.stats (After)
Compiler:   gcc version 4.0.0 20050129 (experimental)
Base flags: -O2 -march=i686
Peak flags: -O3 -march=i686
Processor:  Intel(R) Pentium(R) 4 CPU 2.26GHz (2259.264 Mhz)
Memory:     1034472 kB
Cache:      512 KB

SPECint results for base
Benchmark        Before     After    % diff
164.gzip         650.42    578.34   - 11.08%
175.vpr          421.04    418.82   -  0.53%
176.gcc          717.60    710.60   -  0.98%
181.mcf          426.30    426.49   +  0.05%
186.crafty       635.60    632.86   -  0.43%
197.parser       546.62    563.78   +  3.14%
252.eon          541.23    566.44   +  4.66%
253.perlbmk      704.34    685.23   -  2.71%
254.gap          741.52    708.46   -  4.46%
255.vortex       822.37    823.91   +  0.19%
256.bzip2        524.96    524.44   -  0.10%
300.twolf        544.79    552.95   +  1.50%
mean             594.14    588.36   -  0.97%

SPECfp results for base
Benchmark        Before     After    % diff
168.wupwise      579.39    626.80   +  8.18%
171.swim         501.51    490.96   -  2.10%
172.mgrid        372.63    374.65   +  0.54%
173.applu        557.58    529.18   -  5.09%
177.mesa         417.03    412.20   -  1.16%
178.galgel       485.88    482.41   -  0.71%
179.art          207.13    205.69   -  0.70%
183.equake       820.26    797.45   -  2.78%
187.facerec      346.83    337.74   -  2.62%
188.ammp         343.35    333.60   -  2.84%
189.lucas        498.16    505.99   +  1.57%
191.fma3d        465.00    433.92   -  6.68%
200.sixtrack     383.56    371.22   -  3.22%
301.apsi         422.75    423.89   +  0.27%
mean             437.11    431.45   -  1.29%

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721
[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches
--
What                 |Removed             |Added
Status               |UNCONFIRMED         |NEW
Ever Confirmed       |                    |1
Last reconfirmed date|0000-00-00 00:00:00 |2005-01-31 15:14:49

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721
[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches
--
What|Removed |Added
CC  |        |law at redhat dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721
[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches
--- Additional Comments From steven at gcc dot gnu dot org 2005-01-31 12:42 ---
Created an attachment (id=8112)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=8112&action=view)
gcov coverage testing of CVS HEAD 20050131 on AMD64

This is the coverage data of cse.c for 517 preprocessed C files from the GCC
sources (including all components of cc1).  Note especially how ineffective
fold_rtx is.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721
[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches
--
What             |Removed |Added
BugsThisDependsOn|        |19659

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721
[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches
--- Additional Comments From steven at gcc dot gnu dot org 2005-01-31 12:39 ---
To get something started, I have done SPECint and SPECfp runs on AMD64 with
CVS HEAD 20050130, unmodified vs. a cse.c with path following disabled (by
setting max-cse-path-length to 1).

The overall scores go *up* (!!!) with that change, but some individual
benchmarks regress.  Still, this is a lot better than half a year ago, when
disabling CSE would cause an overall regression of more than 10%.

Diego Novillo also ran his SPEC tester on i686 with CSE completely disabled,
and there, too, the overall performance drop was not as large as one might
have expected.  Note that completely disabling CSE is a step further than
setting max-cse-path-length to 1; the latter effectively makes CSE a purely
local pass.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721
[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches
--
What   |Removed |Added
CC     |        |kazu at cs dot umass dot edu, pinskia at gcc dot gnu dot org, dnovillo at gcc dot gnu dot org, hubicka at gcc dot gnu dot org
Summary|[meta-bug] optimizations that CSE still catches |[meta-bug] optimizations that CSE still catches

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721