[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches

2017-12-11 Thread law at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721
Bug 19721 depends on bug 19790, which changed state.

Bug 19790 Summary: equality not noticed when signedness differs.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=19790

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches

2017-02-01 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721
Bug 19721 depends on bug 72712, which changed state.

Bug 72712 Summary: [7 Regression] Tenfold compile time regression
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72712

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches

2013-05-09 Thread steven at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721

--- Comment #27 from Steven Bosscher steven at gcc dot gnu.org 2013-05-09 
10:39:57 UTC ---
(In reply to comment #26)
 With TARGET_LEGITIMATE_ADDRESS_P rejecting (costly) symbol_refs inside
 memory references, cse_local brings the number of __malloc_av references down
 when compiling newlib's malloc-r.c:

Can you please open a fresh PR for this, with the information necessary
to reproduce this problem, and make the new PR block this one? This PR
is a meta-bug, reporting specific problems is best done in new PRs.


[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches

2013-05-09 Thread steven at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721

--- Comment #28 from Steven Bosscher steven at gcc dot gnu.org ---
(In reply to comment #25)

FWIW this case is handled at the GIMPLE level since at least GCC 4.3.


[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches

2013-05-08 Thread amylaar at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721

--- Comment #26 from Jorn Wolfgang Rennecke amylaar at gcc dot gnu.org 
2013-05-09 00:32:28 UTC ---
The tree optimizers have become extremely aggressive on constant propagation,
so cse is needed more than ever to undo the damage.
With TARGET_LEGITIMATE_ADDRESS_P rejecting (costly) symbol_refs inside
memory references, cse_local brings the number of __malloc_av references down
when compiling newlib's malloc-r.c:
$ grep -c 'symbol_ref.*__malloc_av_' mallocr-4.4.i.*
mallocr-4.4.i.165r.expand:70
mallocr-4.4.i.166r.vregs:35
mallocr-4.4.i.167r.into_cfglayout:35
mallocr-4.4.i.168r.jump:70
mallocr-4.4.i.169r.subreg1:35
mallocr-4.4.i.170r.dfinit:35
mallocr-4.4.i.171r.cse1:70
mallocr-4.4.i.172r.fwprop1:41
mallocr-4.4.i.173r.cprop1:54
mallocr-4.4.i.175r.hoist:42
mallocr-4.4.i.176r.cprop2:30
mallocr-4.4.i.178r.cse_local:26


[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches

2005-09-29 Thread steven at gcc dot gnu dot org


-- 
Bug 19721 depends on bug 23911, which changed state.

Bug 23911 Summary: Failure to propagate constants from a const initializer for 
_Complex
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23911

   What|Old Value   |New Value

 Status|UNCONFIRMED |NEW
 Status|NEW |RESOLVED
 Resolution||FIXED

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721


[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches

2005-09-16 Thread steven at gcc dot gnu dot org


-- 
   What|Removed |Added

  BugsThisDependsOn||23911


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721


[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches

2005-08-18 Thread bonzini at gcc dot gnu dot org


-- 
   What|Removed |Added

  BugsThisDependsOn||23455


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721


[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches

2005-08-17 Thread bonzini at gcc dot gnu dot org

--- Additional Comments From bonzini at gcc dot gnu dot org  2005-08-17 
08:03 ---
This small testcase is a typical case of the optimizations that CSE path
following catches on PowerPC:

unsigned outcnt;
extern void flush_outbuf(void);

void
bi_windup(unsigned char *outbuf, unsigned char bi_buf)
{
outbuf[outcnt] = bi_buf;
if (outcnt == 16384)
flush_outbuf();
outbuf[outcnt] = bi_buf;
}

Loading outcnt takes *three* insns: one to load the high part of the address,
one to load the low part, one to load from memory.  CSE reduces them to two by
combining the loading of the low part with the load from memory.  With CSE path
following, in addition, CSE is able to factor the loads of the high part of the
address, and do just one of them.

Now here comes GCSE.  If CSE path following is on, GCSE sees that the third
occurrence of outcnt is the same as the second, and eliminates it.  If it is
off, the GCSE pass is used up factoring the loads of the address high parts.

So, if we remove a pseudo-global cse pass by disabling path following, it would
make sense to bump the default max-gcse-passes to 2.

Paolo

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721


[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches

2005-08-17 Thread law at redhat dot com

--- Additional Comments From law at redhat dot com  2005-08-17 19:31 ---
Subject: Re:  [meta-bug] optimizations that CSE still
catches

On Wed, 2005-08-17 at 08:03 +0000, bonzini at gcc dot gnu dot org wrote:
 --- Additional Comments From bonzini at gcc dot gnu dot org  2005-08-17 
 08:03 ---
 This small testcase is a typical case of the optimizations that CSE path
 following catches on PowerPC:
 
 unsigned outcnt;
 extern void flush_outbuf(void);
 
 void
 bi_windup(unsigned char *outbuf, unsigned char bi_buf)
 {
 outbuf[outcnt] = bi_buf;
 if (outcnt == 16384)
 flush_outbuf();
 outbuf[outcnt] = bi_buf;
 }
 
 Loading outcnt takes *three* insns: one to load the high part of the address,
 one to load the low part, one to load from memory.  CSE reduces them to two by
 combining the loading of the low part with the load from memory.  With CSE 
 path
 following, in addition, CSE is able to factor the loads of the high part of 
 the
 address, and do just one of them.
 
 Now here comes GCSE.  If CSE path following is on, GCSE sees that the third
 occurrence of outcnt is the same as the second, and eliminates it.  If it is
 off, GCSE is wasted to factor the loading of the address high parts.
 
 So, if we remove a pseudo-global cse pass by disabling path following, it 
 would
 make sense to bump the default max-gcse-passes to 2.
Presumably the store into outbuf prevents the SSA optimizers from
commonizing the first two loads of outcnt and the call to flush_outbuf
prevents the SSA optimizers from commonizing the last load of outcnt on
the path which bypasses the call to flush_outbuf.  Right?
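
The aliasing concern behind this question can be reproduced outside the compiler. The sketch below is only an illustration: `counter` stands in for outcnt, and both function names are made up. Since an unsigned char pointer may legally point at any object, the store can change the global, so the second read cannot simply reuse the first value.

```c
/* Hypothetical stand-in for outcnt in the testcase. */
static unsigned counter;

/* Store one byte through buf, then read the global again.  The second
   read cannot be replaced by the first value, because buf may point
   into counter itself. */
unsigned store_then_reload(unsigned char *buf, unsigned char v)
{
    unsigned before = counter;   /* first load */
    buf[before] = v;             /* may alias counter */
    return counter;              /* second load, possibly changed */
}

/* Returns 1 when the store visibly changed the global. */
int aliasing_store_changes_counter(void)
{
    counter = 0;
    /* Point the buffer at the global's own bytes; index 0 is in range. */
    return store_then_reload((unsigned char *)&counter, 1) != 0;
}
```

Whether this is what actually blocks the tree optimizers on the testcase is a separate question; the sketch only shows why a reload after the store is required in general.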

Jeff



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721


[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches

2005-08-17 Thread paolo dot bonzini at lu dot unisi dot ch

--- Additional Comments From paolo dot bonzini at lu dot unisi dot ch  
2005-08-17 20:07 ---
Subject: Re:  [meta-bug] optimizations that CSE still
 catches


unsigned outcnt;
extern void flush_outbuf(void);

void
bi_windup(unsigned char *outbuf, unsigned char bi_buf)
{
outbuf[outcnt] = bi_buf;
if (outcnt == 16384)
flush_outbuf();
outbuf[outcnt] = bi_buf;
}


Presumably the store into outbuf prevents the SSA optimizers from
commonizing the first two loads of outcnt and the call to flush_outbuf
prevents the SSA optimizers from commonizing the last load of outcnt on
the path which bypasses the call to flush_outbuf.  Right?
  

Not really.  First of all, as stevenb pointed out on IRC, this is quite 
specific to powerpc-apple-darwin and other targets where programs are 
compiled as PIC by default.  Steven's SPEC testing under Linux has not 
shown this behavior, but shared libraries there *will* suffer from the 
same problem!

We'd want the code to become

void
bi_windup(unsigned char *outbuf, unsigned char bi_buf)
{
int t1 = outcnt;
outbuf[t1] = bi_buf;
int t2 = outcnt, t3;
if (t2 == 16384) {
flush_outbuf();
t3 = outcnt;
} else
t3 = t2;
outbuf[t3] = bi_buf;
}


If we disable CSE path following, and keep only one GCSE pass, we 
waste the opportunity to do this optimization, because we generate 
temporaries for the partially redundant address of outcnt.  With two 
GCSE passes, the second is able to eliminate the partially redundant load.
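
As a sanity check on the transformation itself, here is a small sketch that runs the original function and the hand-transformed one side by side. The stub flush_outbuf (which just resets outcnt), the buffer size, the use of unsigned instead of int for the temporaries, and the name versions_agree are all assumptions made for the sake of a self-contained example.

```c
#include <string.h>

unsigned outcnt;

/* Stub for illustration: the real flush_outbuf is external.  Here it
   just resets the counter, which is why outcnt must be reloaded. */
static void flush_outbuf(void) { outcnt = 0; }

static void bi_windup_orig(unsigned char *outbuf, unsigned char bi_buf)
{
    outbuf[outcnt] = bi_buf;
    if (outcnt == 16384)
        flush_outbuf();
    outbuf[outcnt] = bi_buf;
}

/* Hand-transformed version: the load after the if is only redone on
   the path that calls flush_outbuf. */
static void bi_windup_pre(unsigned char *outbuf, unsigned char bi_buf)
{
    unsigned t1 = outcnt;
    outbuf[t1] = bi_buf;
    unsigned t2 = outcnt, t3;
    if (t2 == 16384) {
        flush_outbuf();
        t3 = outcnt;
    } else
        t3 = t2;
    outbuf[t3] = bi_buf;
}

/* Returns 1 if both versions leave the same buffer contents and the
   same outcnt.  start must be at most 16384. */
int versions_agree(unsigned start)
{
    static unsigned char a[16385], b[16385];
    memset(a, 0, sizeof a);
    memset(b, 0, sizeof b);
    outcnt = start; bi_windup_orig(a, 0x5a); unsigned end_a = outcnt;
    outcnt = start; bi_windup_pre(b, 0x5a);  unsigned end_b = outcnt;
    return end_a == end_b && memcmp(a, b, sizeof a) == 0;
}
```

Calling versions_agree(16384) exercises the path through flush_outbuf; any smaller value exercises the path that bypasses it.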

Of course what we really miss is load PRE on the tree level, but it is 
good that --param max-gcse-passes=2 can be a replacement of 
-fcse-skip-blocks -fcse-follow-jumps.  Testing mainline GCC against a 
patch including no path following + 2 GCSE passes + my forward 
propagation pass, I'm seeing SPEC improvements of +2 to +8% on 
powerpc-apple-darwin.

Paolo


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721


[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches

2005-07-19 Thread falk at debian dot org


-- 
Bug 19721 depends on bug 16961, which changed state.

Bug 16961 Summary: Poor x86-64 performance with 128bit ints
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=16961

   What|Old Value   |New Value

 Status|NEW |RESOLVED
 Resolution||FIXED

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721


[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches

2005-05-06 Thread pinskia at gcc dot gnu dot org


-- 
Bug 19721 depends on bug 19791, which changed state.

Bug 19791 Summary: [tcb] A constant not fully propagated
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19791

   What|Old Value   |New Value

 Status|NEW |RESOLVED
 Resolution||FIXED

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721


[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches

2005-04-26 Thread bonzini at gcc dot gnu dot org

--- Additional Comments From bonzini at gcc dot gnu dot org  2005-04-26 
17:35 ---
Another thing that CSE does is promoting paradoxical subregs to regs.  On
PowerPC at least, recursive calls of fold_rtx are almost ineffective except for
this.  Such promotion helps because equiv_constant does not look into subregs.

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721


[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches

2005-04-16 Thread steven at gcc dot gnu dot org

--- Additional Comments From steven at gcc dot gnu dot org  2005-04-16 
15:15 ---
It is apparently not possible to convince people that any optimizations 
in CSE can be removed, so working on this is pointless for me.  See 
http://gcc.gnu.org/ml/gcc-patches/2005-04/msg01498.html. 

-- 
   What|Removed |Added

 Status|NEW |SUSPENDED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721


[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches

2005-04-13 Thread pinskia at gcc dot gnu dot org


-- 
Bug 19721 depends on bug 19659, which changed state.

Bug 19659 Summary: GCC does not remove an if statement that never triggers.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19659

   What|Old Value   |New Value

 Status|NEW |RESOLVED
 Resolution||FIXED

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721


[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches

2005-04-13 Thread pinskia at gcc dot gnu dot org


-- 
Bug 19721 depends on bug 19789, which changed state.

Bug 19789 Summary: tree optimizers do not know that constant global variables 
do not change
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19789

   What|Old Value   |New Value

 Status|ASSIGNED|RESOLVED
 Resolution||FIXED

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721


[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches

2005-03-17 Thread aoliva at gcc dot gnu dot org


-- 
   What|Removed |Added

  BugsThisDependsOn||20514


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721


[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches

2005-03-16 Thread kazu at cs dot umass dot edu


-- 
Bug 19721 depends on bug 19788, which changed state.

Bug 19788 Summary: Inconsistent handling of -1.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19788

   What|Old Value   |New Value

 Status|WAITING |RESOLVED
 Resolution||FIXED

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721


[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches

2005-03-11 Thread amylaar at gcc dot gnu dot org

--- Additional Comments From amylaar at gcc dot gnu dot org  2005-03-11 
19:43 ---
(In reply to comment #18)
 IMHO.  One of the tricks with the mult and divmod expanders is precisely
 when should we expand them into their component operations.  We clearly
 don't want to do it at the very start or the very end of hte SSA path,
 but somewhere in the middle.

One of the sh64 patches that I intend to merge expands signed integer
division into calculating the inverse of the divisor (at runtime) and
then multiplying the dividend with that inverse.  It's broken up into
operations that assign to one pseudo register each, so this stuff gets
full exposure to cse, gcse and rtl loop optimizations.  If the inverse
calculation and the multiply with the dividend end up in the same basic
block and the inverse is only used once, some combiner patterns
combine & split this stuff again to get a more scheduler-friendly data
flow.
Do you think we should have machine-dependent tree expanders so that
such details can already be exposed to (g)cse and loop optimizations at
the tree level?
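
To make the idea of dividing through a runtime inverse concrete, here is a minimal sketch. It is not the sh64 expansion: it handles unsigned 16-bit operands rather than signed integers, it computes the inverse with an ordinary division to stay short, and the names reciprocal16 and div_by_reciprocal are invented. What it does show is that the inverse is a separate value, so it can be computed once and reused when the same divisor appears again.

```c
#include <stdint.h>

/* Reciprocal of d scaled by 2^32 and rounded up; requires d >= 1.
   An ordinary division is used here only to keep the sketch short. */
uint64_t reciprocal16(uint16_t d)
{
    return (0x100000000ULL + d - 1) / d;
}

/* n / d computed as a multiply by the precomputed inverse. */
uint16_t div_by_reciprocal(uint16_t n, uint64_t inv)
{
    return (uint16_t)(((uint64_t)n * inv) >> 32);
}

/* Compare against the / operator for every 16-bit dividend and a
   handful of divisors.  Returns 1 if all quotients match. */
int reciprocal_division_matches(void)
{
    static const uint16_t divisors[] =
        { 1, 2, 3, 7, 10, 255, 256, 32767, 65535 };
    for (unsigned i = 0; i < sizeof divisors / sizeof divisors[0]; i++) {
        /* The inverse is computed once per divisor and then reused. */
        uint64_t inv = reciprocal16(divisors[i]);
        for (unsigned n = 0; n <= 65535; n++)
            if (div_by_reciprocal((uint16_t)n, inv) != n / divisors[i])
                return 0;
    }
    return 1;
}
```

The rounded-up reciprocal is exact here because both operands are below 2^16, so the rounding error times the dividend stays below 2^32.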

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721


[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches

2005-03-11 Thread pinskia at gcc dot gnu dot org


-- 
Bug 19721 depends on bug 20132, which changed state.

Bug 20132 Summary: Pessimization of induction variable and missed hoisting 
opportunity
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20132

   What|Old Value   |New Value

 Status|NEW |RESOLVED
 Resolution||FIXED

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721


[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches

2005-03-10 Thread phython at gcc dot gnu dot org


-- 
Bug 19721 depends on bug 20130, which changed state.

Bug 20130 Summary: Fold a * -1 - 1 into ~a;
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20130

   What|Old Value   |New Value

 Status|ASSIGNED|RESOLVED
 Resolution||FIXED

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721


[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches

2005-03-06 Thread stevenb at suse dot de

--- Additional Comments From stevenb at suse dot de  2005-03-06 09:30 
---
Subject: Re:  [meta-bug] optimizations that CSE still catches

On Sunday 06 March 2005 06:59, law at redhat dot com wrote:
 Ah.  Yes.  What did it look like in the tree dumps?   Unless
 one of the expanders is creating the negation I would think this
 would be pretty easy to catch in fold-const.c

This is PR20130.  We don't fold -1*x to -x, ie. we never
fold the MULT_EXPR to a NEGATE_EXPR.  PR20130 has a patch.
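
For reference, the two identities involved (the one above and the "a * -1 - 1 into ~a" fold from the PR20130 summary) can be checked with a few lines of C. This is only a sketch: it uses unsigned 64-bit wraparound arithmetic to avoid signed overflow, and the function name is made up.

```c
#include <stdint.h>

/* Returns 1 if both folds hold for x in modulo-2^64 arithmetic. */
int negate_identities_hold(uint64_t x)
{
    uint64_t minus_one = UINT64_MAX;             /* bit pattern of -1 */
    return x * minus_one == (uint64_t)0 - x      /* -1 * x  ==  -x */
        && x * minus_one - 1 == ~x;              /* x * -1 - 1  ==  ~x */
}
```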

 expand_mult?  Sigh.  That's been in the back of my mind for a couple
 years now -- it's probably one of the largest RTL expanders which
 needs to have a lot of its functionality moved into trees.

That'd be nice.

In this case, Roger found out that for DImode negative constants
it completely bypasses expand_mult_const.  Fixing that would help
for now.



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721


[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches

2005-03-06 Thread law at redhat dot com

--- Additional Comments From law at redhat dot com  2005-03-06 19:56 ---
Subject: Re:  [meta-bug] optimizations that CSE still
catches

On Sun, 2005-03-06 at 09:30 +0000, stevenb at suse dot de wrote:
 --- Additional Comments From stevenb at suse dot de  2005-03-06 09:30 
 ---
 Subject: Re:  [meta-bug] optimizations that CSE still catches
 
 On Sunday 06 March 2005 06:59, law at redhat dot com wrote:
  Ah.  Yes.  What did it look like in the tree dumps?   Unless
  one of the expanders is creating the negation I would think this
  would be pretty easy to catch in fold-const.c
 
 This is PR20130.  We don't fold -1*x to -x, ie. we never
 fold the MULT_EXPR to a NEGATE_EXPR.  PR20130 has a patch.
Ok.  That should be pretty easy to fix.

 
  expand_mult?  Sigh.  That's been in the back of my mind for a couple
  years now -- it's probably one of the largest RTL expanders which
  needs to have a lot of its functionality moved into trees.
 
 That'd be nice.
 
 In this case, Roger found out that for DImode negative constants
 it completely bypasses expand_mult_const.  Fixing that would help
 for now.
expand_mult, expand_divmod and the switch expanders are the biggies
IMHO.  One of the tricks with the mult and divmod expanders is precisely
when should we expand them into their component operations.  We clearly
don't want to do it at the very start or the very end of the SSA path,
but somewhere in the middle.

jeff




-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721


[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches

2005-03-06 Thread steven at gcc dot gnu dot org

--- Additional Comments From steven at gcc dot gnu dot org  2005-03-06 
22:14 ---
Just to give people an idea of how close we are to optimizing well enough that 
the calls to fold_rtx in CSE are almost all no-ops, here are some numbers 
taken over all cc1-i files on amd64: 
 
Number of times fold_rtx is called: 13882333 
Number of times it returns something other than the incoming rtx x: 70001 
Number of times fold_rtx is called by other functions than itself: 9323647 
Number of times it returns something other than x: 8526 
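
Expressed as rates, these counts come to roughly 0.5% of all calls and roughly 0.09% of the calls made from outside fold_rtx. The helper below just redoes that arithmetic; the function names are made up for illustration.

```c
/* Percentage of fold_rtx calls that returned a different rtx. */
double changed_percent(double changed, double calls)
{
    return 100.0 * changed / calls;
}

/* All calls, using the counts quoted above. */
double all_calls_percent(void)
{
    return changed_percent(70001.0, 13882333.0);
}

/* Calls made by functions other than fold_rtx itself. */
double outer_calls_percent(void)
{
    return changed_percent(8526.0, 9323647.0);
}
```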
 
A few rtxes that fold_rtx handles: 
 
Loads from constant pool: 
Trying to fold rtx: 
(float_extend:DF (mem/u/i:SF (symbol_ref/u:DI (*.LC0) [flags 0x2]) [2 S4 
A32])) 
  Trying to fold rtx: 
  (mem/u/i:SF (symbol_ref/u:DI (*.LC0) [flags 0x2]) [2 S4 A32]) 
Trying to fold rtx: 
(symbol_ref/u:DI (*.LC0) [flags 0x2]) 
Returning X unchanged. 
  Returning new rtx: 
  (const_double:SF 1.0e+0 [0x0.8p+1]) 
Returning new rtx: 
(const_double:DF 1.0e+0 [0x0.8p+1]) 
 
Folded jumps: 
Trying to fold rtx: 
(if_then_else (eq (reg:CCZ 17 flags) 
(const_int 0 [0x0])) 
(label_ref 73) 
(pc)) 
  Trying to fold rtx: 
  (pc) 
  Returning X unchanged. 
  Trying to fold rtx: 
  (eq (reg:CCZ 17 flags) 
(const_int 0 [0x0])) 
Trying to fold rtx: 
(reg:SI 66 [ D.10402 ]) 
Returning X unchanged. 
Trying to fold rtx: 
(const_int 4 [0x4]) 
Returning X unchanged. 
  Returning new rtx: 
  (const_int 1 [0x1]) 
Returning new rtx: 
(label_ref 73) 
 
Apparently an equivalent expression with lower cost: 
Trying to fold rtx: 
(plus:QI (subreg:QI (reg:SI 251) 0) 
(subreg:QI (reg:SI 251) 0)) 
  Trying to fold rtx: 
  (subreg:QI (reg:SI 251) 0) 
Trying to fold rtx: 
(reg:SI 251) 
Returning X unchanged. 
  Returning X unchanged. 
  Trying to fold rtx: 
  (subreg:QI (reg:SI 251) 0) 
Trying to fold rtx: 
(reg:SI 251) 
Returning X unchanged. 
  Returning X unchanged. 
Returning new rtx: 
(ashift:QI (subreg:QI (reg:SI 251) 0) 
(const_int 1 [0x1])) 
 
Likewise: 
Trying to fold rtx: 
(mult:DI (reg:DI 63 [ variable.comb_vect.length ]) 
(const_int 4 [0x4])) 
Returning new rtx: 
(ashift:DI (reg:DI 63 [ variable.comb_vect.length ]) 
(const_int 2 [0x2])) 
 
It'd be interesting to find out how many of these things combine and later CSE
passes would catch (or miss), and how the tree-cleanup-branch compares.  I will
look at the latter first.
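
The two "lower cost" rewrites shown above (x + x into a shift by 1 in QImode, and x * 4 into a shift by 2 in DImode) correspond to these source-level identities. A minimal sketch with an invented function name, using unsigned types so that overflow wraps:

```c
#include <stdint.h>

/* Returns 1 if the shift forms match the add/multiply forms for x. */
int strength_reduction_holds(uint64_t x)
{
    uint8_t lo = (uint8_t)x;                          /* 8-bit part, like QImode */
    return (uint8_t)(lo + lo) == (uint8_t)(lo << 1)   /* x + x == x << 1 */
        && x * 4 == x << 2;                           /* x * 4 == x << 2 */
}
```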
 

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721


[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches

2005-03-05 Thread stevenb at suse dot de

--- Additional Comments From stevenb at suse dot de  2005-03-05 10:39 
---
Subject: Re:  [meta-bug] optimizations that CSE still catches

 Am I missing something here?  I guess I'm not sure what point you're
 trying to make.

It just seems that we could do better on initial RTL generation, e.g.

;; j = k * -1
(insn 23 21 0 (parallel [
(set (reg/v:DI 64 [ j ])
(mult:DI (reg/v:DI 67 [ k ])
(const_int -1 [0xffffffffffffffff])))
(clobber (reg:CC 17 flags))
]) -1 (nil)
(nil))

which we later simplify in CSE:

Working on insn:
(insn 23 21 24 0 (parallel [
(set (reg/v:DI 64 [ j ])
(mult:DI (reg/v:DI 67 [ k ])
(const_int -1 [0xffffffffffffffff])))
(clobber (reg:CC 17 flags))
]) 243 {*muldi3_1_rex64} (nil)
(nil))
Trying to fold rtx:
(mult:DI (reg/v:DI 67 [ k ])
(const_int -1 [0xffffffffffffffff]))
Returning new rtx:
(neg:DI (reg/v:DI 67 [ k ]))



Similarly, on a 64-bits host:

;; j = k * 4294967295
(insn 15 13 16 (set (reg:DI 63)
(reg/v:DI 62 [ k ])) -1 (nil)
(nil))

(insn 16 15 17 (parallel [
(set (reg:DI 64)
(ashift:DI (reg:DI 63)
(const_int 32 [0x20])))
(clobber (reg:CC 17 flags))
]) -1 (nil)
(expr_list:REG_EQUAL (mult:DI (reg/v:DI 62 [ k ])
(const_int 4294967296 [0x100000000]))
(nil)))

(insn 17 16 18 (parallel [
(set (reg:DI 65)
(minus:DI (reg:DI 64)
(reg/v:DI 62 [ k ])))
(clobber (reg:CC 17 flags))
]) -1 (nil)
(expr_list:REG_EQUAL (mult:DI (reg/v:DI 62 [ k ])
(const_int 4294967295 [0xffffffff]))
(nil)))

(insn 18 17 0 (set (reg/v:DI 59 [ j ])
(reg:DI 65)) -1 (nil)
(nil))

which CSE turns into:

Working on insn:
(insn 15 13 16 0 (set (reg:DI 63 [ k ])
(reg/v:DI 62 [ k ])) 81 {*movdi_1_rex64} (nil)
(nil))
Trying to fold rtx:
(reg/v:DI 62 [ k ])
Returning X unchanged.

Working on insn:
(insn 16 15 17 0 (parallel [
(set (reg:DI 64)
(ashift:DI (reg:DI 63 [ k ])
(const_int 32 [0x20])))
(clobber (reg:CC 17 flags))
]) -1 (nil)
(expr_list:REG_EQUAL (mult:DI (reg/v:DI 62 [ k ])
(const_int 4294967296 [0x100000000]))
(nil)))
Trying to fold rtx:
(mult:DI (reg/v:DI 62 [ k ])
(const_int 4294967296 [0x100000000]))
Returning new rtx:
(ashift:DI (reg/v:DI 62 [ k ])
(const_int 32 [0x20]))

Working on insn:
(insn 17 16 18 0 (parallel [
(set (reg:DI 65)
(minus:DI (reg:DI 64)
(reg/v:DI 62 [ k ])))
(clobber (reg:CC 17 flags))
]) 223 {*subdi_1_rex64} (nil)
(expr_list:REG_EQUAL (mult:DI (reg/v:DI 62 [ k ])
(const_int 4294967295 [0xffffffff]))
(nil)))
Trying to fold rtx:
(minus:DI (reg:DI 64)
(reg/v:DI 62 [ k ]))
Returning X unchanged.

Working on insn:
(insn 18 17 19 0 (set (reg/v:DI 59 [ j ])
(reg:DI 65)) 81 {*movdi_1_rex64} (nil)
(nil))
Trying to fold rtx:
(reg:DI 65)
Returning X unchanged.


These are from the detailed .expand dump
(i.e. cc1 t.c -O2 -fdump-rtl-expand-details -fdump-rtl-cse)

So it seems to come from the MULT_EXPR expander in this case, but
we'll have to study expand a bit closer to be sure.
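
The shift-and-subtract sequence in the second dump relies on 4294967295 being 2^32 - 1. A short sketch of the same decomposition in C, with made-up function names and unsigned 64-bit arithmetic so that wraparound is well defined:

```c
#include <stdint.h>

/* k * (2^32 - 1) written the way the expander emitted it:
   shift left by 32, then subtract k. */
uint64_t mul_by_2_32_minus_1(uint64_t k)
{
    return (k << 32) - k;
}

/* Returns 1 if the expansion agrees with a plain multiply. */
int expansion_matches(uint64_t k)
{
    return mul_by_2_32_minus_1(k) == k * 4294967295ULL;
}
```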



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721


[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches

2005-03-05 Thread steven at gcc dot gnu dot org

--- Additional Comments From steven at gcc dot gnu dot org  2005-03-06 
00:32 ---
The first case of comment #14 turns out to be PR20130. 

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721


[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches

2005-03-05 Thread law at redhat dot com

--- Additional Comments From law at redhat dot com  2005-03-06 05:59 ---
Subject: Re:  [meta-bug] optimizations that CSE still
catches

On Sat, 2005-03-05 at 10:39 +0000, stevenb at suse dot de wrote:
 --- Additional Comments From stevenb at suse dot de  2005-03-05 10:39 
 ---
 Subject: Re:  [meta-bug] optimizations that CSE still catches
 
  Am I missing something here?  I guess I'm not sure what point you're
  trying to make.
 
 It just seems that we could do better on initial RTL generation, e.g.
 
 ;; j = k * -1
 (insn 23 21 0 (parallel [
 (set (reg/v:DI 64 [ j ])
 (mult:DI (reg/v:DI 67 [ k ])
 (const_int -1 [0xffffffffffffffff])))
 (clobber (reg:CC 17 flags))
 ]) -1 (nil)
 (nil))
 
 which we later simplify in CSE:
 
 Working on insn:
 (insn 23 21 24 0 (parallel [
 (set (reg/v:DI 64 [ j ])
 (mult:DI (reg/v:DI 67 [ k ])
 (const_int -1 [0xffffffffffffffff])))
 (clobber (reg:CC 17 flags))
 ]) 243 {*muldi3_1_rex64} (nil)
 (nil))
 Trying to fold rtx:
 (mult:DI (reg/v:DI 67 [ k ])
 (const_int -1 [0xffffffffffffffff]))
 Returning new rtx:
 (neg:DI (reg/v:DI 67 [ k ]))
Ah.  Yes.  What did it look like in the tree dumps?   Unless
one of the expanders is creating the negation I would think this
would be pretty easy to catch in fold-const.c

[ ... ]


 
 These are the from the detailed .expand dump
 (i.e. cc1 t.c -O2 --fdump-rtl-expand-details -fdump-rtl-cse)
 
 So it seems to come from the MULT_EXPR expander in this case, but
 we'll have to study expand a bit closer to be sure.
expand_mult?  Sigh.  That's been in the back of my mind for a couple
years now -- it's probably one of the largest RTL expanders which
needs to have a lot of its functionality moved into trees.

jeff




-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721


[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches

2005-03-02 Thread steven at gcc dot gnu dot org

--- Additional Comments From steven at gcc dot gnu dot org  2005-03-02 
11:50 ---
Here is a nice one: 
 
Working on insn: 
(insn 215 214 216 15 (parallel [ 
(set (reg:DI 176) 
(ashift:DI (reg:DI 175) 
(const_int 3 [0x3]))) 
(clobber (reg:CC 17 flags)) 
]) -1 (nil) 
(expr_list:REG_EQUAL (mult:DI (reg:DI 174) 
(const_int 8 [0x8])) 
(nil))) 
Trying to fold rtx: 
(mult:DI (reg:DI 174) 
(const_int 8 [0x8])) 
Returning new rtx: 
(ashift:DI (reg:DI 174) 
(const_int 3 [0x3])) 
 
Sometimes I just hate REG_EQUAL notes... 

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721


[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches

2005-03-02 Thread law at redhat dot com

--- Additional Comments From law at redhat dot com  2005-03-02 18:23 ---
Subject: Re:  [meta-bug] optimizations that CSE still
catches

On Wed, 2005-03-02 at 11:50 +0000, steven at gcc dot gnu dot org wrote:
 --- Additional Comments From steven at gcc dot gnu dot org  2005-03-02 
 11:50 ---
 Here is a nice one: 
  
 Working on insn: 
 (insn 215 214 216 15 (parallel [ 
 (set (reg:DI 176) 
 (ashift:DI (reg:DI 175) 
 (const_int 3 [0x3]))) 
 (clobber (reg:CC 17 flags)) 
 ]) -1 (nil) 
 (expr_list:REG_EQUAL (mult:DI (reg:DI 174) 
 (const_int 8 [0x8])) 
 (nil))) 
 Trying to fold rtx: 
 (mult:DI (reg:DI 174) 
 (const_int 8 [0x8])) 
 Returning new rtx: 
 (ashift:DI (reg:DI 174) 
 (const_int 3 [0x3])) 
  
 Sometimes I just hate REG_EQUAL notes... 
Am I missing something here?  I guess I'm not sure what point you're
trying to make.

It seems to me that (reg 174) must be equal to (reg 175) for the
REG_EQUAL note to be valid.  Which means they must either be set
from equivalent expressions or we must have a copy insn between
them.

In the former case (set from equivalent expressions) we should
figure out why DOM or PRE didn't catch the redundancy.

In the latter case we'd want to see why we didn't copy propagate
the copy.

[ It's possible the copy occurs due to tree-rtl expansion -- there's
  still a fair number of ways to get silly copies at that phase.  In
  which case we need to look into ways to eliminate the silly copies.

  IIRC some come from lameness in the API for some of our conversion
  routines. ]

jeff  



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721


[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches

2005-02-25 Thread kazu at cs dot umass dot edu


-- 
Bug 19721 depends on bug 19938, which changed state.

Bug 19938 Summary: Missed jump threading opportunity due to signedness 
difference
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19938

   What|Old Value   |New Value

 Status|NEW |RESOLVED
 Resolution||FIXED

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721


[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches

2005-02-21 Thread kazu at cs dot umass dot edu


-- 
   What|Removed |Added

  BugsThisDependsOn||20130


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721


[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches

2005-02-21 Thread kazu at cs dot umass dot edu


-- 
   What|Removed |Added

  BugsThisDependsOn||20132


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721


[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches

2005-02-13 Thread kazu at cs dot umass dot edu


-- 
   What|Removed |Added

  BugsThisDependsOn||19938


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721


[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches

2005-02-06 Thread steven at gcc dot gnu dot org

--- Additional Comments From steven at gcc dot gnu dot org  2005-02-06 
17:41 ---
Arguably, PR16961 is not directly related.  But if we fix that bug and the 
similar long long issues on 32 bits hosts, then the 64 bits arith on 32 
bits hosts thing should be a non-issue (assuming the tree optimizers do 
well). 

-- 
   What|Removed |Added

  BugsThisDependsOn||16961


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721


[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches

2005-02-02 Thread stevenb at suse dot de

--- Additional Comments From stevenb at suse dot de  2005-02-02 09:21 
---
Subject: Re:  [meta-bug] optimizations that CSE still catches

On Monday 31 January 2005 22:35, law at redhat dot com wrote:
 Note I would _STRONGLY_ recommend people look at more than just the
 compiler when evaluating the old CSE code.  In particular it is
 important that we look at things like 64bit arithmetic on 32bit
 hosts (which happens often in kernels, but not nearly as often
 in user level benchmarks).

I was told crafty has a lot of 64bits arithmetic, so the -m32
numbers for crafty should be an indication of possible regressions
in that area.  And those numbers look OK to me.

If I can find some time, I'll try another benchmark suite to see
whether the effects of CSE path following are significant enough to
still be worth its cost.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721


[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches

2005-02-02 Thread hubicka at ucw dot cz

--- Additional Comments From hubicka at ucw dot cz  2005-02-02 11:50 ---
Subject: Re:  [meta-bug] optimizations that CSE still catches

 
 --- Additional Comments From stevenb at suse dot de  2005-02-02 09:21 
 ---
 Subject: Re:  [meta-bug] optimizations that CSE still catches
 
 On Monday 31 January 2005 22:35, law at redhat dot com wrote:
  Note I would _STRONGLY_ recommend people look at more than just the
  compiler when evaluating the old CSE code.  In particular it is
  important that we look at things like 64bit arithmetic on 32bit
  hosts (which happens often in kernels, but not nearly as often
  in user level benchmarks).
 
 I was told crafty has a lot of 64bits arithmetic, so the -m32
 numbers for crafty should be an indication of possible regressions
 in that area.  And those numbers look OK to me.

Crafty is special in using 64bit values as bitmaps rather than numbers,
so it doesn't do much addition/multiplication and friends, which are
what produce the most lousy artefacts.

Honza
 
 If I can find some time, I'll try another benchmark suite to see
 the effects of CSE path following are significant enough to still
 be worth its cost.
 
 


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721


[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches

2005-01-31 Thread steven at gcc dot gnu dot org


-- 
   What|Removed |Added

 CC||kazu at cs dot umass dot
   ||edu, pinskia at gcc dot gnu
   ||dot org, dnovillo at gcc dot
   ||gnu dot org, hubicka at gcc
   ||dot gnu dot org
Summary|[meta-bug] optimizations|[meta-bug] optimizations
   |that CSE still catches  |that CSE still catches


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721


[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches

2005-01-31 Thread steven at gcc dot gnu dot org

--- Additional Comments From steven at gcc dot gnu dot org  2005-01-31 
12:39 ---
To get something started, I have done SPECint and SPECfp runs on AMD64 with CVS
HEAD 20050130, unmodified vs. a cse.c with path following disabled (by setting
the max-cse-path-length to 1).  The overall scores go *up* (!!!) with that
change, but some individual benchmarks regress.  Still this is a lot better than
half a year ago, when terminating CSE would cause an overall regression of more
than 10%.

Diego Novillo also had his SPEC tester run on i686 with CSE completely
disabled, and there, too, the overall performance drop was not as large as one
might have expected.

Note that completely disabling CSE is a step further than setting
max-cse-path-length to 1.  The latter effectively makes CSE a purely local pass.
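
As a rough illustration of that difference, the following Python sketch models
a straight-line chain of basic blocks and counts how many redundant
expressions are found when the table of available expressions is kept for at
most max_path_length consecutive blocks.  This is a toy model, not the
algorithm in cse.c: the function name cse_along_path, the tuple encoding of
instructions, the assumption that every destination is assigned exactly once,
and the way paths are cut into fixed-size chunks are all assumptions made for
the example.

```python
def cse_along_path(blocks, max_path_length):
    """Count expressions that could reuse an earlier, identical computation.

    blocks is a list of basic blocks; each block is a list of
    (dest, op, lhs, rhs) tuples.  Destinations are assumed unique, so no
    invalidation of table entries is needed in this toy model.
    """
    eliminated = 0
    available = {}
    for index, block in enumerate(blocks):
        if index % max_path_length == 0:
            available = {}              # a new path starts: forget everything
        for dest, op, lhs, rhs in block:
            key = (op, lhs, rhs)
            if key in available:
                eliminated += 1         # dest could be a copy of available[key]
            else:
                available[key] = dest
    return eliminated

BLOCKS = [
    [("t1", "+", "a", "b"), ("t2", "+", "a", "b")],  # redundancy inside one block
    [("t3", "+", "a", "b"), ("t4", "*", "c", "d")],  # t3 repeats work from block 0
    [("t5", "*", "c", "d")],                         # t5 repeats work from block 1
]

if __name__ == "__main__":
    print("path length 1: ", cse_along_path(BLOCKS, 1))
    print("path length 10:", cse_along_path(BLOCKS, 10))
```

With a path length of 1 only the redundancy inside the first block is seen;
with a longer path the cross-block redundancies are seen as well.  Since
max-cse-path-length is defined with DEFPARAM, the same limit should also be
settable per compilation with --param max-cse-path-length=1 instead of
patching the default.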


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721


[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches

2005-01-31 Thread kazu at cs dot umass dot edu


-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
  BugsThisDependsOn|                            |19659


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721


[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches

2005-01-31 Thread steven at gcc dot gnu dot org

--- Additional Comments From steven at gcc dot gnu dot org  2005-01-31 12:42 ---
Created an attachment (id=8112)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=8112&action=view)
gcov coverage testing of CVS HEAD 20050131 on AMD64

This is the coverage data of cse.c for 517 preprocessed C files from the GCC
sources, including all components of cc1.  Note especially how ineffective
fold_rtx is.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721


[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches

2005-01-31 Thread steven at gcc dot gnu dot org


-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |law at redhat dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721


[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches

2005-01-31 Thread pinskia at gcc dot gnu dot org


-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|                            |1
   Last reconfirmed|0000-00-00 00:00:00         |2005-01-31 15:14:49
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721


[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches

2005-01-31 Thread dnovillo at gcc dot gnu dot org

--- Additional Comments From dnovillo at gcc dot gnu dot org  2005-01-31 15:23 ---

SPEC comparisons for i686 before/after kazu's patch to completely disable CSE.
The 20050127 compiler has CSE enabled.  The 20050129 compiler has CSE disabled.

Compile times for SPECint were reduced by 9%.
Compile times for SPECfp were reduced by 7.1%.
Bootstrap times were reduced by 4.5%.

Comparison between 20050127/spec-20050127.stats and 20050129/spec-20050129.stats
(base)

Compiler used in 20050127/spec-20050127.stats (Before)

Compiler:   gcc version 4.0.0 20050127 (experimental)
Base flags: -O2 -march=i686
Peak flags: -O3 -march=i686
Processor:  Intel(R) Pentium(R) 4 CPU 2.26GHz (2259.264 Mhz)
Memory: 1034472 kB
Cache:  512 KB

Compiler used in 20050129/spec-20050129.stats (After)

Compiler:   gcc version 4.0.0 20050129 (experimental)
Base flags: -O2 -march=i686
Peak flags: -O3 -march=i686
Processor:  Intel(R) Pentium(R) 4 CPU 2.26GHz (2259.264 Mhz)
Memory: 1034472 kB
Cache:  512 KB


SPECint results for base

Benchmark       Before    After  % diff
164.gzip        650.42   578.34  - 11.08%
175.vpr         421.04   418.82  -  0.53%
176.gcc         717.60   710.60  -  0.98%
181.mcf         426.30   426.49  +  0.05%
186.crafty      635.60   632.86  -  0.43%
197.parser      546.62   563.78  +  3.14%
252.eon         541.23   566.44  +  4.66%
253.perlbmk     704.34   685.23  -  2.71%
254.gap         741.52   708.46  -  4.46%
255.vortex      822.37   823.91  +  0.19%
256.bzip2       524.96   524.44  -  0.10%
300.twolf       544.79   552.95  +  1.50%
mean            594.14   588.36  -  0.97%


SPECfp results for base

Benchmark       Before    After  % diff
168.wupwise     579.39   626.80  +  8.18%
171.swim        501.51   490.96  -  2.10%
172.mgrid       372.63   374.65  +  0.54%
173.applu       557.58   529.18  -  5.09%
177.mesa        417.03   412.20  -  1.16%
178.galgel      485.88   482.41  -  0.71%
179.art         207.13   205.69  -  0.70%
183.equake      820.26   797.45  -  2.78%
187.facerec     346.83   337.74  -  2.62%
188.ammp        343.35   333.60  -  2.84%
189.lucas       498.16   505.99  +  1.57%
191.fma3d       465.00   433.92  -  6.68%
200.sixtrack    383.56   371.22  -  3.22%
301.apsi        422.75   423.89  +  0.27%
mean            437.11   431.45  -  1.29%
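
For reference, the "% diff" column and the "mean" row of the i686 SPECint
base table above can be recomputed with a short Python sketch.  It assumes
that "% diff" is the relative change from Before to After and that "mean" is
a geometric mean (the usual SPEC convention); the stats script itself is not
shown in this thread, so both are assumptions, and the function names are
chosen for the example.

```python
import math

# (benchmark, before, after) copied from the i686 SPECint base table.
SPECINT_I686 = [
    ("164.gzip",    650.42, 578.34),
    ("175.vpr",     421.04, 418.82),
    ("176.gcc",     717.60, 710.60),
    ("181.mcf",     426.30, 426.49),
    ("186.crafty",  635.60, 632.86),
    ("197.parser",  546.62, 563.78),
    ("252.eon",     541.23, 566.44),
    ("253.perlbmk", 704.34, 685.23),
    ("254.gap",     741.52, 708.46),
    ("255.vortex",  822.37, 823.91),
    ("256.bzip2",   524.96, 524.44),
    ("300.twolf",   544.79, 552.95),
]

def percent_diff(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0

def geometric_mean(values):
    """Geometric mean, computed via logarithms to avoid a huge product."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

mean_before = geometric_mean([before for _, before, _ in SPECINT_I686])
mean_after = geometric_mean([after for _, _, after in SPECINT_I686])

if __name__ == "__main__":
    for name, before, after in SPECINT_I686:
        print(f"{name:<12} {percent_diff(before, after):+7.2f}%")
    print(f"{'mean':<12} {mean_before:8.2f} {mean_after:8.2f} "
          f"{percent_diff(mean_before, mean_after):+7.2f}%")
```

Under those assumptions the recomputed means come out close to the reported
594.14 and 588.36, which is lower than the arithmetic mean of the same
column and so supports the geometric-mean reading.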


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721


[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches

2005-01-31 Thread dnovillo at gcc dot gnu dot org

--- Additional Comments From dnovillo at gcc dot gnu dot org  2005-01-31 15:26 ---

Similarly for em64t.

Build times for SPECint were reduced by 9.2%.
Build times for SPECfp were reduced by 7.5%.
Compiler bootstrap times were reduced by 4.4%.



Comparison between 20050127/spec-20050127.stats and 20050130/spec-20050130.stats
(base)

Compiler used in 20050127/spec-20050127.stats (Before)

Compiler:   gcc version 4.0.0 20050127 (experimental)
Base flags: -O2 -march=nocona
Peak flags: -O3 -march=nocona
Processor:  Genuine Intel(R) CPU 2.40GHz (1866.740 Mhz)
Memory: 4064772 kB
Cache:  1024 KB

Compiler used in 20050130/spec-20050130.stats (After)

Compiler:   gcc version 4.0.0 20050130 (experimental)
Base flags: -O2 -march=nocona
Peak flags: -O3 -march=nocona
Processor:  Genuine Intel(R) CPU 2.40GHz (1866.740 Mhz)
Memory: 4064772 kB
Cache:  1024 KB


SPECint results for base

Benchmark       Before    After  % diff
164.gzip        508.55   482.96  -  5.03%
175.vpr         443.91   445.11  +  0.27%
176.gcc         760.30   762.51  +  0.29%
181.mcf         397.27   397.01  -  0.07%
186.crafty      768.76   745.77  -  2.99%
197.parser      458.55   458.83  +  0.06%
252.eon           0.00     0.00  INF
253.perlbmk     758.73   767.23  +  1.12%
254.gap         836.44   834.91  -  0.18%
255.vortex        0.00   850.80  INF
256.bzip2       557.58   557.19  -  0.07%
300.twolf       576.46   557.28  -  3.33%
mean            587.55   602.07  +  2.47%


SPECfp results for base

Benchmark       Before    After  % diff
168.wupwise       0.00     0.00  INF
171.swim          0.00     0.00  INF
172.mgrid         0.00     0.00  INF
173.applu         0.00     0.00  INF
177.mesa        767.75   765.15  -  0.34%
178.galgel        0.00     0.00  INF
179.art         735.31   741.52  +  0.84%
183.equake     1043.34  1007.83  -  3.40%
187.facerec       0.00     0.00  INF
188.ammp        558.49   536.86  -  3.87%
189.lucas         0.00     0.00  INF
191.fma3d         0.00     0.00  INF
200.sixtrack      0.00     0.00  INF
301.apsi          0.00     0.00  INF
mean            757.33   744.36  -  1.71%
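
A caveat when reading the em64t SPECint "mean" row: several entries are 0.00
(no result), and 255.vortex has a result only in the After column.  The
Python sketch below assumes the mean is a geometric mean over the non-zero
entries of each column (an assumption; the stats script is not part of this
thread) and also recomputes both means over only the benchmarks that have a
result in both columns.  The variable and function names are chosen for the
example.

```python
import math

# (benchmark, before, after) copied from the em64t SPECint base table;
# 0.00 marks a benchmark with no result.
SPECINT_EM64T = [
    ("164.gzip",    508.55, 482.96),
    ("175.vpr",     443.91, 445.11),
    ("176.gcc",     760.30, 762.51),
    ("181.mcf",     397.27, 397.01),
    ("186.crafty",  768.76, 745.77),
    ("197.parser",  458.55, 458.83),
    ("252.eon",       0.00,   0.00),
    ("253.perlbmk", 758.73, 767.23),
    ("254.gap",     836.44, 834.91),
    ("255.vortex",    0.00, 850.80),
    ("256.bzip2",   557.58, 557.19),
    ("300.twolf",   576.46, 557.28),
]

def geometric_mean(values):
    """Geometric mean, computed via logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Means over whatever is non-zero in each column separately.
before_all = geometric_mean([b for _, b, _ in SPECINT_EM64T if b > 0])
after_all = geometric_mean([a for _, _, a in SPECINT_EM64T if a > 0])

# Means over the benchmarks that ran in both configurations.
common = [(b, a) for _, b, a in SPECINT_EM64T if b > 0 and a > 0]
before_common = geometric_mean([b for b, _ in common])
after_common = geometric_mean([a for _, a in common])

if __name__ == "__main__":
    print(f"per-column means:    {before_all:.2f} -> {after_all:.2f}")
    print(f"common-subset means: {before_common:.2f} -> {after_common:.2f}")
```

If that assumption holds, the per-column means land near the reported 587.55
and 602.07, so the +2.47% appears to come from 255.vortex being counted only
in the After column; restricted to the common benchmarks, the After mean is
lower than the Before mean by roughly 1%.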

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721


[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches

2005-01-31 Thread stevenb at suse dot de

--- Additional Comments From stevenb at suse dot de  2005-01-31 20:14 ---
Subject: Re:  [meta-bug] optimizations that CSE still catches

My numbers for not disabling CSE completely but disabling path following
are a lot less pessimistic.  This was on an AMD Opteron at 1600MHz:

GCC was configured as:
configure --enable-threads=posix --enable-languages=c,c++,f95
GCC bootstrap times for 'make -j1 bootstrap && make install':
Bootstrap time base compiler: 2208 s
Bootstrap time peak compiler: 2150 s (-2.6%)

SPECint 64 bits
Total time for base compilation: 192 s
Total time for peak compilation: 180 s (-6.7%)
                 base  peak   peak/base
   164.gzip       794   799   +0.63%
   175.vpr        729   715   -1.92%
   176.gcc        958   963   +0.52%
   181.mcf        410   411   +0.24%
   186.crafty    1362  1380   +1.32%
   197.parser     558   558   =
   252.eon          X     X
   253.perlbmk    962   964   +0.21%
   254.gap        774   776   +0.26%
   255.vortex    1159  1162   +0.26%
   256.bzip2      779   772   -0.90%
   300.twolf      836   876   +4.78%

SPECfp 64 bits
Total time for base compilation: 212 s
Total time for peak compilation: 208 s (-1.9%)
                 base  peak   peak/base
   168.wupwise    781   793   +1.53%
   171.swim       690   687   -0.43%
   172.mgrid      513   514   +0.02%
   173.applu      624   624   =
   177.mesa      1000   998   -0.20%
   178.galgel       X     X
   179.art        941   953   +1.28%
   183.equake     817   820   +0.37%
   187.facerec    674   677   +0.44%
   188.ammp       859   859   =
   189.lucas      858   858   =
   191.fma3d      699   698   -0.14%
   200.sixtrack   382   382   =
   301.apsi       770   771   +0.12%

SPECint 32 bits
Total time for base compilation: 257 s
Total time for peak compilation: 246 s (-4.5%)
                 base  peak   peak/base
   164.gzip       696   700   +0.57%
   175.vpr        691   710   +2.74%
   176.gcc        884   875   -1.02%
   181.mcf        528   530   +0.38%
   186.crafty     920   922   +0.22%
   197.parser     629   634   +0.79%
   252.eon        970   963   -0.72%
   253.perlbmk    935   938   +0.32%
   254.gap          X     X
   255.vortex       X     X
   256.bzip2      678   681   +0.04%
   300.twolf      974   966   -0.82%

SPECfp 32 bits
Total time for base compilation: 210 s
Total time for peak compilation: 204 s (-2.9%)
                 base  peak   peak/base
   168.wupwise    672   658   -2.08%
   171.swim       692   696   +0.58%
   172.mgrid      370   370   =
   173.applu      580   580   =
   177.mesa       678   655   -3.39%
   178.galgel       X     X
   179.art        484   483   -0.21%
   183.equake     822   821   -0.12%
   187.facerec    616   617   +0.16%
   188.ammp       712   713   +0.14%
   189.lucas      693   695   +0.20%
   191.fma3d      716   716   =
   200.sixtrack   422   422   =
   301.apsi       685   685   =

The SPEC numbers are the mean of three runs, so that's pretty solid.


Index: params.def
===================================================================
RCS file: /cvs/gcc/gcc/gcc/params.def,v
retrieving revision 1.53
diff -u -3 -p -r1.53 params.def
--- params.def  20 Jan 2005 12:45:12 -0000  1.53
+++ params.def  31 Jan 2005 17:09:21 -0000
@@ -321,7 +321,7 @@ DEFPARAM(PARAM_MIN_CROSSJUMP_INSNS,
 DEFPARAM(PARAM_MAX_CSE_PATH_LENGTH,
          "max-cse-path-length",
          "The maximum length of path considered in cse",
-         10, 0, 0)
+         1, 0, 0)
 
 /* The cost of expression in loop invariant motion that is considered
    expensive.  */


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721


[Bug middle-end/19721] [meta-bug] optimizations that CSE still catches

2005-01-31 Thread law at redhat dot com

--- Additional Comments From law at redhat dot com  2005-01-31 21:35 ---
Subject: Re:  [meta-bug] optimizations that CSE still catches

On Mon, 2005-01-31 at 20:14 +0000, stevenb at suse dot de wrote:
> --- Additional Comments From stevenb at suse dot de  2005-01-31 20:14 ---
> Subject: Re:  [meta-bug] optimizations that CSE still catches
>
> My numbers for not disabling CSE completely but disabling path following
> are a lot less pessimistic.  This was on an AMD Opteron at 1600MHz:
Right.  That's what I'd focus on first -- that's what I was looking
at eons ago when I realized that if we don't do a good job at jump
threading, then we have little hope of ever drastically simplifying
CSE.  I've been stuck in jump threading hell ever since :-)

Note I would _STRONGLY_ recommend people look at more than just the
compiler when evaluating the old CSE code.  In particular it is
important that we look at things like 64bit arithmetic on 32bit
hosts (which happens often in kernels, but not nearly as often
in user level benchmarks). 

jeff




-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721