[Bug rtl-optimization/19078] [4.0 Regression] Poor quality code after loop unrolling.

2005-02-10 Thread steven at gcc dot gnu dot org

--- Additional Comments From steven at gcc dot gnu dot org  2005-02-10 
11:02 ---
In comment #3 Zdenek said "Possibly even better would be to add generation of 
autoincrements to loop optimizer, but this would require fixing cse so that it 
handles them correctly."  Zdenek, can you elaborate on why CSE needs fixing? 

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19078


[Bug rtl-optimization/19078] [4.0 Regression] Poor quality code after loop unrolling.

2005-02-10 Thread rakdver at atrey dot karlin dot mff dot cuni dot cz

--- Additional Comments From rakdver at atrey dot karlin dot mff dot cuni 
dot cz  2005-02-10 11:12 ---
Subject: Re:  [4.0 Regression] Poor quality code after loop unrolling.

> In comment #3 Zdenek said "Possibly even better would be to add generation of 
> autoincrements to loop optimizer, but this would require fixing cse so that 
> it handles them correctly."  Zdenek, can you elaborate on why CSE needs fixing? 

cse does not handle autoincrements.  I have no idea what the problem
there is; it is just what I was told the last time I asked about the
possibility of moving the autoinc creation pass.  Does anyone have more
precise information about the nature of the problem?




[Bug rtl-optimization/19078] [4.0 Regression] Poor quality code after loop unrolling.

2005-02-10 Thread law at redhat dot com

--- Additional Comments From law at redhat dot com  2005-02-10 18:01 ---
Subject: Re:  [4.0 Regression] Poor quality code after loop unrolling.

On Thu, 2005-02-10 at 12:12 +0100, Zdenek Dvorak wrote:
> > In comment #3 Zdenek said "Possibly even better would be to add generation 
> > of autoincrements to loop optimizer, but this would require fixing cse so 
> > that it handles them correctly."  Zdenek, can you elaborate on why CSE 
> > needs fixing? 
>
> cse does not handle autoincrements.  I have no idea what the problem
> there is; it is just what I was told the last time I asked about the
> possibility of moving the autoinc creation pass.  Does anyone have more
> precise information about the nature of the problem?
It's been about a decade since I looked at cse vs autoincrements, so
the details have faded from memory.  [The original context in which I
found the problem was an attempt to run cse after reload.]

Anyway, from a 30-second look at CSE, the first thing that jumps out at
me is that I don't think we would invalidate objects in the hash table
when they are auto-incremented.

Jeff





[Bug rtl-optimization/19078] [4.0 Regression] Poor quality code after loop unrolling.

2005-02-10 Thread kenner at vlsi1 dot ultra dot nyu dot edu

--- Additional Comments From kenner at vlsi1 dot ultra dot nyu dot edu  
2005-02-10 18:12 ---
Subject: Re:  [4.0 Regression] Poor quality code after loop unrolling.

> It's been about a decade since I looked at cse vs autoincrements, so
> the details have faded from memory.  [The original context in which I
> found the problem was an attempt to run cse after reload.]

My recollection is that we never used to allow autoincrements before CSE
with the exception of autoinc on SP.




[Bug rtl-optimization/19078] [4.0 Regression] Poor quality code after loop unrolling.

2005-01-24 Thread rakdver at atrey dot karlin dot mff dot cuni dot cz

--- Additional Comments From rakdver at atrey dot karlin dot mff dot cuni 
dot cz  2005-01-24 13:20 ---
Subject: Re:  [4.0 Regression] Poor quality code after loop unrolling.

> Zdenek, is this still a regression, or are your suggestions from 
> comment #12 only enhancements? 

I think it still falls into the regression category (we produce worse code
than 3.3); the suggestions would help overcome these problems, but they
are either not nice or require large changes.




[Bug rtl-optimization/19078] [4.0 Regression] Poor quality code after loop unrolling.

2005-01-21 Thread steven at gcc dot gnu dot org

--- Additional Comments From steven at gcc dot gnu dot org  2005-01-21 
14:06 ---
Zdenek, is this still a regression, or are your suggestions from 
comment #12 only enhancements? 



[Bug rtl-optimization/19078] [4.0 Regression] Poor quality code after loop unrolling.

2004-12-25 Thread pcarlini at suse dot de

--- Additional Comments From pcarlini at suse dot de  2004-12-25 20:32 
---
Zdenek, sorry, is your patch in? I think Rth approved it!

   http://gcc.gnu.org/ml/gcc-patches/2004-12/msg01613.html

Thanks.



[Bug rtl-optimization/19078] [4.0 Regression] Poor quality code after loop unrolling.

2004-12-25 Thread cvs-commit at gcc dot gnu dot org

--- Additional Comments From cvs-commit at gcc dot gnu dot org  2004-12-25 
22:54 ---
Subject: Bug 19078

CVSROOT:        /cvs/gcc
Module name:    gcc
Changes by:     [EMAIL PROTECTED]   2004-12-25 22:53:55

Modified files:
        gcc: ChangeLog tree-ssa-loop-ivopts.c 

Log message:
        PR rtl-optimization/19078
        * tree-ssa-loop-ivopts.c (determine_use_iv_cost_generic,
        determine_use_iv_cost_outer): Fix computing of cost for the original
        bivs.
        (dump_use): Handle case related_cands == NULL.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.6955&r2=2.6956
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-ssa-loop-ivopts.c.diff?cvsroot=gcc&r1=2.38&r2=2.39





[Bug rtl-optimization/19078] [4.0 Regression] Poor quality code after loop unrolling.

2004-12-25 Thread rakdver at gcc dot gnu dot org

--- Additional Comments From rakdver at gcc dot gnu dot org  2004-12-25 
22:58 ---
Not closing the bug yet.  There are further issues; at least

-- we might want to be able to somehow determine whether splitting ivs is 
profitable, instead of doing it unconditionally
-- we might want to improve ivopts to take autoincrement addressing modes into 
account
-- we might want to make it possible to run the autoincrement addressing mode 
creation pass before the unroller

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|patch                       |




[Bug rtl-optimization/19078] [4.0 Regression] Poor quality code after loop unrolling.

2004-12-22 Thread pinskia at gcc dot gnu dot org

--- Additional Comments From pinskia at gcc dot gnu dot org  2004-12-22 
16:45 ---
(In reply to comment #6)
> ;) Well, many people believe I look too *often* at microbenchmarks... ;)
Also, sometimes microbenchmarks come from bigger code, where they show up in 
the profile as the hot loop.





[Bug rtl-optimization/19078] [4.0 Regression] Poor quality code after loop unrolling.

2004-12-20 Thread steven at gcc dot gnu dot org

--- Additional Comments From steven at gcc dot gnu dot org  2004-12-20 
15:04 ---
And, Paolo, when was the last time you looked at microbenchmarks?  ;-)





[Bug rtl-optimization/19078] [4.0 Regression] Poor quality code after loop unrolling.

2004-12-20 Thread pcarlini at suse dot de

--- Additional Comments From pcarlini at suse dot de  2004-12-20 15:13 
---
;) Well, many people believe I look too *often* at microbenchmarks... ;)



[Bug rtl-optimization/19078] [4.0 Regression] Poor quality code after loop unrolling.

2004-12-20 Thread pcarlini at suse dot de

--- Additional Comments From pcarlini at suse dot de  2004-12-20 15:22 
---
More seriously, I think that we (the libstdc++-v3 people) should test the
effect of the new optimizations on std::algorithm more carefully: indeed, we
are talking about benchmarks, not pointless microbenchmarks: std::algorithm is
*full* of small loops like this one.



[Bug rtl-optimization/19078] [4.0 Regression] Poor quality code after loop unrolling.

2004-12-20 Thread rakdver at gcc dot gnu dot org

--- Additional Comments From rakdver at gcc dot gnu dot org  2004-12-20 
18:44 ---
Patch:

http://gcc.gnu.org/ml/gcc-patches/2004-12/msg01554.html

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |patch




[Bug rtl-optimization/19078] [4.0 Regression] Poor quality code after loop unrolling.

2004-12-19 Thread pinskia at gcc dot gnu dot org

--- Additional Comments From pinskia at gcc dot gnu dot org  2004-12-19 
13:23 ---
In 3.3.2, the main loop is:
L7:
        lwz r6,0(r9)
        cmpwi cr0,r6,2
        beq- cr0,L1
        lwzu r7,4(r9)
        cmpwi cr0,r7,2
        beq- cr0,L1
        lwzu r8,4(r9)
        cmpwi cr0,r8,2
        beq- cr0,L1
        lwzu r10,4(r9)
        cmpwi cr0,r10,2
        beq- cr0,L1
        addi r9,r9,4
        cmpw cr0,r9,r4
        bne+ cr0,L7

In 4.0.0:
L58:
        mr r9,r11
L7:
        cmpw cr7,r4,r9
        beq- cr7,L5
        lwz r0,0(r9)
        addi r11,r9,4
        cmpwi cr7,r0,2
        beq- cr7,L5
        lwz r0,0(r11)
        mr r2,r11
        mr r9,r11
        addi r11,r11,4
        cmpwi cr7,r0,2
        beq- cr7,L5
        lwz r0,0(r11)
        mr r9,r11
        cmpwi cr7,r0,2
        beq- cr7,L5
        lwz r0,8(r2)
        addi r9,r2,8
        addi r11,r2,12
        cmpwi cr7,r0,2
        beq- cr7,L5
        lwz r0,12(r2)
        mr r9,r11
        addi r11,r2,16
        cmpwi cr7,r0,2
        beq- cr7,L5
        lwz r0,16(r2)
        mr r9,r11
        addi r11,r2,20
        cmpwi cr7,r0,2
        beq- cr7,L5
        lwz r0,20(r2)
        mr r9,r11
        addi r11,r2,24
        cmpwi cr7,r0,2
        beq- cr7,L5
        lwz r0,24(r2)
        mr r9,r11
        addi r11,r2,28
        cmpwi cr7,r0,2
        bne+ cr7,L58

Notice how 3.3.2 uses lwzu (load word and zero with update); that is what is
needed here.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |minor
             Status|UNCONFIRMED                 |NEW
          Component|c                           |rtl-optimization
     Ever Confirmed|                            |1
   GCC host triplet|i686-pc-linux-gnu           |
           Keywords|                            |missed-optimization
      Known to fail|                            |4.0.0
      Known to work|                            |3.3.2
   Last reconfirmed|-00-00 00:00:00             |2004-12-19 13:23:03
               date|                            |
   Target Milestone|---                         |4.0.0




[Bug rtl-optimization/19078] [4.0 Regression] Poor quality code after loop unrolling.

2004-12-19 Thread rakdver at gcc dot gnu dot org

--- Additional Comments From rakdver at gcc dot gnu dot org  2004-12-19 
19:41 ---
The unroller splits the induction variables, so the final code looks basically 
like

if (a[0] == 2)
  return a;
if (a[1] == 2)
  return a + 4;
if (a[2] == 2)
  return a + 8;
...
if (a[7] == 2)
  return a + 28;
a+=32;

Which is good in some cases, but obviously not here.

However even with -fno-split-ivs-in-unroller we do not get the autoincrements; 
we also need -fno-ivopts.  The reason is that with ivopts the code looks like

a = a.1;
a.1 = a + 1;
if (*a == 2)
  return a;

Whereas the old loop optimizer makes things look like

a = a + 1
if (*a == 2)
  return 0;

by changing the initial value of a, which enables the autoinc creation pass to 
work.



[Bug rtl-optimization/19078] [4.0 Regression] Poor quality code after loop unrolling.

2004-12-19 Thread rakdver at gcc dot gnu dot org

--- Additional Comments From rakdver at gcc dot gnu dot org  2004-12-19 
22:04 ---
With a minor adjustment in ivopts, we get the same code as in 3.3 with 
-fno-split-ivs-in-unroller, and more reasonable-looking code without it; I'm 
testing the patch just now.

Of course we cannot have autoincrements and iv splitting at the same time. It 
might be possible to use some heuristics to disable iv splitting if it does not 
seem useful.  Possibly even better would be to add generation of autoincrements 
to loop optimizer, but this would require fixing cse so that it handles them 
correctly.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|unassigned at gcc dot gnu   |rakdver at gcc dot gnu dot
                   |dot org                     |org
             Status|NEW                         |ASSIGNED




[Bug rtl-optimization/19078] [4.0 Regression] Poor quality code after loop unrolling.

2004-12-19 Thread pcarlini at suse dot de

--- Additional Comments From pcarlini at suse dot de  2004-12-19 22:11 
---
Thanks, Zdenek. Very frankly, I'm somewhat surprised that we are noticing these
problems only relatively late: such loops seem *so* simple and common...
