[Bug rtl-optimization/19078] [4.0 Regression] Poor quality code after loop unrolling.
--- Additional Comments From steven at gcc dot gnu dot org 2005-02-10 11:02 ---
In comment #3 Zdenek said
> Possibly even better would be to add generation of autoincrements to loop optimizer, but this would require fixing cse so that it handles them correctly.

Zdenek, can you elaborate on why CSE needs fixing?
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19078
[Bug rtl-optimization/19078] [4.0 Regression] Poor quality code after loop unrolling.
--- Additional Comments From rakdver at atrey dot karlin dot mff dot cuni dot cz 2005-02-10 11:12 ---
Subject: Re: [4.0 Regression] Poor quality code after loop unrolling.

> In comment #3 Zdenek said
> > Possibly even better would be to add generation of autoincrements to loop optimizer, but this would require fixing cse so that it handles them correctly.
> Zdenek, can you elaborate on why CSE needs fixing?

cse does not handle autoincrements. I have no idea what's the problem there, it is just what I was told when I asked for the possibility to move the autoinc creation pass last time. Anyone has more precise information about the nature of the problem?
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19078
[Bug rtl-optimization/19078] [4.0 Regression] Poor quality code after loop unrolling.
--- Additional Comments From law at redhat dot com 2005-02-10 18:01 ---
Subject: Re: [4.0 Regression] Poor quality code after loop unrolling.

On Thu, 2005-02-10 at 12:12 +0100, Zdenek Dvorak wrote:
> > In comment #3 Zdenek said
> > > Possibly even better would be to add generation of autoincrements to loop optimizer, but this would require fixing cse so that it handles them correctly.
> > Zdenek, can you elaborate on why CSE needs fixing?
>
> cse does not handle autoincrements. I have no idea what's the problem there, it is just what I was told when I asked for the possibility to move the autoinc creation pass last time. Anyone has more precise information about the nature of the problem?

It's been about a decade since I looked at cse vs autoincrements, so the details have faded from memory. [The original context I found the problem was in an attempt to run cse after reload. ]

Anyway, from a 30 second look at CSE the first thing that jumps out at me is I don't think we would invalidate objects in the hash table which are auto-incremented.

Jeff
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19078
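To make the invalidation concern above concrete, here is a toy sketch in C. It is not GCC's cse.c and does not model its hash table; every name in it (`struct table`, `table_record`, `note_load_with_update`, the `handle_autoinc` switch) is hypothetical and invented for illustration. The table remembers facts of the form "the word at base_reg + offset is in value_reg". A load-with-update such as `lwzu r7,4(r9)` also changes r9, so every fact keyed on r9 has to be dropped; if it is not, a later lookup can hand back a stale register.

```c
#include <stdbool.h>
#include <string.h>

enum { MAX_ENTRIES = 16 };

/* One remembered fact: "the word at base_reg + offset is in value_reg".
   Purely illustrative; not a GCC data structure. */
struct entry {
    bool valid;
    int base_reg;
    int offset;
    int value_reg;
};

struct table {
    struct entry e[MAX_ENTRIES];
};

void table_init(struct table *t)
{
    memset(t, 0, sizeof *t);   /* all entries start out invalid */
}

/* Remember a fact in the first free slot; returns false if the table is full. */
bool table_record(struct table *t, int base_reg, int offset, int value_reg)
{
    for (int i = 0; i < MAX_ENTRIES; i++) {
        if (!t->e[i].valid) {
            t->e[i] = (struct entry){ true, base_reg, offset, value_reg };
            return true;
        }
    }
    return false;
}

/* Return the register believed to hold the word at base_reg + offset, or -1. */
int table_lookup(const struct table *t, int base_reg, int offset)
{
    for (int i = 0; i < MAX_ENTRIES; i++) {
        const struct entry *e = &t->e[i];
        if (e->valid && e->base_reg == base_reg && e->offset == offset)
            return e->value_reg;
    }
    return -1;
}

/* Drop every fact that mentions reg, because reg is about to change. */
void table_invalidate_reg(struct table *t, int reg)
{
    for (int i = 0; i < MAX_ENTRIES; i++) {
        if (t->e[i].base_reg == reg || t->e[i].value_reg == reg)
            t->e[i].valid = false;
    }
}

/* Model "load dest from base + off, then set base = base + off".
   With handle_autoinc == false the side effect on base is ignored,
   which is the kind of omission described in the comment above. */
void note_load_with_update(struct table *t, int dest, int off, int base,
                           bool handle_autoinc)
{
    table_invalidate_reg(t, dest);
    if (handle_autoinc) {
        table_invalidate_reg(t, base);     /* base changed: old facts are stale */
        table_record(t, base, 0, dest);    /* base now points at the loaded word */
    } else {
        table_record(t, base, off, dest);  /* wrongly treats base as unchanged */
    }
}
```

With the side effect handled, after recording (r9, 0) in r6 and then processing the update form with offset 4, a lookup of (r9, 0) yields r7. With it ignored, the same lookup still yields r6, which holds the previous word.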
[Bug rtl-optimization/19078] [4.0 Regression] Poor quality code after loop unrolling.
--- Additional Comments From kenner at vlsi1 dot ultra dot nyu dot edu 2005-02-10 18:12 ---
Subject: Re: [4.0 Regression] Poor quality code after loop unrolling.

> It's been about a decade since I looked at cse vs autoincrements, so the details have faded from memory. [The original context I found the problem was in an attempt to run cse after reload. ]

My recollection is that we never used to allow autoincrements before CSE with the exception of autoinc on SP.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19078
[Bug rtl-optimization/19078] [4.0 Regression] Poor quality code after loop unrolling.
--- Additional Comments From rakdver at atrey dot karlin dot mff dot cuni dot cz 2005-01-24 13:20 ---
Subject: Re: [4.0 Regression] Poor quality code after loop unrolling.

> Zdenek, is this still a regression, or are your suggestions from comment #12 only enhancements?

I think it still falls into the regression category (we produce worse code than 3.3); the suggestions would help overcome these problems, but they are either not nice or require large changes.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19078
[Bug rtl-optimization/19078] [4.0 Regression] Poor quality code after loop unrolling.
--- Additional Comments From steven at gcc dot gnu dot org 2005-01-21 14:06 --- Zdenek, is this still a regression, or are your suggestions from comment #12 only enhancements? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19078
[Bug rtl-optimization/19078] [4.0 Regression] Poor quality code after loop unrolling.
--- Additional Comments From pcarlini at suse dot de 2004-12-25 20:32 --- Zdenek, sorry, is your patch in? I think Rth approved it! http://gcc.gnu.org/ml/gcc-patches/2004-12/msg01613.html Thanks. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19078
[Bug rtl-optimization/19078] [4.0 Regression] Poor quality code after loop unrolling.
--- Additional Comments From cvs-commit at gcc dot gnu dot org 2004-12-25 22:54 ---
Subject: Bug 19078

CVSROOT: /cvs/gcc
Module name: gcc
Changes by: [EMAIL PROTECTED] 2004-12-25 22:53:55

Modified files:
    gcc: ChangeLog tree-ssa-loop-ivopts.c

Log message:
    PR rtl-optimization/19078
    * tree-ssa-loop-ivopts.c (determine_use_iv_cost_generic, determine_use_iv_cost_outer): Fix computing of cost for the original bivs.
    (dump_use): Handle case related_cands == NULL.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.6955&r2=2.6956
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-ssa-loop-ivopts.c.diff?cvsroot=gcc&r1=2.38&r2=2.39
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19078
[Bug rtl-optimization/19078] [4.0 Regression] Poor quality code after loop unrolling.
--- Additional Comments From rakdver at gcc dot gnu dot org 2004-12-25 22:58 ---
Not closing the bug yet. There are further issues; at least
-- we might want to be able to somehow determine whether splitting ivs is profitable, instead of doing it unconditionally
-- we might want to improve ivopts to take autoincrement addressing modes into account
-- we might want to make it possible to run autoincrement addressing modes creation pass before unroller
--
What|Removed|Added
Keywords|patch|

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19078
[Bug rtl-optimization/19078] [4.0 Regression] Poor quality code after loop unrolling.
--- Additional Comments From pinskia at gcc dot gnu dot org 2004-12-22 16:45 ---
(In reply to comment #6)
> ;) Well, many people believe I look too *often* at microbenchmarks... ;)

Also, sometimes microbenchmarks come from bigger code and show up in the profile as the hot loop.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19078
[Bug rtl-optimization/19078] [4.0 Regression] Poor quality code after loop unrolling.
--- Additional Comments From steven at gcc dot gnu dot org 2004-12-20 15:04 --- And, Paolo, when was the last time you looked at microbenchmarks? ;-) -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19078
[Bug rtl-optimization/19078] [4.0 Regression] Poor quality code after loop unrolling.
--- Additional Comments From pcarlini at suse dot de 2004-12-20 15:13 --- ;) Well, many people believe I look too *often* at microbenchmarks... ;) -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19078
[Bug rtl-optimization/19078] [4.0 Regression] Poor quality code after loop unrolling.
--- Additional Comments From pcarlini at suse dot de 2004-12-20 15:22 --- More seriously, I think that we (the libstdc++-v3 people) should more carefully test the effect of the new optimizations on std::algorithm: indeed, we are talking about benchmarks, not pointless microbenchmarks: std::algorithm is *full* of small loops like this one. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19078
[Bug rtl-optimization/19078] [4.0 Regression] Poor quality code after loop unrolling.
--- Additional Comments From rakdver at gcc dot gnu dot org 2004-12-20 18:44 ---
Patch: http://gcc.gnu.org/ml/gcc-patches/2004-12/msg01554.html
--
What|Removed|Added
Keywords||patch

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19078
[Bug rtl-optimization/19078] [4.0 Regression] Poor quality code after loop unrolling.
--- Additional Comments From pinskia at gcc dot gnu dot org 2004-12-19 13:23 ---
In 3.3.2, the main loop is:

L7:
    lwz r6,0(r9)
    cmpwi cr0,r6,2
    beq- cr0,L1
    lwzu r7,4(r9)
    cmpwi cr0,r7,2
    beq- cr0,L1
    lwzu r8,4(r9)
    cmpwi cr0,r8,2
    beq- cr0,L1
    lwzu r10,4(r9)
    cmpwi cr0,r10,2
    beq- cr0,L1
    addi r9,r9,4
    cmpw cr0,r9,r4
    bne+ cr0,L7

in 4.0.0:

L58:
    mr r9,r11
L7:
    cmpw cr7,r4,r9
    beq- cr7,L5
    lwz r0,0(r9)
    addi r11,r9,4
    cmpwi cr7,r0,2
    beq- cr7,L5
    lwz r0,0(r11)
    mr r2,r11
    mr r9,r11
    addi r11,r11,4
    cmpwi cr7,r0,2
    beq- cr7,L5
    lwz r0,0(r11)
    mr r9,r11
    cmpwi cr7,r0,2
    beq- cr7,L5
    lwz r0,8(r2)
    addi r9,r2,8
    addi r11,r2,12
    cmpwi cr7,r0,2
    beq- cr7,L5
    lwz r0,12(r2)
    mr r9,r11
    addi r11,r2,16
    cmpwi cr7,r0,2
    beq- cr7,L5
    lwz r0,16(r2)
    mr r9,r11
    addi r11,r2,20
    cmpwi cr7,r0,2
    beq- cr7,L5
    lwz r0,20(r2)
    mr r9,r11
    addi r11,r2,24
    cmpwi cr7,r0,2
    beq- cr7,L5
    lwz r0,24(r2)
    mr r9,r11
    addi r11,r2,28
    cmpwi cr7,r0,2
    bne+ cr7,L58

Notice how in 3.3.2, we used lwzu, that is needed.
--
What|Removed|Added
Severity|normal|minor
Status|UNCONFIRMED|NEW
Component|c|rtl-optimization
Ever Confirmed||1
GCC host triplet|i686-pc-linux-gnu|
Keywords||missed-optimization
Known to fail||4.0.0
Known to work||3.3.2
Last reconfirmed date|-00-00 00:00:00|2004-12-19 13:23:03
Target Milestone|---|4.0.0

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19078
[Bug rtl-optimization/19078] [4.0 Regression] Poor quality code after loop unrolling.
--- Additional Comments From rakdver at gcc dot gnu dot org 2004-12-19 19:41 ---
Unroller splits the induction variables, so that the final code looks basically like

    if (a[0] == 2) return a;
    if (a[1] == 2) return a + 4;
    if (a[2] == 2) return a + 8;
    ...
    if (a[7] == 2) return a + 28;
    a+=32;

Which is good in some cases, but obviously not here. However even with -fno-split-ivs-in-unroller we do not get the autoincrements; we also need -fno-ivopts. The reason is that with ivopts the code looks like

    a = a.1;
    a.1 = a + 1;
    if (*a == 2) return a;

Whereas the old loop optimizer makes things look like

    a = a + 1
    if (*a == 2) return 0;

by changing the initial value of a, which enables the autoinc creation pass to work.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19078
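For readers who want to see the loop shapes discussed above at source level, here is a rough C sketch. The bug's actual test case is not quoted in this thread, so the function names, the `key` parameter, and the unroll factor of 4 are assumptions made for illustration (the 4.0.0 listing unrolls by 8). The sketch counts in elements, whereas the pseudocode above uses byte offsets (a + 4, a + 8, ...). The third function only imitates the "step first, then load" shape with an index that starts one position early; whether a compiler turns any of these into load-with-update instructions depends on the target and the passes involved.

```c
#include <stddef.h>

/* Rolled loop: one induction variable, compared against the end pointer. */
const int *find_rolled(const int *a, const int *end, int key)
{
    for (; a != end; ++a)
        if (*a == key)
            return a;
    return end;
}

/* Unrolled by 4 with split induction variables: every access is
   "base + constant", and the base is advanced once per iteration. */
const int *find_split(const int *a, const int *end, int key)
{
    while (end - a >= 4) {
        if (a[0] == key) return a;
        if (a[1] == key) return a + 1;
        if (a[2] == key) return a + 2;
        if (a[3] == key) return a + 3;
        a += 4;
    }
    return find_rolled(a, end, key);   /* leftover elements */
}

/* Unrolled by 4 with a single counter that is stepped before each load.
   The counter starts at -1, mirroring "changing the initial value of a". */
const int *find_preinc(const int *a, const int *end, int key)
{
    ptrdiff_t n = end - a;
    ptrdiff_t i = -1;
    while (n - (i + 1) >= 4) {
        if (a[++i] == key) return a + i;
        if (a[++i] == key) return a + i;
        if (a[++i] == key) return a + i;
        if (a[++i] == key) return a + i;
    }
    while (++i < n)                    /* leftover elements */
        if (a[i] == key)
            return a + i;
    return end;
}
```

All three return the same pointer for the same input; they differ only in how the address of each load is formed, which is the property the autoinc creation pass depends on.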
[Bug rtl-optimization/19078] [4.0 Regression] Poor quality code after loop unrolling.
--- Additional Comments From rakdver at gcc dot gnu dot org 2004-12-19 22:04 ---
With minor adjustment in ivopts, we get the same code as in 3.3 with -fno-split-ivs-in-unroller, and more reasonably looking code without; I'm testing the patch just now.

Of course we cannot have autoincrements and iv splitting at the same time. It might be possible to use some heuristics to disable iv splitting if it does not seem useful. Possibly even better would be to add generation of autoincrements to loop optimizer, but this would require fixing cse so that it handles them correctly.
--
What|Removed|Added
AssignedTo|unassigned at gcc dot gnu dot org|rakdver at gcc dot gnu dot org
Status|NEW|ASSIGNED

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19078
[Bug rtl-optimization/19078] [4.0 Regression] Poor quality code after loop unrolling.
--- Additional Comments From pcarlini at suse dot de 2004-12-19 22:11 --- Thanks Zdenek. Very frankly, I'm somewhat surprised that we are noticing these problems only relatively late: such loops seem *so* simple and common... -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19078