[Bug middle-end/29256] [4.2 regression] loop unrolling performance regression

2006-09-28 Thread rguenth at gcc dot gnu dot org


--- Comment #4 from rguenth at gcc dot gnu dot org  2006-09-28 11:08 ---
On x86_64 4.2 decides to unroll 9 times while on 4.1 it unrolls 8 times.
This is a code-size regression, but other than that?  The 4.2 version runs
slightly faster than the 4.1 version, though the difference may be in the
noise.


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu dot
                   |                            |org, rakdver at gcc dot gnu
                   |                            |dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256



[Bug middle-end/29256] [4.2 regression] loop unrolling performance regression

2006-09-28 Thread rakdver at gcc dot gnu dot org


--- Comment #5 from rakdver at gcc dot gnu dot org  2006-09-28 11:34 ---
(In reply to comment #4)
> On x86_64 4.2 decides to unroll 9 times while on 4.1 it unrolls 8 times.
> This is a code-size regression, but other than that?  The 4.2 version runs
> slightly faster than the 4.1 version, though the difference may be in the
> noise.

Choosing 9 instead of 8 looks weird, though :-).  The reason is the following:
jump threading in the vrp2 pass peels one iteration of the loop.  With this
change, unrolling by a factor of 9 creates smaller code (only one extra
iteration needs to be peeled to make the number of iterations divisible by 9,
while one would need to peel 7 more iterations to make it divisible by 8).
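
The peeling arithmetic can be sketched in a few lines of C.  This is only an
illustration: the helper name peel_count is made up, and the trip count of 128
used in the note below is an assumption chosen because it matches the numbers
quoted above; the actual testcase is not shown in these comments.

```c
/* Number of iterations that have to be peeled so that the rest of the
   loop executes a whole number of times through a body that has been
   unrolled `factor` times.  Hypothetical helper, not a GCC function. */
static unsigned peel_count(unsigned iterations, unsigned factor)
{
    return iterations % factor;
}

/* Iterations left in the loop after an earlier pass (here: jump
   threading in vrp2) has already peeled one of them. */
static unsigned remaining_after_first_peel(unsigned trip_count)
{
    return trip_count - 1;
}
```

With an assumed trip count of 128, 127 iterations remain after the first peel;
peel_count(127, 9) is 1 and peel_count(127, 8) is 7, which agrees with the
"one extra" versus "7 more" iterations mentioned above.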


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256



[Bug middle-end/29256] [4.2 regression] loop unrolling performance regression

2006-09-28 Thread pinskia at gcc dot gnu dot org


--- Comment #6 from pinskia at gcc dot gnu dot org  2006-09-28 13:47 ---
(In reply to comment #4)
> On x86_64 4.2 decides to unroll 9 times while on 4.1 it unrolls 8 times.
> This is a code-size regression, but other than that?  The 4.2 version runs
> slightly faster than the 4.1 version, though the difference may be in the
> noise.

No, no, no, Edmar and I are not complaining about how many times the loop was
unrolled, but about the use of indexed addressing mode instead of offset
addressing mode for the stores, and about the extra adds.
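
A source-level sketch of the two shapes, for readers who have not looked at
the generated code.  It assumes the loop is a plain element-wise copy of
doubles, uses an unroll factor of 4 to stay short, and the function names are
made up.  An optimizing compiler is free to emit the same code for both
functions; the point is only to show what "offset" and "indexed" addressing
look like once the body is unrolled.

```c
#include <stddef.h>

/* Offset form: one pointer pair per unrolled body; every access is the
   same base register plus a compile-time constant displacement.
   n is assumed to be a multiple of 4. */
void copy_offset(double *c, const double *a, size_t n)
{
    for (size_t i = 0; i < n; i += 4) {
        const double *src = a + i;
        double *dst = c + i;
        dst[0] = src[0];
        dst[1] = src[1];
        dst[2] = src[2];
        dst[3] = src[3];
    }
}

/* Indexed form: every unrolled copy gets its own index variable, which
   costs an extra add per copy and a base + index address per store.
   n is assumed to be a multiple of 4. */
void copy_indexed(double *c, const double *a, size_t n)
{
    for (size_t i = 0; i < n; i += 4) {
        size_t i1 = i + 1;
        size_t i2 = i1 + 1;
        size_t i3 = i2 + 1;
        c[i] = a[i];
        c[i1] = a[i1];
        c[i2] = a[i2];
        c[i3] = a[i3];
    }
}
```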


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256



[Bug middle-end/29256] [4.2 regression] loop unrolling performance regression

2006-09-28 Thread rguenth at gcc dot gnu dot org


--- Comment #7 from rguenth at gcc dot gnu dot org  2006-09-28 14:02 ---
Oh, but those do not happen on x86_64.  So this is a target issue really.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256



[Bug middle-end/29256] [4.2 regression] loop unrolling performance regression

2006-09-28 Thread pinskia at gcc dot gnu dot org


--- Comment #8 from pinskia at gcc dot gnu dot org  2006-09-28 14:08 ---
  D.1563 = -a;
  MEM[base: (int *) D.1563 + c, index: D.1562] = MEM[base: D.1562];

WTFFF


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256



[Bug middle-end/29256] [4.2 regression] loop unrolling performance regression

2006-09-28 Thread rguenth at gcc dot gnu dot org


--- Comment #9 from rguenth at gcc dot gnu dot org  2006-09-28 14:11 ---
Oh, didn't I fix this?  See PR26726.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256



[Bug middle-end/29256] [4.2 regression] loop unrolling performance regression

2006-09-28 Thread rakdver at gcc dot gnu dot org


--- Comment #10 from rakdver at gcc dot gnu dot org  2006-09-28 14:15 ---
(In reply to comment #8)
>   D.1563 = -a;
>   MEM[base: (int *) D.1563 + c, index: D.1562] = MEM[base: D.1562];
>
> WTFFF

ivopts are having fun :-)  On the other hand, this is one of several possible
cheapest ways to express the code, and it should not affect the creation of
offset addressing modes on RTL.  So although this is indeed somewhat curious
(well, a bug in fact, for reasons unrelated to the problem covered by this PR),
it is not the cause of this problem.

On x86, the tree optimizers seem to do just fine, producing

MEM[symbol: c, index: D.1569, step: 8B] = MEM[symbol: a, index: D.1569, step: 8B];

However, on RTL, we fail to create an offset version of this addressing mode
after unrolling.
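
For readers unfamiliar with the MEM[...] notation in these dumps, here is a
rough model in C.  It assumes the usual reading of a target memory reference,
in which the address is symbol + base + index * step + offset; the struct and
function names are made up for illustration and are not GCC data structures.
Pointer values are modelled as uintptr_t, which assumes a flat address space.

```c
#include <stdint.h>

/* Hypothetical model of the address parts printed in the dump.  A part
   that is not printed is taken to be 0, except step, which is 1. */
struct mem_ref {
    uintptr_t symbol;
    uintptr_t base;
    uintptr_t index;
    uintptr_t step;
    uintptr_t offset;
};

/* Effective address under the assumed formula. */
static uintptr_t mem_ref_address(struct mem_ref m)
{
    return m.symbol + m.base + m.index * m.step + m.offset;
}
```

Under this model the form quoted from comment #8 has base = -a + c and an
index that is itself a pointer walking through a, so the store address is
(c - a) + &a[i], i.e. &c[i]: a single induction variable serves both the load
and the store.  The x86 form has symbol = c, index = i and step = 8, which
gives the same address.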


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256



[Bug middle-end/29256] [4.2 regression] loop unrolling performance regression

2006-09-27 Thread pinskia at gcc dot gnu dot org


--- Comment #3 from pinskia at gcc dot gnu dot org  2006-09-28 02:59 ---
This is a generic regression; x86 has the same problem with the code.  Even
with -Ddouble=int, we have the same problem.


-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pinskia at gcc dot gnu dot
                   |                            |org
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   GCC host triplet|x86_64-unknown-linux-gnu    |
 GCC target triplet|powerpc-unknown-linux-gnuspe|
           Keywords|                            |missed-optimization
      Known to fail|                            |4.2.0
      Known to work|                            |4.1.2
   Last reconfirmed|-00-00 00:00:00             |2006-09-28 02:59:57
               date|                            |
            Summary|[4.2 regression] performance|[4.2 regression] loop
                   |regression with double on   |unrolling performance
                   |SPE2                        |regression
   Target Milestone|---                         |4.2.0


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256