[Bug middle-end/29256] [4.2 regression] loop unrolling performance regression
--- Comment #4 from rguenth at gcc dot gnu dot org 2006-09-28 11:08 ---
On x86_64 4.2 decides to unroll 9 times while on 4.1 it unrolls 8 times.
This is a code-size regression, but other than that? The 4.2 version runs
slightly faster than the 4.1 version, though the difference may be in the
noise.

--
rguenth at gcc dot gnu dot org changed:

What               |Removed |Added
CC                 |        |rguenth at gcc dot gnu dot org,
                   |        |rakdver at gcc dot gnu dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256
[Bug middle-end/29256] [4.2 regression] loop unrolling performance regression
--- Comment #5 from rakdver at gcc dot gnu dot org 2006-09-28 11:34 ---
(In reply to comment #4)
> On x86_64 4.2 decides to unroll 9 times while on 4.1 it unrolls 8 times.
> This is a code-size regression, but other than that? The 4.2 version runs
> slightly faster than the 4.1 version, though the difference may be in the
> noise.

Choosing 9 instead of 8 looks weird, though :-). The reason is the following:
jump threading in the vrp2 pass peels one iteration of the loop. With this
change, unrolling by a factor of 9 creates smaller code (only one extra
iteration needs to be peeled to make the number of iterations divisible by 9,
while one would need to peel 7 more iterations to make it divisible by 8).

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256
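The peeling arithmetic described in comment #5 can be checked with a small sketch. The trip count of 2000000 used in the note below is an assumption chosen for illustration only (the thread does not state the loop bound), and `iterations_to_peel` is a hypothetical helper, not a GCC function.

```c
/* Hypothetical helper, not part of GCC: the number of iterations that
   must be peeled off so that the remaining trip count is a multiple of
   the unroll factor. */
static unsigned long iterations_to_peel(unsigned long niter,
                                        unsigned long factor)
{
    return niter % factor;
}
```

Assuming an original trip count of 2000000, one iteration peeled by jump threading leaves 1999999. Then `iterations_to_peel(1999999, 9)` is 1 and `iterations_to_peel(1999999, 8)` is 7, which matches the "one extra" versus "7 more" figures above. Before the peel, `iterations_to_peel(2000000, 8)` would be 0.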
[Bug middle-end/29256] [4.2 regression] loop unrolling performance regression
--- Comment #6 from pinskia at gcc dot gnu dot org 2006-09-28 13:47 ---
(In reply to comment #4)
> On x86_64 4.2 decides to unroll 9 times while on 4.1 it unrolls 8 times.
> This is a code-size regression, but other than that? The 4.2 version runs
> slightly faster than the 4.1 version, though the difference may be in the
> noise.

No, no, no. Edmar and I are not complaining about how many times the loop was
unrolled, but about the use of indexed addressing mode instead of offset
addressing mode for the stores, and about the extra adds.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256
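The difference between the two shapes in comment #6 can be sketched in C by unrolling a copy loop by hand. This is only an illustration: the function names and the unroll factor of 4 are assumptions, and an optimizing compiler is free to turn either form into the other, so the source mirrors the shape of the generated code rather than forcing a particular addressing mode.

```c
#include <stddef.h>

/* Offset-style shape: one base per unrolled body, and each copy uses a
   constant displacement from that base (no per-copy add). */
static void copy_offset_style(double *dst, const double *src, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        const double *s = src + i;
        double *d = dst + i;
        d[0] = s[0];
        d[1] = s[1];
        d[2] = s[2];
        d[3] = s[3];
    }
    for (; i < n; i++)          /* leftover iterations */
        dst[i] = src[i];
}

/* Indexed-style shape: each copy gets its own index value, which costs
   an extra add per copy before the base+index access. */
static void copy_indexed_style(double *dst, const double *src, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        size_t i1 = i + 1;
        size_t i2 = i + 2;
        size_t i3 = i + 3;
        dst[i]  = src[i];
        dst[i1] = src[i1];
        dst[i2] = src[i2];
        dst[i3] = src[i3];
    }
    for (; i < n; i++)          /* leftover iterations */
        dst[i] = src[i];
}
```

Both functions copy the same data. The complaint in the thread is about the second shape appearing in the generated code for the stores after unrolling.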
[Bug middle-end/29256] [4.2 regression] loop unrolling performance regression
--- Comment #7 from rguenth at gcc dot gnu dot org 2006-09-28 14:02 ---
Oh, but those do not happen on x86_64. So this is a target issue really.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256
[Bug middle-end/29256] [4.2 regression] loop unrolling performance regression
--- Comment #8 from pinskia at gcc dot gnu dot org 2006-09-28 14:08 ---
  D.1563 = -a;
  MEM[base: (int *) D.1563 + c, index: D.1562] = MEM[base: D.1562];

WTFFF

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256
[Bug middle-end/29256] [4.2 regression] loop unrolling performance regression
--- Comment #9 from rguenth at gcc dot gnu dot org 2006-09-28 14:11 ---
Oh, didn't I fix this? See PR26726.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256
[Bug middle-end/29256] [4.2 regression] loop unrolling performance regression
--- Comment #10 from rakdver at gcc dot gnu dot org 2006-09-28 14:15 ---
(In reply to comment #8)
>   D.1563 = -a;
>   MEM[base: (int *) D.1563 + c, index: D.1562] = MEM[base: D.1562];
>
> WTFFF

ivopts are having fun :-) On the other hand, this is one of several possible
cheapest ways to express the code, and it should not affect the creation of
offset addressing modes on RTL. So although this is indeed somewhat curious
(well, a bug in fact, for reasons unrelated to the problem covered by this
PR), it is not the cause of this problem.

On x86, the tree optimizers seem to do just fine, producing

  MEM[symbol: c, index: D.1569, step: 8B] = MEM[symbol: a, index: D.1569, step: 8B];

However, on RTL, we fail to create the offset version of this addressing mode
after unrolling.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256
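One way to read the curious dump quoted from comment #8 is as plain address arithmetic. The sketch below assumes that D.1562 holds the address used by the load (a pointer into `a`); that reading is an assumption, not something the thread states. The helper name `store_address` is hypothetical, and unsigned integer arithmetic stands in for the pointer computation.

```c
#include <stdint.h>

/* Hypothetical model of the store address in the dump:
     D.1563 = -a;
     MEM[base: (int *) D.1563 + c, index: D.1562]
   a and c are the array addresses; p is assumed to be the load address
   held in D.1562. Unsigned arithmetic wraps, so negating a is safe. */
static uintptr_t store_address(uintptr_t a, uintptr_t c, uintptr_t p)
{
    uintptr_t neg_a = (uintptr_t)0 - a;  /* D.1563 = -a            */
    uintptr_t base = neg_a + c;          /* (int *) D.1563 + c     */
    return base + p;                     /* base + index           */
}
```

Under that assumption, if `p` is the address of `a[i]`, the result is the address of `c[i]`: the expression is `c - a + p`, which is an unusual but valid way to reach the store location from the load pointer.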
[Bug middle-end/29256] [4.2 regression] loop unrolling performance regression
--- Comment #3 from pinskia at gcc dot gnu dot org 2006-09-28 02:59 ---
This is a generic regression, x86 has the same problem with the code. Even
doing -Ddouble=int, we have the same problem.

--
pinskia at gcc dot gnu dot org changed:

What                  |Removed                          |Added
CC                    |                                 |pinskia at gcc dot gnu dot org
Status                |UNCONFIRMED                      |NEW
Ever Confirmed        |0                                |1
GCC host triplet      |x86_64-unknown-linux-gnu         |
GCC target triplet    |powerpc-unknown-linux-gnuspe     |
Keywords              |                                 |missed-optimization
Known to fail         |                                 |4.2.0
Known to work         |                                 |4.1.2
Last reconfirmed date |-00-00 00:00:00                  |2006-09-28 02:59:57
Summary               |[4.2 regression] performance     |[4.2 regression] loop
                      |regression with double on SPE2   |unrolling performance regression
Target Milestone      |---                              |4.2.0

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256