[Bug target/27827] [4.0 Regression] gcc 4 produces worse x87 code on all platforms than gcc 3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827

Bug 27827 depends on bug 27855, which changed state.

Bug 27855 Summary: [6/7/8 regression] reassociation causes the RA to be confused
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=27855

           What    |Removed |Added
             Status|NEW     |RESOLVED
         Resolution|---     |FIXED
[Bug target/27827] [4.0 Regression] gcc 4 produces worse x87 code on all platforms than gcc 3
--- Comment #70 from pinskia at gcc dot gnu dot org 2007-02-13 02:59 ---
Fixed, 4.0 branch has now been closed.

--

pinskia at gcc dot gnu dot org changed:

           What    |Removed |Added
             Status|ASSIGNED|RESOLVED
         Resolution|        |FIXED

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827
[Bug target/27827] [4.0 Regression] gcc 4 produces worse x87 code on all platforms than gcc 3
--- Comment #69 from steven at gcc dot gnu dot org 2006-10-07 10:06 ---
The linked-to patch is already on the trunk.

--

steven at gcc dot gnu dot org changed:

           What    |Removed                                                |Added
                URL|http://gcc.gnu.org/ml/gcc-patches/2006-08/msg00113.html|

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827
[Bug target/27827] [4.0 Regression] gcc 4 produces worse x87 code on all platforms than gcc 3
--- Comment #66 from bonzini at gnu dot org 2006-08-11 14:10 ---
(on bugzilla because I had problems sending mail to you)

> Just got your most recent update. From what I can tell, you have applied
> your patch to the 4.1 series, so that the next 4.1 release will have the fix?

Yes.

> So, my question is that I notice the comment says:
>   * config/i386/i386.md: Add peephole2 to avoid fld %st instructions.
> Which, if it's what we've been doing should be something like:
>   * config/i386/i386.md: Add peephole2 to substitute fld for memory-source fmul

No, what my patch does is exactly replacing fld reg + fmul mem with
fld mem + fmul reg,reg. Maybe the ChangeLog is not completely descriptive,
but the PR number is there and will make things clear enough.

> BTW, it's going to remain the case that you must do at least -O2 to get
> this peephole invoked?

You can add -fpeephole2.

--

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827
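For readers who want to look at the instruction pattern discussed in comment #66, here is a minimal sketch of a kernel that tends to produce it. It is not taken from ATLAS or from the patch: the function name dot2 and its signature are hypothetical, and whether the peephole actually fires depends on the GCC version, target and flags.

```c
#include <stddef.h>

/*
 * Hypothetical kernel, for illustration only.
 * One loaded value (rb) is reused in two multiplies whose other operand
 * comes straight from memory. On x87 this is the kind of source that can
 * be compiled either as "fld reg + fmul mem" or as "fld mem + fmul reg,reg".
 */
void dot2(const double *a0, const double *a1, const double *b,
          size_t n, double out[2])
{
    double c0 = 0.0;
    double c1 = 0.0;

    for (size_t i = 0; i < n; i++) {
        double rb = b[i];   /* reused value, a candidate for the x87 stack */
        c0 += a0[i] * rb;   /* first product with a memory operand */
        c1 += a1[i] * rb;   /* second product reuses rb */
    }

    out[0] = c0;
    out[1] = c1;
}
```

Assuming an x86 target, compiling this with something like "gcc -O1 -fpeephole2 -mfpmath=387 -S" and again without -fpeephole2 should make it possible to compare the generated fld/fmul sequences in the two assembly listings. The -O1 level is only an example of a level below -O2, as raised in the question above.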
[Bug target/27827] [4.0 Regression] gcc 4 produces worse x87 code on all platforms than gcc 3
--- Comment #67 from whaley at cs dot utsa dot edu 2006-08-11 15:22 ---
Uros,

> Slightly offtopic, but to put some numbers to comment #8 and comment #11,
> equivalent SSE code now reaches only 50% of x87 single performance and 60%
> of x87 double performance on AMD x86_64

FYI, you *may* get slightly better single SSE performance with these flags:
   -fomit-frame-pointer -march=athlon64 -O2 -mfpmath=sse \
   -msse -msse2 -msse3 -fargument-noalias-global

Also, when ATLAS is allowed to exercise the code generator to find the best
kernel, for double precision gcc 4's SSE could be made to almost tie gcc3's
x87 performance (gcc3's double x87 performance is roughly 92% of the patched
gcc 4 for this platform). However, single precision SSE, even allowing the
code generator to go crazy, could only achieve about 2/3 of double *SSE*
performance, and since x87 single perf is actually greater for x87 . . .

You can find some details at:
   https://sourceforge.net/mailarchive/forum.php?thread_id=10026092&forum_id=426

Cheers,
Clint

--

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827