[Bug regression/33928] [4.3 Regression] 22% performance slowdown from 4.2.2 to 4.3.0 in floating-point code
--- Comment #25 from ubizjak at gmail dot com 2008-01-22 12:03 ---
Created an attachment (id=14996)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14996&action=view)
Much shorter testcase.

This testcase was used to track down problems with the FRE pass. Stay tuned for an analysis.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
[Bug regression/33928] [4.3 Regression] 22% performance slowdown from 4.2.2 to 4.3.0 in floating-point code
--- Comment #24 from lucier at math dot purdue dot edu 2008-01-21 22:43 ---
Subject: Re: [4.3 Regression] 22% performance slowdown from 4.2.2 to 4.3.0 in floating-point code

On Jan 21, 2008, at 2:21 PM, ubizjak at gmail dot com wrote:

> It is not possible to create an executable from direct.i.

That's correct, sorry.

> Could you attach the source that can be used to create the executable?

Here are instructions on how to build and test a modified version of Gambit, from which I derived direct.i.

Download the file

http://www.math.purdue.edu/~lucier/gcc/test-files/bugzilla/33928/gambc-v4_1_2.tgz

Build it with the following commands:

> tar zxf gambc-v4_1_2.tgz
> cd gambc-v4_1_2
> ./configure CC='/pkgs/gcc-mainline/bin/gcc -save-temps'
> make -j

If you want to recompile the source after reconfiguring, do

> make mostlyclean

not 'make clean', unfortunately.

Then test it with

> gsi/gsi -e '(define a (time (expt 3 1000)))(define b (time (* a a)))'

The output ends with something like

> (time (##bignum.make (##fixnum.quotient result-length (##fixnum.quotient ##bignum.adigit-width ##bignum.fdigit-width)) #f #f))
>     4 ms real time
>     5 ms cpu time (3 user, 2 system)
>     no collections
>     3962448 bytes allocated
>     968 minor faults
>     no major faults
> (time (##make-f64vector (##fixnum.* two^n 2)))
>     5 ms real time
>     5 ms cpu time (1 user, 4 system)
>     1 collection accounting for 5 ms real time (1 user, 4 system)
>     33554464 bytes allocated
>     59 minor faults
>     no major faults
> (time (make-w (##fixnum.- log-two^n 1)))
>     30 ms real time
>     31 ms cpu time (17 user, 14 system)
>     no collections
>     16810144 bytes allocated
>     4097 minor faults
>     no major faults
> (time (make-w-rac log-two^n))
>     28 ms real time
>     28 ms cpu time (16 user, 12 system)
>     no collections
>     16826272 bytes allocated
>     4097 minor faults
>     no major faults
> (time (bignum->f64vector-rac x a))
>     45 ms real time
>     45 ms cpu time (20 user, 25 system)
>     no collections
>     -16 bytes allocated
>     8192 minor faults
>     no major faults
> (time (componentwise-rac-multiply a rac-table))
>     26 ms real time
>     26 ms cpu time (26 user, 0 system)
>     no collections
>     -16 bytes allocated
>     no minor faults
>     no major faults
> (time (direct-fft-recursive-4 a table))
>     445 ms real time
>     445 ms cpu time (445 user, 0 system)
>     no collections
>     64 bytes allocated
>     no minor faults
>     no major faults
> (time (componentwise-complex-multiply a a))
>     24 ms real time
>     24 ms cpu time (24 user, 0 system)
>     no collections
>     -16 bytes allocated
>     no minor faults
>     no major faults
> (time (inverse-fft-recursive-4 a table))
>     418 ms real time
>     418 ms cpu time (418 user, 0 system)
>     no collections
>     64 bytes allocated
>     no minor faults
>     no major faults
> (time (componentwise-rac-multiply-conjugate a rac-table))
>     26 ms real time
>     26 ms cpu time (26 user, 0 system)
>     no collections
>     -16 bytes allocated
>     no minor faults
>     no major faults
> (time (bignum<-f64vector-rac a result result-length))
>     108 ms real time
>     108 ms cpu time (108 user, 0 system)
>     no collections
>     112 bytes allocated
>     no minor faults
>     no major faults
> (time (* a a))
>     1170 ms real time
>     1170 ms cpu time (1105 user, 65 system)
>     1 collection accounting for 5 ms real time (1 user, 4 system)
>     71266896 bytes allocated
>     17413 minor faults
>     no major faults

The time for the routine in direct.i is the time reported for direct-fft-recursive-4:

> (time (direct-fft-recursive-4 a table))
>     445 ms real time
>     445 ms cpu time (445 user, 0 system)
>     no collections
>     64 bytes allocated
>     no minor faults
>     no major faults

The name of the routine in the .i and .s files is ___H_direct_2d_fft_2d_recursive_2d_4.

By the way, ___H_inverse_2d_fft_2d_recursive_2d_4 is a similar routine implementing the inverse fft, which, for some reason, goes faster than the direct (forward) fft.

Brad

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
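When comparing compiler builds it can help to pull one routine's time out of the gsi report automatically. The following is only a sketch: it assumes the report has the shape shown in this thread (a `(time (<routine> ...))` line followed by an `N ms real time` line), and the function name `real_time_ms` and the shortened sample text are illustrative, not part of Gambit.

```python
import re
from typing import Optional


def real_time_ms(report: str, routine: str) -> Optional[int]:
    """Return the 'ms real time' value reported for one routine, or None."""
    # Require a space or ')' after the name so that, for example,
    # "make-w" does not also match "make-w-rac".
    pattern = re.compile(
        r"\(time \(" + re.escape(routine) + r"[ )].*?(\d+) ms real time",
        re.DOTALL,
    )
    match = pattern.search(report)
    return int(match.group(1)) if match else None


# Shortened excerpt of the report quoted in comment #24.
sample_report = """\
(time (direct-fft-recursive-4 a table))
    445 ms real time
    445 ms cpu time (445 user, 0 system)
(time (inverse-fft-recursive-4 a table))
    418 ms real time
    418 ms cpu time (418 user, 0 system)
"""

print(real_time_ms(sample_report, "direct-fft-recursive-4"))   # prints 445
print(real_time_ms(sample_report, "inverse-fft-recursive-4"))  # prints 418
```

The two values printed are the forward and inverse FFT times from the report above, which makes the gap between the two routines easy to track from one build to the next.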
[Bug regression/33928] [4.3 Regression] 22% performance slowdown from 4.2.2 to 4.3.0 in floating-point code
--- Comment #23 from ubizjak at gmail dot com 2008-01-21 19:21 ---
It is not possible to create an executable from direct.i. My compilation fails:

(.text+0x20): undefined reference to `main'
/tmp/cc0VOLHm.o: In function `___H_direct_2d_fft_2d_recursive_2d_4':
_num.c:(.text+0xf1): undefined reference to `___gstate'
_num.c:(.text+0x18e): undefined reference to `___gstate'
_num.c:(.text+0x1c7): undefined reference to `___gstate'
_num.c:(.text+0x27b): undefined reference to `___gstate'
_num.c:(.text+0x2e0): undefined reference to `___gstate'
/tmp/cc0VOLHm.o:_num.c:(.text+0x6f0): more undefined references to `___gstate' follow

Could you attach the source that can be used to create the executable? Or perhaps detailed instructions on how to create one from the sources you already posted?

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
[Bug regression/33928] [4.3 Regression] 22% performance slowdown from 4.2.2 to 4.3.0 in floating-point code
--- Comment #22 from rguenth at gcc dot gnu dot org 2008-01-12 17:56 ---
I'm downgrading this to P2.

--
rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P1                          |P2

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
[Bug regression/33928] [4.3 Regression] 22% performance slowdown from 4.2.2 to 4.3.0 in floating-point code
--- Comment #21 from lucier at math dot purdue dot edu 2008-01-09 18:44 ---
The assembler is identical to that in the third attachment and the time is basically the same (other things were going on at the same time):

(time (direct-fft-recursive-4 a table))
    465 ms real time
    466 ms cpu time (466 user, 0 system)
    no collections
    64 bytes allocated
    no minor faults
    no major faults

euler-86% /pkgs/gcc-mainline/bin/gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../../mainline/configure --prefix=/pkgs/gcc-mainline --enable-languages=c --enable-checking=release --with-gmp=/pkgs/gmp-4.2.2 --with-mpfr=/pkgs/gmp-4.2.2 --enable-gather-detailed-mem-stats
Thread model: posix
gcc version 4.3.0 20080109 (experimental) [trunk revision 131427] (GCC)

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
[Bug regression/33928] [4.3 Regression] 22% performance slowdown from 4.2.2 to 4.3.0 in floating-point code
--- Comment #20 from rguenth at gcc dot gnu dot org 2008-01-09 12:45 --- Can we have updated measurements please? Also I don't think this bug should be P1. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
[Bug regression/33928] [4.3 Regression] 22% performance slowdown from 4.2.2 to 4.3.0 in floating-point code
--- Comment #19 from lucier at math dot purdue dot edu 2007-12-01 18:59 ---
Subject: Re: [4.3 Regression] 22% performance slowdown from 4.2.2 to 4.3.0 in floating-point code

On Nov 30, 2007, at 9:58 AM, bonzini at gnu dot org wrote:

> -fno-forward-propagate

I don't know how to debug this, that's clear enough, but adding -fno-forward-propagate as an option doesn't change the code at all.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
[Bug regression/33928] [4.3 Regression] 22% performance slowdown from 4.2.2 to 4.3.0 in floating-point code
--- Comment #18 from bonzini at gnu dot org 2007-11-30 14:58 --- It would be -fno-forward-propagate, but what I meant is that the changes *connected to* fwprop could be the culprit. One has to look at dumps to understand if this is the case. It would be possible, maybe, to put an asm around the problematic basic block, so that one could plot the number of instructions in that basic block over time? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
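The suggestion above (bracket the problematic basic block with asm statements, then track its instruction count over time) could be automated along these lines. This is a sketch under assumptions that are not from the thread: the markers are taken to be assembler comments named `# HOT_BEGIN` and `# HOT_END`, emitted by asm statements placed around the block in the C source, and the sample assembly is made up for illustration. Adding such asm statements may itself change how the block is optimized, so the counts are only indicative.

```python
def count_instructions(asm_text: str,
                       begin: str = "# HOT_BEGIN",
                       end: str = "# HOT_END") -> int:
    """Count instruction lines between two marker comments in a .s file."""
    inside = False
    count = 0
    for raw in asm_text.splitlines():
        line = raw.strip()
        if line == begin:
            inside = True
            continue
        if line == end:
            inside = False
            continue
        if not inside or not line:
            continue
        # Skip comments (including #APP / #NO_APP), directives and labels.
        if line.startswith("#") or line.startswith(".") or line.endswith(":"):
            continue
        count += 1
    return count


# Hypothetical x86-64 output; the marker names are assumptions.
sample = """\
#APP
\t# HOT_BEGIN
#NO_APP
\tmovsd\t(%rax), %xmm0
\tmulsd\t%xmm1, %xmm0
.L5:
\taddsd\t%xmm2, %xmm0
#APP
\t# HOT_END
#NO_APP
\tret
"""

print(count_instructions(sample))  # prints 3
```

Running this over the .s files kept by -save-temps for a series of trunk revisions would give the per-revision counts to plot.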
[Bug regression/33928] [4.3 Regression] 22% performance slowdown from 4.2.2 to 4.3.0 in floating-point code
--- Comment #17 from lucier at math dot purdue dot edu 2007-11-30 14:47 ---
Subject: Re: [4.3 Regression] 22% performance slowdown from 4.2.2 to 4.3.0 in floating-point code

On Nov 30, 2007, at 12:39 AM, bonzini at gnu dot org wrote:

> One suspect is fwprop. Anyone can confirm?

How does one turn off fwprop? It doesn't seem to like "-fno-fwprop".

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
[Bug regression/33928] [4.3 Regression] 22% performance slowdown from 4.2.2 to 4.3.0 in floating-point code
--- Comment #16 from bonzini at gnu dot org 2007-11-30 05:39 ---
One suspect is fwprop. Anyone can confirm?

--
bonzini at gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bonzini at gnu dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
[Bug regression/33928] [4.3 Regression] 22% performance slowdown from 4.2.2 to 4.3.0 in floating-point code
--- Comment #15 from mmitchel at gcc dot gnu dot org 2007-11-27 05:53 ---
I've marked this P1 because I'd like to see us start to explain these kinds of dramatic performance changes. If we can explain the issue coherently, we may well decide that it's not important to fix it, but I think we ought to force ourselves to figure out what's going on.

--
mmitchel at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P1

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
[Bug regression/33928] [4.3 Regression] 22% performance slowdown from 4.2.2 to 4.3.0 in floating-point code
--
pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pinskia at gcc dot gnu dot org
   Target Milestone|---                         |4.3.0

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
[Bug regression/33928] [4.3 Regression] 22% performance slowdown from 4.2.2 to 4.3.0 in floating-point code
--- Comment #14 from lucier at math dot purdue dot edu 2007-11-12 21:53 --- Created an attachment (id=14536) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14536&action=view) 4.3.0 assembly for code using a switch -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
[Bug regression/33928] [4.3 Regression] 22% performance slowdown from 4.2.2 to 4.3.0 in floating-point code
--- Comment #13 from lucier at math dot purdue dot edu 2007-11-12 21:52 --- Created an attachment (id=14535) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14535&action=view) 4.2.2 assembly for code using switch. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
[Bug regression/33928] [4.3 Regression] 22% performance slowdown from 4.2.2 to 4.3.0 in floating-point code
--- Comment #12 from lucier at math dot purdue dot edu 2007-11-12 21:51 ---
Created an attachment (id=14534)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14534&action=view)
.i file using a switch instead of computed gotos

This is the generated code with a switch instead of computed gotos.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
[Bug regression/33928] [4.3 Regression] 22% performance slowdown from 4.2.2 to 4.3.0 in floating-point code
--- Comment #11 from lucier at math dot purdue dot edu 2007-11-12 21:50 ---
I suspected that the slowdown had nothing to do with computed gotos, so I regenerated the C code using a switch instead of the computed gotos and got the following:

For that same copy of mainline, gcc version 4.3.0 20071026 (experimental) [trunk revision 129664] (GCC):

(time (direct-fft-recursive-4 a table))
    470 ms real time
    470 ms cpu time (470 user, 0 system)
    no collections
    64 bytes allocated
    no minor faults
    no major faults

For 4.2.2:

(time (direct-fft-recursive-4 a table))
    384 ms real time
    384 ms cpu time (383 user, 1 system)
    no collections
    64 bytes allocated
    no minor faults
    no major faults

So that's almost exactly the same slowdown as with computed gotos.

I changed the subject line to use 22% instead of 33% (I don't know how I got 33% before, perhaps I just mistyped it) and removed the phrase "with computed gotos".

I'll include the new .i and .s files as attachments.

--
lucier at math dot purdue dot edu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[4.3 Regression] 33%        |[4.3 Regression] 22%
                   |performance slowdown from   |performance slowdown from
                   |4.2.2 to 4.3.0 in floating- |4.2.2 to 4.3.0 in floating-
                   |point code with computed    |point code
                   |gotos                       |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
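The 22% in the revised subject line can be checked from the two timings in comment #11, taking the 4.2.2 time as the baseline. A minimal sketch of that arithmetic (the function name `slowdown_percent` is illustrative, not from the thread):

```python
def slowdown_percent(baseline_ms: float, regressed_ms: float) -> float:
    """Return how much slower regressed_ms is than baseline_ms, in percent."""
    if baseline_ms <= 0:
        raise ValueError("baseline must be positive")
    return (regressed_ms - baseline_ms) / baseline_ms * 100.0


# Real-time figures for direct-fft-recursive-4 from comment #11 (switch version).
gcc_422_ms = 384  # gcc 4.2.2
gcc_430_ms = 470  # gcc 4.3.0 20071026, trunk revision 129664

print(f"{slowdown_percent(gcc_422_ms, gcc_430_ms):.1f}% slower")  # prints "22.4% slower"
```

That is, (470 - 384) / 384 is about 0.224, which rounds to the 22% quoted in the subject.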