[Bug regression/33928] [4.3 Regression] 22% performance slowdown from 4.2.2 to 4.3.0 in floating-point code

2008-01-22 Thread ubizjak at gmail dot com


--- Comment #25 from ubizjak at gmail dot com  2008-01-22 12:03 ---
Created an attachment (id=14996)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14996&action=view)
Much shorter testcase.

This testcase was used to track down problems with the FRE pass. Stay tuned for an
analysis.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928




2008-01-21 Thread lucier at math dot purdue dot edu


--- Comment #24 from lucier at math dot purdue dot edu  2008-01-21 22:43 ---


On Jan 21, 2008, at 2:21 PM, ubizjak at gmail dot com wrote:

> It is not possible to create an executable from direct.i.

That's correct, sorry.

> Could you attach the source that can be used to create the executable?

Here are instructions on how to build and test a modified version of  
Gambit, from which I derived direct.i.

Download the file

http://www.math.purdue.edu/~lucier/gcc/test-files/bugzilla/33928/gambc-v4_1_2.tgz

Build it with the following commands:

> tar zxf gambc-v4_1_2.tgz
> cd gambc-v4_1_2
> ./configure CC='/pkgs/gcc-mainline/bin/gcc -save-temps'
> make -j

If you want to recompile the source after reconfiguring, do

> make mostlyclean


not 'make clean', unfortunately.

Then test it with

> gsi/gsi -e '(define a (time (expt 3 1000)))(define b (time (* a a)))'

The output ends with something like

> (time (##bignum.make (##fixnum.quotient result-length (##fixnum.quotient ##bignum.adigit-width ##bignum.fdigit-width)) #f #f))
> 4 ms real time
> 5 ms cpu time (3 user, 2 system)
> no collections
> 3962448 bytes allocated
> 968 minor faults
> no major faults
> (time (##make-f64vector (##fixnum.* two^n 2)))
> 5 ms real time
> 5 ms cpu time (1 user, 4 system)
> 1 collection accounting for 5 ms real time (1 user, 4 system)
> 33554464 bytes allocated
> 59 minor faults
> no major faults
> (time (make-w (##fixnum.- log-two^n 1)))
> 30 ms real time
> 31 ms cpu time (17 user, 14 system)
> no collections
> 16810144 bytes allocated
> 4097 minor faults
> no major faults
> (time (make-w-rac log-two^n))
> 28 ms real time
> 28 ms cpu time (16 user, 12 system)
> no collections
> 16826272 bytes allocated
> 4097 minor faults
> no major faults
> (time (bignum->f64vector-rac x a))
> 45 ms real time
> 45 ms cpu time (20 user, 25 system)
> no collections
> -16 bytes allocated
> 8192 minor faults
> no major faults
> (time (componentwise-rac-multiply a rac-table))
> 26 ms real time
> 26 ms cpu time (26 user, 0 system)
> no collections
> -16 bytes allocated
> no minor faults
> no major faults
> (time (direct-fft-recursive-4 a table))
> 445 ms real time
> 445 ms cpu time (445 user, 0 system)
> no collections
> 64 bytes allocated
> no minor faults
> no major faults
> (time (componentwise-complex-multiply a a))
> 24 ms real time
> 24 ms cpu time (24 user, 0 system)
> no collections
> -16 bytes allocated
> no minor faults
> no major faults
> (time (inverse-fft-recursive-4 a table))
> 418 ms real time
> 418 ms cpu time (418 user, 0 system)
> no collections
> 64 bytes allocated
> no minor faults
> no major faults
> (time (componentwise-rac-multiply-conjugate a rac-table))
> 26 ms real time
> 26 ms cpu time (26 user, 0 system)
> no collections
> -16 bytes allocated
> no minor faults
> no major faults
> (time (bignum<-f64vector-rac a result result-length))
> 108 ms real time
> 108 ms cpu time (108 user, 0 system)
> no collections
> 112 bytes allocated
> no minor faults
> no major faults
> (time (* a a))
> 1170 ms real time
> 1170 ms cpu time (1105 user, 65 system)
> 1 collection accounting for 5 ms real time (1 user, 4 system)
> 71266896 bytes allocated
> 17413 minor faults
> no major faults


The time for the routine in direct.i is the time reported for
direct-fft-recursive-4:

> (time (direct-fft-recursive-4 a table))
> 445 ms real time
> 445 ms cpu time (445 user, 0 system)
> no collections
> 64 bytes allocated
> no minor faults
> no major faults

The name of the routine in the .i and .s files is  
___H_direct_2d_fft_2d_recursive_2d_4.

By the way, ___H_inverse_2d_fft_2d_recursive_2d_4 is a similar  
routine implementing the inverse fft, which, for some reason, goes  
faster than the direct (forward) fft.

Brad



2008-01-21 Thread ubizjak at gmail dot com


--- Comment #23 from ubizjak at gmail dot com  2008-01-21 19:21 ---
It is not possible to create an executable from direct.i. My compilation fails:

(.text+0x20): undefined reference to `main'
/tmp/cc0VOLHm.o: In function `___H_direct_2d_fft_2d_recursive_2d_4':
_num.c:(.text+0xf1): undefined reference to `___gstate'
_num.c:(.text+0x18e): undefined reference to `___gstate'
_num.c:(.text+0x1c7): undefined reference to `___gstate'
_num.c:(.text+0x27b): undefined reference to `___gstate'
_num.c:(.text+0x2e0): undefined reference to `___gstate'
/tmp/cc0VOLHm.o:_num.c:(.text+0x6f0): more undefined references to `___gstate' follow

Could you attach the source that can be used to create the executable? Or
perhaps detailed instructions on how to create one from the sources you have
already posted?



2008-01-12 Thread rguenth at gcc dot gnu dot org


--- Comment #22 from rguenth at gcc dot gnu dot org  2008-01-12 17:56 ---
I'm downgrading this to P2.


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P1                          |P2



2008-01-09 Thread lucier at math dot purdue dot edu


--- Comment #21 from lucier at math dot purdue dot edu  2008-01-09 18:44 ---
The assembly is identical to that in the third attachment and the time is
basically the same (other things were going on at the same time):

(time (direct-fft-recursive-4 a table))
465 ms real time
466 ms cpu time (466 user, 0 system)
no collections
64 bytes allocated
no minor faults
no major faults

euler-86% /pkgs/gcc-mainline/bin/gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../../mainline/configure --prefix=/pkgs/gcc-mainline
--enable-languages=c --enable-checking=release --with-gmp=/pkgs/gmp-4.2.2
--with-mpfr=/pkgs/gmp-4.2.2 --enable-gather-detailed-mem-stats
Thread model: posix
gcc version 4.3.0 20080109 (experimental) [trunk revision 131427] (GCC) 



2008-01-09 Thread rguenth at gcc dot gnu dot org


--- Comment #20 from rguenth at gcc dot gnu dot org  2008-01-09 12:45 ---
Can we have updated measurements please?  Also I don't think this bug should be
P1.



2007-12-01 Thread lucier at math dot purdue dot edu


--- Comment #19 from lucier at math dot purdue dot edu  2007-12-01 18:59 ---


On Nov 30, 2007, at 9:58 AM, bonzini at gnu dot org wrote:

> -fno-forward-propagate

I don't know how to debug this, that's clear enough, but adding
-fno-forward-propagate as an option doesn't change the code at all.



2007-11-30 Thread bonzini at gnu dot org


--- Comment #18 from bonzini at gnu dot org  2007-11-30 14:58 ---
It would be -fno-forward-propagate, but what I meant is that the changes
*connected to* fwprop could be the culprit.  One has to look at dumps to
understand if this is the case.

It might be possible to put asm markers around the problematic basic block,
so that one could plot the number of instructions in that basic block across
compiler revisions.
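The marker idea from comment #18 could look roughly like the sketch below. It is only an illustration, not code from the Gambit sources: scaled_sum and the BB_BEGIN/BB_END marker names are hypothetical. Each asm statement emits nothing but an assembler comment, so compiling with -S and counting the instruction lines between the two markers gives a number that can be tracked from one compiler revision to the next.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a hot loop; NOT the Gambit FFT code. */
double scaled_sum(const double *x, size_t n, double w)
{
    double acc = 0.0;
    size_t i;

    assert(x != NULL || n == 0);
    for (i = 0; i < n; i++) {
        /* The asm body is only an assembler comment, so no instruction is
           added, but the text shows up verbatim in the -S output. */
        __asm__ volatile ("/* BB_BEGIN hot_block */");
        acc += x[i] * w;
        __asm__ volatile ("/* BB_END hot_block */");
    }
    return acc;
}
```

Two caveats apply. GCC does not promise that surrounding instructions stay between two volatile asm statements, and the markers themselves can inhibit some optimizations, so the counts are indicative rather than exact.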



2007-11-30 Thread lucier at math dot purdue dot edu


--- Comment #17 from lucier at math dot purdue dot edu  2007-11-30 14:47 ---

On Nov 30, 2007, at 12:39 AM, bonzini at gnu dot org wrote:

> One suspect is fwprop.  Can anyone confirm?

How does one turn off fwprop?  It doesn't seem to like "-fno-fwprop".



2007-11-29 Thread bonzini at gnu dot org


--- Comment #16 from bonzini at gnu dot org  2007-11-30 05:39 ---
One suspect is fwprop.  Can anyone confirm?


-- 

bonzini at gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bonzini at gnu dot org



2007-11-26 Thread mmitchel at gcc dot gnu dot org


--- Comment #15 from mmitchel at gcc dot gnu dot org  2007-11-27 05:53 ---
I've marked this P1 because I'd like to see us start to explain these kinds of
dramatic performance changes.  If we can explain the issue coherently, we may
well decide that it's not important to fix it, but I think we ought to force
ourselves to figure out what's going on.


-- 

mmitchel at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P1



2007-11-18 Thread pinskia at gcc dot gnu dot org


-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pinskia at gcc dot gnu dot
                   |                            |org
   Target Milestone|---                         |4.3.0



2007-11-12 Thread lucier at math dot purdue dot edu


--- Comment #14 from lucier at math dot purdue dot edu  2007-11-12 21:53 ---
Created an attachment (id=14536)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14536&action=view)
4.3.0 assembly for code using a switch



2007-11-12 Thread lucier at math dot purdue dot edu


--- Comment #13 from lucier at math dot purdue dot edu  2007-11-12 21:52 ---
Created an attachment (id=14535)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14535&action=view)
4.2.2 assembly for code using switch.



2007-11-12 Thread lucier at math dot purdue dot edu


--- Comment #12 from lucier at math dot purdue dot edu  2007-11-12 21:51 ---
Created an attachment (id=14534)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14534&action=view)
.i file using a switch instead of computed gotos

This is the generated code with a switch instead of computed gotos.
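For readers unfamiliar with the two dispatch styles being compared, here is a minimal sketch. It is not taken from the attached .i files: the three-opcode interpreter, the OP_* names, and both function names are hypothetical. The &&label and goto * syntax is the GNU C "labels as values" extension, so the first function needs GCC or a compatible compiler.

```c
#include <assert.h>
#include <stddef.h>

enum { OP_INC = 0, OP_DBL = 1, OP_HALT = 2 };

/* Computed-goto dispatch: each handler jumps straight to the next one.
   Assumes every opcode in `code` is one of the three values above. */
long run_computed_goto(const unsigned char *code, long acc)
{
    static void *const targets[] = { &&do_inc, &&do_dbl, &&do_halt };
    size_t pc = 0;

    assert(code != NULL);
    goto *targets[code[pc++]];
do_inc:
    acc += 1;
    goto *targets[code[pc++]];
do_dbl:
    acc *= 2;
    goto *targets[code[pc++]];
do_halt:
    return acc;
}

/* Switch dispatch: every handler returns to one central switch. */
long run_switch(const unsigned char *code, long acc)
{
    size_t pc = 0;

    assert(code != NULL);
    for (;;) {
        switch (code[pc++]) {
        case OP_INC: acc += 1; break;
        case OP_DBL: acc *= 2; break;
        default:     return acc;  /* OP_HALT, or any unknown opcode */
        }
    }
}
```

Both functions compute the same result for a valid program; they differ only in how control reaches the next handler, which is the variable the switch-based .i file is meant to eliminate.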



2007-11-12 Thread lucier at math dot purdue dot edu


--- Comment #11 from lucier at math dot purdue dot edu  2007-11-12 21:50 ---
I suspected that the slowdown had nothing to do with computed gotos, so I
regenerated the C code using a switch instead of the computed gotos and got the
following:

For that same copy of mainline, gcc version 4.3.0 20071026 (experimental)
[trunk revision 129664] (GCC):

(time (direct-fft-recursive-4 a table))
470 ms real time
470 ms cpu time (470 user, 0 system)
no collections
64 bytes allocated
no minor faults
no major faults

For 4.2.2:

(time (direct-fft-recursive-4 a table))
384 ms real time
384 ms cpu time (383 user, 1 system)
no collections
64 bytes allocated
no minor faults
no major faults

So that's almost exactly the same slowdown as with computed gotos.

I changed the subject line to use 22% instead of 33% (I don't know how I got
33% before, perhaps I just mistyped it) and removed the phrase "with computed
gotos".
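As a quick check of the 22% figure, the slowdown can be recomputed from the two timings reported above (384 ms for 4.2.2, 470 ms for mainline). The helper name slowdown_percent is hypothetical and the sketch is just the arithmetic:

```c
#include <assert.h>

/* Relative slowdown of `regressed_ms` over `baseline_ms`, in percent. */
double slowdown_percent(double baseline_ms, double regressed_ms)
{
    assert(baseline_ms > 0.0);
    return 100.0 * (regressed_ms - baseline_ms) / baseline_ms;
}
```

Calling slowdown_percent(384.0, 470.0) gives 100 * 86 / 384, which is about 22.4.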

I'll include the new .i and .s files as attachments.


-- 

lucier at math dot purdue dot edu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[4.3 Regression] 33%        |[4.3 Regression] 22%
                   |performance slowdown from   |performance slowdown from
                   |4.2.2 to 4.3.0 in floating- |4.2.2 to 4.3.0 in floating-
                   |point code with computed    |point code
                   |gotos                       |

