[Bug target/27855] reassociation pass produces ~30% slower matrix multiplication code

2007-07-09 Thread ubizjak at gmail dot com


--- Comment #15 from ubizjak at gmail dot com  2007-07-09 18:16 ---
New timings on x86_64 core2 (from [1])

The tests were performed on core2 in 64-bit mode, using the '-DREPS=10000 -O3
-msse3 -march=core2 -ffast-math' flags, with and without the newly introduced
-fno-tree-reassoc flag.

The results were _interesting_, showing extreme differences in the run times:

w/o -fno-tree-reassoc:

                ALGORITHM     NB   REPS        TIME      MFLOPS
                =========  =====  =====  ==========  ==========

-DTYPE=float:   atlasmm       60  10000       2.000     2159.87
-DTYPE=double:  atlasmm       60  10000       2.500     1727.89

w/ -fno-tree-reassoc:

                ALGORITHM     NB   REPS        TIME      MFLOPS
                =========  =====  =====  ==========  ==========

-DTYPE=float:   atlasmm       60  10000       0.932     4634.90
-DTYPE=double:  atlasmm       60  10000       1.520     2841.93
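
The size of the gap is easier to see as a ratio. The following is only a small arithmetic sketch over the TIME and MFLOPS columns of the two tables above; the dictionary layout and the function name are illustrative and not part of the benchmark.

```python
# (TIME in seconds, MFLOPS) copied from the two tables in comment #15.
timings = {
    "float":  {"reassoc": (2.000, 2159.87), "no_reassoc": (0.932, 4634.90)},
    "double": {"reassoc": (2.500, 1727.89), "no_reassoc": (1.520, 2841.93)},
}


def speedup(entry):
    """Return (time ratio, MFLOPS ratio) of -fno-tree-reassoc over the default."""
    t_on, mflops_on = entry["reassoc"]
    t_off, mflops_off = entry["no_reassoc"]
    return t_on / t_off, mflops_off / mflops_on


for dtype, entry in timings.items():
    by_time, by_mflops = speedup(entry)
    print(f"{dtype}: {by_time:.2f}x by time, {by_mflops:.2f}x by MFLOPS")
```

Both columns agree: disabling the pass makes the float kernel a little over 2x faster and the double kernel roughly 1.6x faster on this machine.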

[1] http://gcc.gnu.org/ml/gcc-patches/2007-07/msg00849.html


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27855



[Bug target/27855] reassociation pass produces ~30% slower matrix multiplication code

2007-07-09 Thread uros at gcc dot gnu dot org


--- Comment #16 from uros at gcc dot gnu dot org  2007-07-09 19:22 ---
Subject: Bug 27855

Author: uros
Date: Mon Jul  9 19:22:03 2007
New Revision: 126491

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=126491
Log:
PR target/27855
* doc/extend.texi: Add ftree-reassoc flag.
* common.opt (ftree-reassoc): New flag.
* tree-ssa-reassoc.c (gate_tree_ssa_reassoc): New static function.
(struct tree_opt_pass pass_reassoc): Use gate_tree_ssa_reassoc.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/common.opt
trunk/gcc/doc/invoke.texi
trunk/gcc/tree-ssa-reassoc.c





[Bug target/27855] reassociation pass produces ~30% slower matrix multiplication code

2007-03-02 Thread ubizjak at gmail dot com


--- Comment #13 from ubizjak at gmail dot com  2007-03-02 15:34 ---
Any news about this problem?
Current mainline still has severe problems:

-msse3 -O2 -mfpmath=sse -ffast-math
GCC 4.3 -ffast-math double performance:
ALGORITHM     NB   REPS        TIME      MFLOPS
=========  =====  =====  ==========  ==========
atlasmm       60   1000       0.288     1499.91

-msse3 -O2 -mfpmath=sse
GCC 4.3 double performance:
ALGORITHM     NB   REPS        TIME      MFLOPS
=========  =====  =====  ==========  ==========
atlasmm       60   1000       0.192     2249.86

-msse3 -O2 -mfpmath=sse -ffast-math
GCC 4.3 -ffast-math single performance:
ALGORITHM     NB   REPS        TIME      MFLOPS
=========  =====  =====  ==========  ==========
atlasmm       60   1000       0.304     1420.96

-msse3 -O2 -mfpmath=sse
GCC 4.3 single performance:
ALGORITHM     NB   REPS        TIME      MFLOPS
=========  =====  =====  ==========  ==========
atlasmm       60   1000       0.172     2511.48

Please consider the fact that all benchmarks are using -ffast-math nowadays. ;)





[Bug target/27855] reassociation pass produces ~30% slower matrix multiplication code

2007-03-02 Thread pinskia at gcc dot gnu dot org


--- Comment #14 from pinskia at gcc dot gnu dot org  2007-03-02 17:42 ---
> Please consider the fact that all benchmarks are using -ffast-math nowadays.
> ;)

Please also consider the fact that the register allocator has been broken for
20 years :) :) :) :).

And I repeat, this has nothing to do with -ffast-math; see my comments #6
and #7, where I prove that -ffast-math is not the issue and that it is just the
register allocator going wrong.  If anyone disables reassociation at the tree
level, I am going to object loudly.





[Bug target/27855] reassociation pass produces ~30% slower matrix multiplication code

2006-10-07 Thread steven at gcc dot gnu dot org


--- Comment #11 from steven at gcc dot gnu dot org  2006-10-07 10:05 ---
Would anyone object if I'd propose to disable reassociation for floating point
thingies on x86 for GCC 4.2?  We can re-enable it if/when amacleod's new
out-of-ssa stuff fixes this for real...





[Bug target/27855] reassociation pass produces ~30% slower matrix multiplication code

2006-10-07 Thread pinskia at gcc dot gnu dot org


--- Comment #12 from pinskia at gcc dot gnu dot org  2006-10-07 16:36 ---
(In reply to comment #11)
> Would anyone object if I'd propose to disable reassociation for floating point
> thingies on x86 for GCC 4.2?  We can re-enable it if/when amacleod's new
> out-of-ssa stuff fixes this for real...

Yes, I do, because it helps PPC, which is why I added it in the first place.





[Bug target/27855] reassociation pass produces ~30% slower matrix multiplication code

2006-06-05 Thread amacleod at redhat dot com


--- Comment #9 from amacleod at redhat dot com  2006-06-05 15:46 ---
This thread is moving dangerously close to work in progress.. :-)
I'll have something more definitive to say about it in a few weeks, but I'm
looking at doing significant register pressure reduction at out of ssa time as
a component of a larger hunk of work.





[Bug target/27855] reassociation pass produces ~30% slower matrix multiplication code

2006-06-05 Thread dberlin at gcc dot gnu dot org


--- Comment #10 from dberlin at gcc dot gnu dot org  2006-06-05 15:57 ---
(In reply to comment #9)
> This thread is moving dangerously close to work in progress.. :-)
> I'll have something more definitive to say about it in a few weeks, but I'm
> looking at doing significant register pressure reduction at out of ssa time as
> a component of a larger hunk of work.

Just be careful. The last time I tried something like this, it produced good
code on x86, and really crappy code everywhere else.





[Bug target/27855] reassociation pass produces ~30% slower matrix multiplication code

2006-06-03 Thread steven at gcc dot gnu dot org


--- Comment #8 from steven at gcc dot gnu dot org  2006-06-03 23:49 ---
You could add a basic block list scheduler at the tree level just before
out-of-ssa, with heuristics to make lifetimes as short as possible :-)
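
To make the suggestion concrete, here is a minimal sketch of such a scheduler. It is a toy model and not GCC code: instructions are hypothetical (dest, operands) pairs in SSA form, loads are modelled as instructions with no operands, and the heuristic (pick the ready instruction that grows the set of live values the least) is just one plausible choice among many.

```python
from collections import Counter


def schedule_short_lifetimes(instrs, live_in):
    """Greedy list scheduler over (dest, operands) pairs in SSA form.

    Among the instructions whose operands are all available, pick the one
    that grows the set of live values the least; ties keep source order.
    Assumes every operand is either in live_in or defined by some instruction.
    """
    uses_left = Counter(op for _, ops in instrs for op in ops)
    available = set(live_in)
    pending = list(instrs)
    order = []

    def growth(ins):
        _, ops = ins
        # Operands whose remaining uses are all in this instruction die here.
        dying = sum(1 for op in set(ops) if uses_left[op] == ops.count(op))
        return 1 - dying  # one new value, minus the values that die

    while pending:
        ready = [ins for ins in pending if all(op in available for op in ins[1])]
        best = min(ready, key=growth)  # min() returns the first of equal candidates
        pending.remove(best)
        order.append(best)
        available.add(best[0])
        uses_left.subtract(best[1])
    return order


# Two accumulators, two steps, with every add pushed to the end of the block.
deferred = [
    ("b0", []), ("a00", []), ("t00", ["a00", "b0"]),
    ("a01", []), ("t01", ["a01", "b0"]),
    ("b1", []), ("a10", []), ("t10", ["a10", "b1"]),
    ("a11", []), ("t11", ["a11", "b1"]),
    ("c0_1", ["c0_0", "t00"]), ("c0_2", ["c0_1", "t10"]),
    ("c1_1", ["c1_0", "t01"]), ("c1_2", ["c1_1", "t11"]),
]
order = [dest for dest, _ in
         schedule_short_lifetimes(deferred, live_in=["c0_0", "c1_0"])]
print(order)
```

On this input the greedy choice pulls each add back up to sit directly after the multiply that feeds it, so no product stays live longer than one instruction. Whether such a pass pays off on real code, and on targets other than x86, is exactly what the rest of the thread argues about.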





[Bug target/27855] reassociation pass produces ~30% slower matrix multiplication code

2006-06-02 Thread uros at kss-loka dot si


--- Comment #2 from uros at kss-loka dot si  2006-06-02 10:04 ---
(In reply to comment #1)
> There is nothing special about reassociation at all.  In fact what you are
> seeing is the register allocator going funky.  This is what you get with x87.

This is also what you get with SSE.





[Bug target/27855] reassociation pass produces ~30% slower matrix multiplication code

2006-06-02 Thread pinskia at gcc dot gnu dot org


--- Comment #3 from pinskia at gcc dot gnu dot org  2006-06-02 10:19 ---
(In reply to comment #2)
> This is also what you get with SSE.

And how many registers does SSE have?  Not many.  Try it on PPC or any
processor that has more registers.





[Bug target/27855] reassociation pass produces ~30% slower matrix multiplication code

2006-06-02 Thread steven at gcc dot gnu dot org


--- Comment #4 from steven at gcc dot gnu dot org  2006-06-02 23:19 ---
Real bug, despite Andrew's usual portion of x86-hate.


-- 

steven at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2006-06-02 23:19:36
               date|                            |





[Bug target/27855] reassociation pass produces ~30% slower matrix multiplication code

2006-06-02 Thread dberlin at dberlin dot org


--- Comment #5 from dberlin at gcc dot gnu dot org  2006-06-03 02:11 ---
Subject: Re:  reassociation pass produces ~30% slower matrix
 multiplication code

steven at gcc dot gnu dot org wrote:
> --- Comment #4 from steven at gcc dot gnu dot org  2006-06-02 23:19 ---
> Real bug, despite Andrew's usual portion of x86-hate.

It'd be good to know what exactly is going wrong.
Reassociation only touches floating point because someone asked me to
make it touch floating point.

It still shouldn't have *this* much of an effect; my guess is it is
triggering some bad behavior elsewhere.





[Bug target/27855] reassociation pass produces ~30% slower matrix multiplication code

2006-06-02 Thread pinskia at gcc dot gnu dot org


--- Comment #6 from pinskia at gcc dot gnu dot org  2006-06-03 02:38 ---
What reassociation is doing is scheduling the instructions further down,
before the use, but it extends the lifetime of some variables.

e.g.:
  D.1563_59 = rA0_49 * rB0_50;
  rC0_0_60 = D.1563_59 + rC0_0_516;
  D.1564_61 = rA1_52 * rB0_50;
  rC1_0_62 = D.1564_61 + rC1_0_517;
  D.1565_63 = rA2_54 * rB0_50;
  rC2_0_64 = D.1565_63 + rC2_0_518;
  D.1566_65 = rA3_56 * rB0_50;
  rC3_0_66 = D.1566_65 + rC3_0_519;
  D.1567_67 = rA4_58 * rB0_50;
  rC4_0_68 = D.1567_67 + rC4_0_520;

into:
  D.1563_59 = rB0_50 * rA0_49;
  D.1564_61 = rA1_52 * rB0_50;
  D.1565_63 = rA2_54 * rB0_50;
  D.1566_65 = rA3_56 * rB0_50;
  D.1567_67 = rA4_58 * rB0_50;
  ... (with loads, etc. here)
  D.1563_477 = rB0_468 * rA0_466;
  rC0_0_60 = D.1563_59 + rC0_0_516;
  rC0_0_82 = rC0_0_60 + D.1563_81;
  rC0_0_104 = rC0_0_82 + D.1563_103;
  rC0_0_126 = rC0_0_104 + D.1563_125;
  rC0_0_148 = rC0_0_126 + D.1563_147;
  rC0_0_170 = rC0_0_148 + D.1563_169;
  rC0_0_192 = rC0_0_170 + D.1563_191;
  rC0_0_214 = rC0_0_192 + D.1563_213;
  rC0_0_236 = rC0_0_214 + D.1563_235;
  rC0_0_258 = rC0_0_236 + D.1563_257;
  rC0_0_280 = rC0_0_258 + D.1563_279;
  rC0_0_302 = rC0_0_280 + D.1563_301;
  rC0_0_324 = rC0_0_302 + D.1563_323;
  rC0_0_346 = rC0_0_324 + D.1563_345;
  rC0_0_368 = rC0_0_346 + D.1563_367;
  rC0_0_390 = rC0_0_368 + D.1563_389;
  rC0_0_412 = rC0_0_390 + D.1563_411;
  rC0_0_434 = rC0_0_412 + D.1563_433;
  rC0_0_456 = rC0_0_434 + D.1563_455;
  rC0_0_478 = rC0_0_456 + D.1563_477;


Which in and of itself is not surprising, and not something that reassociation
should handle specially.  This is what we get with a semi-bad register allocator
that does nothing to reduce spilling.
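
The effect on lifetimes can be illustrated with a toy liveness count. This is only a sketch under simplifying assumptions: instructions are hypothetical (dest, operands) pairs, loads are modelled as instructions with no operands, `n_acc` accumulators are updated over `k_steps` unrolled steps, and the count is of simultaneously live named values, not of real machine registers.

```python
def interleaved(n_acc, k_steps):
    """Each product is added to its accumulator right after it is computed."""
    instrs = []
    for k in range(k_steps):
        instrs.append((f"b{k}", []))              # load one element of B
        for i in range(n_acc):
            a, t, c = f"a{k}_{i}", f"t{k}_{i}", f"c{i}"
            instrs.append((a, []))                # load one element of A
            instrs.append((t, [a, f"b{k}"]))      # t = a * b
            instrs.append((c, [c, t]))            # c = c + t
    return instrs


def deferred(n_acc, k_steps):
    """All loads and products first, then one chain of adds per accumulator."""
    others = [ins for ins in interleaved(n_acc, k_steps)
              if not ins[0].startswith("c")]
    adds = [(f"c{i}", [f"c{i}", f"t{k}_{i}"])
            for i in range(n_acc) for k in range(k_steps)]
    return others + adds


def max_live(instrs, live_out):
    """Peak number of simultaneously live values, via a backward liveness walk."""
    live = set(live_out)
    peak = len(live)
    for dest, operands in reversed(instrs):
        live.discard(dest)       # the value does not exist before its definition
        live.update(operands)    # operands must be live just before their use
        peak = max(peak, len(live))
    return peak


accs = [f"c{i}" for i in range(5)]
print("interleaved:", max_live(interleaved(5, 20), accs))
print("deferred:   ", max_live(deferred(5, 20), accs))
```

With 5 accumulators and 20 steps the model reports a peak of 7 live values for the interleaved order and 106 for the deferred one. The exact numbers mean little for real code, but the gap shows why an allocator with 8 or 16 floating-point registers would start spilling once the adds are pushed to the end.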





[Bug target/27855] reassociation pass produces ~30% slower matrix multiplication code

2006-06-02 Thread pinskia at gcc dot gnu dot org


--- Comment #7 from pinskia at gcc dot gnu dot org  2006-06-03 02:49 ---
If you change the code to use integers, the same drop shows up with
reassociation even without -ffast-math, so it is unrelated to the fact that
-ffast-math turns on reassociation for floating point.

So what is happening is that the adds to rC[0-4]_0 are being moved further down,
which causes the variables' lifetimes to be extended.  Yes, there should be a pass
at the tree level which optimizes variable lifetimes, but I don't know how much
use that is without a better register allocator in the first place.
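
The thread does not spell out why floating-point reassociation is tied to -ffast-math while the integer case is not. The usual explanation, shown here as a small sketch in Python rather than in the benchmark's C, is that integer addition gives the same result under any grouping, whereas IEEE 754 addition rounds after every step and so may not.

```python
# Integer addition: regrouping never changes the result.
int_left = (1 + 2) + 3
int_right = 1 + (2 + 3)
print(int_left == int_right)

# Double-precision addition: each grouping rounds differently.
fp_left = (0.1 + 0.2) + 0.3
fp_right = 0.1 + (0.2 + 0.3)
print(fp_left == fp_right)
print(fp_left, fp_right)
```

The two floating-point sums differ in the last digit, which is why a compiler may only regroup them when the user has opted in to relaxed semantics. As the comment above notes, that flag only decides whether the pass runs on floats; the slowdown itself comes from what the reordering does to variable lifetimes.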





[Bug target/27855] reassociation pass produces ~30% slower matrix multiplication code

2006-06-01 Thread pinskia at gcc dot gnu dot org


--- Comment #1 from pinskia at gcc dot gnu dot org  2006-06-01 15:26 ---
There is nothing special about reassociation at all.  In fact what you are
seeing is the register allocator going funky.  This is what you get with x87.


-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|tree-optimization           |target
           Keywords|                            |ra

