[Bug tree-optimization/49365] 436.cactusADM performance regression

2011-06-27 Thread rguenth at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49365

--- Comment #7 from Richard Guenther 2011-06-27 10:28:45 UTC ---
Author: rguenth
Date: Mon Jun 27 10:28:39 2011
New Revision: 175474

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=175474
Log:
2011-06-27  Richard Guenther  

PR tree-optimization/49365
* params.def (min-insn-to-prefetch-ratio): Reduce from 10 to 9.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/params.def


[Bug tree-optimization/49365] 436.cactusADM performance regression

2011-06-27 Thread rguenth at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49365

Richard Guenther  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED
   Target Milestone|---                         |4.7.0

--- Comment #8 from Richard Guenther 2011-06-27 10:29:03 UTC ---
Fixed.


[Bug tree-optimization/49365] 436.cactusADM performance regression

2011-06-22 Thread rguenth at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49365

Richard Guenther  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
         AssignedTo|unassigned at gcc dot       |rguenth at gcc dot gnu.org
                   |gnu.org                     |

--- Comment #6 from Richard Guenther 2011-06-22 14:13:14 UTC ---
I have posted a patch.


[Bug tree-optimization/49365] 436.cactusADM performance regression

2011-06-14 Thread changpeng.fang at amd dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49365

--- Comment #5 from Changpeng Fang 2011-06-14 22:22:11 UTC ---
It seems there is a prefetch generation bug on Bulldozer.

With -O3 -ffast-math -funroll-loops -fpeel-loops -march=bdver1
-fprefetch-loop-arrays, I got a normal timing of 795s.

However, when "--param min-insn-to-prefetch-ratio=9" is added, the timing
becomes 2853s.

This may be a different bug, in the opposite direction to the one on amdfam10.

I also want to mention here that software prefetching was actually enabled
at -O3 and higher for Bulldozer when Honza cleaned up the code in i386.c:
http://gcc.gnu.org/ml/gcc-patches/2011-05/msg00573.html


[Bug tree-optimization/49365] 436.cactusADM performance regression

2011-06-14 Thread rguenth at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49365

Richard Guenther  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2011.06.14 10:49:14
                 CC|                            |changpeng.fang at amd dot
                   |                            |com
     Ever Confirmed|0                           |1

--- Comment #4 from Richard Guenther 2011-06-14 10:49:14 UTC ---
Indeed, for the important loop in StaggeredLeapfrog2.F we now have

 Ahead 1, unroll factor 1, trip count -1
 insn count 919, mem ref count 100, prefetch count 100
 Not prefetching -- instruction to prefetch ratio (9) too small

while before the patch we had

 insn count 1019, mem ref count 100, prefetch count 100

as we now have half the cost for the vectorized mem-refs (100 instead of 200).

Building with --param min-insn-to-prefetch-ratio=9 fixes it.
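
The arithmetic behind those dump lines can be sketched as below.  This is
only an illustration, not the code from the prefetching pass: the function
names are hypothetical, and the integer division and the "less than the
minimum" comparison are assumptions inferred from the dump output above
(919 insns and 100 prefetches reported as a ratio of 9, which is rejected
with the minimum at 10).

```c
#include <stdbool.h>

/* Hypothetical helper: instructions per prefetch for one loop, using
   truncating integer division as the dump output suggests.
   The caller must pass a nonzero prefetch_count.  */
unsigned insn_to_prefetch_ratio(unsigned unroll_factor, unsigned ninsns,
                                unsigned prefetch_count)
{
    return (unroll_factor * ninsns) / prefetch_count;
}

/* Hypothetical helper: prefetching is kept only when the ratio reaches
   the minimum given by --param min-insn-to-prefetch-ratio.  */
bool should_prefetch(unsigned ratio, unsigned min_ratio)
{
    return ratio >= min_ratio;
}
```

With the numbers from the dump, 1019 / 100 gives 10, which passes a minimum
of 10, while 919 / 100 gives 9, which fails it.  The 100 missing insns match
the 100 vectorized mem-refs now being costed at half their previous value.
Lowering the minimum to 9 lets the 919-insn loop pass again.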


[Bug tree-optimization/49365] 436.cactusADM performance regression

2011-06-10 Thread rguenth at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49365

--- Comment #3 from Richard Guenther 2011-06-10 15:36:49 UTC ---
I'm trying to get my hands on it.  Most code differences between the good and
bad revisions appear in loop array prefetching.  Before aprefetch, the dumps
differ only for datestamp.c, PUGH/SetupPGV.c and regex.c.

I'm trying binaries with -fno-prefetch-loop-arrays now (well, on Monday
that is).

Prefetching uses tree_num_loop_insns, which uses estimate_num_insns.
Prefetching is enabled by default for barcelona (but also for K8, where I
don't see this issue).  So my bet is that the prefetching costs are getting
confused and need adjustment.


[Bug tree-optimization/49365] 436.cactusADM performance regression

2011-06-10 Thread hjl.tools at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49365

H.J. Lu  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|hjl at gcc dot gnu.org      |hjl.tools at gmail dot com,
                   |                            |sergos.gnu at gmail dot com

--- Comment #2 from H.J. Lu 2011-06-10 15:21:24 UTC ---
What is the problem?


[Bug tree-optimization/49365] 436.cactusADM performance regression

2011-06-10 Thread rguenth at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49365

Richard Guenther  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hjl at gcc dot gnu.org
      Known to fail|                            |4.6.1, 4.7.0

--- Comment #1 from Richard Guenther 2011-06-10 15:11:50 UTC ---
Bisecting this shows that rev. 166552 is the cause.

2010-11-10  H.J. Lu  

   PR tree-optimization/46414
   * tree-inline.c (estimate_move_cost): Check preferred vector
   mode for vector type.

The bug doesn't manifest itself on K8 or Core i7, nor does it show up
with the default arch and generic tuning.