[Bug tree-optimization/49365] 436.cactusADM performance regression
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49365

--- Comment #7 from Richard Guenther 2011-06-27 10:28:45 UTC ---
Author: rguenth
Date: Mon Jun 27 10:28:39 2011
New Revision: 175474

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=175474
Log:
2011-06-27  Richard Guenther

	PR tree-optimization/49365
	* params.def (min-insn-to-prefetch-ratio): Reduce from 10 to 9.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/params.def
[Bug tree-optimization/49365] 436.cactusADM performance regression
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49365

Richard Guenther changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED
   Target Milestone|---                         |4.7.0

--- Comment #8 from Richard Guenther 2011-06-27 10:29:03 UTC ---
Fixed.
[Bug tree-optimization/49365] 436.cactusADM performance regression
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49365

Richard Guenther changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
         AssignedTo|unassigned at gcc dot       |rguenth at gcc dot gnu.org
                   |gnu.org                     |

--- Comment #6 from Richard Guenther 2011-06-22 14:13:14 UTC ---
I have posted a patch.
[Bug tree-optimization/49365] 436.cactusADM performance regression
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49365

--- Comment #5 from Changpeng Fang 2011-06-14 22:22:11 UTC ---
It seems there is a prefetch generation bug on Bulldozer.

With -O3 -ffast-math -funroll-loops -fpeel-loops -march=bdver1 -fprefetch-loop-arrays, I got a normal timing of 795s. However, when "--param min-insn-to-prefetch-ratio=9" is added, the timing becomes 2853s.

This may be a different bug, in the opposite direction to amdfam10.

I also want to mention here that software prefetching was actually enabled at -O3 and higher for Bulldozer when Honza cleaned up the code in i386.c:
http://gcc.gnu.org/ml/gcc-patches/2011-05/msg00573.html
[Bug tree-optimization/49365] 436.cactusADM performance regression
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49365

Richard Guenther changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2011.06.14 10:49:14
                 CC|                            |changpeng.fang at amd dot
                   |                            |com
     Ever Confirmed|0                           |1

--- Comment #4 from Richard Guenther 2011-06-14 10:49:14 UTC ---
Indeed, for the important loop in StaggeredLeapfrog2.F we now have

Ahead 1, unroll factor 1, trip count -1
insn count 919, mem ref count 100, prefetch count 100
Not prefetching -- instruction to prefetch ratio (9) too small

while before the patch we had

insn count 1019, mem ref count 100, prefetch count 100

as we now have half the cost for the vectorized mem-refs (100 instead of 200).

Building with --param min-insn-to-prefetch-ratio=9 fixes it.
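The arithmetic behind the dump above can be sketched in a few lines of C. This is only an illustration, not GCC's code: the function names are hypothetical, and it assumes the ratio is the integer quotient of the insn count and the prefetch count, which is consistent with the "(9)" printed for 919 insns and 100 prefetches.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: integer ratio of loop insns to prefetches.
   Assumes truncating division, matching the "(9)" shown in the dump. */
unsigned insn_to_prefetch_ratio(unsigned ninsns, unsigned prefetch_count)
{
    assert(prefetch_count > 0);
    return ninsns / prefetch_count;
}

/* Hypothetical helper: prefetching is allowed only when the ratio
   reaches the min-insn-to-prefetch-ratio threshold. */
bool prefetching_allowed(unsigned ninsns, unsigned prefetch_count,
                         unsigned min_ratio)
{
    return insn_to_prefetch_ratio(ninsns, prefetch_count) >= min_ratio;
}
```

Under these assumptions, the old insn count of 1019 gives a ratio of 10 and passes the default threshold of 10, while the new count of 919 gives 9 and is rejected. Lowering the threshold to 9 lets the loop be prefetched again.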
[Bug tree-optimization/49365] 436.cactusADM performance regression
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49365

--- Comment #3 from Richard Guenther 2011-06-10 15:36:49 UTC ---
I'm trying to get my hands on it. Most code differences between the good and bad revisions appear in loop array prefetching. Before aprefetch, the dumps differ only for datestamp.c, PUGH/SetupPGV.c and regex.c. I'm trying binaries with -fno-prefetch-loop-arrays now (well, on Monday that is).

Prefetching uses tree_num_loop_insns, which uses estimate_num_insns. Prefetching is enabled by default for barcelona (but also for K8, where I don't see this issue).

So my bet is on prefetching costs getting confused and needing adjustment.
[Bug tree-optimization/49365] 436.cactusADM performance regression
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49365

H.J. Lu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|hjl at gcc dot gnu.org      |hjl.tools at gmail dot com,
                   |                            |sergos.gnu at gmail dot com

--- Comment #2 from H.J. Lu 2011-06-10 15:21:24 UTC ---
What is the problem?
[Bug tree-optimization/49365] 436.cactusADM performance regression
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49365

Richard Guenther changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hjl at gcc dot gnu.org
      Known to fail|                            |4.6.1, 4.7.0

--- Comment #1 from Richard Guenther 2011-06-10 15:11:50 UTC ---
Bisecting this shows that rev. 166552 is the cause.

2010-11-10  H.J. Lu

	PR tree-optimization/46414
	* tree-inline.c (estimate_move_cost): Check preferred vector mode
	for vector type.

The bug doesn't manifest itself on K8 or iCore7, nor does it show up with the default arch and generic tuning.