------- Comment #11 from jv244 at cam dot ac dot uk 2010-08-29 05:09 -------

After David's patch (thanks!), the testcase still requires 240s to compile, which is a 5x slowdown. I am pasting the new timing profile below and reopening the bug. There is no obvious candidate for the slowdown.

> gfortran -c -ftime-report -cpp -fbounds-check -g -O3 -ffast-math -funroll-loops -ftree-vectorize -march=native -ffree-form test.f90

Execution times (seconds)
 garbage collection         :  12.55 ( 5%) usr  0.03 ( 2%) sys  12.57 ( 5%) wall       0 kB ( 0%) ggc
 callgraph construction     :   0.08 ( 0%) usr  0.00 ( 0%) sys   0.06 ( 0%) wall    5736 kB ( 0%) ggc
 callgraph optimization     :   0.40 ( 0%) usr  0.02 ( 1%) sys   0.41 ( 0%) wall     725 kB ( 0%) ggc
 ipa cp                     :   0.07 ( 0%) usr  0.00 ( 0%) sys   0.07 ( 0%) wall    1347 kB ( 0%) ggc
 ipa function splitting     :   0.01 ( 0%) usr  0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
 ipa reference              :   0.01 ( 0%) usr  0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 ipa profile                :   0.00 ( 0%) usr  0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 ipa pure const             :   0.07 ( 0%) usr  0.01 ( 1%) sys   0.15 ( 0%) wall       0 kB ( 0%) ggc
 cfg cleanup                :   2.28 ( 1%) usr  0.00 ( 0%) sys   2.35 ( 1%) wall    4726 kB ( 0%) ggc
 CFG verifier               :   5.54 ( 2%) usr  0.03 ( 2%) sys   5.73 ( 2%) wall       0 kB ( 0%) ggc
 trivially dead code        :   0.67 ( 0%) usr  0.00 ( 0%) sys   0.65 ( 0%) wall       0 kB ( 0%) ggc
 df multiple defs           :   0.23 ( 0%) usr  0.00 ( 0%) sys   0.28 ( 0%) wall       0 kB ( 0%) ggc
 df reaching defs           :   2.00 ( 1%) usr  0.00 ( 0%) sys   2.12 ( 1%) wall       0 kB ( 0%) ggc
 df live regs               :   9.80 ( 4%) usr  0.01 ( 1%) sys  10.18 ( 4%) wall       0 kB ( 0%) ggc
 df live&initialized regs   :   3.62 ( 1%) usr  0.00 ( 0%) sys   3.08 ( 1%) wall       0 kB ( 0%) ggc
 df use-def / def-use chains:   1.22 ( 0%) usr  0.00 ( 0%) sys   1.26 ( 1%) wall       0 kB ( 0%) ggc
 df live reg subwords       :   0.32 ( 0%) usr  0.00 ( 0%) sys   0.27 ( 0%) wall       0 kB ( 0%) ggc
 df reg dead/unused notes   :   4.67 ( 2%) usr  0.00 ( 0%) sys   4.44 ( 2%) wall    8317 kB ( 0%) ggc
 register information       :   2.10 ( 1%) usr  0.00 ( 0%) sys   1.97 ( 1%) wall       0 kB ( 0%) ggc
 alias analysis             :   1.73 ( 1%) usr  0.00 ( 0%) sys   1.87 ( 1%) wall   47018 kB ( 3%) ggc
 alias stmt walking         :   0.61 ( 0%) usr  0.07 ( 4%) sys   0.61 ( 0%) wall    6938 kB ( 0%) ggc
 register scan              :   0.32 ( 0%) usr  0.00 ( 0%) sys   0.32 ( 0%) wall     202 kB ( 0%) ggc
 rebuild jump labels        :   0.72 ( 0%) usr  0.00 ( 0%) sys   0.67 ( 0%) wall       0 kB ( 0%) ggc
 parser                     :   0.90 ( 0%) usr  0.09 ( 5%) sys   0.99 ( 0%) wall   55368 kB ( 3%) ggc
 inline heuristics          :   0.17 ( 0%) usr  0.01 ( 1%) sys   0.26 ( 0%) wall       0 kB ( 0%) ggc
 tree gimplify              :   0.51 ( 0%) usr  0.01 ( 1%) sys   0.57 ( 0%) wall   48405 kB ( 3%) ggc
 tree eh                    :   0.03 ( 0%) usr  0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
 tree CFG construction      :   0.02 ( 0%) usr  0.01 ( 1%) sys   0.03 ( 0%) wall   11974 kB ( 1%) ggc
 tree CFG cleanup           :   1.30 ( 1%) usr  0.02 ( 1%) sys   1.21 ( 0%) wall    3530 kB ( 0%) ggc
 tree VRP                   :   2.50 ( 1%) usr  0.03 ( 2%) sys   2.44 ( 1%) wall   67364 kB ( 4%) ggc
 tree copy propagation      :   0.16 ( 0%) usr  0.05 ( 3%) sys   0.15 ( 0%) wall    1384 kB ( 0%) ggc
 tree find ref. vars        :   0.05 ( 0%) usr  0.01 ( 1%) sys   0.05 ( 0%) wall    3806 kB ( 0%) ggc
 tree PTA                   :   0.34 ( 0%) usr  0.00 ( 0%) sys   0.33 ( 0%) wall    5198 kB ( 0%) ggc
 tree PHI insertion         :   0.03 ( 0%) usr  0.00 ( 0%) sys   0.02 ( 0%) wall    3194 kB ( 0%) ggc
 tree SSA rewrite           :   0.39 ( 0%) usr  0.00 ( 0%) sys   0.35 ( 0%) wall   14011 kB ( 1%) ggc
 tree SSA other             :   0.10 ( 0%) usr  0.04 ( 2%) sys   0.10 ( 0%) wall     432 kB ( 0%) ggc
 tree SSA incremental       :   1.18 ( 0%) usr  0.14 ( 8%) sys   1.44 ( 1%) wall    7441 kB ( 0%) ggc
 tree operand scan          :   0.47 ( 0%) usr  0.33 (19%) sys   0.78 ( 0%) wall   58289 kB ( 3%) ggc
 dominator optimization     :   0.52 ( 0%) usr  0.00 ( 0%) sys   0.61 ( 0%) wall    8527 kB ( 0%) ggc
 tree SRA                   :   0.00 ( 0%) usr  0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 tree CCP                   :   1.05 ( 0%) usr  0.05 ( 3%) sys   1.28 ( 1%) wall    4845 kB ( 0%) ggc
 tree PHI const/copy prop   :   0.01 ( 0%) usr  0.00 ( 0%) sys   0.05 ( 0%) wall     106 kB ( 0%) ggc
 tree split crit edges      :   0.00 ( 0%) usr  0.00 ( 0%) sys   0.00 ( 0%) wall    2014 kB ( 0%) ggc
 tree reassociation         :   0.27 ( 0%) usr  0.03 ( 2%) sys   0.27 ( 0%) wall    6030 kB ( 0%) ggc
 tree PRE                   :   0.85 ( 0%) usr  0.00 ( 0%) sys   0.89 ( 0%) wall    7164 kB ( 0%) ggc
 tree FRE                   :   0.47 ( 0%) usr  0.02 ( 1%) sys   0.56 ( 0%) wall    5411 kB ( 0%) ggc
 tree code sinking          :   0.11 ( 0%) usr  0.02 ( 1%) sys   0.03 ( 0%) wall    1311 kB ( 0%) ggc
 tree linearize phis        :   0.02 ( 0%) usr  0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 tree forward propagate     :   0.22 ( 0%) usr  0.02 ( 1%) sys   0.26 ( 0%) wall   11820 kB ( 1%) ggc
 tree phiprop               :   0.00 ( 0%) usr  0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 tree conservative DCE      :   0.11 ( 0%) usr  0.01 ( 1%) sys   0.04 ( 0%) wall     576 kB ( 0%) ggc
 tree aggressive DCE        :   0.84 ( 0%) usr  0.01 ( 1%) sys   0.92 ( 0%) wall   25495 kB ( 1%) ggc
 tree buildin call DCE      :   0.00 ( 0%) usr  0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 tree DSE                   :   0.26 ( 0%) usr  0.00 ( 0%) sys   0.19 ( 0%) wall     260 kB ( 0%) ggc
 tree loop bounds           :   0.20 ( 0%) usr  0.00 ( 0%) sys   0.27 ( 0%) wall    6686 kB ( 0%) ggc
 tree loop invariant motion :   0.07 ( 0%) usr  0.00 ( 0%) sys   0.06 ( 0%) wall      76 kB ( 0%) ggc
 tree canonical iv          :   0.11 ( 0%) usr  0.00 ( 0%) sys   0.09 ( 0%) wall    3421 kB ( 0%) ggc
 scev constant prop         :   0.04 ( 0%) usr  0.00 ( 0%) sys   0.11 ( 0%) wall    2302 kB ( 0%) ggc
 tree loop unswitching      :   0.03 ( 0%) usr  0.00 ( 0%) sys   0.04 ( 0%) wall     739 kB ( 0%) ggc
 complete unrolling         :   1.60 ( 1%) usr  0.12 ( 7%) sys   1.40 ( 1%) wall  101520 kB ( 6%) ggc
 tree vectorization         :   0.31 ( 0%) usr  0.02 ( 1%) sys   0.27 ( 0%) wall   20116 kB ( 1%) ggc
 tree slp vectorization     :   0.92 ( 0%) usr  0.00 ( 0%) sys   0.90 ( 0%) wall   52747 kB ( 3%) ggc
 tree loop distribution     :   0.07 ( 0%) usr  0.00 ( 0%) sys   0.07 ( 0%) wall       0 kB ( 0%) ggc
 tree prefetching           :   3.09 ( 1%) usr  0.06 ( 3%) sys   3.07 ( 1%) wall   90905 kB ( 5%) ggc
 tree iv optimization       :  32.77 (13%) usr  0.03 ( 2%) sys  32.96 (13%) wall  322284 kB (18%) ggc
 predictive commoning       :   0.08 ( 0%) usr  0.00 ( 0%) sys   0.09 ( 0%) wall    1752 kB ( 0%) ggc
 tree loop init             :   0.05 ( 0%) usr  0.00 ( 0%) sys   0.06 ( 0%) wall    1307 kB ( 0%) ggc
 tree loop fini             :   0.01 ( 0%) usr  0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 tree copy headers          :   0.10 ( 0%) usr  0.00 ( 0%) sys   0.14 ( 0%) wall    1658 kB ( 0%) ggc
 tree SSA uncprop           :   0.04 ( 0%) usr  0.01 ( 1%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
 tree rename SSA copies     :   0.03 ( 0%) usr  0.00 ( 0%) sys   0.09 ( 0%) wall       0 kB ( 0%) ggc
 tree SSA verifier          :   7.34 ( 3%) usr  0.02 ( 1%) sys   7.29 ( 3%) wall       0 kB ( 0%) ggc
 tree STMT verifier         :  15.08 ( 6%) usr  0.00 ( 0%) sys  15.11 ( 6%) wall       0 kB ( 0%) ggc
 callgraph verifier         :   0.85 ( 0%) usr  0.00 ( 0%) sys   0.87 ( 0%) wall       0 kB ( 0%) ggc
 dominance frontiers        :   0.26 ( 0%) usr  0.00 ( 0%) sys   0.32 ( 0%) wall       0 kB ( 0%) ggc
 dominance computation      :   0.86 ( 0%) usr  0.00 ( 0%) sys   0.89 ( 0%) wall       0 kB ( 0%) ggc
 control dependences        :   0.02 ( 0%) usr  0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 out of ssa                 :   0.33 ( 0%) usr  0.00 ( 0%) sys   0.35 ( 0%) wall     225 kB ( 0%) ggc
 expand vars                :   0.16 ( 0%) usr  0.00 ( 0%) sys   0.09 ( 0%) wall   11294 kB ( 1%) ggc
 expand                     :  14.67 ( 6%) usr  0.04 ( 2%) sys  13.89 ( 6%) wall  111424 kB ( 6%) ggc
 post expand cleanups       :   0.06 ( 0%) usr  0.00 ( 0%) sys   0.05 ( 0%) wall    5818 kB ( 0%) ggc
 lower subreg               :   0.19 ( 0%) usr  0.00 ( 0%) sys   0.13 ( 0%) wall       0 kB ( 0%) ggc
 forward prop               :   1.58 ( 1%) usr  0.00 ( 0%) sys   1.45 ( 1%) wall   15809 kB ( 1%) ggc
 CSE                        :   1.60 ( 1%) usr  0.00 ( 0%) sys   1.73 ( 1%) wall     662 kB ( 0%) ggc
 dead code elimination      :   1.72 ( 1%) usr  0.00 ( 0%) sys   1.77 ( 1%) wall       0 kB ( 0%) ggc
 dead store elim1           :   1.36 ( 1%) usr  0.01 ( 1%) sys   1.29 ( 1%) wall   23524 kB ( 1%) ggc
 dead store elim2           :   2.01 ( 1%) usr  0.00 ( 0%) sys   2.10 ( 1%) wall   22835 kB ( 1%) ggc
 loop analysis              :   0.14 ( 0%) usr  0.00 ( 0%) sys   0.14 ( 0%) wall    2220 kB ( 0%) ggc
 loop invariant motion      :   0.24 ( 0%) usr  0.00 ( 0%) sys   0.25 ( 0%) wall     448 kB ( 0%) ggc
 loop unswitching           :   5.19 ( 2%) usr  0.01 ( 1%) sys   5.40 ( 2%) wall     218 kB ( 0%) ggc
 loop unrolling             :  26.07 (11%) usr  0.02 ( 1%) sys  25.98 (11%) wall  184992 kB (10%) ggc
 CPROP                      :   2.20 ( 1%) usr  0.00 ( 0%) sys   2.48 ( 1%) wall   25399 kB ( 1%) ggc
 PRE                        :   1.33 ( 1%) usr  0.00 ( 0%) sys   1.25 ( 1%) wall    1798 kB ( 0%) ggc
 web                        :   2.26 ( 1%) usr  0.00 ( 0%) sys   2.29 ( 1%) wall    8429 kB ( 0%) ggc
 CSE 2                      :   2.07 ( 1%) usr  0.01 ( 1%) sys   2.30 ( 1%) wall    2123 kB ( 0%) ggc
 branch prediction          :   0.21 ( 0%) usr  0.00 ( 0%) sys   0.19 ( 0%) wall    6857 kB ( 0%) ggc
 combiner                   :   4.11 ( 2%) usr  0.00 ( 0%) sys   4.21 ( 2%) wall   60529 kB ( 3%) ggc
 if-conversion              :   0.20 ( 0%) usr  0.00 ( 0%) sys   0.28 ( 0%) wall    2520 kB ( 0%) ggc
 regmove                    :   0.63 ( 0%) usr  0.00 ( 0%) sys   0.80 ( 0%) wall       0 kB ( 0%) ggc
 integrated RA              :  14.33 ( 6%) usr  0.05 ( 3%) sys  14.22 ( 6%) wall   44292 kB ( 2%) ggc
 reload                     :   6.75 ( 3%) usr  0.00 ( 0%) sys   6.74 ( 3%) wall   10065 kB ( 1%) ggc
 reload CSE regs            :   4.55 ( 2%) usr  0.01 ( 1%) sys   4.67 ( 2%) wall   36964 kB ( 2%) ggc
 load CSE after reload      :   0.36 ( 0%) usr  0.01 ( 1%) sys   0.45 ( 0%) wall     449 kB ( 0%) ggc
 zee                        :   0.35 ( 0%) usr  0.00 ( 0%) sys   0.35 ( 0%) wall      45 kB ( 0%) ggc
 thread pro- & epilogue     :   0.11 ( 0%) usr  0.00 ( 0%) sys   0.11 ( 0%) wall    3988 kB ( 0%) ggc
 if-conversion 2            :   0.11 ( 0%) usr  0.00 ( 0%) sys   0.10 ( 0%) wall    1056 kB ( 0%) ggc
 combine stack adjustments  :   0.15 ( 0%) usr  0.00 ( 0%) sys   0.17 ( 0%) wall       0 kB ( 0%) ggc
 peephole 2                 :   0.41 ( 0%) usr  0.00 ( 0%) sys   0.40 ( 0%) wall    2995 kB ( 0%) ggc
 rename registers           :   1.23 ( 1%) usr  0.00 ( 0%) sys   1.31 ( 1%) wall    2741 kB ( 0%) ggc
 hard reg cprop             :   1.23 ( 1%) usr  0.02 ( 1%) sys   1.11 ( 0%) wall      15 kB ( 0%) ggc
 scheduling 2               :   6.25 ( 3%) usr  0.04 ( 2%) sys   6.24 ( 3%) wall    1284 kB ( 0%) ggc
 machine dep reorg          :   0.82 ( 0%) usr  0.00 ( 0%) sys   0.89 ( 0%) wall      77 kB ( 0%) ggc
 reorder blocks             :   0.68 ( 0%) usr  0.00 ( 0%) sys   0.73 ( 0%) wall    4788 kB ( 0%) ggc
 final                      :   1.86 ( 1%) usr  0.08 ( 5%) sys   2.10 ( 1%) wall    9656 kB ( 1%) ggc
 symout                     :   0.65 ( 0%) usr  0.06 ( 3%) sys   0.69 ( 0%) wall   58849 kB ( 3%) ggc
 variable tracking          :   2.74 ( 1%) usr  0.00 ( 0%) sys   2.83 ( 1%) wall   62059 kB ( 3%) ggc
 var-tracking dataflow      :   4.21 ( 2%) usr  0.01 ( 1%) sys   4.24 ( 2%) wall       0 kB ( 0%) ggc
 var-tracking emit          :   3.79 ( 2%) usr  0.01 ( 1%) sys   3.58 ( 1%) wall   19142 kB ( 1%) ggc
 TOTAL                      : 244.77            1.72            246.49             1780321 kB
Extra diagnostic checks enabled; compiler may run slowly.
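
Since no single pass stands out at a glance, it may help to rank the rows of the report mechanically. The following is a minimal sketch, assuming Python 3 and that each row keeps the `name : usr sys wall ggc` layout shown above. The names `parse_time_report`, `top_by_wall` and `checking_overhead` are made up for this illustration and are not part of GCC. The sample holds only a handful of rows copied from the report; paste the full `-ftime-report` output in its place to rank every pass.

```python
import re

# A few rows copied from the report above (not the full output).
SAMPLE = """\
 garbage collection    :  12.55 ( 5%) usr  0.03 ( 2%) sys  12.57 ( 5%) wall       0 kB ( 0%) ggc
 tree iv optimization  :  32.77 (13%) usr  0.03 ( 2%) sys  32.96 (13%) wall  322284 kB (18%) ggc
 tree STMT verifier    :  15.08 ( 6%) usr  0.00 ( 0%) sys  15.11 ( 6%) wall       0 kB ( 0%) ggc
 expand                :  14.67 ( 6%) usr  0.04 ( 2%) sys  13.89 ( 6%) wall  111424 kB ( 6%) ggc
 loop unrolling        :  26.07 (11%) usr  0.02 ( 1%) sys  25.98 (11%) wall  184992 kB (10%) ggc
 integrated RA         :  14.33 ( 6%) usr  0.05 ( 3%) sys  14.22 ( 6%) wall   44292 kB ( 2%) ggc
 TOTAL                 : 244.77            1.72            246.49            1780321 kB
"""

# One per-pass row: name, then usr/sys/wall seconds (each followed by a
# percentage), then GGC memory in kB. The TOTAL row has no percentages,
# so it does not match and is skipped.
ROW = re.compile(
    r"^\s*(?P<name>.+?)\s*:\s*"
    r"(?P<usr>\d+\.\d+)\s*\(\s*\d+%\)\s*usr\s+"
    r"(?P<sys>\d+\.\d+)\s*\(\s*\d+%\)\s*sys\s+"
    r"(?P<wall>\d+\.\d+)\s*\(\s*\d+%\)\s*wall\s+"
    r"(?P<ggc>\d+)\s*kB"
)


def parse_time_report(text):
    """Return a list of (name, usr_seconds, wall_seconds, ggc_kB) tuples."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append(
                (m.group("name"), float(m.group("usr")),
                 float(m.group("wall")), int(m.group("ggc")))
            )
    return rows


def top_by_wall(rows, n=5):
    """Return the n rows with the largest wall-clock time, largest first."""
    return sorted(rows, key=lambda r: r[2], reverse=True)[:n]


def checking_overhead(rows):
    """Sum the wall time of rows whose name contains 'verifier'."""
    return sum(wall for name, _usr, wall, _ggc in rows if "verifier" in name)


if __name__ == "__main__":
    rows = parse_time_report(SAMPLE)
    for name, _usr, wall, ggc in top_by_wall(rows):
        print(f"{name:<24} {wall:8.2f} s wall {ggc:>8} kB")
    print(f"verifier rows: {checking_overhead(rows):.2f} s wall")
```

The verifier rows are summed separately because the report ends with "Extra diagnostic checks enabled", so part of the wall time presumably comes from checking rather than from the optimization passes themselves.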

-- 

jv244 at cam dot ac dot uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |
            Summary|[4.6 Regression] compile    |[4.6 Regression] compile
                   |time increases 8x.          |time increases 5x.


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45422