https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590
Thomas Koenig <tkoenig at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Summary|[14/16 Regression] long |[14 Regression] long
|compile time with -O2 and |compile time with -O2 and
|many loops |many loops
--- Comment #61 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
It seems this is not a GCC 16 regression after all; the difference I saw
earlier must have come from frequency changes etc. on the machine I ran this
on. This time, trunk was actually a bit faster. Adjusting the subject accordingly.
For trunk:
$ time trunk-nocheck/gcc/f951 -O2 gener.f90
MAIN__ main
Analyzing compilation unit
Performing interprocedural optimizations
<*free_lang_data> {heap 114M} <visibility> {heap 114M} <build_ssa_passes>
{heap 114M} <targetclone> {heap 114M} <opt_local_passes> {heap 114M} {GC
released 28M} {GC 260M -> 152M} <remove_symbols> {heap 265M} <targetclone>
{heap 265M} <free-fnsummary> {heap 265M} <increase_alignment> {heap
265M}Streaming LTO
<whole-program> {heap 265M} <profile_estimate> {heap 265M} <icf> {heap 265M}
<devirt> {heap 265M} <cp> {heap 265M} <sra> {heap 265M} <fnsummary> {heap 265M}
<inline> {heap 265M} <pure-const> {heap 265M} <modref> {heap 265M}
<free-fnsummary> {heap 265M} <static-var> {heap 265M} <single-use> {heap 265M}
<comdats> {heap 265M}Assembling functions:
MAIN__ {GC released 42M} {GC 415M -> 273M} {GC released 27M madv_dontneed 12M}
{GC 575M -> 204M} {GC madv_dontneed 128M} {GC 417M -> 332M} main
Time variable wall GGC
phase setup : 0.01 ( 0%) 187k ( 0%)
phase parsing : 1.88 ( 0%) 72M ( 4%)
phase lang. deferred : 0.01 ( 0%) 0 ( 0%)
phase opt and generate :2288.94 (100%) 1532M ( 95%)
garbage collection : 0.49 ( 0%) 0 ( 0%)
callgraph construction : 0.15 ( 0%) 57M ( 4%)
callgraph optimization : 0.11 ( 0%) 0 ( 0%)
callgraph functions expansion :2263.07 ( 99%) 1234M ( 77%)
callgraph ipa passes : 25.31 ( 1%) 133M ( 8%)
ipa function summary : 0.23 ( 0%) 6033k ( 0%)
ipa cp : 0.18 ( 0%) 15M ( 1%)
ipa inlining heuristics : 0.01 ( 0%) 0 ( 0%)
ipa pure const : 0.06 ( 0%) 1072 ( 0%)
ipa icf : 0.06 ( 0%) 24 ( 0%)
ipa SRA : 0.03 ( 0%) 584 ( 0%)
ipa modref : 0.06 ( 0%) 1600 ( 0%)
cfg construction : 0.14 ( 0%) 11M ( 1%)
cfg cleanup : 1.02 ( 0%) 5441k ( 0%)
trivially dead code : 0.43 ( 0%) 0 ( 0%)
df scan insns : 0.31 ( 0%) 192 ( 0%)
df reaching defs : 73.20 ( 3%) 0 ( 0%)
df live regs : 3.60 ( 0%) 0 ( 0%)
df live&initialized regs : 1.56 ( 0%) 0 ( 0%)
df must-initialized regs : 0.07 ( 0%) 0 ( 0%)
df use-def / def-use chains : 0.79 ( 0%) 0 ( 0%)
df live reg subwords : 0.20 ( 0%) 0 ( 0%)
df reg dead/unused notes : 1.92 ( 0%) 14M ( 1%)
register information : 0.43 ( 0%) 0 ( 0%)
alias analysis : 1.05 ( 0%) 43M ( 3%)
alias stmt walking : 36.35 ( 2%) 11M ( 1%)
register scan : 0.06 ( 0%) 168k ( 0%)
rebuild jump labels : 0.14 ( 0%) 0 ( 0%)
parser (global) : 1.89 ( 0%) 72M ( 4%)
inline parameters : 0.20 ( 0%) 2049k ( 0%)
integration : 0.01 ( 0%) 0 ( 0%)
tree gimplify : 0.29 ( 0%) 61M ( 4%)
tree eh : 0.02 ( 0%) 0 ( 0%)
tree CFG construction : 0.10 ( 0%) 53M ( 3%)
tree CFG cleanup : 4.55 ( 0%) 5898k ( 0%)
tree tail merge : 0.05 ( 0%) 3670k ( 0%)
tree VRP : 25.19 ( 1%) 34M ( 2%)
tree Early VRP : 2.18 ( 0%) 14M ( 1%)
tree copy propagation : 0.21 ( 0%) 168k ( 0%)
tree PTA : 14.92 ( 1%) 9434k ( 1%)
tree SSA rewrite : 0.09 ( 0%) 28M ( 2%)
tree SSA incremental : 0.72 ( 0%) 46M ( 3%)
tree operand scan : 0.24 ( 0%) 53M ( 3%)
dominator optimization : 34.88 ( 2%) 93M ( 6%)
backwards jump threading : 5.12 ( 0%) 4544 ( 0%)
tree SRA : 2.99 ( 0%) 75M ( 5%)
isolate eroneous paths : 0.01 ( 0%) 0 ( 0%)
tree CCP : 2.10 ( 0%) 4177k ( 0%)
tree reassociation : 0.07 ( 0%) 0 ( 0%)
tree PRE : 3.24 ( 0%) 15M ( 1%)
tree FRE : 11.51 ( 1%) 23M ( 1%)
tree RPO VN : 0.02 ( 0%) 1096k ( 0%)
tree code sinking : 0.10 ( 0%) 3037k ( 0%)
tree linearize phis : 0.07 ( 0%) 2144 ( 0%)
tree backward propagate : 0.04 ( 0%) 0 ( 0%)
tree forward propagate : 0.41 ( 0%) 12M ( 1%)
tree phiprop : 0.01 ( 0%) 0 ( 0%)
tree conservative DCE : 0.22 ( 0%) 0 ( 0%)
tree aggressive DCE : 0.18 ( 0%) 2307k ( 0%)
tree DSE : 0.67 ( 0%) 21M ( 1%)
PHI merge : 0.01 ( 0%) 0 ( 0%)
tree loop invariant motion : 0.08 ( 0%) 0 ( 0%)
tree canonical iv : 0.11 ( 0%) 6548k ( 0%)
scev constant prop : 0.04 ( 0%) 0 ( 0%)
complete unrolling : 6.40 ( 0%) 89M ( 6%)
tree vectorization : 0.68 ( 0%) 17M ( 1%)
tree slp vectorization : 0.43 ( 0%) 38M ( 2%)
tree loop distribution : 0.01 ( 0%) 0 ( 0%)
tree iv optimization : 1.36 ( 0%) 109M ( 7%)
predictive commoning : 0.09 ( 0%) 4134k ( 0%)
tree copy headers : 5.97 ( 0%) 7394k ( 0%)
tree SSA uncprop : 0.02 ( 0%) 0 ( 0%)
gimple widening/fma detection : 0.02 ( 0%) 0 ( 0%)
tree strlen optimization : 0.05 ( 0%) 10k ( 0%)
tree modref : 0.04 ( 0%) 688 ( 0%)
dominance frontiers : 0.02 ( 0%) 0 ( 0%)
dominance computation : 0.76 ( 0%) 0 ( 0%)
control dependences : 0.01 ( 0%) 0 ( 0%)
out of ssa : 0.13 ( 0%) 192 ( 0%)
expand vars : 0.10 ( 0%) 4014k ( 0%)
expand : 0.75 ( 0%) 122M ( 8%)
post expand cleanups : 0.05 ( 0%) 792 ( 0%)
lower subreg : 0.12 ( 0%) 0 ( 0%)
forward prop : 0.77 ( 0%) 506k ( 0%)
CSE : 1.29 ( 0%) 12M ( 1%)
dead code elimination : 0.94 ( 0%) 0 ( 0%)
dead store elim1 : 0.47 ( 0%) 8023k ( 0%)
dead store elim2 : 0.48 ( 0%) 10M ( 1%)
loop init : 2.47 ( 0%) 42M ( 3%)
loop invariant motion : 1.61 ( 0%) 448 ( 0%)
loop unrolling : 470.88 ( 21%) 55M ( 3%)
loop doloop : 1.61 ( 0%) 584 ( 0%)
loop fini : 0.03 ( 0%) 0 ( 0%)
CPROP : 0.05 ( 0%) 0 ( 0%)
PRE : 0.01 ( 0%) 0 ( 0%)
web : 0.23 ( 0%) 1265k ( 0%)
auto inc dec : 1.15 ( 0%) 1349k ( 0%)
CSE 2 : 1.31 ( 0%) 7578k ( 0%)
branch prediction : 0.21 ( 0%) 3701k ( 0%)
combiner : 2.04 ( 0%) 30M ( 2%)
if-conversion : 0.05 ( 0%) 0 ( 0%)
scheduling :1540.03 ( 67%) 12M ( 1%)
integrated RA : 2.61 ( 0%) 103M ( 6%)
LRA non-specific : 1.88 ( 0%) 10209k ( 1%)
LRA virtuals elimination : 0.19 ( 0%) 4563k ( 0%)
LRA create live ranges : 0.23 ( 0%) 2784k ( 0%)
LRA hard reg assignment : 0.17 ( 0%) 0 ( 0%)
reload : 0.01 ( 0%) 48 ( 0%)
reload CSE regs : 1.38 ( 0%) 17M ( 1%)
ree : 0.13 ( 0%) 280 ( 0%)
thread pro- & epilogue : 0.65 ( 0%) 6480 ( 0%)
if-conversion 2 : 0.04 ( 0%) 0 ( 0%)
peephole 2 : 0.14 ( 0%) 0 ( 0%)
hard reg cprop : 0.27 ( 0%) 253k ( 0%)
scheduling 2 : 1.73 ( 0%) 506k ( 0%)
reorder blocks : 0.18 ( 0%) 5357k ( 0%)
shorten branches : 0.11 ( 0%) 0 ( 0%)
final : 0.66 ( 0%) 55M ( 3%)
tree if-combine : 0.01 ( 0%) 0 ( 0%)
if to switch conversion : 0.02 ( 0%) 0 ( 0%)
straight-line strength reduction : 0.04 ( 0%) 0 ( 0%)
store merging : 0.07 ( 0%) 2977k ( 0%)
tree loop if-conversion : 0.07 ( 0%) 3965k ( 0%)
access analysis : 0.77 ( 0%) 1296 ( 0%)
fold mem offsets : 0.09 ( 0%) 613k ( 0%)
rest of compilation : 1.07 ( 0%) 10109k ( 1%)
remove unused locals : 0.26 ( 0%) 0 ( 0%)
address taken : 0.16 ( 0%) 0 ( 0%)
rebuild frequencies : 0.01 ( 0%) 0 ( 0%)
TOTAL :2290.85 1605M
real 38m10.864s
user 37m22.113s
sys 0m48.719s
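
To pick the dominant passes out of a time report like the one above, a small script along these lines may help. This is only a sketch: it assumes the two-column layout shown here (wall time first, then GGC memory), and `top_passes` is a hypothetical helper name, not something GCC provides.

```python
import re

# Matches rows such as "scheduling :1540.03 ( 67%) 12M ( 1%)".
# Assumption: the first numeric column is wall time in seconds.
ROW = re.compile(
    r"^\s*(?P<name>.+?)\s*:\s*(?P<secs>\d+\.\d+)\s*\(\s*(?P<pct>\d+)%\)"
)

def top_passes(report, n=3):
    """Return the n rows with the largest wall time as (name, seconds) pairs."""
    rows = []
    for line in report.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # dump lines, headers and the TOTAL row do not match
        name = m.group("name")
        # Skip aggregate rows so that only individual passes are ranked.
        if name.startswith(("phase ", "callgraph ")):
            continue
        rows.append((name, float(m.group("secs"))))
    return sorted(rows, key=lambda r: r[1], reverse=True)[:n]

# A few rows copied from the trunk report above.
sample = """\
phase opt and generate :2288.94 (100%) 1532M ( 95%)
df reaching defs : 73.20 ( 3%) 0 ( 0%)
tree VRP : 25.19 ( 1%) 34M ( 2%)
loop unrolling : 470.88 ( 21%) 55M ( 3%)
scheduling :1540.03 ( 67%) 12M ( 1%)
TOTAL :2290.85 1605M
"""

for name, secs in top_passes(sample):
    print(f"{name:20s} {secs:10.2f}")
```

On the sample rows this ranks scheduling first, followed by loop unrolling and df reaching defs. The four-column usr/sys/wall/GGC layout would need a different pattern, since there the first number is user time rather than wall time.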
gcc 15:
$ time 15-bin/gcc/f951 -O2 gener.f90
MAIN__ main
Analyzing compilation unit
Performing interprocedural optimizations
<*free_lang_data> {heap 114M} <visibility> {heap 114M} <build_ssa_passes>
{heap 114M} <targetclone> {heap 114M} <opt_local_passes> {heap 114M} {GC
released 28M} {GC 260M -> 152M} <remove_symbols> {heap 265M} <targetclone>
{heap 265M} <free-fnsummary> {heap 265M} <increase_alignment> {heap
265M}Streaming LTO
<whole-program> {heap 265M} <profile_estimate> {heap 265M} <icf> {heap 265M}
<devirt> {heap 265M} <cp> {heap 265M} <sra> {heap 265M} <fnsummary> {heap 265M}
<inline> {heap 265M} <pure-const> {heap 265M} <modref> {heap 265M}
<free-fnsummary> {heap 265M} <static-var> {heap 265M} <single-use> {heap 265M}
<comdats> {heap 265M}Assembling functions:
MAIN__ {GC released 42M} {GC 415M -> 273M} {GC released 27M madv_dontneed 12M}
{GC 575M -> 204M} {GC madv_dontneed 128M} {GC 417M -> 332M} main
Time variable wall GGC
phase setup : 0.01 ( 0%) 187k ( 0%)
phase parsing : 1.77 ( 0%) 72M ( 4%)
phase lang. deferred : 0.01 ( 0%) 0 ( 0%)
phase opt and generate :2448.70 (100%) 1532M ( 95%)
garbage collection : 0.49 ( 0%) 0 ( 0%)
callgraph construction : 0.15 ( 0%) 57M ( 4%)
callgraph optimization : 0.11 ( 0%) 0 ( 0%)
callgraph functions expansion :2422.09 ( 99%) 1234M ( 77%)
callgraph ipa passes : 26.09 ( 1%) 133M ( 8%)
ipa function summary : 0.23 ( 0%) 6033k ( 0%)
ipa cp : 0.18 ( 0%) 15M ( 1%)
ipa inlining heuristics : 0.01 ( 0%) 0 ( 0%)
ipa pure const : 0.06 ( 0%) 1072 ( 0%)
ipa icf : 0.06 ( 0%) 24 ( 0%)
ipa SRA : 0.03 ( 0%) 584 ( 0%)
ipa modref : 0.06 ( 0%) 1600 ( 0%)
cfg construction : 0.15 ( 0%) 11M ( 1%)
cfg cleanup : 1.05 ( 0%) 5441k ( 0%)
trivially dead code : 0.42 ( 0%) 0 ( 0%)
df scan insns : 0.31 ( 0%) 192 ( 0%)
df reaching defs : 69.75 ( 3%) 0 ( 0%)
df live regs : 3.73 ( 0%) 0 ( 0%)
df live&initialized regs : 1.67 ( 0%) 0 ( 0%)
df must-initialized regs : 0.07 ( 0%) 0 ( 0%)
df use-def / def-use chains : 0.84 ( 0%) 0 ( 0%)
df live reg subwords : 0.20 ( 0%) 0 ( 0%)
df reg dead/unused notes : 1.97 ( 0%) 14M ( 1%)
register information : 0.44 ( 0%) 0 ( 0%)
alias analysis : 1.07 ( 0%) 43M ( 3%)
alias stmt walking : 36.73 ( 1%) 11M ( 1%)
register scan : 0.06 ( 0%) 168k ( 0%)
rebuild jump labels : 0.14 ( 0%) 0 ( 0%)
parser (global) : 1.77 ( 0%) 72M ( 4%)
inline parameters : 0.20 ( 0%) 2049k ( 0%)
integration : 0.01 ( 0%) 0 ( 0%)
tree gimplify : 0.28 ( 0%) 61M ( 4%)
tree eh : 0.02 ( 0%) 0 ( 0%)
tree CFG construction : 0.09 ( 0%) 53M ( 3%)
tree CFG cleanup : 4.55 ( 0%) 5898k ( 0%)
tree tail merge : 0.05 ( 0%) 3670k ( 0%)
tree VRP : 25.81 ( 1%) 34M ( 2%)
tree Early VRP : 2.21 ( 0%) 14M ( 1%)
tree copy propagation : 0.23 ( 0%) 168k ( 0%)
tree PTA : 14.72 ( 1%) 9434k ( 1%)
tree SSA rewrite : 0.08 ( 0%) 28M ( 2%)
tree SSA incremental : 0.76 ( 0%) 46M ( 3%)
tree operand scan : 0.24 ( 0%) 53M ( 3%)
dominator optimization : 35.04 ( 1%) 93M ( 6%)
backwards jump threading : 4.98 ( 0%) 4544 ( 0%)
tree SRA : 3.06 ( 0%) 75M ( 5%)
isolate eroneous paths : 0.01 ( 0%) 0 ( 0%)
tree CCP : 2.12 ( 0%) 4177k ( 0%)
tree reassociation : 0.07 ( 0%) 0 ( 0%)
tree PRE : 2.96 ( 0%) 15M ( 1%)
tree FRE : 11.90 ( 0%) 23M ( 1%)
tree RPO VN : 0.02 ( 0%) 1096k ( 0%)
tree code sinking : 0.10 ( 0%) 3037k ( 0%)
tree linearize phis : 0.08 ( 0%) 2144 ( 0%)
tree backward propagate : 0.04 ( 0%) 0 ( 0%)
tree forward propagate : 0.41 ( 0%) 12M ( 1%)
tree phiprop : 0.01 ( 0%) 0 ( 0%)
tree conservative DCE : 0.23 ( 0%) 0 ( 0%)
tree aggressive DCE : 0.22 ( 0%) 2307k ( 0%)
tree DSE : 0.70 ( 0%) 21M ( 1%)
PHI merge : 0.01 ( 0%) 0 ( 0%)
tree loop invariant motion : 0.08 ( 0%) 0 ( 0%)
tree canonical iv : 0.13 ( 0%) 6548k ( 0%)
scev constant prop : 0.04 ( 0%) 0 ( 0%)
complete unrolling : 6.26 ( 0%) 89M ( 6%)
tree vectorization : 0.70 ( 0%) 17M ( 1%)
tree slp vectorization : 0.46 ( 0%) 38M ( 2%)
tree loop distribution : 0.02 ( 0%) 0 ( 0%)
tree iv optimization : 1.39 ( 0%) 109M ( 7%)
predictive commoning : 0.09 ( 0%) 4134k ( 0%)
tree copy headers : 7.94 ( 0%) 7394k ( 0%)
tree SSA uncprop : 0.02 ( 0%) 0 ( 0%)
gimple widening/fma detection : 0.02 ( 0%) 0 ( 0%)
tree strlen optimization : 0.06 ( 0%) 10k ( 0%)
tree modref : 0.05 ( 0%) 688 ( 0%)
dominance frontiers : 0.02 ( 0%) 0 ( 0%)
dominance computation : 0.83 ( 0%) 0 ( 0%)
control dependences : 0.01 ( 0%) 0 ( 0%)
out of ssa : 0.14 ( 0%) 192 ( 0%)
expand vars : 0.10 ( 0%) 4014k ( 0%)
expand : 0.76 ( 0%) 122M ( 8%)
post expand cleanups : 0.05 ( 0%) 792 ( 0%)
lower subreg : 0.14 ( 0%) 0 ( 0%)
forward prop : 0.76 ( 0%) 506k ( 0%)
CSE : 1.32 ( 0%) 12M ( 1%)
dead code elimination : 0.93 ( 0%) 0 ( 0%)
dead store elim1 : 0.47 ( 0%) 8023k ( 0%)
dead store elim2 : 0.48 ( 0%) 10M ( 1%)
loop init : 2.48 ( 0%) 42M ( 3%)
loop invariant motion : 1.68 ( 0%) 448 ( 0%)
loop unrolling : 469.90 ( 19%) 55M ( 3%)
loop doloop : 1.66 ( 0%) 584 ( 0%)
loop fini : 0.03 ( 0%) 0 ( 0%)
CPROP : 0.05 ( 0%) 0 ( 0%)
PRE : 0.01 ( 0%) 0 ( 0%)
web : 0.23 ( 0%) 1265k ( 0%)
auto inc dec : 1.13 ( 0%) 1349k ( 0%)
CSE 2 : 1.32 ( 0%) 7578k ( 0%)
branch prediction : 0.22 ( 0%) 3701k ( 0%)
combiner : 2.02 ( 0%) 30M ( 2%)
if-conversion : 0.05 ( 0%) 0 ( 0%)
scheduling :1700.22 ( 69%) 12M ( 1%)
integrated RA : 2.74 ( 0%) 103M ( 6%)
LRA non-specific : 1.88 ( 0%) 10209k ( 1%)
LRA virtuals elimination : 0.20 ( 0%) 4563k ( 0%)
LRA create live ranges : 0.23 ( 0%) 2784k ( 0%)
LRA hard reg assignment : 0.17 ( 0%) 0 ( 0%)
reload : 0.01 ( 0%) 48 ( 0%)
reload CSE regs : 1.38 ( 0%) 17M ( 1%)
ree : 0.14 ( 0%) 280 ( 0%)
thread pro- & epilogue : 0.67 ( 0%) 6480 ( 0%)
if-conversion 2 : 0.04 ( 0%) 0 ( 0%)
peephole 2 : 0.14 ( 0%) 0 ( 0%)
hard reg cprop : 0.28 ( 0%) 253k ( 0%)
scheduling 2 : 1.70 ( 0%) 506k ( 0%)
reorder blocks : 0.19 ( 0%) 5357k ( 0%)
shorten branches : 0.12 ( 0%) 0 ( 0%)
final : 0.67 ( 0%) 55M ( 3%)
tree if-combine : 0.01 ( 0%) 0 ( 0%)
if to switch conversion : 0.02 ( 0%) 0 ( 0%)
straight-line strength reduction : 0.04 ( 0%) 0 ( 0%)
store merging : 0.07 ( 0%) 2977k ( 0%)
tree loop if-conversion : 0.07 ( 0%) 3965k ( 0%)
access analysis : 0.79 ( 0%) 1296 ( 0%)
fold mem offsets : 0.09 ( 0%) 613k ( 0%)
rest of compilation : 1.08 ( 0%) 10109k ( 1%)
remove unused locals : 0.27 ( 0%) 0 ( 0%)
address taken : 0.16 ( 0%) 0 ( 0%)
rebuild frequencies : 0.01 ( 0%) 0 ( 0%)
TOTAL :2450.49 1605M
real 40m50.499s
user 40m5.988s
sys 0m44.479s
gcc14 is actually very slow:
$ time 14-bin/gcc/f951 -O2 gener.f90
MAIN__ main
Analyzing compilation unit
Performing interprocedural optimizations
<*free_lang_data> {heap 113M} <visibility> {heap 113M} <build_ssa_passes>
{heap 113M} <opt_local_passes> {heap 113M} <remove_symbols> {heap 265M}
<targetclone> {heap 265M} <free-fnsummary> {heap 265M} <increase_alignment>
{heap 265M}Streaming LTO
<whole-program> {GC released 19M} {GC 262M -> 134M} {heap 265M}
<profile_estimate> {heap 265M} <icf> {heap 265M} <devirt> {heap 265M} <cp>
{heap 265M} <sra> {heap 265M} <fnsummary> {heap 265M} <inline> {heap 265M}
<pure-const> {heap 265M} <modref> {heap 265M} <free-fnsummary> {heap 265M}
<static-var> {heap 265M} <single-use> {heap 265M} <comdats> {heap
265M}Assembling functions:
MAIN__ {GC released 27M madv_dontneed 2944k} {GC 368M -> 251M} {GC released
30M madv_dontneed 3072k} {GC 581M -> 274M} {GC released 17M madv_dontneed
5376k} {GC 557M -> 298M} main
Time variable usr sys wall
GGC
phase setup : 0.00 ( 0%) 0.01 ( 0%) 0.01 ( 0%)
185k ( 0%)
phase parsing : 2.78 ( 0%) 0.03 ( 0%) 2.81 ( 0%)
70M ( 5%)
phase lang. deferred : 0.01 ( 0%) 0.00 ( 0%) 0.01 ( 0%)
0 ( 0%)
phase opt and generate :13099.78 (100%) 41.97 (100%) 13142.11 (100%)
1402M ( 95%)
garbage collection : 0.56 ( 0%) 0.01 ( 0%) 0.58 ( 0%)
0 ( 0%)
callgraph construction : 0.13 ( 0%) 0.00 ( 0%) 0.14 ( 0%)
41M ( 3%)
callgraph optimization : 0.10 ( 0%) 0.00 ( 0%) 0.11 ( 0%)
0 ( 0%)
callgraph functions expansion :13064.41 (100%) 41.48 ( 99%) 13106.24 (100%)
1139M ( 77%)
callgraph ipa passes : 34.87 ( 0%) 0.49 ( 1%) 35.37 ( 0%)
134M ( 9%)
ipa function summary : 0.21 ( 0%) 0.00 ( 0%) 0.20 ( 0%)
6033k ( 0%)
ipa cp : 0.23 ( 0%) 0.04 ( 0%) 0.26 ( 0%)
14M ( 1%)
ipa inlining heuristics : 0.01 ( 0%) 0.00 ( 0%) 0.01 ( 0%)
0 ( 0%)
ipa pure const : 0.06 ( 0%) 0.00 ( 0%) 0.06 ( 0%)
1072 ( 0%)
ipa icf : 0.06 ( 0%) 0.00 ( 0%) 0.07 ( 0%)
24 ( 0%)
ipa SRA : 0.02 ( 0%) 0.00 ( 0%) 0.03 ( 0%)
584 ( 0%)
ipa free inline summary : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%)
0 ( 0%)
ipa modref : 0.05 ( 0%) 0.00 ( 0%) 0.04 ( 0%)
1600 ( 0%)
cfg construction : 0.13 ( 0%) 0.00 ( 0%) 0.14 ( 0%)
11M ( 1%)
cfg cleanup : 0.99 ( 0%) 0.01 ( 0%) 0.98 ( 0%)
5357k ( 0%)
trivially dead code : 0.44 ( 0%) 0.00 ( 0%) 0.44 ( 0%)
0 ( 0%)
df scan insns : 0.28 ( 0%) 0.04 ( 0%) 0.31 ( 0%)
192 ( 0%)
df reaching defs : 37.34 ( 0%) 35.67 ( 85%) 64.19 ( 0%)
0 ( 0%)
df live regs : 2.79 ( 0%) 0.16 ( 0%) 3.01 ( 0%)
0 ( 0%)
df live&initialized regs : 1.42 ( 0%) 0.18 ( 0%) 1.71 ( 0%)
0 ( 0%)
df must-initialized regs : 0.08 ( 0%) 0.00 ( 0%) 0.08 ( 0%)
0 ( 0%)
df use-def / def-use chains : 0.74 ( 0%) 0.01 ( 0%) 0.84 ( 0%)
0 ( 0%)
df live reg subwords : 0.21 ( 0%) 0.00 ( 0%) 0.21 ( 0%)
0 ( 0%)
df reg dead/unused notes : 2.00 ( 0%) 0.02 ( 0%) 2.06 ( 0%)
14M ( 1%)
register information : 0.43 ( 0%) 0.00 ( 0%) 0.44 ( 0%)
0 ( 0%)
alias analysis : 1.07 ( 0%) 0.00 ( 0%) 1.07 ( 0%)
43M ( 3%)
alias stmt walking : 35.17 ( 0%) 1.04 ( 2%) 36.42 ( 0%)
11M ( 1%)
register scan : 0.06 ( 0%) 0.00 ( 0%) 0.06 ( 0%)
168k ( 0%)
rebuild jump labels : 0.12 ( 0%) 0.00 ( 0%) 0.12 ( 0%)
0 ( 0%)
parser (global) : 2.79 ( 0%) 0.03 ( 0%) 2.82 ( 0%)
70M ( 5%)
inline parameters : 0.15 ( 0%) 0.00 ( 0%) 0.15 ( 0%)
2049k ( 0%)
integration : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%)
0 ( 0%)
tree gimplify : 0.27 ( 0%) 0.00 ( 0%) 0.27 ( 0%)
57M ( 4%)
tree eh : 0.02 ( 0%) 0.00 ( 0%) 0.02 ( 0%)
0 ( 0%)
tree CFG construction : 0.09 ( 0%) 0.00 ( 0%) 0.09 ( 0%)
36M ( 2%)
tree CFG cleanup : 4.48 ( 0%) 0.07 ( 0%) 4.55 ( 0%)
5898k ( 0%)
tree tail merge : 0.05 ( 0%) 0.00 ( 0%) 0.05 ( 0%)
3543k ( 0%)
tree VRP :10988.65 ( 84%) 0.43 ( 1%) 10989.38 ( 84%)
11M ( 1%)
tree Early VRP : 10.11 ( 0%) 0.11 ( 0%) 10.25 ( 0%)
16M ( 1%)
tree copy propagation : 0.21 ( 0%) 0.00 ( 0%) 0.19 ( 0%)
168k ( 0%)
tree PTA : 13.20 ( 0%) 0.03 ( 0%) 13.25 ( 0%)
9434k ( 1%)
tree SSA rewrite : 0.12 ( 0%) 0.08 ( 0%) 0.17 ( 0%)
28M ( 2%)
tree SSA incremental : 0.95 ( 0%) 0.02 ( 0%) 1.08 ( 0%)
47M ( 3%)
tree operand scan : 0.65 ( 0%) 0.35 ( 1%) 0.89 ( 0%)
53M ( 4%)
dominator optimization : 27.28 ( 0%) 0.05 ( 0%) 27.40 ( 0%)
64M ( 4%)
backwards jump threading : 0.66 ( 0%) 0.00 ( 0%) 0.65 ( 0%)
0 ( 0%)
tree SRA : 3.72 ( 0%) 0.33 ( 1%) 4.12 ( 0%)
71M ( 5%)
isolate eroneous paths : 0.01 ( 0%) 0.00 ( 0%) 0.02 ( 0%)
0 ( 0%)
tree CCP : 1.83 ( 0%) 0.04 ( 0%) 1.83 ( 0%)
3669k ( 0%)
tree split crit edges : 0.00 ( 0%) 0.00 ( 0%) 0.00 ( 0%)
1181k ( 0%)
tree reassociation : 0.09 ( 0%) 0.00 ( 0%) 0.07 ( 0%)
0 ( 0%)
tree PRE : 3.75 ( 0%) 0.06 ( 0%) 3.86 ( 0%)
15M ( 1%)
tree FRE : 13.07 ( 0%) 0.13 ( 0%) 12.83 ( 0%)
25M ( 2%)
tree RPO VN : 0.00 ( 0%) 0.01 ( 0%) 0.03 ( 0%)
1054k ( 0%)
tree code sinking : 0.07 ( 0%) 0.00 ( 0%) 0.07 ( 0%)
2868k ( 0%)
tree linearize phis : 0.09 ( 0%) 0.00 ( 0%) 0.07 ( 0%)
2144 ( 0%)
tree backward propagate : 0.03 ( 0%) 0.00 ( 0%) 0.03 ( 0%)
0 ( 0%)
tree forward propagate : 0.26 ( 0%) 0.10 ( 0%) 0.35 ( 0%)
6488k ( 0%)
tree phiprop : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%)
0 ( 0%)
tree conservative DCE : 0.33 ( 0%) 0.11 ( 0%) 0.38 ( 0%)
0 ( 0%)
tree aggressive DCE : 0.24 ( 0%) 0.04 ( 0%) 0.31 ( 0%)
2180k ( 0%)
tree buildin call DCE : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%)
0 ( 0%)
tree DSE : 0.67 ( 0%) 0.00 ( 0%) 0.68 ( 0%)
21M ( 1%)
PHI merge : 0.02 ( 0%) 0.00 ( 0%) 0.02 ( 0%)
127k ( 0%)
tree loop invariant motion : 0.09 ( 0%) 0.00 ( 0%) 0.11 ( 0%)
0 ( 0%)
tree canonical iv : 0.12 ( 0%) 0.01 ( 0%) 0.12 ( 0%)
6506k ( 0%)
complete unrolling : 10.08 ( 0%) 0.46 ( 1%) 10.50 ( 0%)
88M ( 6%)
tree vectorization : 0.37 ( 0%) 0.00 ( 0%) 0.37 ( 0%)
28M ( 2%)
tree slp vectorization : 0.48 ( 0%) 0.01 ( 0%) 0.50 ( 0%)
48M ( 3%)
tree loop distribution : 0.02 ( 0%) 0.00 ( 0%) 0.02 ( 0%)
0 ( 0%)
tree iv optimization : 1.23 ( 0%) 0.07 ( 0%) 1.29 ( 0%)
56M ( 4%)
predictive commoning : 0.08 ( 0%) 0.00 ( 0%) 0.09 ( 0%)
4134k ( 0%)
tree copy headers : 6.76 ( 0%) 0.03 ( 0%) 6.79 ( 0%)
7225k ( 0%)
tree SSA uncprop : 0.02 ( 0%) 0.00 ( 0%) 0.02 ( 0%)
0 ( 0%)
gimple expand pow/cabs : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%)
0 ( 0%)
gimple widening/fma detection : 0.02 ( 0%) 0.00 ( 0%) 0.01 ( 0%)
0 ( 0%)
tree strlen optimization : 0.06 ( 0%) 0.00 ( 0%) 0.06 ( 0%)
6656 ( 0%)
tree modref : 0.04 ( 0%) 0.00 ( 0%) 0.04 ( 0%)
688 ( 0%)
dominance frontiers : 0.03 ( 0%) 0.00 ( 0%) 0.03 ( 0%)
0 ( 0%)
dominance computation : 0.89 ( 0%) 0.00 ( 0%) 0.87 ( 0%)
0 ( 0%)
control dependences : 0.00 ( 0%) 0.00 ( 0%) 0.03 ( 0%)
0 ( 0%)
out of ssa : 0.11 ( 0%) 0.00 ( 0%) 0.11 ( 0%)
184 ( 0%)
expand vars : 0.07 ( 0%) 0.01 ( 0%) 0.08 ( 0%)
4014k ( 0%)
expand : 0.63 ( 0%) 0.00 ( 0%) 0.64 ( 0%)
122M ( 8%)
post expand cleanups : 0.05 ( 0%) 0.00 ( 0%) 0.05 ( 0%)
776 ( 0%)
lower subreg : 0.11 ( 0%) 0.00 ( 0%) 0.12 ( 0%)
0 ( 0%)
forward prop : 0.64 ( 0%) 0.01 ( 0%) 0.66 ( 0%)
506k ( 0%)
CSE : 0.95 ( 0%) 0.00 ( 0%) 0.95 ( 0%)
12M ( 1%)
dead code elimination : 0.92 ( 0%) 0.00 ( 0%) 0.90 ( 0%)
0 ( 0%)
dead store elim1 : 0.40 ( 0%) 0.01 ( 0%) 0.41 ( 0%)
8023k ( 1%)
dead store elim2 : 0.45 ( 0%) 0.00 ( 0%) 0.46 ( 0%)
10M ( 1%)
loop analysis : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%)
0 ( 0%)
loop init : 3.55 ( 0%) 0.01 ( 0%) 3.60 ( 0%)
43M ( 3%)
loop invariant motion : 0.69 ( 0%) 1.02 ( 2%) 1.48 ( 0%)
448 ( 0%)
loop unrolling : 438.21 ( 3%) 1.06 ( 3%) 446.68 ( 3%)
54M ( 4%)
loop doloop : 0.38 ( 0%) 0.01 ( 0%) 1.71 ( 0%)
536 ( 0%)
loop fini : 0.04 ( 0%) 0.00 ( 0%) 0.01 ( 0%)
0 ( 0%)
CPROP : 0.04 ( 0%) 0.00 ( 0%) 0.03 ( 0%)
0 ( 0%)
PRE : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%)
0 ( 0%)
web : 0.22 ( 0%) 0.00 ( 0%) 0.23 ( 0%)
1265k ( 0%)
auto inc dec : 1.10 ( 0%) 0.00 ( 0%) 1.10 ( 0%)
1349k ( 0%)
CSE 2 : 1.12 ( 0%) 0.02 ( 0%) 1.16 ( 0%)
7463k ( 0%)
branch prediction : 0.19 ( 0%) 0.00 ( 0%) 0.19 ( 0%)
3658k ( 0%)
combiner : 1.97 ( 0%) 0.01 ( 0%) 1.98 ( 0%)
28M ( 2%)
if-conversion : 0.05 ( 0%) 0.00 ( 0%) 0.04 ( 0%)
0 ( 0%)
scheduling :1456.96 ( 11%) 0.02 ( 0%)1456.99 ( 11%)
12M ( 1%)
integrated RA : 2.67 ( 0%) 0.00 ( 0%) 2.66 ( 0%)
102M ( 7%)
LRA non-specific : 1.81 ( 0%) 0.01 ( 0%) 1.81 ( 0%)
10210k ( 1%)
LRA virtuals elimination : 0.13 ( 0%) 0.00 ( 0%) 0.13 ( 0%)
3225k ( 0%)
LRA create live ranges : 0.22 ( 0%) 0.01 ( 0%) 0.23 ( 0%)
2784k ( 0%)
LRA hard reg assignment : 0.17 ( 0%) 0.00 ( 0%) 0.17 ( 0%)
0 ( 0%)
reload : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%)
48 ( 0%)
reload CSE regs : 1.27 ( 0%) 0.00 ( 0%) 1.26 ( 0%)
17M ( 1%)
ree : 0.12 ( 0%) 0.00 ( 0%) 0.12 ( 0%)
280 ( 0%)
thread pro- & epilogue : 0.62 ( 0%) 0.00 ( 0%) 0.63 ( 0%)
6304 ( 0%)
if-conversion 2 : 0.03 ( 0%) 0.00 ( 0%) 0.03 ( 0%)
0 ( 0%)
peephole 2 : 0.15 ( 0%) 0.00 ( 0%) 0.14 ( 0%)
0 ( 0%)
hard reg cprop : 0.27 ( 0%) 0.00 ( 0%) 0.27 ( 0%)
253k ( 0%)
scheduling 2 : 1.68 ( 0%) 0.01 ( 0%) 1.70 ( 0%)
509k ( 0%)
reorder blocks : 0.17 ( 0%) 0.01 ( 0%) 0.20 ( 0%)
5315k ( 0%)
shorten branches : 0.11 ( 0%) 0.00 ( 0%) 0.12 ( 0%)
0 ( 0%)
final : 0.61 ( 0%) 0.02 ( 0%) 0.62 ( 0%)
55M ( 4%)
tree if-combine : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%)
0 ( 0%)
if to switch conversion : 0.02 ( 0%) 0.00 ( 0%) 0.02 ( 0%)
0 ( 0%)
straight-line strength reduction : 0.05 ( 0%) 0.00 ( 0%) 0.04 ( 0%)
0 ( 0%)
store merging : 0.06 ( 0%) 0.01 ( 0%) 0.08 ( 0%)
3047k ( 0%)
initialize rtl : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%)
11k ( 0%)
address lowering : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%)
0 ( 0%)
tree loop if-conversion : 0.07 ( 0%) 0.00 ( 0%) 0.07 ( 0%)
3965k ( 0%)
access analysis : 3.17 ( 0%) 0.00 ( 0%) 3.16 ( 0%)
1296 ( 0%)
rest of compilation : 1.10 ( 0%) 0.00 ( 0%) 1.15 ( 0%)
10M ( 1%)
remove unused locals : 0.18 ( 0%) 0.00 ( 0%) 0.18 ( 0%)
0 ( 0%)
address taken : 0.15 ( 0%) 0.00 ( 0%) 0.17 ( 0%)
0 ( 0%)
rebuild frequencies : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%)
0 ( 0%)
TOTAL :13102.57 42.01 13144.94
1473M
real 219m4.958s
user 218m22.576s
sys 0m42.029s
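
As a rough cross-check of where the gcc14 time goes, the wall-clock figures from the gcc14 and trunk reports can be compared directly. The snippet below is only an illustrative sketch: the numbers are copied from the reports in this comment, and `slowdown` is a hypothetical helper name.

```python
# Wall-clock seconds copied from the reports above: (gcc14, trunk).
wall = {
    "tree VRP":   (10989.38, 25.19),
    "scheduling": (1456.99, 1540.03),
    "TOTAL":      (13144.94, 2290.85),
}

def slowdown(gcc14, trunk):
    """How many times longer gcc14 took than trunk for the same row."""
    return gcc14 / trunk

for name, (gcc14, trunk) in wall.items():
    print(f"{name:12s} {slowdown(gcc14, trunk):8.2f}x")
```

On these figures tree VRP is the pass that separates the two runs, while scheduling takes roughly the same time in both.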