Hello,

For some time now, I've wanted to see where compile time goes in a typical GCC build, because nobody really seems to know what the compiler spends its time on. The impressions that get published about GCC usually suggest at least a feeling that GCC is not getting faster, and that parts of the compiler are unreasonably slow. I was hoping to shed some light on which parts those could be.

What I've done is this:

* Build GCC 4.6.0 (trunk r159624) with --enable-checking=release and with -O2, and install it
* Build GCC 4.6.0 (trunk r159624) again, with the installed compiler and with "-O2 -g3 -ftime-report". The time reports (along with everything else on stderr) are piped to an output file
* Extract, sum, and sort the time consumed per timevar

Host was cfarm gcc14 (8 x 3GHz Xeon). Target was x86_64-unknown-linux-gnu. "Build" means non-bootstrap. Results are at the bottom of this mail.

Conclusions:

* There are quite a few timevars for parts of the compiler that have been removed: TV_SEQABSTR, TV_GLOBAL_ALLOC, TV_LOCAL_ALLOC are the ones I've spotted so far. I will go through the whole list, remove all timevars that are unused, and submit a patch.
* The "slow" parts of the compiler are not exactly news: tree-PRE, scheduling, register allocation.
* Variable tracking costs ~7.6% of compile time. This is more than the cost of register allocation (IRA+reload, ~7.3%).
* The C front end (preprocessing+lexing+parsing) costs ~17%. For an optimizing compiler with so many passes, this is quite a lot.
* The GIMPLE optimizers (selected with egrep "tree|dominator_opt|alias_stmt_walking|alias_analysis|inline_heuristics|PHI_merge") together cost ~16%.
* Subtracting the above numbers from the total, the rest of the compiler, which is mostly the RTL parts, still accounts for 100-17-16-8=59% of the total compile time. This was the most surprising result for me.

Ciao!
Steven

Columns: timevar, time (seconds), share of TOTAL

auto_inc_dec 0.00 0%
callgraph_verifier 0.00 0%
cfg_construction 0.00 0%
CFG_verifier 0.00 0%
delay_branch_sched 0.00 0%
df_live_byte_regs 0.00 0%
df_scan_insns 0.00 0%
df_uninitialized_regs_2 0.00 0%
dump_files 0.00 0%
global_alloc 0.00 0%
Graphite_code_generation 0.00 0%
Graphite_data_dep_analysis 0.00 0%
Graphite_loop_transforms 0.00 0%
ipa_free_lang_data 0.00 0%
ipa_lto_cgraph_IO 0.00 0%
ipa_lto_cgraph_merge 0.00 0%
ipa_lto_decl_init_IO 0.00 0%
ipa_lto_decl_IO 0.00 0%
ipa_lto_decl_merge 0.00 0%
ipa_lto_gimple_IO 0.00 0%
ipa_points_to 0.00 0%
ipa_profile 0.00 0%
ipa_type_escape 0.00 0%
life_analysis 0.00 0%
life_info_update 0.00 0%
load_CSE_after_reload 0.00 0%
local_alloc 0.00 0%
loop_doloop 0.00 0%
loop_unrolling 0.00 0%
loop_unswitching 0.00 0%
LSM 0.00 0%
lto 0.00 0%
name_lookup 0.00 0%
overload_resolution 0.00 0%
PCH_main_state_restore 0.00 0%
PCH_main_state_save 0.00 0%
PCH_pointer_reallocation 0.00 0%
PCH_pointer_sort 0.00 0%
PCH_preprocessor_state_restore 0.00 0%
PCH_preprocessor_state_save 0.00 0%
plugin_execution 0.00 0%
plugin_initialization 0.00 0%
predictive_commoning 0.00 0%
reg_stack 0.00 0%
rename_registers 0.00 0%
rest_of_compilation 0.00 0%
sequence_abstraction 0.00 0%
shorten_branches 0.00 0%
sms_modulo_scheduling 0.00 0%
template_instantiation 0.00 0%
total_time 0.00 0%
tracer 0.00 0%
tree_check_data_dependences 0.00 0%
tree_loop_distribution 0.00 0%
tree_loop_linear 0.00 0%
tree_loop_optimization 0.00 0%
tree_loop_unswitching 0.00 0%
tree_parallelize_loops 0.00 0%
tree_prefetching 0.00 0%
tree_redundant_PHIs 0.00 0%
tree_slp_vectorization 0.00 0%
tree_SSA_to_normal 0.00 0%
tree_SSA_verifier 0.00 0%
tree_STMT_verifier 0.00 0%
tree_STORE_CCP 0.00 0%
tree_store_copy_prop 0.00 0%
tree_vectorization 0.00 0%
value_profile_opts 0.00 0%
web 0.00 0%
whopr_ltrans 0.00 0%
whopr_wpa 0.00 0%
whopr_wpa_IO 0.00 0%
whopr_wpa_ltrans 0.00 0%
mode_switching 0.01 0.00261117%
tree_NRV_optimization 0.01 0.00261117%
tree_loop_fini 0.03 0.00783351%
tree_switch_initialization_conversion 0.03 0.00783351%
lower_subreg 0.04 0.0104447%
tree_buildin_call_DCE 0.05 0.0130559%
code_hoisting 0.06 0.015667%
ipa_reference 0.06 0.015667%
tree_canonical_iv 0.06 0.015667%
tree_if_combine 0.06 0.015667%
PHI_merge 0.07 0.0182782%
tree_phiprop 0.07 0.0182782%
uninit_var_anaysis 0.07 0.0182782%
control_dependences 0.08 0.0208894%
varconst 0.09 0.0235005%
tree_PHI_const_copy_prop 0.16 0.0417787%
tree_eh 0.19 0.0496122%
tree_split_crit_edges 0.19 0.0496122%
scev_constant_prop 0.20 0.0522234%
tree_PHI_insertion 0.20 0.0522234%
tree_copy_headers 0.23 0.0600569%
tree_loop_bounds 0.24 0.0626681%
tree_loop_invariant_motion 0.27 0.0705016%
variable_output 0.27 0.0705016%
combine_stack_adjustments 0.28 0.0731128%
garbage_collection 0.28 0.0731128%
loop_analysis 0.28 0.0731128%
tree_SSA_uncprop 0.28 0.0731128%
tree_SRA 0.30 0.0783351%
ipa_cp 0.34 0.0887798%
tree_linearize_phis 0.34 0.0887798%
tree_DSE 0.39 0.101836%
tree_find_ref._vars 0.39 0.101836%
varpool_construction 0.39 0.101836%
tree_rename_SSA_copies 0.47 0.122725%
complete_unrolling 0.50 0.130559%
tree_SSA_other 0.53 0.138392%
tree_loop_init 0.57 0.148837%
tree_CFG_construction 0.59 0.154059%
tree_code_sinking 0.60 0.15667%
zee 0.62 0.161893%
dominance_frontiers 0.65 0.169726%
loop_invariant_motion 0.66 0.172337%
register_scan 0.67 0.174948%
ipa_pure_const 0.71 0.185393%
tree_reassociation 0.72 0.188004%
callgraph_construction 0.73 0.190615%
if_conversion_2 0.74 0.193227%
tree_forward_propagate 0.77 0.20106%
ipa_SRA 0.91 0.237617%
peephole_2 0.95 0.248061%
tree_conservative_DCE 0.96 0.250672%
regmove 1.02 0.266339%
thread_pro_and_epilogue 1.28 0.33423%
tree_iv_optimization 1.31 0.342063%
tree_operand_scan 1.32 0.344675%
rebuild_jump_labels 1.33 0.347286%
jump 1.34 0.349897%
branch_prediction 1.35 0.352508%
machine_dep_reorg 1.36 0.355119%
inline_heuristics 1.42 0.370786%
df_multiple_defs 1.50 0.391676%
dead_code_elimination 1.74 0.454344%
tree_SSA_rewrite 1.80 0.470011%
df_use_def_def_use_chains 1.86 0.485678%
trivially_dead_code 2.00 0.522234%
reorder_blocks 2.07 0.540512%
hard_reg_cprop 2.10 0.548346%
alias_stmt_walking 2.15 0.561402%
tree_copy_propagation 2.19 0.571846%
register_information 2.27 0.592736%
tree_aggressive_DCE 2.29 0.597958%
dead_store_elim1 2.38 0.621459%
dead_store_elim2 2.40 0.626681%
integration 2.73 0.71285%
if_conversion 2.80 0.731128%
tree_CCP 2.89 0.754628%
tree_gimplify 3.03 0.791185%
callgraph_optimization 3.22 0.840797%
forward_prop 3.25 0.84863%
alias_analysis 3.41 0.890409%
df_reaching_defs 3.44 0.898243%
dominator_optimization 3.49 0.911299%
tree_SSA_incremental 3.50 0.91391%
tree_FRE 3.90 1.01836%
CSE_2 4.71 1.22986%
tree_PTA 4.80 1.25336%
CPROP 4.98 1.30036%
reload_CSE_regs 5.23 1.36564%
final 5.26 1.37348%
dominance_computation 5.44 1.42048%
df_reg_dead_unused_notes 5.62 1.46748%
tree_CFG_cleanup 5.69 1.48576%
cfg_cleanup 6.28 1.63982%
PRE 6.64 1.73382%
lexical_analysis 6.65 1.73643%
CSE 8.16 2.13072%
tree_VRP 8.36 2.18294%
symout 8.94 2.33439%
combiner 10.17 2.65556%
tree_PRE 11.42 2.98196%
scheduling_2 11.44 2.98718%
reload 11.70 3.05507%
df_live_initialized_regs 12.92 3.37363%
integrated_RA 16.31 4.25882%
df_live_regs 17.52 4.57477%
expand 24.18 6.31381%
preprocessing 27.59 7.20422%
variable_tracking 29.17 7.61678%
parser 31.53 8.23302%
TOTAL 382.97 100%
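
For anyone who wants to repeat the "extract, sum, and sort" step, here is a minimal sketch in Python. It is not the script used for the numbers above. It assumes each timevar row in the -ftime-report output looks like "name : 0.10 (10%) usr ...", and that the first number after the colon (assumed to be user time) is the one to sum. The function names sum_timevars, report and share are made up for this illustration.

```python
import re
from collections import defaultdict

# Assumed row shape: " parser   :   0.10 (10%) usr   0.00 ( 0%) sys ..."
# Only the first number after the colon is read (assumed to be user time).
ROW = re.compile(r"^\s*(\S.*?)\s*:\s*(\d+\.\d+)")


def sum_timevars(lines):
    """Sum the first time column per timevar across all reports in lines."""
    totals = defaultdict(float)
    for line in lines:
        m = ROW.match(line)
        if m:
            # "tree PRE" becomes "tree_PRE", matching the table's naming.
            name = m.group(1).replace(" ", "_")
            totals[name] += float(m.group(2))
    return dict(totals)


def report(totals):
    """Return (name, seconds, percent) rows sorted by ascending time."""
    grand = totals.get("TOTAL", 0.0)
    rows = []
    for name, secs in totals.items():
        pct = 100.0 * secs / grand if grand else 0.0
        rows.append((name, secs, pct))
    return sorted(rows, key=lambda r: (r[1], r[0]))


def share(totals, pattern):
    """Percent of TOTAL spent in timevars whose name matches pattern."""
    rx = re.compile(pattern)
    matched = sum(s for n, s in totals.items()
                  if n != "TOTAL" and rx.search(n))
    return 100.0 * matched / totals["TOTAL"]


if __name__ == "__main__":
    import sys
    totals = sum_timevars(sys.stdin)
    for name, secs, pct in report(totals):
        print(f"{name} {secs:.2f} {pct:.6g}%")
```

Fed the captured stderr on stdin, this prints one "name time percent" row per timevar. A grouped figure such as the GIMPLE optimizer share could then be approximated by passing the egrep pattern from the conclusions to share().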