https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109237
--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---

Samples: 289K of event 'cycles:u', Event count (approx.): 384226334976

Overhead  Samples  Command  Shared Object  Symbol
   3.52%     9747  cc1      cc1            [.] bb_is_just_return
   2.94%     8241  cc1      cc1            [.] df_note_compute
   2.92%     8085  cc1      cc1            [.] init_alias_analysis
   2.55%     7035  cc1      cc1            [.] delete_trivially_dead_insns
   2.28%     6372  cc1      cc1            [.] contains_no_active_insn_p
   2.16%     6288  cc1      cc1            [.] get_ref_base_and_extent
   2.02%     5785  cc1      cc1            [.] ggc_set_mark
   1.55%     4308  cc1      cc1            [.] fast_dce

I see that bb_is_just_return is high in the profile, and looking at its implementation I wonder whether on RTL we can scan the insns backwards and stop as soon as the last (real?) insn isn't ANY_RETURN_P (). Using FOR_BB_INSNS_REVERSE takes it off the profile completely. Will test a patch.

Similarly, contains_no_active_insn_p is high up in the profile, and it looks like micro-optimizing it a bit would help. Using NONDEBUG_INSN_P to guard the flow_active_insn_p call doesn't seem to help (but perf is always noisy).
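To illustrate the backward-scan idea, here is a minimal sketch in C over a hypothetical, simplified model of a block's insn stream. The names enum insn_kind, struct block, just_return_forward and just_return_backward are made up for illustration; they are not GCC's RTL types or API, and GCC's actual bb_is_just_return predicate is not reproduced here. The sketch assumes a simplified rule: ignoring debug insns, a block is "just a return" if its only real insn is a return. Both scans give the same answer, but the backward one can reject a block as soon as the last real insn turns out not to be a return, which only pays off when many skippable insns sit in front of it. Whether that matches the insn layout that made bb_is_just_return hot in this profile is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for RTL insns; these are NOT GCC's
   rtx_insn or basic_block types. */
enum insn_kind {
  INSN_DEBUG,  /* not a "real" insn: skipped by both scans */
  INSN_RETURN, /* stands in for an insn whose pattern satisfies ANY_RETURN_P */
  INSN_OTHER   /* any other real insn */
};

struct block {
  const enum insn_kind *insns;
  size_t n;
};

/* Simplified predicate (an assumption for illustration): the block's only
   real insn is a return. *visited counts how many insns were inspected. */

/* Forward scan, in the spirit of FOR_BB_INSNS. */
static bool just_return_forward(const struct block *bb, size_t *visited) {
  bool seen_return = false;
  *visited = 0;
  for (size_t i = 0; i < bb->n; i++) {
    ++*visited;
    if (bb->insns[i] == INSN_DEBUG)
      continue;                        /* skip non-real insns */
    if (bb->insns[i] != INSN_RETURN || seen_return)
      return false;                    /* non-return or second real insn */
    seen_return = true;
  }
  return seen_return;
}

/* Backward scan, in the spirit of FOR_BB_INSNS_REVERSE: the first real insn
   met from the end must be the return, otherwise bail out immediately. */
static bool just_return_backward(const struct block *bb, size_t *visited) {
  bool seen_return = false;
  *visited = 0;
  for (size_t i = bb->n; i-- > 0;) {
    ++*visited;
    if (bb->insns[i] == INSN_DEBUG)
      continue;                        /* skip non-real insns */
    if (bb->insns[i] != INSN_RETURN || seen_return)
      return false;                    /* last real insn is not a return */
    seen_return = true;
  }
  return seen_return;
}
```

For a block of 1000 debug insns followed by one non-return real insn, the forward scan inspects all 1001 insns before rejecting it, while the backward scan rejects it after inspecting one. For a block that really is just a return, both scans still have to visit every insn.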