https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94360
Bug ID: 94360
Summary: 6% run-time regression of 502.gcc_r against GCC 9 when compiled with -O2 and both PGO and LTO
Product: gcc
Version: 10.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: ipa
Assignee: unassigned at gcc dot gnu.org
Reporter: jamborm at gcc dot gnu.org
CC: hubicka at gcc dot gnu.org, marxin at gcc dot gnu.org
Blocks: 26163
Target Milestone: ---
Host: x86_64-linux
Target: x86_64-linux

When built with current trunk/master at -O2, with generic march/mtune and with both PGO and LTO, SPEC 2017 INTrate 502.gcc_r is 6% slower than with GCC 9 when run on an AMD Zen2-based CPU, and about 4.8% slower on Intel Cascade Lake.

Looking at how the run-time of the benchmark evolved over the course of the GCC 10 development cycle, the first and biggest regression (9%) comes with:

  commit 2925cad2151842daa387950e62d989090e47c91d
  Author: Jan Hubicka <hubi...@ucw.cz>
  Date:   Thu Oct 3 17:08:21 2019 +0200

      params.def (PARAM_INLINE_HEURISTICS_HINT_PERCENT, [...]): New.

      * params.def (PARAM_INLINE_HEURISTICS_HINT_PERCENT,
      PARAM_INLINE_HEURISTICS_HINT_PERCENT_O2): New.
      * doc/invoke.texi (inline-heuristics-hint-percent,
      inline-heuristics-hint-percent-O2): Document.
      * tree-inline.c (inline_insns_single, inline_insns_auto): Add new
      hint attribute.
      (can_inline_edge_by_limits_p): Use it.

      From-SVN: r276516

Then, between Wed Nov 6 (72d6aeecd95) and Mon Nov 18 (58c036c8354), it improved to about 103% of the GCC 9 run-time. I did not pin down exactly what caused the improvement, because for much of this range the compiler was segfaulting in the LTO phase.

Eventually, the benchmark regressed to the current 106% of the GCC 9 run-time with one of Honza's commits:

  - 9340d34599e Convert inliner to function specific param infrastructure, or
  - 1e83bd7003e Convert inliner to new param infrastructure.

The former cannot be built without the latter, so the two could not be bisected apart.

Symbol profiles are:

trunk (26b3e568a60):

  Overhead  Samples  Shared Object         Symbol
  ........  .......  ....................  ....................................
     4.04%    42371  cpugcc_r_peak.pgolto  bitmap_ior_into
     2.91%    30281  cpugcc_r_peak.pgolto  df_worklist_dataflow
     2.24%    23342  cpugcc_r_peak.pgolto  df_note_compute
     1.92%    20120  cpugcc_r_peak.pgolto  bitmap_set_bit
     1.75%    18148  cpugcc_r_peak.pgolto  rest_of_handle_fast_dce.lto_priv.0
     1.58%    16580  libc-2.31.so          __memset_avx2_unaligned_erms
     1.40%    14514  cpugcc_r_peak.pgolto  extract_new_fences_from.lto_priv.0
     1.39%    14732  libc-2.31.so          _int_malloc
     1.33%    13824  cpugcc_r_peak.pgolto  bitmap_copy
     1.24%    12962  cpugcc_r_peak.pgolto  bitmap_bit_p
     1.19%    12346  cpugcc_r_peak.pgolto  bitmap_and
     1.18%    12242  cpugcc_r_peak.pgolto  df_lr_local_compute.lto_priv.0
     1.02%    10618  cpugcc_r_peak.pgolto  cleanup_cfg.isra.0

vs. GCC 9 (releases/gcc-9.3.0):

  Overhead  Samples  Shared Object         Symbol
  ........  .......  ....................  .....................................
     6.81%    66967  cpugcc_r_peak.pgolto  df_worklist_dataflow
     2.83%    28063  cpugcc_r_peak.pgolto  bitmap_ior_into
     2.80%    27489  cpugcc_r_peak.pgolto  df_note_compute.lto_priv.0
     2.17%    21334  cpugcc_r_peak.pgolto  rest_of_handle_fast_dce.lto_priv.0
     1.69%    16671  libc-2.31.so          __memset_avx2_unaligned_erms
     1.51%    14876  cpugcc_r_peak.pgolto  try_optimize_cfg.lto_priv.0
     1.50%    14990  libc-2.31.so          _int_malloc
     1.50%    14715  cpugcc_r_peak.pgolto  extract_new_fences_from.lto_priv.0
     1.36%    13406  cpugcc_r_peak.pgolto  df_lr_local_compute.lto_priv.0
     1.20%    11926  cpugcc_r_peak.pgolto  remove_unused_locals
     1.06%    10433  cpugcc_r_peak.pgolto  sched_analyze_insn
     1.04%    10210  cpugcc_r_peak.pgolto  init_alias_analysis
     1.04%    10188  cpugcc_r_peak.pgolto  prescan_insns_for_dce.lto_priv.0
     1.00%     9876  cpugcc_r_peak.pgolto  compute_transp

Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)