[Bug ipa/94360] New: 6% run-time regression of 502.gcc_r against GCC 9 when compiled with -O2 and both PGO and LTO

jamborm at gcc dot gnu.org Fri, 27 Mar 2020 09:35:09 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94360


            Bug ID: 94360
           Summary: 6% run-time regression of 502.gcc_r against GCC 9 when
                    compiled with -O2 and both PGO and LTO
           Product: gcc
           Version: 10.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: ipa
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jamborm at gcc dot gnu.org
                CC: hubicka at gcc dot gnu.org, marxin at gcc dot gnu.org
            Blocks: 26163
  Target Milestone: ---
              Host: x86_64-linux
            Target: x86_64-linux

When built at -O2 with generic march/mtune and with both PGO and LTO,
current trunk/master makes SPEC 2017 INTrate 502.gcc_r 6% slower than
GCC 9 when run on an AMD Zen2-based CPU, and about 4.8% slower on
Intel Cascade Lake.

Looking at how the run-time of the benchmark evolved over the course
of the GCC 10 development cycle, the first and biggest regression (9%)
comes with:

  commit 2925cad2151842daa387950e62d989090e47c91d
  Author: Jan Hubicka <hubi...@ucw.cz>
  Date:   Thu Oct 3 17:08:21 2019 +0200

    params.def (PARAM_INLINE_HEURISTICS_HINT_PERCENT, [...]): New.

            * params.def (PARAM_INLINE_HEURISTICS_HINT_PERCENT,
            PARAM_INLINE_HEURISTICS_HINT_PERCENT_O2): New.
            * doc/invoke.texi (inline-heuristics-hint-percent,
            inline-heuristics-hint-percent-O2): Document.
            * tree-inline.c (inline_insns_single, inline_insns_auto): Add new
            hint attribute.
            (can_inline_edge_by_limits_p): Use it.

   From-SVN: r276516
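The ChangeLog only names the functions it touches. As a rough sketch, assuming the new hint attribute makes the limit helpers scale their base limit by the inline-heuristics-hint-percent value whenever an inlining heuristic signals that inlining is very profitable (roughly how the parameter is documented), the effect on a single inlining decision looks like this. The Python functions are simplified, hypothetical stand-ins for the C functions in the ChangeLog, and the numbers are placeholders, not GCC's defaults.

```python
# Hypothetical, simplified model of the "hint" attribute added by the commit
# above. Names mirror the ChangeLog; the bodies are assumptions, not GCC code.


def inline_insns_single(base_limit: int, hint: bool, hint_percent: int) -> int:
    """Return the callee size limit, scaled by hint_percent when a hint fires."""
    if hint:
        # Integer percentage scaling: 200 doubles the limit, 100 keeps it.
        return base_limit * hint_percent // 100
    return base_limit


def can_inline_edge_by_limits(growth: int, base_limit: int,
                              hint: bool, hint_percent: int) -> bool:
    """Accept an inline candidate if its estimated growth fits the limit."""
    return growth <= inline_insns_single(base_limit, hint, hint_percent)


# Placeholder numbers for illustration only, not GCC defaults.
print(can_inline_edge_by_limits(100, 70, hint=False, hint_percent=200))  # False
print(can_inline_edge_by_limits(100, 70, hint=True, hint_percent=200))   # True
```

In this model, changing either the percentage or how often hints fire changes which call sites pass the limit, so a parameter of this kind can shift code size and run time of a large, call-heavy program in either direction.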

Then, between Wed Nov 6 (72d6aeecd95) and Mon Nov 18 (58c036c8354), it
improved to about 103% of GCC 9 run-time (I did not find exactly what
caused it because in much of this range the compiler was segfaulting
in the LTO phase).  Eventually, the benchmark regressed to the current
106% of GCC 9 run-time with one of Honza's commits:

  - 9340d34599e Convert inliner to function specific param infrastructure, or
  - 1e83bd7003e Convert inliner to new param infrastructure.

The former cannot be built without the latter.

Symbol profiles are:

trunk (26b3e568a60):
  Overhead    Samples  Shared Object         Symbol
  ........  .........  ....................  ....................................

     4.04%      42371  cpugcc_r_peak.pgolto  bitmap_ior_into
     2.91%      30281  cpugcc_r_peak.pgolto  df_worklist_dataflow
     2.24%      23342  cpugcc_r_peak.pgolto  df_note_compute
     1.92%      20120  cpugcc_r_peak.pgolto  bitmap_set_bit
     1.75%      18148  cpugcc_r_peak.pgolto  rest_of_handle_fast_dce.lto_priv.0
     1.58%      16580  libc-2.31.so          __memset_avx2_unaligned_erms
     1.40%      14514  cpugcc_r_peak.pgolto  extract_new_fences_from.lto_priv.0
     1.39%      14732  libc-2.31.so          _int_malloc
     1.33%      13824  cpugcc_r_peak.pgolto  bitmap_copy
     1.24%      12962  cpugcc_r_peak.pgolto  bitmap_bit_p
     1.19%      12346  cpugcc_r_peak.pgolto  bitmap_and
     1.18%      12242  cpugcc_r_peak.pgolto  df_lr_local_compute.lto_priv.0
     1.02%      10618  cpugcc_r_peak.pgolto  cleanup_cfg.isra.0
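Each listing shows only its top symbols, and the percentages are relative to that run's own total, so the two tables cannot be compared percentage by percentage. A minimal sketch, assuming these are perf-report-style listings in which Overhead and Samples are roughly proportional, extrapolates an approximate total sample count for a profile from its listed rows. The columns are not exactly proportional (the `_int_malloc` row has more samples than the row above it but a lower percentage), so the result is only an estimate.

```python
# (Overhead %, Samples) for the trunk rows listed above.
TRUNK_ROWS = [
    (4.04, 42371), (2.91, 30281), (2.24, 23342), (1.92, 20120),
    (1.75, 18148), (1.58, 16580), (1.40, 14514), (1.39, 14732),
    (1.33, 13824), (1.24, 12962), (1.19, 12346), (1.18, 12242),
    (1.02, 10618),
]


def estimate_total_samples(rows):
    """Extrapolate the whole-profile sample count from the listed rows.

    Assumes the Overhead column is roughly proportional to Samples, which
    holds only approximately, so the result is an estimate.
    """
    listed_samples = sum(samples for _, samples in rows)
    listed_share = sum(pct for pct, _ in rows) / 100.0
    return listed_samples / listed_share


print(f"{estimate_total_samples(TRUNK_ROWS):,.0f}")
```

The same function can be applied to the GCC 9 rows; the ratio of the two estimates is a rough cross-check of the measured slowdown and is meaningful only if both profiles were recorded with the same event and sampling period.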


vs gcc 9 (releases/gcc-9.3.0):

  Overhead    Samples  Shared Object         Symbol
  ........  .........  ....................  .....................................

     6.81%      66967  cpugcc_r_peak.pgolto  df_worklist_dataflow
     2.83%      28063  cpugcc_r_peak.pgolto  bitmap_ior_into
     2.80%      27489  cpugcc_r_peak.pgolto  df_note_compute.lto_priv.0
     2.17%      21334  cpugcc_r_peak.pgolto  rest_of_handle_fast_dce.lto_priv.0
     1.69%      16671  libc-2.31.so          __memset_avx2_unaligned_erms
     1.51%      14876  cpugcc_r_peak.pgolto  try_optimize_cfg.lto_priv.0
     1.50%      14990  libc-2.31.so          _int_malloc
     1.50%      14715  cpugcc_r_peak.pgolto  extract_new_fences_from.lto_priv.0
     1.36%      13406  cpugcc_r_peak.pgolto  df_lr_local_compute.lto_priv.0
     1.20%      11926  cpugcc_r_peak.pgolto  remove_unused_locals
     1.06%      10433  cpugcc_r_peak.pgolto  sched_analyze_insn
     1.04%      10210  cpugcc_r_peak.pgolto  init_alias_analysis
     1.04%      10188  cpugcc_r_peak.pgolto  prescan_insns_for_dce.lto_priv.0
     1.00%       9876  cpugcc_r_peak.pgolto  compute_transp
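Symbol names differ between the two builds (for example `df_note_compute` vs. `df_note_compute.lto_priv.0`), so matching the tables first requires stripping the suffixes GCC adds to LTO-privatized statics and IPA clones. A minimal sketch that does this and reports per-function sample deltas for functions present in both lists. It assumes both profiles used the same event and sampling period so raw sample counts are comparable; the suffix list in the regular expression is an assumption covering the suffixes seen here plus two other common clone suffixes.

```python
import re
from collections import defaultdict

# (symbol, samples) pairs copied from the two listings above.
TRUNK = [
    ("bitmap_ior_into", 42371), ("df_worklist_dataflow", 30281),
    ("df_note_compute", 23342), ("bitmap_set_bit", 20120),
    ("rest_of_handle_fast_dce.lto_priv.0", 18148),
    ("__memset_avx2_unaligned_erms", 16580),
    ("extract_new_fences_from.lto_priv.0", 14514), ("_int_malloc", 14732),
    ("bitmap_copy", 13824), ("bitmap_bit_p", 12962), ("bitmap_and", 12346),
    ("df_lr_local_compute.lto_priv.0", 12242), ("cleanup_cfg.isra.0", 10618),
]
GCC9 = [
    ("df_worklist_dataflow", 66967), ("bitmap_ior_into", 28063),
    ("df_note_compute.lto_priv.0", 27489),
    ("rest_of_handle_fast_dce.lto_priv.0", 21334),
    ("__memset_avx2_unaligned_erms", 16671),
    ("try_optimize_cfg.lto_priv.0", 14876), ("_int_malloc", 14990),
    ("extract_new_fences_from.lto_priv.0", 14715),
    ("df_lr_local_compute.lto_priv.0", 13406),
    ("remove_unused_locals", 11926), ("sched_analyze_insn", 10433),
    ("init_alias_analysis", 10210),
    ("prescan_insns_for_dce.lto_priv.0", 10188), ("compute_transp", 9876),
]

# Strip trailing ".lto_priv.N", ".isra.N", ".constprop.N" or ".part.N".
SUFFIX_RE = re.compile(r"(\.(?:lto_priv|isra|constprop|part)\.\d+)+$")


def normalize(symbol: str) -> str:
    return SUFFIX_RE.sub("", symbol)


def by_function(rows):
    """Sum samples per normalized function name."""
    totals = defaultdict(int)
    for symbol, samples in rows:
        totals[normalize(symbol)] += samples
    return totals


def compare(new_rows, old_rows):
    new, old = by_function(new_rows), by_function(old_rows)
    common = new.keys() & old.keys()
    deltas = sorted(((s, new[s] - old[s]) for s in common),
                    key=lambda item: item[1], reverse=True)
    # Missing from a top-N list means "below the cutoff", not "zero samples".
    only_new = sorted(new.keys() - old.keys())
    only_old = sorted(old.keys() - new.keys())
    return deltas, only_new, only_old


deltas, only_trunk, only_gcc9 = compare(TRUNK, GCC9)
for symbol, delta in deltas:
    print(f"{symbol:32} {delta:+7d}")
print("top list of trunk only:", ", ".join(only_trunk))
print("top list of GCC 9 only:", ", ".join(only_gcc9))
```

Since the suspected commits change inlining, a function that shows up in only one list may simply have been inlined into its callers in the other build, so these deltas indicate where time moved rather than why.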


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)
