https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90595

            Bug ID: 90595
           Summary: LRA liveness analysis is slow
           Product: gcc
           Version: 10.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

Split out from PR88440.  As of today you'll see a big compile-time increase
for compiling module_configure.fppized.f90 of 521.wrf_r at -O2.

Time profile before/after:

╔══════════════════════════╤════════╤════════╤═════════╗
║ PASS                     │ Before │ After  │ Change  ║
╠══════════════════════════╪════════╪════════╪═════════╣
║ backwards jump threading │ 6.29   │ 6.16   │ 97.93%  ║
║ integrated RA            │ 6.76   │ 6.41   │ 94.82%  ║
║ tree SSA incremental     │ 9.01   │ 11.16  │ 123.86% ║
║ LRA create live ranges   │ 15.68  │ 40.02  │ 255.23% ║
║ PRE                      │ 23.24  │ 32.32  │ 139.07% ║
║ alias stmt walking       │ 27.69  │ 28.75  │ 103.83% ║
║ phase opt and generate   │ 124.13 │ 163.95 │ 132.08% ║
║ TOTAL                    │ 125.39 │ 165.17 │ 131.73% ║
╚══════════════════════════╧════════╧════════╧═════════╝

So "LRA create live ranges" was already slow before the change.  A perf
profile taken after the change shows

Samples: 579  of event 'cycles:ppp', Event count (approx.): 257134187434191     
Overhead  Command  Shared Object     Symbol                                     
  22.26%  f951     f951              [.] process_bb_lives
  15.06%  f951     f951              [.] ix86_hard_regno_call_part_clobbered
   8.55%  f951     f951              [.] concat
   6.88%  f951     f951              [.] find_base_term
   3.60%  f951     f951              [.] get_ref_base_and_extent
   3.27%  f951     f951              [.] find_base_term
   2.95%  f951     f951              [.] make_hard_regno_dead

which IMHO points at

static inline void
check_pseudos_live_through_calls (int regno,
                                  HARD_REG_SET last_call_used_reg_set,
                                  rtx_insn *call_insn)
{
...
  for (hr = 0; HARD_REGISTER_NUM_P (hr); hr++)
    if (targetm.hard_regno_call_part_clobbered (call_insn, hr,
                                                PSEUDO_REGNO_MODE (regno)))
      add_to_hard_reg_set (&lra_reg_info[regno].conflict_hard_regs,
                           PSEUDO_REGNO_MODE (regno), hr);

where we do a lot of redundant work: this function is called in a loop,
possibly very many times, with the same arguments apart from regno (and with
the same PSEUDO_REGNO_MODE), so the walk over all hard registers keeps
recomputing the same target hook results.
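
One possible way to avoid the redundancy would be to compute the set of
partially clobbered hard registers once per (call, mode) pair and reuse it
for every pseudo.  The following is only a standalone toy model of that
idea, not a patch against lra-lives.c: toy_part_clobbered, the fixed
register/mode counts, the integer call_id and the 64-bit mask are all made-up
stand-ins, and unlike add_to_hard_reg_set it records only a single bit per
hard register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NUM_HARD_REGS 64    /* hypothetical; fits one uint64_t mask */
#define NUM_MODES 8         /* hypothetical number of machine modes */
#define NUM_PSEUDOS 1000    /* hypothetical number of pseudos */

static unsigned long hook_calls;   /* counts invocations of the toy hook */

/* Made-up stand-in for targetm.hard_regno_call_part_clobbered: pretend
   that "wide" modes (>= 4) are partially clobbered in the upper half of
   the hard register file.  */
static bool
toy_part_clobbered (int call_id, unsigned hr, unsigned mode)
{
  (void) call_id;
  hook_calls++;
  return mode >= 4 && hr >= NUM_HARD_REGS / 2;
}

static unsigned pseudo_mode[NUM_PSEUDOS];          /* mode of each pseudo */
static uint64_t conflict_hard_regs[NUM_PSEUDOS];   /* one bit per hard reg */

/* Shape of the loop quoted above: every pseudo walks all hard regs.  */
static void
check_pseudo_naive (int regno, int call_id)
{
  for (unsigned hr = 0; hr < NUM_HARD_REGS; hr++)
    if (toy_part_clobbered (call_id, hr, pseudo_mode[regno]))
      conflict_hard_regs[regno] |= (uint64_t) 1 << hr;
}

/* Per-call cache: one mask per mode, invalidated when the call changes.  */
static int cached_call_id = -1;
static bool cache_valid[NUM_MODES];
static uint64_t cache_mask[NUM_MODES];

static void
check_pseudo_cached (int regno, int call_id)
{
  unsigned mode = pseudo_mode[regno];
  assert (mode < NUM_MODES);

  if (call_id != cached_call_id)
    {
      /* New call insn: forget the masks computed for the previous one.  */
      memset (cache_valid, 0, sizeof cache_valid);
      cached_call_id = call_id;
    }

  if (!cache_valid[mode])
    {
      /* First pseudo with this mode for this call: query the hook once
         per hard register and remember the result.  */
      uint64_t mask = 0;
      for (unsigned hr = 0; hr < NUM_HARD_REGS; hr++)
        if (toy_part_clobbered (call_id, hr, mode))
          mask |= (uint64_t) 1 << hr;
      cache_mask[mode] = mask;
      cache_valid[mode] = true;
    }

  /* Every later pseudo with the same mode just ORs in the cached mask.  */
  conflict_hard_regs[regno] |= cache_mask[mode];
}
```

In this model, the hook is queried once per pseudo per hard register in the
naive variant, but only once per distinct mode per hard register in the
cached variant, while the resulting conflict sets are the same.  Whether a
cache like this is the right fix for LRA itself is a separate question.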
