https://gcc.gnu.org/bugzilla/show_bug.cgi?id=125666

            Bug ID: 125666
           Summary: inline-unit-growth param not aggressive enough for
                    SPEC2026
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: ipa
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ktkachov at gcc dot gnu.org
                CC: hubicka at gcc dot gnu.org
  Target Milestone: ---

I've been investigating cases where SPEC2026 benchmarks compiled with GCC are
slower than with LLVM at -Ofast -flto. A few of them seem to be limited by
the inliner's aggressiveness.
Setting --param inline-unit-growth to 100 rather than the default 40, I get
speedups on:
723.llvm_r +5.1%
766.femflow +11.3%
767.nest +3.7%
748.flightdm +3.9%
760.rocksdb +2.6%
734.vpr +1.7%

I think the problem is that the global inline-unit-growth budget is a hard cap:
once overall_size + growth exceeds it, inline_small_functions vetoes
every remaining edge with CIF_INLINE_UNIT_GROWTH_LIMIT -- including edges that
the per-call cost model (want_inline_small_function_p, checked immediately
after) has already approved.  On large LTO C++ programs this blocks a broad
tier of small, ubiquitous helper inlines (constructors, destructors,
accessors, shared_ptr reference counting).
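
To make the mechanism concrete, here is a toy model of a hard unit-growth cap.
It is only a sketch, not GCC's code: the Edge struct, count_vetoed_by_cap and
the budget formula (initial size scaled by 100 + percent) are assumptions made
for illustration. Only the "overall_size + growth exceeds the budget" test is
taken from the description above.

```cpp
#include <vector>

// Hypothetical stand-in for a call edge considered by the inliner.
struct Edge {
  int growth;             // estimated unit size growth if this edge is inlined
  bool cost_model_wants;  // verdict of the per-call cost model
};

// Toy model of a hard cap. Returns how many edges were approved by the
// per-call model but still rejected because of the unit growth budget.
int count_vetoed_by_cap(const std::vector<Edge> &edges, long initial_size,
                        int unit_growth_percent) {
  // Assumed budget: the initial unit size scaled by (100 + percent) / 100.
  const long budget = initial_size * (100 + unit_growth_percent) / 100;
  long overall_size = initial_size;
  int vetoed = 0;
  for (const Edge &e : edges) {
    // The global check comes first, so the per-call verdict is never used
    // once the budget would be exceeded.
    if (overall_size + e.growth > budget) {
      if (e.cost_model_wants)
        ++vetoed;
      continue;
    }
    if (e.cost_model_wants)
      overall_size += e.growth;
  }
  return vetoed;
}
```

In this model, ten approved edges of growth 10 in a unit of size 100 lose six
inlines at 40% growth and none at 100%.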

I'm not sure that raising inline-unit-growth unilaterally is the best fix,
as it would bloat code size quite a lot (though maybe that's acceptable for
some aggressively OOO -mcpu settings at -O3?). Perhaps the global limit could
instead be a soft cap that lets the per-call model decide?
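
One possible reading of a soft cap, purely as an illustration: below a soft
budget every approved candidate is inlined, between the soft and a hard budget
only candidates with a high per-call score get through, and above the hard
budget nothing does. Candidate, the benefit score, strict_threshold and both
budgets are hypothetical names, not existing GCC params or APIs.

```cpp
#include <vector>

// Hypothetical inline candidate that the per-call model already approved.
struct Candidate {
  int growth;   // estimated unit size growth if inlined
  int benefit;  // hypothetical per-call score, higher is better
};

// Toy soft-cap policy. Returns the final unit size after walking the
// candidates in order.
long inline_with_soft_cap(const std::vector<Candidate> &cands,
                          long initial_size, long soft_budget,
                          long hard_budget, int strict_threshold) {
  long size = initial_size;
  for (const Candidate &c : cands) {
    const long next = size + c.growth;
    if (next > hard_budget)
      continue;  // absolute ceiling: always reject
    if (next > soft_budget && c.benefit < strict_threshold)
      continue;  // past the soft cap: keep only high-benefit candidates
    size = next;
  }
  return size;
}
```

Whether a stricter threshold, a scaled badness or something else is the right
knob past the soft cap is an open question; the sketch only shows the shape of
the idea.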
