https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114531
Bug ID: 114531
Summary: Feature proposal for an `-finline-functions-aggressive` compiler option
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: driver
Assignee: unassigned at gcc dot gnu.org
Reporter: rvmallad at amazon dot com
CC: rsandifo at gcc dot gnu.org
Target Milestone: ---

Created attachment 57837
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57837&action=edit
patch to implement -finline-functions-aggressive option in GCC

This is a proposal for a user-visible GCC compiler option for aggressive
inlining. The behaviour is currently only available at -O3, as internal inline
parameters:

  --param=early-inlining-insns=14
  --param=inline-heuristics-hint-percent=600
  --param=inline-min-speedup=15
  --param=max-inline-insns-auto=30
  --param=max-inline-insns-single=200

I collected perf data for Envoy (https://github.com/envoyproxy/envoy) and the
SPEC CPU2017 intrate benchmarks on C7g.2xlarge with Ubuntu22 + gcc-11.4.0.
We see perf gains (2% - 5%) when using these aggressive inline parameters at
-O2. Attached is a patch for this change.

We do not want to add these inline limits to '-O2' itself, because one of the
SPEC CPU tests got slower with them. Also, more inline tuning at -O2 would make
some symbols unavailable for probing/debugging (symbols that are available
when these aggressive inline params are not used).

-----------------------------------------------------------------------
Envoy load_balancer_benchmark, using only 1 CPU (iterations, higher is better)

$ bazel run -c opt //test/common/upstream:load_balancer_benchmark

bazel-envoy/external/local_config_cc/BUILD can be edited to add the inline
parameters/options.
---------------------------------------------------------------------------------------------
                                                          Iterations
Benchmark                                                 Baseline  O2 + Inline Params   Gain
---------------------------------------------------------------------------------------------
benchmarkRoundRobinLoadBalancerBuild/500/50/50                1518                1596  1.05x
benchmarkLeastRequestLoadBalancerChooseHost/100/3/1000        1465                1514  1.03x
benchmarkRingHashLoadBalancerChooseHost/100/65536/100000        33                  34  1.03x
benchmarkMaglevLoadBalancerWeighted/500/95/75/25/10000          69                  72  1.04x
---------------------------------------------------------------------------------------------

copies=8           "-O2"  "-Ofast"    Gain w      "-O2 +    Gain w
                                       Ofast   inlining"  inlining
500.perlbench_r     36.5      34.3     94.0%        34.4     94.2%
502.gcc_r           45.4      47.6    104.8%        47.5    104.6%
505.mcf_r           44.6      48.2    108.1%        44.3     99.3%
520.omnetpp_r       22.1      24.9    112.7%        21.9     99.1%
523.xalancbmk_r     43.8      46.3    105.7%        45.4    103.7%
525.x264_r          44.3        89    200.9%        43.8     98.9%
531.deepsjeng_r       36      37.3    103.6%        37.5    104.2%
541.leela_r         33.5      33.9    101.2%        34.2    102.1%
548.exchange2_r     65.4      76.6    117.1%        65.3     99.8%
557.xz_r            19.8      19.9    100.5%        19.9    100.5%
SPECrate..base      37.1      41.6    112.1%        37.3    100.5%
------------------------------------------------------------------
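
For anyone who wants to re-check the "Gain" columns of the SPEC table, here is
a minimal sketch in Python. It assumes each gain is simply the score divided by
the "-O2" score, expressed as a percentage and rounded to one decimal. The
names SPEC_SCORES, gain_percent and gains are hypothetical helpers for this
illustration and are not part of the attached patch.

```python
# Scores copied from the SPEC CPU2017 intrate table in this report (copies=8).
# Tuple layout (an assumption of this sketch): (-O2, -Ofast, -O2 + inlining).
SPEC_SCORES = {
    "500.perlbench_r": (36.5, 34.3, 34.4),
    "502.gcc_r":       (45.4, 47.6, 47.5),
    "505.mcf_r":       (44.6, 48.2, 44.3),
    "520.omnetpp_r":   (22.1, 24.9, 21.9),
    "523.xalancbmk_r": (43.8, 46.3, 45.4),
    "525.x264_r":      (44.3, 89.0, 43.8),
    "531.deepsjeng_r": (36.0, 37.3, 37.5),
    "541.leela_r":     (33.5, 33.9, 34.2),
    "548.exchange2_r": (65.4, 76.6, 65.3),
    "557.xz_r":        (19.8, 19.9, 19.9),
    "SPECrate..base":  (37.1, 41.6, 37.3),
}


def gain_percent(score, baseline):
    """Return score relative to baseline as a percentage, one decimal place."""
    return round(100.0 * score / baseline, 1)


def gains(scores):
    """Map each benchmark to (gain with -Ofast, gain with -O2 + inlining)."""
    return {
        name: (gain_percent(ofast, o2), gain_percent(inlining, o2))
        for name, (o2, ofast, inlining) in scores.items()
    }


if __name__ == "__main__":
    # Print one row per benchmark, mirroring the two gain columns of the table.
    for name, (g_ofast, g_inline) in gains(SPEC_SCORES).items():
        print(f"{name:<18}{g_ofast:>8.1f}%{g_inline:>8.1f}%")
```

Under that assumption the computed percentages should match the table, for
example 200.9% and 98.9% for 525.x264_r.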