Hi,

this patch set addresses a number of shortcomings of IPA-CP when it has profile feedback data at its disposal. At this point it is mostly RFC material, because I expect Honza will correct many of the places where I use the wrong profile_count method and should be using a slightly different one. Still, I do want to turn it into material I can push to master rather quickly.

Most of the changes were motivated by the SPEC 2017 exchange2 benchmark, which exposes the problems nicely: it is now 22% slower with profile feedback, and this patch set fixes that. Overall, the patch set does not have any effect on SPEC 2017 FPrate. SPEC 2017 INTrate results, as quickly gathered on my znver2 desktop overnight (1 run only), are:

PGO only:

| Benchmark       | Trunk | Rate | Patch |      % | Rate |
|-----------------+-------+------+-------+--------+------|
| 500.perlbench_r |   236 | 6.74 |   239 |  +1.27 | 6.67 |
| 502.gcc_r       |   160 | 8.85 |   159 |  -0.62 | 8.89 |
| 505.mcf_r       |   227 | 7.11 |   228 |  +0.44 | 7.08 |
| 520.omnetpp_r   |   314 | 4.18 |   311 |  -0.96 | 4.21 |
| 523.xalancbmk_r |   195 | 5.41 |   199 |  +2.05 | 5.32 |
| 525.x264_r      |   129 | 13.6 |   131 |  +1.55 | 13.4 |
| 531.deepsjeng_r |   230 | 4.98 |   230 |  +0.00 | 4.98 |
| 541.leela_r     |   353 | 4.70 |   353 |  +0.00 | 4.69 |
| 548.exchange2_r |   249 | 10.5 |   189 | -24.10 | 13.8 |
| 557.xz_r        |   246 | 4.39 |   248 |  +0.81 | 4.36 |
|-----------------+-------+------+-------+--------+------|
| Geomean         |       | 6.53 |       |        | 6.68 |

I have re-run 523.xalancbmk_r and the regression seems to be noise.
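
As a reading aid for the tables: the % column appears to be the relative change in run time from Trunk to Patch, so a negative value means the patched compiler produced faster code. The following is only a minimal sketch under that assumption (Trunk and Patch taken to be run times in seconds); the function name is made up for illustration and is not part of any SPEC tooling.

```c
#include <assert.h>

/* Hypothetical helper: relative run-time change in percent.
   Assumes both arguments are run times in seconds, as the Trunk and
   Patch columns appear to be.  Negative result = patched code is faster.  */
double
time_change_percent (double trunk_seconds, double patched_seconds)
{
  assert (trunk_seconds > 0.0);
  return (patched_seconds - trunk_seconds) / trunk_seconds * 100.0;
}
```

For the PGO exchange2 row, time_change_percent (249, 189) comes out at roughly -24.1, which matches the table.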

PGO+LTO:

| Benchmark       | Trunk | Rate | Patch |      % |  Rate |
|-----------------+-------+------+-------+--------+-------|
| 500.perlbench_r |   231 | 6.88 |   230 |  -0.43 |  6.93 |
| 502.gcc_r       |   149 | 9.51 |   149 |  +0.00 |  9.53 |
| 505.mcf_r       |   208 | 7.76 |   202 |  -2.88 |  7.98 |
| 520.omnetpp_r   |   282 | 4.64 |   282 |  +0.00 |  4.65 |
| 523.xalancbmk_r |   185 | 5.70 |   188 |  +1.62 |  5.63 |
| 525.x264_r      |   133 | 13.1 |   134 |  +0.75 | 13.00 |
| 531.deepsjeng_r |   190 | 6.04 |   185 |  -2.63 |  6.20 |
| 541.leela_r     |   298 | 5.56 |   298 |  +0.00 |  5.57 |
| 548.exchange2_r |   247 | 10.6 |   193 | -21.86 | 13.60 |
| 557.xz_r        |   250 | 4.32 |   251 |  +0.40 |  4.31 |
|-----------------+-------+------+-------+--------+-------|
| Geomean         |       | 6.97 |       |        |  7.18 |

I have re-run 531.deepsjeng_r and 505.mcf_r and while the former improvement seems to be noise, the latter is consistent and even explainable by more cloning of spec_qsort, which is the result of the last patch and saner updates of counts of call graph edges from these clones.

In both cases the exchange2 improvement is achieved by:

1) The second patch which makes sure that IPA-CP creates a clone for the first value, even though the non-recursive edge bringing the value is quite cold, because it enables specializing for much hotter contexts, and

2) the third patch which changes how values resulting from arithmetic jump functions on self-recursive edges are evaluated and then modifies the profile count of the whole resulting call graph part.

The final patch is not necessary to address the exchange2 regression.

I have bootstrapped and LTO-profile-bootstrapped and tested the whole patch series on x86_64-linux without any issues. As written above, I'll be happy to address any comments/concerns so that something like this can be pushed to master soon.
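
For readers less familiar with the pattern behind points 1) and 2), here is a minimal sketch of the kind of code involved. It is a made-up C example, not taken from exchange2 (which is Fortran) or from the patches, and all names in it are hypothetical. The only non-recursive call passes the constant 1 and is executed rarely, while the self-recursive call passes depth + 1, an arithmetic jump function from which the values 2, 3, ... can be derived once a clone for 1 exists.

```c
#include <assert.h>

/* Hypothetical illustration only; names and sizes are assumptions.  */
#define MAX_DEPTH 9

static int
visit (int depth, const int *weights)
{
  assert (depth >= 1 && depth <= MAX_DEPTH);
  int sum = weights[depth - 1];
  if (depth < MAX_DEPTH)
    /* Self-recursive edge: the argument is "depth + 1", i.e. an
       arithmetic jump function of the caller's own parameter.  */
    sum += visit (depth + 1, weights);
  return sum;
}

int
sum_weights (const int *weights)
{
  /* The only non-recursive edge.  It brings the first value (1) and is
     cold compared to the recursive calls it eventually triggers.  */
  return visit (1, weights);
}
```

In such code nearly all of the execution count sits on the recursive edge, which is presumably why a heuristic looking only at the cold edge bringing the value 1 would decline to clone.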

Thanks,

Martin


Martin Jambor (4):
  cgraph: Do not warn about caller count mismatches of removed functions
  ipa-cp: Propagation boost for recursion generated values
  ipa-cp: Fix updating of profile counts and self-gen value evaluation
  ipa-cp: Select saner profile count to base heuristics on

 gcc/cgraph.c   |   4 +-
 gcc/ipa-cp.c   | 786 ++++++++++++++++++++++++++++++++++++++-----------
 gcc/params.opt |   8 +
 3 files changed, 621 insertions(+), 177 deletions(-)

-- 
2.32.0