Hi,

this patch set addresses a number of shortcomings of IPA-CP when it
has profile feedback data at its disposal.  At this point it is mostly
RFC material, because I expect Honza will point out many places where
I use the wrong profile_count method and should be using a slightly
different one.  Nevertheless, I do want to turn it into something I
can push to master rather quickly.

Most of the changes were motivated by the SPEC 2017 exchange2
benchmark, which exposes the problems nicely: it is currently 22%
slower with profile feedback, and this patch set fixes that.  Overall,
the patch set does not have any effect on SPEC 2017 FPrate.  SPEC 2017
INTrate results, as quickly gathered on my znver2 desktop overnight (1
run only), follow.  The Trunk and Patch columns are run times in
seconds, so a negative % is an improvement.

PGO only:

  | Benchmark       | Trunk | Rate | Patch |      % | Rate |
  |-----------------+-------+------+-------+--------+------|
  | 500.perlbench_r |   236 | 6.74 |   239 |  +1.27 | 6.67 |
  | 502.gcc_r       |   160 | 8.85 |   159 |  -0.62 | 8.89 |
  | 505.mcf_r       |   227 | 7.11 |   228 |  +0.44 | 7.08 |
  | 520.omnetpp_r   |   314 | 4.18 |   311 |  -0.96 | 4.21 |
  | 523.xalancbmk_r |   195 | 5.41 |   199 |  +2.05 | 5.32 |
  | 525.x264_r      |   129 | 13.6 |   131 |  +1.55 | 13.4 |
  | 531.deepsjeng_r |   230 | 4.98 |   230 |  +0.00 | 4.98 |
  | 541.leela_r     |   353 | 4.70 |   353 |  +0.00 | 4.69 |
  | 548.exchange2_r |   249 | 10.5 |   189 | -24.10 | 13.8 |
  | 557.xz_r        |   246 | 4.39 |   248 |  +0.81 | 4.36 |
  |-----------------+-------+------+-------+--------+------|
  | Geomean         |       | 6.53 |       |        | 6.68 |

I have re-run 523.xalancbmk_r and the regression seems to be noise.
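
For anyone who wants to double-check the numbers, the following sketch
(not part of the patch set) recomputes the % column and the Geomean
row of the PGO-only table.  It assumes that Trunk and Patch are run
times in seconds, that % is the relative change in run time, and that
Geomean is the geometric mean of the Rate column; the helper names are
made up for illustration.

```python
from math import prod

# (benchmark, trunk seconds, trunk rate, patched seconds, patched rate),
# copied from the PGO-only table.
PGO_ONLY = [
    ("500.perlbench_r", 236, 6.74, 239, 6.67),
    ("502.gcc_r",       160, 8.85, 159, 8.89),
    ("505.mcf_r",       227, 7.11, 228, 7.08),
    ("520.omnetpp_r",   314, 4.18, 311, 4.21),
    ("523.xalancbmk_r", 195, 5.41, 199, 5.32),
    ("525.x264_r",      129, 13.6, 131, 13.4),
    ("531.deepsjeng_r", 230, 4.98, 230, 4.98),
    ("541.leela_r",     353, 4.70, 353, 4.69),
    ("548.exchange2_r", 249, 10.5, 189, 13.8),
    ("557.xz_r",        246, 4.39, 248, 4.36),
]


def percent_change(trunk_s, patch_s):
    """Relative change in run time; negative means the patched build is faster."""
    return (patch_s - trunk_s) / trunk_s * 100.0


def geomean(values):
    """Geometric mean of a sequence of positive numbers."""
    values = list(values)
    return prod(values) ** (1.0 / len(values))


for name, trunk_s, _, patch_s, _ in PGO_ONLY:
    print(f"{name:16} {percent_change(trunk_s, patch_s):+7.2f}")

print(f"Geomean trunk: {geomean(row[2] for row in PGO_ONLY):.2f}")
print(f"Geomean patch: {geomean(row[4] for row in PGO_ONLY):.2f}")
```

The same two helpers apply unchanged to the PGO+LTO table.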

PGO+LTO:

| Benchmark       | Trunk | Rate | Patch |      % |  Rate |
|-----------------+-------+------+-------+--------+-------|
| 500.perlbench_r |   231 | 6.88 |   230 |  -0.43 |  6.93 |
| 502.gcc_r       |   149 | 9.51 |   149 |  +0.00 |  9.53 |
| 505.mcf_r       |   208 | 7.76 |   202 |  -2.88 |  7.98 |
| 520.omnetpp_r   |   282 | 4.64 |   282 |  +0.00 |  4.65 |
| 523.xalancbmk_r |   185 | 5.70 |   188 |  +1.62 |  5.63 |
| 525.x264_r      |   133 | 13.1 |   134 |  +0.75 | 13.00 |
| 531.deepsjeng_r |   190 | 6.04 |   185 |  -2.63 |  6.20 |
| 541.leela_r     |   298 | 5.56 |   298 |  +0.00 |  5.57 |
| 548.exchange2_r |   247 | 10.6 |   193 | -21.86 | 13.60 |
| 557.xz_r        |   250 | 4.32 |   251 |  +0.40 |  4.31 |
|-----------------+-------+------+-------+--------+-------|
| Geomean         |       | 6.97 |       |        |  7.18 |

I have re-run 531.deepsjeng_r and 505.mcf_r.  While the deepsjeng
improvement seems to be noise, the mcf one is consistent and even
explainable by more cloning of spec_qsort, which results from the
last patch and from saner updates of the counts of call graph edges
from these clones.

In both cases the exchange2 improvement is achieved by:

1) the second patch, which makes sure that IPA-CP creates a clone for
   the first value even though the non-recursive edge bringing the
   value is quite cold, because doing so enables specializing for much
   hotter contexts, and

2) the third patch, which changes how values resulting from arithmetic
   jump functions on self-recursive edges are evaluated and then
   modifies the profile count of the whole resulting call graph part.
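
As a rough illustration of why the first clone matters, here is a toy
model in Python.  It is only a sketch and does not mirror the actual
IPA-CP data structures or heuristics: the function name, the
clone_seed flag and the max_depth limit are all made up.  It assumes a
self-recursive function whose recursive call passes the parameter
through an arithmetic jump function (here v + 1), so every specialized
value is derived from the one before it, starting from the value
brought in by the non-recursive edge.

```python
def specialized_values(seed, jump_function, max_depth, clone_seed):
    """Return the constants a chain of recursive clones would be specialized for.

    seed          -- constant brought in by the (cold) non-recursive edge
    jump_function -- arithmetic applied to the value on the self-recursive edge
    max_depth     -- hypothetical cap on the length of the clone chain
    clone_seed    -- whether a clone is created for the seed value at all
    """
    if not clone_seed:
        # Without a clone for the first value there is nothing to derive
        # the later, hotter values from.
        return []
    values = [seed]
    while len(values) < max_depth:
        values.append(jump_function(values[-1]))
    return values


def plus_one(value):
    """Example arithmetic jump function: the recursive call passes value + 1."""
    return value + 1


print(specialized_values(1, plus_one, 8, clone_seed=True))
print(specialized_values(1, plus_one, 8, clone_seed=False))
```

In this model, refusing to clone for the cold seed value leaves the
whole chain unspecialized, however hot the recursive calls are.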

The final patch is not necessary to address the exchange2 regression.

I have bootstrapped and LTO-profile-bootstrapped and tested the whole
patch series on x86_64-linux without any issues.  As written above,
I'll be happy to address any comments/concerns so that something like
this can be pushed to master soon.

Thanks,

Martin


Martin Jambor (4):
  cgraph: Do not warn about caller count mismatches of removed functions
  ipa-cp: Propagation boost for recursion generated values
  ipa-cp: Fix updating of profile counts and self-gen value evaluation
  ipa-cp: Select saner profile count to base heuristics on

 gcc/cgraph.c   |   4 +-
 gcc/ipa-cp.c   | 786 ++++++++++++++++++++++++++++++++++++++-----------
 gcc/params.opt |   8 +
 3 files changed, 621 insertions(+), 177 deletions(-)

-- 
2.32.0
