https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102943

--- Comment #47 from Andrew Macleod <amacleod at redhat dot com> ---
Created attachment 52637
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52637&action=edit
new patch

I am working on an alternative cache for GCC 13, but along the way, I have
made changes to the ranger_cache::range_from_dom() routine.  The original
version gave up when it hit a block which had outgoing edges. The new version
is smarter and basically goes back until it finds a cache entry, and then
intersects all outgoing edges between the two places. It also removes the
recursion, and does not SET any cache values during the lookup (making it a
true query).
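
To illustrate the idea only (this is not the code in the attached patch), here
is a minimal self-contained sketch of an iterative, read-only dominator walk.
Everything in it is a simplifying assumption: Range is a toy integer interval
rather than GCC's range class, Block and its fields are hypothetical, and each
block carries at most one restriction from the edge coming in from its
immediate dominator.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <optional>
#include <vector>

// Toy closed integer interval standing in for a value range (assumption).
struct Range {
    int lo;
    int hi;
};

// Intersection of two intervals; the sketch assumes they overlap.
Range intersect(Range a, Range b) {
    return {std::max(a.lo, b.lo), std::min(a.hi, b.hi)};
}

// Hypothetical basic block for illustration only.
struct Block {
    int idom;                      // index of immediate dominator, -1 for entry
    std::optional<Range> cached;   // existing cache entry, if any
    std::optional<Range> edge_in;  // range implied by the edge from idom, if any
};

// Walk up the dominator chain from bb until a cache entry is found,
// intersecting the edge restrictions seen along the way.  The CFG is taken
// by const reference, so the lookup cannot write to the cache, and the walk
// is a plain loop rather than recursion.
Range range_from_dom_sketch(const std::vector<Block>& cfg, int bb) {
    Range result{INT_MIN, INT_MAX};  // start with no restriction
    for (int cur = bb; cur != -1; cur = cfg[cur].idom) {
        assert(cur >= 0 && cur < static_cast<int>(cfg.size()));
        if (cfg[cur].cached) {
            // Cache entry found: combine it with what the edges told us.
            return intersect(result, *cfg[cur].cached);
        }
        if (cfg[cur].edge_in) {
            result = intersect(result, *cfg[cur].edge_in);
        }
    }
    return result;  // no cache entry anywhere on the dominator chain
}
```

With a chain of four blocks where only the entry block has a cached range of
[0, 100], and the next two edges imply x >= 10 and x <= 50, a query at the
last block yields [10, 50] without touching the cache.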

The net effect of this is a significant improvement in cache performance
because it's used far less, but there is more time spent doing calculations.
This bootstraps and passes all regression tests.  We do miss out on a few minor
opportunities (30 out of 4400 in all of EVRP over the GCC source) which occur
as a result of updated values not being propagated properly as the cache is no
longer "full" like it was before.

In GCC 13 I will address this, but I thought you might be interested in trying
this patch against this PR.

In building 380 GCC source files, I see the following average changes in
compile time (negative is faster):
evrp : -22.57%
VRP2 : -5.4%
thread_jumps_full : -14.16%
total : -0.44%

So it is not insignificant.

It is likely to be most effective in large CFGs.
These are the *total* compile time percentage speedups for the 5 most
significant cases:

expr.ii  -2.62%
lra-constraints.ii -3.75%
caller-save.ii -3.98%
reload.ii -4.04%
optabs.ii -5.05%

EVRP isolated speedups (yes, these are *percentage* speedups):
expr.ii             -62.38
simplify-rtx.ii     -65.97
lra-constraints.ii  -67.87
reload.ii           -68.67
caller-save.ii      -71.93
optabs.ii           -78.69

I think those times are probably worth the odd miss.

Anyway, next time you are checking performance for this PR maybe also try this
patch and see how it performs.
