Re: [PATCH 2/2] Use dominators to reduce ranger cache-flling.

Andrew MacLeod via Gcc-patches Mon, 06 Dec 2021 10:39:27 -0800

On 12/6/21 02:27, Richard Biener wrote:

On Fri, Dec 3, 2021 at 9:42 PM Andrew MacLeod via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:

When a request is made for the range of an ssa_name at some location,
the first thing we do is invoke range_of_stmt() to ensure we have looked
at the definition and have an evaluation for the name at a global
level.  I recently added a patch which dramatically reduces the call
stack requirements for that call.


Once we have confirmed the definition range has been set, a call is made
for the range-on-entry to the block of the use.  This is performed by
the cache, which proceeds to walk the CFG predecessors looking for when
ranges are created  (exported), existing range-on-entry cache hits,  or
the definition block. Once this list of  predecessors has been created,
a forward walk is done, pushing the range's through successor edges of
all the blocks  were visited in the initial walk.

If the use is far from the definition, we end up filling a lot of the
same value on these paths.  Also uses which are far from a
range-modifying statement push the same value into a lot of blocks.

This patch tries to address at least some inefficiencies.  It recognizes
that

First, if there is no range modifying stmt between this use and the last
range we saw in a dominating block, we can just use the value from the
dominating block and not fill in all the cache entries between here and
there.  This is the biggest win.

Second. if there is a range modifying statement at the end of some
block, we will have to do the appropriate cache walk to this point, but
its possible the range-on-entry to THAT block might be able to use a
dominating range, and we can prevent the walk from going any further
than this block

Combined, this should prevent a lot of unnecessary ranges from being
plugging into the cache.

ie, to visualize:

bb4:
    a = foo()
<..>
bb60:
     if (a < 30)
<...>
bb110:
      g = a + 10

if the first place we ask for a is in bb110, we walk the CFG from 110
all the way back to bb4, on all paths leading back. then fill all those
cache entries.
With this patch,
    a) if bb60 does not dominate bb110, the request will scan the
dominators, arrive at the definition block, have seen no range modifier,
and simply set the on-entry for 110 to the range of a. done.
    b) if bb60 does dominate 110, we have no idea which edge out of 60
dominates it, so we will revert to he existing cache algorithm.  Before
doing so, it checks and determines that there are no modifiers between
bb60 and the def in bb4, and so sets the on-entry cache for bb60 to be
the range of a.   Now when we do the cache fill walk, it only has to go
back as far as bb60 instead of all the way to bb4.

Otherwise we just revert to what we do now (or if dominators are not
available).   I have yet to see a case where we miss something we use to
get, but that does not mean there isn't one :-).

The cumulative performance impact of this compiling a set of 390 GCC
source files at -O2 (measured via callgrind) is pretty decent:  Negative
numbers are a compile time decrease.  Thus -10% is 10% faster
compilation time.

EVRP     : %change from trunk is -26.31% (!)
VRP2     : %change from trunk is -9.53%
thread_jumps_full   : %change from trunk is -15.8%
Total compilation time  : %change from trunk is -1.06%

So its not insignificant.

Risk would be very low, unless dominators are screwed up mid-pass.. but
then the relation code would be screwed up too.

Bootstrapped on  x86_64-pc-linux-gnu with no regressions. OK?

OK.


Committed.


Wow - I didn't realize we have this backwards CFG walk w/o dominators ...
I wonder if you can add a counter to visualize places we end up using this path.

Well, its only does the fill now when there is range info located on anoutgoing edge of the dominator. Its still used, just on a much smallerportion of the graph.


We could do even better if we knew whether one edge was involved. ie

bb3:
a = foo()
if (a > 4)
   blah1;       bb4
else
   blah2;       bb5
bb6:
 = a;

The use of a in bb6 will look to bb3 as the dominator, but if it knewthat both edges are used in that dominance relation (ie, neitheroutgoing edge dominates bb6), it wouldn't have to calculate a range for'a'.


But as it stands, it cant really tell the above situation from:

bb3:
a = foo()
if (a > 4)
    = a        bb4

In this case, only the one edge is used, and we DO need to calculate arange for a. The edge from 3->4 does dominate bb4

In both cases, bb3 is the dominator, and bb3 can generate an outgoingrange, but only in one case do we need to calculate a range.

What would be really useful would be to be able to tell if an edgedominates a block :-)

I have thoughts on this for next release, and may overhaul the cache...but I don't think there is any such facility in the dominator subsystem?


Andrew

Re: [PATCH 2/2] Use dominators to reduce ranger cache-flling.

Reply via email to