On 6/4/19 11:26 AM, Richard Biener wrote:
On Fri, May 31, 2019 at 5:40 PM Andrew MacLeod <amacl...@redhat.com> wrote:
On 5/29/19 7:15 AM, Richard Biener wrote:
On Tue, May 28, 2019 at 4:17 PM Andrew MacLeod <amacl...@redhat.com> wrote:
On 5/27/19 9:02 AM, Richard Biener wrote:
On Fri, May 24, 2019 at 5:50 PM Andrew MacLeod <amacl...@redhat.com> wrote:
The above suggests that iff this is done at all it is not in GORI because
those are not conditional stmts or ranges from feeding those.  The
machinery doing the use-def walking from stmt context also cannot
come along these so I have the suspicion that Ranger cannot handle
telling us that for the stmt following above, for example

     if (_5 != 0)

that _5 is not zero?

Can you clarify?
So there are 2 aspects to this. The range-ops code for DIV_EXPR, if
asked for the range of op2 (), would return ~[0,0] for _5.
But you are also correct in that the walk backwards would not find this.

This is similar functionality to how null_derefs are currently handled,
and in fact could probably be done simultaneously using the same code
base.   I didn't bring null derefs up, but this is a good time :-)

There is a separate class used by the gori-cache which tracks the
non-nullness property at the block level.    It has a single API:
non_null_deref_p (name, bb), which determines whether there is a
dereference of NAME in a given BB, which indicates whether the range has an
implicit ~[0,0] in that basic block or not.
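To make the shape of that API concrete, here is a toy sketch of a block-level tracker (purely illustrative; the class and member names are invented, and this is not the actual GCC implementation):

```python
# Hypothetical sketch of a per-block non-null tracker, modeled on the
# non_null_deref_p (name, bb) API described above.  Not GCC code.

class NonNullTracker:
    def __init__(self):
        # map: basic-block id -> set of SSA names dereferenced in that block
        self.derefs_in_bb = {}

    def record_deref(self, name, bb):
        # called while scanning stmts: "*_2" registers _2 as dereferenced in bb
        self.derefs_in_bb.setdefault(bb, set()).add(name)

    def non_null_deref_p(self, name, bb):
        # true if NAME is dereferenced somewhere in BB, i.e. it carries an
        # implicit ~[0,0] range in that block
        return name in self.derefs_in_bb.get(bb, set())

tracker = NonNullTracker()
tracker.record_deref("_2", bb=2)     # e.g. from  _1 = *_2  in BB 2
print(tracker.non_null_deref_p("_2", 2))  # True
print(tracker.non_null_deref_p("_3", 2))  # False
```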
So when we then have

    _1 = *_2; // after this _2 is non-NULL
    _3 = _1 + 1; // _3 is non-NULL
    _4 = *_3;
...

when an on-demand user asks whether _3 is non-NULL at the
point of _4 = *_3 we don't have this information?  Since the
per-BB caching will only say _1 is non-NULL after the BB.
I'm also not sure whether _3 ever gets non-NULL during
non-NULL processing of the block since walking immediate uses
doesn't really help here?
presumably _3 is globally non-null due to the definition being (pointer
+ x)  ... ie, _3 has a global range of ~[0,0] ?
No, _3 is ~[0, 0] because it is derived from _1 which is ~[0, 0] and
you cannot arrive at NULL by pointer arithmetic from a non-NULL pointer.
I'm confused.

_1 was loaded from _2 (thus asserting _2 is non-NULL), but we have no
idea what the range of _1 is, so how do you assert _1 is ~[0,0] ?
Bug on my part - it should offset _2:

  _1 = *_2; // after this _2 is non-NULL
  _3 = _2 + 1; // _3 is non-NULL
  _4 = *_3;

The only way I see to determine _3 is non-NULL  is through the _4 = *_3
statement.
So how do you derive a ~[0,0] _2 and thus know that *_3 does not trap?

Well, we know _2 is non-null on exit from the block. We wind backwards, so we know it's non-null when we get to _3 = _2 + 1. But as I will say later in this note, and in the earlier discussion with Jeff in this thread, the non-null property needs some work, as I can rewrite this example to make it wrong. :-P

There is an interesting difference between what we know about things in a  forward walk, what we know in a backward walk, and what we know in a def-chain walk  :-)



So this seems to be a fundamental limitation [to the caching scheme],
not sure if it is bad in practice.

Or am I missing something?

Not missing anything. The non-nullness property is maintained globally at
the basic block level.  Both _1 and _3 are flagged as being non-null in
the block.  Upon exit, it's a bit check.   If the global information does
not have the non-nullness property, then when a request is made for
non-nullness and the def and the use are both within the same block,
and it's flagged as being non-null in that block, then the request is
forced back to a quick walk between the def and the use to see if there
is any non-nullness introduced in between.   Yes, that makes it a linear
walk, but it's infrequent, and often short... to the best of our knowledge
at this point anyway :-)
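As a rough illustration of that fallback walk (again a toy model, not GCC code; the statement encoding is invented), assume the stmts of a block are recorded in order and we scan backwards from the queried use toward the start of the block:

```python
# Illustrative sketch of the same-block fallback: when the block is flagged
# as containing a non-null dereference of NAME, walk backwards from the use
# to see whether a dereference of NAME actually precedes it.

def non_null_at_use(stmts, name, use_index):
    # stmts: list of (kind, name) tuples in block order, where kind "deref"
    # means the stmt dereferences NAME.
    for kind, n in reversed(stmts[:use_index]):
        if kind == "deref" and n == name:
            return True   # a dereference of NAME precedes the use
    return False

block = [("deref", "_2"),   # _1 = *_2
         ("add",   "_3"),   # _3 = _2 + 1
         ("deref", "_3")]   # _4 = *_3
print(non_null_at_use(block, "_2", 2))  # True: *_2 precedes the use
print(non_null_at_use(block, "_2", 0))  # False: nothing precedes it
```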
So with the clarification above do we ever see that _3 is non-NULL?
I suppose the worker processing _3 = _1 + 1 would ask for
_1 non-nullness but we do not record any non-NULL-ness of _1 in
this basic-block (but only at its end).  Consider stmts

   _4 = (uintptr_t) _2;
   _5 = _6 / _4;
   _1 = *_2;
...

here at _1 we know _2 is not NULL.  But if we ask for non-NULLness
of _2 at the definition of _4 we may not compute ~[0, 0] and thus
conclude that _6 / _4 does not trap.
EVRP must look backwards to figure this out, since the forward walk will
process _5 = _6 / _4 before it sees the dereference of _2... so how does
it know that _4 is non-zero without looking backwards at things after it
sees the dereference? Does it actually do this?
It actually doesn't do this for divisions (but I have a patch lying somewhere).
During the forward walk when coming along _5 = _6 / _4; it would
in evrp_range_analyzer::record_ranges_from_stmt via infer_value_range
push a new value-range for _4.  To infer the same for _2 it would need to
look backward which I think it doesn't do right now but it easily could
(I'm just trying to make up examples to understand ranger better).
When visiting _1 = *_2 we then know _2 is not NULL.

There may be more "useful" cases, for example we could derive
ranges for shift operands based on undefinedness and their range
may be useful for simplifying adjacent stmts.

I'm trying to discover how ranger apparently recording ranges only
at BB boundaries interacts with ranges that become "live" at
some point inside the BB.

The ranger records that a name becomes non-zero somewhere within a block.  This is then propagated to the following blocks.

The case where there is work to do is when a name is tagged as having a non-zero reference in the block, AND
the use being queried is not the non-zero reference itself, AND
  a) the range did contain zero on entry to the block, OR
  b) the definition is within the block

Then the only way it can determine whether any given use is non-zero is to walk from the use back towards the def/start of the block, looking to see if we encounter the non-zero reference along the way.
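Those conditions could be sketched as a toy predicate (illustrative only; the parameter names are invented):

```python
# Toy encoding of the conditions above for when a backward walk is
# required to answer "is this use of NAME non-zero?".  Not GCC code.

def needs_walk(flagged_in_bb, use_is_the_deref, zero_on_entry, def_in_block):
    # work to do only when the block is flagged for NAME, the queried use
    # is not the dereference itself, and either (a) zero was still in the
    # range on entry or (b) the def is inside the block
    return (flagged_in_bb
            and not use_is_the_deref
            and (zero_on_entry or def_in_block))

print(needs_walk(True, False, True, False))   # True: case (a)
print(needs_walk(True, True, True, True))     # False: use is the deref
print(needs_walk(False, False, True, True))   # False: block not flagged
```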

This thread contains a longer discussion with Jeff about the block-level info in the presence of threading, where we concluded it needs some extra thought.

At the moment, there is not a perfect solution for this. Values inferred by unrelated statements are clearly not the strongest aspect of the backwards-walking ranger, which is why it needs something like the non-null infrastructure to handle these cases.

It's clearly a far more frequent thing for pointers than for other integral types, and maybe there is something better that can be done for them.



stmt-level tracking of ranges are sometimes important.  This is
something the machinery cannot provide - correct?  At least not
optimistically enough with ranges derived about uses.
Maybe I'm the one missing something, but in the absence of statement-level
exception throwing via 'can_throw_non_call_exceptions' being true,
any assertion made anywhere in the block about an ssa_name applies to the
entire block, does it not?  ie, it doesn't matter if the dereference
happens first thing in the block or last thing, it's not going to change
its value within the block... it's going to be non-null throughout the
entire block.
Well, one complication I had with the division by zero patch is that
for _2 = _1 / _3 you can derive that _3 is not zero _after_ the stmt
but you may of course not assume it is not zero during evaluation
of the stmt because that may lead you to assume it may not trap
and thus move it to a place it was not executed previously.
Yeah, during the reverse walk we can assume that when we reach this point it's non-zero, but inferred values like this and *p cannot be assumed ON the statement itself.
I guess that would have to be a property of inferred values.



So if one statement in the block asserts that references to _2 are
non-null, we can assert that all references to _2 in the block are
non-null, meaning we get all these cases by knowing that the specified
name is non-zero throughout the block.  This also means we could know
things earlier in the block than a forward walk would provide.

So with the 2 examples:

    _1 = *_2; // after this _2 is non-NULL
    _3 = _1 + 1;
    _4 = *_3;



Both _2 and _3 are flagged as non-null in the block due to the
dereferences.  I'm not sure what we know about _1 from above, but we do
know
    ~[0,0] = _1 + 1,  so whatever that tells you, if anything, about
_1 we know.  It seems to me _1 is ~[-1,-1] based on that...
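A quick worked check of that claim, as a toy model in plain integer arithmetic (range-ops actually solves this with anti-ranges and per-type wrapping semantics; the function name below is invented): if _1 + 1 is known non-zero, solving backwards excludes exactly one value from _1:

```python
# If  result = op1 + addend  and  result != excluded,
# then  op1 != excluded - addend.  With excluded = 0 and addend = 1,
# _1 != -1, i.e. _1 has the anti-range ~[-1,-1].

def op1_excluded_for_plus(excluded_result, addend):
    # solve  op1 + addend != excluded_result  for op1
    return excluded_result - addend

print(op1_excluded_for_plus(0, 1))  # -1, i.e. _1 is ~[-1,-1]
```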

Regardless, I think we know all the same things EVRP does.

likewise

   _4 = (uintptr_t) _2;
   _5 = _6 / _4;
   _1 = *_2;

_2 will be non-null in the entire block,  so _4 must also be non-null
and we can conclude that the divide does not trap.
And move the divide up like so?

  _4 = (uintptr_t) _2;
  _5 = _6 / _4;
   if (flag)
    {
     _1 = *_2;
...
    }

Most definitely not.
Well, you are in a different block, and there is nothing in the first block that says it's non-null now, just the TRUE side of the if block...

But I get the point I think you are trying to make, and back to the discussion with Jeff, I think we've concluded that the non-null-ref-in-block tracking needs some work to be fully viable :-)


Now, when we set the 'can_throw_non_call_exceptions' flag, we'd
have to resort to statement walks, and we cannot determine that _5 does
not trap anyway.   EVRP is in the same boat... it doesn't know it's not
going to trap either, because we may never get to the *_2.
The point is not that we know *_2 does not trap - we don't!  We just know
that if the program reaches the next stmt it did not and thus _2 was not NULL
_for the stmts following the dereference_.

Note I'm not sure it will make a difference in practice since any transform
on a stmt after *_2 relying on the above creates constraints on stmt ordering
which we cannot represent in the IL and thus the result is likely going to be
very fragile.

As said, I was just wondering if the non-NULL machinery you have
integrates with the range machinery or is really independent
(thus the pointer derived from the non-NULL one due to the dereference
case above).

It is currently an independent class that is invoked by the ranger when evaluating the range on an outgoing edge, to determine if non-null should be applied to the range.

It looks like I disabled any checks within the block, so at the moment it won't apply any non-nullness within the block unless it was live on entry.  I see some potential issues even with the live-on-exit value being propagated back up into the block if the value wasn't also non-zero on entry.    I think my current implementation of non-nullness is only safe to use when the value is non-zero throughout the block.



Yes, the compile-time complexity claim is from empirical speed timings and
theory-crafting from the algorithms, and from the fact that the on-entry
cache prevents multiple passes over things.

We have not done a memory analysis yet, nor done anything to see if we
can improve it.
It makes very heavy use of bitmaps, which are typically quite sparse.
The on-entry cache is a vector of pointers to a range, initially 0, and
we allocate ranges as needed.   There will be an on-entry range entry
for any ssa-name which has been queried between the query point and the
definition.
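A toy model of that lazily allocated on-entry cache (illustrative names only, not the GCC implementation):

```python
# Sketch of the on-entry cache described above: a per-BB "vector of
# pointers to ranges", all conceptually NULL until a range is actually
# queried and computed for a name on entry to that block.

class OnEntryCache:
    def __init__(self, num_bbs):
        # one dict per BB stands in for the lazily filled pointer vector
        self.bb_entries = [dict() for _ in range(num_bbs)]

    def get(self, name, bb):
        # None plays the role of a NULL pointer: never queried/allocated
        return self.bb_entries[bb].get(name)

    def set(self, name, bb, rng):
        self.bb_entries[bb][name] = rng   # allocate on first demand

cache = OnEntryCache(4)
print(cache.get("_5", 3))         # None: nothing allocated yet
cache.set("_5", 3, ("~", 0, 0))   # record ~[0,0] on entry to BB 3
print(cache.get("_5", 3))         # ('~', 0, 0)
```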
So that's similar to LIVE-on-entry thus a SSA name with a range
will have an on-entry range entry on each BB it dominates?
That makes storage requirement quadratic in the worst case.
Yes, assuming a request has been made for all ssa-names everywhere they
are live.
You did propose to replace [E]VRP with a walk over the whole function
querying ranges and simplifying stmts, did you?
Yes, for the case of EVRP.  But not all use cases query every range
everywhere.

It's also a bit lower than that, since we cache certain types of ranges.
We cache a range for varying (or the range for the type, if you prefer)
of whatever the ssa-name's type is, so if the range is varying everywhere,
we actually have only one instance of the range rather than N of them.
So any name that doesn't have a range reduction anywhere will only
create a single range instance for the entire CFG.
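The sharing trick could be sketched like this (toy model; GCC does this with its own range representation, not the dict below):

```python
# One canonical "varying" range object per type, so names that are
# varying everywhere all reference a single shared instance instead of
# one allocation per BB.  Illustrative only.

_varying_for_type = {}

def varying(type_name):
    # return the shared varying-range instance for this type,
    # creating it on first request
    if type_name not in _varying_for_type:
        _varying_for_type[type_name] = ("varying", type_name)
    return _varying_for_type[type_name]

a = varying("int")
b = varying("int")
print(a is b)  # True: a single shared instance for all "int" names
```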
But you still have a reference to the range in every BB dominated by the
definition?

Every block dominated by the definition that leads to a use, yes. Since we work bottom-up, the live-on-exit property is implicit, so once a name goes out of liveness there are no more references. So anything that is defined, used in the block, and never used again will never be in the cache.

I think that's the
most common value, so that should reduce a large number of them.   I've
also considered caching ranges like we cache tree constants... but I
haven't delved into that.  I figured if memory turns out to be a
problem, then we'll look at it then.

Andrew

