On 9/18/23 02:53, Richard Biener wrote:
On Fri, Sep 15, 2023 at 4:45 PM Andrew MacLeod <amacl...@redhat.com> wrote:
Ive been looking at __builtin_unreachable () regressions.  The
fundamental problem seems to be  a lack of consistent expectation for
when we remove it earlier than the final pass of VRP.    After looking
through them, I think this provides a sensible approach.

Ranger is pretty good at providing ranges in blocks dominated by the
__builtin_unreachable  branch, so removing it isn't quite a critical as
it once was.  Its also pretty good at identifying what in the block can
be affected by the branch.

This patch provide an alternate removal algorithm for earlier passes.
it looks at *all* the exports from the block, and if the branch
dominates every use of all the exports, AND none of those values access
memory, VRP will remove the unreachable call, rewrite the branch, update
all the values globally, and finally perform the simple DCE on the
branch's ssa-name.   This is kind of what it did before, but it wasn't
as stringent on the requirements.

The memory access check is required because there are a couple of test
cases for PRE in which there is a series of instruction leading to an
unreachable call, and none of those ssa names are ever used in the IL
again. The whole chunk is dead, and we update globals, however
pointlessly.  However, one of ssa_names loads from memory, and a later
passes commons this value with a later load, and then  the unreachable
call provides additional information about the later load.    This is
evident in tree-ssa/ssa-pre-34.c.   The only way I see to avoid this
situation is to not remove the unreachable if there is a load feeding it.

What this does is a more sophisticated version of what DOM does in
all_uses_feed_or_dominated_by_stmt.  THe feeding instructions dont have
to be single use, but they do have to be dominated by the branch or be
single use within the branches block..

If there are multiple uses in the same block as the branch, this does
not remove the unreachable call.  If we could be sure there are no
intervening calls or side effects, it would be allowable, but this a
more expensive checking operation.  Ranger gets the ranges right anyway,
so with more passes using ranger, Im not sure we'd see much benefit from
the additional analysis.   It could always be added later.

This fixes at least 110249 and 110080 (and probably others).  The only
regression is 93917 for which I changed the testcase to adjust
expectations:

// PR 93917
void f1(int n)
{
    if(n<0)
      __builtin_unreachable();
    f3(n);
}

void f2(int*n)
{
    if(*n<0)
      __builtin_unreachable();
    f3 (*n);
}

We were removing both unreachable calls in VRP1, but only updating the
global values in the first case, meaning we lose information.   With the
change in semantics, we only update the global in the first case, but we
leave the unreachable call in the second case now (due to the load from
memory).  Ranger still calculates the contextual range correctly as [0,
+INF] in the second case, it just doesn't set the global value until
VRP2 when it is removed.

Does this seem reasonable?
I wonder how this addresses the fundamental issue we always faced
in that when we apply the range this range info in itself allows the
branch to the __builtin_unreachable () to be statically determined,
so when the first VRP pass sets the range the next pass evaluating
the condition will remove it (and the guarded __builtin_unreachable ()).

In principle there's nothing wrong with that if we don't lose the range
info during optimizations, but that unfortunately happens more often
than wanted and with the __builtin_unreachable () gone we've lost
the ability to re-compute them.

I think it's good to explicitly remove the branch at the point we want
rather than relying on the "next" visitor to pick up the global range.

As I read the patch we now remove __builtin_unreachable () explicitly
as soon as possible but don't really address the fundamental issue
in any way?


I think it pretty much addresses the issue completely.  No globals are updated by the unreachable branch unless it is removed.  We remove the unreachable early ONLY if every use of all the exports is dominated by the branch...    with the exception of a single use in the block used to define a different export. and those have to all have no other uses which are not dominated.  ie

 <bb 2> [local count: 1073741824]:
  y_2 = x_1(D) >> 1;
  t_3 = y_2 + 1;
  if (t_3 > 100)
    goto <bb 3>; [0.00%]
  else
    goto <bb 4>; [100.00%]

  <bb 3> [count: 0]:
  __builtin_unreachable ();

  <bb 4> [local count: 1073741824]:
  func (x_1(D), y_2, t_3);


In this case we will remove the unreachable call because we can provide an accurate global range for all values used in the definition chain for the program.

Global Exported (via early unreachable): x_1(D) = [irange] unsigned int [0, 199] MASK 0xff VALUE 0x0 Global Exported (via early unreachable): y_2 = [irange] unsigned int [0, 99] MASK 0x7fffffff VALUE 0x0 Global Exported (via early unreachable): t_3 = [irange] unsigned int [1, 100]


If conditions are not satisfied to do this early, we do not et ANY of the global values, and leave the unreachable alone.  ie. if that was fed from a memory load instead of a parameter:


 <bb 2> [local count: 1073741824]:
  x.0_1 = x;
  y_3 = x.0_1 >> 1;
  t_4 = y_3 + 1;
  if (t_4 > 100)
    goto <bb 3>; [0.00%]
  else
    goto <bb 4>; [100.00%]

  <bb 3> [count: 0]:
  __builtin_unreachable ();


we no longer remove the unreachable call, and do not sety any global values as a result.   that means a following pass is not goign to see the condition as foldable because we dont know anything about t_4's range globally.

We still contextually know via ranger that after the unreachable call we have those ranges:

=========== BB 4 ============
x.0_1   [irange] unsigned int [0, 199] MASK 0xff VALUE 0x0
y_3     [irange] unsigned int [0, 99] MASK 0x7fffffff VALUE 0x0
t_4     [irange] unsigned int [1, 100]
    <bb 4> [local count: 1073741824]:
    func (x.0_1, y_3, t_4);

but even ranger will not apply those ranges at the branch location.

The same thing happen if there is any other use of x.0_1, y_3 or t_4 that isnt dominated by the branch.. we leave it alone.

So I don't think we lose any information... we just only ever move it to the global range early IFF we know that range applies to every export, and there is no chance there will be any commoning opportunities due to memory loads.

I dont see anything in the early VRP processing now that would allow a later pass to remove the unreachable unless it does its own analysis like DOM might do.

Andrew

Reply via email to