Re: A + B CMP A -> A CMP' CST' match.pd patterns [was [PATCH] avoid calling memset et al. with excessively large sizes (PR 79095)]

Jeff Law Mon, 23 Jan 2017 16:04:39 -0800

On 01/21/2017 01:00 AM, Marc Glisse wrote:

On Fri, 20 Jan 2017, Jeff Law wrote:

On 01/20/2017 04:30 PM, Jeff Law wrote:

Going to work from the self-contained test...


Here's a test case that's closer to the one from the bug.  It also
ends up with the out of bounds memset even at -O1, during PRE.

typedef __SIZE_TYPE__ size_t;

struct S
  int *p0, *p1, *p2;

  size_t size () const { return p1 - p0; }

  void f (size_t n) {
    if (n > size ())       // can't happen because
      foo (n - size ());   //   n is in [1, MIN(size() - 1, 3)]
    else if (n < size ())
      bar (p0 + n);
  }

  void foo (size_t n)
  {
    size_t left = (size_t)(p2 - p1);
    if (left >= n)
      __builtin_memset (p2, 0, n * sizeof *p2);
  }

  void bar (int*);
};

void f (S &s)
{
  size_t n = s.size ();
  if (n > 1 && n < 5)
    s.f (n - 1);
}



Looking at the .vrp1 dump for this test we find:

;;   basic block 3, loop depth 0, count 0, freq 3664, maybe hot
;;    prev block 2, next block 4, flags: (NEW, REACHABLE, VISITED)
;;    pred:       2 [36.6%]  (TRUE_VALUE,EXECUTABLE)
  _18 = ASSERT_EXPR <_13, _13 + 18446744073709551614 <= 2>;
  _2 = _18 + 18446744073709551615;
  if (_2 > _18)
    goto <bb 4>; [50.00%]
  else
    goto <bb 6>; [50.00%]

_2 > _18 is the same as

(_18 - 1) > _18

Which is only true if _18 is zero.  And since _18 has a computed range
of [2..4], that can never happen.

VRP doesn't currently handle this case, though we do have some code to
turn relational comparisons into equality comparisons.  If I manually
turn the relational into _18 == 0 at that point, then the next VRP pass
is able to optimize the conditional away which in turn eliminates the
problematical memset call & warning.

So one possible approach would be to treat simplify_cond_using_ranges to
simplify that kind of condition.  I don't know if that would fix the
reported testcase as I haven't looked at it directly under the debugger
yet.

In fact, match.pd already has a pattern to handle a relational like
(X + -1) > X

The pattern doesn't fire because of a single_use guard.   This was
discussed back in 2015 with Jakub advocating for a single_use guard,
but Marc leaning the other direction, but not enough to argue too much
about it.

In early 2016 Marc actually submitted the match.pd patch with the
single use guard per Jakub's preference.

The single use guard checks the SSA_NAME holding the (X + CST)
expression for single use.   It's not entirely clear why we check for
single use.  If it's because of object lifetimes, then that's a total
red herring here.

Given something like:
x = a + c
if (a > x)

The transformation into

x = a + c;
if (a < c')

Where x has multiple uses, the transformation will not inherently
change the lifetime of "x" worse (if the conditional was the last use,
then the lifetime of x may shorten somewhat).  If there were uses
after the condition, the lifetime of "x" remains unchanged.


I don't remember the conversation and I don't have experience with
register pressure. My assumption would be that the issue is with the new
constant c'. In the first case, during the comparison, only a and x are
alive, while in the second case, c' is as well, which may use an extra
register. Does that make sense?

(I don't know if that's bad enough to inhibit the simplification)

Or are we talking about the transformation that has this comment in
match.pd:

   Currently restricted to single use so as not to interfere too much with
   ADD_OVERFLOW detection in tree-ssa-math-opts.c.

That detection logic needs to be improved anyway (see also a discussion
with Vincent Lefevre yesterday on gcc-help, users do write the
simplified form directly), the single_use is supposedly only there until
that is done (really depends on how well the improved logic works...).

"a" was already live from the assignment to at least the conditional.
But again, we haven't inherently changed its lifetime.

However, what can happen is after transformation, the assignment might
sink to a later use point in the CFG, which might lengthen the
lifetime of "a", but it's also shortening the lifetime of "x" by the
same amount.

So, ultimately I don't buy the argument that we should inhibit this
based on register lifetime issues.

Jakub, perhaps you remember why you wanted this restricted to cases
where "x" has a single use?

So I went and looked at the match_uaddsub_overflow bits for a while.AFAICT it appears to be trying to allow us to do the arithmetic andoverflow status computation once and CSE the result. At the singleinstance we would generate a uaddv4 insn which does the arithmetic and jump.

ISTM the real value here is in the initial RTL generation. DOM ought tobe simplifying the arithmetic CSEs and dominated tests. Threadingshould pick up the most common non-dominated tests.

Converting into an EQ/NE test against zero for the special case shouldstill preserve those optimizations (and in fact makes the threadingeasier to discover).

So it's really a matter of do we do it directly in VRP, or in match.pd.The former is more effective, the latter is cleaner.


I believe we are supposed to call match.pd from VRP eventually (though
that may not have happened yet), so the test could probably also be done
there, if that looks more convenient and we decide not to remove
single_use completely for this transformation.

That was my thought as well, but AFAICT we only call into match.pd fromVRP if we changed the insn. There's no systematic calling intomatch.pd from most passes. Thus, we don't do the transformation untilfairly late in the pipeline when it's in match.pd.


Jeff

Re: A + B CMP A -> A CMP' CST' match.pd patterns [was [PATCH] avoid calling memset et al. with excessively large sizes (PR 79095)]

Reply via email to