On Tue, 25 Jan 2022, Jiufu Guo wrote:

> Jiufu Guo <guoji...@linux.ibm.com> writes:
> 
> > Richard Biener <rguent...@suse.de> writes:
> >
> >> On Thu, 13 Jan 2022, Jiufu Guo wrote:
> > ...
> >>
> >>> -      /* No need to check sign of the new step since below code takes 
> >>> care
> >>> -  of this well.  */
> >>> +      /* Like cases shown in PR100740/102131, negtive step is not safe.  
> >>> */
> >>> +      if (tree_int_cst_sign_bit (step))
> >>> + return false;
> >>> +
> >>>        if (code != NE_EXPR
> >>>     && (TREE_CODE (step) != INTEGER_CST
> >>>         || !iv0->no_overflow || !iv1->no_overflow))
> >>
> >> I think for NE_EXPR the sign is not relevant.  I think the key is
> >> that we require that iv0->no_overflow and for non-equality checks
> >> adjusting X + C1 < Y + C2 to X + C1 - C2 < Y is only valid if that
> >> does not cause any overflow on either side.  On the LHS this is
> >
> > Hi Richard,
> >
> > Thanks a lot for your comments and ideas!
> >
> > Right! The adjusting is safe only if we can make sure there is
> > no overflow/wrap on either side or say there is no overflow/wrap
> > on three 'iv's: {X,C1}, {Y,C2} and {X, C1 - C2}.

The point is that we may not change the iteration number at which
overflow occurs since that alters the result of the < compare.
Only if we know there is no overflow with the present expression
during the loop evaluation we can do the transform and then only
if we do not introduce overflow.

We are basically doing the transform that fold_comparison
in fold-const.cc does:

  /* Transform comparisons of the form X +- C1 CMP Y +- C2 to
     X CMP Y +- C2 +- C1 for signed X, Y.  This is valid if
     the resulting offset is smaller in absolute value than the
     original one and has the same sign.  */
  if (ANY_INTEGRAL_TYPE_P (TREE_TYPE (arg0))
      && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (arg0))
...

> > Or it may also ok if we can compute an assumption, under which
> > all three ivs are not overflowed/wrapped.
> >
> >> only guaranteed if the absolute value of C1 - C2 is smaller than
> >> that of C1 and if it has the same sign.
> > I'm thinking this in another way:
> > When trying to do this transform in number_of_iterations_cond,
> > GT/GE is inverted to LT/LE, then the compare code would be:
> > LT/LE or NE.
> >
> > For LT/LE, like {X, C1} < {Y, C2}, we can look it as iv0 is
> > chasing iv1.  We would able to assume X < Y (may_be_zero would
> > be set later via number_of_iterations_lt/le).
> > 1. If C1 < C2, iv0 can never catch up iv1. For examples:
> > {X, 1} < {Y, 2}; {X, -2} < {Y, -1}; {X, -2} < {Y, 1}.
> > And there must be at least one overflow/wrap in iv0,iv1, or iv.
> > This indicates, if the sign of (C1 - C1) is negative, then the
> > transform would be incorrect.
> > 2. If C1 > C2, we still need to make sure all the ivs (iv0,
> > iv1 and combined iv) are not wrapped.
> > For C2 > 0, {Y,C2} should not cross MAX before {X, C1} catch up.
> >    the assumption may like : (MAX-Y)/C2 > (Y-X)/(C1-C1)
> There is still some cases: iv0 step is too large, then iv0 wraps
> first, e.g. {MAX-5, 10} < {MAX-3, 1}. For this, the assumption
> would need to and with (MAX-X)/C1 > (Y-X)/(C1-C1).
> 
> > For C1 < 0, {X,C1} should not down cross MIN
> >    the assumption may like : (X-MIN)/-C1 > (Y-X)/(C1-C1)
>   Also add the assumption: (Y-MIN)/-C2 > (Y-X)/(C1-C1)
> 
> > For C1 > 0 and C2 < 0, iv0 and iv1 are walking to each other,
> > it would be almost safe.
> For this case, we may still add the assumption to avoid wraping
> at the first iteration.
> 
> BR,
> Jiufu
> 
> >
> > For NE, it seems more interesting. The transformation depends
> > on 3 things: 1. the relation between X and Y; 2 the sign
> > of (C1-C2); 3. if iv0 and iv1 can be equal finally.
> > The 3rd one may be more special.
> > The good news is, number_of_iterations_ne seems able to handle NE.
> >
> >>
> >> With the same reasoning we then know the new IV0 doesn't overflow.
> >>
> >> So something like the following.  IIRC I've proposed sth similar
> >> a while back.  I'm going to give it some testing, collecting
> >> testcases from related PRs.
> >>
> >> diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc
> >> index b767056aeb0..74fa4f66ee2 100644
> >> --- a/gcc/tree-ssa-loop-niter.cc
> >> +++ b/gcc/tree-ssa-loop-niter.cc
> >> @@ -1890,17 +1890,28 @@ number_of_iterations_cond (class loop *loop,
> >>        tree step = fold_binary_to_constant (MINUS_EXPR, step_type,
> >>                                            iv0->step, iv1->step);
> >>  
> >> -      /* No need to check sign of the new step since below code takes 
> >> care
> >> -        of this well.  */
> >> -      if (code != NE_EXPR
> >> -         && (TREE_CODE (step) != INTEGER_CST
> >> -             || !iv0->no_overflow || !iv1->no_overflow))
> >> -       return false;
> >> +      /* For code other than NE_EXPR we have to ensure moving the 
> >> evolution
> >> +        of IV1 to that of IV0 does not introduce overflow.  */
> >> +      if (TREE_CODE (step) != INTEGER_CST
> >> +         || !iv0->no_overflow || !iv1->no_overflow)
> >> +       {
> > I was also trying to leverage no_overflow of iv0 and iv1. While it seems
> > the computation logic of no_overflow is related to the type of IV.  If the
> > type of IV is signed, the C semantics may be used, overflow in signed
> > IV are treated UB, and then no_overflow would be true.
> >
> > For unsigned IV, no_overflow would be false, even for the cases which
> > looks like:
> > "{10, 2} < {20, 1}", which would be ok to compute niter.

IIRC no_overflow is determined by SCEV which might also use niter
analysis.  For the case of {10, +2} < {20, +1} there is no need to
compute it as {10, +1} < 20 and we hopefully deal with this in
other code paths (in fact with base and step all constant we
can simply solve the linear equation for 'n' - maybe that's a
capability we should add to number_of_iterations_cond).

Running a small example through the debugger I can indeed see
no_overflow = true for this case (after a few times == false),
so we do have some ways of determining there is no overflow for this
case.

I do not think we should try to add special-casings to this very
generic transform.

I will look at the regression that was pointed out now.

Richard.

Reply via email to