Richard Biener <rguent...@suse.de> writes:

> On Tue, 25 Jan 2022, Jiufu Guo wrote:
>
>> Jiufu Guo <guoji...@linux.ibm.com> writes:
>> 
>> > Richard Biener <rguent...@suse.de> writes:
>> >
>> >> On Thu, 13 Jan 2022, Jiufu Guo wrote:
>> > ...
>> >>
>> >>> -      /* No need to check sign of the new step since below code takes 
>> >>> care
>> >>> -         of this well.  */
>> >>> +      /* Like cases shown in PR100740/102131, negtive step is not safe. 
>> >>>  */
>> >>> +      if (tree_int_cst_sign_bit (step))
>> >>> +        return false;
>> >>> +
>> >>>        if (code != NE_EXPR
>> >>>            && (TREE_CODE (step) != INTEGER_CST
>> >>>                || !iv0->no_overflow || !iv1->no_overflow))
>> >>
>> >> I think for NE_EXPR the sign is not relevant.  I think the key is
>> >> that we require that iv0->no_overflow and for non-equality checks
>> >> adjusting X + C1 < Y + C2 to X + C1 - C2 < Y is only valid if that
>> >> does not cause any overflow on either side.  On the LHS this is
>> >
>> > Hi Richard,
>> >
>> > Thanks a lot for your comments and ideas!
>> >
>> > Right! The adjusting is safe only if we can make sure there is
>> > no overflow/wrap on either side or say there is no overflow/wrap
>> > on three 'iv's: {X,C1}, {Y,C2} and {X, C1 - C2}.
Hi Richard,

Thanks for your quickly reply, and patient review!
>
> The point is that we may not change the iteration number at which
> overflow occurs since that alters the result of the < compare.
> Only if we know there is no overflow with the present expression
> during the loop evaluation we can do the transform and then only
> if we do not introduce overflow.
Exactly, this is also what I mean :)

>
> We are basically doing the transform that fold_comparison
> in fold-const.cc does:
>
>   /* Transform comparisons of the form X +- C1 CMP Y +- C2 to
>      X CMP Y +- C2 +- C1 for signed X, Y.  This is valid if
>      the resulting offset is smaller in absolute value than the
>      original one and has the same sign.  */
>   if (ANY_INTEGRAL_TYPE_P (TREE_TYPE (arg0))
>       && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (arg0))
> ...
>
This transform seems not the scenario which we are care about in
number_of_iterations_cond.
For example, 'X + 1 < Y + 4' ==> 'X < Y + 3' would be correct if
no overflow happen.
But for loop, the niter for '{X, 1} < {Y, 4}' would be totally
different with niter for '{X, 0} < {Y, 3}'.
for (iv0 = X, iv1 = Y; iv0 < iv1; iv0 += 1, iv1 += 4)
   in this loop, iv1 walks to overflow faster, step is 4.
vs.
for (iv0 = X, iv1 = Y; iv0 < iv1; iv1 += 3) (iv1 overflow slow)
   in this loop, iv1 overflows slower, step is 3.

Actually, the transformation 'X + 1 < Y + 4' ==> 'X < Y + 3',
may not always correct.  e.g. for below code, X=6, and Y=2147483645
it may output "b0 + 1 < b1 + 4 is true".
```c
int __attribute__ ((noinline))
foo (int b0, int b1)
{
  return __builtin_printf ("b0 + 1 < b1 + 4 is %s\n",
                           b0 + 1 < b1 + 4 ? "true" : "false");
}

int main(int argc, char **argv)
{
  if (argc < 2)
    return -1;
  int b0 = atoi(argv[1]);
  int b1 = atoi(argv[2]);  
  return foo (b0, b1);
}
```
>> > Or it may also ok if we can compute an assumption, under which
>> > all three ivs are not overflowed/wrapped.
>> >
>> >> only guaranteed if the absolute value of C1 - C2 is smaller than
>> >> that of C1 and if it has the same sign.
>> > I'm thinking this in another way:
>> > When trying to do this transform in number_of_iterations_cond,
>> > GT/GE is inverted to LT/LE, then the compare code would be:
>> > LT/LE or NE.
>> >
>> > For LT/LE, like {X, C1} < {Y, C2}, we can look it as iv0 is
>> > chasing iv1.  We would able to assume X < Y (may_be_zero would
>> > be set later via number_of_iterations_lt/le).
>> > 1. If C1 < C2, iv0 can never catch up iv1. For examples:
>> > {X, 1} < {Y, 2}; {X, -2} < {Y, -1}; {X, -2} < {Y, 1}.
>> > And there must be at least one overflow/wrap in iv0,iv1, or iv.
>> > This indicates, if the sign of (C1 - C1) is negative, then the
>> > transform would be incorrect.
>> > 2. If C1 > C2, we still need to make sure all the ivs (iv0,
>> > iv1 and combined iv) are not wrapped.
>> > For C2 > 0, {Y,C2} should not cross MAX before {X, C1} catch up.
>> >    the assumption may like : (MAX-Y)/C2 > (Y-X)/(C1-C1)
>> There is still some cases: iv0 step is too large, then iv0 wraps
>> first, e.g. {MAX-5, 10} < {MAX-3, 1}. For this, the assumption
>> would need to and with (MAX-X)/C1 > (Y-X)/(C1-C1).
>> 
>> > For C1 < 0, {X,C1} should not down cross MIN
>> >    the assumption may like : (X-MIN)/-C1 > (Y-X)/(C1-C1)
>>   Also add the assumption: (Y-MIN)/-C2 > (Y-X)/(C1-C1)
>> 
>> > For C1 > 0 and C2 < 0, iv0 and iv1 are walking to each other,
>> > it would be almost safe.
>> For this case, we may still add the assumption to avoid wraping
>> at the first iteration.
>> 
>> BR,
>> Jiufu
>> 
>> >
>> > For NE, it seems more interesting. The transformation depends
>> > on 3 things: 1. the relation between X and Y; 2 the sign
>> > of (C1-C2); 3. if iv0 and iv1 can be equal finally.
>> > The 3rd one may be more special.
>> > The good news is, number_of_iterations_ne seems able to handle NE.
>> >
>> >>
>> >> With the same reasoning we then know the new IV0 doesn't overflow.
>> >>
>> >> So something like the following.  IIRC I've proposed sth similar
>> >> a while back.  I'm going to give it some testing, collecting
>> >> testcases from related PRs.
>> >>
>> >> diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc
>> >> index b767056aeb0..74fa4f66ee2 100644
...
>> >> +      if (TREE_CODE (step) != INTEGER_CST
>> >> +         || !iv0->no_overflow || !iv1->no_overflow)
>> >> +       {
>> > I was also trying to leverage no_overflow of iv0 and iv1. While it seems
>> > the computation logic of no_overflow is related to the type of IV.  If the
>> > type of IV is signed, the C semantics may be used, overflow in signed
>> > IV are treated UB, and then no_overflow would be true.
>> >
>> > For unsigned IV, no_overflow would be false, even for the cases which
>> > looks like:
>> > "{10, 2} < {20, 1}", which would be ok to compute niter.
>
> IIRC no_overflow is determined by SCEV which might also use niter
> analysis.  For the case of {10, +2} < {20, +1} there is no need to
> compute it as {10, +1} < 20 and we hopefully deal with this in
> other code paths (in fact with base and step all constant we
> can simply solve the linear equation for 'n' - maybe that's a
> capability we should add to number_of_iterations_cond).

Thanks for point this out.
Yes, for const base(s) and step(s), we have other code path
to deal with (e.g. loop_niter_by_eval).

For {10, +2}, what I really mean is about the no_overflow iv(s)
on unsigned.  Sorry the misleading words.
For no_overflow, it is set at some places, including
number_of_iterations_xxx. :),  Before number_of_iterations_xxx,
no_overflow could be calculated in simple_iv_with_niters:
```c
  iv->no_overflow = !folded_casts && nowrap_type_p (type);
```
nowrap_type_p checks if overflow is UB on type through macro
TYPE_OVERFLOW_UNDEFINED.  For signed, it is UB; for unsigned
it is different.

For example as below code, no_overflow is set as false for iv0
and iv1, and then niter was not computed quickly.
```c
unsigned __attribute__ ((noinline))
foo (unsigned b0, unsigned b1)
{
  unsigned n = 0;
  for (; b0 < b1; b0 += 2, b1 += 1)
    n++;
  return n;
}

int main()
{
  return foo (10, 20);
}
```

>
> Running a small example through the debugger I can indeed see
> no_overflow = true for this case (after a few times == false),
> so we do have some ways of determining there is no overflow for this
> case.
>
> I do not think we should try to add special-casings to this very
> generic transform.
>
> I will look at the regression that was pointed out now.
You may talking about pr81196.c, it seems relate to
"{p0, 1} < {p1, -1}".  It is not transformed to
"{p0, 2} < {p1,0}" anymore, and niter is not computed, then
vectorization does not happen.

BR,
Jiufu

>
> Richard.

Reply via email to