Richard Biener <rguent...@suse.de> writes: > On Tue, 31 Aug 2021, guojiufu wrote: > >> On 2021-08-30 20:02, Richard Biener wrote: >> > On Mon, 30 Aug 2021, guojiufu wrote: >> > >> >> On 2021-08-30 14:15, Jiufu Guo wrote: >> >> > Hi, >> >> > >> >> > In patch r12-3136, niter->control, niter->bound and niter->cmp are >> >> > derived from number_of_iterations_lt. While for 'until wrap condition', >> >> > the calculation in number_of_iterations_lt is not align the requirements >> >> > on the define of them and requirements in determine_exit_conditions. >> >> > >> >> > This patch calculate niter->control, niter->bound and niter->cmp in >> >> > number_of_iterations_until_wrap. >> >> > >> >> > The ICEs in the PR are pass with this patch. >> >> > Bootstrap and reg-tests pass on ppc64/ppc64le and x86. >> >> > Is this ok for trunk? >> >> > >> >> > BR. >> >> > Jiufu Guo >> >> > >> >> Add ChangeLog: >> >> gcc/ChangeLog: >> >> >> >> 2021-08-30 Jiufu Guo <guoji...@linux.ibm.com> >> >> >> >> PR tree-optimization/102087 >> >> * tree-ssa-loop-niter.c (number_of_iterations_until_wrap): >> >> Set bound/cmp/control for niter. >> >> >> >> gcc/testsuite/ChangeLog: >> >> >> >> 2021-08-30 Jiufu Guo <guoji...@linux.ibm.com> >> >> >> >> PR tree-optimization/102087 >> >> * gcc.dg/vect/pr101145_3.c: Update tests. >> >> * gcc.dg/pr102087.c: New test. >> >> >> >> > --- >> >> > gcc/tree-ssa-loop-niter.c | 14 +++++++++++++- >> >> > gcc/testsuite/gcc.dg/pr102087.c | 25 +++++++++++++++++++++++++ >> >> > gcc/testsuite/gcc.dg/vect/pr101145_3.c | 4 +++- >> >> > 3 files changed, 41 insertions(+), 2 deletions(-) >> >> > create mode 100644 gcc/testsuite/gcc.dg/pr102087.c >> >> > >> >> > diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c >> >> > index 7af92d1c893..747f04d3ce0 100644 >> >> > --- a/gcc/tree-ssa-loop-niter.c >> >> > +++ b/gcc/tree-ssa-loop-niter.c >> >> > @@ -1482,7 +1482,7 @@ number_of_iterations_until_wrap (class loop *, >> >> > tree type, affine_iv *iv0, >> >> > affine_iv *iv1, class tree_niter_desc >> >> > *niter) >> >> > { >> >> > tree niter_type = unsigned_type_for (type); >> >> > - tree step, num, assumptions, may_be_zero; >> >> > + tree step, num, assumptions, may_be_zero, span; >> >> > wide_int high, low, max, min; >> >> > >> >> > may_be_zero = fold_build2 (LE_EXPR, boolean_type_node, iv1->base, >> >> > iv0->base); >> >> > @@ -1513,6 +1513,8 @@ number_of_iterations_until_wrap (class loop *, >> >> > tree type, affine_iv *iv0, >> >> > low = wi::to_wide (iv0->base); >> >> > else >> >> > low = min; >> >> > + >> >> > + niter->control = *iv1; >> >> > } >> >> > /* {base, -C} < n. */ >> >> > else if (tree_int_cst_sign_bit (iv0->step) && integer_zerop >> >> > (iv1->step)) >> >> > @@ -1533,6 +1535,8 @@ number_of_iterations_until_wrap (class loop *, >> >> > tree type, affine_iv *iv0, >> >> > high = wi::to_wide (iv1->base); >> >> > else >> >> > high = max; >> >> > + >> >> > + niter->control = *iv0; >> >> > } >> >> > else >> >> > return false; >> > >> > it looks like the above two should already be in effect from the >> > caller (guarding with integer_nozerop)? >> >> I add them just because set these fields in one function. >> Yes, they have been set in caller already, I could remove them here. >> >> > >> >> > @@ -1556,6 +1560,14 @@ number_of_iterations_until_wrap (class loop *, >> >> > tree type, affine_iv *iv0, >> >> > niter->assumptions, assumptions); >> >> > >> >> > niter->control.no_overflow = false; >> >> > + niter->control.base = fold_build2 (MINUS_EXPR, niter_type, >> >> > + niter->control.base, >> >> > niter->control.step); >> > >> > how do we know IVn - STEP doesn't already wrap? >> >> The last IV value is just cross the max/min value of the type >> at the last iteration, then IVn - STEP is the nearest value >> to max(or min) and not wrap. >> >> > A comment might be >> > good to explain you're turning the simplified exit condition into >> > >> > { IVbase - STEP, +, STEP } != niter * STEP + (IVbase - STEP) >> > >> > which, when mathematically looking at it makes me wonder why there's >> > the seemingly redundant '- STEP' term? Also is NE_EXPR really >> > correct since STEP might be not 1? Only for non equality compares I may miss the question in previous mail. If STEP is not 1, NE_EXPR Would be still correct, because the niter is an integer, and the then after 'niter' iterations, the value should meet 'base + niter * STEP'.
BR, Jiufu. >> > the '- STEP' should matter? >> >> I need to add comments for this. This is a little tricky. >> The last value of the original IV just cross max/min at most one STEP, >> at there wrapping already happen. >> Using "{IVbase, +, STEP} != niter * STEP + IVbase" is not wrong >> in the aspect of exit condition. >> >> But this would not work well with existing code: >> like determine_exit_conditions, which will convert NE_EXP to >> LT_EXPR/GT_EXPR. And so, the '- STEP' is added to adjust the >> IV.base and bound, with '- STEP' the bound will be the last value >> just before wrap. > > Hmm. The control IV is documented as > > /* The simplified shape of the exit condition. The loop exits if > CONTROL CMP BOUND is false, where CMP is one of NE_EXPR, > LT_EXPR, or GT_EXPR, and step of CONTROL is positive if CMP is > LE_EXPR and negative if CMP is GE_EXPR. This information is used > by loop unrolling. */ > affine_iv control; > > but determine_exit_conditions seems to assume the IV does not wrap? > In fact determine_exit_conditions seems to just build ->base CMP bound > where bound is the IV bound biased by #unroll * step - step. So how > does biasing by step * 1 help? > > Does the control IV wrap in our case? > > Richard. > >> Thanks again for your review! >> >> BR. >> Jiufu >> >> > >> > Richard. >> > >> >> > + span = fold_build2 (MULT_EXPR, niter_type, niter->niter, >> >> > + fold_convert (niter_type, niter->control.step)); >> >> > + niter->bound = fold_build2 (PLUS_EXPR, niter_type, span, >> >> > + fold_convert (niter_type, >> >> > niter->control.base)); >> >> > + niter->bound = fold_convert (type, niter->bound); >> >> > + niter->cmp = NE_EXPR; >> >> > >> >> > return true; >> >> > } >> >> > diff --git a/gcc/testsuite/gcc.dg/pr102087.c >> >> > b/gcc/testsuite/gcc.dg/pr102087.c >> >> > new file mode 100644 >> >> > index 00000000000..ef1f9f5cba9 >> >> > --- /dev/null >> >> > +++ b/gcc/testsuite/gcc.dg/pr102087.c >> >> > @@ -0,0 +1,25 @@ >> >> > +/* { dg-do compile } */ >> >> > +/* { dg-options "-O3" } */ >> >> > + >> >> > +unsigned __attribute__ ((noinline)) >> >> > +foo (int *__restrict__ a, int *__restrict__ b, unsigned l, unsigned n) >> >> > +{ >> >> > + while (n < ++l) >> >> > + *a++ = *b++ + 1; >> >> > + return l; >> >> > +} >> >> > + >> >> > +volatile int a[1]; >> >> > +unsigned b; >> >> > +int c; >> >> > + >> >> > +int >> >> > +check () >> >> > +{ >> >> > + int d; >> >> > + for (; b > 1; b++) >> >> > + for (c = 0; c < 2; c++) >> >> > + for (d = 0; d < 2; d++) >> >> > + a[0]; >> >> > + return 0; >> >> > +} >> >> > diff --git a/gcc/testsuite/gcc.dg/vect/pr101145_3.c >> >> > b/gcc/testsuite/gcc.dg/vect/pr101145_3.c >> >> > index 99289afec0b..40cb0240aaa 100644 >> >> > --- a/gcc/testsuite/gcc.dg/vect/pr101145_3.c >> >> > +++ b/gcc/testsuite/gcc.dg/vect/pr101145_3.c >> >> > @@ -1,5 +1,6 @@ >> >> > /* { dg-require-effective-target vect_int } */ >> >> > /* { dg-options "-O3 -fdump-tree-vect-details" } */ >> >> > + >> >> > #define TYPE int * >> >> > #define MIN ((TYPE)0) >> >> > #define MAX ((TYPE)((long long)-1)) >> >> > @@ -10,4 +11,5 @@ >> >> > >> >> > #include "pr101145.inc" >> >> > >> >> > -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" } } >> >> > */ >> >> > +/* pointer size may not be vectorized, checking niter is ok. */ >> >> > +/* { dg-final { scan-tree-dump "Symbolic number of iterations is" >> >> > "vect" >> >> > } >> >> > } */ >> >> >> >>