https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113734

--- Comment #20 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
  <bb 21> [local count: 21718864]:
...
  _54 = (short unsigned int) bits_106;
  _26 = _54 >> 9;
  _88 = _139 + 7;
  _89 = _88 & 7;
  _111 = _26 + 10;

  <bb 22> [local count: 181308616]:
  # i_66 = PHI <i_69(36), i_146(21)>
  # parse_tables_n_lsm.141_87 = PHI <_29(36), _111(21)>
  i_69 = i_66 + 1;
  table[i_66] = 0;
  _29 = parse_tables_n_lsm.141_87 + 65535;
  if (parse_tables_n_lsm.141_87 != 0)
    goto <bb 23>; [94.50%]
  else
    goto <bb 25>; [5.50%]

  <bb 23> [local count: 171336643]:
  if (i_69 != 306)
    goto <bb 36>; [93.84%]
  else
    goto <bb 24>; [6.16%]

  <bb 36> [local count: 160784290]:
  goto <bb 22>; [100.00%]

is the relevant part of the loop.

At the start of vect we determine:

misc.c:147:31: note:    All loop exits successfully analyzed.
misc.c:147:31: note:   Symbolic number of iterations is 306 - (unsigned int)
i_146
Creating dr for table[i_66]

but when we get to vect_transform_loop we've updated
loop->nb_iterations_upper_bound to be 137.

>>> p (int)loop->nb_iterations_upper_bound
$2 = 137

SCEV claims:

  result:
    # of iterations _26 + 10, bounded by 137
(instantiate_scev
  (instantiate_below = 21 -> 22)
  (evolution_loop = 4)
  (chrec = _26 + 10)
  (res = _26 + 10))
Statement (exit)if (parse_tables_n_lsm.141_87 != 0)
 is executed at most _26 + 10 (bounded by 137) + 1 times in loop 4.

and further down

Statement table[i_66] = 0;
 is executed at most 305 (bounded by 305) + 1 times in loop 4.

So it looks like we determine the latch will execute a maximum of 305 times,
but that the early exit is only executed 137 times.
Which means the loop is bounded by 137 iterations.

This looks like it's because _111 is unsigned short, so (0xFFFF >> 9) + 10 ==
137

This is fine, but when during vect_transform_loop when we update the bounds we
do:

wi::udiv_floor (loop->nb_iterations_upper_bound + bias_for_lowest,
                             lowest_vf) - 1);

so we end up doing ((137 + 1) / 8) - 1 which is bogus.  This results in 16
iterations
while we know the vector loop must do 17.

The additional -1 is because the code assumes that the niters bounds is coming
from the latch iteration count, however in this case it's coming from another
exit and the latch never executes.

I think for early exits we should take the ceil instead, since we have cases
like this where an early exit restricts the loop's iteration count, but is not
choosen as the main exit.

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 854e9d78bc7..0b1656fef2f 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -12171,7 +12171,8 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple
*loop_vectorized_call)
   /* True if the final iteration might not handle a full vector's
      worth of scalar iterations.  */
   bool final_iter_may_be_partial
-    = LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo);
+    = LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo)
+      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo);
   /* The minimum number of iterations performed by the epilogue.  This
      is 1 when peeling for gaps because we always need a final scalar
      iteration.  */

Fixes it, but this doesn't feel entirely right to me.  Because in particular it
would have still been wrong if the nbounds on (parse_tables_n_lsm.141_87 != 0)
was 136, since there's no decimal after the divide there we'd end up at 16
again.

So I think we need to know whether the bounds is coming from the loop exit or
an alternate exit and not decrement by -1 if the bounds limit does not come
from the latch.

Reply via email to