https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67681

--- Comment #3 from alalaw01 at gcc dot gnu.org ---
So in the not-vectorized case (-DFOO=1), we get for the inner loop:

<bb 6>:
  # i_27 = PHI <i_22(5), i_16(7)>
  _8 = (long unsigned int) i_27;
  _9 = _8 * 4;
  _11 = data_10(D) + _9;
  _13 = *_11;
  _14 = _13 + j_23;
  *_11 = _14;
  i_16 = i_27 + 1;
  if (i_16 <= max_24)
    goto <bb 7>;
  else
    goto <bb 8>;

  <bb 7>:
  goto <bb 6>;

  <bb 8>:
  # i_32 = PHI <i_16(6)>

The loop exit phi, i_32=PHI<i_16(6)>, makes i_16=i_27+1 relevant
(vec_stmt_relevant_p: used out of loop.), so that goes on the worklist, and
from there we reach i_27=PHI<i_22(5),i_16(7)>. This marks the phi as
STMT_VINFO_LIVE_P, and hence "not vectorized: value used after loop". Kind of
as expected, FORNOW.
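
For illustration only, here is a minimal C sketch of a loop with the same shape
as bb 6 above. It is not the testcase from this PR: the function name and
signature are made up, and the i <= max precondition is an assumption standing
in for whatever guards the loop entry. The point is just that returning i after
the loop is what produces an exit phi like i_32.

```c
#include <assert.h>

/* Hypothetical reconstruction of the bb 6 loop shape: add j to
   data[i..max] and hand the final value of i back to the caller.  */
int add_j_and_return_i(int *data, int i, int max, int j)
{
    assert(i <= max);      /* assumed guard: the body runs at least once */
    do {
        data[i] += j;      /* _13 = *_11; _14 = _13 + j; *_11 = _14 */
        i++;               /* i_16 = i_27 + 1 */
    } while (i <= max);    /* if (i_16 <= max_24) */
    return i;              /* use after the loop, like i_32 = PHI <i_16(6)> */
}
```

Calling this with i=2 and max=5 updates elements 2 through 5 and returns 6.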

In the -DFOO=0 case, a bunch of loop peeling, header-copying, and other
transforms end up producing this input to vectorization:

  <bb 5>: //header of inner loop
  # i_2 = PHI <i_25(4), i_15(6)>
  _8 = (long unsigned int) i_2;
  _9 = _8 * 4;
  _11 = data_10(D) + _9;
  _12 = *_11;
  _13 = _12 + j_26;
  *_11 = _13;
  i_15 = i_2 + 1;
  if (max_7 >= i_15)
    goto <bb 6>;
  else
    goto <bb 7>;

  <bb 6>:
  goto <bb 5>;

  <bb 7>: //bb 5 is only predecessor
  _19 = (unsigned int) i_25;
  _18 = (unsigned int) max_7;
  _17 = (unsigned int) i_25;
  _5 = _18 - _17;
  _4 = _5 + _19;
  _3 = _4 + 1;
  i_21 = (int) _3;

  <bb 8>:
  # i_23 = PHI <i_21(7), i_25(3)>
  //tests outer loop

Note bb7 uses i_25, not i_2, so neither i_15 nor i_2 escapes the loop, and we
don't have the problem from above. (Yes, bb7 is subtracting i_25 from max_7 and
then adding it back on again, before adding 1, to give the value of i after the
inner loop.)
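
As a sanity check on that arithmetic, the bb 7 sequence can be transcribed into
C and compared against actually running the loop. This is only a sketch: the
function names are made up, and it assumes i_25 <= max_7 whenever bb 7 is
reached (consistent with the i_25(3) argument of the phi in bb 8, which looks
like the path that bypasses the inner loop). It also assumes max < INT_MAX so
the counting loop does not overflow.

```c
#include <assert.h>

/* Final value of i obtained by running the inner loop's counter.  */
int final_i_by_loop(int i0, int max)
{
    int i = i0;
    do {
        i++;               /* i_15 = i_2 + 1 */
    } while (max >= i);    /* if (max_7 >= i_15) */
    return i;
}

/* Transcription of bb 7: computed only from the loop-invariant values
   i_25 (here i0) and max_7 (here max), in unsigned arithmetic.  */
int final_i_closed_form(int i0, int max)
{
    assert(i0 <= max);                  /* assumed entry guard */
    unsigned int u19 = (unsigned int) i0;
    unsigned int u18 = (unsigned int) max;
    unsigned int u17 = (unsigned int) i0;
    unsigned int u5 = u18 - u17;        /* max - i0 */
    unsigned int u4 = u5 + u19;         /* ... + i0, back to max */
    unsigned int u3 = u4 + 1;           /* max + 1 */
    return (int) u3;                    /* i_21 */
}
```

Under those assumptions both functions return max + 1, and the second one never
reads a value defined inside the loop.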

This arrangement of multiple i's live at the same time is not present in
107t.ch2. 130t.loopinit introduces i_21, computed by an exit phi on leaving the
inner loop. 135t.sccp then changes this to the max_7-i_25+i_25 sequence, which
removes the dependency on i_15 and allows vectorization.
