Hi,
One of the biggest issues we have with GCC vectorization is code-size bloat.
For example, the vectorized version of the following simple code is 2.5 times
the size of the non-vectorized one. One reason is that GCC often creates one
extra scalar copy of the loop because of aliasing/alignment, plus one epilogue
loop for the leftover iterations when the iteration count is not a multiple of
the vectorization factor.
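To make the structure concrete, here is a rough hand-written C approximation of the code paths for the loop below. It is only a sketch, not GCC's actual output. It assumes a vectorization factor of 4 (the hypothetical VF macro), uses a memcpy of VF ints to stand in for one vector load/store, and shows only the alias-versioning case (no alignment peeling). The name ratio_mult_vf is borrowed from the dump further down.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VF 4 /* hypothetical vectorization factor: 4 ints per vector */

/* Hand-written approximation of what the vectorizer produces; not GCC output. */
void foo_vectorized_shape (int *a, int *b, int N)
{
  int i = 0;
  uintptr_t pa = (uintptr_t) a, pb = (uintptr_t) b;
  size_t bytes = N > 0 ? (size_t) N * sizeof (int) : 0;

  /* Runtime alias check: take the vector path only if the ranges don't overlap. */
  if (pa + bytes <= pb || pb + bytes <= pa)
    {
      /* Vector loop: each iteration stands in for one vector load/store. */
      int ratio_mult_vf = N > 0 ? N / VF * VF : 0;
      for (; i < ratio_mult_vf; i += VF)
        memcpy (&a[i], &b[i], VF * sizeof (int));

      /* Epilogue loop: the remaining N % VF iterations. */
      for (; i < N; i++)
        a[i] = b[i];
    }
  else
    {
      /* Scalar copy of the original loop, used when a and b may alias. */
      for (i = 0; i < N; i++)
        a[i] = b[i];
    }
}
```

Both scalar loops have the same body, which is the duplication described here.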

void foo (int *a, int *b, int N)
{
  int i;
  for (i = 0; i < N; i++)
  {
    a[i] = b[i];
  }
}

Looking closely, the epilogue loop and the alignment/aliasing loop are almost
identical; they differ only in the initial values of some variables entering
the loop. Can they be merged into one in such situations? If so, do you have
any suggestions on how to implement it?
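As an illustration of the kind of merge I mean, here is a hand-written C sketch (same hypothetical VF of 4 and the same memcpy stand-in for a vector copy). It is not something GCC emits today and not a proposal for how the vectorizer should implement it. A single scalar loop serves both roles: it is entered with i == ratio_mult_vf after the vector loop (epilogue) or with i == 0 when the alias check fails (fallback).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VF 4 /* hypothetical vectorization factor: 4 ints per vector */

/* Hand-written sketch: one scalar loop shared by the epilogue and fallback paths. */
void foo_merged_shape (int *a, int *b, int N)
{
  int i = 0; /* start index of the shared scalar loop */
  uintptr_t pa = (uintptr_t) a, pb = (uintptr_t) b;
  size_t bytes = N > 0 ? (size_t) N * sizeof (int) : 0;

  if (pa + bytes <= pb || pb + bytes <= pa)
    {
      int ratio_mult_vf = N > 0 ? N / VF * VF : 0;
      for (; i < ratio_mult_vf; i += VF)
        memcpy (&a[i], &b[i], VF * sizeof (int));
      /* i == ratio_mult_vf here, since ratio_mult_vf is a multiple of VF. */
    }

  /* Shared scalar loop: acts as the epilogue if i == ratio_mult_vf,
     or as the full scalar fallback if i == 0. */
  for (; i < N; i++)
    a[i] = b[i];
}
```

Only the entry value of i differs between the two paths, so in SSA form this would be a single PHI for i at the head of the shared loop.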

...
  <bb 7>:
  # i_39 = PHI <i_47(8), i_50(10)>
  _41 = (long unsigned int) i_39;
  _42 = _41 * 4;
  _43 = a_7(D) + _42;
  _44 = b_9(D) + _42;
  _45 = *_44;
  *_43 = _45;
  i_47 = i_39 + 1;
  if (N_4(D) > i_47)
    goto <bb 8>;
  else
    goto <bb 15>;

  <bb 8>:
  goto <bb 7>;

  <bb 9>:
  # i_51 = PHI <i_13(6)>
  tmp.6_56 = (int) ratio_mult_vf.5_38;
  if (niters.3_34 == ratio_mult_vf.5_38)
    goto <bb 16>;
  else
    goto <bb 10>;

  <bb 10>:
  # i_50 = PHI <tmp.6_56(9), 0(4)>
  goto <bb 7>;

  <bb 11>:
  goto <bb 6>;

  <bb 12>:

  <bb 13>:
  # i_24 = PHI <0(12), i_32(14)>
  _26 = (long unsigned int) i_24;
  _27 = _26 * 4;
  _28 = a_7(D) + _27;
  _29 = b_9(D) + _27;
  _30 = *_29;
  *_28 = _30;
  i_32 = i_24 + 1;
  if (N_4(D) > i_32)
    goto <bb 14>;
  else
    goto <bb 17>;
...

Thanks,
Bingfeng
