https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65962

--- Comment #11 from Richard Biener <rguenth at gcc dot gnu.org> ---
The main difference is

+/home/wschmidt/gcc/gcc-mainline-base/gcc/testsuite/gcc.dg/vect/vect-62.c:39:3: note: LOOP VECTORIZED
+/home/wschmidt/gcc/gcc-mainline-base/gcc/testsuite/gcc.dg/vect/vect-62.c:39:3: note: OUTER LOOP VECTORIZED
...
-/home/wschmidt/gcc/gcc-mainline-base/gcc/testsuite/gcc.dg/vect/vect-62.c:9:5: note: vectorized 1 loops in function.
+/home/wschmidt/gcc/gcc-mainline-base/gcc/testsuite/gcc.dg/vect/vect-62.c:9:5: note: vectorized 2 loops in function.

so we now vectorize two loops.  The newly vectorized loop is

  /* Multidimensional array. Aligned. The "inner" dimensions
     are invariant in the inner loop. Vectorizable, but the
     vectorizer detects that everything is invariant and that
     the loop is better left untouched. (it should be optimized away). */
  for (i = 0; i < N; i++)
    {
      for (j = 0; j < N; j++)
        {
           ia[i][1][8] = ib[i];
        }
    }

On x86_64 the latch block is not empty; for some reason that is not the
case on ppc.  I suspect that if we had a cddce pass after loop invariant/store
motion (which should make the inner loop empty) we would even remove the
inner loop and vectorize this as a regular loop.
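
To illustrate what "make the inner loop empty" means at the source level, here
is a rough hand-written sketch, not compiler output.  N, the array shapes, the
contents of ib and all function names are assumptions made for the example;
only the store ia[i][1][8] = ib[i] is taken from the testcase.

```c
#include <assert.h>
#include <string.h>

#define N 16                    /* assumed size, not stated in this comment */

/* Assumed shapes, chosen only so that ia[i][1][8] stays in bounds. */
static int ia_nested[N][4][N + 8];
static int ia_hoisted[N][4][N + 8];
static int ib[N];

/* Illustrative contents; the testcase's real initializer is not reproduced. */
static void fill_ib(void)
{
  for (int i = 0; i < N; i++)
    ib[i] = 3 * i;
}

/* The loop nest as written: neither the load nor the store depends on j. */
static void store_nested(void)
{
  for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
      ia_nested[i][1][8] = ib[i];
}

/* Hand-written equivalent once the invariant load and store are moved out
   of the inner loop and the then-empty inner loop is deleted.  */
static void store_hoisted(void)
{
  for (int i = 0; i < N; i++)
    ia_hoisted[i][1][8] = ib[i];
}

/* Returns 1 if both versions leave identical array contents. */
int loops_agree(void)
{
  fill_ib();
  memset(ia_nested, 0, sizeof ia_nested);
  memset(ia_hoisted, 0, sizeof ia_hoisted);
  store_nested();
  store_hoisted();
  return memcmp(ia_nested, ia_hoisted, sizeof ia_nested) == 0;
}

/* Convenience wrapper that aborts if the two versions ever disagree. */
void check_loops_agree(void)
{
  assert(loops_agree());
}
```

The second form is a plain single loop over i, which is the shape the
vectorizer would handle without outer-loop vectorization.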

Ah, so on x86_64 we PREd ib[0] while on ppc the ib initializer is probably
in a constant pool entry.  Yes:

  <bb 2>:
  ib = *.LC0;

vs.

  <bb 2>:
  ib[0] = 0;
  ib[1] = 3;
  ib[2] = 6;
  ib[3] = 9;
...
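
In source-level terms the two dumps correspond roughly to the two ways of
initializing an array sketched below.  This is only an illustration: it uses
just the four values visible in the dump, and LC0, init_by_copy and
init_by_stores are made-up names standing in for the constant pool entry and
the two forms of initialization.

```c
#include <assert.h>
#include <string.h>

#define M 4                     /* only the four values visible in the dump */

/* Hypothetical stand-in for the constant pool entry *.LC0. */
static const int LC0[M] = { 0, 3, 6, 9 };

/* Aggregate copy, as in "ib = *.LC0": the element values are not spelled
   out as individual stores in the function body.  */
void init_by_copy(int dst[M])
{
  memcpy(dst, LC0, sizeof LC0);
}

/* Element-wise stores, as in "ib[0] = 0; ib[1] = 3; ...": each value is a
   visible scalar store.  */
void init_by_stores(int dst[M])
{
  dst[0] = 0;
  dst[1] = 3;
  dst[2] = 6;
  dst[3] = 9;
}

/* Returns 1 if both forms produce the same array contents. */
int inits_agree(void)
{
  int a[M], b[M];
  init_by_copy(a);
  init_by_stores(b);
  return memcmp(a, b, sizeof a) == 0;
}
```

Both forms store the same values; the difference is only in what is visible
as separate scalar stores, which presumably is what lets PRE pick up ib[0]
in one case and not the other.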

The PRE heuristic that is meant to avoid confusing the vectorizer doesn't
fire here.

I have a fix for that (and the testcase).
