https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65962
--- Comment #11 from Richard Biener <rguenth at gcc dot gnu.org> ---
The main difference is

+/home/wschmidt/gcc/gcc-mainline-base/gcc/testsuite/gcc.dg/vect/vect-62.c:39:3: note: LOOP VECTORIZED
+/home/wschmidt/gcc/gcc-mainline-base/gcc/testsuite/gcc.dg/vect/vect-62.c:39:3: note: OUTER LOOP VECTORIZED
...
-/home/wschmidt/gcc/gcc-mainline-base/gcc/testsuite/gcc.dg/vect/vect-62.c:9:5: note: vectorized 1 loops in function.
+/home/wschmidt/gcc/gcc-mainline-base/gcc/testsuite/gcc.dg/vect/vect-62.c:9:5: note: vectorized 2 loops in function.

so we now vectorize two loops.  The newly vectorized loop is

  /* Multidimensional array. Aligned. The "inner" dimensions
     are invariant in the inner loop.  Vectorizable, but the
     vectorizer detects that everything is invariant and that
     the loop is better left untouched. (it should be optimized away). */
  for (i = 0; i < N; i++)
    {
      for (j = 0; j < N; j++)
        {
          ia[i][1][8] = ib[i];
        }
    }

on x86_64 the latch block is not empty - for some reason not so on ppc.  I
suspect that if we had a cddce pass after loop invariant/store motion (which
should make the inner loop empty) we'd even remove the inner loop and
vectorize this regularly.

Ah, so on x86_64 we PREd ib[0] while on ppc the ib initializer is probably
in a constant pool entry.  Yes:

  <bb 2>:
  ib = *.LC0;

vs.

  <bb 2>:
  ib[0] = 0;
  ib[1] = 3;
  ib[2] = 6;
  ib[3] = 9;
  ...

The PRE heuristic to not confuse vectorization doesn't fire here.

I have a fix for that (and the testcase).
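For anyone who wants to look at this loop nest in isolation, here is a minimal standalone sketch. It is not the actual vect-62.c testcase. The value of N, the shape of ia, the ib entries beyond the four shown in the dump (guessed to continue in steps of 3), and the function name check_invariant_store are all assumptions made for illustration.

```c
#define N 16  /* assumed trip count; the comment does not state N */

/* Hypothetical helper: runs the loop nest under discussion and returns
   the number of rows whose stored value does not match ib[i].  */
int
check_invariant_store (void)
{
  /* Only ib[0..3] = 0, 3, 6, 9 appear in the dump; the remaining
     entries are assumed to continue in steps of 3.  */
  int ib[N] = {0, 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 42, 45};
  /* Assumed shape; any shape for which [i][1][8] is in bounds will do.  */
  int ia[N][4][N + 8];
  int i, j;
  int mismatches = 0;

  /* The store does not depend on j, so every inner iteration writes the
     same value to the same location.  */
  for (i = 0; i < N; i++)
    {
      for (j = 0; j < N; j++)
        {
          ia[i][1][8] = ib[i];
        }
    }

  /* Verify the result, as the testsuite checks typically do.  */
  for (i = 0; i < N; i++)
    {
      if (ia[i][1][8] != ib[i])
        mismatches++;
    }

  return mismatches;
}
```

Compiling this with something like gcc -O2 -ftree-vectorize -fdump-tree-vect-details -c should give a vect dump to compare between targets, though whether this reduced version shows the same x86_64 vs. ppc difference as the real testcase is not something I have checked.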