http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59651
--- Comment #3 from Bingfeng Mei <bmei at broadcom dot com> ---
I can reproduce this on aarch64. I am still trying to understand why.

I constructed a similar test, but with a positive loop step:

extern void abort (void);
int a[] = { 6, 0, 0, 0 };
int b;
int
main ()
{
  for (;;)
    {
      b = 0;
      for (; b < 3; b += 1)
        a[b] = a[0] > 1;
      break;
    }
  if (a[2] != 0)
    abort ();
  return 0;
}

GCC behaves similarly during vectorization for this test and does vectorize
the loop. The only difference is around loop versioning.

pr52943.c:

<bb 10>:
if (1 != 0)
  goto <bb 11>;
else
  goto <bb 12>;

bb 11 leads to the vectorized version, so the scalar version gets optimized
out.

The example above:

<bb 10>:
if (0 != 0)
  goto <bb 11>;
else
  goto <bb 12>;

Here the vectorized version goes away and only the scalar version remains.
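
To illustrate why the versioning condition matters here, the following is a rough sketch of what loop versioning amounts to, written by hand in C. It is not GCC's generated code: the function names (fill_scalar, fill_hoisted, fill_versioned) and the exact overlap test are assumptions made for illustration. The point is only that the store to a[b] can overwrite a[0], so a version of the loop that reads a[0] once is valid only when the guard proves there is no overlap.

```c
#include <stdint.h>

/* Scalar reference: src[0] is reloaded on every iteration, so a store
   to dst[0] that aliases src[0] is observed by later iterations.  */
static void
fill_scalar (int *dst, const int *src, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = src[0] > 1;
}

/* Stand-in for a vectorized body (illustrative only): src[0] is read
   once and the result is broadcast.  This is only correct when no
   store to dst[i] can change src[0].  */
static void
fill_hoisted (int *dst, const int *src, int n)
{
  int flag = src[0] > 1;
  for (int i = 0; i < n; i++)
    dst[i] = flag;
}

/* Hypothetical versioned loop: a guard selects the hoisted version
   only when src[0] lies outside dst[0..n-1]; otherwise it falls back
   to the scalar version.  */
static void
fill_versioned (int *dst, const int *src, int n)
{
  uintptr_t s = (uintptr_t) src;
  uintptr_t d = (uintptr_t) dst;
  uintptr_t d_end = d + (uintptr_t) n * sizeof (int);
  int no_overlap = (s + sizeof (int) <= d) || (s >= d_end);

  if (no_overlap)
    fill_hoisted (dst, src, n);
  else
    fill_scalar (dst, src, n);
}
```

With the array from the test above, fill_versioned (a, a, 3) has to take the scalar path, which leaves a[2] == 0. Calling fill_hoisted (a, a, 3) directly would leave a[2] == 1 and the test would abort. When the compiler can resolve the guard at compile time, it folds to a constant and one of the two versions is removed, which is the shape of the "if (1 != 0)" and "if (0 != 0)" conditions in the dumps.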