https://gcc.gnu.org/bugzilla/show_bug.cgi?id=14741
--- Comment #34 from Sebastian Pop <spop at gcc dot gnu.org> ---
r227567 extends the limits of a scop, and now we can detect a scop in the
MAIN__ function corresponding to the following code:

      A=0.1D0
      B=0.1D0

-fdump-tree-graphite-all shows that the loops have been tiled:

tiled by 51
tiled by 51
ISL AST generated by ISL:
{
  for (int c1 = 0; c1 <= 1023; c1 += 51)
    for (int c2 = 0; c2 <= 1023; c2 += 51)
      for (int c3 = c1; c3 <= min(1023, c1 + 50); c3 += 1)
        for (int c4 = c2; c4 <= min(1023, c2 + 50); c4 += 1)
          S_4(c3, c4);
  for (int c1 = 0; c1 <= 1023; c1 += 51)
    for (int c2 = 0; c2 <= 1023; c2 += 51)
      for (int c3 = c1; c3 <= min(1023, c1 + 50); c3 += 1)
        for (int c4 = c2; c4 <= min(1023, c2 + 50); c4 += 1)
          S_10(c3, c4);
}

What makes me wonder is why tiling gives better performance for
memset-like loops, as reported:

before: 17.848000000000003
after: 15.847999999999999

Btw, what architecture did you use for this experiment?

The same happens on an AArch64 machine, where I was able to reproduce your
results: the loop-blocked initialization of the arrays is consistently
faster by about 10%. I noticed that on a recent Intel x86_64 machine the
first runs show a speedup of about 10% with loop blocking, and then the
speedup disappears in subsequent runs (I alternated runs with and without
loop blocking 10 times).
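A minimal C sketch of what the tiled nest above does, assuming A and B are 1024x1024 double-precision arrays (inferred from the 0..1023 bounds) and that S_4 simply stores a constant into element (c3, c4). The helper names (init_plain, init_tiled, count_tiled_iterations, tiling_preserves_result) are hypothetical and not part of GCC or ISL. The sketch checks that the 51x51 tiling visits every element exactly once and leaves the same contents as the plain double loop, i.e. tiling only reorders the stores.

```c
#include <assert.h>

/* Problem size inferred from the ISL AST bounds 0..1023 (assumption). */
#define N 1024
/* Tile size reported by -fdump-tree-graphite-all ("tiled by 51"). */
#define TILE 51

static double A[N][N];
static double B[N][N];

static int min_int(int a, int b) { return a < b ? a : b; }

/* Untiled initialization: a hypothetical C rendering of A=0.1D0
   applied to a 2-D array. */
static void init_plain(double a[N][N], double value)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            a[i][j] = value;
}

/* Tiled version mirroring the ISL AST: c1/c2 step over tile origins,
   c3/c4 step over the points inside one tile, clamped at N - 1. */
static void init_tiled(double a[N][N], double value, int tile)
{
    assert(tile > 0);
    for (int c1 = 0; c1 <= N - 1; c1 += tile)
        for (int c2 = 0; c2 <= N - 1; c2 += tile)
            for (int c3 = c1; c3 <= min_int(N - 1, c1 + tile - 1); c3++)
                for (int c4 = c2; c4 <= min_int(N - 1, c2 + tile - 1); c4++)
                    a[c3][c4] = value; /* plays the role of S_4(c3, c4) */
}

/* Counts how many statement instances the tiled nest executes. */
static long count_tiled_iterations(int tile)
{
    assert(tile > 0);
    long count = 0;
    for (int c1 = 0; c1 <= N - 1; c1 += tile)
        for (int c2 = 0; c2 <= N - 1; c2 += tile)
            for (int c3 = c1; c3 <= min_int(N - 1, c1 + tile - 1); c3++)
                for (int c4 = c2; c4 <= min_int(N - 1, c2 + tile - 1); c4++)
                    count++;
    return count;
}

/* Returns 1 if the plain and tiled loops leave identical contents.
   B starts zeroed (static storage), so any skipped element would differ. */
static int tiling_preserves_result(void)
{
    init_plain(A, 0.1);
    init_tiled(B, 0.1, TILE);
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            if (A[i][j] != B[i][j])
                return 0;
    return 1;
}
```

Since 1024 = 20 * 51 + 4, the last tile in each dimension covers only 4 rows or columns, which is why the AST clamps the inner bounds with min(1023, c1 + 50) and min(1023, c2 + 50).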