https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61680
--- Comment #6 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
I think the problem is that for the w array (in the C testcase) we first have a partial group store (which stores 3 out of 4 elements), then a group load (which again loads 3 out of 4 elements), and finally the last store of the group store. When vectorizing the group store, we don't emit anything at all until the last store is done, but when vectorizing the group load, we are unaware that there is a pending group store that aliases the group load.

So, I think we need to detect the case where, in between the individual group store statements of some group, there are loads that may or must alias the vectorizable store. Either we can give up in that case altogether (probably the right thing for 4.9.x and 4.8.x?), or for must-alias cases the vectorization of the load could presumably use whatever the earlier store is storing there (the question is why earlier loop optimizations haven't done that, though; fre1 did it for some reason only for the w[i][2] store and load, and not for the other two store/load pairs).

Anyway, I'm not familiar enough with group loads/stores to fix this. Richard, can you please have a look?

With -O3 -msse2 -ffast-math -fno-vect-cost-model this started to be miscompiled with r148352; supposedly pcom used to optimize the loads/stores.
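
For illustration only, here is a minimal sketch of the access pattern described above. It is not the testcase attached to this PR: the function names, array sizes and arithmetic are all hypothetical, and I have not verified that it triggers the miscompile. It just shows a loop where three of the four stores to w[i] come first, then loads of the same three elements, then the fourth store that completes the group. The check compares the result against the values a scalar execution must produce.

```c
#include <assert.h>

#define N 64

float a[N];
float w[N][4];
float out[N][3];

/* Hypothetical kernel, not the PR's testcase.  */
__attribute__((noinline)) void
kernel (void)
{
  for (int i = 0; i < N; i++)
    {
      /* Partial group store: 3 of the 4 elements of w[i].  */
      w[i][0] = a[i] + 1.0f;
      w[i][1] = a[i] + 2.0f;
      w[i][2] = a[i] + 3.0f;

      /* Group load of the same 3 elements; these loads must alias
         the stores just above.  */
      out[i][0] = w[i][0] * 2.0f;
      out[i][1] = w[i][1] * 2.0f;
      out[i][2] = w[i][2] * 2.0f;

      /* Last store of the group; per the analysis above, the
         vectorized group store is only emitted at this point.  */
      w[i][3] = a[i] + 4.0f;
    }
}

/* Run the kernel and return the number of elements that differ
   from what scalar execution must produce.  */
int
count_mismatches (void)
{
  int bad = 0;

  for (int i = 0; i < N; i++)
    a[i] = (float) i;

  kernel ();

  for (int i = 0; i < N; i++)
    {
      for (int k = 0; k < 3; k++)
        if (out[i][k] != ((float) i + (float) (k + 1)) * 2.0f)
          bad++;
      if (w[i][3] != (float) i + 4.0f)
        bad++;
    }
  return bad;
}

/* Abort if the loads did not observe the earlier stores.  */
void
self_check (void)
{
  assert (count_mismatches () == 0);
}
```

If the vectorized group load were emitted ahead of the pending group store, out[] would be computed from stale w[] contents and count_mismatches () would return nonzero. Building with the flags mentioned above (-O3 -msse2 -ffast-math -fno-vect-cost-model) and calling self_check () from main would be one way to try it.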