https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115759
--- Comment #1 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
Do you mean you want the codegen to look like LLVM's: https://godbolt.org/z/b7W88WTGo ?

I personally think GCC's codegen is better than LLVM's for your case, since LLVM uses strided stores whereas GCC uses unit-stride stores via the SLP vectorization approach. It's true that GCC has more instructions in the loop header than LLVM, but I think GCC vectorizes the code better than LLVM in general.
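
To make the strided vs. unit-stride distinction concrete, here is a minimal scalar sketch in C. It is not the reporter's test case (that is only available behind the godbolt link); the function names and the two-values-per-element layout are hypothetical and chosen purely for illustration. The first function writes memory the way a strided-store vectorization would (each pass touches every second element), and the second writes the same result the way an SLP-style vectorization would (values are interleaved first, then stored to consecutive addresses).

```c
#include <stddef.h>

/* Hypothetical example: out holds 2*n ints, two results per input element. */

/* Strided-store pattern: each pass writes elements that are 2 ints apart,
   which is the access shape a strided vector store would have. */
void fill_strided(int *out, const int *x, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[2 * i] = x[i] + 1;      /* even slots, stride of 2 ints */
    for (size_t i = 0; i < n; i++)
        out[2 * i + 1] = x[i] * 2;  /* odd slots, stride of 2 ints */
}

/* Unit-stride pattern: the two results are interleaved up front, so the
   writes go to consecutive addresses, as a contiguous vector store would. */
void fill_unit_stride(int *out, const int *x, size_t n)
{
    for (size_t j = 0; j < 2 * n; j++)
        out[j] = (j & 1) ? x[j / 2] * 2 : x[j / 2] + 1;
}
```

Both functions produce identical output; only the order and spacing of the stores differ. The extra work of interleaving the values before the store is one plausible reason for additional setup instructions in the unit-stride version, though that is an assumption and not something the comment states.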