https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105617
--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
We are now vectorizing the store to dst[] at -O2, since that appears profitable:

t.c:10:10: note: Cost model analysis:
r0.0_12 1 times scalar_store costs 12 in body
r1.1_13 1 times scalar_store costs 12 in body
r2.2_14 1 times scalar_store costs 12 in body
r3.3_15 1 times scalar_store costs 12 in body
r0.0_12 2 times unaligned_store (misalign -1) costs 24 in body
node 0x4b2b1e0 1 times vec_construct costs 4 in prologue
node 0x4b2b1e0 1 times vec_construct costs 4 in prologue
t.c:10:10: note: Cost model analysis for part in loop 0:
  Vector cost: 32
  Scalar cost: 48
t.c:10:10: note: Basic block will be vectorized using SLP
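The totals in the dump can be checked by hand: the scalar side is four stores at 12 each (4 × 12 = 48), while the vector side is 24 for the two unaligned vector stores plus 4 + 4 for the two vec_construct operations (24 + 4 + 4 = 32). The small C sketch below just re-adds those per-line costs. The struct and function names are hypothetical, and the decision rule (vectorize unless the vector cost exceeds the scalar cost) is a simplification assumed here for illustration, not a copy of GCC's cost-model code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one dump line: "<n> times <kind> costs <cost>",
   where <cost> is the total already multiplied by <n>. */
struct cost_entry {
    const char *kind;
    int cost;
};

static int sum_costs(const struct cost_entry *e, size_t n)
{
    assert(n == 0 || e != NULL);
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += e[i].cost;
    return total;
}

/* Scalar side: the four scalar_store lines (r0.0_12 .. r3.3_15). */
static const struct cost_entry scalar_side[] = {
    {"scalar_store", 12}, {"scalar_store", 12},
    {"scalar_store", 12}, {"scalar_store", 12},
};

/* Vector side: two unaligned stores (24 in total) in the body,
   plus two vec_construct operations (4 each) in the prologue. */
static const struct cost_entry vector_side[] = {
    {"unaligned_store", 24},
    {"vec_construct", 4},
    {"vec_construct", 4},
};

int scalar_cost(void)
{
    return sum_costs(scalar_side, sizeof scalar_side / sizeof scalar_side[0]);
}

int vector_cost(void)
{
    return sum_costs(vector_side, sizeof vector_side / sizeof vector_side[0]);
}

/* Assumed, simplified rule: vectorize unless the vector cost is higher. */
int slp_profitable(void)
{
    return vector_cost() <= scalar_cost();
}
```

With these numbers, scalar_cost() yields 48, vector_cost() yields 32, and slp_profitable() is true, matching the "Basic block will be vectorized using SLP" note. Whether ties are resolved toward the vector or the scalar side does not matter for this particular case.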