https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98442
--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
Using ymm might also trigger dynamic stack realignment if we ever spill. Using ymm can also be slower when the memory is unaligned (and/or when the CPU only has split AVX support). It will also require vzeroupper. So I wonder whether it is really worth it for small structures like this? And with fast rep movsb, isn't that even better? [Can fast rep movsb stores be forwarded?]
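For reference, a minimal sketch of the kind of small-structure copy under discussion. The struct `small` and both function names are hypothetical and not taken from the bug's testcase. Which expansion GCC picks (two xmm moves, one ymm move, or rep movsb) depends on the GCC version, the -march/-mtune settings, and options such as -mavx or -mstringop-strategy=rep_byte, so inspect the -S output rather than assuming one.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 32-byte structure: exactly the width of one ymm register. */
struct small {
    uint64_t a, b, c, d;
};

static_assert(sizeof(struct small) == 32, "expected a 32-byte struct");

/* Plain aggregate assignment. The compiler is free to expand this inline
   as integer moves, xmm/ymm moves, or a rep movsb sequence. */
void copy_small(struct small *dst, const struct small *src)
{
    *dst = *src;
}

/* The same copy written as a constant-size memcpy, which the compiler
   typically also expands inline instead of calling the library. */
void copy_small_memcpy(struct small *dst, const struct small *src)
{
    memcpy(dst, src, sizeof *dst);
}
```

Compiling this with something like `gcc -O2 -mavx -S` and again with `-mstringop-strategy=rep_byte` added should make it possible to compare the code generated for the two strategies on a given target.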