https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88963
--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Devin Hussey from comment #4)
> Strangely, this doesn't seem to affect the ARM or aarch64 backends, although
> I am on a December build (specifically Dec 29). 8.2 is also unaffected.

This is because those backends support very wide integer modes (OI, etc.).

> aarch64-none-eabi-gcc -O3 -S test.c
>
> test:
>         ld1     {v16.16b - v19.16b}, [x1]
>         ld1     {v4.16b - v7.16b}, [x2]
>         add     v0.4s, v16.4s, v4.4s
>         add     v1.4s, v17.4s, v5.4s
>         add     v2.4s, v18.4s, v6.4s
>         add     v3.4s, v19.4s, v7.4s
>         st1     {v0.16b - v3.16b}, [x0]
>         ret

This is not really good code either on most, if not all, ARMv8
microarchitectures. Doing 8 ldr/ld1 and 4 st1 is almost always better.
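
For readers without the attachment: test.c itself is not reproduced in this comment, so the following is only a guess at its shape, written to be consistent with the quoted assembly (two 64-byte loads from x1 and x2, four .4s adds, one 64-byte store to x0). The type name u32x16 and the parameter names are assumptions, not taken from the report.

```c
#include <stdint.h>

/* Hypothetical reconstruction, not the reporter's actual test.c.
   A 64-byte GNU C vector of sixteen 32-bit lanes; this is four times
   the width of a single 128-bit AArch64 NEON register. */
typedef uint32_t u32x16 __attribute__((vector_size(64)));

/* dst arrives in x0, a in x1, b in x2 under the AArch64 procedure
   call standard, matching the registers in the quoted listing. */
void test(u32x16 *dst, const u32x16 *a, const u32x16 *b)
{
    /* Lane-wise addition via the GCC vector extension. */
    *dst = *a + *b;
}
```

Since the type is wider than any native register, the backend has to split the operation, and how it does so is what the quoted listing shows.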