https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88963

--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Devin Hussey from comment #4)
> Strangely, this doesn't seem to affect the ARM or aarch64 backends, although
> I am on a December build (specifically Dec 29). 8.2 is also unaffected.

This is because those backends support very wide integer modes (OI, etc.).


> aarch64-none-eabi-gcc -O3 -S test.c
> 
> test:
>         ld1     {v16.16b - v19.16b}, [x1]
>         ld1     {v4.16b - v7.16b}, [x2]
>         add     v0.4s, v16.4s, v4.4s
>         add     v1.4s, v17.4s, v5.4s
>         add     v2.4s, v18.4s, v6.4s
>         add     v3.4s, v19.4s, v7.4s
>         st1     {v0.16b - v3.16b}, [x0]
>         ret

This is not really good code either, on most if not all ARMv8
microarchitectures.
Doing 8 ldr/ld1 and 4 st1 is almost always better.
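
For reference, test.c itself is not shown in this comment. A minimal sketch of the kind of function that could plausibly lead to the assembly quoted above, assuming a 64-byte generic vector of sixteen 32-bit integers (the typedef name u32x16 and the parameter names are hypothetical, not taken from the report):

```c
#include <stdint.h>

/* Hypothetical stand-in for test.c; the real test case is not shown here.
   A 64-byte generic vector: sixteen 32-bit lanes (GCC vector extension). */
typedef uint32_t u32x16 __attribute__((vector_size(64)));

/* Element-wise add of two 64-byte vectors, stored through dst.
   No current AArch64 vector register holds 64 bytes, so the compiler
   has to split this into four 128-bit operations. */
void test(u32x16 *dst, const u32x16 *a, const u32x16 *b)
{
    *dst = *a + *b;
}
```

With this argument order, dst, a and b arrive in x0, x1 and x2, which matches the registers used by the st1 and the two ld1 instructions in the quoted output.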
