Yes, I have no idea what causes this; maybe an ARM expert can chip in.
I posted a bug to gcc bugzilla: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93005

This godbolt link shows the difference very clearly: https://godbolt.org/z/_hEykm

When code is written using intrinsics, gcc is able to promote a stack array to SIMD registers on SSE, but not on NEON.
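For readers who can't open the link, here is a rough, hypothetical illustration of the kind of pattern involved. It is not the code from the godbolt link or the bug report. It keeps partial sums in a small local array of vectors and builds with either SSE or NEON intrinsics. Because every array index is a compile-time constant, an optimizer can in principle keep the array entirely in vector registers. Per the report above, gcc managed this for SSE but kept the array in stack memory for NEON. The `vec4` type, the `VLOAD`/`VADD`/`VSTORE` macros and `column_sums` are made up for this sketch.

```c
/* Hypothetical sketch: choose SSE or NEON intrinsics at compile time. */
#if defined(__SSE__)
#include <xmmintrin.h>
typedef __m128 vec4;
#define VLOAD(p)     _mm_loadu_ps(p)
#define VADD(a, b)   _mm_add_ps((a), (b))
#define VSTORE(p, v) _mm_storeu_ps((p), (v))
#elif defined(__ARM_NEON)
#include <arm_neon.h>
typedef float32x4_t vec4;
#define VLOAD(p)     vld1q_f32(p)
#define VADD(a, b)   vaddq_f32((a), (b))
#define VSTORE(p, v) vst1q_f32((p), (v))
#else
#error "this sketch needs SSE or NEON"
#endif

/* Column-wise sum of a 4x4 row-major block:
   out[j] = in[j] + in[4 + j] + in[8 + j] + in[12 + j].
   Partial sums live in a small local array of vectors (acc). All indices
   are constants, so the compiler may replace acc[] with registers
   instead of loading and storing it through the stack. */
void column_sums(const float *in, float *out)
{
    vec4 acc[2];
    acc[0] = VLOAD(in + 0);
    acc[1] = VLOAD(in + 4);
    acc[0] = VADD(acc[0], VLOAD(in + 8));
    acc[1] = VADD(acc[1], VLOAD(in + 12));
    VSTORE(out, VADD(acc[0], acc[1]));
}
```

Compiling this with -O2 for x86-64 and for AArch64 and comparing the assembly is one way to check whether `acc` stays in registers or is spilled to the stack.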
Joel
