https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121315
--- Comment #3 from Alex Coplan <acoplan at gcc dot gnu.org> ---
Here is a reduced testcase (compile with -O3 -mcpu=neoverse-v2):
void copyReverseGeneric(int *dst, int *src) {
  for (int i = 0; i < 10000; ++i)
    dst[i] = __builtin_bswap32(src[i]);
}
Of course, using LDP/STP here would cost an extra add over the current
codegen (even auto-inc LDP/STP doesn't come for free), but it may still be
worthwhile. I will look into it.
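Any codegen change to this loop (e.g. switching to LDP/STP) should leave the results unchanged. Here is a minimal self-checking sketch that wraps the reduced testcase and compares every element against a portable byte swap. It is not part of the original report. The helpers bswap32_ref and count_bswap_mismatches are hypothetical names introduced only for this check, and the input pattern is arbitrary.

```c
#include <stdint.h>

/* The reduced testcase from above, unchanged. */
void copyReverseGeneric(int *dst, int *src) {
  for (int i = 0; i < 10000; ++i)
    dst[i] = __builtin_bswap32(src[i]);
}

/* Hypothetical helper: portable reference byte swap used only for checking. */
static uint32_t bswap32_ref(uint32_t x) {
  return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
         ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Hypothetical helper: runs the testcase on a fixed input pattern and
   returns the number of elements that differ from the reference swap. */
int count_bswap_mismatches(void) {
  static int src[10000], dst[10000];
  for (int i = 0; i < 10000; ++i)
    /* Arbitrary pattern that makes all four bytes vary across elements. */
    src[i] = (int)(0x01020304u * (uint32_t)(i + 1));
  copyReverseGeneric(dst, src);
  int mismatches = 0;
  for (int i = 0; i < 10000; ++i)
    if ((uint32_t)dst[i] != bswap32_ref((uint32_t)src[i]))
      ++mismatches;
  return mismatches;
}
```

Build the harness with the flags above (-O3 -mcpu=neoverse-v2) and call count_bswap_mismatches() from any driver. It should return 0 both before and after a codegen change. Compare the generated assembly separately to see whether LDP/STP is used.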
