https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #48 from Bernd Edlinger <bernd.edlinger at hotmail dot de> ---
(In reply to wilco from comment #22)
> Anyway, there is another bug: on AArch64 we correctly recognize there are 8
> 1-byte loads, shifts and orrs which can be replaced by a single 8-byte load
> and a byte reverse. Although it is recognized on ARM and works correctly if
> it is a little-endian load, it doesn't perform the optimization if a byte
> reverse is needed. As a result there are lots of 64-bit shifts and orrs
> which create huge register pressure if not expanded early.

Hmm... I think the test case does something invalid here:

    const SHA_LONG64 *W = in;

    T1 = X[0] = PULL64(W[0]);

in is not aligned, but it is cast to an 8-byte-aligned type. If, with your
proposed patch, the bswap pass assumes it is OK to merge four byte accesses
into an aligned word access, it will likely break OpenSSL on strict-alignment
(-mno-unaligned-access) targets. Even on our Cortex-A9 the O/S will trap on
unaligned accesses.

I have checked that OpenSSL still works on arm-none-eabi with my patch, but
I am not sure about your patch.