https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #49 from Bernd Edlinger <bernd.edlinger at hotmail dot de> ---
(In reply to Bernd Edlinger from comment #48)
> (In reply to wilco from comment #22)
> >
> > Anyway, there is another bug: on AArch64 we correctly recognize there are 8
> > 1-byte loads, shifts and orrs which can be replaced by a single 8-byte load
> > and a byte reverse. Although it is recognized on ARM and works correctly if
> > it is a little endian load, it doesn't perform the optimization if a byte
> > reverse is needed. As a result there are lots of 64-bit shifts and orrs
> > which create huge register pressure if not expanded early.
>
> Hmm...
>
> I think the test case does something invalid here:
>
>   const SHA_LONG64 *W = in;
>
>   T1 = X[0] = PULL64(W[0]);
>
> in is not aligned, but it is cast to an 8-byte aligned type.
>
> If the bswap pass assumes with your proposed patch
> it is OK to merge 4-byte accesses into an aligned word access,
> it may likely break openssl on -mno-unaligned targets.
> Even on our cortex-a9 the O/S will trap on unaligned accesses.
> I have checked that openssl still works on arm-none-eabi
> with my patch, but I am not sure about your patch.

I tried it out. Although the code is bogus, the code generation does not use
the wrong alignment. With -mno-unaligned-access the ldr is split into 4 ldrb
instructions and the result is fed into the rev. At least in this
configuration that is not profitable, though.