https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #49 from Bernd Edlinger <bernd.edlinger at hotmail dot de> ---
(In reply to Bernd Edlinger from comment #48)
> (In reply to wilco from comment #22)
> >
> > Anyway, there is another bug: on AArch64 we correctly recognize there are 8
> > 1-byte loads, shifts and orrs which can be replaced by a single 8-byte load
> > and a byte reverse. Although it is recognized on ARM and works correctly if
> > it is a little endian load, it doesn't perform the optimization if a byte
> > reverse is needed. As a result there are lots of 64-bit shifts and orrs
> > which create huge register pressure if not expanded early.
>
> Hmm...
>
> I think the test case does something invalid here:
>
>   const SHA_LONG64 *W = in;
>
>   T1 = X[0] = PULL64(W[0]);
>
> in is not aligned, but it is cast to an 8-byte aligned type.
>
> If the bswap pass assumes with your proposed patch
> it is OK to merge 4-byte accesses into an aligned word access,
> it may likely break openssl on -mno-unaligned targets.
> Even on our cortex-a9 the O/S will trap on unaligned accesses.
> I have checked that openssl still works on arm-none-eabi
> with my patch, but I am not sure about your patch.

I tried it out. Although the code is bogus, the code generation does not use
the wrong alignment. With -mno-unaligned-access the ldr is split into 4 ldrb
instructions and the result is fed into the rev. At least in this
configuration that is not profitable, though.