[Bug target/77308] surprisingly large stack usage for sha512 on arm

wilco at gcc dot gnu.org Wed, 02 Nov 2016 04:12:46 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308


--- Comment #51 from wilco at gcc dot gnu.org ---
(In reply to Bernd Edlinger from comment #49)
> (In reply to Bernd Edlinger from comment #48)
> > (In reply to wilco from comment #22)
> > > 
> > > Anyway, there is another bug: on AArch64 we correctly recognize there are 8
> > > 1-byte loads, shifts and orrs which can be replaced by a single 8-byte load
> > > and a byte reverse. Although it is recognized on ARM and works correctly if
> > > it is a little-endian load, it doesn't perform the optimization if a byte
> > > reverse is needed. As a result there are lots of 64-bit shifts and orrs
> > > which create huge register pressure if not expanded early.
> > 
> > Hmm...
> > 
> > I think the test case does something invalid here:
> > 
> > const SHA_LONG64 *W = in;
> > 
> > T1 = X[0] = PULL64(W[0]);
> > 
> > 
> > in is not aligned, but it is cast to an 8-byte aligned type.
> > 
> > If, with your proposed patch, the bswap pass assumes it is OK to merge
> > four byte accesses into an aligned word access, it may well break openssl
> > on -mno-unaligned-access targets.
> > Even on our Cortex-A9 the OS will trap on unaligned accesses.
> > I have checked that openssl still works on arm-none-eabi 
> > with my patch, but I am not sure about your patch.
> 
> I tried it out. Although the code is bogus, the code generation
> does not use the wrong alignment.
> 
> With -mno-unaligned-access the ldr is split into 4 ldrb instructions and
> the result is fed into the rev.
>
> At least in this configuration, though, that is not profitable.

Indeed, that's the reason behind the existing check. However, it disables all
profitable bswap cases while still generating unaligned accesses if no bswap is
needed. So I am looking for a callback that gives the correct answer. It would
need to check -mno-unaligned-access and the target capabilities (e.g. if
unaligned accesses are supported in hardware but really expensive, we want to
avoid them).
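For illustration, here is a minimal C sketch of the two load styles discussed above. It is not OpenSSL's PULL64 macro or GCC's bswap pass; the function names load_be64_bytes and load_be64_memcpy are hypothetical, and the second function assumes GCC or Clang (for __builtin_bswap64 and the predefined __BYTE_ORDER__ macros). The first function is the byte-wise pattern (8 one-byte loads, shifts and ORs) that the bswap pass may merge into one 8-byte load plus a byte reverse. Whether that actually happens on a given ARM configuration depends on the target and flags, which is what this bug is about. The second avoids casting a possibly unaligned buffer to an 8-byte aligned type by going through memcpy.

```c
#include <stdint.h>
#include <string.h>

#if !defined(__BYTE_ORDER__) || !defined(__ORDER_LITTLE_ENDIAN__)
#error "this sketch assumes GCC/Clang predefined byte-order macros"
#endif

/* Byte-wise big-endian 64-bit load: eight 1-byte loads, shifts and ORs.
   Works for any alignment of p; the compiler may or may not merge it. */
static uint64_t load_be64_bytes(const unsigned char *p)
{
    return ((uint64_t)p[0] << 56) | ((uint64_t)p[1] << 48) |
           ((uint64_t)p[2] << 40) | ((uint64_t)p[3] << 32) |
           ((uint64_t)p[4] << 24) | ((uint64_t)p[5] << 16) |
           ((uint64_t)p[6] << 8)  |  (uint64_t)p[7];
}

/* Alignment-safe alternative: copy the bytes with memcpy instead of
   dereferencing a (possibly misaligned) uint64_t pointer, then byte-reverse
   on little-endian hosts so the value is read as big-endian. */
static uint64_t load_be64_memcpy(const void *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    v = __builtin_bswap64(v);
#endif
    return v;
}
```

Both functions should return the same value for the same bytes, including when the pointer is deliberately misaligned (e.g. buf + 1 into a byte array).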
