Eric Richter writes:
> According to the ABI, the stack pointer is quadword aligned, so starting
> the stack storage at offset -8, may cause the return address to be
> stepped on. Adjusting to use -16 as the starting point, which also
> matches other POWER assembly code.
Thanks, applied!
I've noticed one more memory access issue when re-reading this code. The
loading of the input data is done using
  define(`LOAD', `
    IF_BE(`lxvw4x VSR(IV($1)), $2, INPUT')
    IF_LE(`
      lxvd2x VSR(IV($1)), $2, INPUT
      vperm IV($1), IV($1), IV($1), VT0
    ')
  ')
  [...]
    LOAD(0, TC0)
    LOAD(1, TC4)
    LOAD(2, TC8)
    LOAD(3, TC12)
  [...]
As I understand this, like for the state registers, we only use 32 bits
of each of the vector registers representing the input block being
expanded (it would be nice if we could find a more compact
representation without complicating the input expansion logic, but that
may be quite difficult).
So we read the 16 bytes at INPUT into register v16, using the first 4 of
those bytes, then the 16 bytes at INPUT+4 into v17, using the first 4
bytes, etc.
So we do overlapping reads, and at the end we'll read 12 bytes beyond
the end of the input buffer?
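To spell out the arithmetic behind that number, here is a small C sketch. The helper name overrun_bytes and its parameters are made up for illustration and are not part of nettle. It assumes a 64-byte input block and 16 loads of 16 bytes each, at offsets 0, 4, ..., 60.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, for illustration only: number of bytes read past
   the end of a block of block_size bytes when n_loads loads of
   load_size bytes each start at offsets 0, stride, 2*stride, ... */
static size_t
overrun_bytes(size_t block_size, size_t n_loads,
              size_t stride, size_t load_size)
{
  assert(n_loads > 0);
  /* End offset (exclusive) of the last load. */
  size_t last_end = (n_loads - 1) * stride + load_size;
  return last_end > block_size ? last_end - block_size : 0;
}
```

With the current scheme, overrun_bytes(64, 16, 4, 16) gives 12: the last load starts at offset 60 and ends at offset 76. With four non-overlapping 16-byte loads, overrun_bytes(64, 4, 16, 16) gives 0.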
I think it should be possible to replace this with something like
    LOAD(0, TC0)
    vsldoi IV(1), IV(0), IV(0), 4
    vsldoi IV(2), IV(0), IV(0), 8
    vsldoi IV(3), IV(0), IV(0), 12
    LOAD(4, TC16)
  [...]
Do you agree? We could then eliminate some of the TC registers as well.
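As a sanity check of the idea, here is a rough C model, not the actual assembly. It treats vsldoi as "concatenate the two source vectors and take 16 bytes starting at byte sh", using big-endian byte numbering, and assumes the word we use is the first 4 bytes of each register in that numbering (i.e., after the vperm on little-endian). The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of vsldoi d, a, b, sh: concatenate a and b (32 bytes) and take
   the 16 bytes starting at byte sh, with 0 <= sh <= 15. */
static void
vsldoi_model(uint8_t d[16], const uint8_t a[16], const uint8_t b[16],
             unsigned sh)
{
  uint8_t cat[32];
  assert(sh < 16);
  memcpy(cat, a, 16);
  memcpy(cat + 16, b, 16);
  memcpy(d, cat + sh, 16);
}

/* Returns 1 if, for each of the 16 input words, one aligned 16-byte
   load followed by a rotation yields the same leading 4 bytes as an
   overlapping 16-byte load at offset 4*i would. Unlike the overlapping
   loads, this never reads outside the 64-byte block. */
static int
rotations_match_overlapping_loads(const uint8_t block[64])
{
  for (unsigned i = 0; i < 16; i++)
    {
      uint8_t base[16], rot[16];
      memcpy(base, block + 16 * (i / 4), 16);      /* one load per 4 words */
      vsldoi_model(rot, base, base, 4 * (i % 4));  /* rotate left by bytes */
      if (memcmp(rot, block + 4 * i, 4) != 0)
        return 0;
    }
  return 1;
}
```

Since both sources are the same register, each vsldoi is just a byte rotation, so the first word of the result is word sh/4 of the loaded vector.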
Regards,
/Niels
--
Niels Möller. PGP key CB4962D070D77D7FCB8BA36271D8F1FF368C6677.
Internet email is subject to wholesale government surveillance.