Eric Richter <[email protected]> writes:

> According to the ABI, the stack pointer is quadword aligned, so starting
> the stack storage at offset -8 may cause the return address to be
> stepped on. Adjusting to use -16 as the starting point, which also
> matches other POWER assembly code.

Thanks, applied!

I've noticed one more memory access issue when re-reading this code. The
loading of the input data is done using

define(`LOAD', `
        IF_BE(`lxvw4x   VSR(IV($1)), $2, INPUT')
        IF_LE(`
                lxvd2x  VSR(IV($1)), $2, INPUT
                vperm   IV($1), IV($1), IV($1), VT0
        ')
')
[...]
        LOAD(0, TC0)
        LOAD(1, TC4)
        LOAD(2, TC8)
        LOAD(3, TC12)
[...]

As I understand it, as with the state registers, we use only 32 bits
of each of the vector registers representing the input block being
expanded (it would be nice if we could find a more compact
representation without complicating the input expansion logic, but that
may be quite difficult).

So we read the 16 bytes at INPUT into register v16 and use only the
first 4 of those bytes, then the 16 bytes at INPUT+4 into v17, again
using only the first 4 bytes, and so on.

So we do overlapping reads, and at the end we'll read 12 bytes beyond
the end of the input buffer?

I think it should be possible to replace this with something like

        LOAD(0, TC0)
        vsldoi  IV(1), IV(0), IV(0), 4
        vsldoi  IV(2), IV(0), IV(0), 8
        vsldoi  IV(3), IV(0), IV(0), 12
        LOAD(4, TC16)
        [...]

Do you agree? We could then eliminate some of the TC registers as well. 

Regards,
/Niels

-- 
Niels Möller. PGP key CB4962D070D77D7FCB8BA36271D8F1FF368C6677.
Internet email is subject to wholesale government surveillance.
_______________________________________________
nettle-bugs mailing list -- [email protected]
To unsubscribe send an email to [email protected]
