Maamoun TK <maamoun...@googlemail.com> writes:

>  I'm not aware of a simple way to accomplish either approach on POWER8. I
> recommend using an allocated stack buffer

Let's leave that as is, then. Do you want to make another pull request
with only the fixes for register usage?

> to assist handling leftovers rather
> than making it complicated. Alternatively, we can use the POWER9-specific
> instruction 'lxvll', which can be used to load a vector with the length
> passed in a general register as a parameter. It also works in both endian
> modes without any post-loading operations. Another benefit of switching to
> POWER ISA 3.0 is that we can use 'lxvb16x/stxvb16x' to load/store input
> and output data instead of the 'lxvd2x/stxvd2x' instructions; this
> eliminates the need for post-loading/pre-storing permute operations in
> little-endian mode.
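
For reference, the stack-buffer approach to leftovers could look roughly
like the C sketch below. It is only an illustration, not the assembly
under discussion: process_block is a hypothetical stand-in for the
vectorized per-block operation, and the 16-byte block size is an
assumption.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 16

/* Hypothetical stand-in for the real vectorized per-block operation. */
static void
process_block(uint8_t *dst, const uint8_t *src)
{
  for (size_t i = 0; i < BLOCK_SIZE; i++)
    dst[i] = src[i] ^ 0x5a;
}

static void
process_with_leftovers(uint8_t *dst, const uint8_t *src, size_t length)
{
  /* Full blocks are read and written directly. */
  for (; length >= BLOCK_SIZE;
       length -= BLOCK_SIZE, dst += BLOCK_SIZE, src += BLOCK_SIZE)
    process_block(dst, src);

  if (length > 0)
    {
      /* Leftovers go through a zero-padded stack buffer, so the block
         operation never reads or writes past the end of the caller's
         buffers. */
      uint8_t buffer[BLOCK_SIZE] = {0};
      memcpy(buffer, src, length);
      process_block(buffer, buffer);
      memcpy(dst, buffer, length);
    }
}
```

The cost is two small copies for the final partial block, which keeps
the main loop free of any length-dependent logic.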

I was thinking of something similar to how the unaligned input is
handled in arm/v6/sha1-compress.asm. Then, to handle leftovers at the
end, one would need to compare the leftover size with the
alignment-related address bits to decide whether or not to load one more
word. But that's perhaps only worth the effort if there's a performance
advantage in also avoiding unaligned loads in the main loop.
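
To make the leftover test concrete, here is a rough C sketch of the idea,
not a transcription of the ARM code. It assumes 4-byte words and models
memory as an array of aligned words plus a byte offset, so that every
load is a whole aligned word; the function name and interface are made up
for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy n bytes starting at byte offset `start` within the word array
   `mem`, using only whole-word loads. Returns the number of word loads
   performed. */
static size_t
copy_with_aligned_loads(uint8_t *dst, const uint32_t *mem,
                        size_t start, size_t n)
{
  size_t offset = start % 4;      /* alignment-related address bits */
  const uint32_t *wp = mem + start / 4;
  uint8_t cur[4] = {0};           /* most recently loaded word */
  uint8_t next[4];
  size_t avail = 0;               /* unconsumed bytes at the end of cur */
  size_t loads = 0;

  if (offset > 0 && n > 0)
    {
      /* The first word is only partially used. */
      memcpy(cur, wp++, 4); loads++;
      avail = 4 - offset;
    }

  /* Main loop: one aligned load per 4 output bytes. */
  for (; n >= 4; n -= 4, dst += 4)
    {
      memcpy(next, wp++, 4); loads++;
      memcpy(dst, cur + 4 - avail, avail);
      memcpy(dst + avail, next, 4 - avail);
      memcpy(cur, next, 4);
    }

  /* Leftovers: load one more word only if the bytes still held in cur
     do not cover them. */
  if (n > avail)
    {
      memcpy(next, wp, 4); loads++;
      memcpy(dst, cur + 4 - avail, avail);
      memcpy(dst + avail, next, n - avail);
    }
  else
    memcpy(dst, cur + 4 - avail, n);

  return loads;
}
```

For a non-zero offset, the condition n > avail is the same as
leftover + offset > 4, which is the comparison between the leftover size
and the alignment bits described above.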

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
_______________________________________________
nettle-bugs mailing list
nettle-bugs@lists.lysator.liu.se
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs
