On Fri, Oct 22, 2021 at 10:45 AM Niels Möller <ni...@lysator.liu.se> wrote:

> Maamoun TK <maamoun...@googlemail.com> writes:
>
> > I've added a new patch that optimizes SHA3 permute function for S390x
> > architecture
> > https://git.lysator.liu.se/nettle/nettle/-/merge_requests/36
> > More about the patch in merge request description.
>
> Really nice speedup, and interesting that it's significantly faster than
> your previous version using the special sha3 instructions.
>

Yes, the special SHA3 instruction of the s390x architecture doesn't fit
well with nettle's SHA3 permute function: it performs extra steps that
nettle already handles in other functions, which makes it slower than a
regular vectorized implementation.
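
To illustrate the mismatch, here is a minimal sketch in C. All names in it
(absorb_block, permute_fn, stub_permute) are hypothetical and are not
nettle's actual code, and it assumes the extra work is the XOR of the
input block into the state. The caller does that XOR and then expects a
primitive that only permutes, so an instruction that also absorbs data
repeats a step that is already done.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STATE_LANES 25  /* Keccak-f[1600] state: 25 lanes of 64 bits */

/* Hypothetical type for a permute-only primitive. */
typedef void permute_fn(uint64_t state[STATE_LANES]);

/* Stand-in for the real permutation: it only counts calls, so the
   sketch stays short. It does NOT implement Keccak-f. */
static unsigned permute_calls;

static void
stub_permute(uint64_t state[STATE_LANES])
{
  (void) state;
  permute_calls++;
}

/* Absorb one block: XOR `lanes` little-endian 64-bit words into the
   state, then run the permute-only primitive once. */
static void
absorb_block(uint64_t state[STATE_LANES], const uint8_t *block,
             size_t lanes, permute_fn *permute)
{
  assert(lanes <= STATE_LANES);
  for (size_t i = 0; i < lanes; i++)
    {
      uint64_t w = 0;
      for (size_t j = 0; j < 8; j++)
        w |= (uint64_t) block[8 * i + j] << (8 * j);
      state[i] ^= w;
    }
  permute(state);
}
```

With this split, a fused absorb-and-permute instruction can only stand in
for the permute step if its absorb part is neutralized, which costs extra
work on every call.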


> I'm sorry the existing implementations are quite hard to follow, with
> irregular data movements and rather unstructured comments. It must have
> been a bit challenging to decipher the x86_64 version. Do you have any
> ideas on how to improve documentation and comments?
>

I've improved the documentation and comments in the implementation; the
new documentation illustrates the structure of the main permute elements
in more detail. The update also makes better use of the instruction set,
which yields faster performance.
Let me know if there is any room for further improvement!

regards,
Mamone