[fpc-devel] x86_64 SHA1 implementation

J. Gareth Moreton via fpc-devel Sat, 16 Sep 2023 06:14:14 -0700

Hi everyone,

So this past week I've been building on Rika's work by adding an assembly version of SHA-1 for x86_64 to complement Rika's i386 version. So far I've successfully made a version that runs twice as fast as the Pascal code. I hoped to go even faster by making use of the SSE2 instruction set, but currently the end result is slower, even though computing the common parts of 4 rounds simultaneously should be much faster. This occurs even when I forgo writing to the stack and keep pretty much all of the state within registers. Preliminary investigation suggests that the slowdown comes from using MOVD/Q to transfer data between the XMM registers and general-purpose registers, since they are different parts of the CPU. I'm still amazed it causes this much latency though.
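
To illustrate the idea (this is not the assembly discussed above, only a minimal C sketch with SSE2 intrinsics): the part of a SHA-1 round that does not depend on the running state is the sum W[t] + K, so four of those sums can be formed with a single PADDD. Each result then has to cross from an XMM register to a general-purpose register before the scalar part of the round can use it, and that is the transfer suspected of causing the slowdown. The names wk_x4 and sha1_rounds_x4 are hypothetical, and the sketch assumes four consecutive rounds taken from rounds 0..19, which share one constant and one round function.

```c
#include <stdint.h>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define SKETCH_HAVE_SSE2 1
#endif

/* Rotate a 32-bit word left by n bits (0 < n < 32). */
static uint32_t rol32(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32 - n));
}

/* Common part of four rounds: out[i] = w[i] + k for i = 0..3.
   Hypothetical helper; falls back to scalar code without SSE2. */
static void wk_x4(const uint32_t w[4], uint32_t k, uint32_t out[4])
{
#ifdef SKETCH_HAVE_SSE2
    /* One packed add (PADDD) covers all four lanes. */
    __m128i v = _mm_add_epi32(_mm_loadu_si128((const __m128i *)w),
                              _mm_set1_epi32((int)k));
    /* Pull each lane out through the low dword; when the value is kept in a
       general-purpose register this corresponds to MOVD plus a byte shift. */
    out[0] = (uint32_t)_mm_cvtsi128_si32(v);
    out[1] = (uint32_t)_mm_cvtsi128_si32(_mm_srli_si128(v, 4));
    out[2] = (uint32_t)_mm_cvtsi128_si32(_mm_srli_si128(v, 8));
    out[3] = (uint32_t)_mm_cvtsi128_si32(_mm_srli_si128(v, 12));
#else
    for (int i = 0; i < 4; i++)
        out[i] = w[i] + k;
#endif
}

/* Four consecutive rounds from the 0..19 range; s holds {a, b, c, d, e}. */
static void sha1_rounds_x4(uint32_t s[5], const uint32_t w[4])
{
    uint32_t wk[4];
    wk_x4(w, 0x5A827999u, wk);            /* K for rounds 0..19 */
    for (int i = 0; i < 4; i++) {
        uint32_t f = (s[1] & s[2]) | (~s[1] & s[3]);   /* Ch(b, c, d) */
        uint32_t t = rol32(s[0], 5) + f + s[4] + wk[i];
        s[4] = s[3];
        s[3] = s[2];
        s[2] = rol32(s[1], 30);
        s[1] = s[0];
        s[0] = t;
    }
}
```

Calling sha1_rounds_x4 with the five state words and four schedule words advances the state by four rounds. The scalar loop still consumes the four sums one at a time, because each round depends on the previous one; only the W[t] + K additions are done in parallel.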

I'll keep investigating and seeing if I can squeeze out more performance, but otherwise I may just have to fall back on a non-SIMD-optimised implementation.


Kit

_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
