J. Gareth Moreton via fpc-devel <fpc-devel@lists.freepascal.org> wrote:
> So this past week I've been building on Rika's work by adding an
> assembly version of SHA-1 for x86_64 to complement Rika's i386 version.
> So far I've successfully made a version that runs twice as fast as the
> Pascal code. I hoped to go even faster by making use of the SSE2
> instruction set...

In 2010 Intel published SSSE3 code to improve SHA-1 performance. Later that
year it was incorporated into the OpenSSL assembly code. The OpenSSL code
also includes code paths for AVX and for the SHA extensions (SHAEXT).

Intel article:
https://www.intel.com/content/www/us/en/developer/articles/technical/improving-the-performance-of-the-secure-hash-algorithm-1.html

Brief on the Intel SHA extensions (also supported on AMD Zen and later CPUs):
https://en.wikipedia.org/wiki/Intel_SHA_extensions

OpenSSL x86 64-bit assembly code and performance chart:
https://github.com/openssl/openssl/blob/master/crypto/sha/asm/sha1-x86_64.pl

######################################################################
# Current performance is summarized in following table. Numbers are
# CPU clock cycles spent to process single byte (less is better).
#
#               x86_64          SSSE3           AVX[2]
# P4            9.05            -
# Opteron       6.26            -
# Core2         6.55            6.05/+8%        -
# Westmere      6.73            5.30/+27%       -
# Sandy Bridge  7.70            6.10/+26%       4.99/+54%
# Ivy Bridge    6.06            4.67/+30%       4.60/+32%
# Haswell       5.45            4.15/+31%       3.57/+53%
# Skylake       5.18            4.06/+28%       3.54/+46%
# Bulldozer     9.11            5.95/+53%
# Ryzen         4.75            3.80/+24%       1.93/+150%(**)
# VIA Nano      9.32            7.15/+30%
# Atom          10.3            9.17/+12%
# Silvermont    13.1(*)         9.37/+40%
# Knights L     13.2(*)         9.68/+36%       8.30/+59%
# Goldmont      8.13            6.42/+27%       1.70/+380%(**)
#
# (*)   obviously suboptimal result, nothing was done about it,
#       because SSSE3 code is compiled unconditionally;
# (**)  SHAEXT result
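
A note on reading the chart: an entry such as 4.15/+31% is the cycles-per-byte figure followed by the gain over the plain x86_64 column for the same CPU. Below is a rough Python sketch of that arithmetic, using the Haswell row. The function names and the 3.0 GHz clock are my own assumptions for illustration and do not come from the OpenSSL source; the throughput figure is a single-core estimate only.

```python
def speedup_percent(baseline_cpb, optimized_cpb):
    """Gain of an optimized code path over the baseline, in percent."""
    return (baseline_cpb / optimized_cpb - 1.0) * 100.0


def throughput_mb_per_s(cycles_per_byte, clock_hz):
    """Estimated bytes hashed per second, in units of 10**6 bytes."""
    return clock_hz / cycles_per_byte / 1e6


# Haswell row from the chart: cycles per byte for each code path.
haswell = {"x86_64": 5.45, "SSSE3": 4.15, "AVX2": 3.57}

# Assumed clock speed, for illustration only (not taken from the chart).
ASSUMED_CLOCK_HZ = 3.0e9

for name, cpb in haswell.items():
    gain = speedup_percent(haswell["x86_64"], cpb)
    rate = throughput_mb_per_s(cpb, ASSUMED_CLOCK_HZ)
    print(f"{name:7s} {cpb:5.2f} c/B  +{gain:.0f}%  ~{rate:.0f} MB/s")
```

The same calculation can be used to compare a new x86_64 assembly routine against the Pascal version once both are measured in cycles per byte.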