SHA-1: Dean already worked on this, using SSE2.
So far Dean has been working on 32-bit code. He refers to Opteron because it's another SSE2-capable CPU to compare with, rather than because it's a 64-bit one. Right?
it looks like the openssl cvs HEAD generally beats my sha1 code for 32-bit x86 platforms in most cases now, and generally ties my sha256 code when compiled with gcc... nice work Andy.
Keep in mind that we do *not* have an x86 assembler SHA256 implementation, so you're racing the compiler in the latter case.
here's some data i collected saturday -- it's all in cycles per byte (lower is better) for 8192-byte buffer:
                openssl     SSE2      SSE2     SSE2
                cvs head    gcc-cvs   gcc34    icc71
sha1:
  p4 model 3     9.59       16.9      21.4     14.2
  p4 model 2    10.6        15.4      28.4     13.5
  p-m           10.3        15.0      14.4     13.3
  k8             8.18       10.4      10.3      8.70
  efficeon       9.40        7.1       7.04     6.20

sha256:
  p4 model 3    31.8        51.8      38.8     31.3
  p4 model 2    38.6        46.5      38.1     39.2
  p-m           32.7        34.3      32.0     29.0
  k8            25.9        29.2      22.2     21.6
  efficeon      27.9        20.9      15.4     16.4
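For readers unfamiliar with the metric, here is a minimal C sketch of how a cycles-per-byte figure like the ones above could be derived from a timed run, and how it maps to throughput. The function names and the clock frequency in the usage note are illustrative assumptions; this is not the harness that produced the numbers in this thread.

```c
#include <stdint.h>

/* Cycles per byte for a timed run: total elapsed cycles divided by the
 * total number of bytes hashed (buffer length times iterations).
 * Hypothetical helper, not taken from OpenSSL or from Dean's code. */
static double cycles_per_byte(uint64_t cycles, uint64_t buf_len,
                              uint64_t iterations)
{
    return (double)cycles / ((double)buf_len * (double)iterations);
}

/* Approximate throughput in bytes per second for a given cycles-per-byte
 * figure, assuming the core runs at a constant clock_hz. */
static double bytes_per_second(double cpb, double clock_hz)
{
    return clock_hz / cpb;
}
```

For example, 81920 cycles spent on one pass over an 8192-byte buffer is 10 cycles per byte, which on an assumed 2 GHz clock would be about 200 million bytes per second.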
So effectively Efficeon is the only IA-32 implementation which benefits from ahead-permutation in SSE2. I mean, the marginal SHA256 improvement on P-M and K8 alone wouldn't really justify the increased complexity and tight dependency on compiler version... I see no reason why it would be different in the 64-bit case, so a pure IALU 64-bit implementation should suffice just fine...
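To illustrate what "pure IALU" means in practice, here is a minimal C sketch of the SHA-1 message schedule expansion as defined in FIPS 180, using only 32-bit integer operations. The schedule is a natural candidate for SIMD precomputation; whether that is exactly what the SSE2 code discussed here does is an assumption on my part. The names rol32 and sha1_expand are illustrative and are not OpenSSL identifiers.

```c
#include <stdint.h>

/* Rotate a 32-bit word left by n bits; valid for n in 1..31. */
static uint32_t rol32(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32 - n));
}

/* Expand the 16 big-endian message words in w[0..15] into the full
 * 80-word SHA-1 schedule, in place, with integer ALU operations only:
 *   W[t] = ROL1(W[t-3] ^ W[t-8] ^ W[t-14] ^ W[t-16])   for 16 <= t < 80 */
static void sha1_expand(uint32_t w[80])
{
    for (int t = 16; t < 80; t++)
        w[t] = rol32(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1);
}
```

Each new word depends on W[t-3], so at most three consecutive words can be computed independently, which limits how much a 4-wide SIMD unit can gain over a good integer schedule.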
My first step will be to study the only existing AMD64 implementation of AES: loop-aes, merged in Linux kernel 2.6.8-rc3 by Brian Gladman.
yeah gladman aes is the way to go ... the gladman code and linux-kernel variations on it
Keep in mind that [unlike Gladman's code] OpenSSL code has to be position independent! It's surely no problem on x86_64, but on x86 this puts you in a very tight spot. But I've sketched some 32-bit PIC code already [as previously mentioned, "I might have an opportunity to play with AES some day *this* year"], so give me a few more days...
i know how i could do a better job natively on efficeon using tables only twice as large as gladman, but that's also "breaking the rules" :)
Huh? Is it possible to reach the native instruction set on Transmeta CPUs? I was under the impression that it was not possible...

A.
______________________________________________________________________
OpenSSL Project http://www.openssl.org
Development Mailing List [EMAIL PROTECTED]
Automated List Manager [EMAIL PROTECTED]
