SHA-1: Dean already worked on this, using SSE2.

So far Dean has been working on 32-bit code. He refers to Opteron because it's another SSE2-capable CPU to compare against, rather than because it's a 64-bit one. Right?


it looks like the openssl cvs HEAD generally beats my sha1 code for 32-bit x86 platforms in most cases now, and generally ties my sha256 code when compiled with gcc... nice work Andy.

Keep in mind that we do *not* have an x86 assembler SHA256 implementation, so you're racing the compiler in the latter case.


here's some data i collected saturday -- it's all in cycles per byte (lower is better) for 8192-byte buffer:

                openssl         SSE2            SSE2            SSE2
                cvs head        gcc-cvs         gcc34           icc71

sha1:

p4 model 3       9.59           16.9            21.4            14.2
p4 model 2      10.6            15.4            28.4            13.5
p-m             10.3            15.0            14.4            13.3
k8               8.18           10.4            10.3             8.70
efficeon         9.40            7.1             7.04            6.20

sha256:

p4 model 3      31.8            51.8            38.8            31.3
p4 model 2      38.6            46.5            38.1            39.2
p-m             32.7            34.3            32.0            29.0
k8              25.9            29.2            22.2            21.6
efficeon        27.9            20.9            15.4            16.4

So effectively Efficeon is the only IA-32 implementation which benefits from ahead-permutation in SSE2. I mean the marginal SHA256 improvement on P-M and K8 alone wouldn't really justify the increased complexity and tight dependency on compiler version... I see no reason why it would be different in the 64-bit case, so a pure IALU 64-bit implementation should suffice just fine...
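For readers who want to check that reading of the table, here is a minimal sketch that recomputes the comparison. The numbers are copied from the table above; the helper names (sse2_gain, report) are made up for illustration, and taking the best of the three SSE2 builds per CPU is an assumption about how to compare, not something stated in the thread.

```python
# Cycles per byte copied from the table above (8192-byte buffer, lower is better).
# Column order: openssl cvs head, SSE2 gcc-cvs, SSE2 gcc34, SSE2 icc71.
SHA1 = {
    "p4 model 3": (9.59, 16.9, 21.4, 14.2),
    "p4 model 2": (10.6, 15.4, 28.4, 13.5),
    "p-m":        (10.3, 15.0, 14.4, 13.3),
    "k8":         (8.18, 10.4, 10.3, 8.70),
    "efficeon":   (9.40, 7.1, 7.04, 6.20),
}
SHA256 = {
    "p4 model 3": (31.8, 51.8, 38.8, 31.3),
    "p4 model 2": (38.6, 46.5, 38.1, 39.2),
    "p-m":        (32.7, 34.3, 32.0, 29.0),
    "k8":         (25.9, 29.2, 22.2, 21.6),
    "efficeon":   (27.9, 20.9, 15.4, 16.4),
}


def sse2_gain(row):
    """Percent of cycles saved by the best SSE2 build relative to openssl.

    Assumption: the fastest of the three compilers represents the SSE2 code.
    A negative result means the openssl code is faster.
    """
    openssl, *sse2 = row
    return 100.0 * (openssl - min(sse2)) / openssl


def report(name, table):
    """Print one signed percentage per CPU for the given hash."""
    print(name)
    for cpu, row in table.items():
        print(f"  {cpu:<12}{sse2_gain(row):+7.1f}%")


if __name__ == "__main__":
    report("sha1", SHA1)
    report("sha256", SHA256)
```

Under that assumption, the sha1 gain is negative everywhere except Efficeon. For sha256 the best SSE2 build is ahead on every CPU, but only by a percent or two on the P4s, roughly 11% on P-M and 17% on K8, against roughly 45% on Efficeon.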


My first step will be to study the only existing AMD64 implementation of
AES: loop-aes, merged in Linux kernel 2.6.8-rc3 by Brian Gladman.

yeah gladman aes is the way to go ... the gladman code and linux-kernel variations on it

Keep in mind that [unlike Gladman's code] OpenSSL code has to be position independent! It's surely no problem on x86_64, but on x86 this puts you in a very tight spot. But I've sketched some 32-bit PIC code already [as previously mentioned "I might have an opportunity to play with AES some day *this* year"], so give me a few more days...


i know how i could do a better job natively on efficeon using tables only twice as large as gladman, but that's also "breaking the rules" :)

Huh? Is it possible to reach the native instruction set on Transmeta CPUs? I was under the impression that it was not possible... A.
______________________________________________________________________
OpenSSL Project http://www.openssl.org
Development Mailing List [EMAIL PROTECTED]
Automated List Manager [EMAIL PROTECTED]
