there is a cpuid test in rc4_skey.c which tests the hyperthreading cpuid bit to distinguish between two implementations of rc4... unfortunately this fails to properly distinguish the cpus. all dual core cpus (intel or amd) report HT support even if they don't use simultaneous multithreading like some p4 do.
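
To make the problem concrete, here is a minimal C sketch of how the two relevant CPUID leaf 1 fields decode. It assumes the caller has already obtained the EDX and EBX register values from CPUID leaf 1; the helper names are hypothetical and are not taken from rc4_skey.c.

```c
#include <stdint.h>

/* CPUID leaf 1, EDX bit 28: the HTT flag. */
#define CPUID1_EDX_HTT (UINT32_C(1) << 28)

/* Hypothetical helper: returns 1 if the HTT flag is set in EDX. */
static int htt_flag_set(uint32_t edx)
{
    return (edx & CPUID1_EDX_HTT) != 0;
}

/* Hypothetical helper: CPUID leaf 1, EBX bits 23:16 give the number of
 * logical processors per physical package (meaningful when HTT is set). */
static unsigned logical_per_package(uint32_t ebx)
{
    return (unsigned)((ebx >> 16) & 0xff);
}
```

The flag only says that the package may contain more than one logical processor. A dual core part without SMT can therefore show the flag set with a count of 2, the same as a single core p4 with SMT enabled, so the flag alone cannot tell the two apart.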

So the HT flag no longer means HyperThreading, but something else... Will look into it... There is another place where the HTT flag is checked, and it's AES...

it seems somewhat fortunate that core2 CPUs track the p4 behaviour
w.r.t. these two rc4 implementations.  here are the core2 results with the
stock code / HT test:

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
rc4             166799.58k   180552.87k   182437.93k   183381.67k   183206.87k

and with cpuid test disabled:

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
rc4             123361.30k   128102.17k   129876.57k   128787.22k   129419.95k

for the record, core2 64-bit code is seriously underperforming the 32-bit
code...  here are the 32-bit results (with cpuid test enabled):

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
rc4             254164.64k   279901.10k   279364.38k   283617.62k   276690.26k

... The key feature of the 32-bit code with the cpuid test is that the corresponding loop is not unrolled. Can you test the following in a *64-bit* build on Core2 hardware? Open rc4-x86_64.pl in a text editor and make the jump to .Lcloop1 at line 154 unconditional, i.e. replace jz with jmp. Then make, benchmark and report back. A.
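
For readers unfamiliar with what "not unrolled" means here, the following is a plain C sketch of RC4 with a rolled loop that produces one output byte per iteration. It is only an illustration of the loop shape, not the code generated from rc4-x86_64.pl or the 32-bit assembler; the type and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state type for this sketch. */
typedef struct {
    uint8_t s[256];
    uint8_t i, j;
} rc4_sketch_state;

/* Standard RC4 key schedule. keylen must be non-zero. */
static void rc4_sketch_init(rc4_sketch_state *st,
                            const uint8_t *key, size_t keylen)
{
    unsigned n;
    uint8_t j = 0;

    for (n = 0; n < 256; n++)
        st->s[n] = (uint8_t)n;
    for (n = 0; n < 256; n++) {
        uint8_t t = st->s[n];
        j = (uint8_t)(j + t + key[n % keylen]);
        st->s[n] = st->s[j];
        st->s[j] = t;
    }
    st->i = 0;
    st->j = 0;
}

/* Rolled (not unrolled) loop: exactly one byte per iteration. */
static void rc4_sketch_crypt(rc4_sketch_state *st,
                             const uint8_t *in, uint8_t *out, size_t len)
{
    uint8_t i = st->i, j = st->j;

    while (len--) {
        uint8_t a, b;
        i = (uint8_t)(i + 1);
        a = st->s[i];
        j = (uint8_t)(j + a);
        b = st->s[j];
        st->s[i] = b;                 /* swap s[i] and s[j] */
        st->s[j] = a;
        *out++ = *in++ ^ st->s[(uint8_t)(a + b)];
    }
    st->i = i;
    st->j = j;
}
```

An unrolled variant would repeat the loop body several times per iteration to cut down on branch overhead; the suggestion above forces the 64-bit code onto its byte-at-a-time path so the two loop shapes can be compared on the same hardware.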
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       openssl-dev@openssl.org
Automated List Manager                           [EMAIL PROTECTED]
