Hi,

since e-mail signing/DKIM (RFC 4871) is becoming more and more important,
and our production servers all run stable S10u7 with openssl 0.9.7d
(which we don't want to mix with 0.9.8* for obvious reasons), I decided
to pull the pk11_RSA_{sign|verify} code out of the openssl pkcs11
contrib and graft it onto libdkim, so that it gets [rsa-]sha256 support
as well. To get an idea, see
http://iws.cs.uni-magdeburg.de/~elkner/dkim/dkim.c.patch

To decide whether it is worth pulling out the digest code too, or just
going with the digest lib (much less maintenance burden), I ran
some "benchmarks", i.e. the t-{verify|sign}perf tests of libdkim
(sorry for the wide format):

   system                   openssl          MHz  sign    %  cor%  verify    %  cor%
1) snv_115,Opteron 254      0.9.8a          2813   394  100   100    3901  100   100
9) snv_110,UltraSPARC-IIIi  0.9.8a          1503   130   33    62    4832  124   232
7) snv_110,UltraSPARC-IV+   0.9.8a          1500   135   34    64    5870  150   282
8) snv_110,UltraSPARC-IIIi  0.9.8a/pkcs     1503   329   84   157    5249  135   252
6) snv_110,UltraSPARC-IV+   0.9.8a/pkcs     1500   363   92   173    5523  142   266
4) s10u7,DC Opteron 285     0.9.7d/pkcs/md  2593   874  222   241   16317  418   454
3) snv_115,Opteron 254      0.9.8a/pkcs     2813  1210  307   307    6000  154   154
2) snv_115,Opteron 254      0.9.7d/pkcs/md  2813  1332  338   338    6250  160   160
5) s10u7,UltraSPARC-IIIi    0.9.7d/pkcs/md  1503  3363  854  1598   19028  488   913

sign is the rsa-sha256 signing speed in msgs/sec, % is relative to test 1),
and cor% is % relative to 1) but normalized to the CPU clock (cycles/s).
Analogously, verify is the rsa-sha256 verifying speed in msgs/sec ...
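
The % and cor% columns can be reproduced roughly as follows. This is only a
sketch: the function names are hypothetical, and the formula is an assumption
inferred from the definitions above (cor% is taken to be % scaled by the ratio
of the reference clock to the machine's clock). Individual table entries may
differ by one, depending on where the rounding is applied.

```c
#include <assert.h>

/* Throughput relative to the reference run (test 1), in percent. */
double rel_percent(double rate, double ref_rate)
{
    assert(ref_rate > 0.0);
    return 100.0 * rate / ref_rate;
}

/*
 * Same as rel_percent(), but scaled by ref_mhz / mhz, i.e. what the
 * machine would reach if it ran at the reference clock rate.
 * ASSUMPTION: this is how the cor% column was derived.
 */
double clock_corrected_percent(double rate, double mhz,
                               double ref_rate, double ref_mhz)
{
    assert(mhz > 0.0);
    return rel_percent(rate, ref_rate) * ref_mhz / mhz;
}

/* Round a non-negative percentage to the nearest integer. */
long round_percent(double v)
{
    assert(v >= 0.0);
    return (long)(v + 0.5);
}
```

For example, for the signing result of test 9) (130 msgs/sec at 1503 MHz,
reference 394 msgs/sec at 2813 MHz), clock_corrected_percent(130, 1503, 394,
2813) comes out at about 61.8, i.e. 62 after rounding.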
More details (hw, results, cc options used) are available via
http://iws.cs.uni-magdeburg.de/~elkner/dkim/dkimbench.txt

Actually, I got results I really didn't expect:
1) I thought that an x86 64bit machine with ~2x the clock rate of a sparc
machine would be at least as fast as the sparc machine
2) I thought that openssl would automatically choose the pkcs11 engine on
Solaris when it is available
3) even with pkcs11 enabled, Nevada performance compared to S10u7 is
disappointing: for signing it is slower by a factor of ~10, for verifying
by a factor of about 4!

So, questions:
1) Is x86 pkcs11 not yet optimized very well? Or is it simply a hardware
limitation that prevents further optimization?
2) Isn't it possible to have Solaris openssl automatically load the
pkcs11 engine and set it as the default? I mean, a lot of SW uses
openssl today; however, the only one I know of that lets you choose
the engine to use is apache httpd2+ ...
3) Wrt. the test and HW, I don't think a V240 is that much different
from a Blade-1500 (silver). The only difference I can see that could
IMHO cause this is that on machines 4) and 5) the provider
is set to /usr/lib/security/$ISA/pkcs11_softtoken_extra.so - probably
a relic from S10u3 (SUNWcry) times ... If this is really the key
to the problem, why is this lib not part of Nevada?

Last but not least, an implementation question:
I'm not a pkcs11 expert, and the documentation on openssl/pkcs11 is
missing one part: how to clean up? The documentation says that one should
call C_Finalize(NULL) when the application is done with the crypto
stuff, but it also states that a library should not call it, because it
may have side effects (which ones? and which ones if one doesn't call it?).
So openssl/contrib/crypto/engine/hw_pk11.c::pk11_finish(ENGINE *e)
(called by EVP_cleanup(), I guess) doesn't call C_Finalize, but sets
pFuncList back to NULL. So calling C_Finalize before EVP_cleanup()
is IMHO not ok, but calling it after EVP_cleanup() would be wrong as
well. Saving a pointer to the function list and calling it later
(e.g. see: http://iws.cs.uni-magdeburg.de/~elkner/dkim/dkim-crypto.c.patch)
seems to be wrong too, because of the pkcs11 dso unload/cleanup.
So what is the right thing to do here?

Regards,
jel.
--
Otto-von-Guericke University http://www.cs.uni-magdeburg.de/
Department of Computer Science Geb. 29 R 027, Universitaetsplatz 2
39106 Magdeburg, Germany Tel: +49 391 67 12768