> While risking to be of too much annoyance,
What's annoying is that despite the expressed desire [or, to put it in
plain text, *requirement*] to see more portable code [such as
http://cvs.openssl.org/chngview?cn=15317], you choose to go more and
more specific to your particular platform and even compiler.
__attribute__((mode(V16QI)))? Is it accepted on all platforms? No, some
[well, ppc] even require specifying a flag that affects the ABI. Isn't it
a work in progress? If not, how come gcc-4 calls it deprecated? Is it even
documented? Is it really worth the trouble if portable code can deliver
just as much and on *all* platforms? And indeed, the committed code
performs virtually as fast as ECB [naturally on aligned input].
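
To make the portable route concrete, here is a minimal sketch of
whole-block CFB128 with word-wide XOR in plain C. It is not the committed
code: the names cfb128_sketch, xor16 and toy_block are made up for
illustration, toy_block is only a stand-in for the AES block function,
and partial blocks are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type: encrypts one 16-byte block. */
typedef void (*block128_f)(const unsigned char in[16],
                           unsigned char out[16], const void *key);

/* XOR 16 bytes of keystream into a block, one machine word at a time.
 * memcpy keeps every access legal C regardless of alignment; compilers
 * typically reduce each fixed-size copy to a plain load or store.
 * Assumes sizeof(size_t) divides 16, which holds on common targets. */
static void xor16(unsigned char *out, const unsigned char *in,
                  const unsigned char *ks)
{
    size_t i, a, b;

    for (i = 0; i < 16; i += sizeof(size_t)) {
        memcpy(&a, in + i, sizeof(a));
        memcpy(&b, ks + i, sizeof(b));
        a ^= b;
        memcpy(out + i, &a, sizeof(a));
    }
}

/* CFB128 over whole blocks: C[i] = P[i] ^ E(C[i-1]), with C[-1] = IV.
 * ivec is updated on return so that calls can be chained. */
static void cfb128_sketch(const unsigned char *in, unsigned char *out,
                          size_t len, const void *key,
                          unsigned char ivec[16], int enc, block128_f block)
{
    unsigned char ks[16];

    assert(len % 16 == 0);          /* partial blocks are not handled here */
    for (; len; len -= 16, in += 16, out += 16) {
        block(ivec, ks, key);       /* keystream = E(previous ciphertext) */
        if (enc) {
            xor16(out, in, ks);
            memcpy(ivec, out, 16);  /* feed the new ciphertext back */
        } else {
            memcpy(ivec, in, 16);   /* save ciphertext before it is overwritten */
            xor16(out, in, ks);
        }
    }
}

/* Toy stand-in for AES so the sketch is self-contained; NOT a cipher. */
static void toy_block(const unsigned char in[16], unsigned char out[16],
                      const void *key)
{
    const unsigned char *k = key;
    int i;

    for (i = 0; i < 16; i++)
        out[i] = (unsigned char)(in[(i + 1) % 16] + k[i]);
}
```

Each 16 bytes cost exactly one block-cipher call plus one XOR pass, so
the block function dominates the run time and the XOR pass is the only
part a vector type could speed up.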
> 2. Currently, aes-128-cfb works slower than it can (by
> more than 20% and often beyond that) and suffers from
> encrypt/decrypt speed asymmetry (36 MB/sec encryption
> vs 30 MB/sec decryption on one of my machines
Initially you stated "... improves performance by more than 50% in most
cases." I wonder what these "most cases" are? It's obvious that CFB
performance won't surpass ECB, and the only case with that large a
difference between ECB and CFB I could find is Intel P4. On other
platforms the difference is distinctly smaller... On the other hand,
30MBps doesn't sound like P4, you should see more... What is this mystical
"one of your machines"? And what does the corresponding
'openssl version -a' return?
> - can be
> of issue in live media streaming).
If absolute performance is of such great concern, why not RC4 then? It's
several *times* faster than AES and is fast enough for a modern CPU to
sustain 1Gbps rates [which might be of interest on the server side].
> 3. From my experience with gcc on powerpc, gcc handles
> large unaligned load/stores correctly by splitting
> them (sometimes unnecessarily), but the code remains
> correct and in working order.
??? PowerPC handles unaligned loads/stores in hardware[*], so why would
the compiler get involved at all? Care to provide a C snippet and the
corresponding assembler listing? Well, the manual says that unaligned
64-bit loads/stores cause an exception, and if such cases are kernel
assisted[**], then it's definitely a good idea to split them into two
32-bit loads, but I failed to write a snippet that would trick the
compiler into doing so...

A.
[*] as far as I understand, with the exception of early 403s?
[**] "kernel assisted" means that the exception handler can just as well
split the load/store, put the value into the target register and return
to the next instruction.
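
For anyone who wants to look at what the compiler does with such
accesses, here is a minimal sketch of an unaligned 64-bit load in
portable C. The function names are hypothetical. Compiling it with
gcc -O2 -S and reading the listing should show whether the load is split.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Portable unaligned load: the pointer carries no alignment promise, so
 * the compiler decides how (and whether) to split the access. */
static uint64_t load64_memcpy(const unsigned char *p)
{
    uint64_t v;

    assert(p != NULL);
    memcpy(&v, p, sizeof(v));
    return v;
}

/* Byte-by-byte big-endian load: independent of both alignment and host
 * byte order, useful as a reference to compare the listing against. */
static uint64_t load64_be(const unsigned char *p)
{
    uint64_t v = 0;
    int i;

    assert(p != NULL);
    for (i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}
```

Calling either function on buf + 1 of a byte buffer gives a pointer
that is not 8-byte aligned when buf itself is, which is the case under
discussion.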
______________________________________________________________________
OpenSSL Project http://www.openssl.org
Development Mailing List [email protected]
Automated List Manager [EMAIL PROTECTED]