While risking to be of too much annoyance,

What's annoying is that despite the expressed desire [or let's put it in clear text: *requirement*] to see more portable code [such as http://cvs.openssl.org/chngview?cn=15317], you choose to go more and more specific to your particular platform and even compiler. ((mode(V16QI)))? Is it accepted on all platforms? No, some [well, ppc] even require specifying a flag which affects the ABI. Isn't it a work in progress? If not, how come gcc-4 calls it deprecated? Is it even documented? Is it really worth the trouble if portable code can deliver just as much and on *all* platforms? And indeed, the committed code performs virtually as fast as ECB [naturally on aligned input].
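
To illustrate what "portable" means here, a minimal sketch of the idea (not the committed code; the name xor_block16 is made up for this example): XOR one 16-byte block a machine word at a time when all pointers are word-aligned, and fall back to bytes otherwise. It assumes sizeof(size_t) divides 16, which holds on common 32- and 64-bit platforms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not OpenSSL code: out = a XOR b for one 16-byte block. */
static void xor_block16(unsigned char *out, const unsigned char *a,
                        const unsigned char *b)
{
    size_t i;

    /* Assumption: the machine word size divides the 16-byte block size. */
    assert(16 % sizeof(size_t) == 0);

    if ((((uintptr_t)out | (uintptr_t)a | (uintptr_t)b) % sizeof(size_t)) == 0) {
        /* Aligned input: process one machine word per iteration.
         * memcpy keeps us within the aliasing rules; compilers
         * typically turn these fixed-size copies into plain loads/stores. */
        for (i = 0; i < 16; i += sizeof(size_t)) {
            size_t x, y;
            memcpy(&x, a + i, sizeof(x));
            memcpy(&y, b + i, sizeof(y));
            x ^= y;
            memcpy(out + i, &x, sizeof(x));
        }
    } else {
        /* Unaligned input: plain byte-wise fallback, correct everywhere. */
        for (i = 0; i < 16; i++)
            out[i] = a[i] ^ b[i];
    }
}
```

No compiler extensions, no ABI-affecting flags, and the slow path is merely slower, not wrong.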

2. Currently, aes-128-cfb works slower than it can (by
more than 20% and often beyond that) and suffers from
encrypt/decrypt speed asymmetry (36 MB/sec encryption
vs 30 MB/sec decryption on one of my machines

Initially you stated "... improves performance by more than 50% in most cases." I wonder what these "most cases" are? It's obvious that CFB performance won't surpass ECB, and the only case with that large a difference between ECB and CFB I could find is Intel P4. On other platforms it differs distinctly less... On the other hand, 30MBps doesn't sound like P4, you should see more... What is this mysterious "one of your machines"? And what does the corresponding 'openssl version -a' return?

- can be
an issue in live media streaming).

If absolute performance is of such great concern, why not RC4 then? It's several *times* faster than AES and is fast enough for a modern CPU to sustain 1Gbps rates [which might be of interest on the server side].

3. From my experience with gcc on powerpc, gcc handles
large unaligned load/stores correctly by splitting
them (sometimes unnecessary), but the code remains
correct and in working order.

??? PowerPC handles unaligned loads/stores in hardware[*], so why would the compiler get involved at all? Care to provide a C snippet and the corresponding assembler listing? Well, the manual says that unaligned 64-bit loads/stores cause an exception, and if such cases are kernel assisted[**], then it's definitely a good idea to split them into two 32-bit loads, but I failed to write a snippet which would trick the compiler into doing so... A.

[*] as far as I understand, with the exception of early 403s?
[**] "kernel assisted" means that the exception handler can just as well split the load/store, put the value into the target register and return to the next instruction.
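
For reference, the kind of test case I have in mind is something along these lines (the function name is made up; this is a sketch, not something I claim reproduces the splitting). Compile it with gcc -O2 -S and check whether the listing shows one 64-bit load or two 32-bit ones:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical test function: fetch 64 bits from an address that is
 * not guaranteed to be 8-byte aligned. The fixed-size memcpy leaves it
 * to the compiler to pick a single load, split loads, or byte loads. */
uint64_t load64_unaligned(const unsigned char *p)
{
    uint64_t v;

    memcpy(&v, p, sizeof(v));
    return v;
}
```

Casting p to (const uint64_t *) and dereferencing it would be the "obvious" alternative, but that is undefined behaviour on a misaligned pointer, so it proves nothing about what the compiler is obliged to do.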
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       [email protected]
Automated List Manager                           [EMAIL PROTECTED]
