Re: [patch] make AES-cfb128-encrypt faster by uglifying it

Andy Polyakov Tue, 30 May 2006 01:45:15 -0700

At the risk of being too much of an annoyance,

What's annoying is that despite the expressed desire [or let's put it in clear text: *requirement*] to see more portable code [such as http://cvs.openssl.org/chngview?cn=15317], you choose to go more and more specific to your particular platform and even compiler. ((mode(V16QI)))? Is it accepted on all platforms? No, some [well, ppc] even require specifying a flag that affects the ABI. Isn't it a work in progress? If not, how come gcc-4 calls it deprecated? Is it even documented? Is it really worth the trouble if portable code can deliver just as much, and on *all* platforms? And indeed, the committed code performs virtually as fast as ECB [naturally on aligned input].
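For illustration only, here is a minimal sketch of what a portable CFB128 routine can look like: plain C, no vector-mode attributes, a word-wide XOR when all pointers are size_t-aligned and a byte loop otherwise. The names cfb128_crypt, block128_fn, cfb_byte and toy_block are hypothetical (this is not the committed OpenSSL code), and toy_block is a placeholder standing in for AES so that the example runs on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical block-cipher callback: encrypts one 16-byte block.
   Must tolerate in == out, since the feedback register is updated in place. */
typedef void (*block128_fn)(const unsigned char in[16], unsigned char out[16],
                            const void *key);

/* Placeholder "cipher" for testing only: NOT AES and NOT secure. */
static void toy_block(const unsigned char in[16], unsigned char out[16],
                      const void *key)
{
    const unsigned char *k = key;
    for (int i = 0; i < 16; ++i)
        out[i] = (unsigned char)(in[i] ^ k[i] ^ (i * 37 + 1));
}

/* One CFB byte step; ivec[n] holds the keystream byte on entry and the
   ciphertext byte (the feedback value) on exit, in both directions. */
static void cfb_byte(unsigned char *ivec, unsigned n, unsigned char c,
                     unsigned char *o, int enc)
{
    if (enc) { ivec[n] ^= c; *o = ivec[n]; }    /* C = P ^ K */
    else     { *o = ivec[n] ^ c; ivec[n] = c; } /* P = C ^ K */
}

/* Hypothetical portable CFB128; *num is the position inside the current
   keystream block, so a stream may be processed in pieces of any length. */
void cfb128_crypt(const unsigned char *in, unsigned char *out, size_t len,
                  const void *key, unsigned char ivec[16], unsigned *num,
                  int enc, block128_fn block)
{
    unsigned n = *num;
    assert(n < 16);

    /* 1. Use up keystream left over from the previous call. */
    while (n && len) {
        cfb_byte(ivec, n, *in++, out++, enc);
        n = (n + 1) & 15;
        --len;
    }

    /* 2. Whole blocks. Word-wide path only when every pointer is aligned;
          fixed-size memcpy avoids strict-aliasing and alignment UB. */
    const int aligned =
        ((uintptr_t)in | (uintptr_t)out | (uintptr_t)ivec) % sizeof(size_t) == 0;
    while (len >= 16) {
        block(ivec, ivec, key);                 /* ivec := E(ivec) */
        if (aligned) {
            for (size_t i = 0; i < 16; i += sizeof(size_t)) {
                size_t k, c;
                memcpy(&k, ivec + i, sizeof k);
                memcpy(&c, in + i, sizeof c);
                size_t x = k ^ c;
                memcpy(out + i, &x, sizeof x);
                /* feed back the ciphertext: x when encrypting, c when decrypting */
                memcpy(ivec + i, enc ? &x : &c, sizeof x);
            }
        } else {
            for (unsigned i = 0; i < 16; ++i)
                cfb_byte(ivec, i, in[i], out + i, enc);
        }
        in += 16; out += 16; len -= 16;
    }

    /* 3. Tail shorter than a block: fresh keystream, partially consumed. */
    if (len) {
        block(ivec, ivec, key);
        while (len--)
            cfb_byte(ivec, n++, *in++, out++, enc);
    }
    *num = n;
}
```

Note that decryption calls the same forward block function as encryption, and that within a call the two paths compute identical results; alignment only decides how many bytes each XOR touches.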

2. Currently, aes-128-cfb works slower than it can (by
more than 20% and often beyond that) and suffers from
encrypt/decrypt speed asymmetry (36 MB/sec encryption
vs 30 MB/sec decryption on one of my machines

Initially you stated "... improves performance by more than 50% in most cases." I wonder what these "most cases" are. It's obvious that CFB performance won't surpass ECB, and the only case with that large a difference between ECB and CFB I could find is the Intel P4. On other platforms the difference is noticeably smaller... On the other hand, 30MBps doesn't sound like a P4; you should see more... What is this mystical "one of your machines"? And what does the corresponding 'openssl version -a' return?

- can be
an issue in live media streaming).

If absolute performance is of such great concern, why not RC4 then? It's several *times* faster than AES and is fast enough for modern CPUs to sustain 1Gbps rates [which might be of interest on the server side].

3. From my experience with gcc on powerpc, gcc handles
large unaligned load/stores correctly by splitting
them (sometimes unnecessarily), but the code remains
correct and in working order.

??? PowerPC handles unaligned loads/stores in hardware[*], so why would the compiler get involved at all? Care to provide a C snippet and the corresponding assembler listing? Well, the manual says that unaligned 64-bit loads/stores cause an exception, and if such cases are kernel assisted[**], then it's definitely a good idea to split them into two 32-bit loads, but I failed to write a snippet that would trick the compiler into doing so...
A.


[*] as far as I understand, with the exception of early 403s?

[**] "kernel assisted" means that the exception handler can just as well split the load/store, put the value into the target register, and return to the next instruction.
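To make the splitting question concrete, a minimal sketch, assuming a C11 compiler, of two ways to read 64 bits from a possibly misaligned address: a fixed-size memcpy, which leaves the choice of instructions entirely to the compiler, and an explicit split into two 32-bit halves. The function names are hypothetical. Whether a given gcc really emits two 32-bit loads for load64_split, or merges them back into one access, is exactly what an assembler listing (gcc -S) would show; nothing here guarantees either outcome.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The split below relies on two 32-bit halves covering one 64-bit word. */
static_assert(sizeof(uint32_t[2]) == sizeof(uint64_t),
              "two 32-bit halves must cover one 64-bit word");

/* Portable unaligned 64-bit load: the compiler decides how to fetch it. */
static uint64_t load64_any(const unsigned char *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Same value, fetched as two 32-bit halves (hypothetical helper), i.e. the
   split that helps where only misaligned 64-bit accesses trap. */
static uint64_t load64_split(const unsigned char *p)
{
    uint32_t half[2];
    memcpy(&half[0], p, sizeof half[0]);
    memcpy(&half[1], p + 4, sizeof half[1]);
    uint64_t v;
    memcpy(&v, half, sizeof v); /* byte image identical to p[0..7] */
    return v;
}
```

Because both functions copy the same byte image, they return the same value on any byte order; only the access widths requested from memory differ.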

______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       [email protected]
Automated List Manager                           [EMAIL PROTECTED]
