> Intel Xeon E3-1220 processor (Sandy Bridge), 3.1GHz, supporting
> SSE4.1/4.2, AVX, AES-NI

> OpenSSL 0.9.8 
> =============
> ...
> - comments
> 
> RC4 is clearly faster when built from C and optimized by the compiler.

It has less to do with the compiler [allegedly performing miracles] and
more with architectural differences among processors. The assembler
module has two code paths: one that maintains the key schedule as a
vector of *bytes*, and one that maintains it as a vector of *32-bit
values*. In the C case the latter is the only option. On Intel
processors the assembler was opting for the dense key schedule, an
obviously suboptimal choice for the latest family additions.
Compiler-generated code was outperforming the assembler because the
hardware handles the sparse key schedule better.
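
To make the dense/sparse distinction concrete, here is a minimal sketch
in C. All names in it (DEFINE_RC4, rc4_dense_*, rc4_sparse_*,
rc4_layouts_agree) are made up for illustration and are not OpenSSL's
actual layout or API. The algorithm is textbook RC4 in both cases; the
only thing that differs is the width of a key schedule cell.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Illustration only: hypothetical names, not OpenSSL's RC4_KEY.
 * The macro stamps out the same RC4 code for a given cell type, so the
 * two variants differ purely in how the 256-entry schedule is stored.
 */
#define DEFINE_RC4(name, cell_t)                                         \
typedef struct { cell_t s[256]; unsigned x, y; } name##_key;             \
                                                                         \
static void name##_set_key(name##_key *k, const uint8_t *key, size_t n)  \
{                                                                        \
    unsigned i, j = 0;                                                   \
    assert(n > 0);                                                       \
    for (i = 0; i < 256; i++)                                            \
        k->s[i] = (cell_t)i;                                             \
    for (i = 0; i < 256; i++) {                                          \
        cell_t t = k->s[i];                                              \
        j = (j + t + key[i % n]) & 0xff;                                 \
        k->s[i] = k->s[j];                                               \
        k->s[j] = t;                                                     \
    }                                                                    \
    k->x = k->y = 0;                                                     \
}                                                                        \
                                                                         \
static void name##_crypt(name##_key *k, const uint8_t *in,               \
                         uint8_t *out, size_t len)                       \
{                                                                        \
    unsigned x = k->x, y = k->y;                                         \
    size_t i;                                                            \
    for (i = 0; i < len; i++) {                                          \
        cell_t tx, ty;                                                   \
        x = (x + 1) & 0xff;                                              \
        tx = k->s[x];                                                    \
        y = (y + tx) & 0xff;                                             \
        ty = k->s[y];                                                    \
        k->s[x] = ty;                                                    \
        k->s[y] = tx;                                                    \
        out[i] = in[i] ^ (uint8_t)k->s[(tx + ty) & 0xff];                \
    }                                                                    \
    k->x = x;                                                            \
    k->y = y;                                                            \
}

/* Dense: 256 one-byte cells. Sparse: 256 32-bit cells, one byte used. */
DEFINE_RC4(rc4_dense, uint8_t)
DEFINE_RC4(rc4_sparse, uint32_t)

/*
 * Encrypt msg (at most 64 bytes) with both layouts. The dense output is
 * written to out. Returns 1 when both layouts give identical ciphertext.
 */
static int rc4_layouts_agree(const uint8_t *key, size_t keylen,
                             const uint8_t *msg, size_t len, uint8_t *out)
{
    rc4_dense_key d;
    rc4_sparse_key s;
    uint8_t tmp[64];

    assert(len <= sizeof(tmp));
    rc4_dense_set_key(&d, key, keylen);
    rc4_sparse_set_key(&s, key, keylen);
    rc4_dense_crypt(&d, msg, out, len);
    rc4_sparse_crypt(&s, msg, tmp, len);
    return memcmp(out, tmp, len) == 0;
}
```

Both variants produce the same keystream, so which one is faster is
purely a question of how a given processor copes with byte-sized versus
32-bit loads and stores into the schedule. The sketch says nothing about
which one wins where; that has to be measured per processor.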

> OpenSSL 1.0.0
> =============
> ...
> - comments
> 
> From OpenSSL-0.9.8 to OpenSSL-1.0.0, when using ASM version, AES encryption
> speed goes down. It's not a regression: the ASM version was tweaked to handle
> some shared cache attack vector:
> 
> From Andy Polyakov <[email protected]>:
>> Assembler appears slower, because it's taking code path resistant to
>> cache-timing attacks [on multi-core CPUs with shared cache].

The relevant question would be how *equivalent* compiler-generated code
would perform. The question is rhetorical.

> OpenSSL 1.0.1
> =============
> 
>     RC4 is clearly faster compared to OpenSSL 1.0.0.
>     It's even faster than the C version.

As implied above, this is because it's using the sparse key schedule
now. And kudos to Intel contributors for showing how to make it perform
adequately on pre-Sandy Bridge processors.

>     Note: it is possible to disable AES-NI support by setting OPENSSL_ia32cap
>     environment variable, ...
> 
>     With -evp, performances are reduced by half when AES-NI is no more
>     available.
>     In the latter case, OpenSSL is probably relying on AVX, SSE, etc. to keep
>     good performances.

For reference, the new alternative code paths are SSSE3-specific. They
have some dark sides on not-so-latest SSSE3-capable processors, and they
were not benchmarked on the latest AMD processors. This means that
readers should not consider the conclusions in the originating post
universally applicable, and should keep in mind that the tests were
performed on Sandy Bridge. Not to mention that CBC encrypt is the worst
test case for showing AES-NI advantages, especially on Sandy Bridge. Try
'speed -evp aes-128-cbc -decrypt', or 'speed -evp aes-128-ctr'...
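
A sketch of why the mode matters, assuming the usual explanation: CBC
encryption chains every block on the previous ciphertext block, so block
operations cannot overlap, while CBC decryption and CTR work on
independent blocks. The "cipher" below is a toy invertible 64-bit
function standing in for AES, not secure and not anything from OpenSSL,
and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy invertible 64-bit block function, a stand-in for AES. NOT secure. */
static uint64_t toy_encrypt(uint64_t b, uint64_t k)
{
    b ^= k;
    b = (b << 13) | (b >> 51);      /* rotate left by 13 */
    return b + k;
}

static uint64_t toy_decrypt(uint64_t b, uint64_t k)
{
    b -= k;
    b = (b >> 13) | (b << 51);      /* rotate right by 13 */
    return b ^ k;
}

/* CBC encrypt: block i needs the *output* of block i-1, strictly serial. */
static void cbc_encrypt(const uint64_t *in, uint64_t *out, size_t n,
                        uint64_t k, uint64_t iv)
{
    uint64_t prev = iv;
    for (size_t i = 0; i < n; i++) {
        prev = toy_encrypt(in[i] ^ prev, k);
        out[i] = prev;
    }
}

/*
 * CBC decrypt: block i needs only *inputs* (in[i] and in[i-1]), so the
 * block operations are independent of each other. Assumes out != in.
 */
static void cbc_decrypt(const uint64_t *in, uint64_t *out, size_t n,
                        uint64_t k, uint64_t iv)
{
    for (size_t i = 0; i < n; i++) {
        uint64_t prev = i ? in[i - 1] : iv;
        out[i] = toy_decrypt(in[i], k) ^ prev;
    }
}

/* CTR: keystream block i depends only on the counter, also independent. */
static void ctr_crypt(const uint64_t *in, uint64_t *out, size_t n,
                      uint64_t k, uint64_t nonce)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] ^ toy_encrypt(nonce + i, k);
}
```

With independent iterations, several blocks can be in flight at once and
pipelined AES instructions are kept busy; in the serial loop each block
has to wait for the one before it.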

> - comments
> 
> RC4 ASM get a lot of improvement.
> 
> AES ASM get a lot of improvement too,

On contemporary CPUs! Those that are SSSE3- and AES-NI-capable. On older
CPUs you'll observe 1.0.0 performance.

> Even without AES-NI, OpenSSL 1.0.1 might be interesting to use to
> improve SSL throughput.

In TLS/SSL there is no encryption without message authentication. In
other words, in this context you should also look at MAC performance,
not only the cipher. Indeed, it doesn't really matter if the cipher is
10 or 20 times faster than the MAC, does it? Well, I'm not saying that
AES performance in 1.0.0 is better than, say, SHA1; I'm only saying that
*if* you bring up SSL, then you may not omit the MAC. And 1.0.1 takes
the most popular cipher/MAC combinations, RC4+MD5 and AES+SHA1, to the
next level. I'm referring to the so-called "stitched" implementations...
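
The 10-versus-20-times remark can be put in numbers with a back-of-
envelope model. It assumes the cipher and the MAC are run back to back
over the same data, so their per-byte costs simply add up. The function
name and the figures used with it are hypothetical, not measurements.

```c
#include <assert.h>

/*
 * Combined throughput when every byte is first MACed and then encrypted
 * (or the other way around), one pass after the other. Units only need
 * to be consistent, e.g. MB/s in and MB/s out.
 */
static double combined_throughput(double cipher_rate, double mac_rate)
{
    assert(cipher_rate > 0.0 && mac_rate > 0.0);
    return 1.0 / (1.0 / cipher_rate + 1.0 / mac_rate);
}
```

With a MAC at 100 units, a cipher at 1000 units gives about 90.9
combined, and a cipher at 2000 units gives about 95.2. Doubling the
cipher speed buys under 5%, because the MAC dominates. Stitched
implementations interleave cipher and MAC instructions, which is a way
to do better than this back-to-back model.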

> Using the C version of algorithms instead of the ASM version is no
> more needed to get improved performances.

Formally speaking, that was never actually the case. It's just that one
ended up comparing apples and oranges.

> But output of 'openssl speed' without arguments doesn't show the
> improvement, which can be misleading for users. One have to test each
> algorithm using -evp option.

It's as appropriate to point out that the TLS/SSL layer uses the EVP
interface exclusively, so 'speed -evp' is the one that adequately
reflects Web server performance.

______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       [email protected]
Automated List Manager                           [email protected]
