> Intel Xeon E-1220 processor (Sandy-Bridge), 3.1GHz, supporting
> SSE4.1/4.2, AVX, AES-NI
> OpenSSL 0.9.8
> =============
> ...
> - comments
>
> RC4 is clearly faster when built from C and optimized by the compiler.

It has less to do with the compiler [allegedly performing miracles] than
with architectural differences among processors. In the assembler module
there are two code paths: one that maintains the key schedule as a vector
of *bytes*, and one that maintains it as a vector of *32-bit values*. In
the C case the latter is the only option. On Intel processors the
assembler was opting for the dense key schedule, an obviously suboptimal
choice for the latest family additions. Compiler-generated code was
outperforming the assembler because the hardware handles the sparse key
schedule better.

> OpenSSL 1.0.0
> =============
> ...
> - comments
>
> From OpenSSL-0.9.8 to OpenSSL-1.0.0, when using ASM version, AES encryption
> speed goes down. It's not a regression: the ASM version was tweaked to handle
> some shared cache attack vector:
>
> From Andy Polyakov <[email protected]>:
>> Assembler appears slower, because it's taking code path resistant to
>> cache-timing attacks [on multi-core CPUs with shared cache].

The relevant question would be how *equivalent* compiler-generated code
would perform. The question is rhetorical.

> OpenSSL 1.0.1
> =============
>
> RC4 is clearly faster compared to OpenSSL 1.0.0.
> It's even faster than the C version.

As implied above, this is because it's using the sparse key schedule now.
And kudos to the Intel contributors for showing how to make it perform
adequately on pre-Sandy Bridge processors.

> Note: it is possible to disable AES-NI support by setting OPENSSL_ia32cap
> environment variable, ...
>
> With -evp, performances are reduced by half when AES-NI is no more
> available.
> In the latter case, OpenSSL is probably relying on AVX, SSE, etc. to keep
> good performances.

For reference, the new alternative code paths are SSSE3-specific. They
have some dark sides on not-so-latest SSSE3-capable processors, nor were
they benchmarked on the latest AMD processors.
This means that readers should not treat the conclusions in the
originating post as universally applicable, and should keep in mind that
the tests were performed on Sandy Bridge. Not to mention that CBC encrypt
is the worst test case for showing AES-NI advantages, especially on Sandy
Bridge. Try 'speed -evp aes-128-cbc -decrypt', or 'speed -evp
aes-128-ctr'...

> - comments
>
> RC4 ASM get a lot of improvement.
>
> AES ASM get a lot of improvement too,

On contemporary CPUs! Those that are SSSE3- and AES-NI-capable. On older
CPUs you'll observe 1.0.0 performance.

> Even without AES-NI, OpenSSL 1.0.1 might be interesting to use to
> improve SSL throughput.

In TLS/SSL there is no encryption without message authentication. In
other words, in this context you should also look at MAC performance, not
only cipher performance. Indeed, it doesn't really matter if the cipher
is 10 or 20 times faster than the MAC, does it? Well, I'm not saying that
AES performance in 1.0.0 is better than, say, SHA1; I'm only saying that
*if* you bring up SSL, then you may not omit the MAC. And 1.0.1 takes the
most popular cipher/MAC combinations, RC4+MD5 and AES+SHA1, to the next
level. I'm referring to the so-called "stitched" implementations...

> Using the C version of algorithms instead of the ASM version is no
> more needed to get improved performances.

Formally speaking, that was never actually the case. It's just that one
ended up comparing apples and oranges.

> But output of 'openssl speed' without arguments doesn't show the
> improvement, which can be misleading for users. One have to test each
> algorithm using -evp option.

It's equally appropriate to point out that the TLS/SSL layer uses the EVP
interface exclusively, so 'speed -evp' is the one that adequately
reflects Web server performance.
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       [email protected]
Automated List Manager                           [email protected]
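
To make the dense versus sparse key schedule distinction from the RC4
comments concrete, here is a minimal Python sketch. It is not the OpenSSL
assembler or C code; the names rc4_keystream and table_bytes are
hypothetical, and the array typecodes merely stand in for "vector of
bytes" and "vector of 32-bit values". It only shows that both layouts
produce the same keystream while occupying different amounts of memory;
it says nothing about which is faster on a given processor.

```python
from array import array


def rc4_keystream(key: bytes, n: int, typecode: str = "B") -> bytes:
    """Return n RC4 keystream bytes.

    typecode selects the layout of the 256-entry state table:
    "B" = dense (one byte per entry), "I" = sparse (a wider unsigned
    int per entry, usually 32 bits). This is an illustration only.
    """
    s = array(typecode, range(256))
    j = 0
    for i in range(256):  # key-scheduling algorithm
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]

    out = bytearray()
    i = j = 0
    for _ in range(n):  # keystream generation
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        out.append(s[(s[i] + s[j]) & 0xFF])
    return bytes(out)


def table_bytes(typecode: str) -> int:
    """Memory taken by the 256-entry state table for a given layout."""
    return 256 * array(typecode).itemsize


if __name__ == "__main__":
    dense = rc4_keystream(b"Key", 10, "B")
    sparse = rc4_keystream(b"Key", 10, "I")
    print("same keystream:", dense == sparse)
    print("dense table bytes: ", table_bytes("B"))
    print("sparse table bytes:", table_bytes("I"))
```

The algorithm is identical in both cases; only the element width of the
state table changes, which is the property the two assembler code paths
differ in.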

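
The remark that it hardly matters whether the cipher is 10 or 20 times
faster than the MAC can be checked with a little arithmetic. The sketch
below assumes a deliberately simple model in which every byte is first
encrypted and then authenticated, with no overlap between the two passes;
the 500 MB/s MAC rate and the function name combined_throughput are
made-up values for illustration, not measurements. "Stitched"
implementations interleave the two operations precisely in order to do
better than this model.

```python
def combined_throughput(cipher_mbps: float, mac_mbps: float) -> float:
    """Throughput when each byte passes through cipher and MAC in turn.

    Assumes the two passes do not overlap, so the per-byte costs add up.
    """
    return 1.0 / (1.0 / cipher_mbps + 1.0 / mac_mbps)


if __name__ == "__main__":
    MAC_MBPS = 500.0  # hypothetical MAC rate, not a benchmark result
    for ratio in (1, 10, 20):
        total = combined_throughput(ratio * MAC_MBPS, MAC_MBPS)
        print(f"cipher {ratio:>2}x MAC rate: {total:6.1f} MB/s combined")
```

Under this model, doubling the cipher speed from 10 to 20 times the MAC
rate raises the combined figure by less than 5%, because the MAC
dominates the total cost.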