Hi,

Here is a report about the OpenSSL speed command, regarding ASM vs C builds and AES-NI support.
The tests were done on an HP ML110 G7 computer with one Intel Xeon E3-1220 processor (Sandy Bridge), 3.1GHz, supporting SSE4.1/4.2, AVX and AES-NI:

http://ark.intel.com/products/52269/
http://en.wikipedia.org/wiki/List_of_Intel_Xeon_microprocessors#Xeon_E3_.28uniprocessor.29
http://en.wikipedia.org/wiki/Sandy_Bridge

All tests were run on Linux Fedora 16, x86_64 architecture, using GCC 4.6.2.

In short: 'openssl speed' doesn't show the AES-NI improvement by default.

Long version:

I have tested four branches from OpenSSL CVS, with and without ASM support, in order to compare the performance of the RC4 and AES algorithms. I was trying to understand why 'openssl speed' reported such low values on this computer even though the AES-NI instruction set is available.

Let's have a look at the results and the embedded comments. All figures are in thousands of bytes per second processed (the "k" suffix), as reported by 'openssl speed'.

OpenSSL 0.9.8
=============

- built with -g -march=native

OPENSSL_ia32cap: 0x1fbae3fffffbffff

OpenSSL 0.9.8s xx XXX xxxx
built on: Tue Dec 6 15:34:52 CET 2011
platform: linux-x86_64
options:  bn(64,64) md2(int) rc4(1x,char) des(idx,cisc,16,int) idea(int) blowfish(idx)
compiler: gcc -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -g -march=native -Wa,--noexecstack -m64 -DL_ENDIAN -DTERMIO -O3 -Wall -DMD32_REG_T=int -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM
OPENSSLDIR: "/usr/local/ssl"

$ ./openssl speed aes-256-cbc aes-128-cbc rc4
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
rc4            365479.36k   375070.91k   385547.90k   386414.59k   388161.07k
aes-128 cbc    141240.55k   219880.52k   253280.68k   265520.80k   268593.83k
aes-256 cbc    119127.67k   168873.37k   188704.28k   194237.78k   197057.33k

$ ./openssl speed -evp rc4
$ ./openssl speed -evp aes-128-cbc
$ ./openssl speed -evp aes-256-cbc
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
rc4            330131.13k   367080.58k   383146.38k   385731.24k   388210.39k
aes-128-cbc    178643.98k   238537.07k   260175.88k   266410.67k   269840.10k
aes-256-cbc    143636.49k   179758.55k   191965.84k   195133.78k   197131.30k

- built with no-asm -g -march=native

OpenSSL 0.9.8s xx XXX xxxx
built on: Tue Dec 6 15:38:18 CET 2011
platform: linux-x86_64
options:  bn(64,64) md2(int) rc4(ptr,int) des(idx,cisc,16,int) idea(int) blowfish(idx)
compiler: gcc -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -g -march=native -Wa,--noexecstack -m64 -DL_ENDIAN -DTERMIO -O3 -Wall -DMD32_REG_T=int
OPENSSLDIR: "/usr/local/ssl"

$ ./openssl speed aes-256-cbc aes-128-cbc rc4
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
rc4            499831.86k   532782.87k   546130.96k   551370.41k   561267.07k
aes-128 cbc    196030.83k   204629.31k   206407.92k   207066.45k   207800.08k
aes-256 cbc    153582.03k   158641.12k   159015.59k   160347.78k   159937.88k

$ ./openssl speed -evp rc4
$ ./openssl speed -evp aes-128-cbc
$ ./openssl speed -evp aes-256-cbc
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
rc4            449829.69k   527424.32k   549180.59k   559130.97k   561576.67k
aes-128-cbc    189320.74k   202767.55k   205958.08k   206878.72k   207860.36k
aes-256-cbc    149526.70k   157224.30k   159310.00k   159739.22k   160439.91k

- comments

RC4 is clearly faster when built from C and optimized by the compiler. AES is faster when built from ASM, except with 16-byte blocks, where the C build is ahead.
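To put a number on these comparisons, the ratio between two cells of the tables above can be computed with a small helper. This is a minimal sketch, assuming a POSIX shell with awk; the function name speedup is hypothetical and not part of OpenSSL. The trailing "k" cancels out in a ratio, so it is simply stripped.

```shell
# speedup A B: print A / B for two 'openssl speed' figures such as
# "561267.07k". Hypothetical helper; the "k" (1000s of bytes/s) suffix
# is removed with ${var%k} before awk does the floating-point division.
speedup() {
    a=${1%k}
    b=${2%k}
    awk -v a="$a" -v b="$b" 'BEGIN { printf "%.2f\n", a / b }'
}

# OpenSSL 0.9.8, 8192-byte blocks, without -evp:
speedup 561267.07k 388161.07k   # RC4, C build over ASM build: prints 1.45
speedup 268593.83k 207800.08k   # AES-128-CBC, ASM build over C build: prints 1.29
```

So on this machine the C build of RC4 is about 45% faster than the 0.9.8 ASM code, while the ASM build of AES-128-CBC is about 29% faster than C for large blocks.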
OpenSSL 1.0.0
=============

- built with -g -march=native

OPENSSL_ia32cap: 0x1fbae3fffffbffff

OpenSSL 1.0.0f-dev xx XXX xxxx
built on: Tue Dec 6 15:40:12 CET 2011
platform: linux-x86_64
options:  bn(64,64) rc4(1x,char) des(idx,cisc,16,int) idea(int) blowfish(idx)
compiler: gcc -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -g -march=native -Wa,--noexecstack -m64 -DL_ENDIAN -DTERMIO -O3 -Wall -DMD32_REG_T=int -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DWHIRLPOOL_ASM
OPENSSLDIR: "/usr/local/ssl"

$ ./openssl speed aes-256-cbc aes-128-cbc rc4
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
rc4            359001.99k   373739.22k   384728.53k   386213.55k   386670.59k
aes-128 cbc    100021.25k   110124.93k   112996.00k   113950.99k   114088.72k
aes-256 cbc     73726.15k    79047.71k    80190.89k    80632.83k    81002.17k

$ ./openssl speed -evp rc4
$ ./openssl speed -evp aes-128-cbc
$ ./openssl speed -evp aes-256-cbc
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
rc4            322879.64k   365185.75k   382775.74k   385150.29k   388161.07k
aes-128-cbc     99804.36k   109730.71k   112919.37k   113531.22k   114186.62k
aes-256-cbc     73109.67k    78685.01k    80398.13k    80638.63k    81032.31k

- built with no-asm -g -march=native

OpenSSL 1.0.0f-dev xx XXX xxxx
built on: Tue Dec 6 15:43:43 CET 2011
platform: linux-x86_64
options:  bn(64,64) rc4(ptr,int) des(idx,cisc,16,int) idea(int) blowfish(idx)
compiler: gcc -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -g -march=native -Wa,--noexecstack -m64 -DL_ENDIAN -DTERMIO -O3 -Wall -DMD32_REG_T=int
OPENSSLDIR: "/usr/local/ssl"

$ ./openssl speed aes-256-cbc aes-128-cbc rc4
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
rc4            508054.69k   544800.04k   558017.58k   561510.40k   560674.13k
aes-128 cbc    209249.52k   218106.77k   219770.69k   220843.69k   222433.49k
aes-256 cbc    161253.61k   165987.39k   167491.38k   167257.43k   168681.23k

$ ./openssl speed -evp rc4
$ ./openssl speed -evp aes-128-cbc
$ ./openssl speed -evp aes-256-cbc
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
rc4            453767.10k   527468.14k   553088.86k   560333.48k   562595.87k
aes-128-cbc    200486.55k   214642.90k   219314.34k   219471.87k   221784.02k
aes-256-cbc    154192.97k   164381.23k   167281.96k   167910.74k   168716.84k

- comments

With the ASM build, AES encryption speed goes down from OpenSSL 0.9.8 to OpenSSL 1.0.0. This is not a regression: the ASM code was changed to resist a cache-timing attack vector on CPUs with a shared cache.

From Andy Polyakov:
> Assembler appears slower, because it's taking code path resistant to
> cache-timing attacks [on multi-core CPUs with shared cache].

http://thread.gmane.org/gmane.comp.encryption.openssl.devel/19836

OpenSSL 1.0.1
=============

- introduction

OpenSSL 1.0.1 will be the first official release to support the AES-NI instruction set on selected IA32 (x86) / x86_64 CPUs. According to reports, the ASM version should outperform the C code when AES-NI is available. See:

http://zombe.es/post/4059999783/openssl-outmoded-asm
http://zombe.es/post/4078724716/openssl-cipher-selection

- built with -g -march=native

OPENSSL_ia32cap: 0x1fbae3ffffebffff

OpenSSL 1.0.1-dev xx XXX xxxx
built on: Tue Dec 6 15:45:43 CET 2011
platform: linux-x86_64
options:  bn(64,64) rc4(16x,int) des(idx,cisc,16,int) idea(int) blowfish(idx)
compiler: gcc -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -g -march=native -Wa,--noexecstack -m64 -DL_ENDIAN -DTERMIO -O3 -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM
OPENSSLDIR: "/usr/local/ssl"

$ ./openssl speed aes-256-cbc aes-128-cbc rc4
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
rc4            393359.81k   650551.77k   756817.42k   785193.64k   798966.58k
aes-128 cbc    103657.40k   111393.54k   112868.01k   114050.23k   114555.60k
aes-256 cbc     75946.45k    79640.42k    80535.03k    80626.35k    80993.95k

$ ./openssl speed -evp rc4
$ ./openssl speed -evp aes-128-cbc
$ ./openssl speed -evp aes-256-cbc
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
rc4            365485.77k   620910.21k   743474.69k   783664.13k   798566.57k
aes-128-cbc    616508.39k   654628.93k   667379.07k   667955.88k   670826.17k
aes-256-cbc    451046.04k   471781.78k   478611.44k   478298.11k   480360.80k

RC4 is clearly faster than in OpenSSL 1.0.0. For blocks of 64 bytes and more, it is even faster than the C version.

AES results may be disappointing: without -evp, they are nearly the same as with OpenSSL 1.0.0, so one might think that AES-NI is not supported or not enabled. In fact, AES-NI is only used with the -evp option. Running "openssl speed" without -evp doesn't use AES-NI, so its output can be misleading.

Note: AES-NI support can be disabled by setting the OPENSSL_ia32cap environment variable, see:
http://www.openssl.org/docs/crypto/OPENSSL_ia32cap.html

$ OPENSSL_ia32cap=0x1dbae3ffffebffff ./openssl speed aes-256-cbc aes-128-cbc rc4
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
rc4            387707.67k   638157.95k   754812.32k   784587.09k   799741.95k
aes-128 cbc    103448.79k   110932.74k   112797.20k   113323.35k   114137.30k
aes-256 cbc     75890.73k    79319.25k    80567.82k    80671.40k    80737.62k

$ OPENSSL_ia32cap=0x1dbae3ffffebffff ./openssl speed -evp rc4
$ OPENSSL_ia32cap=0x1dbae3ffffebffff ./openssl speed -evp aes-128-cbc
$ OPENSSL_ia32cap=0x1dbae3ffffebffff ./openssl speed -evp aes-256-cbc
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
rc4            350467.72k   615278.63k   743796.61k   783338.15k   797854.22k
aes-128-cbc    284524.76k   334516.81k   346131.54k   351568.55k   354776.62k
aes-256-cbc    225860.68k   246779.97k   250945.41k   253871.45k   255108.20k

Without -evp, disabling the AES-NI capability (clearing bit #57) doesn't change performance, so AES-NI is clearly not used in that mode. With -evp, performance is roughly halved when AES-NI is no longer available. In the latter case, OpenSSL is probably relying on AVX, SSE, etc. to keep good performance.
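The value 0x1dbae3ffffebffff passed to OPENSSL_ia32cap above is the vector detected by this 1.0.1 build (0x1fbae3ffffebffff) with bit #57 cleared. A minimal sketch of that computation in POSIX shell arithmetic; the helper name clear_cap_bit is hypothetical, and it assumes the shell's 64-bit signed arithmetic, which is sufficient here because bit 63 of the vector is not set.

```shell
# clear_cap_bit VECTOR BIT: print VECTOR with bit number BIT cleared,
# as a 16-digit hex value usable as an OPENSSL_ia32cap setting.
# Hypothetical helper. Bits 32-63 of the vector mirror CPUID leaf 1 ECX,
# so bit 57 is ECX bit 25, the AES-NI flag.
clear_cap_bit() {
    printf '0x%016x\n' $(( $1 & ~(1 << $2) ))
}

# Vector detected by OpenSSL 1.0.1 on this machine, AES-NI bit cleared:
clear_cap_bit 0x1fbae3ffffebffff 57   # prints 0x1dbae3ffffebffff
```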
- built with no-asm -g -march=native

OpenSSL 1.0.1-dev xx XXX xxxx
built on: Tue Dec 6 15:49:19 CET 2011
platform: linux-x86_64
options:  bn(64,64) rc4(ptr,int) des(idx,cisc,16,int) idea(int) blowfish(idx)
compiler: gcc -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -g -march=native -m64 -DL_ENDIAN -DTERMIO -O3 -Wall
OPENSSLDIR: "/usr/local/ssl"

$ ./openssl speed aes-256-cbc aes-128-cbc rc4
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
rc4            505006.12k   542314.37k   554624.77k   560733.50k   559669.25k
aes-128 cbc    212067.18k   218710.55k   219933.88k   219514.54k   221425.10k
aes-256 cbc    162949.50k   166550.44k   167589.42k   167799.13k   167862.27k

$ ./openssl speed -evp rc4
$ ./openssl speed -evp aes-128-cbc
$ ./openssl speed -evp aes-256-cbc
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
rc4            421951.93k   507765.55k   536764.87k   555513.86k   561615.03k
aes-128-cbc    199574.42k   214917.40k   218769.89k   220301.65k   221384.01k
aes-256-cbc    155335.94k   164490.94k   166929.21k   167646.89k   168420.94k

- comments

The RC4 ASM code got a big improvement. The AES ASM code improved a lot too, with support for AES-NI and other new x86/x86_64 features.

Even without AES-NI, OpenSSL 1.0.1 may be worth using to improve SSL throughput. Building the C version of the algorithms instead of the ASM version is no longer needed to get better performance.

But the output of 'openssl speed' without the -evp option doesn't show the improvement, which can be misleading for users. One has to test each algorithm with the -evp option.
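Because only the -evp runs reflect AES-NI, and the runs above use one separate -evp invocation per cipher, a small loop saves some typing. A minimal sketch, assuming the openssl binary from the build tree (./openssl); the function name evp_speed and the OPENSSL variable used to pick the binary are hypothetical conveniences, not part of OpenSSL.

```shell
# evp_speed ALG...: run "openssl speed -evp ALG" once per algorithm,
# like the separate -evp invocations in the transcripts above.
# OPENSSL (hypothetical) selects the binary; it defaults to ./openssl.
evp_speed() {
    for alg in "$@"; do
        "${OPENSSL:-./openssl}" speed -evp "$alg"
    done
}

# Usage, from the OpenSSL build directory:
#   evp_speed rc4 aes-128-cbc aes-256-cbc
# With AES-NI masked out (1.0.1 vector from above), scoped to a subshell:
#   (export OPENSSL_ia32cap=0x1dbae3ffffebffff; evp_speed aes-128-cbc aes-256-cbc)
```

Exporting OPENSSL_ia32cap inside a subshell keeps the override from leaking into later runs in the same shell session.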
OpenSSL HEAD
============

- built with -g -march=native

OPENSSL_ia32cap: 0x1fbae3ffffebffff

OpenSSL 1.1.0-dev xx XXX xxxx
built on: Tue Dec 6 15:51:23 CET 2011
platform: linux-x86_64
options:  bn(64,64) rc4(16x,int) des(idx,cisc,16,int) idea(int) blowfish(idx)
compiler: gcc -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -g -march=native -Wa,--noexecstack -m64 -DL_ENDIAN -DTERMIO -O3 -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM
OPENSSLDIR: "/usr/local/ssl"

$ ./openssl speed aes-256-cbc aes-128-cbc rc4
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
rc4            391473.78k   654590.76k   754901.53k   787076.10k   794228.05k
aes-128 cbc     99483.99k   109890.43k   112946.34k   113581.06k   114472.86k
aes-256 cbc     74067.31k    78761.92k    80394.27k    80624.64k    80996.69k

$ ./openssl speed -evp rc4
$ ./openssl speed -evp aes-128-cbc
$ ./openssl speed -evp aes-256-cbc
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
rc4            344488.60k   622975.68k   744553.57k   784174.08k   795989.33k
aes-128-cbc    616249.00k   654574.91k   667327.36k   667941.55k   670837.13k
aes-256-cbc    450887.97k   471786.41k   478660.84k   478294.36k   480319.70k

$ OPENSSL_ia32cap=~0x0200000000000000 ./openssl speed aes-256-cbc aes-128-cbc rc4
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
rc4            394437.10k   647182.57k   753425.38k   783471.27k   796393.47k
aes-128 cbc     99345.87k   109899.18k   112962.35k   113545.90k   113797.80k
aes-256 cbc     73585.58k    78777.51k    80134.66k    80881.96k    80723.97k

$ OPENSSL_ia32cap=~0x0200000000000000 ./openssl speed -evp rc4
$ OPENSSL_ia32cap=~0x0200000000000000 ./openssl speed -evp aes-128-cbc
$ OPENSSL_ia32cap=~0x0200000000000000 ./openssl speed -evp aes-256-cbc
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
rc4            343865.52k   622493.46k   745380.13k   784606.89k   798733.70k
aes-128-cbc    299715.38k   339932.74k   349128.78k   352311.64k   355113.61k
aes-256-cbc    225594.49k   247244.54k   251943.73k   254298.11k   255365.74k

- built with no-asm -g -march=native

OpenSSL 1.1.0-dev xx XXX xxxx
built on: Tue Dec 6 15:55:00 CET 2011
platform: linux-x86_64
options:  bn(64,64) rc4(ptr,int) des(idx,cisc,16,int) idea(int) blowfish(idx)
compiler: gcc -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -g -march=native -m64 -DL_ENDIAN -DTERMIO -O3 -Wall
OPENSSLDIR: "/usr/local/ssl"

$ ./openssl speed aes-256-cbc aes-128-cbc rc4
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
rc4            507304.88k   544111.91k   555745.43k   561106.94k   559917.74k
aes-128 cbc    209288.98k   217863.10k   219801.34k   220673.71k   221685.38k
aes-256 cbc    161791.28k   166828.93k   167001.69k   167978.67k   168727.80k

$ ./openssl speed -evp rc4
$ ./openssl speed -evp aes-128-cbc
$ ./openssl speed -evp aes-256-cbc
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
rc4            424801.00k   509637.67k   537982.63k   556546.39k   562431.49k
aes-128-cbc    198968.55k   215421.74k   219251.59k   219529.22k   221866.21k
aes-256-cbc    154743.45k   164798.51k   167183.33k   167921.66k   168673.01k

- comments

Results from HEAD look like those from 1.0.1.

HEAD makes it easy to disable selected capabilities without having to "guess" the capability vector detected by OpenSSL: the ~ prefix clears the given bits (here bit #57, AES-NI) from the detected vector.

PS: a comparison with a 32-bit build could be interesting, but it wasn't done for this report.

--
Yann Droneaud
OPTEYA

______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List
