Hi,

Here's a report on the 'openssl speed' command, comparing
ASM and C builds and looking at AES-NI support.

The tests were done on an HP ML110 G7 computer with one Intel Xeon E3-1220
processor (Sandy Bridge), 3.1GHz, supporting SSE4.1/4.2, AVX and AES-NI.

http://ark.intel.com/products/52269/
http://en.wikipedia.org/wiki/List_of_Intel_Xeon_microprocessors#Xeon_E3_.28uniprocessor.29
http://en.wikipedia.org/wiki/Sandy_Bridge

All tests were run on Linux Fedora 16, x86_64 architecture, using GCC 4.6.2.

In short: by default, 'openssl speed' doesn't show the AES-NI improvement.

Long version:

I have tested four branches of the OpenSSL CVS, with and without ASM support,
in order to compare the performance of the RC4 and AES algorithms.
I was trying to understand why 'openssl speed' on this computer was reporting
such low values given that the AES-NI instruction set is available.

Let's have a look at the results and the embedded comments.


OpenSSL 0.9.8 
=============

- built with -g -march=native

OPENSSL_ia32cap: 0x1fbae3fffffbffff
OpenSSL 0.9.8s xx XXX xxxx
built on: Tue Dec  6 15:34:52 CET 2011
platform: linux-x86_64
options:  bn(64,64) md2(int) rc4(1x,char) des(idx,cisc,16,int) idea(int) 
blowfish(idx) 
compiler: gcc -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -g 
-march=native -Wa,--noexecstack -m64 -DL_ENDIAN -DTERMIO -O3 -Wall 
-DMD32_REG_T=int -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM 
-DMD5_ASM -DAES_ASM
OPENSSLDIR: "/usr/local/ssl"

$ ./openssl speed aes-256-cbc aes-128-cbc rc4

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
rc4             365479.36k   375070.91k   385547.90k   386414.59k   388161.07k
aes-128 cbc     141240.55k   219880.52k   253280.68k   265520.80k   268593.83k
aes-256 cbc     119127.67k   168873.37k   188704.28k   194237.78k   197057.33k

$ ./openssl speed -evp rc4
$ ./openssl speed -evp aes-128-cbc
$ ./openssl speed -evp aes-256-cbc

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
rc4             330131.13k   367080.58k   383146.38k   385731.24k   388210.39k
aes-128-cbc     178643.98k   238537.07k   260175.88k   266410.67k   269840.10k
aes-256-cbc     143636.49k   179758.55k   191965.84k   195133.78k   197131.30k

- built with no-asm -g -march=native

OpenSSL 0.9.8s xx XXX xxxx
built on: Tue Dec  6 15:38:18 CET 2011
platform: linux-x86_64
options:  bn(64,64) md2(int) rc4(ptr,int) des(idx,cisc,16,int) idea(int) 
blowfish(idx) 
compiler: gcc -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -g 
-march=native -Wa,--noexecstack -m64 -DL_ENDIAN -DTERMIO -O3 -Wall 
-DMD32_REG_T=int
OPENSSLDIR: "/usr/local/ssl"

$ ./openssl speed aes-256-cbc aes-128-cbc rc4

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
rc4             499831.86k   532782.87k   546130.96k   551370.41k   561267.07k
aes-128 cbc     196030.83k   204629.31k   206407.92k   207066.45k   207800.08k
aes-256 cbc     153582.03k   158641.12k   159015.59k   160347.78k   159937.88k

$ ./openssl speed -evp rc4
$ ./openssl speed -evp aes-128-cbc
$ ./openssl speed -evp aes-256-cbc

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
rc4             449829.69k   527424.32k   549180.59k   559130.97k   561576.67k
aes-128-cbc     189320.74k   202767.55k   205958.08k   206878.72k   207860.36k
aes-256-cbc     149526.70k   157224.30k   159310.00k   159739.22k   160439.91k

- comments

RC4 is clearly faster when built from C and optimized by the compiler.
AES is faster when built from ASM, except at the 16-byte block size, where the
C build is ahead.


OpenSSL 1.0.0
=============

- built with -g -march=native

OPENSSL_ia32cap: 0x1fbae3fffffbffff
OpenSSL 1.0.0f-dev xx XXX xxxx
built on: Tue Dec  6 15:40:12 CET 2011
platform: linux-x86_64
options:  bn(64,64) rc4(1x,char) des(idx,cisc,16,int) idea(int) blowfish(idx) 
compiler: gcc -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -g 
-march=native -Wa,--noexecstack -m64 -DL_ENDIAN -DTERMIO -O3 -Wall 
-DMD32_REG_T=int -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DSHA1_ASM 
-DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DWHIRLPOOL_ASM
OPENSSLDIR: "/usr/local/ssl"

$ ./openssl speed aes-256-cbc aes-128-cbc rc4

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
rc4             359001.99k   373739.22k   384728.53k   386213.55k   386670.59k
aes-128 cbc     100021.25k   110124.93k   112996.00k   113950.99k   114088.72k
aes-256 cbc      73726.15k    79047.71k    80190.89k    80632.83k    81002.17k

$ ./openssl speed -evp rc4
$ ./openssl speed -evp aes-128-cbc
$ ./openssl speed -evp aes-256-cbc

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
rc4             322879.64k   365185.75k   382775.74k   385150.29k   388161.07k
aes-128-cbc      99804.36k   109730.71k   112919.37k   113531.22k   114186.62k
aes-256-cbc      73109.67k    78685.01k    80398.13k    80638.63k    81032.31k

- built with no-asm -g -march=native

OpenSSL 1.0.0f-dev xx XXX xxxx
built on: Tue Dec  6 15:43:43 CET 2011
platform: linux-x86_64
options:  bn(64,64) rc4(ptr,int) des(idx,cisc,16,int) idea(int) blowfish(idx) 
compiler: gcc -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -g 
-march=native -Wa,--noexecstack -m64 -DL_ENDIAN -DTERMIO -O3 -Wall 
-DMD32_REG_T=int
OPENSSLDIR: "/usr/local/ssl"

$ ./openssl speed aes-256-cbc aes-128-cbc rc4

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
rc4             508054.69k   544800.04k   558017.58k   561510.40k   560674.13k
aes-128 cbc     209249.52k   218106.77k   219770.69k   220843.69k   222433.49k
aes-256 cbc     161253.61k   165987.39k   167491.38k   167257.43k   168681.23k

$ ./openssl speed -evp rc4
$ ./openssl speed -evp aes-128-cbc
$ ./openssl speed -evp aes-256-cbc

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
rc4             453767.10k   527468.14k   553088.86k   560333.48k   562595.87k
aes-128-cbc     200486.55k   214642.90k   219314.34k   219471.87k   221784.02k
aes-256-cbc     154192.97k   164381.23k   167281.96k   167910.74k   168716.84k

- comments

From OpenSSL 0.9.8 to OpenSSL 1.0.0, AES encryption speed goes down when using
the ASM version. It's not a regression: the ASM version was changed to resist
cache-timing attacks on shared caches:

From Andy Polyakov:
> Assembler appears slower, because it's taking code path resistant to
> cache-timing attacks [on multi-core CPUs with shared cache].

http://thread.gmane.org/gmane.comp.encryption.openssl.devel/19836


OpenSSL 1.0.1
=============

- introduction

OpenSSL 1.0.1 will be the first official release to support the AES-NI
instruction set on selected IA-32 (x86) / x86_64 CPUs.

According to reports, the ASM version should outperform C code when AES-NI is
available. See

    http://zombe.es/post/4059999783/openssl-outmoded-asm
    http://zombe.es/post/4078724716/openssl-cipher-selection

- built with -g -march=native

OPENSSL_ia32cap: 0x1fbae3ffffebffff
OpenSSL 1.0.1-dev xx XXX xxxx
built on: Tue Dec  6 15:45:43 CET 2011
platform: linux-x86_64
options:  bn(64,64) rc4(16x,int) des(idx,cisc,16,int) idea(int) blowfish(idx) 
compiler: gcc -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -g 
-march=native -Wa,--noexecstack -m64 -DL_ENDIAN -DTERMIO -O3 -Wall 
-DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 
-DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM 
-DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM
OPENSSLDIR: "/usr/local/ssl"

$ ./openssl speed aes-256-cbc aes-128-cbc rc4

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
rc4             393359.81k   650551.77k   756817.42k   785193.64k   798966.58k
aes-128 cbc     103657.40k   111393.54k   112868.01k   114050.23k   114555.60k
aes-256 cbc      75946.45k    79640.42k    80535.03k    80626.35k    80993.95k

$ ./openssl speed -evp rc4
$ ./openssl speed -evp aes-128-cbc
$ ./openssl speed -evp aes-256-cbc

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
rc4             365485.77k   620910.21k   743474.69k   783664.13k   798566.57k
aes-128-cbc     616508.39k   654628.93k   667379.07k   667955.88k   670826.17k
aes-256-cbc     451046.04k   471781.78k   478611.44k   478298.11k   480360.80k

    RC4 is clearly faster than in OpenSSL 1.0.0. For blocks of 64 bytes and
    more, it's even faster than the C version.

    AES results can be disappointing: without -evp, they are almost the same
    as with OpenSSL 1.0.0, so one could think that AES-NI is not supported or
    not enabled. In fact, AES-NI is only used with the -evp option.

    Running "openssl speed" without -evp doesn't use AES-NI,
    so the output of "openssl speed" can be misleading.
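
To put a number on the gap, here is a minimal Python sketch that parses the
result rows and compares the 8192-byte column with and without -evp. The rows
are copied from the OpenSSL 1.0.1 ASM run above; the function names
(parse_rows, evp_speedup) are made up for this illustration and are not part
of OpenSSL.

```python
# Throughput rows copied from the OpenSSL 1.0.1 ASM build above
# (figures are in thousands of bytes per second).
PLAIN = """\
aes-128 cbc     103657.40k   111393.54k   112868.01k   114050.23k   114555.60k
aes-256 cbc      75946.45k    79640.42k    80535.03k    80626.35k    80993.95k
"""
EVP = """\
aes-128-cbc     616508.39k   654628.93k   667379.07k   667955.88k   670826.17k
aes-256-cbc     451046.04k   471781.78k   478611.44k   478298.11k   480360.80k
"""


def parse_rows(text):
    """Map cipher name -> list of the five throughput figures."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # not a result row
        # The last five fields are the figures; the rest is the cipher name.
        # Non-EVP output prints "aes-128 cbc", EVP output "aes-128-cbc",
        # so join the name parts with "-" to make both comparable.
        name = "-".join(fields[:-5])
        rows[name] = [float(f.rstrip("k")) for f in fields[-5:]]
    return rows


def evp_speedup(plain_text, evp_text):
    """Ratio of EVP to non-EVP throughput for the largest block size."""
    plain, evp = parse_rows(plain_text), parse_rows(evp_text)
    return {name: evp[name][-1] / plain[name][-1]
            for name in plain if name in evp}


speedups = evp_speedup(PLAIN, EVP)
for name, ratio in sorted(speedups.items()):
    print(f"{name}: {ratio:.2f}x")
```

On these figures both AES variants come out between 5.8 and 6 times faster
with -evp.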

    Note: it is possible to disable AES-NI support by setting the
    OPENSSL_ia32cap environment variable, see

        http://www.openssl.org/docs/crypto/OPENSSL_ia32cap.html

$ OPENSSL_ia32cap=0x1dbae3ffffebffff ./openssl speed aes-256-cbc aes-128-cbc rc4

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
rc4             387707.67k   638157.95k   754812.32k   784587.09k   799741.95k
aes-128 cbc     103448.79k   110932.74k   112797.20k   113323.35k   114137.30k
aes-256 cbc      75890.73k    79319.25k    80567.82k    80671.40k    80737.62k

$ OPENSSL_ia32cap=0x1dbae3ffffebffff ./openssl speed -evp rc4
$ OPENSSL_ia32cap=0x1dbae3ffffebffff ./openssl speed -evp aes-128-cbc
$ OPENSSL_ia32cap=0x1dbae3ffffebffff ./openssl speed -evp aes-256-cbc

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
rc4             350467.72k   615278.63k   743796.61k   783338.15k   797854.22k
aes-128-cbc     284524.76k   334516.81k   346131.54k   351568.55k   354776.62k
aes-256-cbc     225860.68k   246779.97k   250945.41k   253871.45k   255108.20k
    
    Without -evp, disabling the AES-NI capability (clearing bit #57) doesn't
    change performance, so it's clear AES-NI is not used by default.
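
The value passed in OPENSSL_ia32cap above can be derived from the detected
vector by clearing that single bit. A small Python sketch, using only the
vector printed by this build and the bit number quoted above (the variable
names are illustrative):

```python
AESNI_BIT = 57  # bit number quoted in the text above

# OPENSSL_ia32cap value reported by the 1.0.1 ASM build on this machine.
detected = 0x1fbae3ffffebffff

# Clear bit 57 and leave every other capability bit untouched.
without_aesni = detected & ~(1 << AESNI_BIT)

print(hex(without_aesni))  # 0x1dbae3ffffebffff
```

The result matches the value used in the commands above.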

    With -evp, performance is roughly halved when AES-NI is no longer available.
    In that case, OpenSSL is probably relying on AVX, SSE, etc. to keep
    good performance.

- built with no-asm -g -march=native

OpenSSL 1.0.1-dev xx XXX xxxx
built on: Tue Dec  6 15:49:19 CET 2011
platform: linux-x86_64
options:  bn(64,64) rc4(ptr,int) des(idx,cisc,16,int) idea(int) blowfish(idx) 
compiler: gcc -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -g 
-march=native -m64 -DL_ENDIAN -DTERMIO -O3 -Wall
OPENSSLDIR: "/usr/local/ssl"

$ ./openssl speed aes-256-cbc aes-128-cbc rc4

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
rc4             505006.12k   542314.37k   554624.77k   560733.50k   559669.25k
aes-128 cbc     212067.18k   218710.55k   219933.88k   219514.54k   221425.10k
aes-256 cbc     162949.50k   166550.44k   167589.42k   167799.13k   167862.27k

$ ./openssl speed -evp rc4
$ ./openssl speed -evp aes-128-cbc
$ ./openssl speed -evp aes-256-cbc

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
rc4             421951.93k   507765.55k   536764.87k   555513.86k   561615.03k
aes-128-cbc     199574.42k   214917.40k   218769.89k   220301.65k   221384.01k
aes-256-cbc     155335.94k   164490.94k   166929.21k   167646.89k   168420.94k

- comments

The RC4 ASM code is much improved.

The AES ASM code is much improved too,
supporting AES-NI and other new x86/x86_64 features.

Even without AES-NI, OpenSSL 1.0.1 might be worth using to improve SSL
throughput.

Using the C version of the algorithms instead of the ASM version is no longer
needed to get better performance.

But the output of 'openssl speed' without -evp doesn't show the improvement,
which can be misleading for users. One has to test each algorithm using the
-evp option.
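
Since each algorithm needs its own -evp run, the runs can be scripted. The
following Python sketch builds the same three command lines used in this
report and runs them one after the other. The helper names (evp_commands,
run_all) and the ./openssl path are assumptions made for the example; only
the 'speed -evp <algorithm>' invocation comes from the tests above.

```python
import subprocess

# Algorithms benchmarked in this report.
ALGORITHMS = ["rc4", "aes-128-cbc", "aes-256-cbc"]


def evp_commands(openssl="./openssl", algorithms=ALGORITHMS):
    """Build one 'openssl speed -evp <algorithm>' command line per algorithm."""
    return [[openssl, "speed", "-evp", alg] for alg in algorithms]


def run_all(commands):
    """Run each benchmark in turn and return {algorithm: captured stdout}."""
    results = {}
    for argv in commands:
        done = subprocess.run(argv, capture_output=True, text=True, check=True)
        results[argv[-1]] = done.stdout
    return results
```

Calling run_all(evp_commands()) from the build directory should then collect
one block of output per algorithm, which can be compared with the non-EVP
figures.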


OpenSSL HEAD
============

- built with -g -march=native

OPENSSL_ia32cap: 0x1fbae3ffffebffff
OpenSSL 1.1.0-dev xx XXX xxxx
built on: Tue Dec  6 15:51:23 CET 2011
platform: linux-x86_64
options:  bn(64,64) rc4(16x,int) des(idx,cisc,16,int) idea(int) blowfish(idx) 
compiler: gcc -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -g 
-march=native -Wa,--noexecstack -m64 -DL_ENDIAN -DTERMIO -O3 -Wall 
-DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 
-DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM 
-DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM
OPENSSLDIR: "/usr/local/ssl"

$ ./openssl speed aes-256-cbc aes-128-cbc rc4

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
rc4             391473.78k   654590.76k   754901.53k   787076.10k   794228.05k
aes-128 cbc      99483.99k   109890.43k   112946.34k   113581.06k   114472.86k
aes-256 cbc      74067.31k    78761.92k    80394.27k    80624.64k    80996.69k

$ ./openssl speed -evp rc4
$ ./openssl speed -evp aes-128-cbc
$ ./openssl speed -evp aes-256-cbc

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
rc4             344488.60k   622975.68k   744553.57k   784174.08k   795989.33k
aes-128-cbc     616249.00k   654574.91k   667327.36k   667941.55k   670837.13k
aes-256-cbc     450887.97k   471786.41k   478660.84k   478294.36k   480319.70k

$ OPENSSL_ia32cap=~0x0200000000000000 ./openssl speed aes-256-cbc aes-128-cbc rc4

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
rc4             394437.10k   647182.57k   753425.38k   783471.27k   796393.47k
aes-128 cbc      99345.87k   109899.18k   112962.35k   113545.90k   113797.80k
aes-256 cbc      73585.58k    78777.51k    80134.66k    80881.96k    80723.97k

$ OPENSSL_ia32cap=~0x0200000000000000 ./openssl speed -evp rc4
$ OPENSSL_ia32cap=~0x0200000000000000 ./openssl speed -evp aes-128-cbc
$ OPENSSL_ia32cap=~0x0200000000000000 ./openssl speed -evp aes-256-cbc

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
rc4             343865.52k   622493.46k   745380.13k   784606.89k   798733.70k
aes-128-cbc     299715.38k   339932.74k   349128.78k   352311.64k   355113.61k
aes-256-cbc     225594.49k   247244.54k   251943.73k   254298.11k   255365.74k

- built with no-asm -g -march=native

OpenSSL 1.1.0-dev xx XXX xxxx
built on: Tue Dec  6 15:55:00 CET 2011
platform: linux-x86_64
options:  bn(64,64) rc4(ptr,int) des(idx,cisc,16,int) idea(int) blowfish(idx) 
compiler: gcc -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -g 
-march=native -m64 -DL_ENDIAN -DTERMIO -O3 -Wall
OPENSSLDIR: "/usr/local/ssl"

$ ./openssl speed aes-256-cbc aes-128-cbc rc4

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
rc4             507304.88k   544111.91k   555745.43k   561106.94k   559917.74k
aes-128 cbc     209288.98k   217863.10k   219801.34k   220673.71k   221685.38k
aes-256 cbc     161791.28k   166828.93k   167001.69k   167978.67k   168727.80k

$ ./openssl speed -evp rc4
$ ./openssl speed -evp aes-128-cbc
$ ./openssl speed -evp aes-256-cbc

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
rc4             424801.00k   509637.67k   537982.63k   556546.39k   562431.49k
aes-128-cbc     198968.55k   215421.74k   219251.59k   219529.22k   221866.21k
aes-256-cbc     154743.45k   164798.51k   167183.33k   167921.66k   168673.01k

- comments

Results from HEAD are similar to those from 1.0.1.

HEAD makes it easy to disable selected capabilities (with the ~ prefix used in
the commands above) without having to "guess" the capability vector detected
by OpenSSL.
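
As an illustration of the two ways of writing the setting, here is a
simplified Python model. It assumes that a leading "~" clears the listed bits
from the detected vector and that a plain value replaces the vector outright;
apply_ia32cap is a made-up helper, not OpenSSL's actual parser.

```python
def apply_ia32cap(detected, setting):
    """Return the capability vector after applying an OPENSSL_ia32cap-style value.

    Simplified model (assumption): a leading "~" clears the given bits from
    the detected vector; any other value replaces the vector outright.
    """
    if setting.startswith("~"):
        return detected & ~int(setting[1:], 0)
    return int(setting, 0)


detected = 0x1fbae3ffffebffff  # vector reported by the HEAD ASM build above

masked = apply_ia32cap(detected, "~0x0200000000000000")
explicit = apply_ia32cap(detected, "0x1dbae3ffffebffff")

print(hex(masked), hex(explicit))
```

Under this model both forms give the same vector on this machine, but only
the explicit form requires knowing the detected value beforehand.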


PS: a comparison with 32-bit builds could be interesting, but wasn't done for
this report.


-- 
Yann Droneaud
OPTEYA
