Hi Jan,

Yes, the "-elapsed" option is needed: without it, OpenSSL counts only user
CPU time instead of total time (user + sys time). With it, you can see:
* aes-128-cbc and sha1 are accelerated by the HW engine. I believe the speed
is faster with the openvpn dco module, because it uses the HW engine in kernel
space and bypasses the path between openssl and cryptodev.
* aes-128-gcm is NOT accelerated by the HW engine.
* aes-128-ccm is NOT accelerated by the HW engine, but it seems to be
accelerated by a HW instruction or something else. I don't know whether my
device has such a function. The SoC type is al314.
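
To make the "-elapsed" point concrete, here is a minimal sketch of how the
"k" figures in the speed tables appear to be derived. The formula (operations
times block size, divided by the measured time, in units of 1000 bytes per
second) is an assumption inferred from the fact that it reproduces the printed
numbers; the function name is made up for illustration. The inputs are the
16-byte aes-128-cbc line from the cryptodev run below and the two 8192-byte
lines Jan posted in the quoted thread (0.11s of user CPU time versus 3.00s
elapsed).

```python
def throughput_k(count: int, block_size: int, seconds: float) -> float:
    """Throughput in 1000s of bytes per second, the unit used by `openssl speed`."""
    return count * block_size / seconds / 1000.0


# This device, with cryptodev, 16-byte blocks, measured with -elapsed.
embedded_16 = throughput_k(252783, 16, 3.00)

# Jan's PC, with cryptodev, 8192-byte blocks.
# Without -elapsed only 0.11s of user CPU time is counted, although the run
# still lasts about 3s on the wall clock.
user_time_8192 = throughput_k(370202, 8192, 0.11)
elapsed_8192 = throughput_k(369984, 8192, 3.00)

print(f"{embedded_16:.2f}k")      # 1348.18k
print(f"{user_time_8192:.2f}k")   # 27569952.58k
print(f"{elapsed_8192:.2f}k")     # 1010302.98k
```

Nearly the same number of blocks is processed in both of Jan's runs; only the
divisor changes, which is why the figure without -elapsed is inflated by a
factor of roughly 27.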


With cryptodev:
# openssl speed -evp aes-128-cbc -elapsed
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-128-cbc for 3s on 16 size blocks: 252783 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 64 size blocks: 253044 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 256 size blocks: 251746 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 1024 size blocks: 190306 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 8192 size blocks: 122657 aes-128-cbc's in 3.00s
......................
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-cbc      1348.18k     5398.27k    21482.33k    64957.78k   334935.38k

# openssl speed -evp aes-128-gcm -elapsed
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-128-gcm for 3s on 16 size blocks: 3509485 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 64 size blocks: 900678 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 256 size blocks: 228961 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 1024 size blocks: 57475 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 8192 size blocks: 7189 aes-128-gcm's in 3.00s
..................
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-gcm     18717.25k    19214.46k    19538.01k    19618.13k    19630.76k

# openssl speed -evp aes-128-ccm -elapsed
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-128-ccm for 3s on 16 size blocks: 10179383 aes-128-ccm's in 3.00s
Doing aes-128-ccm for 3s on 64 size blocks: 10179215 aes-128-ccm's in 3.00s
Doing aes-128-ccm for 3s on 256 size blocks: 10179785 aes-128-ccm's in 3.00s
Doing aes-128-ccm for 3s on 1024 size blocks: 10182095 aes-128-ccm's in 3.00s
Doing aes-128-ccm for 3s on 8192 size blocks: 10179225 aes-128-ccm's in 3.00s
..................
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-ccm     54290.04k   217156.59k   868674.99k  3475488.43k 27796070.40k

# openssl speed -evp sha1 -elapsed
You have chosen to measure elapsed time instead of user CPU time.
Doing sha1 for 3s on 16 size blocks: 95252 sha1's in 3.00s
Doing sha1 for 3s on 64 size blocks: 95166 sha1's in 3.00s
Doing sha1 for 3s on 256 size blocks: 76177 sha1's in 3.00s
Doing sha1 for 3s on 1024 size blocks: 68799 sha1's in 3.00s
Doing sha1 for 3s on 8192 size blocks: 53034 sha1's in 3.00s
.................
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
sha1              508.01k     2030.21k     6500.44k    23483.39k   144818.18k

Without cryptodev:
# openssl speed -evp aes-128-cbc -elapsed
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-128-cbc for 3s on 16 size blocks: 9235207 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 64 size blocks: 2498066 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 256 size blocks: 645288 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 1024 size blocks: 161372 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 8192 size blocks: 20385 aes-128-cbc's in 3.00s
................
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-cbc     49254.44k    53292.07k    55064.58k    55081.64k    55664.64k

# openssl speed -evp aes-128-gcm -elapsed
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-128-gcm for 3s on 16 size blocks: 3507422 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 64 size blocks: 901036 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 256 size blocks: 228857 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 1024 size blocks: 57411 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 8192 size blocks: 7188 aes-128-gcm's in 3.00s
................
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-gcm     18706.25k    19222.10k    19529.13k    19596.29k    19628.03k

# openssl speed -evp aes-128-ccm -elapsed
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-128-ccm for 3s on 16 size blocks: 10170897 aes-128-ccm's in 3.00s
Doing aes-128-ccm for 3s on 64 size blocks: 10167692 aes-128-ccm's in 3.00s
Doing aes-128-ccm for 3s on 256 size blocks: 10166117 aes-128-ccm's in 3.00s
Doing aes-128-ccm for 3s on 1024 size blocks: 10167095 aes-128-ccm's in 3.00s
Doing aes-128-ccm for 3s on 8192 size blocks: 10172046 aes-128-ccm's in 3.00s
.................
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-ccm     54244.78k   216910.76k   867508.65k  3470368.43k 27776466.94k

# openssl speed -evp sha1 -elapsed
You have chosen to measure elapsed time instead of user CPU time.
Doing sha1 for 3s on 16 size blocks: 1877571 sha1's in 3.00s
Doing sha1 for 3s on 64 size blocks: 1250523 sha1's in 3.00s
Doing sha1 for 3s on 256 size blocks: 603090 sha1's in 3.00s
Doing sha1 for 3s on 1024 size blocks: 198963 sha1's in 3.00s
Doing sha1 for 3s on 8192 size blocks: 27380 sha1's in 3.00s
...............
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
sha1            10013.71k    26677.82k    51463.68k    67912.70k    74765.65k
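
For an easier comparison of the two sets of results, a small sketch that
divides the "with cryptodev" throughput by the "without cryptodev" throughput
for each block size. The numbers are copied from the final table rows above
(in 1000s of bytes per second); the variable and function names are only for
illustration.

```python
SIZES = [16, 64, 256, 1024, 8192]

# Final table rows from the -elapsed runs, in 1000s of bytes per second.
WITH_CRYPTODEV = {
    "aes-128-cbc": [1348.18, 5398.27, 21482.33, 64957.78, 334935.38],
    "aes-128-gcm": [18717.25, 19214.46, 19538.01, 19618.13, 19630.76],
    "aes-128-ccm": [54290.04, 217156.59, 868674.99, 3475488.43, 27796070.40],
    "sha1": [508.01, 2030.21, 6500.44, 23483.39, 144818.18],
}
WITHOUT_CRYPTODEV = {
    "aes-128-cbc": [49254.44, 53292.07, 55064.58, 55081.64, 55664.64],
    "aes-128-gcm": [18706.25, 19222.10, 19529.13, 19596.29, 19628.03],
    "aes-128-ccm": [54244.78, 216910.76, 867508.65, 3470368.43, 27776466.94],
    "sha1": [10013.71, 26677.82, 51463.68, 67912.70, 74765.65],
}


def speedups(name: str) -> list[float]:
    """Ratio with/without cryptodev per block size; above 1.0 means cryptodev is faster."""
    return [w / wo for w, wo in zip(WITH_CRYPTODEV[name], WITHOUT_CRYPTODEV[name])]


print("algorithm    " + "".join(f"{size:>9d}" for size in SIZES))
for name in WITH_CRYPTODEV:
    print(f"{name:12s} " + "".join(f"{ratio:8.2f}x" for ratio in speedups(name)))
```

With these figures, cryptodev is about 6x faster for aes-128-cbc and a bit
under 2x faster for sha1 at 8192-byte blocks, but far slower than plain
OpenSSL at 16-byte blocks. For aes-128-gcm and aes-128-ccm the ratio stays
close to 1.0 at every block size, which fits the observation that they are
not handled by the HW engine.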

Tony

Jan Just Keijser <janj...@nikhef.nl> wrote on Wed, Dec 2, 2020 at 11:24 PM:

> Hi Tony,
>
> On 02/12/20 15:51, Jan Just Keijser wrote:
>
>
> On 02/12/20 15:22, Tony He wrote:
>
> Hi Jan,
>
> Welcome to the discussion.
>
> >the second set of numbers doesn't make sense, and a much better test is
> to do an actual encryption test
> I haven't compiled the cryptodev kernel module for my PC, so I cannot
> reproduce this issue for now. You don't understand why the performance
> is much worse with the cryptodev module for *big* blocks, right?
> If so, my guess is that the kernel assigns the work to multiple cores
> while OpenSSL uses one core. Would you share the output of the command
> "mpstat -P ALL 2"?
>
> sure, while using the cryptodev I see this:
>
> 15:28:36     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
> 15:28:38     all    1.87    0.00   23.19    0.12    0.00    0.00    0.00    0.00    0.00   74.81
> 15:28:38       0    0.00    0.00    0.00    0.50    0.00    0.00    0.00    0.00    0.00   99.50
> 15:28:38       1    7.00    0.00   93.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
> 15:28:38       2    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
> 15:28:38       3    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
>
> 15:28:38     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
> 15:28:40     all    0.75    0.00   24.19    0.00    0.00    0.00    0.00    0.00    0.00   75.06
> 15:28:40       0    0.00    0.00    0.00    0.50    0.00    0.00    0.00    0.00    0.00   99.50
> 15:28:40       1    3.50    0.00   96.50    0.00    0.00    0.00    0.00    0.00    0.00    0.00
> 15:28:40       2    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
> 15:28:40       3    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
>
> on a 4 core box; this means that 1 core is used 100% (which is what I
> expected).
>
>
> I suspect the main reason the cryptodev results on my i5-6800 go off the
> rails is due to this:
> (look at the "Doing aes-128-cbc" lines)
>
> $ ./openssl speed -evp aes-128-cbc
> Doing aes-128-cbc for 3s on 16 size blocks: 2835368 aes-128-cbc's in 1.14s
> Doing aes-128-cbc for 3s on 64 size blocks: 2720745 aes-128-cbc's in 1.01s
> Doing aes-128-cbc for 3s on 256 size blocks: 2377830 aes-128-cbc's in *0.74s*
> Doing aes-128-cbc for 3s on 1024 size blocks: 1538693 aes-128-cbc's in *0.40s*
> Doing aes-128-cbc for 3s on 8192 size blocks: 370202 aes-128-cbc's in *0.11s*
> OpenSSL 1.0.2m  2 Nov 2017
> built on: reproducible build, date unspecified
> options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int)
> blowfish(idx)
> compiler: gcc -I. -I.. -I../include  -DOPENSSL_THREADS -D_REENTRANT
> -DDSO_DLFCN -DHAVE_DLFCN_H -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS
> -Wa,--noexecstack -m64 -DL_ENDIAN -O3 -Wall -DOPENSSL_IA32_SSE2
> -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m
> -DRC4_ASM -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM
> -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM
> The 'numbers' are in 1000s of bytes per second processed.
> type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
> aes-128-cbc      39794.64k   172403.64k   822600.65k  3939054.08k  27569952.58k
>
>
> The timings reported for how long each run took are way off and
> probably just wrong. The OpenSSL speed test is supposed to run for 3
> seconds. The actual result returned for 8192-byte blocks is
>
> Doing aes-128-cbc for 3s on 8192 size blocks: 370202 aes-128-cbc's in *0.11s*
>
> whereas without cryptodev I see
>
> Doing aes-128-cbc for 3s on 8192 size blocks: 457255 aes-128-cbc's in *3.00s*
>
> So you can see that without cryptodev the i5-6800 actually says it's doing
> more blocks (457,255 vs 370,202) but with cryptodev it is doing it in WAY
> less time.  This leads me to believe the openssl speed code when using
> cryptodev just "goes wrong".
> It will be very interesting to see what the encryption test will bring -
> that is a much better real-life-like example than a simple speed test.
>
> as a follow-up : someone whispered in my ear (thanks, André ;) ) that one
> should use the -elapsed option for this, so here are new results:
>
> *with* cryptodev:
>
> ./openssl speed -evp aes-128-cbc -elapsed
> You have chosen to measure elapsed time instead of user CPU time.
> Doing aes-128-cbc for 3s on 16 size blocks: 2825786 aes-128-cbc's in 3.00s
> Doing aes-128-cbc for 3s on 64 size blocks: 2716822 aes-128-cbc's in 3.00s
> Doing aes-128-cbc for 3s on 256 size blocks: 2369723 aes-128-cbc's in 3.00s
> Doing aes-128-cbc for 3s on 1024 size blocks: 1536054 aes-128-cbc's in 3.00s
> Doing aes-128-cbc for 3s on 8192 size blocks: 369984 aes-128-cbc's in 3.00s
> [...]
> aes-128-cbc      15,070.86k    57,958.87k   202,216.36k   524,306.43k  1,010,302.98k
>
> *without* cryptodev:
>
> $ openssl speed -evp aes-128-cbc -elapsed
> You have chosen to measure elapsed time instead of user CPU time.
> Doing aes-128-cbc for 3s on 16 size blocks: 207188725 aes-128-cbc's in 3.00s
> Doing aes-128-cbc for 3s on 64 size blocks: 56855717 aes-128-cbc's in 3.00s
> Doing aes-128-cbc for 3s on 256 size blocks: 14382122 aes-128-cbc's in 3.00s
> Doing aes-128-cbc for 3s on 1024 size blocks: 3618996 aes-128-cbc's in 3.00s
> Doing aes-128-cbc for 3s on 8192 size blocks: 456727 aes-128-cbc's in 3.00s
> [...]
> aes-128-cbc    1,105,006.53k  1,212,921.96k  1,227,274.41k  1,235,283.97k  1,247,169.19k
>
> which more or less reflects the encryption test results I posted earlier.
> The question becomes, what are your results when using the -elapsed flag?
>
> JJK
>
>
> >My advice is to rerun your tests *without* the cryptodev module and then
> decide whether you really need CBC+CCM hmacs.
> Yes, I confirm that without cryptodev the performance is very bad on
> my device. I don't have that device at hand right now, but I saved one
> aes-256-cbc result in my web notebook, shown below:
>
> type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
> aes-256-cbc 19626.95k 24289.71k 25054.46k 25347.75k 25337.86k
> Please note, there are two ways to accelerate encryption/decryption:
> 1. HW instructions, as on Intel x86 CPUs.
> 2. A crypto engine.
> When your device uses option 2 and its CPU is not powerful, cryptodev is
> normally much faster, at least for big blocks. For small blocks it may be
> slower, because pushing the work to the kernel and then to the HW engine
> can take longer than letting OpenSSL do the encryption/decryption
> directly.
> Tony
>
> Jan Just Keijser <janj...@nikhef.nl> wrote on Wed, Dec 2, 2020 at 7:24 PM:
>
>> hi Tony,
>>
>> On 01/12/20 02:50, Tony He wrote:
>>
>> Hi Arne,
>>
>> openssl speed -evp aes-128-cbc
>> type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
>> aes-128-cbc 20035.60k 123261.54k 267081.60k 1094764.09k 9181370.18k
>> openssl speed -evp aes-128-gcm
>> type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
>> aes-128-gcm 18738.76k 19284.91k 19524.44k 19606.87k 19685.46k
>> openssl speed -evp aes-128-ccm
>> type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
>> aes-128-ccm 53859.07k 215581.12k 862070.02k 3460786.43k 27566347.61k
>> openssl speed -evp sha1
>> type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
>> sha1 3108.57k 12177.79k 57325.18k 181610.34k 1207364.27k
>> openssl speed -evp chacha20-poly1305
>> chacha20-poly1305 is an unknown cipher or digest
>> Using old openssl, so chacha20-poly1305 is not supported.
>>
>>
>> these numbers look suspiciously like you're using the linux cryptodev
>> module. Openssl speed results for the linux cryptodev module are totally
>> unreliable and I'd even go so far as to say that the *only* numbers I trust
>> in the output above are for aes-128-gcm
>>
>> For example, if I do the same on an i5-6800 I get *without* the cryptodev
>> module:
>>   $ openssl speed -evp aes-128-cbc
>>   aes-128-cbc    1,104,599.38k  1,208,651.07k  1,231,766.70k  1,237,545.64k  1,248,793.94k
>>
>> and with the module I get
>>   aes-128-cbc      45,087.41k   127,822.72k   581,517.17k  2,256,593.19k  27,583,804.51k
>>
>> the second set of numbers doesn't make sense, and a much better test is
>> to do an actual encryption test, e.g.
>>
>> *without* the module
>> cat BIGFILE | openssl aes-256-cbc -e -pass pass:thisisabadpassword | pv > /dev/null
>> 2.93GB 0:00:05 [ 549MB/s] [              <=>              ]
>>
>> ('pv' aka 'pipeview' is a handy tool to measure the throughput of a UNIX
>> pipe)
>>
>> and with the module:
>> cat BIGFILE | ./openssl aes-256-cbc -e -pass pass:thisisabadpassword -engine cryptodev | pv > /dev/null
>> engine "cryptodev" set.
>> 2.93GB 0:00:07 [ 426MB/s] [              <=>
>>
>> so you see that using the cryptodev module actually slows things down -
>> which is to be expected, as the application needs to do more work using the
>> cryptodev module.
>>
>> My advice is to rerun your tests *without* the cryptodev module and then
>> decide whether you really need CBC+CCM hmacs.
>>
>> HTH,
>>
>> JJK
>>
>>
>> Arne Schwabe <a...@rfc2549.org> wrote on Thu, Nov 26, 2020 at 6:40 PM:
>>
>>> On 26.11.20 at 10:41, Tony He wrote:
>>> > Hi Arne,
>>> >
>>> >>Since the original thread was not on the mailing list I am missing your
>>> >>goal, but if your crypto accelerator already works with OpenSSL, then it
>>> >>will also work with the "normal" OpenVPN
>>> >
>>> > Yes, it works with "normal" OpenVPN (OpenVPN2), but according to the
>>> > test results it's still not fast (about 60 Mbps).
>>> > The bottleneck is no longer the encryption operation. It comes from
>>> > switching between user space and kernel space in OpenVPN2,
>>> > which keeps the poor CPU of the embedded device very busy. That's why we
>>> > need OpenVPN3 running in kernel space.
>>>
>>>
>>> What numbers are we talking about in crypto speed? Could you provide the
>>> following from your "poor" device:
>>>
>>>
>>> openssl speed -evp aes-128-cbc
>>> openssl speed -evp aes-128-gcm
>>> openssl speed -evp aes-128-ccm
>>> openssl speed -evp sha1
>>> openssl speed -evp chacha20-poly1305
>>>
>>> I want to know what difference/gain in terms of raw crypto speed we are
>>> talking about here.
>>>
>>> Arne
>>>
>>>
>>>
>>
>>
>>
>> _______________________________________________
>> Openvpn-devel mailing list
>> Openvpn-devel@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/openvpn-devel
>>
>>
>>
>
>
_______________________________________________
Openvpn-devel mailing list
Openvpn-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openvpn-devel
