Hi Jan,

Yes, the "-elapsed" option is needed: without it, OpenSSL counts only user CPU time rather than elapsed time, so the time spent in the kernel (sys time) and in the HW engine is not included.

From the results below you can see:
* aes-128-cbc and sha1 are accelerated by the HW engine for large blocks (e.g. aes-128-cbc at 8192 bytes: 334935.38k with cryptodev vs 55664.64k without). I believe the OpenVPN DCO module will be faster still, because it uses the HW engine from kernel space and bypasses the path between OpenSSL and cryptodev.
* aes-128-gcm is NOT accelerated by the HW engine.
* aes-128-ccm is NOT accelerated by the HW engine, but it seems to be accelerated by a HW instruction or something else. I don't know whether my device has such a feature. The SoC type is al314.
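For reference, the summary row can be reproduced from the "Doing ..." lines, which also shows why user CPU time gives misleading figures for an offloaded cipher. This is only a minimal Python sketch: the helper name throughput_k is my own and not part of OpenSSL, and I am assuming the summary is simply ops * block size / seconds / 1000, which matches the figures in this thread.

```python
def throughput_k(ops: int, block_size: int, seconds: float) -> float:
    """Thousands of bytes per second, as in the `openssl speed` summary row.

    Hypothetical helper for illustration; the formula is an assumption
    that happens to reproduce the numbers quoted in this thread.
    """
    return ops * block_size / seconds / 1000.0


# My device, with cryptodev and -elapsed: 122657 ops of 8192 bytes in 3.00s.
elapsed_hw = throughput_k(122657, 8192, 3.00)

# Jan's i5 run with cryptodev but without -elapsed:
# 370202 ops of 8192 bytes, divided by only 0.11s of user CPU time.
user_only = throughput_k(370202, 8192, 0.11)

# The same op count divided by the real 3s the test ran for.
wall_clock = throughput_k(370202, 8192, 3.00)

print(f"with -elapsed, my device:   {elapsed_hw:.2f}k")
print(f"user time only, Jan's i5:   {user_only:.2f}k")
print(f"same ops over 3s wall time: {wall_clock:.2f}k")
```

Dividing by the real 3 s instead of 0.11 s of user time brings the 8192-byte figure down from roughly 27.6 GB/s to roughly 1.0 GB/s, which is in line with the 1,010,302.98k Jan got when he reran with -elapsed.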
With cryptodev:

# openssl speed -evp aes-128-cbc -elapsed
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-128-cbc for 3s on 16 size blocks: 252783 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 64 size blocks: 253044 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 256 size blocks: 251746 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 1024 size blocks: 190306 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 8192 size blocks: 122657 aes-128-cbc's in 3.00s
......................
type                 16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-cbc          1348.18k     5398.27k    21482.33k    64957.78k   334935.38k

# openssl speed -evp aes-128-gcm -elapsed
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-128-gcm for 3s on 16 size blocks: 3509485 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 64 size blocks: 900678 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 256 size blocks: 228961 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 1024 size blocks: 57475 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 8192 size blocks: 7189 aes-128-gcm's in 3.00s
..................
type                 16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-gcm         18717.25k    19214.46k    19538.01k    19618.13k    19630.76k

# openssl speed -evp aes-128-ccm -elapsed
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-128-ccm for 3s on 16 size blocks: 10179383 aes-128-ccm's in 3.00s
Doing aes-128-ccm for 3s on 64 size blocks: 10179215 aes-128-ccm's in 3.00s
Doing aes-128-ccm for 3s on 256 size blocks: 10179785 aes-128-ccm's in 3.00s
Doing aes-128-ccm for 3s on 1024 size blocks: 10182095 aes-128-ccm's in 3.00s
Doing aes-128-ccm for 3s on 8192 size blocks: 10179225 aes-128-ccm's in 3.00s
..................
type                 16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-ccm         54290.04k   217156.59k   868674.99k  3475488.43k 27796070.40k

# openssl speed -evp sha1 -elapsed
You have chosen to measure elapsed time instead of user CPU time.
Doing sha1 for 3s on 16 size blocks: 95252 sha1's in 3.00s
Doing sha1 for 3s on 64 size blocks: 95166 sha1's in 3.00s
Doing sha1 for 3s on 256 size blocks: 76177 sha1's in 3.00s
Doing sha1 for 3s on 1024 size blocks: 68799 sha1's in 3.00s
Doing sha1 for 3s on 8192 size blocks: 53034 sha1's in 3.00s
.................
type                 16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
sha1                  508.01k     2030.21k     6500.44k    23483.39k   144818.18k

Without cryptodev:

# openssl speed -evp aes-128-cbc -elapsed
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-128-cbc for 3s on 16 size blocks: 9235207 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 64 size blocks: 2498066 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 256 size blocks: 645288 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 1024 size blocks: 161372 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 8192 size blocks: 20385 aes-128-cbc's in 3.00s
................
type                 16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-cbc         49254.44k    53292.07k    55064.58k    55081.64k    55664.64k

# openssl speed -evp aes-128-gcm -elapsed
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-128-gcm for 3s on 16 size blocks: 3507422 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 64 size blocks: 901036 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 256 size blocks: 228857 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 1024 size blocks: 57411 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 8192 size blocks: 7188 aes-128-gcm's in 3.00s
................
type                 16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-gcm         18706.25k    19222.10k    19529.13k    19596.29k    19628.03k

# openssl speed -evp aes-128-ccm -elapsed
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-128-ccm for 3s on 16 size blocks: 10170897 aes-128-ccm's in 3.00s
Doing aes-128-ccm for 3s on 64 size blocks: 10167692 aes-128-ccm's in 3.00s
Doing aes-128-ccm for 3s on 256 size blocks: 10166117 aes-128-ccm's in 3.00s
Doing aes-128-ccm for 3s on 1024 size blocks: 10167095 aes-128-ccm's in 3.00s
Doing aes-128-ccm for 3s on 8192 size blocks: 10172046 aes-128-ccm's in 3.00s
.................
type                 16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-ccm         54244.78k   216910.76k   867508.65k  3470368.43k 27776466.94k

# openssl speed -evp sha1 -elapsed
You have chosen to measure elapsed time instead of user CPU time.
Doing sha1 for 3s on 16 size blocks: 1877571 sha1's in 3.00s
Doing sha1 for 3s on 64 size blocks: 1250523 sha1's in 3.00s
Doing sha1 for 3s on 256 size blocks: 603090 sha1's in 3.00s
Doing sha1 for 3s on 1024 size blocks: 198963 sha1's in 3.00s
Doing sha1 for 3s on 8192 size blocks: 27380 sha1's in 3.00s
...............
type                 16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
sha1                10013.71k    26677.82k    51463.68k    67912.70k    74765.65k

Tony

On Wed, 2 Dec 2020 at 23:24, Jan Just Keijser <janj...@nikhef.nl> wrote:

> Hi Tony,
>
> On 02/12/20 15:51, Jan Just Keijser wrote:
>
>
> On 02/12/20 15:22, Tony He wrote:
>
> Hi Jan,
>
> Welcome to the discussion.
>
> >the second set of numbers doesn't make sense, and a much better test is
> to do an actual encryption test
> I haven't compiled the cryptodev kernel module for my PC, so I cannot
> reproduce this issue for now. You don't understand why the performance
> is much worse with the cryptodev module for *big* blocks, right?
> If so, I guess the reason may be that the kernel assigns the work to
> multiple cores while OpenSSL uses one core. Would you share the output
> of the command "mpstat -P ALL 2"?
>
> sure, while using the cryptodev I see this:
>
> 15:28:36  CPU   %usr  %nice   %sys %iowait   %irq  %soft %steal %guest %gnice  %idle
> 15:28:38  all   1.87   0.00  23.19    0.12   0.00   0.00   0.00   0.00   0.00  74.81
> 15:28:38    0   0.00   0.00   0.00    0.50   0.00   0.00   0.00   0.00   0.00  99.50
> 15:28:38    1   7.00   0.00  93.00    0.00   0.00   0.00   0.00   0.00   0.00   0.00
> 15:28:38    2   0.00   0.00   0.00    0.00   0.00   0.00   0.00   0.00   0.00 100.00
> 15:28:38    3   0.00   0.00   0.00    0.00   0.00   0.00   0.00   0.00   0.00 100.00
>
> 15:28:38  CPU   %usr  %nice   %sys %iowait   %irq  %soft %steal %guest %gnice  %idle
> 15:28:40  all   0.75   0.00  24.19    0.00   0.00   0.00   0.00   0.00   0.00  75.06
> 15:28:40    0   0.00   0.00   0.00    0.50   0.00   0.00   0.00   0.00   0.00  99.50
> 15:28:40    1   3.50   0.00  96.50    0.00   0.00   0.00   0.00   0.00   0.00   0.00
> 15:28:40    2   0.00   0.00   0.00    0.00   0.00   0.00   0.00   0.00   0.00 100.00
> 15:28:40    3   0.00   0.00   0.00    0.00   0.00   0.00   0.00   0.00   0.00 100.00
>
> on a 4 core box; this means that 1 core is used 100% (which is what I
> expected).
>
>
> I suspect the main reason the cryptodev results on my i5-6800 go off the
> rails is due to this:
> (look at the "Doing aes-128-cbc lines")
>
> $ ./openssl speed -evp aes-128-cbc
> Doing aes-128-cbc for 3s on 16 size blocks: 2835368 aes-128-cbc's in 1.14s
> Doing aes-128-cbc for 3s on 64 size blocks: 2720745 aes-128-cbc's in 1.01s
> Doing aes-128-cbc for 3s on 256 size blocks: 2377830 aes-128-cbc's in *0.74s*
> Doing aes-128-cbc for 3s on 1024 size blocks: 1538693 aes-128-cbc's in *0.40s*
> Doing aes-128-cbc for 3s on 8192 size blocks: 370202 aes-128-cbc's in *0.11s*
> OpenSSL 1.0.2m 2 Nov 2017
> built on: reproducible build, date unspecified
> options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int)
> blowfish(idx)
> compiler: gcc -I. -I.. -I../include -DOPENSSL_THREADS -D_REENTRANT
> -DDSO_DLFCN -DHAVE_DLFCN_H -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS
> -Wa,--noexecstack -m64 -DL_ENDIAN -O3 -Wall -DOPENSSL_IA32_SSE2
> -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m
> -DRC4_ASM -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM
> -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM
> The 'numbers' are in 1000s of bytes per second processed.
> type                 16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
> aes-128-cbc         39794.64k   172403.64k   822600.65k  3939054.08k 27569952.58k
>
>
> The timing for how quickly the results are returned are way off and
> probably just wrong. The Openssl speed test is supposed to run for 3
> seconds. The actual results returned for 8192 byte blocks is
>
> Doing aes-128-cbc for 3s on 8192 size blocks: 370202 aes-128-cbc's in *0.11s*
>
> whereas without cryptodev I see
>
> Doing aes-128-cbc for 3s on 8192 size blocks: 457255 aes-128-cbc's in *3.00s*
>
> So you can see that without cryptodev the i5-6800 actually says it's doing
> more blocks (457,255 vs 370,202) but with cryptodev it is doing it in WAY
> less time. This leads me to believe the openssl speed code when using
> cryptodev just "goes wrong".
> It will be very interesting to see what the encryption test will bring -
> that is a much better real-life-like example than a simple speed test.
>
> as a follow-up : someone whispered in my ear (thanks, André ;) ) that one
> should use the -elapsed option for this, so here are new results:
>
> *with* cryptodev:
>
> ./openssl speed -evp aes-128-cbc -elapsed
> You have chosen to measure elapsed time instead of user CPU time.
> Doing aes-128-cbc for 3s on 16 size blocks: 2825786 aes-128-cbc's in 3.00s
> Doing aes-128-cbc for 3s on 64 size blocks: 2716822 aes-128-cbc's in 3.00s
> Doing aes-128-cbc for 3s on 256 size blocks: 2369723 aes-128-cbc's in 3.00s
> Doing aes-128-cbc for 3s on 1024 size blocks: 1536054 aes-128-cbc's in 3.00s
> Doing aes-128-cbc for 3s on 8192 size blocks: 369984 aes-128-cbc's in 3.00s
> [...]
> aes-128-cbc  15,070.86k  57,958.87k  202,216.36k  524,306.43k  1,010,302.98k
>
> *without* cryptodev:
>
> $ openssl speed -evp aes-128-cbc -elapsed
> You have chosen to measure elapsed time instead of user CPU time.
> Doing aes-128-cbc for 3s on 16 size blocks: 207188725 aes-128-cbc's in 3.00s
> Doing aes-128-cbc for 3s on 64 size blocks: 56855717 aes-128-cbc's in 3.00s
> Doing aes-128-cbc for 3s on 256 size blocks: 14382122 aes-128-cbc's in 3.00s
> Doing aes-128-cbc for 3s on 1024 size blocks: 3618996 aes-128-cbc's in 3.00s
> Doing aes-128-cbc for 3s on 8192 size blocks: 456727 aes-128-cbc's in 3.00s
> [...]
> aes-128-cbc  1,105,006.53k  1,212,921.96k  1,227,274.41k  1,235,283.97k  1,247,169.19k
>
> which more or less reflects the encryption test results I posted earlier.
> The question becomes, what are your results when using the -elapsed flag?
>
> JJK
>
>
> >My advice is to rerun your tests *without* the cryptodev module and then
> decide whether you really need CBC+CCM hmacs.
> Yes, I confirm that without cryptodev the performance is very bad on
> my device. I don't have that device at hand right now, but I saved one
> aes-256-cbc result in my web notebook:
>
> type                 16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
> aes-256-cbc         19626.95k    24289.71k    25054.46k    25347.75k    25337.86k
> Please note, there are two ways to accelerate encryption/decryption:
> 1. HW instructions, as on Intel x86 CPUs.
> 2. A crypto engine.
> When your device uses 2 and its CPU is not powerful, speed with cryptodev
> is normally much faster, at least for big blocks. For small blocks it may
> be slower, because pushing the work to the kernel and then to the HW
> engine can take longer than OpenSSL doing the encryption/decryption
> directly.
> Tony
>
> On Wed, 2 Dec 2020 at 19:24, Jan Just Keijser <janj...@nikhef.nl> wrote:
>
>> hi Tony,
>>
>> On 01/12/20 02:50, Tony He wrote:
>>
>> Hi Arne,
>>
>> openssl speed -evp aes-128-cbc
>> type                 16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
>> aes-128-cbc         20035.60k   123261.54k   267081.60k  1094764.09k  9181370.18k
>> openssl speed -evp aes-128-gcm
>> type                 16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
>> aes-128-gcm         18738.76k    19284.91k    19524.44k    19606.87k    19685.46k
>> openssl speed -evp aes-128-ccm
>> type                 16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
>> aes-128-ccm         53859.07k   215581.12k   862070.02k  3460786.43k 27566347.61k
>> openssl speed -evp sha1
>> type                 16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
>> sha1                 3108.57k    12177.79k    57325.18k   181610.34k  1207364.27k
>> openssl speed -evp chacha20-poly1305
>> chacha20-poly1305 is an unknown cipher or digest
>> Using old openssl, so chacha20-poly1305 is not supported.
>>
>>
>> these numbers look suspiciously like you're using the linux cryptodev
>> module. Openssl speed results for the linux cryptodev module are totally
>> unreliable and I'd even go so far as to say that the *only* numbers I trust
>> in the output above are for aes-128-gcm
>>
>> For example, if I do the same on an i5-6800 I get *without* the cryptodev
>> module:
>> $ openssl speed -evp aes-128-cbc
>> aes-128-cbc  1,104,599.38k  1,208,651.07k  1,231,766.70k  1,237,545.64k  1,248,793.94k
>>
>> and with the module I get
>> aes-128-cbc  45,087.41k  127,822.72k  581,517.17k  2,256,593.19k  27,583,804.51k
>>
>> the second set of numbers doesn't make sense, and a much better test is
>> to do an actual encryption test, e.g.
>>
>> *without* the module
>> cat BIGFILE | openssl aes-256-cbc -e -pass pass:thisisabadpassword | pv > /dev/null
>> 2.93GB 0:00:05 [ 549MB/s] [ <=> ]
>>
>> ('pv' aka 'pipeview' is a handy tool to measure the throughput of a UNIX
>> pipe)
>>
>> and with the module:
>> cat BIGFILE | ./openssl aes-256-cbc -e -pass pass:thisisabadpassword -engine cryptodev | pv > /dev/null
>> engine "cryptodev" set.
>> 2.93GB 0:00:07 [ 426MB/s] [ <=>
>>
>> so you see that using the cryptodev module actually slows things down -
>> which is to be expected, as the application needs to do more work using the
>> cryptodev module.
>>
>> My advice is to rerun your tests *without* the cryptodev module and then
>> decide whether you really need CBC+CCM hmacs.
>>
>> HTH,
>>
>> JJK
>>
>>
>> On Thu, 26 Nov 2020 at 18:40, Arne Schwabe <a...@rfc2549.org> wrote:
>>
>>> On 26.11.20 at 10:41, Tony He wrote:
>>> > Hi Arne,
>>> >
>>> >>Since the original thread was not on the mailing list I am missing your
>>> >>goal but if your crypto accelerator already works with OpenSSL, then it
>>> >>will also work with the "normal" OpenVPN
>>> >
>>> > Yes, it works with "normal" OpenVPN (OpenVPN2), but according to the
>>> > test result, it's still not fast (about 60Mbps).
>>> > The bottleneck is not the encryption operation any more. It comes from
>>> > the switching between user space and kernel space in OpenVPN2,
>>> > which makes the poor CPU of the embedded device very busy. That's why we
>>> > need OpenVPN3 running in the kernel space.
>>>
>>>
>>> What numbers are we talking about in crypto speed? Could you provide from
>>> your "poor" device:
>>>
>>>
>>> openssl speed -evp aes-128-cbc
>>> openssl speed -evp aes-128-gcm
>>> openssl speed -evp aes-128-ccm
>>> openssl speed -evp sha1
>>> openssl speed -evp chacha20-poly1305
>>>
>>> I want to know what difference/gain in terms of raw crypto speed we are
>>> talking here.
>>>
>>> Arne
>>>
>>
>> _______________________________________________
>> Openvpn-devel mailing list
>> Openvpn-devel@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/openvpn-devel
>>
>
_______________________________________________ Openvpn-devel mailing list Openvpn-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/openvpn-devel