Hi Jan,

>what HW engine is this? I think your best bet is to actually get the
>engine to support GCM; with AES and SHA acceleration in place there is very
>little to stop the HW engine from being able to support GCM..

The HW engine is part of the al314 SoC. It is connected to the A15 CPU via
PCI inside the SoC. The chip vendor will not support GCM, for a variety of
reasons.

>the numbers do suggest some form of cryptodev acceleration - can you unload
>the cryptodev module or block access to it (e.g. chmod 000 /dev/crypto) ?

In my second set of test numbers, I unloaded the cryptodev module. You can
see the CCM performance is almost the same.

Tony

Jan Just Keijser <janj...@nikhef.nl> wrote on Fri, Dec 4, 2020 at 5:49 PM:

> Hi Tony,
>
> On 04/12/20 08:41, Tony He wrote:
>
> Hi Jan,
> Yeah, the "-elapsed" option is needed because without it OpenSSL counts
> user time instead of total time (user + sys time). You can see:
> * aes-128-cbc and sha1 are accelerated by the HW engine. I believe the
> speed is faster for the openvpn dco module because it uses the HW engine
> in kernel space and bypasses the path between openssl and cryptodev.
>
> that is correct, the openvpn dco module sits in kernel space and does not
> need to pass the userspace<->kernelspace barrier and thus should have
> better performance
>
> * aes-128-gcm is NOT accelerated by the HW engine.
>
> what HW engine is this? I think your best bet is to actually get the
> engine to support GCM; with AES and SHA acceleration in place there is very
> little to stop the HW engine from being able to support GCM...
>
> * aes-128-ccm is NOT accelerated by the HW engine, but it seems to be
> accelerated by a HW instruction or something else. I don't know whether my
> device has such a function. SoC type is al314.
>
> the numbers do suggest some form of cryptodev acceleration - can you
> unload the cryptodev module or block access to it (e.g. chmod 000
> /dev/crypto) ?
>
> The AL314 is a quad core Cortex A15 CPU @ 1.7 GHz; the numbers *without*
> cryptodev look about right for that particular CPU.
>
> Most modern crypto packages use AES-GCM or chacha20-poly1305 as they are
> considered more secure. CBC is considered a bit outdated and as far as I
> know no openvpn release supports CCM thus far (which is a shame, really).
>
> HTH,
>
> JJK
>
> With cryptodev:
>
> # openssl speed -evp aes-128-cbc -elapsed
> You have chosen to measure elapsed time instead of user CPU time.
> Doing aes-128-cbc for 3s on 16 size blocks: 252783 aes-128-cbc's in 3.00s
> Doing aes-128-cbc for 3s on 64 size blocks: 253044 aes-128-cbc's in 3.00s
> Doing aes-128-cbc for 3s on 256 size blocks: 251746 aes-128-cbc's in 3.00s
> Doing aes-128-cbc for 3s on 1024 size blocks: 190306 aes-128-cbc's in 3.00s
> Doing aes-128-cbc for 3s on 8192 size blocks: 122657 aes-128-cbc's in 3.00s
> ......................
> type              16 bytes      64 bytes     256 bytes    1024 bytes    8192 bytes
> aes-128-cbc       1348.18k      5398.27k     21482.33k     64957.78k    334935.38k
>
> # openssl speed -evp aes-128-gcm -elapsed
> You have chosen to measure elapsed time instead of user CPU time.
> Doing aes-128-gcm for 3s on 16 size blocks: 3509485 aes-128-gcm's in 3.00s
> Doing aes-128-gcm for 3s on 64 size blocks: 900678 aes-128-gcm's in 3.00s
> Doing aes-128-gcm for 3s on 256 size blocks: 228961 aes-128-gcm's in 3.00s
> Doing aes-128-gcm for 3s on 1024 size blocks: 57475 aes-128-gcm's in 3.00s
> Doing aes-128-gcm for 3s on 8192 size blocks: 7189 aes-128-gcm's in 3.00s
> ..................
> type              16 bytes      64 bytes     256 bytes    1024 bytes    8192 bytes
> aes-128-gcm      18717.25k     19214.46k     19538.01k     19618.13k     19630.76k
>
> # openssl speed -evp aes-128-ccm -elapsed
> You have chosen to measure elapsed time instead of user CPU time.
> Doing aes-128-ccm for 3s on 16 size blocks: 10179383 aes-128-ccm's in 3.00s
> Doing aes-128-ccm for 3s on 64 size blocks: 10179215 aes-128-ccm's in 3.00s
> Doing aes-128-ccm for 3s on 256 size blocks: 10179785 aes-128-ccm's in 3.00s
> Doing aes-128-ccm for 3s on 1024 size blocks: 10182095 aes-128-ccm's in 3.00s
> Doing aes-128-ccm for 3s on 8192 size blocks: 10179225 aes-128-ccm's in 3.00s
> ..................
> type              16 bytes      64 bytes     256 bytes    1024 bytes    8192 bytes
> aes-128-ccm      54290.04k    217156.59k    868674.99k   3475488.43k  27796070.40k
>
> # openssl speed -evp sha1 -elapsed
> You have chosen to measure elapsed time instead of user CPU time.
> Doing sha1 for 3s on 16 size blocks: 95252 sha1's in 3.00s
> Doing sha1 for 3s on 64 size blocks: 95166 sha1's in 3.00s
> Doing sha1 for 3s on 256 size blocks: 76177 sha1's in 3.00s
> Doing sha1 for 3s on 1024 size blocks: 68799 sha1's in 3.00s
> Doing sha1 for 3s on 8192 size blocks: 53034 sha1's in 3.00s
> .................
> type              16 bytes      64 bytes     256 bytes    1024 bytes    8192 bytes
> sha1               508.01k      2030.21k      6500.44k     23483.39k    144818.18k
>
> Without cryptodev:
>
> # openssl speed -evp aes-128-cbc -elapsed
> You have chosen to measure elapsed time instead of user CPU time.
> Doing aes-128-cbc for 3s on 16 size blocks: 9235207 aes-128-cbc's in 3.00s
> Doing aes-128-cbc for 3s on 64 size blocks: 2498066 aes-128-cbc's in 3.00s
> Doing aes-128-cbc for 3s on 256 size blocks: 645288 aes-128-cbc's in 3.00s
> Doing aes-128-cbc for 3s on 1024 size blocks: 161372 aes-128-cbc's in 3.00s
> Doing aes-128-cbc for 3s on 8192 size blocks: 20385 aes-128-cbc's in 3.00s
> ................
> type              16 bytes      64 bytes     256 bytes    1024 bytes    8192 bytes
> aes-128-cbc      49254.44k     53292.07k     55064.58k     55081.64k     55664.64k
>
> # openssl speed -evp aes-128-gcm -elapsed
> You have chosen to measure elapsed time instead of user CPU time.
> Doing aes-128-gcm for 3s on 16 size blocks: 3507422 aes-128-gcm's in 3.00s
> Doing aes-128-gcm for 3s on 64 size blocks: 901036 aes-128-gcm's in 3.00s
> Doing aes-128-gcm for 3s on 256 size blocks: 228857 aes-128-gcm's in 3.00s
> Doing aes-128-gcm for 3s on 1024 size blocks: 57411 aes-128-gcm's in 3.00s
> Doing aes-128-gcm for 3s on 8192 size blocks: 7188 aes-128-gcm's in 3.00s
> ................
> type              16 bytes      64 bytes     256 bytes    1024 bytes    8192 bytes
> aes-128-gcm      18706.25k     19222.10k     19529.13k     19596.29k     19628.03k
>
> # openssl speed -evp aes-128-ccm -elapsed
> You have chosen to measure elapsed time instead of user CPU time.
> Doing aes-128-ccm for 3s on 16 size blocks: 10170897 aes-128-ccm's in 3.00s
> Doing aes-128-ccm for 3s on 64 size blocks: 10167692 aes-128-ccm's in 3.00s
> Doing aes-128-ccm for 3s on 256 size blocks: 10166117 aes-128-ccm's in 3.00s
> Doing aes-128-ccm for 3s on 1024 size blocks: 10167095 aes-128-ccm's in 3.00s
> Doing aes-128-ccm for 3s on 8192 size blocks: 10172046 aes-128-ccm's in 3.00s
> .................
> type              16 bytes      64 bytes     256 bytes    1024 bytes    8192 bytes
> aes-128-ccm      54244.78k    216910.76k    867508.65k   3470368.43k  27776466.94k
>
> # openssl speed -evp sha1 -elapsed
> You have chosen to measure elapsed time instead of user CPU time.
> Doing sha1 for 3s on 16 size blocks: 1877571 sha1's in 3.00s
> Doing sha1 for 3s on 64 size blocks: 1250523 sha1's in 3.00s
> Doing sha1 for 3s on 256 size blocks: 603090 sha1's in 3.00s
> Doing sha1 for 3s on 1024 size blocks: 198963 sha1's in 3.00s
> Doing sha1 for 3s on 8192 size blocks: 27380 sha1's in 3.00s
> ...............
> type              16 bytes      64 bytes     256 bytes    1024 bytes    8192 bytes
> sha1             10013.71k     26677.82k     51463.68k     67912.70k     74765.65k
>
> Tony
>
> Jan Just Keijser <janj...@nikhef.nl> wrote on Wed, Dec 2, 2020 at 11:24 PM:
>
>> Hi Tony,
>>
>> On 02/12/20 15:51, Jan Just Keijser wrote:
>>
>> On 02/12/20 15:22, Tony He wrote:
>>
>> Hi Jan,
>>
>> Welcome to join the discussion.
>>
>> >the second set of numbers doesn't make sense, and a much better test is
>> >to do an actual encryption test
>> I did not compile the cryptodev kernel module for my PC, so I cannot
>> reproduce this issue for now. You don't understand why the performance
>> is much worse with the cryptodev module for *big* blocks, right?
>> If so, I guess the reason may be that the kernel assigns the work to
>> multiple cores while OpenSSL uses one core.
>> Would you share the output of the command "mpstat -P ALL 2"?
>>
>> sure, while using the cryptodev I see this:
>>
>> 15:28:36  CPU   %usr  %nice   %sys %iowait   %irq  %soft %steal %guest %gnice  %idle
>> 15:28:38  all   1.87   0.00  23.19    0.12   0.00   0.00   0.00   0.00   0.00  74.81
>> 15:28:38    0   0.00   0.00   0.00    0.50   0.00   0.00   0.00   0.00   0.00  99.50
>> 15:28:38    1   7.00   0.00  93.00    0.00   0.00   0.00   0.00   0.00   0.00   0.00
>> 15:28:38    2   0.00   0.00   0.00    0.00   0.00   0.00   0.00   0.00   0.00 100.00
>> 15:28:38    3   0.00   0.00   0.00    0.00   0.00   0.00   0.00   0.00   0.00 100.00
>>
>> 15:28:38  CPU   %usr  %nice   %sys %iowait   %irq  %soft %steal %guest %gnice  %idle
>> 15:28:40  all   0.75   0.00  24.19    0.00   0.00   0.00   0.00   0.00   0.00  75.06
>> 15:28:40    0   0.00   0.00   0.00    0.50   0.00   0.00   0.00   0.00   0.00  99.50
>> 15:28:40    1   3.50   0.00  96.50    0.00   0.00   0.00   0.00   0.00   0.00   0.00
>> 15:28:40    2   0.00   0.00   0.00    0.00   0.00   0.00   0.00   0.00   0.00 100.00
>> 15:28:40    3   0.00   0.00   0.00    0.00   0.00   0.00   0.00   0.00   0.00 100.00
>>
>> on a 4 core box; this means that 1 core is used 100% (which is what I
>> expected).
>>
>> I suspect the main reason the cryptodev results on my i5-6800 go off the
>> rails is this (look at the "Doing aes-128-cbc" lines):
>>
>> $ ./openssl speed -evp aes-128-cbc
>> Doing aes-128-cbc for 3s on 16 size blocks: 2835368 aes-128-cbc's in 1.14s
>> Doing aes-128-cbc for 3s on 64 size blocks: 2720745 aes-128-cbc's in 1.01s
>> Doing aes-128-cbc for 3s on 256 size blocks: 2377830 aes-128-cbc's in *0.74s*
>> Doing aes-128-cbc for 3s on 1024 size blocks: 1538693 aes-128-cbc's in *0.40s*
>> Doing aes-128-cbc for 3s on 8192 size blocks: 370202 aes-128-cbc's in *0.11s*
>> OpenSSL 1.0.2m 2 Nov 2017
>> built on: reproducible build, date unspecified
>> options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
>> compiler: gcc -I. -I.. -I../include -DOPENSSL_THREADS -D_REENTRANT
>> -DDSO_DLFCN -DHAVE_DLFCN_H -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS
>> -Wa,--noexecstack -m64 -DL_ENDIAN -O3 -Wall -DOPENSSL_IA32_SSE2
>> -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m
>> -DRC4_ASM -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM
>> -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM
>> The 'numbers' are in 1000s of bytes per second processed.
>> type              16 bytes      64 bytes     256 bytes    1024 bytes    8192 bytes
>> aes-128-cbc      39794.64k    172403.64k    822600.65k   3939054.08k  27569952.58k
>>
>> The timing for how quickly the results are returned is way off and
>> probably just wrong. The OpenSSL speed test is supposed to run for 3
>> seconds. The actual result returned for 8192 byte blocks is
>>
>> Doing aes-128-cbc for 3s on 8192 size blocks: 370202 aes-128-cbc's in *0.11s*
>>
>> whereas without cryptodev I see
>>
>> Doing aes-128-cbc for 3s on 8192 size blocks: 457255 aes-128-cbc's in *3.00s*
>>
>> So you can see that without cryptodev the i5-6800 actually says it's
>> doing more blocks (457,255 vs 370,202) but with cryptodev it is doing it in
>> WAY less time. This leads me to believe the openssl speed code when using
>> cryptodev just "goes wrong".
>> It will be very interesting to see what the encryption test will bring -
>> that is a much better real-life-like example than a simple speed test.
>>
>> as a follow-up : someone whispered in my ear (thanks, André ;) ) that one
>> should use the -elapsed option for this, so here are new results:
>>
>> *with* cryptodev:
>>
>> ./openssl speed -evp aes-128-cbc -elapsed
>> You have chosen to measure elapsed time instead of user CPU time.
>> Doing aes-128-cbc for 3s on 16 size blocks: 2825786 aes-128-cbc's in 3.00s
>> Doing aes-128-cbc for 3s on 64 size blocks: 2716822 aes-128-cbc's in 3.00s
>> Doing aes-128-cbc for 3s on 256 size blocks: 2369723 aes-128-cbc's in 3.00s
>> Doing aes-128-cbc for 3s on 1024 size blocks: 1536054 aes-128-cbc's in 3.00s
>> Doing aes-128-cbc for 3s on 8192 size blocks: 369984 aes-128-cbc's in 3.00s
>> [...]
>> aes-128-cbc     15,070.86k     57,958.87k    202,216.36k    524,306.43k  1,010,302.98k
>>
>> *without* cryptodev:
>>
>> $ openssl speed -evp aes-128-cbc -elapsed
>> You have chosen to measure elapsed time instead of user CPU time.
>> Doing aes-128-cbc for 3s on 16 size blocks: 207188725 aes-128-cbc's in 3.00s
>> Doing aes-128-cbc for 3s on 64 size blocks: 56855717 aes-128-cbc's in 3.00s
>> Doing aes-128-cbc for 3s on 256 size blocks: 14382122 aes-128-cbc's in 3.00s
>> Doing aes-128-cbc for 3s on 1024 size blocks: 3618996 aes-128-cbc's in 3.00s
>> Doing aes-128-cbc for 3s on 8192 size blocks: 456727 aes-128-cbc's in 3.00s
>> [...]
>> aes-128-cbc  1,105,006.53k  1,212,921.96k  1,227,274.41k  1,235,283.97k  1,247,169.19k
>>
>> which more or less reflects the encryption test results I posted earlier.
>> The question becomes, what are your results when using the -elapsed flag?
>>
>> JJK
>>
>> >My advice is to rerun your tests *without* the cryptodev module and then
>> >decide whether you really need CBC+CCM hmacs.
>> Yes, I confirm that without the cryptodev the performance is very bad for
>> my device. I don't have that device at hand right now, but I saved one
>> aes-256-cbc result in my web notebook, as below:
>>
>> type              16 bytes      64 bytes     256 bytes    1024 bytes    8192 bytes
>> aes-256-cbc      19626.95k     24289.71k     25054.46k     25347.75k     25337.86k
>>
>> Please note, there are two modes to accelerate encryption/decryption.
>> 1. HW instructions, as on Intel x86 CPUs.
>> 2. Using a crypto engine.
>> When your device is of type 2 and its CPU is not powerful, cryptodev is
>> normally much faster, at least for big blocks. For small blocks it may be
>> slower, because it takes time to push the work to the kernel and then to
>> the HW engine, and that time may be longer than the time OpenSSL needs to
>> do the encryption/decryption directly.
>> Tony
>>
>> Jan Just Keijser <janj...@nikhef.nl> wrote on Wed, Dec 2, 2020 at 7:24 PM:
>>
>>> hi Tony,
>>>
>>> On 01/12/20 02:50, Tony He wrote:
>>>
>>> Hi Arne,
>>>
>>> openssl speed -evp aes-128-cbc
>>> type              16 bytes      64 bytes     256 bytes    1024 bytes    8192 bytes
>>> aes-128-cbc      20035.60k    123261.54k    267081.60k   1094764.09k   9181370.18k
>>> openssl speed -evp aes-128-gcm
>>> type              16 bytes      64 bytes     256 bytes    1024 bytes    8192 bytes
>>> aes-128-gcm      18738.76k     19284.91k     19524.44k     19606.87k     19685.46k
>>> openssl speed -evp aes-128-ccm
>>> type              16 bytes      64 bytes     256 bytes    1024 bytes    8192 bytes
>>> aes-128-ccm      53859.07k    215581.12k    862070.02k   3460786.43k  27566347.61k
>>> openssl speed -evp sha1
>>> type              16 bytes      64 bytes     256 bytes    1024 bytes    8192 bytes
>>> sha1              3108.57k     12177.79k     57325.18k    181610.34k   1207364.27k
>>> openssl speed -evp chacha20-poly1305
>>> chacha20-poly1305 is an unknown cipher or digest
>>> Using old openssl, so chacha20-poly1305 is not supported.
>>>
>>> these numbers look suspiciously like you're using the linux cryptodev
>>> module. Openssl speed results for the linux cryptodev module are totally
>>> unreliable and I'd even go so far as to say that the *only* numbers I trust
>>> in the output above are for aes-128-gcm
>>>
>>> For example, if I do the same on an i5-6800 I get *without* the
>>> cryptodev module:
>>> $ openssl speed -evp aes-128-cbc
>>> aes-128-cbc  1,104,599.38k  1,208,651.07k  1,231,766.70k  1,237,545.64k  1,248,793.94k
>>>
>>> and with the module I get
>>> aes-128-cbc     45,087.41k    127,822.72k    581,517.17k  2,256,593.19k 27,583,804.51k
>>>
>>> the second set of numbers doesn't make sense, and a much better test is
>>> to do an actual encryption test, e.g.
>>>
>>> *without* the module
>>> cat BIGFILE | openssl aes-256-cbc -e -pass pass:thisisabadpassword | pv > /dev/null
>>> 2.93GB 0:00:05 [ 549MB/s] [ <=> ]
>>>
>>> ('pv' aka 'pipeview' is a handy tool to measure the throughput of a UNIX
>>> pipe)
>>>
>>> and with the module:
>>> cat BIGFILE | ./openssl aes-256-cbc -e -pass pass:thisisabadpassword -engine cryptodev| pv > /dev/null
>>> engine "cryptodev" set.
>>> 2.93GB 0:00:07 [ 426MB/s] [ <=>
>>>
>>> so you see that using the cryptodev module actually slows things down -
>>> which is to be expected, as the application needs to do more work using the
>>> cryptodev module.
>>>
>>> My advice is to rerun your tests *without* the cryptodev module and then
>>> decide whether you really need CBC+CCM hmacs.
>>>
>>> HTH,
>>>
>>> JJK
>>>
>>> Arne Schwabe <a...@rfc2549.org> wrote on Thu, Nov 26, 2020 at 6:40 PM:
>>>
>>>> On 26.11.20 at 10:41, Tony He wrote:
>>>> > Hi Arne,
>>>> >
>>>> >>Since the original thread was not on the mailing list I am missing your
>>>> >>goal but if your crypto accelerator already works with OpenSSL, then it
>>>> >>will also work with the "normal" OpenVPN
>>>> >
>>>> > Yes, it works with "normal" OpenVPN (OpenVPN2), but according to the
>>>> > test result, it's still not fast (about 60Mbps).
>>>> > The bottleneck is not the encryption operation any more. It comes from
>>>> > the switching between user space and kernel space in OpenVPN2,
>>>> > which keeps the poor CPU of the embedded device very busy. That's why
>>>> > we need OpenVPN3 running in kernel space.
>>>>
>>>> What numbers are we talking about in crypto speed? Could you provide
>>>> from your "poor" device:
>>>>
>>>> openssl speed -evp aes-128-cbc
>>>> openssl speed -evp aes-128-gcm
>>>> openssl speed -evp aes-128-ccm
>>>> openssl speed -evp sha1
>>>> openssl speed -evp chacha20-poly1305
>>>>
>>>> I want to know what difference/gain in terms of raw crypto speed we are
>>>> talking about here.

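The relationship between the "Doing ..." progress lines and the summary tables quoted in this thread can be checked with a few lines of Python. This is only a minimal sketch and not part of the original messages: the helper names (parse_doing, throughput_k, ops_ignore_block_size) and the 1% tolerance are made up for illustration, and the formula (block size x operations / seconds / 1000) is inferred from the figures posted above rather than taken from the OpenSSL source.

```python
import re

# Matches one progress line of `openssl speed`, for example:
# "Doing aes-128-cbc for 3s on 8192 size blocks: 370202 aes-128-cbc's in 0.11s"
DOING_RE = re.compile(
    r"Doing (?P<alg>\S+) for \d+s on (?P<size>\d+) size blocks: "
    r"(?P<count>\d+) \S+ in (?P<secs>[\d.]+)s"
)


def parse_doing(line):
    """Return (algorithm, block_size, op_count, seconds) for one 'Doing' line."""
    m = DOING_RE.search(line)
    if m is None:
        raise ValueError("not an openssl speed progress line: %r" % line)
    return m["alg"], int(m["size"]), int(m["count"]), float(m["secs"])


def throughput_k(block_size, op_count, seconds):
    """Thousands of bytes per second, the unit used in the summary table."""
    return block_size * op_count / seconds / 1000.0


def ops_ignore_block_size(op_counts, tolerance=0.01):
    """True if the operation count is (nearly) the same for every block size.

    The tolerance is an arbitrary illustrative threshold, not an OpenSSL value.
    """
    return (max(op_counts) - min(op_counts)) / max(op_counts) <= tolerance


# The two 8192-byte aes-128-cbc lines from the i5-6800 runs with cryptodev:
# first measured in user CPU time, then with -elapsed.
user_time = "Doing aes-128-cbc for 3s on 8192 size blocks: 370202 aes-128-cbc's in 0.11s"
elapsed = "Doing aes-128-cbc for 3s on 8192 size blocks: 369984 aes-128-cbc's in 3.00s"

for line in (user_time, elapsed):
    alg, size, count, secs = parse_doing(line)
    print(f"{alg} {size}B: {throughput_k(size, count, secs):.2f}k")
```

Run on those two lines, the sketch gives about 27569952.58k for the user-time measurement and 1010302.98k with -elapsed, which matches the tables posted above: nearly the same number of blocks, divided by 0.11s instead of 3.00s. The second helper returns True for the aes-128-ccm counts (about 10.18 million operations at every block size) and False for aes-128-gcm. An operation count that does not drop as the blocks grow suggests the per-call cost does not depend on the payload size, so those CCM figures may not reflect real encryption work.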
_______________________________________________
Openvpn-devel mailing list
Openvpn-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openvpn-devel