2017-07-14 21:04 GMT+08:00 Daniel P. Berrange <berra...@redhat.com>:
> On Fri, Jul 14, 2017 at 07:38:22AM -0400, longpeng.m...@gmail.com wrote:
>> From: "Longpeng(Mike)" <longpe...@huawei.com>
>> [...]
>>
>> NOTE: If we use specific hardware crypto cards, I think afalg-backend
>> would be even faster.
>>
>> test-environment: Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz
>>
>> *sha256*
>> chunk_size(bytes)  MB/sec(afalg:sha256-ssse3)  MB/sec(nettle)
>> 512                 93.03                      185.87
>> 1024               146.32                      201.78
>> 2048               213.32                      210.93
>> 4096               275.48                      215.26
>> 8192               321.77                      217.49
>> 16384              349.60                      219.26
>> 32768              363.59                      219.73
>> 65536              375.79                      219.99
>>
>> *hmac(sha256)*
>> chunk_size(bytes)  MB/sec(afalg:sha256-ssse3)  MB/sec(nettle)
>> 512                 71.26                      165.55
>> 1024               117.43                      189.15
>> 2048               180.96                      203.24
>> 4096               247.60                      211.38
>> 8192               301.99                      215.65
>> 16384              340.79                      218.22
>> 32768              365.51                      219.49
>> 65536              377.92                      220.24
>>
>> *cbc(aes128)*
>> chunk_size(bytes)  MB/sec(afalg:cbc-aes-aesni)  MB/sec(nettle)
>> 512                 371.76                      188.41
>> 1024                559.86                      189.64
>> 2048                768.66                      192.11
>> 4096                939.15                      192.40
>> 8192               1029.48                      192.49
>> 16384              1072.79                      190.52
>> 32768              1109.38                      190.41
>> 65536              1102.38                      190.40
>
> So I've attempted to replicate these results, and see a totally
> different outcome. NB, I hacked your code so that setting
> QEMU_DISABLE_AF_ALG=1 would skip the af-alg impl. The results
> I get are:
>
> $ tests/benchmark-crypto-hash --quiet
> sha256: Testing chunk_size 512 bytes done: 197.31 MB in 5.00 secs: 39.46 MB/sec
> sha256: Testing chunk_size 1024 bytes done: 337.03 MB in 5.00 secs: 67.41 MB/sec
> sha256: Testing chunk_size 2048 bytes done: 516.27 MB in 5.00 secs: 103.25 MB/sec
> sha256: Testing chunk_size 4096 bytes done: 675.18 MB in 5.00 secs: 135.04 MB/sec
> sha256: Testing chunk_size 8192 bytes done: 837.73 MB in 5.00 secs: 167.55 MB/sec
> sha256: Testing chunk_size 16384 bytes done: 946.78 MB in 5.00 secs: 189.35 MB/sec
> sha256: Testing chunk_size 32768 bytes done: 1008.56 MB in 5.00 secs: 201.71 MB/sec
> sha256: Testing chunk_size 65536 bytes done: 1037.19 MB in 5.00 secs: 207.43 MB/sec
> [...]
>
> I of course don't have the same CPU as you, but it is a representative
> current model Intel(R) Core(TM) i7-6820HQ CPU @ 2.70GHz
>
> I can, however, imagine that there are scenarios where this is faster,
> particularly if using this in an embedded scenario with a relatively
> low perf main CPU, but a hardware accelerator available.
>
> Based on this though, I'm very reluctant to enable AF_ALG by default
> when building QEMU, because I think it'll likely cause a major perf
> regression for the common case of people with fast CPUs and no
> hardware accelerator.
>
> I think in the immediate term we should add a switch to configure
> --enable-crypto-afalg, that must be opt-in when building QEMU,
> so those people who know they have good hardware accelerator
> present can use it, but in the general case we avoid it.
>

OK. We can take this measure for now.

But some hardware accelerators only support a limited number of
algorithms, so as a next step we may need a cmdline param to specify
which algorithms use the afalg backend, while the others still use the
library backend even when QEMU is built with '--enable-crypto-afalg'.

Anyway, I'll modify the code as you suggested first. :)

> For the general case, I think we need to figure out how to make
> direct use of CPU instructions for crypto, eg Intel aesni. This
> might be possible by using GNUTLS for ciphers (though it lacks
> coverage for all the combinations we want)
>

IIUC, newer gcrypt/nettle use CPU instructions for crypto if the CPU
supports them.

--
Regards,
Longpeng

> Regards,
> Daniel
> --
> |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o- https://fstop138.berrange.com :|
> |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
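
To make the per-algorithm cmdline param idea above a bit more concrete,
here is a rough sketch of the lookup it would need. Everything in it is
an assumption for illustration only: the helper name use_afalg_for() and
the comma-separated list format (e.g. "sha256,cbc(aes128)") are
hypothetical and not taken from the patch series or from QEMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper (not part of QEMU): return true if `algo` is named
 * in the comma-separated list `afalg_list`, e.g. "sha256,cbc(aes128)".
 * A NULL or empty list means no algorithm is routed to the afalg backend,
 * so callers would fall back to the library backend (nettle/gcrypt).
 */
static bool use_afalg_for(const char *afalg_list, const char *algo)
{
    assert(algo != NULL);

    size_t want = strlen(algo);
    if (afalg_list == NULL || want == 0) {
        return false;
    }

    const char *p = afalg_list;
    while (*p != '\0') {
        /* Find the end of the current token (next comma or end of string). */
        const char *end = strchr(p, ',');
        size_t len = end ? (size_t)(end - p) : strlen(p);

        /* Require an exact match so "sha25" does not match "sha256". */
        if (len == want && strncmp(p, algo, len) == 0) {
            return true;
        }
        if (end == NULL) {
            break;
        }
        p = end + 1;
    }
    return false;
}
```

With a list of "sha256,cbc(aes128)", use_afalg_for() would return true
for "sha256" and "cbc(aes128)" and false for anything else, so only the
algorithms the accelerator actually supports would go through afalg.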