Re: All CPU threads
Werner Koch wrote:
> On Mon, 11 Sep 2023 22:29, Jacob Bachmeyer said:
>
>> So using threads to compute a blinded RSA operation would just about
>> recover the computational cost of blinding the calculation? How would
>
> No. I gave this as an example of where else you could look to speed things up, for example if you do not need to mitigate local side-channel attacks.

OK, I get it now: you were suggesting that there are easier trade-offs for similar performance gains. Thanks.

-- Jacob

___
Gnupg-users mailing list
Gnupg-users@gnupg.org
https://lists.gnupg.org/mailman/listinfo/gnupg-users
Re: All CPU threads
On Mon, 11 Sep 2023 22:29, Jacob Bachmeyer said:

> So using threads to compute a blinded RSA operation would just about
> recover the computational cost of blinding the calculation? How would

No. I gave this as an example of where else you could look to speed things up, for example if you do not need to mitigate local side-channel attacks.

Shalom-Salam,

   Werner

--
The pioneers of a warless world are the youth that
refuse military service.             - A. Einstein
Re: All CPU threads
Werner Koch via Gnupg-users wrote:
> [...]
> On Sat, 9 Sep 2023 22:07, Robert J. Hansen said:
>
>> and for the vast majority of users isn't worth it. The easy wins (28%
>> cost savings on RSA encryption! Whee, almost half a millisecond!) are
>
> The blinding we use for RSA (to mitigate side-channel attacks) should be in the same range as these wins. I bet that by adding threads to the computation you will open another can of side-channel attacks.

So using threads to compute a blinded RSA operation would just about recover the computational cost of blinding the calculation? How would hypothetical thread-related side channels matter if we are using blinding around the parallel calculation?

-- Jacob
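For readers unfamiliar with the blinding discussed in this message, the following is a minimal textbook sketch in Python. It is not how Libgcrypt implements blinding; the toy key (n = 3233, e = 17, d = 2753) and the function name `blinded_decrypt` are assumptions chosen purely for illustration. It shows where the extra cost comes from: one additional modular exponentiation with the public exponent, one modular inversion, and two multiplications wrapped around the private-key operation.

```python
import math
import secrets

# Toy RSA key for illustration only (p = 61, q = 53); real keys are far larger.
n, e, d = 3233, 17, 2753

def blinded_decrypt(c: int) -> int:
    """Textbook RSA decryption with multiplicative blinding (illustrative)."""
    # Pick a random blinding factor r that is invertible modulo n.
    while True:
        r = secrets.randbelow(n - 2) + 2
        if math.gcd(r, n) == 1:
            break
    # Blind the ciphertext: c' = c * r^e mod n.
    blinded = (c * pow(r, e, n)) % n
    # The private-key operation now runs on a value the attacker cannot predict.
    m_blinded = pow(blinded, d, n)        # equals m * r mod n
    # Unblind: multiply by r^-1 mod n (three-argument pow with -1 needs Python 3.8+).
    return (m_blinded * pow(r, -1, n)) % n

ciphertext = pow(65, e, n)
print(blinded_decrypt(ciphertext) == pow(ciphertext, d, n))
```

Because the secret-exponent step always operates on a freshly randomized input, timing variations in that step are decorrelated from the real ciphertext. Whether that would also cover side channels introduced by threading is the open question raised above; the sketch does not answer it.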
Re: All CPU threads
Hi!

Thanks Rob for your comments. Here are some additional points:

On Sat, 9 Sep 2023 22:07, Robert J. Hansen said:

> and for the vast majority of users isn't worth it. The easy wins (28%
> cost savings on RSA encryption! Whee, almost half a millisecond!) are

The blinding we use for RSA (to mitigate side-channel attacks) should be in the same range as these wins. I bet that by adding threads to the computation you will open another can of side-channel attacks.

> performance. I'm sure that if and when the next RFC is officially
> released, there will be interest in getting parallelization support

OCB mode has already been in use and deployed for years. With a decent Libgcrypt (1.10) I get these figures for the old (CFB) and the new mode (OCB):

 AES256         |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
        CFB enc |     0.691 ns/B      1379 MiB/s      5.14 c/B    7440±1
        CFB dec |     0.064 ns/B     14959 MiB/s     0.470 c/B    7372±2
        OCB enc |     0.070 ns/B     13547 MiB/s     0.522 c/B    7415±2
        OCB dec |     0.071 ns/B     13451 MiB/s     0.520 c/B    7336±3

These values are for the low-level crypto routines. In reality we also do SHA-1 hashing in addition to CFB, which makes it even slower. OTOH, the protocol requires buffering, and the way gpg implements things has a large impact on the performance. Fortunately, Jussi Kivilinna also worked on gpg's buffering and gained a lot of extra speed:

 * gpg: Threefold decryption speedup for large files.
   https://dev.gnupg.org/rGab177eed51
   (For the old CFB mode)

 * gpg: Nearly double the AES256.OCB encryption speed.
   https://dev.gnupg.org/rG99e2c178c7

Thus in 2.4 we get this for symmetric encryption of a 4 GiB file from RAM to /dev/null on a Ryzen 5800X:

  AES256.CFB encryption   1.3 GiB/s
  AES256.OCB encryption   4.2 GiB/s

FWIW there are also improvements in signature verification:

 * gpg: Up to five times faster verification of detached signatures.
   Doubled detached signing speed.
   https://dev.gnupg.org/rG4e27b9defc
   https://dev.gnupg.org/rGf8943ce098

YMMV depending on what kind of data you encrypt and whether signing and compression come into play. Compression is a major performance hog - feeding gpg from a (threaded) bzip2 and using -z0 will in general give better performance than using the internal compressor code.

Shalom-Salam,

   Werner
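The columns of the benchmark table in the message above are related by simple unit conversions, which the following Python sketch cross-checks. The helper names `mib_per_second` and `implied_mhz` are made up for this illustration and are not part of Libgcrypt; the input figures are copied from the table, so small deviations from the reported throughput and clock values are expected because the ns/B column is rounded.

```python
MIB = 1024 ** 2  # bytes per mebibyte

def mib_per_second(ns_per_byte: float) -> float:
    """Throughput in MiB/s implied by a cost in nanoseconds per byte."""
    return 1e9 / ns_per_byte / MIB

def implied_mhz(cycles_per_byte: float, ns_per_byte: float) -> float:
    """Clock in MHz implied by cycles/byte and ns/byte (cycles per ns = GHz)."""
    return cycles_per_byte / ns_per_byte * 1000.0

# (ns/B, cycles/B) as reported in the table above.
rows = {
    "CFB enc": (0.691, 5.14),
    "CFB dec": (0.064, 0.470),
    "OCB enc": (0.070, 0.522),
    "OCB dec": (0.071, 0.520),
}

for name, (ns, cpb) in rows.items():
    print(f"{name}: {mib_per_second(ns):8.0f} MiB/s  {implied_mhz(cpb, ns):6.0f} MHz")

# CFB encryption is roughly ten times more expensive per byte than OCB encryption.
print(f"CFB enc / OCB enc cost ratio: {0.691 / 0.070:.1f}x")
```

The roughly tenfold gap at the low-level routines shrinks to about 3.2x (4.2 GiB/s versus 1.3 GiB/s) in the end-to-end gpg figures, consistent with the remark that buffering and hashing in gpg itself have a large impact.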
Re: All CPU threads
> Thank you for the reply. I was thinking about speeding up the encryption
> process. But if that's not possible then that's how it is.

Thank you for sending a plain-text email to the list! :)

The answer is a little complicated, but this should be an accurate-enough explanation.

Encryption speed is dominated by disk speed first and foremost. If you're encrypting a 1 MB file, you have to read in the file and write it out again when you're done: your absolute minimum time is given by however long it takes to read and write a 1 MB file.

This is unfortunate, because disk I/O is *slow*. Even SSDs, which are about ten to twenty times as fast as older spinning metal platter hard drives, can't completely bridge this gap. So at the end of the day, your bottleneck for encryption is going to be disk I/O.

There are various games people play, like keeping an in-memory filesystem. If you're doing that, then we can look at other places for speed improvement. Remember, as you read what follows: we're doing all of these weird things to improve things by a very tiny bit -- the bottleneck is in disk I/O!

Encryption generates a random session key and encrypts that with your recipient's public key. Here's your next problem: there are *so many* algorithms GnuPG supports, and there isn't a single effective parallelization strategy for all of them.

Take RSA as an example: the expensive part of the encryption operation is C = P^e (mod n), or as normal humans call it, "modular exponentiation". I've got an IEEE paper on my desk (by Budikafa and Pulungan) dating from 2017 that says you can parallelize modular exponentiation to get up to a 28% speed improvement. That's really nice! The problem is the phrase "up to" a 28% speed improvement, and the fact that only RSA uses modular exponentiation, so if your correspondent is using ECC you're kind of out of luck.
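To make the modular exponentiation step concrete, here is a minimal Python sketch. The toy key (p = 61, q = 53, so n = 3233, e = 17, d = 2753) and the function name `modexp` are assumptions for illustration; GnuPG uses Libgcrypt's big-number routines on keys thousands of bits long, not this code. The square-and-multiply loop hints at why the operation resists easy parallelization: every iteration consumes the result of the one before it.

```python
def modexp(base: int, exponent: int, modulus: int) -> int:
    """Left-to-right square-and-multiply modular exponentiation."""
    result = 1
    base %= modulus
    for bit in bin(exponent)[2:]:                 # most significant bit first
        result = (result * result) % modulus      # square: needs the previous result
        if bit == "1":
            result = (result * base) % modulus    # multiply when the exponent bit is set
    return result

# Toy RSA parameters, far too small for real use.
n, e, d = 3233, 17, 2753

plaintext = 65
ciphertext = modexp(plaintext, e, n)   # C = P^e (mod n)
recovered = modexp(ciphertext, d, n)   # P = C^d (mod n)

# Python's built-in three-argument pow computes the same thing.
assert ciphertext == pow(plaintext, e, n)
assert recovered == plaintext
print(ciphertext, recovered)
```

In practice one would simply call the built-in `pow(base, exponent, modulus)`; the explicit loop is only there to expose the step-by-step data dependency.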
So, when it comes to the asymmetric part of the encryption: a sequential version takes a couple of milliseconds, and best-case scenario by throwing multiple threads at it you can save 28% on two milliseconds. This is not a big enough win to justify the multithreading.

Once you've encrypted the random session key for each recipient, now you have to process the file 16 bytes at a time. For each block after the first, the result of the last block's encryption is an input to the current block's encryption. Block 0 (which is the first -- remember, computer scientists are weird, we start counting at zero) doesn't depend on anything; block 1 depends on having the output of block 0; block 2 depends on having the output of block 1; and so on.

Even if you were to spin up one thread per block you'd still get no speed improvement. You'd be encrypting sequentially, one block at a time until you were complete. Multi-threading is thus theoretically possible, but offers no advantages.

(Note that Phil Rogaway kind of disagrees with me: he characterizes parallelizing cipher feedback modes as possible "but awkward". When Phil Rogaway, one of the sharpest cryptographers in the world, describes an optimization as "awkward", I very quietly turn around and start moving in the opposite direction. Clearly I am in over my head and I need to escape.)

https://web.cs.ucdavis.edu/~rogaway/papers/modes.pdf -- search for the words "but awkward".

Etcetera, etcetera. Speeding up encryption operations with multiple threads is a *deeply* challenging cryptographic engineering problem, and for the vast majority of users isn't worth it. The easy wins (28% cost savings on RSA encryption! Whee, almost half a millisecond!) are too trivial, and the big wins are somewhere between "Rogaway says it's awkward" and "Rogaway says it's impossible".

That said, the next RFC draft -- when it comes out -- will be offering new encryption modes that may offer better parallelization performance.
I'm sure that if and when the next RFC is officially released, there will be interest in getting parallelization support for them.
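The block-by-block dependency described in the message above can be sketched in a few lines of Python. This is plain textbook CFB, not the modified variant OpenPGP specifies, and `toy_block_encrypt` is a hypothetical stand-in built from SHA-256 so the example needs no third-party library; it is not AES and must not be used for real encryption. CFB only ever runs the block cipher in the forward direction, which is why a keyed hash is enough to illustrate the chaining.

```python
import hashlib

BLOCK = 16  # block size in bytes, as for AES

def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    """Hypothetical stand-in for a block cipher's forward direction (NOT AES)."""
    return hashlib.sha256(key + block).digest()[:BLOCK]

def xor(a: bytes, b: bytes) -> bytes:
    # zip truncates to the shorter input, which handles a short final block.
    return bytes(x ^ y for x, y in zip(a, b))

def cfb_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    prev = iv
    out = []
    for i in range(0, len(plaintext), BLOCK):
        # Block i cannot start until ciphertext block i-1 (prev) exists.
        c = xor(plaintext[i:i + BLOCK], toy_block_encrypt(key, prev))
        out.append(c)
        prev = c
    return b"".join(out)

def cfb_decrypt(key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    blocks = [ciphertext[i:i + BLOCK] for i in range(0, len(ciphertext), BLOCK)]
    # Every "previous ciphertext block" is already known, so each block
    # could in principle be processed independently of the others.
    prevs = [iv] + blocks[:-1]
    return b"".join(xor(c, toy_block_encrypt(key, p)) for c, p in zip(blocks, prevs))

key, iv = b"k" * 16, b"\x00" * 16
message = b"sixteen byte blk" * 3 + b"tail"
ct = cfb_encrypt(key, iv, message)
assert cfb_decrypt(key, iv, ct) == message
```

The loop in `cfb_encrypt` is inherently serial, whereas `cfb_decrypt` has no such chain. That asymmetry is consistent with the benchmark figures elsewhere in this thread, where CFB decryption is far faster per byte than CFB encryption.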
Re: All CPU threads
On 9/10/23 01:21, Robert J. Hansen via Gnupg-users wrote:
> Please do not send HTML to this list. Many of the people you very much hope to read your questions will not read HTML email.
>
>> Does anyone know if there is a way to use all CPU threads with *gnupg-desktop-2.4.3.0-x86_64.AppImage* ?
>
> What exactly are you hoping to speed up?
>
> The classic mode of encryption used in RFC2440 and RFC4880 is a hacked-up cipher feedback mode, which is not parallelizable and doesn't benefit from using multiple threads. You can of course use multiple threads, but you won't get any benefit.
>
> So my question is, what exactly is it that you need to speed up? Once we know that, we'll be able to give suggestions for how you might proceed.

Thank you for the reply. I was thinking about speeding up the encryption process. But if that's not possible then that's how it is.

Is this message now plain text only?

Best,
Jozsef K.
Re: All CPU threads
Please do not send HTML to this list. Many of the people you very much hope to read your questions will not read HTML email.

> Does anyone know if there is a way to use all CPU threads with *gnupg-desktop-2.4.3.0-x86_64.AppImage* ?

What exactly are you hoping to speed up?

The classic mode of encryption used in RFC2440 and RFC4880 is a hacked-up cipher feedback mode, which is not parallelizable and doesn't benefit from using multiple threads. You can of course use multiple threads, but you won't get any benefit.

So my question is, what exactly is it that you need to speed up? Once we know that, we'll be able to give suggestions for how you might proceed.
All CPU threads
Hi!

Does anyone know if there is a way to use all CPU threads with *gnupg-desktop-2.4.3.0-x86_64.AppImage* ?

Best,
JK