Re: [Bacula-users] Possibility of parallelising encryption?
On 8/04/2010 8:03 PM, Phil Stracchino wrote:
> On 04/08/10 02:16, Craig Ringer wrote:
>> Bacula should probably work with Intel's hardware crypto out of the box.
>> If it doesn't, most likely all that'd be required would be to call:
>>
>> ENGINE_load_builtin_engines();
>> ENGINE_register_all_complete();
>
> Sounds like a good idea to me. Want to write up a patch?

I will, at that. I even have some Via C3 machines to test it on, though they're thin clients so I'll need to take one out of service for a bit and chuck a disk in it.

--
Craig Ringer

--
Download Intel® Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: [Bacula-users] Possibility of parallelising encryption?
On 04/08/10 02:16, Craig Ringer wrote:
> Bacula should probably work with Intel's hardware crypto out of the box.
> If it doesn't, most likely all that'd be required would be to call:
>
> ENGINE_load_builtin_engines();
> ENGINE_register_all_complete();
>
> in init_crypto(), and:
>
> ENGINE_cleanup();
>
> when crypto is cleaned up. See "man 3 engine". In fact, as PadLock
> support comes pre-loaded and Intel crypto probably will too, it may not
> even be necessary to call ENGINE_load_builtin_engines() at all, only
> ENGINE_register_all_complete(). Asking openssl to load all engines will
> let it use other less common hardware crypto systems like some of the
> add-on hw crypto PCI cards, though, and it's cheap enough not to even be
> detectable in something as long-lived as bacula-fd.
>
> OpenSSL is smart enough to pick a hardware engine if one exists, and
> fall back to software if there's no suitable engine for the task at
> hand. IIRC that's all I had to do to patch PadLock support into OpenSSH
> when I needed it for my thin clients at work.
>
> It's a trivial change that would enable Bacula to use any builtin
> hardware crypto engine supported by OpenSSL. Worth making, so that by
> the time the new Intel hardware hits, Bacula supports it?
>
> --
> Craig Ringer

Sounds like a good idea to me. Want to write up a patch?

--
Phil Stracchino, CDK#2 DoD#299792458 ICBM: 43.5607, -71.355
ala...@caerllewys.net ala...@metrocast.net p...@co.ordinate.org
Renaissance Man, Unix ronin, Perl hacker, Free Stater
It's not the years, it's the mileage.
Re: [Bacula-users] Possibility of parallelising encryption?
Phil Stracchino wrote:
> FWIW, memory says we tried using GnuTLS once. It turns out that for
> this purpose it is (or was) horribly broken. In fact, at the time, I
> seem to recall the mere presence of Red Hat's GnuTLS lib broke Bacula
> altogether.

Yeah. I've found it to be pretty broken in other ways - I'm STILL having to rebuild the subversion client against openssl to get working support for client certificates due to a gnutls bug, and I rebuild netatalk against openssl to get dhx password hashes that aren't supported in gnutls.

It doesn't look like gnutls is any better than openssl in terms of parallel encryption support anyway. The whole affair seems to be a licensing kerfuffle rather than anything else.

>> It'd be lovely to be able to use the IPP libraries in Bacula (and many
>> other things) for parallel crypto and many other parallel tasks, as
>> they're excellent even without special hardware. Unfortunately they're
>> rather GPL-incompatible and are only "free" for non-commercial use.
>
> It would indeed be very nice to be able to use that kind of hardware
> crypto support without having to jump through licensing hoops.

You don't have to jump through any licensing hoops to use the hardware crypto. Intel have submitted patches that've been accepted into OpenSSL 1.0. For Via's C3/C7 PadLock hardware crypto, support has been around for ages and is certainly in OpenSSL 0.9.8. Run "openssl engine" for a list of built-in engines in your version of OpenSSL. Dynamically loaded engines for rarer hardware crypto devices and/or those that need special configuration are available in /usr/lib/ssl/engines/.

Those licensing hoops would have to be jumped to use Intel's fast parallel *software* crypto engine to do multi-core/multi-threaded AES-CTR mode encryption.

Bacula should probably work with Intel's hardware crypto out of the box. If it doesn't, most likely all that'd be required would be to call:

    ENGINE_load_builtin_engines();
    ENGINE_register_all_complete();

in init_crypto(), and:

    ENGINE_cleanup();

when crypto is cleaned up. See "man 3 engine". In fact, as PadLock support comes pre-loaded and Intel crypto probably will too, it may not even be necessary to call ENGINE_load_builtin_engines() at all, only ENGINE_register_all_complete(). Asking openssl to load all engines will let it use other less common hardware crypto systems like some of the add-on hw crypto PCI cards, though, and it's cheap enough not to even be detectable in something as long-lived as bacula-fd.

OpenSSL is smart enough to pick a hardware engine if one exists, and fall back to software if there's no suitable engine for the task at hand. IIRC that's all I had to do to patch PadLock support into OpenSSH when I needed it for my thin clients at work.

It's a trivial change that would enable Bacula to use any builtin hardware crypto engine supported by OpenSSL. Worth making, so that by the time the new Intel hardware hits, Bacula supports it?

--
Craig Ringer
Re: [Bacula-users] Possibility of parallelising encryption?
Craig Ringer wrote:

Snip interesting crypto discussion...

>> I'd use the hardware encryption (which presumably has no performance
>> impact), that is an option on this autochanger, except they want $2500
>> for it...
>
> Probably because it has a custom ASIC for the crypto algorithm in use to
> allow it to go fast enough.

Well, I suspect the ASIC is already onboard - the $2500 seems to get you a couple of USB keys to enable it (HP MSL series libraries).

> The trouble with this is that if your tape drive/changer dies, you
> generally need another one with the same hardware crypto to restore.
> This is a really, really ugly situation for disaster recovery.

As long as you obtain another MSL library and use one of your original USB keys, you're back in business, but I take your point.

Regards,

Richard
Re: [Bacula-users] Possibility of parallelising encryption?
On 04/07/10 05:09, Craig Ringer wrote:
> Richard Scobie wrote:
>> Would it be possible to optimise this task by perhaps reading data in
>> "chunks", which in turn can be encrypted by a core each, before being
>> recombined and written out to tape?
>
> Bacula uses OpenSSL for crypto support. It doesn't seem to support any
> other crypto libraries like NSS or GnuTLS.

FWIW, memory says we tried using GnuTLS once. It turns out that for this purpose it is (or was) horribly broken. In fact, at the time, I seem to recall the mere presence of Red Hat's GnuTLS lib broke Bacula altogether.

> Some hardware, like the Via C7 series of CPUs, have built-in AES crypto
> hardware (PadLock) that on a single thread can do *insane* encryption
> rates. On the older C3 series CPUs I've had no problems saturating a
> 100MBit line with encrypted ssh data, despite the gutless 400MHz C3 CPU.
>
> Intel has introduced similar instructions on their Xeon 5600 series:
>
> http://software.intel.com/en-us/articles/boosting-openssl-aes-encryption-with-intel-ipp/
>
> It'd be lovely to be able to use the IPP libraries in Bacula (and many
> other things) for parallel crypto and many other parallel tasks, as
> they're excellent even without special hardware. Unfortunately they're
> rather GPL-incompatible and are only "free" for non-commercial use.

It would indeed be very nice to be able to use that kind of hardware crypto support without having to jump through licensing hoops.

--
Phil Stracchino, CDK#2 DoD#299792458 ICBM: 43.5607, -71.355
ala...@caerllewys.net ala...@metrocast.net p...@co.ordinate.org
Renaissance Man, Unix ronin, Perl hacker, Free Stater
It's not the years, it's the mileage.
Re: [Bacula-users] Possibility of parallelising encryption?
Richard Scobie wrote:
> I have a 2.8GHz Core i7 machine backing up uncompressable data spooled
> onto an 8 drive RAID5, to LTO-4 tape.
>
> Our requirements now dictate that data encryption must be used on the
> tapes and having configured this, it seems that one core is saturated
> encrypting the data and the result is that tape write speed is now about
> 50% slower than when encryption is not used.
>
> Would it be possible to optimise this task by perhaps reading data in
> "chunks", which in turn can be encrypted by a core each, before being
> recombined and written out to tape?

Sorry to reply-to-self, but after a bit more reading it seems that the algorithm/cypher used by Bacula, aes-256-cbc, can't really be parallel encrypted on multiple cores because encrypting one block affects the way the next block is encrypted.

aes-ctr mode was developed to address that issue. It doesn't seem to be hugely widely adopted yet, but is used in IPSec, appears in revisions to the TLS standard, etc. OpenSSL supports AES CTR mode at least for 128-bit, but only single-threaded.

The implementation of multi-threaded aes-ctr encryption for OpenSSH I referenced earlier was also presented to the OpenSSL folks:

http://marc.info/?l=openssl-dev&m=120180007117054&w=2

but I can't find any response or follow-up to it.

So I guess the short answer is "parallel encryption is way harder to do than it looks".

--
Craig Ringer
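To make the CBC/CTR difference concrete, here is a minimal sketch in C. It is not Bacula or OpenSSL code: toy_block_encrypt is a hypothetical stand-in for the AES block function (a toy 64-bit keyed permutation, not secure), and the names cbc_encrypt and ctr_crypt_range are invented for illustration. The point is only the data dependency: the CBC loop needs the previous ciphertext block, while each CTR block depends only on the key and a counter value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK 8 /* toy block size in bytes; AES uses 16 */

/* Toy keyed permutation standing in for the AES block function.
 * Hypothetical and NOT secure; it only needs to be a bijection. */
uint64_t toy_block_encrypt(uint64_t key, uint64_t in)
{
    uint64_t x = in ^ key;
    x *= 0x9E3779B97F4A7C15ULL; /* odd multiplier: invertible mod 2^64 */
    x ^= x >> 29;
    x *= 0xBF58476D1CE4E5B9ULL;
    x ^= x >> 32;
    return x;
}

/* CBC: block i is XORed with ciphertext block i-1 before encryption,
 * so the loop is inherently sequential. */
void cbc_encrypt(uint64_t key, uint64_t iv,
                 const uint8_t *in, uint8_t *out, size_t nblocks)
{
    assert(in != NULL && out != NULL);
    uint64_t prev = iv;
    for (size_t i = 0; i < nblocks; i++) {
        uint64_t p;
        memcpy(&p, in + i * BLOCK, BLOCK);
        prev = toy_block_encrypt(key, p ^ prev);
        memcpy(out + i * BLOCK, &prev, BLOCK);
    }
}

/* CTR: the keystream for block i depends only on (key, nonce + i), so any
 * range of blocks can be processed independently and in any order.
 * Applying the same call to the ciphertext decrypts it. */
void ctr_crypt_range(uint64_t key, uint64_t nonce, size_t first, size_t count,
                     const uint8_t *in, uint8_t *out)
{
    assert(in != NULL && out != NULL);
    for (size_t i = first; i < first + count; i++) {
        uint64_t p;
        uint64_t ks = toy_block_encrypt(key, nonce + i);
        memcpy(&p, in + i * BLOCK, BLOCK);
        p ^= ks;
        memcpy(out + i * BLOCK, &p, BLOCK);
    }
}
```

Because ctr_crypt_range takes a block range, separate ranges of the same buffer could in principle be handed to separate threads; nothing comparable is possible with cbc_encrypt, which is the limitation described above for aes-256-cbc.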
Re: [Bacula-users] Possibility of parallelising encryption?
Richard Scobie wrote:
> I have a 2.8GHz Core i7 machine backing up uncompressable data spooled
> onto an 8 drive RAID5, to LTO-4 tape.
>
> Our requirements now dictate that data encryption must be used on the
> tapes and having configured this, it seems that one core is saturated
> encrypting the data and the result is that tape write speed is now about
> 50% slower than when encryption is not used.

The process saturating a core is the file daemon, right?

> Would it be possible to optimise this task by perhaps reading data in
> "chunks", which in turn can be encrypted by a core each, before being
> recombined and written out to tape?

Bacula uses OpenSSL for crypto support. It doesn't seem to support any other crypto libraries like NSS or GnuTLS.

OpenSSL supports hardware crypto acceleration for some cyphers in a largely transparent manner. This is one option. If Bacula doesn't "just work" with hardware crypto I'd expect it to be a one-line patch to add support, going by what I've had to do to enable it in other software.

Some hardware, like the Via C7 series of CPUs, have built-in AES crypto hardware (PadLock) that on a single thread can do *insane* encryption rates. On the older C3 series CPUs I've had no problems saturating a 100MBit line with encrypted ssh data, despite the gutless 400MHz C3 CPU.

Intel has introduced similar instructions on their Xeon 5600 series:

http://software.intel.com/en-us/articles/boosting-openssl-aes-encryption-with-intel-ipp/

It'd be lovely to be able to use the IPP libraries in Bacula (and many other things) for parallel crypto and many other parallel tasks, as they're excellent even without special hardware. Unfortunately they're rather GPL-incompatible and are only "free" for non-commercial use.

( The Intel Threading Building Blocks library *is* open source under GPLv2, though, and if Bacula wasn't already using pthreads directly would be rather nice: http://www.threadingbuildingblocks.org/ )

Anyway, if you need a software-only option, it's necessary to:

1) get OpenSSL to use multiple cores for encryption internally;
2) get Bacula to use OpenSSL to encrypt blocks using worker threads using a suitable block cypher; or
3) use another crypto library that automatically parallelizes.

None of these look easy by any stretch. (2) is probably most realistic, but as OpenSSL does some internal locking and serialization it may not be possible to encrypt on multiple threads even when using a simple block cypher where one block doesn't depend on the next or previous. I don't know much about OpenSSL and can't say more without a lot more digging. For all I know it might be necessary to ask OpenSSL for the session key, then use its low level crypto functions to encrypt blocks rather than using the higher-level stream/session interface.

While not directly OpenSSL related, this might also be of interest:

http://www.psc.edu/networking/projects/hpn-ssh/
http://www.psc.edu/networking/projects/hpn-ssh/papers/a14-rapier.pdf

It doesn't touch on OpenSSL, but at least it's using a highly parallel AES cypher...

> I'd use the hardware encryption (which presumably has no performance
> impact), that is an option on this autochanger, except they want $2500
> for it...

Probably because it has a custom ASIC for the crypto algorithm in use to allow it to go fast enough.

The trouble with this is that if your tape drive/changer dies, you generally need another one with the same hardware crypto to restore. This is a really, really ugly situation for disaster recovery.

( Maybe things have improved since I switched away from tape, and now the hardware crypto is just an accelerator and you can still load your keys for driver-based crypto instead. I doubt it, though. )

--
Craig Ringer
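As a rough illustration of the bookkeeping behind option (2), the sketch below splits a buffer of cipher blocks into contiguous chunks, one per worker, so the results can be recombined in order. It is a sketch under assumptions, not Bacula code: struct chunk and chunk_for_worker are hypothetical names, and it presumes a cypher mode in which blocks are independent of their neighbours. Whether OpenSSL can actually be driven from several threads this way is the open question raised above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type: the half-open range of blocks
 * [first, first + count) assigned to one worker. */
struct chunk {
    size_t first;
    size_t count;
};

/* Split total_blocks as evenly as possible among `workers` workers and
 * return the range for worker index w (0-based). The first
 * (total_blocks % workers) workers each get one extra block, so the
 * ranges are contiguous, non-overlapping, and cover every block. */
struct chunk chunk_for_worker(size_t total_blocks, size_t workers, size_t w)
{
    assert(workers > 0 && w < workers);
    size_t base = total_blocks / workers;
    size_t extra = total_blocks % workers;
    struct chunk c;
    c.count = base + (w < extra ? 1 : 0);
    c.first = w * base + (w < extra ? w : extra);
    return c;
}
```

Each worker would encrypt only its own range and write into the matching offset of the output buffer, so "recombining" is just a matter of waiting for all workers before writing the buffer to tape.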
Re: [Bacula-users] Possibility of parallelising encryption?
And of course it's even worse if you have compressible data. Since it's incompressible once encrypted, you can't let the tape drive handle it... So in our case, on a quad-core server, we see a single core saturated apparently doing both the compress and encrypt routines, while 3 cores idle. This leads to much slower backups than the server would otherwise be capable of. Even having one core compress and another encrypt would be more efficient in our case.

From: Richard Scobie [rich...@sauce.co.nz]
Sent: Sunday, March 28, 2010 11:26 AM
To: bacula-users
Subject: [Bacula-users] Possibility of parallelising encryption?

I have a 2.8GHz Core i7 machine backing up uncompressable data spooled onto an 8 drive RAID5, to LTO-4 tape.

Our requirements now dictate that data encryption must be used on the tapes and having configured this, it seems that one core is saturated encrypting the data and the result is that tape write speed is now about 50% slower than when encryption is not used.

Would it be possible to optimise this task by perhaps reading data in "chunks", which in turn can be encrypted by a core each, before being recombined and written out to tape?

I'd use the hardware encryption (which presumably has no performance impact), that is an option on this autochanger, except they want $2500 for it...

Regards,

Richard
[Bacula-users] Possibility of parallelising encryption?
I have a 2.8GHz Core i7 machine backing up incompressible data spooled onto an 8 drive RAID5, to LTO-4 tape.

Our requirements now dictate that data encryption must be used on the tapes and, having configured this, it seems that one core is saturated encrypting the data. The result is that tape write speed is now about 50% slower than when encryption is not used.

Would it be possible to optimise this task by perhaps reading data in "chunks", which in turn can be encrypted by a core each, before being recombined and written out to tape?

I'd use the hardware encryption (which presumably has no performance impact), which is an option on this autochanger, except they want $2500 for it...

Regards,

Richard