Re: [RFC PATCH] Btrfs: add sha256 checksum option
On Mon, Dec 01, 2014 at 03:23:03PM -0800, Alex Elsayed wrote:
> > I have not seen any evidence that combining hashes like that actually
> > reduces the chances of collision, but if we assume it does, then
> > again, the non-crypto hashes would be faster. For example, 128-bit
> > Spooky2 combined with 128-bit CityHash would produce a 256-bit hash
> > and would be faster than MD5 + whatever.
>
> It has no real benefit, but _why_ depends on what your model is.
>
> There's a saying that engineers worry about stochastic failure; security
> professionals have to worry about malicious failure.
>
> If your only concern is stochastic failure (random bitflips, etc), then the
> chances of collision with 128-bit CityHash or MurmurHash or SipHash or
> what-have-you are already so small that every single component in your laptop
> dying simultaneously is more likely. Adding another hash is thus just a
> waste of cycles.

So as long as speed is preferred over strength, it does not matter much which algorithm we choose, and combining several of them does not bring a significant benefit. Crc32 is weak but has served well over the years; the improvement from moving to anything 128-bit should be obvious.

> If your concern is malicious failure (in-band deduplication attack or
> similar, ignoring for now that btrfs actually compares the extent data as
> well IIRC), then it's well-known in the cryptographic community that the
> concatenation of multiple hashes is as strong as the strongest hash, _but no
> stronger_ [1].

In-band dedup uses sha anyway; the block checksums are only used for verification.

> Since the strongest hash in the above list is either a non-cryptographic
> hash or MD5, which is known-weak to the point of there being numerous toy
> programs finding collisions for arbitrary data, it would not be worth much.
>
> The only place this might be of use is if you used N strong/unbroken hashes,
> in order to hedge against up to N-1 of them being broken. However, the gain
> of that is (again) infinitesimal, and the performance cost quite large
> indeed.

Thanks for your valuable input!

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH] Btrfs: add sha256 checksum option
Alex Elsayed wrote:
> Christoph Anton Mitterer wrote:
>
>> On Mon, 2014-12-01 at 16:43 -0800, Alex Elsayed wrote:
>>> including that MAC-then-encrypt is fragile
>>> against a number of attacks, mainly in the padding-oracle category (See:
>>> TLS BEAST attack).
>> Well but here we talk about disk encryption... how would the MtE oracle
>> problems apply to that? Either you're already in the system, i.e. beyond
>> disk encryption (and can measure any timing difference)... or you're
>> not, but then you cannot measure anything.
>
> Arguable. On a system with sufficiently little noise in the signal (say...
> systemd, on SSD, etc) you could possibly get some real information from
> corrupting padding on a relatively long extent used early in the boot
> process, by measuring how it affects time-to-boot.

To make this more concrete:

Alice owns the computer, and has root. /etc/shadow has the correct permissions. Eve has _an_ account, but does not have root - and she wants it. For simplicity, let's presume this is a laptop, Alice and Eve are sisters, and Eve wants to peek at Alice's diary.

Eve can boot into a livecd, selectively corrupt blocks, and get Alice to unlock the drive for a normal boot. With this, she can execute the padding oracle attack against /etc/shadow, and deduce its contents.

The first rule of crypto is "Don't roll your own" largely because it is _brutally_ unforgiving of minor mistakes.
Re: [RFC PATCH] Btrfs: add sha256 checksum option
Christoph Anton Mitterer wrote:
> On Mon, 2014-12-01 at 16:43 -0800, Alex Elsayed wrote:
>> including that MAC-then-encrypt is fragile
>> against a number of attacks, mainly in the padding-oracle category (See:
>> TLS BEAST attack).
> Well but here we talk about disk encryption... how would the MtE oracle
> problems apply to that? Either you're already in the system, i.e. beyond
> disk encryption (and can measure any timing difference)... or you're
> not, but then you cannot measure anything.

Arguable. On a system with sufficiently little noise in the signal (say... systemd, on SSD, etc) you could possibly get some real information from corrupting padding on a relatively long extent used early in the boot process, by measuring how it affects time-to-boot.

And padding oracles are just one issue. Overall, the problem is that MtE isn't generically secure. EtM or pure AEAD modes are, which means you can simply mark any attack that doesn't rely on one of the underlying primitives being weak as "Not applicable." It also means you can compose it out of arbitrary secure primitives, rather than needing to do your proof of security over again for every combination.

That's an _enormous_ win in terms of how easy it is to be sure a system is secure. Without it, you can't really be sure there isn't Yet Another Vector You Missed.
Re: [RFC PATCH] Btrfs: add sha256 checksum option
On Mon, 2014-12-01 at 16:43 -0800, Alex Elsayed wrote:
> including that MAC-then-encrypt is fragile
> against a number of attacks, mainly in the padding-oracle category (See: TLS
> BEAST attack).

Well but here we talk about disk encryption... how would the MtE oracle problems apply to that? Either you're already in the system, i.e. beyond disk encryption (and can measure any timing difference)... or you're not, but then you cannot measure anything.

Cheers,
Chris.
Re: [RFC PATCH] Btrfs: add sha256 checksum option
Christoph Anton Mitterer wrote:
> On Sat, 2014-11-29 at 13:00 -0800, John Williams wrote:
>> On Sat, Nov 29, 2014 at 12:38 PM, Alex Elsayed wrote:
>> > Why not just use the kernel crypto API? Then the user can just specify
>> > any hash the kernel supports.
>>
>> One reason is that cryptographic hashes are an order of magnitude
>> slower than the fastest non-cryptographic hashes. And for filesystem
>> checksums, I do not see a need for cryptographic hashes.
>
> I'm not much of a crypto expert, but wouldn't a cryptographic hash,
> in combination with e.g. dm-crypt below the
> filesystem, give us what dm-crypt alone cannot really give us
> (authenticated integrity)?
>
> Would that combination of hash+encrypt basically work like a MAC?

Sadly, no. Partially because in order for an encrypted hash to be a secure MAC, the encryption must be nonmalleable, which would require CMC or EME - encryption modes which Linux does not presently support as I understand it. There are other issues as well, including that MAC-then-encrypt is fragile against a number of attacks, mainly in the padding-oracle category (See: TLS BEAST attack).

AEAD modes are also nonmalleable, but as they are length-expanding they cannot be used for LUKS. However, as eCryptFS and possibly the recent ext4 encryption work shows, using them at a higher level (encrypting extents or files) does work.

Of course, if you're using an AEAD mode in the filesystem anyway, just use it directly and have done with it.
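To illustrate why an encrypted hash is not a MAC when the encryption is malleable, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the "cipher" is a toy XOR keystream built from SHA-256 (it is not dm-crypt, and real disk modes such as CBC or XTS are malleable in different, coarser ways), and the function names are hypothetical. It shows an attacker who knows one plaintext forging a different one that still passes the hash check, and the same tampering being rejected when an HMAC is computed over the ciphertext (encrypt-then-MAC).

```python
import hashlib
import hmac


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 in counter mode. Illustration only, not for real use."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def hash_then_encrypt(key: bytes, nonce: bytes, message: bytes) -> bytes:
    """Append an unkeyed SHA-256 of the message, then encrypt the whole thing."""
    body = message + hashlib.sha256(message).digest()
    return xor(body, keystream(key, nonce, len(body)))


def check_hash_then_encrypt(key: bytes, nonce: bytes, ciphertext: bytes) -> bytes:
    """Decrypt, then verify the embedded hash; return the message if it matches."""
    body = xor(ciphertext, keystream(key, nonce, len(ciphertext)))
    message, digest = body[:-32], body[-32:]
    if hashlib.sha256(message).digest() != digest:
        raise ValueError("hash mismatch")
    return message


def forge(ciphertext: bytes, known: bytes, wanted: bytes) -> bytes:
    """Attacker without the key: swap a known plaintext for one of the same length."""
    old = known + hashlib.sha256(known).digest()
    new = wanted + hashlib.sha256(wanted).digest()
    return xor(ciphertext, xor(old, new))


def seal_etm(enc_key: bytes, mac_key: bytes, nonce: bytes, message: bytes):
    """Encrypt-then-MAC: the tag covers the nonce and the ciphertext."""
    ciphertext = xor(message, keystream(enc_key, nonce, len(message)))
    tag = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    return ciphertext, tag


def open_etm(enc_key: bytes, mac_key: bytes, nonce: bytes,
             ciphertext: bytes, tag: bytes) -> bytes:
    """Verify the tag first; decrypt only if it is valid."""
    expected = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):
        raise ValueError("authentication failed")
    return xor(ciphertext, keystream(enc_key, nonce, len(ciphertext)))
```

The forgery works because XOR-ing the ciphertext changes the decrypted bytes in a predictable way, and the hash is unkeyed, so the attacker can recompute it. In the encrypt-then-MAC variant the tag depends on a key the attacker does not have, and it is checked before any decryption happens.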
Re: [RFC PATCH] Btrfs: add sha256 checksum option
John Williams wrote:
> On Mon, Dec 1, 2014 at 4:15 PM, Alex Elsayed wrote:
>> There's a thing called the transitive property. When CRC32 is faster than
>> SpookyHash and CityHash (while admittedly weaker), and SHA-1 on SPARC is
>> faster than CRC32, there are comparisons that can be made.
>
> And yet you applied the transitive property with poor assumptions and
> in a convoluted way to come up with an incorrect conclusion.
>
>> It's that the flat assertion that "CityHash/SpookyHash/etc is always
>> faster" is _unwarranted_, as hardware acceleration _has a huge effect_.
>
> Actually, the assertion is true and backed up by evidence that I
> cited. I'm not sure why you think hardware acceleration only helps
> SHA-1 and does not help CityHash or SpookyHash.

...because the hardware acceleration is in the form of instructions like "Update SHA1 state"?

https://software.intel.com/en-us/articles/intel-sha-extensions

https://www.element14.com/community/servlet/JiveServlet/previewBody/41836-102-1-229511/ARM.Reference_Manual.pdf
(page 99, the SHA1{C,P,M,H,SU0,SU1} instructions)

On SPARC it's a full-on crypto coprocessor.
Re: [RFC PATCH] Btrfs: add sha256 checksum option
On Mon, Dec 1, 2014 at 4:15 PM, Alex Elsayed wrote:
> There's a thing called the transitive property. When CRC32 is faster than
> SpookyHash and CityHash (while admittedly weaker), and SHA-1 on SPARC is
> faster than CRC32, there are comparisons that can be made.

And yet you applied the transitive property with poor assumptions and in a convoluted way to come up with an incorrect conclusion.

> It's that the flat assertion that "CityHash/SpookyHash/etc is always faster"
> is _unwarranted_, as hardware acceleration _has a huge effect_.

Actually, the assertion is true and backed up by evidence that I cited. I'm not sure why you think hardware acceleration only helps SHA-1 and does not help CityHash or SpookyHash.
Re: [RFC PATCH] Btrfs: add sha256 checksum option
On Sat, 2014-11-29 at 13:00 -0800, John Williams wrote:
> On Sat, Nov 29, 2014 at 12:38 PM, Alex Elsayed wrote:
> > Why not just use the kernel crypto API? Then the user can just specify any
> > hash the kernel supports.
>
> One reason is that cryptographic hashes are an order of magnitude
> slower than the fastest non-cryptographic hashes. And for filesystem
> checksums, I do not see a need for cryptographic hashes.

I'm not much of a crypto expert, but wouldn't a cryptographic hash, in combination with e.g. dm-crypt below the filesystem, give us what dm-crypt alone cannot really give us (authenticated integrity)?

Would that combination of hash+encrypt basically work like a MAC?

Cheers,
Chris.
Re: [RFC PATCH] Btrfs: add sha256 checksum option
John Williams wrote:
> On Mon, Dec 1, 2014 at 3:46 PM, Alex Elsayed wrote:
>> And I'm not sure what is "convoluted" or "incorrect" about saying "Look,
>> empirical evidence!"
>
> No empirical evidence of the speed of SpookyHash or CityHash versus
> SHA-1 was cited. The only empirical data mentioned was on an
> UltraSPARC CPU, and did not include any SpookyHash or CityHash
> measurements, and yet you made a claim about the speeds on Intel and
> ARM CPUs.

There's a thing called the transitive property. When CRC32 is faster than SpookyHash and CityHash (while admittedly weaker), and SHA-1 on SPARC is faster than CRC32, there are comparisons that can be made.

And what I've been trying to say this whole time is not some point about an individual architecture.

It's that the flat assertion that "CityHash/SpookyHash/etc is always faster" is _unwarranted_, as hardware acceleration _has a huge effect_. On SPARC, it's empirically enough for SHA-1 to match CRC32. On ARMv8, it brings SHA-1 from 4-8 cycles per byte down to _2_. On Intel, when the Skylake SHA extensions land, it will likely have an enormous impact as well.

Broad, sweeping generalizations are great - so long as they are _properly qualified_. For instance, I would agree *wholeheartedly* that a good software implementation of CityHash/SpookyHash/etc would beat the *pants* off a good software implementation of SHA-1. No question.
Re: [RFC PATCH] Btrfs: add sha256 checksum option
On Mon, Dec 1, 2014 at 4:06 PM, Alex Elsayed wrote:
> https://github.com/openssl/openssl/blob/master/crypto/sha/asm/sha1-armv8.pl
>
> #               hardware-assisted   software(*)
> # Apple A7      2.31                4.13 (+14%)
> # Cortex-A53    2.19                8.73 (+108%)
> # Cortex-A57    2.35                7.88 (+74%)

Note that those are showing 2 cycles per byte.

> From the CityHash readme, on a Xeon X5550 (which is _considerably_ more
> powerful than any of the above):
>
> On a single core of a 2.67GHz Intel Xeon X5550, CityHashCrc256 peaks at
> about 5 to 5.5 bytes/cycle.

5 bytes per cycle is 0.2 cycles per byte.

So your own citation shows that CityHash is 10 times faster.
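Since the two sources use reciprocal units, the comparison is easy to get backwards. A small Python sketch of the arithmetic, using only figures quoted in this thread (the round "2 cycles per byte" for accelerated SHA-1 on ARMv8, and 5 bytes/cycle for CityHashCrc256 on a 2.67GHz Xeon X5550). The function names are made up for this example, and the numbers come from different CPUs, so the ratio is only a rough indication:

```python
def cycles_per_byte(bytes_per_cycle: float) -> float:
    """Convert a throughput in bytes/cycle into a cost in cycles/byte."""
    return 1.0 / bytes_per_cycle


def mb_per_second(clock_hz: float, cpb: float) -> float:
    """Peak throughput in MB/s (10**6 bytes) for a given clock and cycles/byte."""
    return clock_hz / cpb / 1e6


# Figures quoted in the thread; they were measured on different CPUs.
sha1_armv8_cpb = 2.0        # hardware-assisted SHA-1, roughly 2 cycles/byte
cityhash_crc256_bpc = 5.0   # CityHashCrc256 on a Xeon X5550, bytes/cycle

cityhash_cpb = cycles_per_byte(cityhash_crc256_bpc)
ratio = sha1_armv8_cpb / cityhash_cpb

print(f"CityHashCrc256: {cityhash_cpb} cycles/byte")
print(f"accelerated SHA-1 costs about {ratio:.0f}x more cycles per byte")
print(f"CityHashCrc256 peak at 2.67 GHz: {mb_per_second(2.67e9, cityhash_cpb):.0f} MB/s")
```

Converting everything to one unit before comparing (cycles/byte, or MB/s at a stated clock) avoids the misreading that happened earlier in the thread.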
Re: [RFC PATCH] Btrfs: add sha256 checksum option
Alex Elsayed wrote:
> So CityHash is - at best - half as fast as SHA1 with acceleration.
>
> In fact, on the Apple A7, it would likely be slower than _software_ SHA-1.

Argh, ignore this. The CityHash readme is in bytes/cycle, which I missed on first readthrough (why on earth they are not using either MB/s for rate, or cycles/byte, eludes me completely.)
Re: [RFC PATCH] Btrfs: add sha256 checksum option
On Mon, Dec 1, 2014 at 3:46 PM, Alex Elsayed wrote:
> And that _is_ the case; they are faster... *when both are software
> implementations*

They are also faster when both are optimized to use special instructions of the CPU.

According to this Intel whitepaper, SHA-1 does not achieve less than 1 cycle/byte in any of the situations they tested:

http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/haswell-cryptographic-performance-paper.pdf

SpookyHash and CityHash obtain better than 0.5 cycle/byte, and in the case of CityHash256, better than 0.2 cycle/byte:

https://code.google.com/p/cityhash/source/browse/trunk/README
Re: [RFC PATCH] Btrfs: add sha256 checksum option
John Williams wrote:
> On Mon, Dec 1, 2014 at 3:05 PM, Alex Elsayed wrote:
>> hard evidence shows that SHA-1 was equal to or faster than CRC32, which
>> is unequivocally simpler and faster than CityHash (though CityHash comes
>> close).
>>
>> And the CPUs in question are *not* particularly rare - Intel since Sandy
>> Bridge or so, the majority of SPARC systems, a goodly number of ARM
>> systems via coprocessors...
>
> By the way, your "hard evidence" is imaginary.
>
> Here you can see that SHA-1 is about 5 cycles per byte on Sandybridge:
>
> https://blake2.net/
>
> While SpookyHash (and CityHash) are about 3 bytes per cycle (on long
> keys) which is about 0.33 cycles per byte. More than 10 times faster
> than SHA-1.
>
> http://burtleburtle.net/bob/hash/spooky.html

On further examination, I did indeed make a mistake - the hardware acceleration for SHA on Intel will be in Skylake; only the AES acceleration was added in Sandy Bridge. So you are correct to some degree with the rarity argument.

However, performance-wise, that means SHA-1 on Intel is still a software implementation. Let's look at ARMv8.

The ARM v8 architecture added a few cryptographic instructions, including for SHA-1. The results:

https://github.com/openssl/openssl/blob/master/crypto/sha/asm/sha1-armv8.pl

#               hardware-assisted   software(*)
# Apple A7      2.31                4.13 (+14%)
# Cortex-A53    2.19                8.73 (+108%)
# Cortex-A57    2.35                7.88 (+74%)

From the CityHash readme, on a Xeon X5550 (which is _considerably_ more powerful than any of the above):

On a single core of a 2.67GHz Intel Xeon X5550, CityHashCrc256 peaks at about 5 to 5.5 bytes/cycle. The other CityHashCrc functions are wrappers around CityHashCrc256 and should have similar performance on long strings. (CityHashCrc256 in v1.0.3 was even faster, but we decided it wasn't as thorough as it should be.) CityHash128 peaks at about 4.3 bytes/cycle. The fastest Murmur variant on that hardware, Murmur3F, peaks at about 2.4 bytes/cycle. We expect the peak speed of CityHash128 to dominate CityHash64, which is aimed more toward short strings or use in hash tables.

So CityHash is - at best - half as fast as SHA1 with acceleration.

In fact, on the Apple A7, it would likely be slower than _software_ SHA-1.
Re: [RFC PATCH] Btrfs: add sha256 checksum option
On Mon, Dec 1, 2014 at 3:46 PM, Alex Elsayed wrote:
> And I'm not sure what is "convoluted" or "incorrect" about saying "Look,
> empirical evidence!"

No empirical evidence of the speed of SpookyHash or CityHash versus SHA-1 was cited. The only empirical data mentioned was on an UltraSPARC CPU, and did not include any SpookyHash or CityHash measurements, and yet you made a claim about the speeds on Intel and ARM CPUs.
Re: [RFC PATCH] Btrfs: add sha256 checksum option
John Williams wrote:
> On Mon, Dec 1, 2014 at 3:05 PM, Alex Elsayed wrote:
>> Incidentally, you can be 'skeptical' all you like - per Austin's message
>> upthread, he was testing the Crypto API. Thus, skeptical as you may be,
>> hard evidence shows that SHA-1 was equal to or faster than CRC32, which
>> is unequivocally simpler and faster than CityHash (though CityHash comes
>> close).
>>
>> And the CPUs in question are *not* particularly rare - Intel since Sandy
>> Bridge or so, the majority of SPARC systems, a goodly number of ARM
>> systems via coprocessors...
>
> You can make convoluted, incorrect claims all you like, but the fact
> is that SHA-1 is not as fast as Spooky2 or CityHash128 on x64 Intel
> CPUs, and Murmur3 is faster on ARM systems. And it is not even close.
> Your claims are absurd.

And that _is_ the case; they are faster... *when both are software implementations*

And I'm not sure what is "convoluted" or "incorrect" about saying "Look, empirical evidence!"
Re: [RFC PATCH] Btrfs: add sha256 checksum option
On Mon, Dec 1, 2014 at 3:05 PM, Alex Elsayed wrote:
> hard evidence shows that SHA-1 was equal to or faster than CRC32, which is
> unequivocally simpler and faster than CityHash (though CityHash comes
> close).
>
> And the CPUs in question are *not* particularly rare - Intel since Sandy
> Bridge or so, the majority of SPARC systems, a goodly number of ARM systems
> via coprocessors...

By the way, your "hard evidence" is imaginary.

Here you can see that SHA-1 is about 5 cycles per byte on Sandybridge:

https://blake2.net/

While SpookyHash (and CityHash) are about 3 bytes per cycle (on long keys) which is about 0.33 cycles per byte. More than 10 times faster than SHA-1.

http://burtleburtle.net/bob/hash/spooky.html
Re: [RFC PATCH] Btrfs: add sha256 checksum option
On Mon, Dec 1, 2014 at 3:05 PM, Alex Elsayed wrote:
> Incidentally, you can be 'skeptical' all you like - per Austin's message
> upthread, he was testing the Crypto API. Thus, skeptical as you may be, hard
> evidence shows that SHA-1 was equal to or faster than CRC32, which is
> unequivocally simpler and faster than CityHash (though CityHash comes
> close).
>
> And the CPUs in question are *not* particularly rare - Intel since Sandy
> Bridge or so, the majority of SPARC systems, a goodly number of ARM systems
> via coprocessors...

You can make convoluted, incorrect claims all you like, but the fact is that SHA-1 is not as fast as Spooky2 or CityHash128 on x64 Intel CPUs, and Murmur3 is faster on ARM systems. And it is not even close. Your claims are absurd.
Re: [RFC PATCH] Btrfs: add sha256 checksum option
John Williams wrote:
> On Mon, Dec 1, 2014 at 12:35 PM, Austin S Hemmelgarn wrote:
>> My only reasoning is that with this set of hashes (crc32c, adler32, and
>> md5), the statistical likelihood of running into a hash collision with
>> more than one of them at a time is infinitesimally small compared to the
>> likelihood of any one of them having a collision (or even compared to
>> something ridiculous like the probability of being killed by a meteor
>> strike), and the combination is faster on most systems that I have tried
>> than many 256-bit crypto hashes.
>
> I have not seen any evidence that combining hashes like that actually
> reduces the chances of collision, but if we assume it does, then
> again, the non-crypto hashes would be faster. For example, 128-bit
> Spooky2 combined with 128-bit CityHash would produce a 256-bit hash
> and would be faster than MD5 + whatever.

It has no real benefit, but _why_ depends on what your model is.

There's a saying that engineers worry about stochastic failure; security professionals have to worry about malicious failure.

If your only concern is stochastic failure (random bitflips, etc), then the chances of collision with 128-bit CityHash or MurmurHash or SipHash or what-have-you are already so small that every single component in your laptop dying simultaneously is more likely. Adding another hash is thus just a waste of cycles.

If your concern is malicious failure (in-band deduplication attack or similar, ignoring for now that btrfs actually compares the extent data as well IIRC), then it's well-known in the cryptographic community that the concatenation of multiple hashes is as strong as the strongest hash, _but no stronger_ [1].

Since the strongest hash in the above list is either a non-cryptographic hash or MD5, which is known-weak to the point of there being numerous toy programs finding collisions for arbitrary data, it would not be worth much.

The only place this might be of use is if you used N strong/unbroken hashes, in order to hedge against up to N-1 of them being broken. However, the gain of that is (again) infinitesimal, and the performance cost quite large indeed.

[1] http://eprint.iacr.org/2008/075
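To put rough numbers on the stochastic-failure case, the Python sketch below models a checksum as a uniformly random function of the block (an idealization; real non-cryptographic hashes only approximate it, and it says nothing about a malicious adversary). Under that assumption, a randomly corrupted block matches its stored b-bit checksum with probability 2^-b, and the chance that any two of n distinct blocks share a checksum follows the usual birthday approximation. The block counts used are illustrative assumptions, not figures from the thread.

```python
import math


def undetected_corruption_probability(bits: int) -> float:
    """Chance that one random corruption still matches a `bits`-bit checksum."""
    return 2.0 ** -bits


def birthday_collision_probability(n_blocks: int, bits: int) -> float:
    """Approximate chance that any two of n distinct blocks share a checksum.

    Uses p ~= 1 - exp(-n(n-1) / (2 * 2**bits)); expm1 keeps precision
    when the probability is tiny.
    """
    exponent = n_blocks * (n_blocks - 1) / (2.0 * 2.0 ** bits)
    return -math.expm1(-exponent)


# Illustrative sizes: 100,000 blocks, and 2**40 blocks (4 PiB of 4 KiB blocks).
for bits in (32, 128):
    print(f"{bits}-bit checksum:")
    print("  undetected corruption:", undetected_corruption_probability(bits))
    print("  collision among 100,000 blocks:",
          birthday_collision_probability(100_000, bits))
    print("  collision among 2**40 blocks:",
          birthday_collision_probability(2 ** 40, bits))
```

With 32 bits, a collision somewhere among 100,000 blocks is already more likely than not, which matters for dedup-style lookups but not for per-block verification. With 128 bits both probabilities are far below any hardware failure rate, which is the point being made above.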
Re: [RFC PATCH] Btrfs: add sha256 checksum option
John Williams wrote:
> On Mon, Dec 1, 2014 at 12:08 PM, Alex Elsayed wrote:
>> Actually, I said "Sure" here, but this isn't strictly true. At some
>> point, you're more memory-bound than CPU-bound, and with CPU intrinsic
>> instructions (like SPARC and recent x86 have for SHA) you're often past
>> that. Then, you're not going to see any real difference - and the
>> accelerated cryptographic hashes may even win out, because the intrinsics
>> may be faster (less stuff of the I$, pipelined single instruction beating
>> multiple simpler instructions, etc) than the software non-cryptographic
>> hash.
>
> In practice, I am skeptical whether any 128- or 256-bit crypto hashes
> will be as fast as the non-crypto hashes I mentioned, even on CPUs
> with specific instructions for the crypto hashes. The non-crypto
> hashes can (and do) take advantage of special CPU instructions as
> well.
>
> But even if true that the crypto hashes approach the speed of
> non-crypto hashes on certain CPUs, that does not provide a strong
> argument for using the crypto hashes, since on the common x64 CPUs,
> the non-crypto hashes I mentioned are significantly faster than the
> equivalent crypto hashes.
>
> So, you have some rare architectures where the crypto hashes may
> almost be as fast as the non-crypto, and common CPUs where the
> non-crypto are much faster. That makes the non-crypto hash functions I
> mentioned the obvious choice in the vast majority of systems.

Incidentally, you can be 'skeptical' all you like - per Austin's message upthread, he was testing the Crypto API. Thus, skeptical as you may be, hard evidence shows that SHA-1 was equal to or faster than CRC32, which is unequivocally simpler and faster than CityHash (though CityHash comes close).

And the CPUs in question are *not* particularly rare - Intel since Sandy Bridge or so, the majority of SPARC systems, a goodly number of ARM systems via coprocessors...
Re: [RFC PATCH] Btrfs: add sha256 checksum option
John Williams wrote:
> On Mon, Dec 1, 2014 at 12:08 PM, Alex Elsayed wrote:
>> Actually, I said "Sure" here, but this isn't strictly true. At some
>> point, you're more memory-bound than CPU-bound, and with CPU intrinsic
>> instructions (like SPARC and recent x86 have for SHA) you're often past
>> that. Then, you're not going to see any real difference - and the
>> accelerated cryptographic hashes may even win out, because the intrinsics
>> may be faster (less stuff of the I$, pipelined single instruction beating
>> multiple simpler instructions, etc) than the software non-cryptographic
>> hash.
>
> In practice, I am skeptical whether any 128- or 256-bit crypto hashes
> will be as fast as the non-crypto hashes I mentioned, even on CPUs
> with specific instructions for the crypto hashes. The non-crypto
> hashes can (and do) take advantage of special CPU instructions as
> well.
>
> But even if true that the crypto hashes approach the speed of
> non-crypto hashes on certain CPUs, that does not provide a strong
> argument for using the crypto hashes, since on the common x64 CPUs,
> the non-crypto hashes I mentioned are significantly faster than the
> equivalent crypto hashes.
>
> So, you have some rare architectures where the crypto hashes may
> almost be as fast as the non-crypto, and common CPUs where the
> non-crypto are much faster. That makes the non-crypto hash functions I
> mentioned the obvious choice in the vast majority of systems.

And as I said upthread, one benefit of the Crypto API is that the filesystem developers _no longer have to choose_. By using the shash or ahash interface to the Crypto API, the _user_ can choose *any* hash the kernel supports.

And the default is (and will almost certainly continue to be) crc32, so the user would need to specify a hash anyway - making whether some other non-cryptographic hash is the "obvious choice" a completely moot point.
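As a user-space analogy only (this is Python's hashlib and zlib, not the kernel's shash/ahash interface), the "let the user pick by name" idea can be sketched like this. The function name `checksum` and the little-endian encoding of the CRC are assumptions made for the example; zlib's crc32 is plain CRC-32, not the CRC-32C that btrfs uses.

```python
import hashlib
import zlib


def checksum(block: bytes, algorithm: str = "crc32") -> bytes:
    """Checksum `block` with an algorithm chosen by name at run time.

    "crc32" is the default, mirroring the default mentioned in the thread;
    any other name is looked up in whatever this Python build's hashlib
    provides. Variable-length digests (the SHAKE family) are not handled.
    """
    if algorithm == "crc32":
        return zlib.crc32(block).to_bytes(4, "little")
    if algorithm not in hashlib.algorithms_available:
        raise ValueError(f"unsupported checksum algorithm: {algorithm!r}")
    return hashlib.new(algorithm, block).digest()


block = b"\x00" * 4096
print(checksum(block).hex())            # default: 4-byte CRC-32
print(checksum(block, "sha256").hex())  # user-selected: 32-byte SHA-256
```

The calling code never hard-codes a particular hash; adding a new algorithm to the provider makes it selectable without touching the caller, which is the property being argued for.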
Re: [RFC PATCH] Btrfs: add sha256 checksum option
On Mon, Dec 1, 2014 at 12:35 PM, Austin S Hemmelgarn wrote:
> My only reasoning is that with this set of hashes (crc32c, adler32, and
> md5), the statistical likelihood of running into a hash collision with more
> than one of them at a time is infinitesimally small compared to the
> likelihood of any one of them having a collision (or even compared to
> something ridiculous like the probability of being killed by a meteor
> strike), and the combination is faster on most systems that I have tried
> than many 256-bit crypto hashes.

I have not seen any evidence that combining hashes like that actually reduces the chances of collision, but if we assume it does, then again, the non-crypto hashes would be faster. For example, 128-bit Spooky2 combined with 128-bit CityHash would produce a 256-bit hash and would be faster than MD5 + whatever.
Re: [RFC PATCH] Btrfs: add sha256 checksum option
On Mon, Dec 1, 2014 at 12:08 PM, Alex Elsayed wrote:
> Actually, I said "Sure" here, but this isn't strictly true. At some point,
> you're more memory-bound than CPU-bound, and with CPU intrinsic instructions
> (like SPARC and recent x86 have for SHA) you're often past that. Then,
> you're not going to see any real difference - and the accelerated
> cryptographic hashes may even win out, because the intrinsics may be faster
> (less stuff of the I$, pipelined single instruction beating multiple simpler
> instructions, etc) than the software non-cryptographic hash.

In practice, I am skeptical whether any 128- or 256-bit crypto hashes will be as fast as the non-crypto hashes I mentioned, even on CPUs with specific instructions for the crypto hashes. The non-crypto hashes can (and do) take advantage of special CPU instructions as well.

But even if true that the crypto hashes approach the speed of non-crypto hashes on certain CPUs, that does not provide a strong argument for using the crypto hashes, since on the common x64 CPUs, the non-crypto hashes I mentioned are significantly faster than the equivalent crypto hashes.

So, you have some rare architectures where the crypto hashes may almost be as fast as the non-crypto, and common CPUs where the non-crypto are much faster. That makes the non-crypto hash functions I mentioned the obvious choice in the vast majority of systems.
Re: [RFC PATCH] Btrfs: add sha256 checksum option
On 2014-12-01 13:37, David Sterba wrote: On Wed, Nov 26, 2014 at 08:58:50AM -0500, Austin S Hemmelgarn wrote: On 2014-11-26 08:38, Brendan Hide wrote: On 2014/11/25 18:47, David Sterba wrote: We could provide an interface for external applications that would make use of the strong checksums. Eg. external dedup, integrity db. The benefit here is that the checksum is always up to date, so there's no need to compute the checksums again. At the obvious cost. I can imagine some use-cases where you might even want more than one algorithm to be used and stored. Not sure if that makes me a madman, though. ;) Not crazy at all, I would love to have the ability to store multiple different weak but fast hash values. For example, on my laptop, it is actually faster to compute crc32c, adler32, and md5 hashes together than it is to compute pretty much any 256-bit hash I've tried. Well, this is doable :) there's space for 256 bits in general, the order of checksum bytes in one "checksum word" would be given by fixed order the algorighms are defined. The code complexity would increase, but not that much I think. This then brings up the issue of what to do when we try to mount such a fs on a system that doesn't support some or all of the hashes used. I see two modes: first fail if all not present, or relaxed by a mount option to accept at least one. But let's keep this open, I'm not yet convinced that combining more weak algos makes sense from the crypto POV. If this should protect against random bitflips, would one fast-but-weak be comparable to a combination? Or other expectations. 
My only reasoning is that with this set of hashes (crc32c, adler32, and md5), the statistical likelihood of running into a hash collision with more than one of them at a time is infinitesimally small compared to the likelihood of any one of them having a collision (or even compared to something ridiculous like the probability of being killed by a meteor strike), and the combination is faster on most systems that I have tried than many 256-bit crypto hashes. It's still a tradeoff, though. I also think the idea mentioned elsewhere in this thread, of storing separate hashes for subsections of the same block, is worth looking at.
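The collision likelihoods discussed above can be put into rough numbers with the birthday bound. The sketch below is not from the thread: it assumes an ideal, uniformly distributed hash, and the block count (2^32 blocks of 4 KiB, i.e. 16 TiB) is a hypothetical example. It computes the union-bound estimate n(n-1)/2^(b+1) for the chance that any two distinct blocks share a checksum.

```python
from fractions import Fraction


def collision_probability(num_blocks: int, hash_bits: int) -> Fraction:
    """Upper bound on the chance that any two of num_blocks distinct blocks
    collide under an ideal hash_bits-wide hash (birthday/union bound)."""
    pairs = Fraction(num_blocks * (num_blocks - 1), 2)
    return min(Fraction(1), pairs / 2**hash_bits)


if __name__ == "__main__":
    blocks = 2**32  # hypothetical: 16 TiB of data in 4 KiB blocks
    for bits in (32, 128, 256):
        p = collision_probability(blocks, bits)
        print(f"{bits:3d}-bit hash: bound = {float(p):.3e}")
```

Under these assumptions a 32-bit checksum saturates the bound at this scale, while any 128-bit value is already far below the failure rates of the hardware itself, which matches the "stochastic failure" argument made elsewhere in the thread.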
Re: [RFC PATCH] Btrfs: add sha256 checksum option
On 2014-12-01 14:34, Alex Elsayed wrote:
> Alex Elsayed wrote:
>> * He was comparing CRC32 (a 32-bit non-cryptographic hash, *via the Crypto
>> API*) against SHA-1 (a 160-bit cryptographic hash, via the Crypto API), and
>> SHA-1 _still_ won. CRC32 tends to beat the pants off 128-bit
>> non-cryptographic hashes simply because those require multiple registers to
>> store the state if nothing else; which makes this a rather strong argument
>> that _hardware matters a heck of a lot_, quite possibly _more_ than the
>> algorithm.
>
> Ah, correction - it seems he was comparing his own implementations, rather
> than the Crypto API ones - but the points still hold, seeing as the Crypto
> API does provide both algorithms.

Actually, I did the tests using the userspace interface to the kernel's Crypto API.
Re: [RFC PATCH] Btrfs: add sha256 checksum option
Alex Elsayed wrote: > John Williams wrote: >> Again, irrelevant. The Spooky2, CityHash256, and Murmur3 hashes that I >> am talking about can and do take advantage of CPU architecture. For >> 128- and 256-bit hashes, one (or more) of those three will be >> significantly faster than any crypto hash in the Crypto API, >> regardless of the CPU it is run on. > > Sure. Actually, I said "Sure" here, but this isn't strictly true. At some point, you're more memory-bound than CPU-bound, and with CPU intrinsic instructions (like SPARC and recent x86 have for SHA) you're often past that. Then, you're not going to see any real difference - and the accelerated cryptographic hashes may even win out, because the intrinsics may be faster (less stuff of the I$, pipelined single instruction beating multiple simpler instructions, etc) than the software non-cryptographic hash. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH] Btrfs: add sha256 checksum option
John Williams wrote:
> On Mon, Dec 1, 2014 at 11:28 AM, Alex Elsayed wrote:
>
>> I think there's a fundamental set of points being missed.
>
> That may be true, but it is not me who is missing them.
>
>> * The Crypto API can be used to access non-cryptographic hashes. Full
>> stop.
>
> Irrelevant to my point. I am talking about specific non-cryptographic
> hashes, and they are not currently in the Crypto API.

Yes, but they're not anywhere else in the kernel either.

>> * He was comparing CRC32 (a 32-bit non-cryptographic hash, *via the
>> Crypto API*) against SHA-1 (a 160-bit cryptographic hash, via the Crypto
>> API), and SHA-1 _still_ won. CRC32 tends to beat the pants off 128-bit
>> non-cryptographic hashes simply because those require multiple registers
>> to store the state if nothing else; which makes this a rather strong
>> argument that _hardware matters a heck of a lot_, quite possibly _more_
>> than the algorithm.
>
> Again, irrelevant. The Spooky2, CityHash256, and Murmur3 hashes that I
> am talking about can and do take advantage of CPU architecture. For
> 128- and 256-bit hashes, one (or more) of those three will be
> significantly faster than any crypto hash in the Crypto API,
> regardless of the CPU it is run on.

Sure.

> As for the possibility of adding more hash functions to Crypto API for
> btrfs to use, I do not believe I have argued against it, so I am not
> sure why you repeated the point. It seems to me that is a discussion
> that must be had with the maintainer(s) of Crypto API (will they
> accept additional non-crypto 128- and 256-bit hash functions, etc.)

In that case, I'm not sure what the reason for the thread continuing is? If they go in the Crypto API, there's no need to argue against cryptographic hashes either - it becomes the user's choice.

That's pretty much the entire reason I kept responding; I figured that arguing against the cryptographic hashes _was_ an objection to the Crypto API, since they're basically a freebie for no effort if we use it.
Re: [RFC PATCH] Btrfs: add sha256 checksum option
On Mon, Dec 1, 2014 at 11:28 AM, Alex Elsayed wrote:
> I think there's a fundamental set of points being missed.

That may be true, but it is not me who is missing them.

> * The Crypto API can be used to access non-cryptographic hashes. Full stop.

Irrelevant to my point. I am talking about specific non-cryptographic hashes, and they are not currently in the Crypto API.

> * He was comparing CRC32 (a 32-bit non-cryptographic hash, *via the Crypto
> API*) against SHA-1 (a 160-bit cryptographic hash, via the Crypto API), and
> SHA-1 _still_ won. CRC32 tends to beat the pants off 128-bit
> non-cryptographic hashes simply because those require multiple registers to
> store the state if nothing else; which makes this a rather strong argument
> that _hardware matters a heck of a lot_, quite possibly _more_ than the
> algorithm.

Again, irrelevant. The Spooky2, CityHash256, and Murmur3 hashes that I am talking about can and do take advantage of CPU architecture. For 128- and 256-bit hashes, one (or more) of those three will be significantly faster than any crypto hash in the Crypto API, regardless of the CPU it is run on.

As for the possibility of adding more hash functions to Crypto API for btrfs to use, I do not believe I have argued against it, so I am not sure why you repeated the point. It seems to me that is a discussion that must be had with the maintainer(s) of Crypto API (will they accept additional non-crypto 128- and 256-bit hash functions, etc.)
Re: [RFC PATCH] Btrfs: add sha256 checksum option
Alex Elsayed wrote:
> * He was comparing CRC32 (a 32-bit non-cryptographic hash, *via the Crypto
> API*) against SHA-1 (a 160-bit cryptographic hash, via the Crypto API),
> and SHA-1 _still_ won. CRC32 tends to beat the pants off 128-bit
> non-cryptographic hashes simply because those require multiple registers to
> store the state if nothing else; which makes this a rather strong argument
> that _hardware matters a heck of a lot_, quite possibly _more_ than the
> algorithm.

Ah, correction - it seems he was comparing his own implementations, rather than the Crypto API ones - but the points still hold, seeing as the Crypto API does provide both algorithms.
Re: [RFC PATCH] Btrfs: add sha256 checksum option
John Williams wrote:
> On Mon, Dec 1, 2014 at 9:42 AM, Austin S Hemmelgarn wrote:
>> Except most of the CPU optimized hashes aren't crypto hashes (other than the
>> various SHA implementations). Furthermore, I've actually tested the
>> speed of a generic CRC32c implementation versus SHA-1 using the SHA
>> instructions on an UltraSPARC processor, and the difference amounts to a
>> few microseconds in _favor_ of the optimized crypto hash; and I've run
>> the math for every other ISA that has instructions for computing SHA
>> hashes (I don't have the hardware for any of the others), and expect
>> similar results for those as well.
>
> I think the confusion here is that I am talking about 128-bit and
> 256-bit hashes, which is what you would choose for filesystem
> checksums if you want to have extremely strong collision resistance
> (e.g., you could also use it for dedup).
>
> You seem to be talking about 32-bit (and maybe 64-bit) hashes.
>
> The speed difference between crypto 128- and 256-bit hashes and
> non-crypto equivalents that I have mentioned is an order of magnitude
> or more.

I think there's a fundamental set of points being missed.

* The Crypto API can be used to access non-cryptographic hashes. Full stop.

* He was comparing CRC32 (a 32-bit non-cryptographic hash, *via the Crypto API*) against SHA-1 (a 160-bit cryptographic hash, via the Crypto API), and SHA-1 _still_ won. CRC32 tends to beat the pants off 128-bit non-cryptographic hashes simply because those require multiple registers to store the state if nothing else; which makes this a rather strong argument that _hardware matters a heck of a lot_, quite possibly _more_ than the algorithm. Even if SHA-1 in software is vastly slower than CityHash or whatever in software, the Crypto API implementation *may not be purely software*.

* The main benefit of the Crypto API is not any specific hash, it's that it's a _common API_ for _using any supported hash_.

* Your preferred non-cryptographic hashes can, thus, be used _via_ the Crypto API.

* This has benefits of:
  * Code reuse (for anyone else who wants to use such a hash).
  * Optimization opportunities (if a CPU implements some primitive, it can be leveraged in an arch-specific implementation, which the Crypto API will use _automatically_).
  * Flexibility (by using the Crypto API, _any_ supported hash can be used generically, so the _user_ can decide what they want, rather than picking from a small, hard-coded menu of options in btrfs).
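The "common API, any supported hash" argument above can be illustrated with a small userspace analogue. This is only a sketch, not the kernel Crypto API: the `CHECKSUMS` registry and the `checksum()` helper are hypothetical names, and Python's `hashlib.new()` stands in for lookup-by-name in a shared hash framework.

```python
import hashlib
import zlib

# Hypothetical registry of non-cryptographic checksums, keyed by name.
# Each entry maps bytes -> fixed-size digest bytes.
CHECKSUMS = {
    "crc32": lambda data: zlib.crc32(data).to_bytes(4, "little"),
    "adler32": lambda data: zlib.adler32(data).to_bytes(4, "little"),
}


def checksum(name: str, data: bytes) -> bytes:
    """Compute the named checksum; callers never hard-code an algorithm."""
    if name in CHECKSUMS:
        return CHECKSUMS[name](data)
    # Fall back to anything hashlib provides; unknown names raise ValueError.
    return hashlib.new(name, data).digest()


if __name__ == "__main__":
    block = b"\0" * 4096
    for algo in ("crc32", "adler32", "md5", "sha256"):
        print(algo, checksum(algo, block).hex())
```

The caller only passes a name, so adding a new algorithm to the registry makes it available everywhere without touching the call sites, which is the flexibility point being argued.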
Re: [RFC PATCH] Btrfs: add sha256 checksum option
On Thu, Nov 27, 2014 at 11:52:20AM +0800, Liu Bo wrote:
> > There are several checksum algorithms that trade off speed and strength
> > so we may want to support more than just sha256. Easy to add but I'd
> > rather see them added in all at once than one by one.
> >
> > Another question is if we'd like to use different checksum for data and
> > metadata. This would not cost any format change if we use the 2 bytes in
> > super block csum_type.
>
> Yes, but breaking it into meta_csum_type and data_csum_type will need an
> incompat flag.

Not necessarily a new bit. If we read the field as-is and see that it's zero, we know it's the previous version; otherwise it's the new one, and we then set only the in-memory fields for data and metadata. The backward compatibility is fine, old kernels will refuse to mount with csum_type != 0.

> > Optional/crazy/format change stuff:
> >
> > * per-file checksum algorithm - unlike compression, the whole file would
> >   have to use the same csum algo
> >   reflink would work iff the algos match
> >   snapshotting is unaffected
> >
> > * per-subvolume checksum algorithm - specify the csum type at creation
> >   time, or afterwards unless it's modified
>
> I thought about this before, if we enable this, a few cases need to be dealt
> with (at least),
> 1. convert file data's csum from one algorithm to another

On-line or offline? I'd rather avoid doing that on a mounted filesystem.

> 2. to make different checksum length co-exist, we can either use different
>    key.type for different algorithms, or pack checksum into a new structure
>    that has algorithm types (and length).

Oh right, the mixed sizes of checksums could be a problem and would require a format change (and thus the incompatibility bit). The key.type approach looks better: the key would effectively encode the algorithm type, so the item bytes contain only fixed-size checksums. (Here I'm thinking a new BTRFS_EXTENT_CSUM_KEY per checksum type.)

OTOH storing the algo type (size is not needed) would add overhead per-checksum (probably only a single byte but still).
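The "read the field as-is, zero means the previous version" idea above can be sketched as follows. The split of the 2-byte `csum_type` into a low byte for data and a high byte for metadata is purely an assumption made for illustration; the thread does not specify a layout, and `decode_csum_type` and `CsumTypes` are hypothetical names.

```python
from typing import NamedTuple

CSUM_CRC32C = 0  # existing value; per the thread, old kernels refuse csum_type != 0


class CsumTypes(NamedTuple):
    """In-memory only: separate algorithm ids for data and metadata."""
    data: int
    metadata: int


def decode_csum_type(raw: int) -> CsumTypes:
    """Decode the 16-bit on-disk field (assumed layout: low byte = data,
    high byte = metadata). Zero keeps its old meaning: crc32c for both."""
    if not 0 <= raw <= 0xFFFF:
        raise ValueError("csum_type is a 2-byte field")
    if raw == 0:
        return CsumTypes(CSUM_CRC32C, CSUM_CRC32C)
    return CsumTypes(data=raw & 0xFF, metadata=raw >> 8)
```

With such a layout no new on-disk field is introduced: an old filesystem decodes to crc32c for both, and any non-zero value is rejected by kernels that only know `csum_type == 0`.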
Re: [RFC PATCH] Btrfs: add sha256 checksum option
On Wed, Nov 26, 2014 at 08:58:50AM -0500, Austin S Hemmelgarn wrote:
> On 2014-11-26 08:38, Brendan Hide wrote:
> > On 2014/11/25 18:47, David Sterba wrote:
> >> We could provide an interface for external applications that would make
> >> use of the strong checksums. Eg. external dedup, integrity db. The
> >> benefit here is that the checksum is always up to date, so there's no
> >> need to compute the checksums again. At the obvious cost.
> >
> > I can imagine some use-cases where you might even want more than one
> > algorithm to be used and stored. Not sure if that makes me a madman,
> > though. ;)
> >
> Not crazy at all, I would love to have the ability to store multiple
> different weak but fast hash values. For example, on my laptop, it is
> actually faster to compute crc32c, adler32, and md5 hashes together than
> it is to compute pretty much any 256-bit hash I've tried.

Well, this is doable :) there's space for 256 bits in general, the order of checksum bytes in one "checksum word" would be given by the fixed order in which the algorithms are defined. The code complexity would increase, but not that much I think.

> This then brings up the issue of what to do when we try to mount such a
> fs on a system that doesn't support some or all of the hashes used.

I see two modes: by default, fail unless all of them are present; or, relaxed by a mount option, accept at least one.

But let's keep this open, I'm not yet convinced that combining more weak algos makes sense from the crypto POV. If this should protect against random bitflips, would one fast-but-weak be comparable to a combination? Or are there other expectations?
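The "checksum word" layout described above (several weak checksums concatenated in a fixed order inside the 256-bit slot) could look like the sketch below. It is an illustration under stated assumptions: `zlib.crc32` is plain CRC-32 and only stands in for crc32c, which the Python standard library does not provide; the byte order, zero padding, and the name `pack_checksum_word` are hypothetical.

```python
import hashlib
import zlib

CSUM_SIZE = 32  # bytes; the "space for 256 bits" per checksum


def pack_checksum_word(block: bytes) -> bytes:
    """Concatenate the checksums in a fixed, predefined order and pad to 32 bytes."""
    parts = (
        zlib.crc32(block).to_bytes(4, "little"),    # stand-in for crc32c (assumption)
        zlib.adler32(block).to_bytes(4, "little"),  # 4 bytes
        hashlib.md5(block).digest(),                # 16 bytes
    )
    word = b"".join(parts)
    # 4 + 4 + 16 = 24 bytes used; the remaining 8 bytes are zero padding.
    return word.ljust(CSUM_SIZE, b"\0")
```

Because the order is fixed by definition, a reader that supports only some of the algorithms can still locate and verify its own slice of the word, which is what the relaxed mount mode would rely on.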
Re: [RFC PATCH] Btrfs: add sha256 checksum option
On Mon, Dec 1, 2014 at 9:42 AM, Austin S Hemmelgarn wrote:
> Except most of the CPU optimized hashes aren't crypto hashes (other than the
> various SHA implementations). Furthermore, I've actually tested the speed
> of a generic CRC32c implementation versus SHA-1 using the SHA instructions
> on an UltraSPARC processor, and the difference amounts to a few
> microseconds in _favor_ of the optimized crypto hash; and I've run the math
> for every other ISA that has instructions for computing SHA hashes (I don't
> have the hardware for any of the others), and expect similar results for
> those as well.

I think the confusion here is that I am talking about 128-bit and 256-bit hashes, which is what you would choose for filesystem checksums if you want to have extremely strong collision resistance (e.g., you could also use it for dedup).

You seem to be talking about 32-bit (and maybe 64-bit) hashes.

The speed difference between crypto 128- and 256-bit hashes and non-crypto equivalents that I have mentioned is an order of magnitude or more.
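The speed claims on both sides are easiest to settle by measuring on the hardware in question. The following is a minimal userspace timing sketch, not the benchmark used in the thread: it only covers algorithms available in the Python standard library (so no Spooky2, CityHash, or crc32c), the buffer size and round count are arbitrary, and `throughput_mib_s` is a hypothetical helper name.

```python
import hashlib
import time
import zlib


def throughput_mib_s(fn, data: bytes, rounds: int = 20) -> float:
    """Return MiB/s achieved by calling fn(data) `rounds` times."""
    start = time.perf_counter()
    for _ in range(rounds):
        fn(data)
    elapsed = max(time.perf_counter() - start, 1e-9)  # avoid division by zero
    return (len(data) * rounds) / (1024 * 1024) / elapsed


if __name__ == "__main__":
    buf = bytes(1 << 20)  # 1 MiB of zeros; real data may behave differently
    candidates = {
        "crc32": zlib.crc32,
        "sha1": lambda d: hashlib.sha1(d).digest(),
        "sha256": lambda d: hashlib.sha256(d).digest(),
    }
    for name, fn in candidates.items():
        print(f"{name:>6}: {throughput_mib_s(fn, buf):8.1f} MiB/s")
```

The results depend heavily on whether the underlying library uses CPU-specific instructions, which is the "hardware matters" point raised in the discussion, so numbers from one machine should not be generalized.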
Re: [RFC PATCH] Btrfs: add sha256 checksum option
On 2014-12-01 12:22, John Williams wrote: On Mon, Dec 1, 2014 at 4:39 AM, Austin S Hemmelgarn wrote: Just because it's a filesystem doesn't always mean that speed is the most important thing. Personally, I can think of multiple cases where using a cryptographically strong hash would be preferable, for example: * On an fs used solely for backup purposes * On a fs used for /boot * On an fs spread across a very large near-line disk array and mounted by a system with a powerful CPU * Almost any other case where data integrity is more important than speed What does data integrity have to do with whether the hash is cryptographic or not? The primary difference between a cryptographic and non-cryptographic hash is that the non-cryptographic hash can be easily guessed / predicted (eg., an attack to deliberately create collisions) whereas the cryptographic hash cannot (given reasonable assumptions of CPU power). For filesystem checksums it is difficult to imagine a deliberate attack on the checksums. Consequently, the only really important quality for the hash besides speed is collision resistance. The non-crypto hashes that I have mentioned in this thread have excellent collision resistant properties. I'm not saying they don't have excellent collision resistance properties. I'm also not saying that we shouldn't support such non-cryptographic hashes, just that we shouldn't explicitly NOT support other hashes, and that if we are going to support more than one hash algorithm, we should use the infrastructure already in place in the kernel for such things because it greatly simplifies maintaining the code. In fact, if I had the time, I'd just write CryptoAPI implementations of those hashes myself. 
The biggest reason to use the in-kernel Crypto API though, is that it gives a huge amount of flexibility, and provides pretty much transparent substitution of CPU optimized versions of the exported hash functions (for example, you don't have to know whether or not your processor supports Intel's CRC32 ISA extensions). Which is worse than useless if the CPU-optimized crypto hash is slower than the default non-crypto hash, and that will almost always be the case. Besides, there is nothing magic happening in the Crypto API library. If you implement your own hash, you can easily do a few checks and choose the best code for the CPU. Except most of the CPU optimized hashes aren't crypto hashes (other than the various SHA implementations). Furthermore, I've actually tested the speed of a generic CRC32c implementation versus SHA-1 using the SHA instructions on an UltraSPARC processor, and the difference ammounts to a few microseconds in _favor_ of the optimized crypto hash; and I've run the math for every other ISA that has instructions for computing SHA hashes (I don't have the hardware for any of the others), and expect similar results for those as well. smime.p7s Description: S/MIME Cryptographic Signature
Re: [RFC PATCH] Btrfs: add sha256 checksum option
On Mon, Dec 1, 2014 at 4:39 AM, Austin S Hemmelgarn wrote: > Just because it's a filesystem doesn't always mean that speed is the most > important thing. Personally, I can think of multiple cases where using a > cryptographically strong hash would be preferable, for example: > * On an fs used solely for backup purposes > * On a fs used for /boot > * On an fs spread across a very large near-line disk array and mounted >by a system with a powerful CPU > * Almost any other case where data integrity is more important than >speed What does data integrity have to do with whether the hash is cryptographic or not? The primary difference between a cryptographic and non-cryptographic hash is that the non-cryptographic hash can be easily guessed / predicted (eg., an attack to deliberately create collisions) whereas the cryptographic hash cannot (given reasonable assumptions of CPU power). For filesystem checksums it is difficult to imagine a deliberate attack on the checksums. Consequently, the only really important quality for the hash besides speed is collision resistance. The non-crypto hashes that I have mentioned in this thread have excellent collision resistant properties. > The biggest reason to use the in-kernel Crypto API though, is that it gives > a huge amount of flexibility, and provides pretty much transparent > substitution of CPU optimized versions of the exported hash functions (for > example, you don't have to know whether or not your processor supports > Intel's CRC32 ISA extensions). Which is worse than useless if the CPU-optimized crypto hash is slower than the default non-crypto hash, and that will almost always be the case. Besides, there is nothing magic happening in the Crypto API library. If you implement your own hash, you can easily do a few checks and choose the best code for the CPU. 
Re: [RFC PATCH] Btrfs: add sha256 checksum option
On 2014-11-29 16:21, John Williams wrote: On Sat, Nov 29, 2014 at 1:07 PM, Alex Elsayed wrote: I'd suggest looking more closely at the crypto api section of menuconfig - it already has crc32c, among others. Just because it's called the "crypto api" doesn't mean it only has cryptographically-strong algorithms. I have looked. What 128- or 256-bit hash functions in "crypto api" are you referring to that are as fast as Spooky2 or CityHash? Just because it's a filesystem doesn't always mean that speed is the most important thing. Personally, I can think of multiple cases where using a cryptographically strong hash would be preferable, for example: * On an fs used solely for backup purposes * On a fs used for /boot * On an fs spread across a very large near-line disk array and mounted by a system with a powerful CPU * Almost any other case where data integrity is more important than speed The biggest reason to use the in-kernel Crypto API though, is that it gives a huge amount of flexibility, and provides pretty much transparent substitution of CPU optimized versions of the exported hash functions (for example, you don't have to know whether or not your processor supports Intel's CRC32 ISA extensions). smime.p7s Description: S/MIME Cryptographic Signature
Re: [RFC PATCH] Btrfs: add sha256 checksum option
On Sun, 2014-11-30 at 23:05 +, Dimitri John Ledkov wrote: > Nope, we should use standard names. Well I wouldn't know that there is really a standardised name in the sense that it tells it's mandatory. People use SHA2-xxx, SHA-xxx, SHAxxx and probably even more combinations. And just because something was started short-sighted and in a wrong way it doesn't mean one cannot correct it, which is why we try to no longer use e.g. KB but kB or KiB. Cheers, Chris. smime.p7s Description: S/MIME cryptographic signature
Re: [RFC PATCH] Btrfs: add sha256 checksum option
On 30 November 2014 at 22:59, Christoph Anton Mitterer wrote: >>Agree with others about -C 256...-C sha256 is only three >>letters more ;) > > Ideally, sha2-256 would be used, since there will be (are) other > versions of sha which have 256 bits size. > Nope, we should use standard names. SHA-2 256 was the first SHA algo to use 256 bits, thus it's commonly referred to as sha256 across the board in multiple pieces of software. SHA-3 family of hashes started to have the same length and thus will be known as sha3-256 etc. Shorthand variant names in this table here http://en.wikipedia.org/wiki/SHA-1#Comparison_of_SHA_functions appear to me how SHA hashes are currently referred as. -- Regards, Dimitri. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH] Btrfs: add sha256 checksum option
>Agree with others about -C 256...-C sha256 is only three >letters more ;) Ideally, sha2-256 would be used, since there will be (are) other versions of sha which have 256 bits size. Cheers, Chris. smime.p7s Description: S/MIME cryptographic signature
Re: [RFC PATCH] Btrfs: add sha256 checksum option
John Williams wrote: > On Sat, Nov 29, 2014 at 1:07 PM, Alex Elsayed > wrote: >> I'd suggest looking more closely at the crypto api section of menuconfig >> - it already has crc32c, among others. Just because it's called the >> "crypto api" doesn't mean it only has cryptographically-strong >> algorithms. > > I have looked. What 128- or 256-bit hash functions in "crypto api" are > you referring to that are as fast as Spooky2 or CityHash? I'm saying that neither of those are in the kernel _anywhere_ now, so if someone's adding them the sensible thing seems to be to add them to the crypto api, access them through it, and then if we ever add more we get them for free on the btrfs side instead of needing to reinvent the wheel every time. In short, there's a place for hashes - why not use it? -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH] Btrfs: add sha256 checksum option
On Sat, Nov 29, 2014 at 1:07 PM, Alex Elsayed wrote: > I'd suggest looking more closely at the crypto api section of menuconfig - > it already has crc32c, among others. Just because it's called the "crypto > api" doesn't mean it only has cryptographically-strong algorithms. I have looked. What 128- or 256-bit hash functions in "crypto api" are you referring to that are as fast as Spooky2 or CityHash? -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH] Btrfs: add sha256 checksum option
John Williams wrote:
> On Sat, Nov 29, 2014 at 12:38 PM, Alex Elsayed wrote:
>> Why not just use the kernel crypto API? Then the user can just specify
>> any hash the kernel supports.
>
> One reason is that cryptographic hashes are an order of magnitude
> slower than the fastest non-cryptographic hashes. And for filesystem
> checksums, I do not see a need for cryptographic hashes.

I'd suggest looking more closely at the crypto api section of menuconfig - it already has crc32c, among others. Just because it's called the "crypto api" doesn't mean it only has cryptographically-strong algorithms.

As a side benefit, if someone implements (say) SipHash for it, then not only could btrfs benefit, but also all other users of the API, including (now) userspace.

The crypto api also has compression, for zlib/lzo/lz4/lz4hc, but I'm given to understand that btrfs' usage of compression doesn't match well to that API.
Re: [RFC PATCH] Btrfs: add sha256 checksum option
On Sat, Nov 29, 2014 at 12:38 PM, Alex Elsayed wrote:
> Why not just use the kernel crypto API? Then the user can just specify any
> hash the kernel supports.

One reason is that cryptographic hashes are an order of magnitude slower than the fastest non-cryptographic hashes. And for filesystem checksums, I do not see a need for cryptographic hashes.
Re: [RFC PATCH] Btrfs: add sha256 checksum option
David Sterba wrote: > On Mon, Nov 24, 2014 at 01:23:05PM +0800, Liu Bo wrote: >> This brings a strong-but-slow checksum algorithm, sha256. >> >> Actually btrfs used sha256 at the early time, but then moved to crc32c >> for performance purposes. >> >> As crc32c is sort of weak due to its hash collision issue, we need a >> stronger algorithm as an alternative. >> >> Users can choose sha256 from mkfs.btrfs via >> >> $ mkfs.btrfs -C 256 /device > > There's already some good feedback so I'll try to cover what hasn't been > mentioned yet. > > I think it's better to separate the preparatory works from adding the > algorithm itself. The former can be merged (and tested) independently. > > There are several checksum algorithms that trade off speed and strength > so we may want to support more than just sha256. Easy to add but I'd > rather see them added in all at once than one by one. Why not just use the kernel crypto API? Then the user can just specify any hash the kernel supports. > Another question is if we'd like to use different checksum for data and > metadata. This would not cost any format change if we use the 2 bytes in > super block csum_type. Mmm, that might be a good reason - although maybe store an entry in some tree of the full crypto api spec, and have a special value of one byte meaning "crypto API" and the other byte counts how many bytes the csum is. 
> Optional/crazy/format change stuff:
>
> * per-file checksum algorithm - unlike compression, the whole file would
>   have to use the same csum algo
>   reflink would work iff the algos match
>   snapshotting is unaffected
>
> * per-subvolume checksum algorithm - specify the csum type at creation
>   time, or afterwards unless it's modified
Re: [RFC PATCH] Btrfs: add sha256 checksum option
On Tue, Nov 25, 2014 at 05:39:05PM +0100, David Sterba wrote: > On Mon, Nov 24, 2014 at 01:23:05PM +0800, Liu Bo wrote: > > This brings a strong-but-slow checksum algorithm, sha256. > > > > Actually btrfs used sha256 at the early time, but then moved to crc32c for > > performance purposes. > > > > As crc32c is sort of weak due to its hash collision issue, we need a > > stronger > > algorithm as an alternative. > > > > Users can choose sha256 from mkfs.btrfs via > > > > $ mkfs.btrfs -C 256 /device > > There's already some good feedback so I'll try to cover what hasn't been > mentioned yet. > > I think it's better to separate the preparatory works from adding the > algorithm itself. The former can be merged (and tested) independently. That's a good point. > > There are several checksum algorithms that trade off speed and strength > so we may want to support more than just sha256. Easy to add but I'd > rather see them added in all at once than one by one. > > Another question is if we'd like to use different checksum for data and > metadata. This would not cost any format change if we use the 2 bytes in > super block csum_type. Yes, but breaking it into meta_csum_type and data_csum_type will need a imcompat flag. > > > Optional/crazy/format change stuff: > > * per-file checksum algorithm - unlike compression, the whole file would > have to use the same csum algo > reflink would work iff the algos match > snapshotting is unaffected > > * per-subvolume checksum algorithm - specify the csum type at creation > time, or afterwards unless it's modified I thought about this before, if we enable this, a few cases need to be dealt with(at least), 1. convert file data's csum from one algorithm to another 2. to make different checksum length co-exist, we can either use different key.type for different algorithms, or pack checksum into a new structure that has algorithm types(and length). 
thanks,
-liubo
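The second option above (packing the checksum into a structure that carries its own algorithm type and length) could look roughly like the sketch below. Everything in it is an assumption made for illustration: the names `packed_csum`, `csum_algo`, `packed_csum_set` and `packed_csum_equal`, the numeric algorithm values, and the 32-byte upper bound (taken only from the "not larger than 256 bits" remark elsewhere in the thread). It is not the on-disk format of btrfs or of this patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical algorithm identifiers; not taken from the patch. */
enum csum_algo {
	CSUM_ALGO_CRC32C = 0,
	CSUM_ALGO_SHA256 = 1,
};

#define CSUM_MAX_LEN 32	/* 256 bits, the largest digest considered here */

/* Hypothetical self-describing checksum record. */
struct packed_csum {
	uint8_t type;			/* one of enum csum_algo */
	uint8_t len;			/* number of valid bytes in data[] */
	uint8_t data[CSUM_MAX_LEN];	/* digest, zero padded */
};

/* Digest length in bytes for an algorithm, or 0 if the type is unknown. */
unsigned int csum_algo_len(uint8_t type)
{
	switch (type) {
	case CSUM_ALGO_CRC32C:
		return 4;
	case CSUM_ALGO_SHA256:
		return 32;
	default:
		return 0;
	}
}

/* Fill a record; returns 0 on success, -1 on unknown type or wrong length. */
int packed_csum_set(struct packed_csum *pc, uint8_t type,
		    const uint8_t *digest, unsigned int len)
{
	assert(pc != NULL && digest != NULL);
	if (csum_algo_len(type) == 0 || csum_algo_len(type) != len)
		return -1;
	memset(pc, 0, sizeof(*pc));
	pc->type = type;
	pc->len = (uint8_t)len;
	memcpy(pc->data, digest, len);
	return 0;
}

/* Records match only if algorithm, length and digest bytes all agree. */
int packed_csum_equal(const struct packed_csum *a, const struct packed_csum *b)
{
	return a->type == b->type && a->len == b->len &&
	       memcmp(a->data, b->data, a->len) == 0;
}
```

Because each record names its own algorithm, a reader never has to guess the digest length from context. The cost is two extra bytes per checksum plus padding for the short digests; the key.type alternative avoids that by encoding the algorithm in the item key instead.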
Re: [RFC PATCH] Btrfs: add sha256 checksum option
On Wed, Nov 26, 2014 at 4:50 AM, Holger Hoffstätte wrote: > On Tue, 25 Nov 2014 15:17:58 -0800, John Williams wrote: > >> 2) CityHash : for 256-bit hashes on all systems >> https://code.google.com/p/cityhash/ > > Btw this is now superseded by Farmhash: > https://code.google.com/p/farmhash/ > It seems FarmHash is not a complete replacement for CityHash. Specifically, I don't see a Fingerprint256() function in FarmHash, so no 256-bit fingerprints (unless I am missing something?). Also, it seems that FarmHash's Fingerprint128() hash is just CityHash128(). Unless I am misreading it, I think FarmHash is mostly useful for 32-bit and 64-bit hashes. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH] Btrfs: add sha256 checksum option
On 2014-11-26 08:38, Brendan Hide wrote:
> On 2014/11/25 18:47, David Sterba wrote:
>> We could provide an interface for external applications that would make
>> use of the strong checksums. Eg. external dedup, integrity db. The
>> benefit here is that the checksum is always up to date, so there's no
>> need to compute the checksums again. At the obvious cost.
>
> I can imagine some use-cases where you might even want more than one
> algorithm to be used and stored. Not sure if that makes me a madman,
> though. ;)

Not crazy at all, I would love to have the ability to store multiple
different weak but fast hash values. For example, on my laptop, it is
actually faster to compute crc32c, adler32, and md5 hashes together than it
is to compute pretty much any 256-bit hash I've tried.

This then brings up the issue of what to do when we try to mount such a fs
on a system that doesn't support some or all of the hashes used.
Re: [RFC PATCH] Btrfs: add sha256 checksum option
On 2014/11/25 18:47, David Sterba wrote: We could provide an interface for external applications that would make use of the strong checksums. Eg. external dedup, integrity db. The benefit here is that the checksum is always up to date, so there's no need to compute the checksums again. At the obvious cost. I can imagine some use-cases where you might even want more than one algorithm to be used and stored. Not sure if that makes me a madman, though. ;) -- __ Brendan Hide http://swiftspirit.co.za/ http://www.webafrica.co.za/?AFF1E97 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH] Btrfs: add sha256 checksum option
On 2014/11/25 13:30, Liu Bo wrote: This is actually inspired by ZFS, who offers checksum functions ranging from the simple-and-fast fletcher2 to the slower-but-secure sha256. Back to btrfs, crc32c is the only choice. And also for the slowness of sha256, Intel has a set of instructions for it, "Intel SHA Extensions", that may help a lot. I think the advantage will be in giving a choice with some strong suggestions: An example of suggestions - if using sha256 on an old or "low-power" CPU, detect that the CPU doesn't support the appropriate acceleration functions and print a warning at mount or a warning-and-prompt at mkfs-time. The default could even be changed based on the architecture - though I suspect crc32c is already a good default on most architectures. The choice allowance gives flexibility where admins know it optimally could be used - and David's suggestion (separate thread) would be able to take advantage of that. -- __ Brendan Hide http://swiftspirit.co.za/ http://www.webafrica.co.za/?AFF1E97 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
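The "detect and warn" suggestion above can be sketched in userspace C as follows. This is a minimal sketch under stated assumptions, not anything from the patch or from btrfs-progs: it assumes the caller has already read the CPU feature list (for example the `flags` line of /proc/cpuinfo on Linux) into a string, and that the SHA extensions are reported there as `sha_ni`. The helper names `has_cpu_flag` and `csum_advice` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if `flag` occurs as a whole whitespace-separated word in `flags`. */
bool has_cpu_flag(const char *flags, const char *flag)
{
	size_t n = strlen(flag);
	const char *p = flags;

	assert(n > 0);
	while ((p = strstr(p, flag)) != NULL) {
		bool starts = (p == flags) || p[-1] == ' ' || p[-1] == '\t';
		bool ends = p[n] == '\0' || p[n] == ' ' || p[n] == '\n';

		if (starts && ends)
			return true;
		p += n;	/* partial match (e.g. a longer flag name), keep looking */
	}
	return false;
}

/*
 * Return a warning string if `algo` is sha256 and the (assumed) sha_ni
 * flag is missing from `flags`; return NULL when there is nothing to warn about.
 */
const char *csum_advice(const char *flags, const char *algo)
{
	if (strcmp(algo, "sha256") == 0 && !has_cpu_flag(flags, "sha_ni"))
		return "warning: sha256 selected but no SHA acceleration detected";
	return NULL;
}
```

A mkfs-time tool could print the returned string and prompt before continuing, while a mount-time check could only log it.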
Re: [RFC PATCH] Btrfs: add sha256 checksum option
On Tue, 25 Nov 2014 15:17:58 -0800, John Williams wrote:
> 2) CityHash : for 256-bit hashes on all systems
> https://code.google.com/p/cityhash/

Btw this is now superseded by Farmhash: https://code.google.com/p/farmhash/

-h
Re: [RFC PATCH] Btrfs: add sha256 checksum option
On Tue, Nov 25, 2014 at 2:30 AM, Liu Bo wrote: > On Mon, Nov 24, 2014 at 11:34:46AM -0800, John Williams wrote: >> For example, Spooky V2 hash is 128 bits and is very fast. It is >> noncryptographic, but it is more than adequate for data checksums. >> >> http://burtleburtle.net/bob/hash/spooky.html >> >> SnapRAID uses this hash, and it runs at about 15 GB/sec on my machine >> (Xeon E3-1270 V2 @ 3.50Ghz) > > Thanks for the suggestion, I'll take a look. > > Btw, it's not in kernel yet, is it? No, as far as I know, it is not in the kernel. By the way, as for the suggestion of blake2 hash, note that it is much slower than Spooky V2 hash. That is to be expected, since blake2 is a cryptographic hash (even if it is one that is fast relative to other cryptographic hashes) and as a class, cryptographic hashes tend to be an order of magnitude slower than the fastest noncryptographic hashes. The hashes that I would recommend for use with btrfs checksums are: 1) SpookyHash V2 : for 128 bit hashes on 64-bit systems http://burtleburtle.net/bob/hash/spooky.html 2) CityHash : for 256-bit hashes on all systems https://code.google.com/p/cityhash/ 3) Murmur3 :for 128-bit hashes on 32-bit systems (since Spooky and City are not the fastest on most 32-bit systems) https://code.google.com/p/smhasher/wiki/MurmurHash3 All of those are noncryptographic, but they all have good properties that should make them more than adequate for data checksums and dedup usage. For more information, here are some comparisons of fast hash functions (note that these comparisons were written 2 to 3 years ago): http://blog.reverberate.org/2012/01/state-of-hash-functions-2012.html http://research.neustar.biz/tag/spooky-hash/ -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH] Btrfs: add sha256 checksum option
On 2014-11-25 17:47, David Sterba wrote: > On Mon, Nov 24, 2014 at 03:07:45PM -0500, Chris Mason wrote: >> On Mon, Nov 24, 2014 at 12:23 AM, Liu Bo wrote: >>> This brings a strong-but-slow checksum algorithm, sha256. >>> >>> Actually btrfs used sha256 at the early time, but then moved to >>> crc32c for >>> performance purposes. >>> >>> As crc32c is sort of weak due to its hash collision issue, we need a >>> stronger >>> algorithm as an alternative. >>> >>> Users can choose sha256 from mkfs.btrfs via >>> >>> $ mkfs.btrfs -C 256 /device >> >> Agree with others about -C 256...-C sha256 is only three letters more ;) >> >> What's the target for this mode? Are we trying to find evil people >> scribbling on the drive, or are we trying to find bad hardware? > > We could provide an interface for external applications that would make > use of the strong checksums. Eg. external dedup, integrity db. The > benefit here is that the checksum is always up to date, so there's no > need to compute the checksums again. At the obvious cost. Yes, pleease! Regards, -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH] Btrfs: add sha256 checksum option
On Mon, Nov 24, 2014 at 03:07:45PM -0500, Chris Mason wrote: > On Mon, Nov 24, 2014 at 12:23 AM, Liu Bo wrote: > > This brings a strong-but-slow checksum algorithm, sha256. > > > > Actually btrfs used sha256 at the early time, but then moved to > > crc32c for > > performance purposes. > > > > As crc32c is sort of weak due to its hash collision issue, we need a > > stronger > > algorithm as an alternative. > > > > Users can choose sha256 from mkfs.btrfs via > > > > $ mkfs.btrfs -C 256 /device > > Agree with others about -C 256...-C sha256 is only three letters more ;) > > What's the target for this mode? Are we trying to find evil people > scribbling on the drive, or are we trying to find bad hardware? We could provide an interface for external applications that would make use of the strong checksums. Eg. external dedup, integrity db. The benefit here is that the checksum is always up to date, so there's no need to compute the checksums again. At the obvious cost. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH] Btrfs: add sha256 checksum option
On Mon, Nov 24, 2014 at 01:23:05PM +0800, Liu Bo wrote: > This brings a strong-but-slow checksum algorithm, sha256. > > Actually btrfs used sha256 at the early time, but then moved to crc32c for > performance purposes. > > As crc32c is sort of weak due to its hash collision issue, we need a stronger > algorithm as an alternative. > > Users can choose sha256 from mkfs.btrfs via > > $ mkfs.btrfs -C 256 /device There's already some good feedback so I'll try to cover what hasn't been mentioned yet. I think it's better to separate the preparatory works from adding the algorithm itself. The former can be merged (and tested) independently. There are several checksum algorithms that trade off speed and strength so we may want to support more than just sha256. Easy to add but I'd rather see them added in all at once than one by one. Another question is if we'd like to use different checksum for data and metadata. This would not cost any format change if we use the 2 bytes in super block csum_type. Optional/crazy/format change stuff: * per-file checksum algorithm - unlike compression, the whole file would have to use the same csum algo reflink would work iff the algos match snapshotting is unaffected * per-subvolume checksum algorithm - specify the csum type at creation time, or afterwards unless it's modified -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH] Btrfs: add sha256 checksum option
On Mon, Nov 24, 2014 at 03:07:45PM -0500, Chris Mason wrote:
> On Mon, Nov 24, 2014 at 12:23 AM, Liu Bo wrote:
> >This brings a strong-but-slow checksum algorithm, sha256.
> >
> >Actually btrfs used sha256 at the early time, but then moved to
> >crc32c for performance purposes.
> >
> >As crc32c is sort of weak due to its hash collision issue, we need
> >a stronger algorithm as an alternative.
> >
> >Users can choose sha256 from mkfs.btrfs via
> >
> >$ mkfs.btrfs -C 256 /device
>
> Agree with others about -C 256...-C sha256 is only three letters more ;)

That's right, #stupidme

> What's the target for this mode? Are we trying to find evil people
> scribbling on the drive, or are we trying to find bad hardware?

This is actually inspired by ZFS, which offers checksum functions ranging
from the simple-and-fast fletcher2 to the slower-but-secure sha256.

Back to btrfs, crc32c is the only choice.

As for the slowness of sha256, Intel has a set of instructions for it,
"Intel SHA Extensions", that may help a lot.

I'm not insisting on it, and I'm always open to suggestions.

Btw, I have played with a Merkle tree for a while; however, making good use
of our existing scrub looks more promising for implementing the feature
that detects changes between mounts.

thanks,
-liubo
Re: [RFC PATCH] Btrfs: add sha256 checksum option
2014-11-25 11:30 GMT+01:00 Liu Bo : > On Mon, Nov 24, 2014 at 11:34:46AM -0800, John Williams wrote: >> On Mon, Nov 24, 2014 at 12:23 AM, Holger Hoffstätte >> wrote: >> >> > Would there be room for a compromise with e.g. 128 bits? >> >> For example, Spooky V2 hash is 128 bits and is very fast. It is >> noncryptographic, but it is more than adequate for data checksums. >> >> http://burtleburtle.net/bob/hash/spooky.html >> >> SnapRAID uses this hash, and it runs at about 15 GB/sec on my machine >> (Xeon E3-1270 V2 @ 3.50Ghz) > > Thanks for the suggestion, I'll take a look. > > Btw, it's not in kernel yet, is it? > The best option would be blake2b, but it isn't implemented in the kernel. It is not a problem to use it locally (I can upload the code stripped for usage in kernel). from https://blake2.net/ Q: Why do you want BLAKE2 to be fast? Aren't fast hashes bad? A: You want your hash function to be fast if you are using it to compute the secure hash of a large amount of data, such as in distributed filesystems (e.g. Tahoe-LAFS), cloud storage systems (e.g. OpenStack Swift), intrusion detection systems (e.g. Samhain), integrity-checking local filesystems (e.g. ZFS), peer-to-peer file-sharing tools (e.g. BitTorrent), or version control systems (e.g. git). You only want your hash function to be slow if you're using it to "stretch" user-supplied passwords, in which case see the next question. https://blake2.net/ https://github.com/floodyberry/blake2b-opt Best regards, Daniel -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH] Btrfs: add sha256 checksum option
On Mon, Nov 24, 2014 at 11:34:46AM -0800, John Williams wrote:
> On Mon, Nov 24, 2014 at 12:23 AM, Holger Hoffstätte wrote:
>
> > Would there be room for a compromise with e.g. 128 bits?
>
> For example, Spooky V2 hash is 128 bits and is very fast. It is
> noncryptographic, but it is more than adequate for data checksums.
>
> http://burtleburtle.net/bob/hash/spooky.html
>
> SnapRAID uses this hash, and it runs at about 15 GB/sec on my machine
> (Xeon E3-1270 V2 @ 3.50Ghz)

Thanks for the suggestion, I'll take a look.

Btw, it's not in kernel yet, is it?

thanks,
-liubo
Re: [RFC PATCH] Btrfs: add sha256 checksum option
On Mon, Nov 24, 2014 at 08:23:25AM +, Holger Hoffstätte wrote:
> On Mon, 24 Nov 2014 13:23:05 +0800, Liu Bo wrote:
>
> > This brings a strong-but-slow checksum algorithm, sha256.
> >
> > Actually btrfs used sha256 at the early time, but then moved to crc32c for
> > performance purposes.
> >
> > As crc32c is sort of weak due to its hash collision issue, we need a
> > stronger algorithm as an alternative.
>
> I'm curious - did you see actual cases where this happened, i.e. a corrupt
> block that would pass crc32 validation? I know some high-integrity use
> cases require a stronger algorithm - just wondering.

Haven't seen that so far, but here is a link about a crc32c hash collision
in btrfs: http://lwn.net/Articles/529077/. It's not about the data checksum,
though: btrfs's DIR_ITEM also uses the crc32c hash. If those happened to be
data blocks, something interesting would happen.

> Would there be room for a compromise with e.g. 128 bits?

Yeah, we're good if it's not larger than 256 bits.

> > Users can choose sha256 from mkfs.btrfs via
> >
> > $ mkfs.btrfs -C 256 /device
>
> Not sure how others feel about this, but it's probably easier for
> sysadmins to specify the algorithm by name from the set of supported
> ones, similar to how ssh does it ("ssh -C arcfour256").

Urr, my bad, I've made it locally but didn't 'git add' them.

thanks,
-liubo
Re: [RFC PATCH] Btrfs: add sha256 checksum option
On Mon, Nov 24, 2014 at 08:58:25PM +, Hugo Mills wrote: > On Mon, Nov 24, 2014 at 03:07:45PM -0500, Chris Mason wrote: > > On Mon, Nov 24, 2014 at 12:23 AM, Liu Bo wrote: > > >This brings a strong-but-slow checksum algorithm, sha256. > > > > > >Actually btrfs used sha256 at the early time, but then moved to > > >crc32c for > > >performance purposes. > > > > > >As crc32c is sort of weak due to its hash collision issue, we need > > >a stronger > > >algorithm as an alternative. > > > > > >Users can choose sha256 from mkfs.btrfs via > > > > > >$ mkfs.btrfs -C 256 /device > > > > Agree with others about -C 256...-C sha256 is only three letters more ;) > > > > What's the target for this mode? Are we trying to find evil people > > scribbling on the drive, or are we trying to find bad hardware? > >You're going to need a hell of a lot more infrastructure to deal > with the first of those two cases. If someone can write arbitrary data > to your storage without going through the filesystem, you've already > lost the game. If the filesystem can be arranged as a Merkle tree then you can store a copy of the root SHA256 with a signature to detect arbitrary tampering. Of course the magnitude of the "If" in that sentence is startlingly large, especially if you are starting from where btrfs is now. ;) >I don't know what the stats are like for random error detection > (probably just what you'd expect in the naive case -- 1/2^n chance of > failing to detect an error for an n-bit hash). More bits likely are > better for that, but how much CPU time do you want to burn on it? crc64 should be more than adequate for simple disk corruption errors. crc32's error rate works out to one false positive per dozen megabytes *of random errors*, and crc64 FP rate is a few billion times lower (one FP per petabyte or so). If you have the kind of storage subsystem that corrupts a petabyte of data, it'd be amazing if you could get anything out of your filesystem at all. 
>    I could see this possibly being useful for having fewer false
> positives when using the inbuilt checksums for purposes of dedup.

Even then it's massive overkill. A 16TB filesystem will average about one
hash collision from a good 64-bit hash. Compared to a 256bit hash, you'd be
continuously maintaining a data structure on disk that is 96GB larger than
it has to be to save an average of *one* 4K read during a full-filesystem
dedup. If your users are filling your disks with data blocks that all have
the same 64-bit hash (with any algorithm), SHA256 could be more
attractive...but you'd probably still be OK at half that size.

>    Hugo.
>
> --
> Hugo Mills             | That's not rain, that's a lake with slots in it
> hugo@... carfax.org.uk |
> http://carfax.org.uk/  |
> PGP: 65E74AC0          |
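The dedup figures above can be checked with a short back-of-the-envelope sketch. It assumes 4 KiB blocks, a full 16 TiB filesystem (2^32 blocks), uniformly distributed hash values, and one stored hash per block; the function names are made up for this illustration. The simple birthday estimate n(n-1)/2 / 2^b gives roughly 0.5 colliding pairs for a 64-bit hash, which is the same order as the "about one" quoted above, and the extra storage for 256-bit over 64-bit hashes comes out at 24 bytes per block, i.e. 96 GiB.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Birthday estimate: expected number of colliding pairs among `nblocks`
 * uniformly random hashes of `hash_bits` bits, n*(n-1)/2 / 2^hash_bits.
 */
double expected_collisions(uint64_t nblocks, unsigned int hash_bits)
{
	double n = (double)nblocks;
	double pairs = n * (n - 1.0) / 2.0;
	unsigned int i;

	for (i = 0; i < hash_bits; i++)
		pairs *= 0.5;	/* divide by 2^hash_bits without needing libm */
	return pairs;
}

/* Extra bytes needed to store `wide_bits` instead of `narrow_bits` per block. */
uint64_t csum_overhead_bytes(uint64_t nblocks, unsigned int wide_bits,
			     unsigned int narrow_bits)
{
	assert(wide_bits >= narrow_bits);
	return nblocks * ((wide_bits - narrow_bits) / 8);
}
```

For example, `expected_collisions(1ULL << 32, 64)` is just under 0.5, and `csum_overhead_bytes(1ULL << 32, 256, 64)` is 96 GiB.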
Re: [RFC PATCH] Btrfs: add sha256 checksum option
-------- Original Message --------
Subject: Re: [RFC PATCH] Btrfs: add sha256 checksum option
From: Hugo Mills
To: Chris Mason
Date: 2014-11-25 04:58

> On Mon, Nov 24, 2014 at 03:07:45PM -0500, Chris Mason wrote:
>> On Mon, Nov 24, 2014 at 12:23 AM, Liu Bo wrote:
>>> This brings a strong-but-slow checksum algorithm, sha256.
>>>
>>> Actually btrfs used sha256 at the early time, but then moved to
>>> crc32c for performance purposes.
>>>
>>> As crc32c is sort of weak due to its hash collision issue, we need
>>> a stronger algorithm as an alternative.
>>>
>>> Users can choose sha256 from mkfs.btrfs via
>>>
>>> $ mkfs.btrfs -C 256 /device
>>
>> Agree with others about -C 256...-C sha256 is only three letters more ;)
>>
>> What's the target for this mode? Are we trying to find evil people
>> scribbling on the drive, or are we trying to find bad hardware?
>
> You're going to need a hell of a lot more infrastructure to deal
> with the first of those two cases. If someone can write arbitrary data
> to your storage without going through the filesystem, you've already
> lost the game.
>
> I don't know what the stats are like for random error detection
> (probably just what you'd expect in the naive case -- 1/2^n chance of
> failing to detect an error for an n-bit hash). More bits likely are
> better for that, but how much CPU time do you want to burn on it?

Agree with this, sha256's extra CPU usage does not seem worth it.

About the csum algorithm, personally I prefer an algorithm with better
error detection: one that covers not only the integrity of the whole data,
but also the range in which the error lies.

If btrfs can know, for example, which 4K or 2K block the error lies in, it
can drop only that range of data rather than the whole tree block, which
would be a great help for later btrfsck work.

From this point of view, 4 crc32 values for a 16K leaf/node (1 crc32 per
4K) may be more productive than a single sha256.

Thanks,
Qu

> I could see this possibly being useful for having fewer false
> positives when using the inbuilt checksums for purposes of dedup.
>
> Hugo.
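The "one crc32 per 4K of a 16K node" idea above can be sketched as follows. This is an illustration only, under several assumptions: the node is a plain in-memory buffer, the checksum array is kept separately instead of inside the node header, and the CRC is a textbook bitwise CRC-32C (reflected Castagnoli polynomial 0x82F63B78 with the usual initial and final inversion) that is not claimed to be byte-for-byte what btrfs stores on disk. The names `node_csum` and `node_bad_subblocks` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NODE_SIZE 16384u
#define SUB_SIZE   4096u
#define NR_SUB    (NODE_SIZE / SUB_SIZE)

/* Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78. */
uint32_t crc32c(const uint8_t *buf, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;
	size_t i;
	int bit;

	assert(buf != NULL || len == 0);
	for (i = 0; i < len; i++) {
		crc ^= buf[i];
		for (bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/* Compute one checksum per 4K sub-block of a 16K node. */
void node_csum(const uint8_t *node, uint32_t sums[NR_SUB])
{
	unsigned int i;

	for (i = 0; i < NR_SUB; i++)
		sums[i] = crc32c(node + (size_t)i * SUB_SIZE, SUB_SIZE);
}

/* Return a bitmask with bit i set if sub-block i fails verification. */
unsigned int node_bad_subblocks(const uint8_t *node, const uint32_t sums[NR_SUB])
{
	unsigned int i, bad = 0;

	for (i = 0; i < NR_SUB; i++)
		if (crc32c(node + (size_t)i * SUB_SIZE, SUB_SIZE) != sums[i])
			bad |= 1u << i;
	return bad;
}
```

With a single checksum over the whole node, a mismatch only says "somewhere in these 16K"; the bitmask narrows a single corrupted byte down to one 4K range, which is the property a repair tool could use to salvage the other three quarters.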
Re: [RFC PATCH] Btrfs: add sha256 checksum option
On Mon, Nov 24, 2014 at 03:07:45PM -0500, Chris Mason wrote:
> On Mon, Nov 24, 2014 at 12:23 AM, Liu Bo wrote:
> >This brings a strong-but-slow checksum algorithm, sha256.
> >
> >Actually btrfs used sha256 at the early time, but then moved to
> >crc32c for performance purposes.
> >
> >As crc32c is sort of weak due to its hash collision issue, we need
> >a stronger algorithm as an alternative.
> >
> >Users can choose sha256 from mkfs.btrfs via
> >
> >$ mkfs.btrfs -C 256 /device
>
> Agree with others about -C 256...-C sha256 is only three letters more ;)
>
> What's the target for this mode? Are we trying to find evil people
> scribbling on the drive, or are we trying to find bad hardware?

   You're going to need a hell of a lot more infrastructure to deal
with the first of those two cases. If someone can write arbitrary data
to your storage without going through the filesystem, you've already
lost the game.

   I don't know what the stats are like for random error detection
(probably just what you'd expect in the naive case -- 1/2^n chance of
failing to detect an error for an n-bit hash). More bits likely are
better for that, but how much CPU time do you want to burn on it?

   I could see this possibly being useful for having fewer false
positives when using the inbuilt checksums for purposes of dedup.

   Hugo.

--
Hugo Mills             | That's not rain, that's a lake with slots in it
hugo@... carfax.org.uk |
http://carfax.org.uk/  |
PGP: 65E74AC0          |
Re: [RFC PATCH] Btrfs: add sha256 checksum option
On Mon, Nov 24, 2014 at 12:23 AM, Liu Bo wrote: This brings a strong-but-slow checksum algorithm, sha256. Actually btrfs used sha256 at the early time, but then moved to crc32c for performance purposes. As crc32c is sort of weak due to its hash collision issue, we need a stronger algorithm as an alternative. Users can choose sha256 from mkfs.btrfs via $ mkfs.btrfs -C 256 /device Agree with others about -C 256...-C sha256 is only three letters more ;) What's the target for this mode? Are we trying to find evil people scribbling on the drive, or are we trying to find bad hardware? -chris -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH] Btrfs: add sha256 checksum option
On Mon, Nov 24, 2014 at 12:23 AM, Holger Hoffstätte wrote:
> Would there be room for a compromise with e.g. 128 bits?

For example, Spooky V2 hash is 128 bits and is very fast. It is
noncryptographic, but it is more than adequate for data checksums.

http://burtleburtle.net/bob/hash/spooky.html

SnapRAID uses this hash, and it runs at about 15 GB/sec on my machine
(Xeon E3-1270 V2 @ 3.50Ghz)
Re: [RFC PATCH] Btrfs: add sha256 checksum option
Holger Hoffstätte posted on Mon, 24 Nov 2014 08:23:25 + as excerpted: >> Users can choose sha256 from mkfs.btrfs via >> >> $ mkfs.btrfs -C 256 /device > > Not sure how others feel about this, but it's probably easier for > sysadmins to specify the algorithm by name from the set of supported > ones, similar to how ssh does it ("ssh -C arcfour256"). Yes. Simply 256 is waaay too generic for me to be comfortable with. 256 /what/? -- Duncan - List replies preferred. No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH] Btrfs: add sha256 checksum option
On Mon, 24 Nov 2014 13:23:05 +0800, Liu Bo wrote: > This brings a strong-but-slow checksum algorithm, sha256. > > Actually btrfs used sha256 at the early time, but then moved to crc32c for > performance purposes. > > As crc32c is sort of weak due to its hash collision issue, we need a stronger > algorithm as an alternative. I'm curious - did you see actual cases where this happened, i.e. a corrupt block that would pass crc32 validation? I know some high-integrity use cases require a stronger algorithm - just wondering. Would there be room for a compromise with e.g. 128 bits? > Users can choose sha256 from mkfs.btrfs via > > $ mkfs.btrfs -C 256 /device Not sure how others feel about this, but it's probably easier for sysadmins to specify the algorithm by name from the set of supported ones, similar to how ssh does it ("ssh -C arcfour256"). cheers Holger -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
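Selecting the algorithm by name, as suggested above, could be handled with a small lookup table. The sketch below is hypothetical: the struct `csum_desc`, the function `csum_lookup`, and the numeric type values are invented for illustration and are not taken from mkfs.btrfs or from this patch. Only the digest sizes (4 bytes for crc32c, 32 bytes for sha256) are facts about the algorithms themselves.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of one supported checksum algorithm. */
struct csum_desc {
	const char *name;	/* what the user types, e.g. "sha256" */
	int type;		/* illustrative type id, not an on-disk value */
	unsigned int size;	/* digest size in bytes */
};

static const struct csum_desc csum_table[] = {
	{ "crc32c", 0, 4 },
	{ "sha256", 1, 32 },
};

/* Look up an algorithm by name; returns NULL if it is not supported. */
const struct csum_desc *csum_lookup(const char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < sizeof(csum_table) / sizeof(csum_table[0]); i++)
		if (strcmp(csum_table[i].name, name) == 0)
			return &csum_table[i];
	return NULL;
}
```

With a table like this, a bare "256" is rejected instead of being silently interpreted, and adding another algorithm later means adding one row.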
[RFC PATCH] Btrfs: add sha256 checksum option
This brings a strong-but-slow checksum algorithm, sha256.

Actually btrfs used sha256 at the early time, but then moved to crc32c for
performance purposes.

As crc32c is sort of weak due to its hash collision issue, we need a stronger
algorithm as an alternative.

Users can choose sha256 from mkfs.btrfs via

$ mkfs.btrfs -C 256 /device

Signed-off-by: Liu Bo
---
 fs/btrfs/Kconfig            |   1 +
 fs/btrfs/check-integrity.c  |  13 --
 fs/btrfs/compression.c      |  30 ++---
 fs/btrfs/ctree.h            |   8 +++-
 fs/btrfs/disk-io.c          | 106
 fs/btrfs/disk-io.h          |   2 -
 fs/btrfs/file-item.c        |  25 +--
 fs/btrfs/free-space-cache.c |   8 ++--
 fs/btrfs/hash.c             |  47
 fs/btrfs/hash.h             |   9 +++-
 fs/btrfs/inode.c            |  21 +
 fs/btrfs/ordered-data.c     |  10 +++--
 fs/btrfs/ordered-data.h     |   9 ++--
 fs/btrfs/scrub.c            |  67 +---
 14 files changed, 237 insertions(+), 119 deletions(-)

diff --git a/fs/btrfs/Kconfig b/fs/btrfs/Kconfig
index a66768e..0a4f9e7 100644
--- a/fs/btrfs/Kconfig
+++ b/fs/btrfs/Kconfig
@@ -2,6 +2,7 @@ config BTRFS_FS
 	tristate "Btrfs filesystem support"
 	select CRYPTO
 	select CRYPTO_CRC32C
+	select CRYPTO_SHA256
 	select ZLIB_INFLATE
 	select ZLIB_DEFLATE
 	select LZO_COMPRESS
diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c
index cb7f3fe..98e1037 100644
--- a/fs/btrfs/check-integrity.c
+++ b/fs/btrfs/check-integrity.c
@@ -1808,8 +1808,11 @@ static int btrfsic_test_for_metadata(struct btrfsic_state *state,
 {
 	struct btrfs_header *h;
 	u8 csum[BTRFS_CSUM_SIZE];
-	u32 crc = ~(u32)0;
 	unsigned int i;
+	struct {
+		struct shash_desc shash;
+		char ctx[crypto_shash_descsize(state->root->fs_info->csum_tfm)];
+	} desc;
 
 	if (num_pages * PAGE_CACHE_SIZE < state->metablock_size)
 		return 1; /* not metadata */
@@ -1819,14 +1822,18 @@ static int btrfsic_test_for_metadata(struct btrfsic_state *state,
 	if (memcmp(h->fsid, state->root->fs_info->fsid, BTRFS_UUID_SIZE))
 		return 1;
 
+	desc.shash.tfm = state->root->fs_info->csum_tfm;
+	desc.shash.flags = 0;
+	crypto_shash_init(&desc.shash);
+
 	for (i = 0; i < num_pages; i++) {
 		u8 *data = i ? datav[i] : (datav[i] + BTRFS_CSUM_SIZE);
 		size_t sublen = i ? PAGE_CACHE_SIZE :
 				    (PAGE_CACHE_SIZE - BTRFS_CSUM_SIZE);
 
-		crc = btrfs_crc32c(crc, data, sublen);
+		crypto_shash_update(&desc.shash, data, sublen);
 	}
-	btrfs_csum_final(crc, csum);
+	crypto_shash_final(&desc.shash, csum);
 	if (memcmp(csum, h->csum, state->csum_size))
 		return 1;
 
diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index d3220d3..d10883f 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -78,7 +78,7 @@ struct compressed_bio {
 	 * the start of a variable length array of checksums only
 	 * used by reads
 	 */
-	u32 sums;
+	u8 sums[];
 };
 
 static int btrfs_decompress_biovec(int type, struct page **pages_in,
@@ -111,31 +111,29 @@ static int check_compressed_csum(struct inode *inode,
 	struct page *page;
 	unsigned long i;
 	char *kaddr;
-	u32 csum;
-	u32 *cb_sum = &cb->sums;
+	u8 csum[BTRFS_CSUM_SIZE];
+	u8 *cb_sum = cb->sums;
+	struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
+	u16 csum_size = btrfs_super_csum_size(fs_info->super_copy);
 
 	if (BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM)
 		return 0;
 
 	for (i = 0; i < cb->nr_pages; i++) {
 		page = cb->compressed_pages[i];
-		csum = ~(u32)0;
 
 		kaddr = kmap_atomic(page);
-		csum = btrfs_csum_data(kaddr, csum, PAGE_CACHE_SIZE);
-		btrfs_csum_final(csum, (char *)&csum);
+		btrfs_csum(fs_info, kaddr, PAGE_CACHE_SIZE, csum);
 		kunmap_atomic(kaddr);
 
-		if (csum != *cb_sum) {
+		if (memcmp(csum, cb_sum, csum_size)) {
 			btrfs_info(BTRFS_I(inode)->root->fs_info,
-			   "csum failed ino %llu extent %llu csum %u wanted %u mirror %d",
-			   btrfs_ino(inode), disk_start, csum, *cb_sum,
-			   cb->mirror_num);
+			   "csum failed ino %llu extent %llu mirror %d",
+			   btrfs_ino(inode), disk_start, cb->mirror_num);
 			ret = -EIO;
 			goto fail;
 		}
-		cb_sum++;
-
+		cb_sum += csum_size;
 	}
 	ret = 0;
 fail:
@@ -578,7 +576,8 @@ int btrfs_submit_compressed_