Re: [9fans] In case anyone worries about block hash collision in venti
> > If you are really paranoid and don't want any collisions in the next > > 10 years: don't let strangers in your venti. > > Which, to close the circle, as Tim points out, you are always doing, > each time you receive an email :-) not all email is from strangers. in mbox format, messages are concatinated. for a block size of b, the size of the existing mailbox is 0 ... b-1 mod b. therefore, there is a 1/b chance of the attack succeeding, even if the attacker knows the block size and pads to the next block. i suppose that this is a fair assumption, since we're given that the attacker is attacking venti. this is esentially the same reason that mbox format will eat alot of storage, even if you're using venti. - erik
Re: [9fans] In case anyone worries about block hash collision in venti
On Tue, Feb 9, 2010 at 5:13 AM, hiro <23h...@googlemail.com> wrote: > If you are really paranoid and don't want any collisions in the next > 10 years: don't let strangers in your venti. Which, to close the circle, as Tim points out, you are always doing, each time you receive an email :-) ron
Re: [9fans] In case anyone worries about block hash collision in venti
>I feel compelled to add that hardware >uncorrectable bitflips are still reported as erasures, whereas venti >collisions are reported as success and only caught if somebody's doing >checksumming at larger granularity. Uncorrectable bitflips can also be unreported. If you are really paranoid and don't want any collisions in the next 10 years: don't let strangers in your venti. On 2/9/10, Nathaniel W Filardo wrote: > On Sun, Feb 07, 2010 at 10:08:39PM +, matt wrote: >> not only has someone got to find a collision during a tiny timeframe, >> they also have to fit it in 8k > > MD5 collisions can, apparently, be constructed in 24 hours on a laptop these > days. Yes, it's a constraint, but. > > Venti supports larger blocks than 8K, and even if it didn't, the exhibited > MD5 collisions to date are short strings and would fit. Working with larger > data feels like it would be unlikely to be advantageous to the collision > search -- 8K is much more than 20 bytes, and so there should be plenty of > pidgeons to cram into holes. > > In any case, I fear that the point has been lost in details. Something > about forests and trees. So: > > The point was not to say that you shouldn't be using venti; it's a lovely > idea and the code does its job somewhat well, bugs aside. The point was to > object to the kind of analysis which postulates that you need to store 2^98 > blocks with a 256 bit hash (or 2^50 for 160-bit hashes) before collisions > start be meaningful. > > If you're storing iid uniformly-selected-at-random blocks of iid uniform > noise then yes, that's true, but you're not and besides the instantiation of > the random oracle model you're using (SHA-1) isn't perfect. As such, we > should expect to see collisions after many fewer blocks stored -- strictly, > the iid uniform-at-random selection criterion yields an upper bound. > > One way to demonstrate this upper-bound nature is to exhibit work on hash > collisions and the reduction in estimated security margins. To take an > extreme case, the Fletcher checksum or your favorite CRC polynomial can be > extended to be perfectly valid 160 bit hash, but the lack of cryptographic > security makes it laughably easy to collide blocks. Surely you'd be dubious > of a Venti where I replaced SHA-1 with such an extended Fletcher? (It'd be > faster, too!) Why? Selection of blocks iid uniformly-at-random should mean > that it will take exactly as many blocks as before, before you see a > problem... SHA-1 is not laughably easy to break by any stretch of the > imagination, but it also isn't the "impossibly hard" you'd get in the random > oracle model either. > > (While nitpicking about the analysis, I feel compelled to add that hardware > uncorrectable bitflips are still reported as erasures, whereas venti > collisions are reported as success and only caught if somebody's doing > checksumming at larger granularity. So at least in the one case you know > you're in trouble. Collisions in venti act as if the disk corrupted so many > bits that we move into the correctable region for a different codeword.) > > --nwf; >
Re: [9fans] In case anyone worries about block hash collision in venti
On Sun, Feb 07, 2010 at 10:08:39PM +, matt wrote: > not only has someone got to find a collision during a tiny timeframe, > they also have to fit it in 8k MD5 collisions can, apparently, be constructed in 24 hours on a laptop these days. Yes, it's a constraint, but. Venti supports larger blocks than 8K, and even if it didn't, the exhibited MD5 collisions to date are short strings and would fit. Working with larger data feels like it would be unlikely to be advantageous to the collision search -- 8K is much more than 20 bytes, and so there should be plenty of pidgeons to cram into holes. In any case, I fear that the point has been lost in details. Something about forests and trees. So: The point was not to say that you shouldn't be using venti; it's a lovely idea and the code does its job somewhat well, bugs aside. The point was to object to the kind of analysis which postulates that you need to store 2^98 blocks with a 256 bit hash (or 2^50 for 160-bit hashes) before collisions start be meaningful. If you're storing iid uniformly-selected-at-random blocks of iid uniform noise then yes, that's true, but you're not and besides the instantiation of the random oracle model you're using (SHA-1) isn't perfect. As such, we should expect to see collisions after many fewer blocks stored -- strictly, the iid uniform-at-random selection criterion yields an upper bound. One way to demonstrate this upper-bound nature is to exhibit work on hash collisions and the reduction in estimated security margins. To take an extreme case, the Fletcher checksum or your favorite CRC polynomial can be extended to be perfectly valid 160 bit hash, but the lack of cryptographic security makes it laughably easy to collide blocks. Surely you'd be dubious of a Venti where I replaced SHA-1 with such an extended Fletcher? (It'd be faster, too!) Why? Selection of blocks iid uniformly-at-random should mean that it will take exactly as many blocks as before, before you see a problem... SHA-1 is not laughably easy to break by any stretch of the imagination, but it also isn't the "impossibly hard" you'd get in the random oracle model either. (While nitpicking about the analysis, I feel compelled to add that hardware uncorrectable bitflips are still reported as erasures, whereas venti collisions are reported as success and only caught if somebody's doing checksumming at larger granularity. So at least in the one case you know you're in trouble. Collisions in venti act as if the disk corrupted so many bits that we move into the correctable region for a different codeword.) --nwf; pgpKRQORZwz0C.pgp Description: PGP signature
Re: [9fans] In case anyone worries about block hash collision in venti
About a year ago i wrote a (kind of vapourware) backup system called Baccus, based on content addressed storage. Most ideas are stolen from Plan9/venti, but for the here discussed reasons i used the Salsa family of hashes from Dan Bernstein: http://cr.yp.to/chacha.html Respectively the Rumba-"compression": http://cr.yp.to/rumba20.html I combined hashing and encryption with Salsa/Rumba into one step. The hash function in Baccus is pluggable, so the user could decide which to use and would be able to upgrade to a stronger hash. Maybe pluggability of the hash function would be a nice addition to venti (if it is not there anyways). Also Salsa should be considered a valuable addition to Plan9. Regards, Jorge-León P.S.: Here is the link to Baccus: http://wiki.tcl.tk/23064, but beware: it is in a bad state and style. Didn't have time to improve since then. If you still want to look at it, start with reading the CREDITS file. PS2: You need at least eight rounds, else you get lots of hash-collisions. Tim Newsham wrote: 1. the sender can't control email headers. many transfer agents add a random transfer-id which would confound this attack. If you know the size of the transfer id, you can pad out to the next full block size. 2. if the rcpt uses mbox format, the sender can't control how your message is fit into venti blocks. the sender would need to control the entire mail box. I'm ignorant on this front. 3. http://en.wikipedia.org/wiki/SHA_hash_functions says that there have been no SHA1 collisions found. IIUC there has been significant progress in attacking all major hash functions and the cryptographic community has low confidence in all major hash functions at the moment. Some hash algorithms have more serious attacks than others, but once a few weaknesses are found its usually an indication that the algorithm will fall soon. Re: SHA1, it looks like the strenght has been whittled down to around 2^52 operations: http://www.schneier.com/blog/archives/2009/06/ever_better_cry.html I'm not saying that there is a viable attack against your SHA-indexed venti right now. I'm saying that its bunk to evaluate the storage system simply on how likely it is for a random collision to occur. The proper analysis is how hard it is for a malicious attacker to cause a collision now and in the near future. - erik Tim Newsham | www.thenewsh.com/~newsham | thenewsh.blogspot.com
Re: [9fans] In case anyone worries about block hash collision in venti
not only has someone got to find a collision during a tiny timeframe, they also have to fit it in 8k
Re: [9fans] In case anyone worries about block hash collision in venti
1. the sender can't control email headers. many transfer agents add a random transfer-id which would confound this attack. If you know the size of the transfer id, you can pad out to the next full block size. 2. if the rcpt uses mbox format, the sender can't control how your message is fit into venti blocks. the sender would need to control the entire mail box. I'm ignorant on this front. 3. http://en.wikipedia.org/wiki/SHA_hash_functions says that there have been no SHA1 collisions found. IIUC there has been significant progress in attacking all major hash functions and the cryptographic community has low confidence in all major hash functions at the moment. Some hash algorithms have more serious attacks than others, but once a few weaknesses are found its usually an indication that the algorithm will fall soon. Re: SHA1, it looks like the strenght has been whittled down to around 2^52 operations: http://www.schneier.com/blog/archives/2009/06/ever_better_cry.html I'm not saying that there is a viable attack against your SHA-indexed venti right now. I'm saying that its bunk to evaluate the storage system simply on how likely it is for a random collision to occur. The proper analysis is how hard it is for a malicious attacker to cause a collision now and in the near future. - erik Tim Newsham | www.thenewsh.com/~newsham | thenewsh.blogspot.com
Re: [9fans] In case anyone worries about block hash collision in venti
On Sun, Feb 07, 2010 at 12:44:52PM -0500, erik quanstrom wrote: > 1. the sender can't control email headers. many > transfer agents add a random transfer-id which > would confound this attack. > > 2. if the rcpt uses mbox format, the sender can't > control how your message is fit into venti blocks. > the sender would need to control the entire > mail box. Fine, so he sends the evil document as a MIME attachment and you decode it into its own file to see what it is, just as fossil takes its nightly snapshot and flings data off to venti. > 3. http://en.wikipedia.org/wiki/SHA_hash_functions > says that there have been no SHA1 collisions found. Up until relatively recently, that would have been true for MD5 as well. --nwf; pgpL8HYaPnHAN.pgp Description: PGP signature
Re: [9fans] In case anyone worries about block hash collision in venti
The e-mail trick is just an example, but the scenario is still valid. Consider an alternative scenario where an attacker is able to upload files to your server (perhaps jpg, gif, etc) via a web application or FTP server. Or perhaps, if someone were able to contribute source or a tarball by uploading it to your server, this would be an issue. Also, if a Postfix/etc server is misconfigured (or one were to be set up by the attacker) they would have far more control of the SMTP headers than you may realize. This would give them the ability to reliably predict the rest of the headers stored on disk. Especially if they've been able to see the headers from an e-mail you've previously sent through the same network. D On Sun, Feb 7, 2010 at 10:44 AM, erik quanstrom wrote: >> OK, lets assume that the attacker has the most powerful attack >> against a hash available in which he can construct a garbage >> block of data (perhaps with some control of its content) that >> hashes to a value of his choosing. Now he predicts some data >> that is likely to be written to your filesystem soon (say a >> brand knew pull update that you havent pulled yet), makes >> an email that has a data block in it that collides with that >> block, sends that email to you. Your filesystem stores it. >> Later you do a pull and venti notices that you don't have to >> store one of the blocks because it already has a block stored >> with that same hash. Now one of your files is corrupt. > > small problems with this: > > 1. the sender can't control email headers. many > transfer agents add a random transfer-id which > would confound this attack. > > 2. if the rcpt uses mbox format, the sender can't > control how your message is fit into venti blocks. > the sender would need to control the entire > mail box. > > 3. http://en.wikipedia.org/wiki/SHA_hash_functions > says that there have been no SHA1 collisions found. > > - erik > >
Re: [9fans] In case anyone worries about block hash collision in venti
> OK, lets assume that the attacker has the most powerful attack > against a hash available in which he can construct a garbage > block of data (perhaps with some control of its content) that > hashes to a value of his choosing. Now he predicts some data > that is likely to be written to your filesystem soon (say a > brand knew pull update that you havent pulled yet), makes > an email that has a data block in it that collides with that > block, sends that email to you. Your filesystem stores it. > Later you do a pull and venti notices that you don't have to > store one of the blocks because it already has a block stored > with that same hash. Now one of your files is corrupt. small problems with this: 1. the sender can't control email headers. many transfer agents add a random transfer-id which would confound this attack. 2. if the rcpt uses mbox format, the sender can't control how your message is fit into venti blocks. the sender would need to control the entire mail box. 3. http://en.wikipedia.org/wiki/SHA_hash_functions says that there have been no SHA1 collisions found. - erik
Re: [9fans] In case anyone worries about block hash collision in venti
Sorry, this is all bunk. You shouldn't be worried about an accidental collision. You should be worried about an intentional collision. Especially if your filesystem stores data that is under the attackers control such as email messages, web page caches, etc. So what you need to analyze isn't how often an accidental collision happens but how hard it is to create an intentional collision. All the popular hash algorithms have been losing ground to attackers lately. can you make this a little more concrete? i'm having trouble understanding how a email that an attacker controls is a problem. assuming the attacker can predict the headers add well enough, this implies that the attacker, given access to your venti, can retrieve an email said attacker sent. where's the problem? i don't see it yet. OK, lets assume that the attacker has the most powerful attack against a hash available in which he can construct a garbage block of data (perhaps with some control of its content) that hashes to a value of his choosing. Now he predicts some data that is likely to be written to your filesystem soon (say a brand knew pull update that you havent pulled yet), makes an email that has a data block in it that collides with that block, sends that email to you. Your filesystem stores it. Later you do a pull and venti notices that you don't have to store one of the blocks because it already has a block stored with that same hash. Now one of your files is corrupt. Now in actuality an attacker probably doesn't have this strong of an attack against your hash right now. But he might have much weaker attacks that he can use creatively to cause some collisions that lead to corruption of data. These attacks would be much harder, but with enough creativity you can do some intersting things. For example, see: http://www.win.tue.nl/hashclash/rogue-ca/ - erik Tim Newsham | www.thenewsh.com/~newsham | thenewsh.blogspot.com
Re: [9fans] In case anyone worries about block hash collision in venti
> Sorry, this is all bunk. You shouldn't be worried about > an accidental collision. You should be worried about > an intentional collision. Especially if your filesystem > stores data that is under the attackers control such as > email messages, web page caches, etc. So what you need > to analyze isn't how often an accidental collision happens > but how hard it is to create an intentional collision. > All the popular hash algorithms have been losing ground to > attackers lately. can you make this a little more concrete? i'm having trouble understanding how a email that an attacker controls is a problem. assuming the attacker can predict the headers add well enough, this implies that the attacker, given access to your venti, can retrieve an email said attacker sent. where's the problem? i don't see it yet. - erik
Re: [9fans] In case anyone worries about block hash collision in venti
http://www.c0t0d0s0.org/archives/6349-Perceived-Risk.html Sorry, this is all bunk. You shouldn't be worried about an accidental collision. You should be worried about an intentional collision. Especially if your filesystem stores data that is under the attackers control such as email messages, web page caches, etc. So what you need to analyze isn't how often an accidental collision happens but how hard it is to create an intentional collision. All the popular hash algorithms have been losing ground to attackers lately. The simple solution is to use a keyed hash rather than an unkeyed one and keep the key secret from potential attackers. Tim Newsham | www.thenewsh.com/~newsham | thenewsh.blogspot.com
[9fans] In case anyone worries about block hash collision in venti
http://www.c0t0d0s0.org/archives/6349-Perceived-Risk.html