On Jan 24, 2008, at 11:32 AM, David Roundy wrote: > All data in the hashed format is hashed. Darcs doesn't implement any > checking of signatures, but you could (relatively) easily do so by > hand. Just sign _darcs/hashed_inventory, and if the signature is > valid and the repository is consistent (which darcs automatically > checks for any portion of the repository that it accesses), then the > repository hasn't been tampered with (since it was signed, anyhow). > > As far as what the guarantee is, all contents of the repository > (except _darcs/prefs/ and of course the working directory) are > accessed by hashes stored in that one file.
So, let me see if I understand the issues here. 1. Darcs needs to have short identifiers for a patch. Is it required, or merely tolerable that the same patch (contents and metadata) yield the same identifier? Or is it a bug, in which the independent generation of an identical patch will result in undefined behavior? Let's call this property "patch id determined by patch". 2. It would be nice, but isn't currently used, if one could rely on the property that for a given patch id, nobody can come up with another patch that has the same id. This would be nice because in the future we might use this to securely identify patches and entire repositories. This property is called "second pre-image resistance". 3. Likewise, it would be nice if it were impossible for someone to come up with two different patches that had the same patch id. This is called "collision resistance". 4. Someday, someone might want to expose their patch ids without also thus exposing information about their patches. This is called "pre-image resistance". 5. Inasmuch as anyone might want to rely on any of these properties, they might want to do so years down the road. They might want to take darcs patches or darcs repositories that were generated in 2008, and rely on these properties in 2015 or even later. Now it is important to note that SHA-1 does *not* have collision resistance today, and there is reason to suspect that it doesn't have other security properties either. The more years into the future we look, the less likely it is that the user will be willing to rely on SHA-1. Let's enumerate a few options for what darcs can do now: 1. Use randomly generated patch identifiers. This means that the identifiers don't have any of the properties described, but they are easy to implement, fast to produce (practically instantaneous), and at least we know that they don't have these properties so we don't accidentally rely on them. 2. Use a CRC like ZFS does. This means that identifiers have "patch id determined by patch", but don't have the other properties. Again, this is easy to implement, fast to compute (let's say around 1 GiB/ sec), and has the virtue of not fooling us into thinking we have a security guarantee that we don't. 3. Use MD5. This is easy to implement (because MD5 is widely used), fast to compute (370 MiB/sec), and has the dubious virtue of being mostly well understood as being insecure. 4. Use SHA-1. This is easy to implement (because SHA-1 is widely used), a bit slower to compute (220 MiB/sec), and has the regrettable misfeature of being widely misunderstood (e.g. by Linus) as being secure. 5. Use SHA-256. This is hard to implement (I'm assuming there aren't already Haskell bindings for it), even slower to compute (110 MiB/sec), but it is the first of these options has a chance of allowing people in the future to rely on the security properties. 6. Use Tiger. This is hard to implement (I'm assuming there aren't already Haskell bindings for it), relatively fast (320 MiB/sec), and might or might not offer security ten or twenty years from now. I must say that option 4 is the worst of the bunch. Except for the implementation difficulty (of which I know little), any other option would provide a better trade-off. Regards, Zooko _______________________________________________ darcs-devel mailing list darcs-devel@darcs.net http://lists.osuosl.org/mailman/listinfo/darcs-devel