On Tue, Oct 16, 2012 at 02:27:51PM -0400, Jeff King wrote:
> > The one reason why we *might* want to use SHA-3, BTW, is that it is a
> > radically different design from SHA-1 and SHA-2. And if there is a
> > crypto hash failure which is bad enough that the security of git would
> > be affected, there's a chance that the same attack could significantly
> > affect SHA-2 as well. The fact that SHA-3 is fundamentally different
> > from a cryptographic design perspective means that an attack that
> > impacts SHA-1/SHA-2 will not likely impact SHA-3, and vice versa.
>
> Right. The point of having the SHA-3 contest was that we thought SHA-1's
> breakage meant that SHA-2 was going to fall next. But Schneier's
> comments before the winners were announced were basically "it turns out
> that SHA-2 is not broken like we thought, so there's no reason to ditch
> it, and the fact that it is well-studied and well-deployed may mean it's
> a good choice".
>
> So I could go either way. This is not a decision we should make today,
> though, so we can wait and see which direction the world goes before
> picking an algorithm.
Do you really need to pick an algorithm and go through a full-on flag day ten years down the road all over again?

People don't really care that a git revision is actually the hex-encoded SHA1 hash of a commit object. They just know it's this long string of "stuff" that uniquely identifies a revision globally somehow. They know that if they copy and paste the first few characters of the string there is a small chance two revisions will have the same first few characters, and if they copy and paste the whole string the chance drops to "your whole dev team will be eaten by wolves in tragic unrelated incidents" unlikely.

So why bake in a single algorithm? We'll have to extend the length of a whole revision string anyway (the alternatives start at 256 bits) and people are going to want to be able to specify the whole revision string at least sometimes. Once you've gone through that pain, why repeat it in ten years?

Let's make revisions be a long but variable-length string. A revision by itself is meaningless, of course. However, if you know of a repo that contains that revision, you can convert it into something useful, like a commit and associated tree. If you don't know of one, well, you'd be stuck anyway right now.

Now when you push to and pull from a remote repo, the repo will figure out which hash algorithm(s) your client supports. A Git 3000 user with a repo using SHA3072 can talk to a v0.1 client just fine: the Git 3000 side sends the v0.1 client revisions calculated with an algorithm that client supports, and when it pulls revisions from the v0.1 repo it calculates new revisions with its own preferred algorithm. If they want to do this a lot, they maintain the two sets of digest tables next to each other, with the SHA3072 table marked as preferred, and the rest kept only so pushes and pulls can be fast.
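To make the negotiation and the side-by-side digest tables a bit more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the names `PREFERRED`, `negotiate` and `DigestTables` are made up, `sha3_256` merely stands in for the hypothetical "SHA3072", and it hashes raw bytes rather than git's real object format.

```python
import hashlib

# Hypothetical local preference order, strongest first (hashlib algorithm names).
PREFERRED = ["sha3_256", "sha256", "sha1"]


def negotiate(local_preference, remote_supported):
    """Return the algorithms both sides support, in local preference order."""
    remote = set(remote_supported)
    return [name for name in local_preference if name in remote]


class DigestTables:
    """One digest table per algorithm, all indexing the same stored objects."""

    def __init__(self, algorithms):
        self.algorithms = list(algorithms)  # first entry is the preferred one
        self.tables = {name: {} for name in self.algorithms}
        self.objects = []

    def add(self, content):
        """Store content and record its digest under every algorithm."""
        index = len(self.objects)
        self.objects.append(content)
        revisions = {}
        for name in self.algorithms:
            digest = hashlib.new(name, content).hexdigest()
            self.tables[name][digest] = index
            revisions[name] = digest
        return revisions

    def lookup(self, digest):
        """Find an object by any of its digests, old or new."""
        for name in self.algorithms:
            index = self.tables[name].get(digest)
            if index is not None:
                return self.objects[index]
        raise KeyError(digest)
```

With tables like these, the newer side would answer an old client using the first entry of `negotiate(PREFERRED, ["sha1"])`, while still resolving its own objects through the preferred table.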
In most cases a project will convert to one hash algorithm, but by having multi-hash support that conversion doesn't have to be a flag day, and at the same time it's still easy to look up old revisions by their old digests. Meanwhile the crypto-wonks get to have their fun PGP signing and timestamping long, secure digests.

Note that we don't even have to shut out non-upgraded users from participating. Machine-to-machine communication is not a problem, as outlined above, but even with stuff like mailing lists we can start passing around concatenated revisions like the following:

    da39a3ee5e6b4b0d3255bfef95601890afd80709.e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855

Old users just use the first part (the period isn't even required, really).

If you think that's too long, there's a simple solution that keeps <your requirements>-bit security, albeit one whose implications lead you right to Linus's line of thinking:

    da39a3ee5e6b4b0d3255bfef95601890afd80709

That's just a SHA1 again. Of course, if you actually care about this stuff you already have cryptographic infrastructure, and that infrastructure can simply store *trusted* metadata in your repos saying that the string 'foo' happens to be a valid alias for the actual digest, one that *the user* can specify instead of that digest. It may even be that for your security needs just timestamping those aliases is enough. Either way, while something needs to be calculating secure hashes (preferably Git mainline, so push and pull work without having to examine every last line of code), you can get away without changing the UI very much.

Anyway, in the short term the people who care can write parallel digest calculators; I personally have a use-case for one right now. Better code to handle the cases where individual blobs have colliding hashes is required as well in the medium term.
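A parallel digest calculator that emits concatenated revisions could look roughly like the Python sketch below. The function names are hypothetical, and it hashes the raw bytes it is given rather than git's actual object encoding, so treat it as an illustration of the string format only.

```python
import hashlib


def concatenated_revision(content, algorithms=("sha1", "sha256")):
    """Join the hex digests of content, oldest algorithm first, with periods."""
    return ".".join(hashlib.new(name, content).hexdigest() for name in algorithms)


def legacy_part(revision, legacy_hex_len=40):
    """What a non-upgraded reader uses: the leading SHA1-sized hex prefix.

    Slicing by length is why the period is not strictly required.
    """
    return revision[:legacy_hex_len]


def split_revision(revision):
    """Split a period-separated revision into its individual digests."""
    return revision.split(".")
```

For empty input, `concatenated_revision(b"")` reproduces the example string above, since its two halves are the SHA1 and SHA-256 digests of the empty string, and `legacy_part` returns the 40-character SHA1 half whether or not the period is present.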
Finally, those who require it could very well write parallel gits to do the pulling and pushing of their parallel-calculated revision hashes, if they really wanted to. But if this problem gets to the point where git-core has to change, the reality is that organizations are not going to be happy if it has to be a big flag day. Git v2/v3 interoperability *will* be implemented. Once we're at that point, let's make sure Git 5.0's big feature isn't SHA5000.

-- 
'peter'[:-1]@petertodd.org