On 03/02/2015 12:08 PM, Junio C Hamano wrote:
>> I have a
>> hazy recollection of what it would take to replace SHA-1 in git with
>> something else; it should be possible (though tricky) to do it lazily,
>> where a tree entry has bits (e.g., some of the currently unused file
>> mode bits) to denote which hash algorithm is in use for the entry.
>> However, I don't think that got past the idea stage...
>
> I think one reason why it didn't was because it would not work well.
> That "bit that tells this is a new object or old" would mean that a
> single tree can have many different object names, depending on which
> of its component entries are using that bit and which aren't.  There
> goes the "we know two trees with the same object name are identical
> without recursing into them" optimization out the window.
>
> Also it would make it impossible to do what you suggest to Joey to
> do, i.e. "exactly the same way that git does", once you start saying
> that a tree object can be encoded in more than one way,
> wouldn't it?

I was reasoning that people would rather not rewrite their whole history in order to switch checksum algorithms, and that allowing trees to be converted lazily would make things more efficient. However, I think I see your point that this doesn't work.
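
To make the objection concrete, here is a rough Python sketch (not git code). It assumes a made-up NEW_HASH_BIT flag in the entry mode and SHA-256 as the "new" algorithm; the tree layout is a simplification of git's. The same two files end up under a different tree name for every combination of converted entries:

```python
import hashlib

# Hypothetical flag borrowed from the unused mode bits: "this entry uses the new hash".
NEW_HASH_BIT = 0o1000000


def object_id(kind: str, body: bytes, algo: str = "sha1") -> bytes:
    """Hash a git-style object: '<kind> <size>' + NUL + body."""
    header = f"{kind} {len(body)}".encode() + b"\x00"
    return hashlib.new(algo, header + body).digest()


def tree_body(entries, converted=frozenset()):
    """Serialize (mode, name, content) entries, sorted by name.

    Entries whose name is in `converted` are hashed with SHA-256 and get
    the flag bit set; all other entries keep SHA-1.
    """
    out = b""
    for mode, name, content in sorted(entries, key=lambda e: e[1]):
        if name in converted:
            mode |= NEW_HASH_BIT
            oid = object_id("blob", content, "sha256")
        else:
            oid = object_id("blob", content, "sha1")
        out += f"{mode:o} {name}".encode() + b"\x00" + oid
    return out


files = [
    (0o100644, "README", b"hello\n"),
    (0o100644, "main.c", b"int main(void) { return 0; }\n"),
]

# Identical content, four different ways to encode the tree.
tree_names = {
    object_id("tree", tree_body(files, frozenset(subset))).hex()
    for subset in ([], ["README"], ["main.c"], ["README", "main.c"])
}
```

With n entries there are up to 2^n names for one and the same tree, so comparing tree names no longer tells you whether the contents are equal.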

With a per-commit header, however, only the first commit that changes the hashing algorithm would have to re-checksum each of the files, and only those in the current tree, not all the way back to the beginning of history. The delta logic should not have to care, and objects with the same content but different object IDs should pack perfectly, so long as git-pack-objects knows to re-checksum objects with the available hash algorithms and spot matches.
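
Roughly what I have in mind, as a Python sketch rather than a patch. The "hash-algo" commit header is invented for illustration, the working tree is modelled as a nested dict, and object IDs are stored as hex instead of git's raw bytes, so none of this matches the real on-disk format:

```python
import hashlib


def hash_object(kind: str, body: bytes, algo: str) -> str:
    """Hash a git-style object: '<kind> <size>' + NUL + body."""
    header = f"{kind} {len(body)}".encode() + b"\x00"
    return hashlib.new(algo, header + body).hexdigest()


def hash_tree(tree: dict, algo: str) -> str:
    """Recursively hash a nested dict: bytes values are blobs, dicts are subtrees.

    Every object below this tree is named with the same algorithm, so a
    tree still has exactly one name per algorithm.
    """
    entries = []
    for name in sorted(tree):
        node = tree[name]
        if isinstance(node, dict):
            mode, oid = "40000", hash_tree(node, algo)
        else:
            mode, oid = "100644", hash_object("blob", node, algo)
        entries.append(f"{mode} {name}\x00{oid}".encode())
    return hash_object("tree", b"".join(entries), algo)


def make_commit(tree: dict, parents: list, message: str, algo: str):
    """Return (commit_id, commit_body) with a hypothetical 'hash-algo' header."""
    fields = [f"hash-algo {algo}", f"tree {hash_tree(tree, algo)}"]
    fields += [f"parent {p}" for p in parents]
    body = ("\n".join(fields) + "\n\n" + message).encode()
    return hash_object("commit", body, algo), body


worktree = {
    "README": b"hello\n",
    "src": {"main.c": b"int main(void) { return 0; }\n"},
}

# Old history stays as it is; only the switching commit re-hashes the current tree.
old_id, _ = make_commit(worktree, [], "initial import\n", "sha1")
new_id, new_body = make_commit(worktree, [old_id], "switch hash\n", "sha256")
```

The parent is still referenced by its old SHA-1 name, so nothing before the switch needs rewriting.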

Other operations that span commits with different hashing algorithms, such as diff, might be able to get away with their existing object ranking algorithms, caching alternate object IDs for content as they operate to facilitate exact matching across hash algorithm changes.
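
The cache could be as simple as the following Python sketch. The class name, its methods and the choice of SHA-1 plus SHA-256 are all assumptions for illustration, not anything that exists in git:

```python
import hashlib

# Assumed set of algorithms a repository might contain side by side.
ALGOS = ("sha1", "sha256")


def blob_id(content: bytes, algo: str) -> str:
    """Name a blob the way git does: hash of 'blob <size>' + NUL + content."""
    header = f"blob {len(content)}".encode() + b"\x00"
    return hashlib.new(algo, header + content).hexdigest()


class AlternateIdCache:
    """Maps any known object ID to all IDs of the same content."""

    def __init__(self):
        self._aliases = {}

    def record(self, content: bytes) -> frozenset:
        """Hash content with every algorithm and remember the IDs as aliases."""
        ids = frozenset(blob_id(content, algo) for algo in ALGOS)
        for oid in ids:
            self._aliases[oid] = ids
        return ids

    def same_content(self, oid_a: str, oid_b: str) -> bool:
        """Exact-match test that also works across a hash algorithm change."""
        if oid_a == oid_b:
            return True
        return oid_b in self._aliases.get(oid_a, ())


cache = AlternateIdCache()
data = b"hello\n"
cache.record(data)

old_name = blob_id(data, "sha1")
new_name = blob_id(data, "sha256")
unrelated = blob_id(b"goodbye\n", "sha1")

match = cache.same_content(old_name, new_name)      # True: same content
mismatch = cache.same_content(old_name, unrelated)  # False: never recorded as an alias
```

An ID that was never recorded simply fails the alias lookup, so the caller would fall back to the usual similarity ranking.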

Actually, for the original problem (just producing a signature with a different hashing algorithm), it would probably be sufficient to re-hash the current commit and the current tree recursively, so the mixed hash-algorithm case need not exist. Still, it might not be too hard to make git nicely generic, so that it is well prepared for when a second pre-image attack on SHA-1 becomes practical.

Sam