> On Feb 15, 2021, at 9:18 AM, Joerg Sonnenberger <jo...@bec.de> wrote:
>
> Hello all,
> to help the review process along, here are the rough steps I see in
> preparation for supporting 256bit hashes:
>
> (1) Move the current 160bit constants from mercurial.node into a
> subclass. Instead of a global constant, derive the correct constant from
> the repo/revlog/... instance and pass it down as necessary. The API
> change itself is in D9750. The expectation for this step is that a
> repository has one hash size and one set of magic values, but it doesn't
> change anything regarding the hash function itself. A follow-up change
> is necessary to replace the global constants (approximately D9465 minus
> D9750).
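Step (1) could look roughly like the minimal Python sketch below. The names `NodeConstants`, `Revlog`, `nodeconstants` and `is_null` are hypothetical and are not the API from D9750. The sketch only shows that the node length and magic values (null id, working-directory id) belong to an object the revlog carries, not to module-level globals.

```python
class NodeConstants:
    """Hypothetical per-repository node constants (illustrative only)."""

    def __init__(self, nodelen):
        self.nodelen = nodelen
        # Magic values are derived from the node length instead of being globals.
        self.nullid = b"\0" * nodelen
        self.wdirid = b"\xff" * nodelen


SHA1_CONSTANTS = NodeConstants(20)  # today's 160-bit nodes
WIDE_CONSTANTS = NodeConstants(32)  # a possible 256-bit layout


class Revlog:
    """Hypothetical stand-in for a revlog that carries its own constants."""

    def __init__(self, nodeconstants):
        self.nodeconstants = nodeconstants

    def is_null(self, node):
        # Compare against this revlog's null id, not a module-level constant.
        return node == self.nodeconstants.nullid


rl = Revlog(SHA1_CONSTANTS)
print(rl.is_null(b"\0" * 20))  # True
print(rl.is_null(b"\0" * 32))  # False: wrong width for this revlog
```

Under the alternative of always emitting 256-bit hashes, every repository would use the 32-byte constants and SHA-1 nodes would simply be NUL-padded.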
I was somewhat assuming we’d alter various codepaths to always emit 256-bit hashes, even if they end in all NUL. Your way sounds a little more complicated but is also fine; I don’t feel strongly.

>
> (2) Adjust various on-disk formats to switch between the current 160bit
> and 256bit encoding based on the node constants in use. This would be a
> non-functional change for existing repositories.

Are any on-disk formats not already using 256-bit storage? I know revlogs are, so I _think_ this is only going to matter for caches.

>
> (3) Introduce the tagged 256(*) hash function. My plan here is to use
> Blake2b configured for 248bit output and a suffix of b'\x01'. It is a
> bit wasteful to reserve 8bit for the tag, but simplifies code. Biggest
> downside is that the full Blake2b support is not available in Python 2.

Honestly, I think new hash functions are exactly the kind of thing we should gate on Python 3. If someone is _really_ enthusiastic, they can write a backport extension or something for the users who (bafflingly) care about modern hashes but are stuck on an ancient Python.

>
> The tag would allow different hash functions to co-exist and embed
> existing SHA1 hashes by zero padding.
>
> (4) Adjust hash verification logic to derive the hash function from the
> tag of a node, not just hard-coding it.
>
> At the end of step 4, most repositories can be converted in a mostly
> transparent way.

Notably, you can let people upgrade only new hashes if they’re so inclined, which lets you preserve GPG signatures etc.

> Some additional changes might be necessary for allowing
> "short" node ids for things like .hgtags, but overall, existing hashes
> should just continue to work as before.

Overall +1. We can arm-wrestle later about whether allowing a “new commits are blake2b” mode (vs. a convert-the-repo mode) is reasonable; I don’t think it’ll take a ton of code either way.

One request: I think we should reserve a couple of suffixes (0xff and 0xfe, perhaps?)
for “private use”; this would be useful for large installations that do strange things with hashing out of necessity.

Sorry for taking so long to respond to this. It's well thought out, and I was just too busy with other work stuff for a couple of weeks straight.

>
> Joerg
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
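Steps (3) and (4), plus the private-use request, can be sketched in Python 3, whose `hashlib.blake2b` accepts a `digest_size` parameter. This is a minimal illustration, not the planned implementation. The helper names (`sha1_node`, `blake2b_node`, `verify_node`, `private_hashers`) are hypothetical. The tag is assumed to be the last byte of a 32-byte node. For simplicity the functions hash a raw `data` argument rather than whatever input Mercurial actually feeds its node hash.

```python
import hashlib

NODELEN = 32                 # assumed width of a tagged node, in bytes
TAG_SHA1 = 0x00              # SHA-1 digest embedded by zero padding
TAG_BLAKE2B = 0x01           # Blake2b configured for 248-bit output
PRIVATE_TAGS = {0xFE, 0xFF}  # suffixes reserved for "private use"


def sha1_node(data):
    """Embed a 160-bit SHA-1 digest by padding it with NUL bytes to NODELEN.

    The padding means the last byte, i.e. the tag, is always 0x00.
    """
    return hashlib.sha1(data).digest().ljust(NODELEN, b"\0")


def blake2b_node(data):
    """Blake2b with a 31-byte (248-bit) digest, followed by the tag byte.

    digest_size is part of Blake2b's parameter block, so this is not the same
    value as the first 31 bytes of a default 512-bit Blake2b digest.
    """
    digest = hashlib.blake2b(data, digest_size=NODELEN - 1).digest()
    return digest + bytes([TAG_BLAKE2B])


HASHERS = {TAG_SHA1: sha1_node, TAG_BLAKE2B: blake2b_node}


def verify_node(node, data, private_hashers=None):
    """Check `node` against `data`, choosing the hash function from the tag."""
    if len(node) != NODELEN:
        raise ValueError("expected a %d-byte node, got %d" % (NODELEN, len(node)))
    tag = node[-1]  # indexing bytes yields an int in Python 3
    if tag in PRIVATE_TAGS:
        # Installations using a private-use tag must supply their own hasher.
        hasher = (private_hashers or {}).get(tag)
    else:
        hasher = HASHERS.get(tag)
    if hasher is None:
        raise ValueError("no hash function registered for tag 0x%02x" % tag)
    return hasher(data) == node


text = b"example revision text"
print(verify_node(sha1_node(text), text))         # True
print(verify_node(blake2b_node(text), text))      # True
print(verify_node(blake2b_node(text), b"other"))  # False
```

Spending a whole byte on the tag costs 8 bits of digest, which is the "bit wasteful" trade-off from step (3). In return, dispatch is a single byte lookup, and a zero-padded SHA-1 node has the same width as a Blake2b node.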