On Fri, Feb 26, 2021 at 10:49:45PM -0500, Augie Fackler wrote:
> 
> 
> > On Feb 15, 2021, at 9:18 AM, Joerg Sonnenberger <jo...@bec.de> wrote:
> > 
> > Hello all,
> > to help the review process along, here are the rough steps I see in
> > preparation for supporting 256bit hashes:
> > 
> > (1) Move the current 160bit constants from mercurial.node into a
> > subclass. Instead of a global constant, derive the correct constant from
> > the repo/revlog/... instance and pass it down as necessary. The API
> > change itself is in D9750. The expectation for this step is that a
> > repository has one hash size and one set of magic values, but it doesn't
> > change anything regarding the hash function itself. A follow-up change
> > is necessary to replace the global constants (approximately D9465 minus
> > D9750).
> 
> I was somewhat assuming we’d alter various codepaths to always emit
> 256-bit hashes, even if they end in all NUL. Your way sounds a little
> more complicated but is also fine, I don’t feel strongly.

I was considering it, but it is harder to ensure that nothing breaks, as
we would have to fix all external interactions at the same time, from
output commands to network I/O. Keeping the code parts somewhat separate
has the advantage that we can sort out the new hash first and the compat
migration as a second step, without having to worry about compat as much.
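A minimal sketch of step (1), assuming Python: the magic node values live on one object per hash size, and code looks them up through the repository object instead of a module-level global. `nodeconstants`, `SHA1CONSTANTS`, `HASH256CONSTANTS`, `fakerepo` and `isnull` are hypothetical names for illustration, not the API from D9750.

```python
import binascii


class nodeconstants(object):
    """Magic node values for one node size (hypothetical sketch)."""

    def __init__(self, nodelen):
        self.nodelen = nodelen
        self.nullid = b"\0" * nodelen
        self.nullhex = binascii.hexlify(self.nullid)
        # Working-directory pseudo-node: all 0xff bytes.
        self.wdirid = b"\xff" * nodelen
        self.wdirhex = binascii.hexlify(self.wdirid)


# One instance per hash size; a repository uses exactly one of them.
SHA1CONSTANTS = nodeconstants(20)
HASH256CONSTANTS = nodeconstants(32)


class fakerepo(object):
    """Stand-in for a repository; real code would choose the constants
    from the repository's requirements when opening it."""

    def __init__(self, constants):
        self.nodeconstants = constants


def isnull(repo, node):
    # Ask the repository for its null id instead of importing a global.
    return node == repo.nodeconstants.nullid
```

The point of the indirection is that `isnull` gives the right answer for both node sizes without any call site mentioning 20 or 32.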

> > 
> > (2) Adjust various on-disk formats to switch between the current 160bit
> > and 256bit encoding based on the node constants in use. This would be a
> > non-functional change for existing repositories.
> 
> Are any on-disk formats not already using 256-bit storage?
> I know revlogs are, so I _think_ this is only going to matter for caches.

It's a mixed bag. The textual formats are fine, and so is revlog. Most of
the ad-hoc formats are 160bit only.
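For the ad-hoc formats, step (2) mostly means deriving the record size from the node length in use rather than hard-coding 20 bytes. The fixed-record cache below (a 4-byte big-endian revision number followed by the raw node) is an invented format for illustration, not any particular Mercurial cache; `packcache` and `unpackcache` are hypothetical helpers.

```python
import struct


def recordformat(nodelen):
    # ">I" is a 4-byte big-endian revision number, "%ds" the raw node bytes.
    return struct.Struct(">I%ds" % nodelen)


def packcache(entries, nodelen):
    """Serialize (rev, node) pairs using the repository's node length."""
    fmt = recordformat(nodelen)
    out = []
    for rev, node in entries:
        if len(node) != nodelen:
            raise ValueError("node has the wrong length for this repository")
        out.append(fmt.pack(rev, node))
    return b"".join(out)


def unpackcache(data, nodelen):
    """Parse the records back into (rev, node) tuples."""
    fmt = recordformat(nodelen)
    if len(data) % fmt.size:
        raise ValueError("truncated cache file")
    return [fmt.unpack_from(data, off) for off in range(0, len(data), fmt.size)]
```

With a 20-byte node length this produces exactly the same bytes as a format hard-coded for SHA-1, which is what keeps the change non-functional for existing repositories.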

> > 
> > (3) Introduce the tagged 256(*) hash function. My plan here is to use
> > Blake2b configured for 248bit output and a suffix of b'\x01'. It is a
> > bit wasteful to reserve 8bit for the tag, but it simplifies the code. The
> > biggest downside is that full Blake2b support is not available in Python 2.
> 
> Honestly I think new hash functions is exactly the kind of thing we
> should gate on Python 3. If someone is _really_ enthusiastic they can
> write a backport extension or something, for the users that are
> (bafflingly) caring about modern hashes but stuck on an ancient Python.

Yeah, I'm mostly fine with not supporting Python 2 here. It just needs
some small feature testing, I think, so that we don't break the status quo.
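A sketch of the tagged hash from step (3), assuming Python 3's `hashlib`: Blake2b configured with `digest_size=31` (248 bits), followed by the `b'\x01'` tag, plus a feature test that detects a Python without `hashlib.blake2b` (such as Python 2) instead of failing at import time. `BLAKE2B_TAG`, `HAVE_BLAKE2B` and `blake2bnode` are hypothetical names, and `data` stands in for whatever bytes the node is computed over (for revlogs that includes the parent nodes).

```python
import hashlib

# Tag byte that follows the 248-bit digest, as proposed in step (3).
BLAKE2B_TAG = b"\x01"

# Feature test: hashlib.blake2b exists on Python 3.6+ but not on Python 2.
HAVE_BLAKE2B = getattr(hashlib, "blake2b", None) is not None


def blake2bnode(data):
    """Return a 32-byte node: 31 bytes of Blake2b output plus the tag."""
    if not HAVE_BLAKE2B:
        raise RuntimeError("hashlib.blake2b is not available in this Python")
    return hashlib.blake2b(data, digest_size=31).digest() + BLAKE2B_TAG
```

Configuring `digest_size=31` is not the same as truncating the default 64-byte digest: Blake2b mixes the output length into its parameter block, so the two approaches yield different values.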

> > The tag would allow different hash functions to co-exist and embed
> > existing SHA1 hashes by zero padding.
> > 
> > (4) Adjust hash verification logic to derive the hash function from the
> > tag of a node, not just hard-coding it.
> > 
> > At the end of step 4, most repositories can be converted in a mostly
> > transparent way.
> 
> Notably, you can allow people to only upgrade new hashes if they’re so
> inclined, which lets you preserve gpg signatures etc.

Right, migration will not be forced in the near future. Maybe in a few
years, but not now.
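Step (4), together with embedding existing SHA-1 hashes by zero padding, could look roughly like this. It assumes the tag is the last byte of a 32-byte node, with 0x00 for a zero-padded SHA-1 node and 0x01 for the Blake2b variant; `embedsha1`, `sha1node`, `blake2bnode`, `HASHERS` and `verify` are hypothetical names, and `data` again abstracts over the real hashing input.

```python
import hashlib

TAG_SHA1 = 0x00
TAG_BLAKE2B = 0x01
NODELEN = 32


def embedsha1(sha1node20):
    """Zero-pad an existing 20-byte SHA-1 node to 32 bytes (tag 0x00)."""
    if len(sha1node20) != 20:
        raise ValueError("expected a 20-byte SHA-1 node")
    return sha1node20 + b"\0" * (NODELEN - 20)


def sha1node(data):
    return embedsha1(hashlib.sha1(data).digest())


def blake2bnode(data):
    return hashlib.blake2b(data, digest_size=31).digest() + bytes([TAG_BLAKE2B])


# Map each tag byte to the function that produces nodes with that tag.
HASHERS = {TAG_SHA1: sha1node, TAG_BLAKE2B: blake2bnode}


def verify(node, data):
    """Check ``node`` against ``data`` using the hash its tag selects."""
    if len(node) != NODELEN:
        raise ValueError("expected a %d-byte node" % NODELEN)
    tag = node[-1]  # indexing bytes yields an int on Python 3
    hasher = HASHERS.get(tag)
    if hasher is None:
        raise ValueError("unknown hash tag 0x%02x" % tag)
    return hasher(data) == node
```

Because the comparison covers all 32 bytes, a SHA-1 node with non-zero padding fails verification rather than being accepted.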

> > Some additional changes might be necessary for allowing
> > "short" node ids for things like .hgtags, but overall, existing hashes
> > should just continue to work as before.
> 
> Overall +1. We can arm-wrestle later about if allowing a “new commits
> are blake2b” mode (vs convert-the-repo mode) is reasonable, I don’t
> think it’ll take a ton of code either way.
> 
> One request: I think we should reserve a couple of suffixes (0xff and
> 0xfe, perhaps?) for “private use” - this would be useful for large
> installations that do strange things with hashing out of necessity.

An all-F node (all 0xff bytes) is currently used as a marker entry by the
hgtags cache, for example, so yeah, it certainly sounds sensible to reserve
certain values.
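Reserving tag values could amount to a small table like the one below. The concrete assignments are assumptions: 0x00 for zero-padded SHA-1 and 0x01 for Blake2b follow the proposal, 0xfe and 0xff follow the "perhaps" in Augie's mail, and `classifytag` is a hypothetical helper.

```python
# Tag byte values (last byte of a 32-byte node); assignments are assumed.
TAG_SHA1 = 0x00
TAG_BLAKE2B = 0x01
PRIVATE_USE_TAGS = frozenset([0xFE, 0xFF])
KNOWN_TAGS = {TAG_SHA1: "sha1", TAG_BLAKE2B: "blake2b-248"}


def classifytag(node):
    """Say whether a node's tag is a public hash, private use, or unassigned."""
    tag = node[-1]  # indexing bytes yields an int on Python 3
    if tag in KNOWN_TAGS:
        return KNOWN_TAGS[tag]
    if tag in PRIVATE_USE_TAGS:
        return "private"
    return "unassigned"
```

An all-0xff sentinel such as the hgtags cache marker ends in 0xff, so keeping 0xff out of the public assignments means no standard hash function can emit a node carrying that marker's tag.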

Joerg
_______________________________________________
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
