On Wed, Apr 25, 2018 at 8:49 PM, Martin Ågren <martin.ag...@gmail.com> wrote:
>> I agree that pack v2 is not going to have anything but SHA-1.  However,
>> writing all the code such that it's algorithm agnostic means that we can
>> do testing of new algorithms by wholesale replacing the algorithm with a
>> new one, which simplifies things considerably.
>
> Ok. I do sort of wonder if a "successful" test run after globally
> substituting Hash-Foo for SHA-1 (regardless of whether the size changes
> or not) hints at a problem. That is, nowhere do we test that this code
> uses 20-byte SHA-1s, regardless of what other hash functions are
> available and configured. Of course, until soon, that did not really
> have to be tested since there was only one hash function available to
> choose from. As for identifying all the places that matter ... no idea.
>
> Of course I can see how this helps get things to a point where Git does
> not crash and burn because the hash has a different size, and where the
> test suite doesn't spew failures because the initial chaining value of
> "SHA-1" is changed.
>
> Once that is accomplished, I sort of suspect that this code will want to
> be updated to not always blindly use the_hash_algo, but to always work
> with SHA-1 sizes. Or rather, this would turn into more generic code to
> handle both "v2 with SHA-1" and "v3 with some hash function(s)". This
> commit might be a good first step in that direction.

I also have an uneasy feeling when things this close to on-disk file
format get hash-agnostic treatment. I think we would need to start
adding new file formats soon, from bottom up with simple things like
loose object files (cat-file and hash-object should be enough to test
blobs...), then moving up to pack files and more. This is when we can
really decide where to use the new hash and whether we should keep
some hashes as sha-1.

For trailing hashes for example, there's no need to move to a new hash
which only costs us more cycles. We just use it as a fancy checksum to
avoid bit flips. But then my assumption about cost may be completely
wrong without experimenting.

> Long rambling short, yeah, I see your point.

So yeah. It may be ok to move everything to "new hash" now. But we
need a closer look soon.
-- 
Duy

Reply via email to