On Fri, Aug 3, 2018 at 9:40 AM Junio C Hamano <gits...@pobox.com> wrote:
>
> > [...]
> >> -  - 20-byte NewHash checksum of all of the above.
> >> +  - 20-byte SHA-256 checksum of all of the above.
> >
> > Likewise.
>
> Hmph, I've always assumed since NewHash plan was written that this
> part was not about tamper resistance but was about bit-flipping
> detection and it was deliberate to stick to 20-byte sum, truncating
> as necessary.

Yeah, in fact, since this was one area where people actually had
performance issues with the hash, it might be worth considering
_weakening_ the hash.

Things like the pack index files (and just the regular file index) are
entirely local, and the SHA1 in those is really just a fancy CRC. It's
really just "good protection against disk corruption" (it happens),
and in fact you cannot use it as protection against active tampering,
since there's no secret there and any active attacker that has access
to your local filesystem could just recompute the hash anyway.

And because they are local anyway and aren't really transported
(modulo "shared repositories" where you use them across users or
legacy rsync-like synchronization), they can be handled separately
from any hashing changes. The pack and index file formats have in fact
been changed before.

It might make sense to either keep it as SHA1 (just to minimize any
changes) or if there are still issues with index file performance it
could even be made to use something fast-but-not-cryptographic like
just making it use crc32().

Remember: one of the original core git design requirements was
"corruption detection".

For normal hashed objects, that came naturally, and the normal object
store additionally has active tamper protection thanks to the
interconnected nature of the hashes and the distribution of the
objects.

But for things like packfiles and the file index, it is just a
separate checksum. There is no tamper protection anyway, since anybody
who can modify them directly can just recompute the hash checksum.

The fact that both of these things used SHA1 was more of a convenience
issue than anything else, and the pack/index file checksum is
fundamentally not cryptographic (but a cryptographic hash is obviously
by definition also a very good corruption detector).

               Linus

Reply via email to