Re: [PATCH v4] technical doc: add a design doc for hash function transition

Junio C Hamano Fri, 29 Sep 2017 01:10:29 -0700

Junio C Hamano <[email protected]> writes:

> Or perhaps we could.  There is nothing that says a signed tag
> created in the SHA-1 world must have the PGP/SHA-1 signature in the
> NewHash payload---it could be split off of the object data and
> stored in a local metadata cache, to be used only when we need to
> convert it back to the SHA-1 world.
> ...
>> +The format allows round-trip conversion between newhash-content and
>> +sha1-content.
>
> If it is a goal to eventually be able to lose SHA-1 compatibility
> metadata from the objects, then we might want to remove SHA-1 based
> signature bits (e.g. PGP trailer in signed tag, gpgsig header in the
> commit object) from NewHash contents, and instead have them stored
> in a side "metadata" table, only to be used while converting back.
> I dunno if that is desirable.


Let's keep it simple by ignoring all of the above.  Even though
leaving the sha1-gpgsig and other cruft would etch this
compatibility metadata in objects forever, it remains only in
objects that originate from the SHA-1 world, or in objects created
in the NewHash world only while the project participants still care
about SHA-1 compatibility.  Strictly speaking, it would be super
nice if we could do without contaminating these newly created
objects with SHA-1 compatibility headers, just like we wish to be
able to drop the SHA-1 vs NewHash mapping table after project
participants stop caring about SHA-1 compatibility, but it may not
be worth it.  Of course, if we decide to spend a few more brain
cycles designing how we push these out of the object proper, the
same solution would automatically allow us to omit SHA-1
compatibility headers from the objects that were converted from the
SHA-1 world.
>
>> +  - A table of 4-byte CRC32 values of the packed object data, in the
>> +    order that the objects appear in the pack file. This is to allow
>> +    compressed data to be copied directly from pack to pack during
>> +    repacking without undetected data corruption.
>
> An obvious alternative would be to have the CRC32 checksum near
> (e.g. immediately before) the object data in the packfile (as
> opposed to the .idx file like this document specifies).  I am not
> sure what the pros and cons are between the two, though, and that is
> why I mention the possibility here.
>
> Hmm, as the corresponding packfile stores object data only in
> NewHash content format, it is somewhat curious that this table that
> stores CRC32 of the data appears in the "Tables for each object
> format" section, as they would be identical, no?  Unless I am
> grossly misreading the spec, the checksum should either go outside
> the "Tables for each object format" section but still in .idx, or
> should be eliminated and become part of the packdata stream instead,
> perhaps?

Thinking about this a bit more, I think a single table per .idx file
would be the right way to go, not a checksum immediately after or
before the object data that is embedded in the pack stream.  In the
NewHash world (after this initial migration), we would want to be
able to stream a NewHash packstream that comes from the network
straight to disk, which would mean this in-line CRC32 data would
need to be sent over the wire (i.e. 4 bytes per object sent); that
is unneeded overhead, as the packstream has its trailing checksum to
protect the whole thing anyway.
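
To make the trade-off concrete, here is a rough sketch in Python
(not Git's C code).  Everything in it is an assumption made for
illustration: the toy pack layout has no pack header and no
per-object headers, SHA-256 merely stands in for the yet-to-be-chosen
NewHash, and the function names are made up.  It only shows that the
CRC32 side table can live outside the stream, so the stream itself
carries nothing but object data plus one trailing checksum.

```python
import hashlib
import struct
import zlib


def write_pack(objects):
    """Return (pack_bytes, entries) for a toy pack.

    entries holds one (offset, length, crc32) tuple per object, in the
    order the objects appear in the pack.
    """
    body = bytearray()
    entries = []
    for raw in objects:
        packed = zlib.compress(raw)
        entries.append((len(body), len(packed), zlib.crc32(packed)))
        body += packed
    # One trailing checksum protects the whole stream on the wire.
    # SHA-256 is only a stand-in for NewHash here.
    trailer = hashlib.sha256(bytes(body)).digest()
    return bytes(body) + trailer, entries


def crc_table(entries):
    """Serialize the side table: a 4-byte big-endian CRC32 per object."""
    return b"".join(struct.pack(">I", crc) for _, _, crc in entries)


def copy_object(pack, entry):
    """Copy one object's compressed bytes without inflating them.

    The per-object CRC32 from the side table catches corruption that
    would otherwise go unnoticed when data moves from pack to pack.
    """
    offset, length, crc = entry
    packed = pack[offset:offset + length]
    if zlib.crc32(packed) != crc:
        raise ValueError("corrupt object data at offset %d" % offset)
    return packed
```

With this split, the receiver can write the incoming stream straight
to disk and compute the table locally while indexing, so no extra
4 bytes per object cross the wire.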
