On 9/8/2018 2:29 AM, Martin Ågren wrote:
On Fri, 7 Sep 2018 at 22:24, Ben Peart <peart...@gmail.com> wrote:
Ben Peart <benpe...@microsoft.com> writes:

- 160-bit SHA-1 over the extension types and their sizes (but not
their contents).  E.g. if we have a "TREE" extension that is N bytes
long and a "REUC" extension that is M bytes long, followed by "EOIE",
then the hash would be:

The purpose of the SHA isn't to detect disk corruption (we already have
a SHA for the entire index that can serve that purpose) but to help
ensure that this was actually a valid EOIE extension and not a lucky
random set of bytes. [...]
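
For illustration, a minimal sketch of that hashing step, written
against the_hash_algo and meant to sit alongside the read-cache.c code
quoted below (the helper name is hypothetical, not something from the
patch):

    /*
     * Illustrative helper only: feed one extension's 4-byte signature
     * and 4-byte size, still in network byte order as read from disk,
     * into the running EOIE checksum.  The extension payload itself
     * is deliberately not hashed.
     */
    static void eoie_hash_ext_header(git_hash_ctx *c, const unsigned char *ext_hdr)
    {
            the_hash_algo->update_fn(c, ext_hdr, 4);     /* signature, e.g. "TREE" */
            the_hash_algo->update_fn(c, ext_hdr + 4, 4); /* 4-byte size field */
    }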

+#define EOIE_SIZE 24 /* <4-byte offset> + <20-byte hash> */

+    the_hash_algo->init_fn(&c);
+    while (src_offset < mmap_size - the_hash_algo->rawsz - EOIE_SIZE_WITH_HEADER) {
[...]
+    the_hash_algo->final_fn(hash, &c);
+    if (hashcmp(hash, (unsigned char *)index))
+            return 0;
+
+    /* Validate that the extension offsets returned us back to the eoie extension. */
+    if (src_offset != mmap_size - the_hash_algo->rawsz - EOIE_SIZE_WITH_HEADER)
+            return 0;

Besides the issue you and Junio discussed with "should we document this
as being SHA-1 or NewHash" (or "the hash algo"), it seems to me that
this implementation is living somewhere between using SHA-1 and "the
hash algo". The hashing uses `the_hash_algo`, but the hash size is
hardcoded at 20 bytes.

Maybe it all works out, e.g., so that when someone (brian) merges a
NewHash and runs the testsuite, this will fail consistently and in a
safe way. But I wonder if it would be too hard to avoid the hardcoded 24
already now.

Martin


I can certainly change this to be:

#define EOIE_SIZE (4 + GIT_SHA1_RAWSZ)

which should (hopefully) make it easier to find this hard-coded hash length in the sea of hard-coded "20" and "160" (bits) littered throughout the codebase.
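
Going a step further, the size could be derived from the active hash
algorithm at runtime rather than from GIT_SHA1_RAWSZ.  A rough sketch
(assuming EOIE_SIZE is only ever used in ordinary arithmetic and never
where a compile-time constant is required, and keeping the existing
4-byte signature + 4-byte length header):

    /* Sketch only: tie the EOIE extension size to the active hash algorithm. */
    #define EOIE_SIZE (4 + the_hash_algo->rawsz)      /* <4-byte offset> + <hash> */
    #define EOIE_SIZE_WITH_HEADER (4 + 4 + EOIE_SIZE) /* <signature> + <length> + EOIE_SIZE */

That way a future hash function would change the validation math above
automatically instead of leaving a silent 20-byte assumption behind.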
