On 5/30/2018 6:24 PM, Jakub Narebski wrote:
Derrick Stolee <dsto...@microsoft.com> writes:

The 'verify' subcommand must compare the commit content parsed from the
commit-graph and compare it against the content in the object database.
You have "compare" twice in the above sentence.

Use lookup_commit() and parse_commit_in_graph_one() to parse the commits
from the graph and compare against a commit that is loaded separately
and parsed directly from the object database.
All right, that looks like a nice extension of what was done in previous
patch.  We want to check that cache (serialized commit graph) matches
reality (object database).

Add checks for the root tree OID.
All right; isn't it that now we check almost all information from
commit-graph that hs match in object database (with exception of commit
parents, I think).

Signed-off-by: Derrick Stolee <dsto...@microsoft.com>
---
  commit-graph.c          | 17 ++++++++++++++++-
  t/t5318-commit-graph.sh |  7 +++++++
  2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/commit-graph.c b/commit-graph.c
index 0420ebcd87..19ea369fc6 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -880,6 +880,8 @@ int verify_commit_graph(struct commit_graph *g)
                return verify_commit_graph_error;
NOTE: we will be checking Commit Data chunk; I think it would be good
idea to verify that size of Commit Data chunk matches (N * (H + 16) bytes)
that format gives us, so that we don't accidentally red outside of
memory if something got screwed up (like number of commits being wrong,
or file truncated).

This is actually how we calculate 'num_commits' during load_commit_graph_one():

    if (last_chunk_id == GRAPH_CHUNKID_OIDLOOKUP)
    {
        graph->num_commits = (chunk_offset - last_chunk_offset)
                             / graph->hash_len;
    }

So, if the chunk doesn't match N*(H+16), we detect this because FANOUT[255] != N.

(There is one caveat here: (chunk_offset - last_chunk_offset) may not be a multiple of hash_len, and "accidentally" truncate to N in the division. I'll add more careful checks for this.)

We also check out-of-bounds offsets in that method.


for (i = 0; i < g->num_commits; i++) {
+               struct commit *graph_commit;
+
                hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
if (i && oidcmp(&prev_oid, &cur_oid) >= 0)
@@ -897,6 +899,11 @@ int verify_commit_graph(struct commit_graph *g)
cur_fanout_pos++;
                }
+
+               graph_commit = lookup_commit(&cur_oid);
So now I see why we add all commits to memory (to hash structure).

+               if (!parse_commit_in_graph_one(g, graph_commit))
+                       graph_report("failed to parse %s from commit-graph",
+                                    oid_to_hex(&cur_oid));
All right, this verifies that commit in OID Lookup chunk has parse-able
data in Commit Data chunk, isn't it?

        }
while (cur_fanout_pos < 256) {
@@ -913,16 +920,24 @@ int verify_commit_graph(struct commit_graph *g)
                return verify_commit_graph_error;
for (i = 0; i < g->num_commits; i++) {
-               struct commit *odb_commit;
+               struct commit *graph_commit, *odb_commit;
hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i); + graph_commit = lookup_commit(&cur_oid);
                odb_commit = (struct commit *)create_object(cur_oid.hash, 
alloc_commit_node());
All right, so we have commit data from graph, and commit data from the
object database.

                if (parse_commit_internal(odb_commit, 0, 0)) {
                        graph_report("failed to parse %s from object database",
                                     oid_to_hex(&cur_oid));
                        continue;
                }
+
+               if (oidcmp(&get_commit_tree_in_graph_one(g, 
graph_commit)->object.oid,
+                          get_commit_tree_oid(odb_commit)))
+                       graph_report("root tree OID for commit %s in commit-graph is 
%s != %s",
+                                    oid_to_hex(&cur_oid),
+                                    
oid_to_hex(get_commit_tree_oid(graph_commit)),
+                                    
oid_to_hex(get_commit_tree_oid(odb_commit)));
Maybe explicitly say that it doesn't match the value from the object
database; OTOH this may be too verbose.

        }
return verify_commit_graph_error;
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 996a016239..21cc8e82f3 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -267,6 +267,8 @@ GRAPH_BYTE_FANOUT2=`expr $GRAPH_FANOUT_OFFSET + 4 \* 255`
  GRAPH_OID_LOOKUP_OFFSET=`expr $GRAPH_FANOUT_OFFSET + 4 \* 256`
  GRAPH_BYTE_OID_LOOKUP_ORDER=`expr $GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN \* 8`
  GRAPH_BYTE_OID_LOOKUP_MISSING=`expr $GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN \* 4 
+ 10`
+GRAPH_COMMIT_DATA_OFFSET=`expr $GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN \* 
$NUM_COMMITS`
+GRAPH_BYTE_COMMIT_TREE=$GRAPH_COMMIT_DATA_OFFSET
All right, so the first is entry into record in Commit Data chunk, and
the latter points into tree entry in this record -- which entry is first
field:

   Commit Data (ID: {'C', 'G', 'E', 'T' }) (N * (H + 16) bytes)
     * The first H bytes are for the OID of the root tree.

# usage: corrupt_graph_and_verify <position> <data> <string>
  # Manipulates the commit-graph file at the position
@@ -341,4 +343,9 @@ test_expect_success 'detect OID not in object database' '
                "from object database"
  '
+test_expect_success 'detect incorrect tree OID' '
+       corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_TREE "\01" \
+               "root tree OID for commit"
+'
All right.

I wonder if we can create a test for first check added (that Commit
Graph data parses correctly), that is the one with the following error
message:

   "failed to parse <OID> from commit-graph file".

+
  test_done
Regards,

Reply via email to