On 11 May 2018 at 23:15, Derrick Stolee <dsto...@microsoft.com> wrote:
> When running 'git commit-graph verify', compare the contents of the
> commits that are loaded from the commit-graph file with commits that are
> loaded directly from the object database. This includes checking the
> root tree object ID, commit date, and parents.
>
> Parse the commit from the graph during the initial loop through the
> object IDs to guarantee we parse from the commit-graph file.
>
> In addition, verify the generation number calculation is correct for all
> commits in the commit-graph file.
>
> While testing, we discovered that mutating the integer value for a
> parent to be outside the accepted range causes a segmentation fault. Add
> a new check in insert_parent_or_die() that prevents this fault. Check
> for that error during the test, both in the typical parents and in the
> list of parents for octopus merges.

This paragraph and the corresponding fix and test feel like a separate
patch to me. (The commit message of it could be "To test the next patch,
we threw invalid data at `git commit-graph verify, and it crashed in
pre-existing code, so let's fix that first" -- there is definitely a
connection.) Is this important enough to fast-track to master in time
for 2.18? My guess would be no.

> +
> +       if (pos >= g->num_commits)
> +               die("invalide parent position %"PRIu64, pos);

s/invalide/invalid/

> @@ -875,6 +879,8 @@ int verify_commit_graph(struct commit_graph *g)
>                 return 1;
>
>         for (i = 0; i < g->num_commits; i++) {
> +               struct commit *graph_commit;
> +
>                 hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
>
>                 if (i && oidcmp(&prev_oid, &cur_oid) >= 0)
> @@ -892,6 +898,10 @@ int verify_commit_graph(struct commit_graph *g)
>
>                         cur_fanout_pos++;
>                 }
> +
> +               graph_commit = lookup_commit(&cur_oid);
> +               if (!parse_commit_in_graph_one(g, graph_commit))
> +                       graph_report("failed to parse %s from commit-graph", 
> oid_to_hex(&cur_oid));
>         }

Could this end up giving ridiculous amounts of output? It would depend
on the input, I guess.

>         while (cur_fanout_pos < 256) {
> @@ -904,5 +914,95 @@ int verify_commit_graph(struct commit_graph *g)
>                 cur_fanout_pos++;
>         }
>
> +       if (verify_commit_graph_error)
> +               return 1;

Well, here we give up before running into *too* much problem.

> +       for (i = 0; i < g->num_commits; i++) {
> +               struct commit *graph_commit, *odb_commit;
> +               struct commit_list *graph_parents, *odb_parents;
> +               int num_parents = 0;
> +
> +               hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
> +
> +               graph_commit = lookup_commit(&cur_oid);
> +               odb_commit = (struct commit *)create_object(cur_oid.hash, 
> alloc_commit_node());
> +               if (parse_commit_internal(odb_commit, 0, 0)) {
> +                       graph_report("failed to parse %s from object 
> database", oid_to_hex(&cur_oid));
> +                       continue;
> +               }
> +
> +               if (oidcmp(&get_commit_tree_in_graph_one(g, 
> graph_commit)->object.oid,
> +                          get_commit_tree_oid(odb_commit)))
> +                       graph_report("root tree object ID for commit %s in 
> commit-graph is %s != %s",
> +                                    oid_to_hex(&cur_oid),
> +                                    
> oid_to_hex(get_commit_tree_oid(graph_commit)),
> +                                    
> oid_to_hex(get_commit_tree_oid(odb_commit)));
> +
> +               if (graph_commit->date != odb_commit->date)
> +                       graph_report("commit date for commit %s in 
> commit-graph is %"PRItime" != %"PRItime"",
> +                                    oid_to_hex(&cur_oid),
> +                                    graph_commit->date,
> +                                    odb_commit->date);
> +
> +
> +               graph_parents = graph_commit->parents;
> +               odb_parents = odb_commit->parents;
> +
> +               while (graph_parents) {
> +                       num_parents++;
> +
> +                       if (odb_parents == NULL)
> +                               graph_report("commit-graph parent list for 
> commit %s is too long (%d)",
> +                                            oid_to_hex(&cur_oid),
> +                                            num_parents);
> +
> +                       if (oidcmp(&graph_parents->item->object.oid, 
> &odb_parents->item->object.oid))
> +                               graph_report("commit-graph parent for %s is 
> %s != %s",
> +                                            oid_to_hex(&cur_oid),
> +                                            
> oid_to_hex(&graph_parents->item->object.oid),
> +                                            
> oid_to_hex(&odb_parents->item->object.oid));
> +
> +                       graph_parents = graph_parents->next;
> +                       odb_parents = odb_parents->next;
> +               }
> +
> +               if (odb_parents != NULL)
> +                       graph_report("commit-graph parent list for commit %s 
> terminates early",
> +                                    oid_to_hex(&cur_oid));
> +
> +               if (graph_commit->generation) {
> +                       uint32_t max_generation = 0;
> +                       graph_parents = graph_commit->parents;
> +
> +                       while (graph_parents) {
> +                               if (graph_parents->item->generation == 
> GENERATION_NUMBER_ZERO ||
> +                                   graph_parents->item->generation == 
> GENERATION_NUMBER_INFINITY)
> +                                       graph_report("commit-graph has valid 
> generation for %s but not its parent, %s",
> +                                                    oid_to_hex(&cur_oid),
> +                                                    
> oid_to_hex(&graph_parents->item->object.oid));
> +                               if (graph_parents->item->generation > 
> max_generation)
> +                                       max_generation = 
> graph_parents->item->generation;
> +                               graph_parents = graph_parents->next;
> +                       }
> +
> +                       if (max_generation == GENERATION_NUMBER_MAX)
> +                               max_generation--;

I'm not too familiar with these concepts. Is this a trick in preparation
for this:

> +
> +                       if (graph_commit->generation != max_generation + 1)

Any way that could give a false negative? (I'm not sure it would matter
much.) Maybe "if (!MAX && generation != max + 1)".

> +                               graph_report("commit-graph has incorrect 
> generation for %s",
> +                                            oid_to_hex(&cur_oid));
> +               } else {
> +                       graph_parents = graph_commit->parents;
> +
> +                       while (graph_parents) {
> +                               if (graph_parents->item->generation)
> +                                       graph_report("commit-graph has 
> generation ZERO for %s but not its parent, %s",
> +                                                    oid_to_hex(&cur_oid),
> +                                                    
> oid_to_hex(&graph_parents->item->object.oid));
> +                               graph_parents = graph_parents->next;
> +                       }
> +               }
> +       }
> +
>         return verify_commit_graph_error;
>  }

At this point, I should admit that I went through the above thinking
"right, makes sense, ok, sure". I was not really going "hmm, I wonder
..." This looks like the real meat of "verify", and I'll try to look it
over with a fresh pair of eyes tomorrow.

> diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh

> +       corrupt_data $objdir/info/commit-graph 1134 "\01" &&

> +       corrupt_data $objdir/info/commit-graph 1312 "\01" &&

> +       corrupt_data $objdir/info/commit-graph 1332 "\01" &&

> +       corrupt_data $objdir/info/commit-graph 1712 "\01" &&

> +       corrupt_data $objdir/info/commit-graph 1340 "\01" &&
> +       corrupt_data $objdir/info/commit-graph 1344 "\01" &&

Could you document these numbers somehow? (Maybe even calculate them
from constant inputs, although that might be a form of premature
optimization.) When some poor soul has to derive the corresponding
numbers for a commit-graph with NewHash, they will thank you.

Martin

Reply via email to