On 11 May 2018 at 23:15, Derrick Stolee <[email protected]> wrote:
> When running 'git commit-graph verify', compare the contents of the
> commits that are loaded from the commit-graph file with commits that are
> loaded directly from the object database. This includes checking the
> root tree object ID, commit date, and parents.
>
> Parse the commit from the graph during the initial loop through the
> object IDs to guarantee we parse from the commit-graph file.
>
> In addition, verify the generation number calculation is correct for all
> commits in the commit-graph file.
>
> While testing, we discovered that mutating the integer value for a
> parent to be outside the accepted range causes a segmentation fault. Add
> a new check in insert_parent_or_die() that prevents this fault. Check
> for that error during the test, both in the typical parents and in the
> list of parents for octopus merges.
This paragraph and the corresponding fix and test feel like a separate
patch to me. (The commit message of it could be "To test the next patch,
we threw invalid data at `git commit-graph verify, and it crashed in
pre-existing code, so let's fix that first" -- there is definitely a
connection.) Is this important enough to fast-track to master in time
for 2.18? My guess would be no.
> +
> + if (pos >= g->num_commits)
> + die("invalide parent position %"PRIu64, pos);
s/invalide/invalid/
> @@ -875,6 +879,8 @@ int verify_commit_graph(struct commit_graph *g)
> return 1;
>
> for (i = 0; i < g->num_commits; i++) {
> + struct commit *graph_commit;
> +
> hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
>
> if (i && oidcmp(&prev_oid, &cur_oid) >= 0)
> @@ -892,6 +898,10 @@ int verify_commit_graph(struct commit_graph *g)
>
> cur_fanout_pos++;
> }
> +
> + graph_commit = lookup_commit(&cur_oid);
> + if (!parse_commit_in_graph_one(g, graph_commit))
> + graph_report("failed to parse %s from commit-graph",
> oid_to_hex(&cur_oid));
> }
Could this end up giving ridiculous amounts of output? It would depend
on the input, I guess.
> while (cur_fanout_pos < 256) {
> @@ -904,5 +914,95 @@ int verify_commit_graph(struct commit_graph *g)
> cur_fanout_pos++;
> }
>
> + if (verify_commit_graph_error)
> + return 1;
Well, here we give up before running into *too* much problem.
> + for (i = 0; i < g->num_commits; i++) {
> + struct commit *graph_commit, *odb_commit;
> + struct commit_list *graph_parents, *odb_parents;
> + int num_parents = 0;
> +
> + hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
> +
> + graph_commit = lookup_commit(&cur_oid);
> + odb_commit = (struct commit *)create_object(cur_oid.hash,
> alloc_commit_node());
> + if (parse_commit_internal(odb_commit, 0, 0)) {
> + graph_report("failed to parse %s from object
> database", oid_to_hex(&cur_oid));
> + continue;
> + }
> +
> + if (oidcmp(&get_commit_tree_in_graph_one(g,
> graph_commit)->object.oid,
> + get_commit_tree_oid(odb_commit)))
> + graph_report("root tree object ID for commit %s in
> commit-graph is %s != %s",
> + oid_to_hex(&cur_oid),
> +
> oid_to_hex(get_commit_tree_oid(graph_commit)),
> +
> oid_to_hex(get_commit_tree_oid(odb_commit)));
> +
> + if (graph_commit->date != odb_commit->date)
> + graph_report("commit date for commit %s in
> commit-graph is %"PRItime" != %"PRItime"",
> + oid_to_hex(&cur_oid),
> + graph_commit->date,
> + odb_commit->date);
> +
> +
> + graph_parents = graph_commit->parents;
> + odb_parents = odb_commit->parents;
> +
> + while (graph_parents) {
> + num_parents++;
> +
> + if (odb_parents == NULL)
> + graph_report("commit-graph parent list for
> commit %s is too long (%d)",
> + oid_to_hex(&cur_oid),
> + num_parents);
> +
> + if (oidcmp(&graph_parents->item->object.oid,
> &odb_parents->item->object.oid))
> + graph_report("commit-graph parent for %s is
> %s != %s",
> + oid_to_hex(&cur_oid),
> +
> oid_to_hex(&graph_parents->item->object.oid),
> +
> oid_to_hex(&odb_parents->item->object.oid));
> +
> + graph_parents = graph_parents->next;
> + odb_parents = odb_parents->next;
> + }
> +
> + if (odb_parents != NULL)
> + graph_report("commit-graph parent list for commit %s
> terminates early",
> + oid_to_hex(&cur_oid));
> +
> + if (graph_commit->generation) {
> + uint32_t max_generation = 0;
> + graph_parents = graph_commit->parents;
> +
> + while (graph_parents) {
> + if (graph_parents->item->generation ==
> GENERATION_NUMBER_ZERO ||
> + graph_parents->item->generation ==
> GENERATION_NUMBER_INFINITY)
> + graph_report("commit-graph has valid
> generation for %s but not its parent, %s",
> + oid_to_hex(&cur_oid),
> +
> oid_to_hex(&graph_parents->item->object.oid));
> + if (graph_parents->item->generation >
> max_generation)
> + max_generation =
> graph_parents->item->generation;
> + graph_parents = graph_parents->next;
> + }
> +
> + if (max_generation == GENERATION_NUMBER_MAX)
> + max_generation--;
I'm not too familiar with these concepts. Is this a trick in preparation
for this:
> +
> + if (graph_commit->generation != max_generation + 1)
Any way that could give a false negative? (I'm not sure it would matter
much.) Maybe "if (!MAX && generation != max + 1)".
> + graph_report("commit-graph has incorrect
> generation for %s",
> + oid_to_hex(&cur_oid));
> + } else {
> + graph_parents = graph_commit->parents;
> +
> + while (graph_parents) {
> + if (graph_parents->item->generation)
> + graph_report("commit-graph has
> generation ZERO for %s but not its parent, %s",
> + oid_to_hex(&cur_oid),
> +
> oid_to_hex(&graph_parents->item->object.oid));
> + graph_parents = graph_parents->next;
> + }
> + }
> + }
> +
> return verify_commit_graph_error;
> }
At this point, I should admit that I went through the above thinking
"right, makes sense, ok, sure". I was not really going "hmm, I wonder
..." This looks like the real meat of "verify", and I'll try to look it
over with a fresh pair of eyes tomorrow.
> diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
> + corrupt_data $objdir/info/commit-graph 1134 "\01" &&
> + corrupt_data $objdir/info/commit-graph 1312 "\01" &&
> + corrupt_data $objdir/info/commit-graph 1332 "\01" &&
> + corrupt_data $objdir/info/commit-graph 1712 "\01" &&
> + corrupt_data $objdir/info/commit-graph 1340 "\01" &&
> + corrupt_data $objdir/info/commit-graph 1344 "\01" &&
Could you document these numbers somehow? (Maybe even calculate them
from constant inputs, although that might be a form of premature
optimization.) When some poor soul has to derive the corresponding
numbers for a commit-graph with NewHash, they will thank you.
Martin