Re: [PATCH v2 08/12] commit-graph: verify commit contents against odb

2018-05-15 Thread Martin Ågren
On 11 May 2018 at 23:15, Derrick Stolee  wrote:

I finally sat down today and familiarized myself with the commit-graph
code a little. My biggest surprise was when I noticed that there is a
hash checksum at the end of the commit-graph-file. That in combination
with the tests where you flip some bytes...

It turns out, if my reading is right, that the hash value is written as
the commit-graph is generated, but that it is not verified as the
commit-graph is later read. I could not find any mention of your plans
here -- I understand why you would not want to verify the hash in
`load_commit_graph_one()`, at least not in every run. Anyway, this is
just an observation. Verifying the hash would affect the tests this
series adds. They might need to rewrite the hash or set some magic
environment variable. :-/ But that's for another day.

> +   for (i = 0; i < g->num_commits; i++) {
> +   struct commit *graph_commit, *odb_commit;
> +   struct commit_list *graph_parents, *odb_parents;
> +   int num_parents = 0;
> +   hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);

`num_commits` was derived as the commit-graph was loaded. It was derived
from offsets which were verified to be in the mmap-ed memory. So this
source address is guaranteed to be so, as well. Ok.

(Once brian's latest series hits master, this could use `oidread(...)`.)

> +   graph_commit = lookup_commit(_oid);

Do we know this comes from the graph? Even with a more-or-less-messed-up
commit graph? See below.

> +   odb_commit = (struct commit *)create_object(cur_oid.hash, 
> alloc_commit_node());
> +   if (parse_commit_internal(odb_commit, 0, 0)) {
> +   graph_report("failed to parse %s from object 
> database", oid_to_hex(_oid));
> +   continue;
> +   }
> +
> +   if (oidcmp(_commit_tree_in_graph_one(g, 
> graph_commit)->object.oid,
> +  get_commit_tree_oid(odb_commit)))

`get_commit_tree_in_graph_one()` will BUG rather than return NULL. So
this will not dereference NULL. Good. But might it hit the BUG? That is,
can we trust the commit coming out of `lookup_commit()` not to have
`graph_pos == COMMIT_NOT_FROM_GRAPH`?

> +   graph_report("root tree object ID for commit %s in 
> commit-graph is %s != %s",
> +oid_to_hex(_oid),
> +
> oid_to_hex(get_commit_tree_oid(graph_commit)),
> +
> oid_to_hex(get_commit_tree_oid(odb_commit)));
> +
> +   if (graph_commit->date != odb_commit->date)
> +   graph_report("commit date for commit %s in 
> commit-graph is %"PRItime" != %"PRItime"",
> +oid_to_hex(_oid),
> +graph_commit->date,
> +odb_commit->date);
> +
> +

(Extra blank line?)

> +   graph_parents = graph_commit->parents;
> +   odb_parents = odb_commit->parents;
> +
> +   while (graph_parents) {
> +   num_parents++;
> +
> +   if (odb_parents == NULL)
> +   graph_report("commit-graph parent list for 
> commit %s is too long (%d)",
> +oid_to_hex(_oid),
> +num_parents);
> +
> +   if (oidcmp(_parents->item->object.oid, 
> _parents->item->object.oid))
> +   graph_report("commit-graph parent for %s is 
> %s != %s",
> +oid_to_hex(_oid),
> +
> oid_to_hex(_parents->item->object.oid),
> +
> oid_to_hex(_parents->item->object.oid));
> +
> +   graph_parents = graph_parents->next;
> +   odb_parents = odb_parents->next;
> +   }
> +
> +   if (odb_parents != NULL)
> +   graph_report("commit-graph parent list for commit %s 
> terminates early",
> +oid_to_hex(_oid));

Ok, ensure the lists are equally long and compare the entries.

> +
> +   if (graph_commit->generation) {

If the commit has a generation number (not an old commit graph)...

> +   uint32_t max_generation = 0;
> +   graph_parents = graph_commit->parents;
> +
> +   while (graph_parents) {
> +   if (graph_parents->item->generation == 
> GENERATION_NUMBER_ZERO ||
> +   graph_parents->item->generation == 
> GENERATION_NUMBER_INFINITY)
> +   graph_report("commit-graph has valid 
> generation for %s but not 

Re: [PATCH v2 08/12] commit-graph: verify commit contents against odb

2018-05-14 Thread Derrick Stolee

On 5/12/2018 5:17 PM, Martin Ågren wrote:

On 11 May 2018 at 23:15, Derrick Stolee  wrote:

When running 'git commit-graph verify', compare the contents of the
commits that are loaded from the commit-graph file with commits that are
loaded directly from the object database. This includes checking the
root tree object ID, commit date, and parents.

Parse the commit from the graph during the initial loop through the
object IDs to guarantee we parse from the commit-graph file.

In addition, verify the generation number calculation is correct for all
commits in the commit-graph file.

While testing, we discovered that mutating the integer value for a
parent to be outside the accepted range causes a segmentation fault. Add
a new check in insert_parent_or_die() that prevents this fault. Check
for that error during the test, both in the typical parents and in the
list of parents for octopus merges.

This paragraph and the corresponding fix and test feel like a separate
patch to me. (The commit message of it could be "To test the next patch,
we threw invalid data at `git commit-graph verify, and it crashed in
pre-existing code, so let's fix that first" -- there is definitely a
connection.) Is this important enough to fast-track to master in time
for 2.18? My guess would be no.


+
+   if (pos >= g->num_commits)
+   die("invalide parent position %"PRIu64, pos);

s/invalide/invalid/


@@ -875,6 +879,8 @@ int verify_commit_graph(struct commit_graph *g)
 return 1;

 for (i = 0; i < g->num_commits; i++) {
+   struct commit *graph_commit;
+
 hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);

 if (i && oidcmp(_oid, _oid) >= 0)
@@ -892,6 +898,10 @@ int verify_commit_graph(struct commit_graph *g)

 cur_fanout_pos++;
 }
+
+   graph_commit = lookup_commit(_oid);
+   if (!parse_commit_in_graph_one(g, graph_commit))
+   graph_report("failed to parse %s from commit-graph", 
oid_to_hex(_oid));
 }

Could this end up giving ridiculous amounts of output? It would depend
on the input, I guess.


 while (cur_fanout_pos < 256) {
@@ -904,5 +914,95 @@ int verify_commit_graph(struct commit_graph *g)
 cur_fanout_pos++;
 }

+   if (verify_commit_graph_error)
+   return 1;

Well, here we give up before running into *too* much problem.


+   for (i = 0; i < g->num_commits; i++) {
+   struct commit *graph_commit, *odb_commit;
+   struct commit_list *graph_parents, *odb_parents;
+   int num_parents = 0;
+
+   hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
+
+   graph_commit = lookup_commit(_oid);
+   odb_commit = (struct commit *)create_object(cur_oid.hash, 
alloc_commit_node());
+   if (parse_commit_internal(odb_commit, 0, 0)) {
+   graph_report("failed to parse %s from object database", 
oid_to_hex(_oid));
+   continue;
+   }
+
+   if (oidcmp(_commit_tree_in_graph_one(g, 
graph_commit)->object.oid,
+  get_commit_tree_oid(odb_commit)))
+   graph_report("root tree object ID for commit %s in 
commit-graph is %s != %s",
+oid_to_hex(_oid),
+
oid_to_hex(get_commit_tree_oid(graph_commit)),
+
oid_to_hex(get_commit_tree_oid(odb_commit)));
+
+   if (graph_commit->date != odb_commit->date)
+   graph_report("commit date for commit %s in commit-graph is %"PRItime" 
!= %"PRItime"",
+oid_to_hex(_oid),
+graph_commit->date,
+odb_commit->date);
+
+
+   graph_parents = graph_commit->parents;
+   odb_parents = odb_commit->parents;
+
+   while (graph_parents) {
+   num_parents++;
+
+   if (odb_parents == NULL)
+   graph_report("commit-graph parent list for commit %s 
is too long (%d)",
+oid_to_hex(_oid),
+num_parents);
+
+   if (oidcmp(_parents->item->object.oid, 
_parents->item->object.oid))
+   graph_report("commit-graph parent for %s is %s != 
%s",
+oid_to_hex(_oid),
+
oid_to_hex(_parents->item->object.oid),
+
oid_to_hex(_parents->item->object.oid));
+
+   graph_parents = graph_parents->next;
+   odb_parents = odb_parents->next;
+   }
+

Re: [PATCH v2 08/12] commit-graph: verify commit contents against odb

2018-05-12 Thread Martin Ågren
On 11 May 2018 at 23:15, Derrick Stolee  wrote:
> When running 'git commit-graph verify', compare the contents of the
> commits that are loaded from the commit-graph file with commits that are
> loaded directly from the object database. This includes checking the
> root tree object ID, commit date, and parents.
>
> Parse the commit from the graph during the initial loop through the
> object IDs to guarantee we parse from the commit-graph file.
>
> In addition, verify the generation number calculation is correct for all
> commits in the commit-graph file.
>
> While testing, we discovered that mutating the integer value for a
> parent to be outside the accepted range causes a segmentation fault. Add
> a new check in insert_parent_or_die() that prevents this fault. Check
> for that error during the test, both in the typical parents and in the
> list of parents for octopus merges.

This paragraph and the corresponding fix and test feel like a separate
patch to me. (The commit message of it could be "To test the next patch,
we threw invalid data at `git commit-graph verify, and it crashed in
pre-existing code, so let's fix that first" -- there is definitely a
connection.) Is this important enough to fast-track to master in time
for 2.18? My guess would be no.

> +
> +   if (pos >= g->num_commits)
> +   die("invalide parent position %"PRIu64, pos);

s/invalide/invalid/

> @@ -875,6 +879,8 @@ int verify_commit_graph(struct commit_graph *g)
> return 1;
>
> for (i = 0; i < g->num_commits; i++) {
> +   struct commit *graph_commit;
> +
> hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
>
> if (i && oidcmp(_oid, _oid) >= 0)
> @@ -892,6 +898,10 @@ int verify_commit_graph(struct commit_graph *g)
>
> cur_fanout_pos++;
> }
> +
> +   graph_commit = lookup_commit(_oid);
> +   if (!parse_commit_in_graph_one(g, graph_commit))
> +   graph_report("failed to parse %s from commit-graph", 
> oid_to_hex(_oid));
> }

Could this end up giving ridiculous amounts of output? It would depend
on the input, I guess.

> while (cur_fanout_pos < 256) {
> @@ -904,5 +914,95 @@ int verify_commit_graph(struct commit_graph *g)
> cur_fanout_pos++;
> }
>
> +   if (verify_commit_graph_error)
> +   return 1;

Well, here we give up before running into *too* much problem.

> +   for (i = 0; i < g->num_commits; i++) {
> +   struct commit *graph_commit, *odb_commit;
> +   struct commit_list *graph_parents, *odb_parents;
> +   int num_parents = 0;
> +
> +   hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
> +
> +   graph_commit = lookup_commit(_oid);
> +   odb_commit = (struct commit *)create_object(cur_oid.hash, 
> alloc_commit_node());
> +   if (parse_commit_internal(odb_commit, 0, 0)) {
> +   graph_report("failed to parse %s from object 
> database", oid_to_hex(_oid));
> +   continue;
> +   }
> +
> +   if (oidcmp(_commit_tree_in_graph_one(g, 
> graph_commit)->object.oid,
> +  get_commit_tree_oid(odb_commit)))
> +   graph_report("root tree object ID for commit %s in 
> commit-graph is %s != %s",
> +oid_to_hex(_oid),
> +
> oid_to_hex(get_commit_tree_oid(graph_commit)),
> +
> oid_to_hex(get_commit_tree_oid(odb_commit)));
> +
> +   if (graph_commit->date != odb_commit->date)
> +   graph_report("commit date for commit %s in 
> commit-graph is %"PRItime" != %"PRItime"",
> +oid_to_hex(_oid),
> +graph_commit->date,
> +odb_commit->date);
> +
> +
> +   graph_parents = graph_commit->parents;
> +   odb_parents = odb_commit->parents;
> +
> +   while (graph_parents) {
> +   num_parents++;
> +
> +   if (odb_parents == NULL)
> +   graph_report("commit-graph parent list for 
> commit %s is too long (%d)",
> +oid_to_hex(_oid),
> +num_parents);
> +
> +   if (oidcmp(_parents->item->object.oid, 
> _parents->item->object.oid))
> +   graph_report("commit-graph parent for %s is 
> %s != %s",
> +oid_to_hex(_oid),
> +
> oid_to_hex(_parents->item->object.oid),
> +
> oid_to_hex(_parents->item->object.oid));
> +

[PATCH v2 08/12] commit-graph: verify commit contents against odb

2018-05-11 Thread Derrick Stolee
When running 'git commit-graph verify', compare the contents of the
commits that are loaded from the commit-graph file with commits that are
loaded directly from the object database. This includes checking the
root tree object ID, commit date, and parents.

Parse the commit from the graph during the initial loop through the
object IDs to guarantee we parse from the commit-graph file.

In addition, verify the generation number calculation is correct for all
commits in the commit-graph file.

While testing, we discovered that mutating the integer value for a
parent to be outside the accepted range causes a segmentation fault. Add
a new check in insert_parent_or_die() that prevents this fault. Check
for that error during the test, both in the typical parents and in the
list of parents for octopus merges.

Signed-off-by: Derrick Stolee 
---
 commit-graph.c  | 100 
 t/t5318-commit-graph.sh |  64 +++
 2 files changed, 164 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index 5bb93e533c..a15ad9710d 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -237,6 +237,10 @@ static struct commit_list **insert_parent_or_die(struct 
commit_graph *g,
 {
struct commit *c;
struct object_id oid;
+
+   if (pos >= g->num_commits)
+   die("invalide parent position %"PRIu64, pos);
+
hashcpy(oid.hash, g->chunk_oid_lookup + g->hash_len * pos);
c = lookup_commit();
if (!c)
@@ -875,6 +879,8 @@ int verify_commit_graph(struct commit_graph *g)
return 1;
 
for (i = 0; i < g->num_commits; i++) {
+   struct commit *graph_commit;
+
hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
 
if (i && oidcmp(_oid, _oid) >= 0)
@@ -892,6 +898,10 @@ int verify_commit_graph(struct commit_graph *g)
 
cur_fanout_pos++;
}
+
+   graph_commit = lookup_commit(_oid);
+   if (!parse_commit_in_graph_one(g, graph_commit))
+   graph_report("failed to parse %s from commit-graph", 
oid_to_hex(_oid));
}
 
while (cur_fanout_pos < 256) {
@@ -904,5 +914,95 @@ int verify_commit_graph(struct commit_graph *g)
cur_fanout_pos++;
}
 
+   if (verify_commit_graph_error)
+   return 1;
+
+   for (i = 0; i < g->num_commits; i++) {
+   struct commit *graph_commit, *odb_commit;
+   struct commit_list *graph_parents, *odb_parents;
+   int num_parents = 0;
+
+   hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
+
+   graph_commit = lookup_commit(_oid);
+   odb_commit = (struct commit *)create_object(cur_oid.hash, 
alloc_commit_node());
+   if (parse_commit_internal(odb_commit, 0, 0)) {
+   graph_report("failed to parse %s from object database", 
oid_to_hex(_oid));
+   continue;
+   }
+
+   if (oidcmp(_commit_tree_in_graph_one(g, 
graph_commit)->object.oid,
+  get_commit_tree_oid(odb_commit)))
+   graph_report("root tree object ID for commit %s in 
commit-graph is %s != %s",
+oid_to_hex(_oid),
+
oid_to_hex(get_commit_tree_oid(graph_commit)),
+
oid_to_hex(get_commit_tree_oid(odb_commit)));
+
+   if (graph_commit->date != odb_commit->date)
+   graph_report("commit date for commit %s in commit-graph 
is %"PRItime" != %"PRItime"",
+oid_to_hex(_oid),
+graph_commit->date,
+odb_commit->date);
+
+
+   graph_parents = graph_commit->parents;
+   odb_parents = odb_commit->parents;
+
+   while (graph_parents) {
+   num_parents++;
+
+   if (odb_parents == NULL)
+   graph_report("commit-graph parent list for 
commit %s is too long (%d)",
+oid_to_hex(_oid),
+num_parents);
+
+   if (oidcmp(_parents->item->object.oid, 
_parents->item->object.oid))
+   graph_report("commit-graph parent for %s is %s 
!= %s",
+oid_to_hex(_oid),
+
oid_to_hex(_parents->item->object.oid),
+
oid_to_hex(_parents->item->object.oid));
+
+   graph_parents = graph_parents->next;
+   odb_parents = odb_parents->next;
+   }
+
+   if (odb_parents != NULL)
+