On 5/30/2018 3:22 PM, Jakub Narebski wrote:
Derrick Stolee <dsto...@microsoft.com> writes:

In the 'verify' subcommand, load commits directly from the object
database to ensure they exist. Parse by skipping the commit-graph.
All right, before we check that the commit data matches, we need to
check that all the commits in cache (in the serialized commit graph) are
present in real data (in the object database of the repository).

What's nice of this series is that the operation that actually removes
unreachable commits from the object database, that is `git gc`, would
also update commit-gaph file.

Signed-off-by: Derrick Stolee <dsto...@microsoft.com>
---
  commit-graph.c          | 20 ++++++++++++++++++++
  t/t5318-commit-graph.sh |  7 +++++++
  2 files changed, 27 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index cbd1aae514..0420ebcd87 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -238,6 +238,10 @@ static struct commit_list **insert_parent_or_die(struct 
commit_graph *g,
  {
        struct commit *c;
        struct object_id oid;
+
+       if (pos >= g->num_commits)
+               die("invalid parent position %"PRIu64, pos);
+
This change is not described in the commit message.
This change should go in "commit-graph: verify parent list" which adds a test that fails without it. Thanks.

        hashcpy(oid.hash, g->chunk_oid_lookup + g->hash_len * pos);
        c = lookup_commit(&oid);
        if (!c)
@@ -905,5 +909,21 @@ int verify_commit_graph(struct commit_graph *g)
                cur_fanout_pos++;
        }
+ if (verify_commit_graph_error)
+               return verify_commit_graph_error;
All right, so we by default short-circuit so that errors found earlier
wouldn't cause crash when checking other things.

Is it needed, though, in this case?  Though I guess it is better to be
conservative; lso by terminating after a batch of one type of errors we
don't get many different error messages from the same error (i.e. error
propagation).

+
+       for (i = 0; i < g->num_commits; i++) {
+               struct commit *odb_commit;
+
+               hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
+
+               odb_commit = (struct commit *)create_object(cur_oid.hash, 
alloc_commit_node());
Do we really need to keep all those commits from the object database in
memory (in the object::obj_hash hash table)?  Perhaps using something
like Flywheel / Recycler pattern would be a better solution (if
possible)?

Though perhaps this doesn't matter much with respect to memory use.

+               if (parse_commit_internal(odb_commit, 0, 0)) {
Just a reminder to myself: the signature is

   int parse_commit_internal(struct commit *item, int quiet_on_missing, int 
use_commit_graph)


Hmmm... I wonder if with two boolean paramaters wouldn't it be better to
use flags parameter, i.e.

   int parse_commit_internal(struct commit *item, int flags)

   ...

   parse_commit_internal(commit, QUIET_ON_MISSING | USE_COMMIT_GRAPH)

But I guess that it is not worth it (especially for internal-ish
function).

+                       graph_report("failed to parse %s from object database",
+                                    oid_to_hex(&cur_oid));
Wouldn't parse_commit_internal() show it's own error message, in
addition to the one above?

+                       continue;
+               }
+       }
+
        return verify_commit_graph_error;
  }
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index c050ef980b..996a016239 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -247,6 +247,7 @@ test_expect_success 'git commit-graph verify' '
        git commit-graph verify >output
  '
+NUM_COMMITS=9
  HASH_LEN=20
  GRAPH_BYTE_VERSION=4
  GRAPH_BYTE_HASH=5
@@ -265,6 +266,7 @@ GRAPH_BYTE_FANOUT1=`expr $GRAPH_FANOUT_OFFSET + 4 \* 4`
  GRAPH_BYTE_FANOUT2=`expr $GRAPH_FANOUT_OFFSET + 4 \* 255`
  GRAPH_OID_LOOKUP_OFFSET=`expr $GRAPH_FANOUT_OFFSET + 4 \* 256`
  GRAPH_BYTE_OID_LOOKUP_ORDER=`expr $GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN \* 8`
+GRAPH_BYTE_OID_LOOKUP_MISSING=`expr $GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN \* 4 
+ 10`
All right, so we modify 10-the byte of OID of 5-th commit out of 9,
am I correct here?

# usage: corrupt_graph_and_verify <position> <data> <string>
  # Manipulates the commit-graph file at the position
@@ -334,4 +336,9 @@ test_expect_success 'detect incorrect OID order' '
                "incorrect OID order"
  '
+test_expect_success 'detect OID not in object database' '
+       corrupt_graph_and_verify $GRAPH_BYTE_OID_LOOKUP_MISSING "\01" \
+               "from object database"
+'
I guess that if we ensure that OIDs are constant, you have chosen the
change to actually corrupt the OID in OID Lookup chunk to point to OID
that is not in the object database (while still not violating the
constraint that OID in OID Lookup chunk needs to be sorted).

+
  test_done
All right (well, except for `expr ... ` --> $(( ... )) change).


Reply via email to