Re: [PATCH v3 10/20] commit-graph: verify objects exist

Jakub Narebski Wed, 30 May 2018 12:23:09 -0700

Derrick Stolee <dsto...@microsoft.com> writes:

> In the 'verify' subcommand, load commits directly from the object
> database to ensure they exist. Parse by skipping the commit-graph.


All right, before we check that the commit data matches, we need to
check that all the commits in cache (in the serialized commit graph) are
present in real data (in the object database of the repository).

What's nice of this series is that the operation that actually removes
unreachable commits from the object database, that is `git gc`, would
also update commit-gaph file.

> Signed-off-by: Derrick Stolee <dsto...@microsoft.com>
> ---
>  commit-graph.c          | 20 ++++++++++++++++++++
>  t/t5318-commit-graph.sh |  7 +++++++
>  2 files changed, 27 insertions(+)
>
> diff --git a/commit-graph.c b/commit-graph.c
> index cbd1aae514..0420ebcd87 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -238,6 +238,10 @@ static struct commit_list **insert_parent_or_die(struct 
> commit_graph *g,
>  {
>       struct commit *c;
>       struct object_id oid;
> +
> +     if (pos >= g->num_commits)
> +             die("invalid parent position %"PRIu64, pos);
> +

This change is not described in the commit message.

>       hashcpy(oid.hash, g->chunk_oid_lookup + g->hash_len * pos);
>       c = lookup_commit(&oid);
>       if (!c)
> @@ -905,5 +909,21 @@ int verify_commit_graph(struct commit_graph *g)
>               cur_fanout_pos++;
>       }
>  
> +     if (verify_commit_graph_error)
> +             return verify_commit_graph_error;

All right, so we by default short-circuit so that errors found earlier
wouldn't cause crash when checking other things.

Is it needed, though, in this case?  Though I guess it is better to be
conservative; lso by terminating after a batch of one type of errors we
don't get many different error messages from the same error (i.e. error
propagation).

> +
> +     for (i = 0; i < g->num_commits; i++) {
> +             struct commit *odb_commit;
> +
> +             hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
> +
> +             odb_commit = (struct commit *)create_object(cur_oid.hash, 
> alloc_commit_node());

Do we really need to keep all those commits from the object database in
memory (in the object::obj_hash hash table)?  Perhaps using something
like Flywheel / Recycler pattern would be a better solution (if
possible)?

Though perhaps this doesn't matter much with respect to memory use.

> +             if (parse_commit_internal(odb_commit, 0, 0)) {

Just a reminder to myself: the signature is

  int parse_commit_internal(struct commit *item, int quiet_on_missing, int 
use_commit_graph)


Hmmm... I wonder if with two boolean paramaters wouldn't it be better to
use flags parameter, i.e.

  int parse_commit_internal(struct commit *item, int flags)

  ...

  parse_commit_internal(commit, QUIET_ON_MISSING | USE_COMMIT_GRAPH)

But I guess that it is not worth it (especially for internal-ish
function).

> +                     graph_report("failed to parse %s from object database",
> +                                  oid_to_hex(&cur_oid));

Wouldn't parse_commit_internal() show it's own error message, in
addition to the one above?

> +                     continue;
> +             }
> +     }
> +
>       return verify_commit_graph_error;
>  }
> diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
> index c050ef980b..996a016239 100755
> --- a/t/t5318-commit-graph.sh
> +++ b/t/t5318-commit-graph.sh
> @@ -247,6 +247,7 @@ test_expect_success 'git commit-graph verify' '
>       git commit-graph verify >output
>  '
>  
> +NUM_COMMITS=9
>  HASH_LEN=20
>  GRAPH_BYTE_VERSION=4
>  GRAPH_BYTE_HASH=5
> @@ -265,6 +266,7 @@ GRAPH_BYTE_FANOUT1=`expr $GRAPH_FANOUT_OFFSET + 4 \* 4`
>  GRAPH_BYTE_FANOUT2=`expr $GRAPH_FANOUT_OFFSET + 4 \* 255`
>  GRAPH_OID_LOOKUP_OFFSET=`expr $GRAPH_FANOUT_OFFSET + 4 \* 256`
>  GRAPH_BYTE_OID_LOOKUP_ORDER=`expr $GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN \* 8`
> +GRAPH_BYTE_OID_LOOKUP_MISSING=`expr $GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN \* 
> 4 + 10`

All right, so we modify 10-the byte of OID of 5-th commit out of 9,
am I correct here?

>  
>  # usage: corrupt_graph_and_verify <position> <data> <string>
>  # Manipulates the commit-graph file at the position
> @@ -334,4 +336,9 @@ test_expect_success 'detect incorrect OID order' '
>               "incorrect OID order"
>  '
>  
> +test_expect_success 'detect OID not in object database' '
> +     corrupt_graph_and_verify $GRAPH_BYTE_OID_LOOKUP_MISSING "\01" \
> +             "from object database"
> +'

I guess that if we ensure that OIDs are constant, you have chosen the
change to actually corrupt the OID in OID Lookup chunk to point to OID
that is not in the object database (while still not violating the
constraint that OID in OID Lookup chunk needs to be sorted).

> +
>  test_done

All right (well, except for `expr ... ` --> $(( ... )) change).

-- 
Jakub Narębski

Re: [PATCH v3 10/20] commit-graph: verify objects exist

Reply via email to