On Tue, Sep 03, 2019 at 10:22:57PM -0400, Taylor Blau wrote:

> When we write a commit graph chunk, we process a given list of 'struct
> commit *'s and parse out the parent(s) and tree OID in order to write
> out its information.
> 
> We do this by calling 'parse_commit_no_graph', and then checking the
> result of 'get_commit_tree_oid' to write the tree OID. This process
> assumes that 'parse_commit_no_graph' parses the commit successfully.
> When this isn't the case, 'get_commit_tree_oid(*list)' may return NULL,
> in which case trying to '->hash' it causes a SIGSEGV.
> 
> Instead, teach 'write_graph_chunk_data' to stop when a commit isn't able
> to be parsed, at the peril of failing to write a commit-graph.

Yeah, I think it makes sense for commit-graph to bail completely if it
can't parse here. In theory it could omit the entry from the
commit-graph file (and a reader would just fall back to trying to access
the object contents itself), but I think we're too late in the process
for that. And besides, this should generally only happen in a corrupt
repository.

However...

> diff --git a/commit-graph.c b/commit-graph.c
> index f2888c203b..6aa6998ecd 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -843,7 +843,9 @@ static void write_graph_chunk_data(struct hashfile *f, 
> int hash_len,
>               uint32_t packedDate[2];
>               display_progress(ctx->progress, ++ctx->progress_cnt);
>  
> -             parse_commit_no_graph(*list);
> +             if (parse_commit_no_graph(*list))
> +                     die(_("unable to parse commit %s"),
> +                             oid_to_hex(&(*list)->object.oid));
>               hashwrite(f, get_commit_tree_oid(*list)->hash, hash_len);

I don't think parse_commit_no_graph() returning 0 assures us that
get_commit_tree() and friends will return non-NULL.

This is similar to the case discussed recently where a failed parse of a
tag may leave "tag->tagged == NULL" even though "tag->obj.parsed" is
set.

Here an earlier parsing error could cause (*list)->object.parsed to be
true, but with (*list)->maybe_tree still NULL. Our call to
parse_commit_no_graph() here would silently return "yep, already tried
to parse this", and then we'd still segfault.

We _could_ check:

  if (parse_commit_no_graph(*list))
        die("unable to parse...");
  tree = get_commit_tree_oid(*list);
  if (!tree)
        die("unable to get tree for %s...");

but trees are just one piece of data. In fact, the situation is much
worse for parents: there a NULL parent pointer is valid data, so worse
than segfaulting, we'd write invalid data to the commit graph, skipping
one or more parents!

And I think there's literally no way for this function to tell the
difference between "no parent" and "there was an earlier error, but we
set the parsed flag anyway and the parent flag is invalid".

I think that argues against Junio's response in:

  https://public-inbox.org/git/xmqqo90bhmi3....@gitster-ct.c.googlers.com/

about how we can use the parsed flag to look at "slightly corrupt"
objects. I think we'd need at least an extra flag for "corrupt", though
I think it is simpler just _not_ setting "parsed" and letting the next
caller spend the extra cycles to rediscover the problem if they're
interested.

(All of this presupposes that you can convince another piece of code in
the same process to parse the commit buffer and ignore the error. I have
no idea if that's possible or not in this case, but it sure would be
nice not to have to care).

-Peff

Reply via email to