On Thu, Sep 05, 2019 at 03:19:48PM -0700, Junio C Hamano wrote:

> > Here an earlier parsing error could cause (*list)->object.parsed to be
> > true, but with (*list)->maybe_tree still NULL. Our call to
> > parse_commit_no_graph() here would silently return "yep, already tried
> > to parse this", and then we'd still segfault.
> > ...
> > And I think there's literally no way for this function to tell the
> > difference between "no parent" and "there was an earlier error, but we
> > set the parsed flag anyway and the parent flag is invalid".
> >
> > I think that argues against Junio's response in:
> 
> Fair enough.  Forcing later users to reattempt parsing (and failing
> the same way) would be safer and it should also be sufficient as we
> are talking about how to handle a broken repository, i.e. an error
> case.

One of the tricky things, and the reason I used a "corrupt" flag in my
earlier sketch, is that the state after we encounter a parse error is
unknown. So imagine parse_commit_buffer() sees that one of the parent
lines is bogus, and we return an error. The caller gets to see whatever
half-parsed state we managed to come up with.

So far so good. But now imagine we call parse_commit_buffer() again, and
we re-parse. How does that interact with the half-parsed state? Some of
it works OK (e.g., lookup_tree() would find the same tree). Some not so
much (I think we'd keep appending parents at each call).

I guess this might not be too bad to handle. Value fields like
timestamp_t are OK to overwrite. Pointers to objects likewise, since the
memory is owned elsewhere. If we see existing parent pointers in an
object we're parsing, we could probably free them under the assumption
they're leftover cruft. Likewise for the "tag" field of "struct tag",
which is owned by the struct and should be freed.

Blobs and trees don't actually parse anything into their structs. So
it's really just special-casing those two items.

-Peff

Reply via email to