On Tue,  1 May 2018 14:33:51 -0700
Stefan Beller <sbel...@google.com> wrote:

> Git's object access code can be thought of as containing two layers:
> the raw object store provides access to raw object content, while the
> higher level obj_hash code parses raw objects and keeps track of
> parenthood and other object relationships using 'struct object'.
> Keeping these layers separate should make it easier to find relevant
> functions and to change the implementation of one without having to
> touch the other.

I only understood this after reading the code below. I think this
paragraph can be removed; I don't see its relevance - of course we need
to store metadata about how to load objects somewhere, and caching
objects we have already loaded is a good idea: and the metadata and
cache are two separate things before and after this patch anyway.

> Add an object_parser field to 'struct repository' to prepare obj_hash
> to be handled per repository.  Callers still only use the_repository
> for now --- later patches will adapt them to handle arbitrary
> repositories.

I think this is better reworded as:

  Convert the existing global cache for parsed objects (obj_hash) into
  repository-specific parsed object caches. Existing code that uses
  obj_hash are modified to use the parsed object cache of
  the_repository; future patches will use the parsed object caches of
  other repositories.

> +struct object_parser *object_parser_new(void)
> +{
> +     struct object_parser *o = xmalloc(sizeof(*o));
> +     memset(o, 0, sizeof(*o));
> +     return o;
> +}

I'm not sure that I like this style of code (I prefer the strbuf style
with _INIT and release(), which I think is more flexible) but I don't
feel too strongly about it.

> +struct object_parser {
> +     struct object **obj_hash;
> +     int nr_objs, obj_hash_size;
> +};

object_parser is probably a bad name. I think Duy had some good
suggestions in [1].

[1] 
https://public-inbox.org/git/CACsJy8CgX6BME=ehidbfmrzboydbnhe6ouxv4fzc-gw7rsi...@mail.gmail.com/

>       /*
> -      * Holds any information related to accessing the raw object content.
> +      * Holds any information needed to retrieve the raw content
> +      * of objects. The object_parser uses this to get object
> +      * content which it then parses.
>        */
>       struct raw_object_store *objects;

I think the additional sentence ("The object_parser uses this...") is
unnecessary and confusing, especially if its identity is going to be one
of storage (like "parsed_objects" implies).

> +     /*
> +      * State for the object parser. This owns all parsed objects
> +      * (struct object) so callers do not have to manage their
> +      * lifetime.
> +      */
> +     struct object_parser *parsed_objects;

Even after all the commits in this patch set, this does not store any
state for parsing. Maybe just document as:

  All objects in this repository that have been parsed. This structure
  owns all objects it references, so users of "struct object *"
  generally do not need to free them; instead, when a repository is no
  longer used, call object_parser_clear() on this structure.

(And maybe say that the freeing method on struct repository will
automatically call object_parser_clear().)

Reply via email to