On Tue, Aug 09, 2016 at 10:31:43PM +0300, Kirill Smelkov wrote:

> Since 6b8fda2d (pack-objects: use bitmaps when packing objects) there
> are two codepaths in pack-objects: with & without using bitmap
> reachability index.

Sorry, I got distracted from reviewing these patches. I'll give them a
detailed look now and hopefully we can finalize the topic.

> In want_object_in_pack() we care to start the checks from already found
> pack, if we have one, this way determining the answer right away
> in case neither --local nor --honour-pack-keep are active. In
> particular, as p5310-pack-bitmaps.sh shows, we do not do harm to
> served-with-bitmap clones performance-wise:
> 
>     Test                      56dfeb62          this tree
>     -----------------------------------------------------------------
>     5310.2: repack to disk    9.63(8.67+0.33)   9.47(8.55+0.28) -1.7%
>     5310.3: simulated clone   2.07(2.17+0.12)   2.03(2.14+0.12) -1.9%
>     5310.4: simulated fetch   0.78(1.03+0.02)   0.76(1.00+0.03) -2.6%
>     5310.6: partial bitmap    1.97(2.43+0.15)   1.92(2.36+0.14) -2.5%
> 
> with all differences strangely showing we are a bit faster now, but
> probably all being within noise.

Good to know there is no regression. It is curious that there is a
slight _improvement_ across the board. Do we have an explanation for
that? It seems odd that noise would be so consistent.

> And in the general case we care not to have duplicate
> find_pack_entry_one(*found_pack) calls. Worst what can happen is we can
> call want_found_object(*found_pack) -- newly introduced helper for
> checking whether we want object -- twice, but since want_found_object()
> is very lightweight it does not make any difference.

I had trouble parsing this. I think maybe:

  In the general case we do not want to call find_pack_entry_one() more
  than once, because it is expensive. This patch splits the loop in
  want_object_in_pack() into two parts: finding the object and seeing if
  it impacts our choice to include it in the pack. We may call the
  inexpensive want_found_object() twice, but we will never call
  find_pack_entry_one() if we do not need to.

> +static int want_found_object(int exclude, struct packed_git *p)
> +{
> +     if (exclude)
> +             return 1;
> +     if (incremental)
> +             return 0;
> +
> +     /*
> +      * When asked to do --local (do not include an object that appears in a
> +      * pack we borrow from elsewhere) or --honor-pack-keep (do not include
> +      * an object that appears in a pack marked with .keep), finding a pack
> +      * that matches the criteria is sufficient for us to decide to omit it.
> +      * However, even if this pack does not satisfy the criteria, we need to
> +      * make sure no copy of this object appears in _any_ pack that makes us
> +      * to omit the object, so we need to check all the packs. Signal that by
> +      * returning -1 to the caller.
> +      */
> +     if (!ignore_packed_keep &&
> +         (!local || !have_non_local_packs))
> +             return 1;

Hmm. The comment says "-1", but the return says "1". That is because the
comment is describing the return that happens at the end. :)

I wonder if the last sentence should be:

  We can check here whether these options can possibly matter; if not,
  we can return early from the function here. Otherwise, we signal "-1"
  at the end to tell the caller that we do not know either way, and it
  needs to check more packs.

> -     *found_pack = NULL;
> -     *found_offset = 0;
> +     /*
> +      * If we already know the pack object lives in, start checks from that
> +      * pack - in the usual case when neither --local was given nor .keep 
> files
> +      * are present we will determine the answer right now.
> +      */
> +     if (*found_pack) {
> +             want = want_found_object(exclude, *found_pack);
> +             if (want != -1)
> +                     return want;
> +     }

Looks correct. Though it is not really "start checks from..." anymore,
but rather "do a quick check to see if we can quit early, and otherwise
start the loop". That might be nitpicking, though.

>       for (p = packed_git; p; p = p->next) {
> -             off_t offset = find_pack_entry_one(sha1, p);
> +             off_t offset;
> +
> +             if (p == *found_pack)
> +                     offset = *found_offset;
> +             else
> +                     offset = find_pack_entry_one(sha1, p);
> +

This hunk will conflict with the MRU optimizations in 'next', but I
think the resolution should be pretty trivial.

>  static int add_object_entry(const unsigned char *sha1, enum object_type type,
>                           const char *name, int exclude)
>  {
> -     struct packed_git *found_pack;
> -     off_t found_offset;
> +     struct packed_git *found_pack = NULL;
> +     off_t found_offset = 0;

I think technically we don't need to initialize found_offset here (it is
considered only if *found_pack is not NULL), but it doesn't hurt to make
our starting assumptions clear.

> @@ -1073,6 +1097,9 @@ static int add_object_entry_from_bitmap(const unsigned 
> char *sha1,
>       if (have_duplicate_entry(sha1, 0, &index_pos))
>               return 0;
>  
> +     if (!want_object_in_pack(sha1, 0, &pack, &offset))
> +             return 0;
> +

And this caller doesn't need to worry about initialization, because of
course it knows it has a pack/offset already. Good.

> diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
> index 3893afd..a278d30 100755
> --- a/t/t5310-pack-bitmaps.sh
> +++ b/t/t5310-pack-bitmaps.sh

Tests look OK. I saw a few style nitpicks, but I think they are not even
against our style guide but more "I would have written it like this" and
are not even worth quibbling over.

So I think the code here is fine, and I just had a few minor complaints
on comment and commit message clarity.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to