On Tue, Aug 09, 2016 at 10:31:43PM +0300, Kirill Smelkov wrote:
> Since 6b8fda2d (pack-objects: use bitmaps when packing objects) there
> are two codepaths in pack-objects: with & without using bitmap
> reachability index.
Sorry, I got distracted from reviewing these patches. I'll give them a
detailed look now and hopefully we can finalize the topic.
> In want_object_in_pack() we care to start the checks from already found
> pack, if we have one, this way determining the answer right away
> in case neither --local nor --honour-pack-keep are active. In
> particular, as p5310-pack-bitmaps.sh shows, we do not do harm to
> served-with-bitmap clones performance-wise:
>
> Test 56dfeb62 this tree
> -----------------------------------------------------------------
> 5310.2: repack to disk 9.63(8.67+0.33) 9.47(8.55+0.28) -1.7%
> 5310.3: simulated clone 2.07(2.17+0.12) 2.03(2.14+0.12) -1.9%
> 5310.4: simulated fetch 0.78(1.03+0.02) 0.76(1.00+0.03) -2.6%
> 5310.6: partial bitmap 1.97(2.43+0.15) 1.92(2.36+0.14) -2.5%
>
> with all differences strangely showing we are a bit faster now, but
> probably all being within noise.
Good to know there is no regression. It is curious that there is a
slight _improvement_ across the board. Do we have an explanation for
that? It seems odd that noise would be so consistent.
> And in the general case we care not to have duplicate
> find_pack_entry_one(*found_pack) calls. Worst what can happen is we can
> call want_found_object(*found_pack) -- newly introduced helper for
> checking whether we want object -- twice, but since want_found_object()
> is very lightweight it does not make any difference.
I had trouble parsing this. I think maybe:
In the general case we do not want to call find_pack_entry_one() more
than once, because it is expensive. This patch splits the loop in
want_object_in_pack() into two parts: finding the object and seeing if
it impacts our choice to include it in the pack. We may call the
inexpensive want_found_object() twice, but we will never call
find_pack_entry_one() if we do not need to.
> +static int want_found_object(int exclude, struct packed_git *p)
> +{
> + if (exclude)
> + return 1;
> + if (incremental)
> + return 0;
> +
> + /*
> + * When asked to do --local (do not include an object that appears in a
> + * pack we borrow from elsewhere) or --honor-pack-keep (do not include
> + * an object that appears in a pack marked with .keep), finding a pack
> + * that matches the criteria is sufficient for us to decide to omit it.
> + * However, even if this pack does not satisfy the criteria, we need to
> + * make sure no copy of this object appears in _any_ pack that makes us
> + * to omit the object, so we need to check all the packs. Signal that by
> + * returning -1 to the caller.
> + */
> + if (!ignore_packed_keep &&
> + (!local || !have_non_local_packs))
> + return 1;
Hmm. The comment says "-1", but the return says "1". That is because the
comment is describing the return that happens at the end. :)
I wonder if the last sentence should be:
We can check here whether these options can possibly matter; if not,
we can return early from the function here. Otherwise, we signal "-1"
at the end to tell the caller that we do not know either way, and it
needs to check more packs.
> - *found_pack = NULL;
> - *found_offset = 0;
> + /*
> + * If we already know the pack object lives in, start checks from that
> + * pack - in the usual case when neither --local was given nor .keep
> files
> + * are present we will determine the answer right now.
> + */
> + if (*found_pack) {
> + want = want_found_object(exclude, *found_pack);
> + if (want != -1)
> + return want;
> + }
Looks correct. Though it is not really "start checks from..." anymore,
but rather "do a quick check to see if we can quit early, and otherwise
start the loop". That might be nitpicking, though.
> for (p = packed_git; p; p = p->next) {
> - off_t offset = find_pack_entry_one(sha1, p);
> + off_t offset;
> +
> + if (p == *found_pack)
> + offset = *found_offset;
> + else
> + offset = find_pack_entry_one(sha1, p);
> +
This hunk will conflict with the MRU optimizations in 'next', but I
think the resolution should be pretty trivial.
> static int add_object_entry(const unsigned char *sha1, enum object_type type,
> const char *name, int exclude)
> {
> - struct packed_git *found_pack;
> - off_t found_offset;
> + struct packed_git *found_pack = NULL;
> + off_t found_offset = 0;
I think technically we don't need to initialize found_offset here (it is
considered only if *found_pack is not NULL), but it doesn't hurt to make
our starting assumptions clear.
> @@ -1073,6 +1097,9 @@ static int add_object_entry_from_bitmap(const unsigned
> char *sha1,
> if (have_duplicate_entry(sha1, 0, &index_pos))
> return 0;
>
> + if (!want_object_in_pack(sha1, 0, &pack, &offset))
> + return 0;
> +
And this caller doesn't need to worry about initialization, because of
course it knows it has a pack/offset already. Good.
> diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
> index 3893afd..a278d30 100755
> --- a/t/t5310-pack-bitmaps.sh
> +++ b/t/t5310-pack-bitmaps.sh
Tests look OK. I saw a few style nitpicks, but I think they are not even
against our style guide but more "I would have written it like this" and
are not even worth quibbling over.
So I think the code here is fine, and I just had a few minor complaints
on comment and commit message clarity.
-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html