On Monday, August 05, 2013 08:38:47 pm Ramkumar Ramachandra 
wrote:
> This is the rough explanation I wrote down after reading
> it:
> 
> So, the problem is that my .git/objects/pack is polluted
> with little packs every time I fetch (or push, if you're
> the server), and this is problematic from the
> perspective of an overly (naively) aggressive gc that
> hammers out all fragmentation.  So, on the first run,
> the little packfiles I have are all "consolidated" into
> big packfiles; you also write .keep files to say that
> "don't gc these big packs we just generated".  In
> subsequent runs, the little packfiles from the fetch are
> absorbed into a pack that is immune to gc.  You're also
> using a size heuristic to consolidate similarly sized
> packfiles.  You also have a --ratio option to tweak the
> ratio of sizes.
> 
> From: Martin Fick <mf...@codeaurora.org>
> See: https://gerrit-review.googlesource.com/#/c/35215/
> Thread:
> http://thread.gmane.org/gmane.comp.version-control.git/231555
> (Martin's emails are missing from the archive)
> ---

After analyzing today's data, I recognize that in some 
circumstances the size estimation after consolidation can be 
off by huge amounts.  The script naively just adds the 
current sizes together.  This gives a very rough estimate
of the new packfile size, but sometimes it can be off by
over two orders of magnitude. :(  While many new packfiles are
tiny (only a few KB), it seems like the larger new
packfiles have a terrible tendency to throw the estimate way 
off (I suspect they simply have many duplicate objects).  
But despite this poor estimate, the script still offers 
drastic improvements over plain git gc.
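To make the suspected failure mode concrete, here is a toy model (an assumption, not how git actually stores packs) in which a pack is just a mapping from object id to bytes. Summing pack sizes counts an object once per pack that contains it, while a consolidated pack would store it only once. Real repacking also applies delta compression, which this sketch ignores, so the true gap can be larger still.

```python
from typing import Dict, List

# Hypothetical model: object id -> stored size in bytes.
Pack = Dict[str, int]


def summed_estimate(packs: List[Pack]) -> int:
    """Naive estimate: just add up the current pack sizes."""
    return sum(sum(p.values()) for p in packs)


def deduplicated_size(packs: List[Pack]) -> int:
    """Size if each object is stored only once after consolidation.

    An object present in several packs keeps the size recorded in the
    last pack that contains it. Delta compression is not modeled.
    """
    merged: Pack = {}
    for p in packs:
        merged.update(p)
    return sum(merged.values())


packs = [
    {"a": 1000, "b": 10},
    {"a": 1000, "c": 10},
    {"a": 1000, "d": 10},
]
```

With three new packs that each carry the same 1000-byte object, the summed estimate is 3030 bytes while the deduplicated result is 1030 bytes; with more overlapping packs the ratio keeps growing.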

So, it has me wondering if there isn't a more accurate way
to estimate the new packfile's size without wasting a ton of time?

If not, one approach which might be worth experimenting with 
is to just assume that new packfiles have size 0!  Then just 
consolidate them with any other packfile which is ready for 
consolidation, or if none are ready, with the smallest 
packfile.  I would not be surprised to see this work
better on average than the current summation.
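The size-0 idea could be sketched as below. PackInfo, its `ready` flag, and pick_target are hypothetical names; how a pack becomes "ready" is left to the existing ratio heuristic and is not modeled. Where several packs are ready, the email says any of them would do; picking the smallest is an arbitrary choice made here to keep the rewrite cheap.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PackInfo:
    name: str
    size: int    # bytes on disk
    ready: bool  # hypothetical flag: already due for consolidation


def pick_target(existing: List[PackInfo]) -> Optional[PackInfo]:
    """Choose the existing pack that new packs get folded into.

    New packs are treated as if their size were 0, so they never
    influence which existing packs count as ready.
    """
    if not existing:
        return None
    ready = [p for p in existing if p.ready]
    if ready:
        # Any ready pack would do; take the smallest (arbitrary choice).
        return min(ready, key=lambda p: p.size)
    # Nothing is ready: fall back to the smallest existing pack.
    return min(existing, key=lambda p: p.size)
```

For instance, given "big" (10000 bytes, not ready), "mid" (500, ready) and "small" (50, not ready), the new packs go into "mid"; if nothing were ready they would go into "small".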

-Martin


-- 
The Qualcomm Innovation Center, Inc. is a member of Code 
Aurora Forum, hosted by The Linux Foundation
 