What happens when the repository is bigger than gc.autopacklimit * pack.packSizeLimit?
[Previously sent to the git-users mailing list, but it probably should
be addressed here.]

A number of commands invoke "git gc --auto" to clean up the repository
when there might be a lot of dangling objects and/or far too many
unpacked files.  The manual pages say:

git gc:

    --auto
        With this option, git gc checks whether any housekeeping is
        required; if not, it exits without performing any work.  Some
        git commands run git gc --auto after performing operations
        that could create many loose objects.

        Housekeeping is required if there are too many loose objects
        or too many packs in the repository.  If the number of loose
        objects exceeds the value of the gc.auto configuration
        variable, then all loose objects are combined into a single
        pack using git repack -d -l.  Setting the value of gc.auto to
        0 disables automatic packing of loose objects.

git config:

    gc.autopacklimit
        When there are more than this many packs that are not marked
        with *.keep file in the repository, git gc --auto consolidates
        them into one larger pack.  The default value is 50.  Setting
        this to 0 disables it.

What happens when the amount of data in the repository exceeds
gc.autopacklimit * pack.packSizeLimit?  According to the
documentation, git gc --auto will then *always* repack the repository,
whether it needs it or not, because the data will require more than
gc.autopacklimit pack files.

And it appears from an experiment that this is what happens.  I have a
repository with pack.packSizeLimit = 99m, and there are 104 pack
files, and even when git gc is done, if I do git gc --auto, it will
run git-repack again.

Looking at the code, I see:

builtin/gc.c:

    static int too_many_packs(void)
    {
        struct packed_git *p;
        int cnt;

        if (gc_auto_pack_limit <= 0)
            return 0;

        prepare_packed_git();
        for (cnt = 0, p = packed_git; p; p = p->next) {
            if (!p->pack_local)
                continue;
            if (p->pack_keep)
                continue;
            /*
             * Perhaps check the size of the pack and count only
             * very small ones here?
             */
            cnt++;
        }
        return gc_auto_pack_limit <= cnt;
    }

Yes, perhaps you *should* check the size of the pack!  What is a good
strategy for making this function behave as we want it to?

Dale
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Re: What happens when the repository is bigger than gc.autopacklimit * pack.packSizeLimit?
On Wed, Aug 27, 2014 at 03:36:53PM -0400, Dale R. Worley wrote:

> And it appears from an experiment that this is what happens.  I have
> a repository with pack.packSizeLimit = 99m, and there are 104 pack
> files, and even when git gc is done, if I do git gc --auto, it will
> run git-repack again.

I agree that gc --auto could be smarter here, but I have to wonder:
why are you setting the packsize limit to 99m in the first place?

It is generally much more efficient to place everything in a single
pack.  There are more delta opportunities, fewer base objects, lookup
is faster (we binary search each pack index, but linearly move through
the list of indices), and it is required for advanced techniques like
bitmaps.

-Peff
Re: What happens when the repository is bigger than gc.autopacklimit * pack.packSizeLimit?
wor...@alum.mit.edu (Dale R. Worley) writes:

> builtin/gc.c:
>
>     static int too_many_packs(void)
>     {
>         struct packed_git *p;
>         int cnt;
>
>         if (gc_auto_pack_limit <= 0)
>             return 0;
>
>         prepare_packed_git();
>         for (cnt = 0, p = packed_git; p; p = p->next) {
>             if (!p->pack_local)
>                 continue;
>             if (p->pack_keep)
>                 continue;
>             /*
>              * Perhaps check the size of the pack and count only
>              * very small ones here?
>              */
>             cnt++;
>         }
>         return gc_auto_pack_limit <= cnt;
>     }
>
> Yes, perhaps you *should* check the size of the pack!  What is a
> good strategy for making this function behave as we want it to?

Whoever decides the details of "as we want it to" gets to decide ;-).

I think what we want is a mode where we repack only loose objects and
small packs by concatenating them into a single large one (with
deduping of base objects, the total would become smaller than the
sum), while leaving the existing large ones alone.  Daily repacking
would then just coalesce new objects into a "current" pack that grows
gradually; at some point it stops growing and joins the longer-term
large packs, until a full gc is done to optimize the overall history
traversal, or something like that.

But if your definition of the boundary between "small" and "large" is
unreasonably low (and/or your definition of "too many" is unreasonably
small), you will always have the problem you found.