Re: [PATCH v3 5/7] gc: handle a corner case in gc.bigPackThreshold

2018-03-16 Thread Ævar Arnfjörð Bjarmason

On Fri, Mar 16 2018, Nguyễn Thái Ngọc Duy jotted:

> This config allows us to keep  packs back if their size is larger
> than a limit. But if this N >= gc.autoPackLimit, we may have a
> problem. We are supposed to reduce the number of packs after a
> threshold because it affects performance.
>
> We could tell the user that they have incompatible gc.bigPackThreshold
> and gc.autoPackLimit, but it's kinda hard when 'git gc --auto' runs in
> background. Instead let's fall back to the next best stategy: try to
> reduce the number of packs anyway, but keep the base pack out. This
> reduces the number of packs to two and hopefully won't take up too
> much resources to repack (the assumption still is the base pack takes
> most resources to handle).

I think this strategy makes perfect sense.

Those with say a 1GB "base" pack might set this setting at to 500MB or
something large like that, then it's realistically never going to happen
that you're going to then have a collision between gc.bigPackThreshold
and gc.autoPackLimit, even if your checkout is many years old *maybe*
you've accumulated 5-10 of those 500MB packs for any sane repo.

But this also allows for setting this value really low, e.g. 50MB or
something to place a very low upper bound on how much memory GC takes on
a regular basis, but of course you'll need to repack that set of 50MB's
eventually.

Great!


[PATCH v3 5/7] gc: handle a corner case in gc.bigPackThreshold

2018-03-16 Thread Nguyễn Thái Ngọc Duy
This config allows us to keep  packs back if their size is larger
than a limit. But if this N >= gc.autoPackLimit, we may have a
problem. We are supposed to reduce the number of packs after a
threshold because it affects performance.

We could tell the user that they have incompatible gc.bigPackThreshold
and gc.autoPackLimit, but it's kinda hard when 'git gc --auto' runs in
background. Instead let's fall back to the next best stategy: try to
reduce the number of packs anyway, but keep the base pack out. This
reduces the number of packs to two and hopefully won't take up too
much resources to repack (the assumption still is the base pack takes
most resources to handle).

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 Documentation/config.txt | 5 +
 builtin/gc.c | 9 +++--
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index c12c58813c..ce40112e31 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1554,6 +1554,11 @@ gc.bigPackThreshold::
`git gc` is run. This is very similar to `--keep-base-pack`
except that all packs that meet the threshold are kept, not
just the base pack. Defaults to zero.
++
+Note that if the number of kept packs is more than gc.autoPackLimit,
+this configuration variable is ignored, all packs except the base pack
+will be repacked. After this the number of packs should go below
+gc.autoPackLimit and gc.bigPackThreshold should be respected again.
 
 gc.logExpiry::
If the file gc.log exists, then `git gc --auto` won't run
diff --git a/builtin/gc.c b/builtin/gc.c
index c0f1922c24..140c1bb7dd 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -336,9 +336,14 @@ static int need_to_gc(void)
if (too_many_packs()) {
struct string_list keep_pack = STRING_LIST_INIT_NODUP;
 
-   if (big_pack_threshold)
+   if (big_pack_threshold) {
find_base_packs(&keep_pack, big_pack_threshold);
-   else {
+   if (keep_pack.nr >= gc_auto_pack_limit) {
+   big_pack_threshold = 0;
+   string_list_clear(&keep_pack, 0);
+   find_base_packs(&keep_pack, 0);
+   }
+   } else {
struct packed_git * p = find_base_packs(&keep_pack, 0);
uint64_t mem_have, mem_want;
 
-- 
2.16.2.903.gd04caf5039