Re: [PATCH] gc: do not warn about too many loose objects
On Wed, Jul 18, 2018 at 03:11:38PM +0200, Ævar Arnfjörð Bjarmason wrote: > > Yeah, I agree that deferring repeated gc's based on time is going to run > > into pathological corner cases. OTOH, what we've found at GitHub is that > > "gc --auto" is quite insufficient for scheduling maintenance anyway > > (e.g., if a machine gets pushes to 100 separate repositories in quick > > succession, you probably want to queue and throttle any associated gc). > > I'm sure you have better solutions for this at GitHub, but as an aside > it might be interesting to add some sort of gc flock + retry setting for > this use-case, i.e. even if you had 100 concurrent gc's due to > too_many_*(), they'd wait + retry until they could get the flock on a > given file. > > Then again this is probably too specific, and could be done with a > pre-auto-gc hook too.. Yeah, I think any multi-repo solution is getting way too specific for Git, and the best thing we can do is provide a hook. I agree you could probably do this today with a pre-auto-gc hook (if it skips gc it would just queue itself and return non-zero). Or even just make a mark in a database that says "there was some activity here". Since we have so much other infrastructure sitting between the user and Git anyway, we do that marking at a separate layer which is already talking to a database. ;) Anyway, I do agree with your general notion that this isn't the right approach for many situations, and auto-gc is a better solution for many cases. -Peff
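Peff's aside about a gc flock + retry can be prototyped entirely outside git. Below is a minimal sketch, assuming util-linux flock(1) is available and a host-wide lock path is writable; the function name and the GC_LOCK_* variables are made up for illustration. If this logic lived in a pre-auto-gc hook instead, it would need to run plain `git gc` (not `git gc --auto`) under the lock, since `gc --auto` would re-run the hook.

```shell
# Serialize gc across all repositories on one host: take a shared lock
# (waiting up to GC_LOCK_WAIT seconds) and run gc for one repo under it,
# so 100 repos pushed to in quick succession gc one at a time.
serialized_gc() {
	repo=$1
	lock=${GC_LOCK_FILE:-/tmp/git-gc-host.lock}
	if flock -w "${GC_LOCK_WAIT:-300}" "$lock" git -C "$repo" gc --quiet
	then
		echo "gc done: $repo"
	else
		echo "gc skipped or failed: $repo" >&2
		return 1
	fi
}
```

A queueing daemon could of course do this more cleverly; the point is only that the "wait + retry for the lock" behavior needs nothing from git itself.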
Re: [PATCH] gc: do not warn about too many loose objects
On Tue, Jul 17 2018, Jeff King wrote: > On Tue, Jul 17, 2018 at 10:59:50AM +0200, Ævar Arnfjörð Bjarmason wrote: > >> That doesn't make sense to me. Just because git itself is happily >> ignoring the exit code I don't think that should mean there shouldn't be >> a meaningful exit code. Why don't you just ignore the exit code in the >> repo tool as well? >> >> Now e.g. automation I have to see if git-gc --auto is having issues >> can't just be 'git gc --auto || echo fail @ {hostname}' across a fleet >> of servers, but will need to start caring if stderr was emitted to. > > If you're daemonizing gc, though, there are a number of cases where the > exit code is not propagated. If you really care about the outcome, > you're probably better off either: In theory a lot of the other stuff can fail, but in practice I've only seen it get stuck due to running out of disk space (easily detected in other ways), and because of having too many loose objects (e.g. due to, but not limited to https://public-inbox.org/git/87fu6bmr0j@evledraar.gmail.com/). > - doing synchronous gc's, which will still return a meaningful code > after Jonathan's patches (Reply to this and "we've found at GitHub..." later in your mail) I'm targeting clients (dev machines, laptops, staging boxes) which have clones of repos, some put in place by automation, some manually cloned by users in ~. So it's much easier to rely on shipping a /etc/gitconfig than setting gc.auto=0 and having some system-wide job that goes and hunts for local git repos to manually gc. It's also a big advantage that it's driven by user activity, because it's an implicit guard of only_do_this_if_the_user_is_active_here() before "git-gc" on a fleet of machines, which when some of those get their disk space from shared NFS resources cuts down on overall I/O. > - inspecting the log yourself. I know that comes close to the un-unixy > stderr thing.
But in this case, the presence of a non-empty log is > literally the on-disk bit for "the previous run failed". And doing > so catches all of the daemonized cases, even the "first one" that > you'd miss by paying attention to the immediate exit code. > > This will treat the zero-exit-code "warning" case as an error. I'm > not against propagating the true original error code, if somebody > wants to work on it. But I think Jonathan's patches are a strict > improvement over the current situation, and a patch to propagate > could come on top. Yeah, I can check gc.log; if others are of a different opinion about this #3 case, at the end of the day I don't mind if it returns 0. It just doesn't make any conceptual sense to me. >> I think you're conflating two things here in a way that (if I'm reading >> this correctly) produces a pretty bad regression for users. >> >> a) If we have something in the gc.log we keep yelling at users until we >> reach the gc.logExpiry, this was the subject of my thread back in >> January. This sucks, and should be fixed somehow. >> >> Maybe we should only emit the warning once, e.g. creating a >> .git/gc.log.wasemitted marker and as long as its ctime is later than >> the mtime on .git/gc.log we don't emit it again). >> >> But that also creates the UX problem that it's easy to miss this >> message in the middle of some huge "fetch" output. Perhaps we should >> just start emitting this as part of "git status" or something (or >> solve the root cause, as Peff notes...). > > I kind of like that "git status" suggestion. Of course many users run > "git status" more often than "git commit", so it may actually spam them > more! Maybe piggy-backing on the advice facility ... >> b) We *also* use this presence of a gc.log as a marker for "we failed >> too recently, let's not try again until after a day".
>> >> The semantics of b) are very useful, and just because they're tied up >> with a) right now, let's not throw out b) just because we're trying to >> solve a). > > Yeah. In general my concern was the handling of (b), which I think this > last crop of patches is fine with. As far as showing the message > repeatedly or not, I don't have a strong opinion (it could even be > configurable, once it's split from the "marker" behavior). > >> Right now one thing we do right is we always try to look at the actual >> state of the repo with too_many_packs() and too_many_loose_objects(). >> >> We don't assume the state of your repo hasn't drastically changed >> recently. That means that we do the right thing and gc when the repo >> needs it, not just because we GC'd recently enough. >> >> With these proposed semantics we'd skip a needed GC (potentially for >> days, depending on the default) just because we recently ran one. > > Yeah, I agree that deferring repeated gc's based on time is going to run > into pathological corner cases. OTOH, what we've found at GitHub is that > "gc --auto" is quite insufficient for scheduling maintenance
Re: [PATCH] gc: do not warn about too many loose objects
On Tue, Jul 17, 2018 at 10:59:50AM +0200, Ævar Arnfjörð Bjarmason wrote: > That doesn't make sense to me. Just because git itself is happily > ignoring the exit code I don't think that should mean there shouldn't be > a meaningful exit code. Why don't you just ignore the exit code in the > repo tool as well? > > Now e.g. automation I have to see if git-gc --auto is having issues > can't just be 'git gc --auto || echo fail @ {hostname}' across a fleet > of servers, but will need to start caring if stderr was emitted to. If you're daemonizing gc, though, there are a number of cases where the exit code is not propagated. If you really care about the outcome, you're probably better off either: - doing synchronous gc's, which will still return a meaningful code after Jonathan's patches - inspecting the log yourself. I know that comes close to the un-unixy stderr thing. But in this case, the presence of a non-empty log is literally the on-disk bit for "the previous run failed". And doing so catches all of the daemonized cases, even the "first one" that you'd miss by paying attention to the immediate exit code. This will treat the zero-exit-code "warning" case as an error. I'm not against propagating the true original error code, if somebody wants to work on it. But I think Jonathan's patches are a strict improvement over the current situation, and a patch to propagate could come on top. > I think you're conflating two things here in a way that (if I'm reading > this correctly) produces a pretty bad regression for users. > > a) If we have something in the gc.log we keep yelling at users until we > reach the gc.logExpiry, this was the subject of my thread back in > January. This sucks, and should be fixed somehow. > > Maybe we should only emit the warning once, e.g. creating a > .git/gc.log.wasemitted marker and as long as its ctime is later than > the mtime on .git/gc.log we don't emit it again).
> > But that also creates the UX problem that it's easy to miss this > message in the middle of some huge "fetch" output. Perhaps we should > just start emitting this as part of "git status" or something (or > solve the root cause, as Peff notes...). I kind of like that "git status" suggestion. Of course many users run "git status" more often than "git commit", so it may actually spam them more! > b) We *also* use this presence of a gc.log as a marker for "we failed > too recently, let's not try again until after a day". > > The semantics of b) are very useful, and just because they're tied up > with a) right now, let's not throw out b) just because we're trying to > solve a). Yeah. In general my concern was the handling of (b), which I think this last crop of patches is fine with. As far as showing the message repeatedly or not, I don't have a strong opinion (it could even be configurable, once it's split from the "marker" behavior). > Right now one thing we do right is we always try to look at the actual > state of the repo with too_many_packs() and too_many_loose_objects(). > > We don't assume the state of your repo hasn't drastically changed > recently. That means that we do the right thing and gc when the repo > needs it, not just because we GC'd recently enough. > > With these proposed semantics we'd skip a needed GC (potentially for > days, depending on the default) just because we recently ran one. Yeah, I agree that deferring repeated gc's based on time is going to run into pathological corner cases. OTOH, what we've found at GitHub is that "gc --auto" is quite insufficient for scheduling maintenance anyway (e.g., if a machine gets pushes to 100 separate repositories in quick succession, you probably want to queue and throttle any associated gc). But there are probably more mid-size sites that are big enough to have corner cases, but not so big that "gc --auto" is hopeless. ;) -Peff
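The "inspect the log yourself" option is easy to script, since a non-empty gc.log is, as described here, the on-disk bit for "the previous run failed". A minimal sketch: the helper name is made up, and the git directory (from `git rev-parse --git-dir`) is passed in explicitly so the check itself needs nothing but the filesystem:

```shell
# last_gc_failed GITDIR: succeed (exit 0) when the previous daemonized
# "gc --auto" left a non-empty gc.log behind, i.e. it failed or warned.
last_gc_failed() {
	test -s "$1/gc.log"
}

# Fleet-style usage, in the spirit of 'git gc --auto || echo fail':
#   if last_gc_failed "$(git rev-parse --git-dir)"; then
#           echo "gc stuck on this host" >&2
#   fi
```

Note that, as Peff says, this treats the zero-exit-code "warning" case the same as a hard failure.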
Re: [PATCH] gc: do not warn about too many loose objects
Jonathan Nieder writes: > That doesn't really solve the problem: > > 1. "gc --auto" would produce noise *every time it runs* until gc.log > is removed, for example via expiry > > 2. "gc --auto" would not do any garbage collection until gc.log is > removed, for example by expiry > > 3. "gc --auto" would still not ratelimit itself, for example when > there are a large number of loose unreachable objects that is not > large enough to hit the loose object threshold. > > but maybe it's better than the present state. > > To solve (1) and (2), we could introduce a gc.warnings file that > behaves like gc.log except that it only gets printed once and then > self-destructs, allowing gc --auto to proceed. That makes it not rate-limit, no? > To solve (3), we could > introduce a gc.lastrun file that is touched whenever "gc --auto" runs > successfully and make "gc --auto" a no-op for a while after each run. Does absolute time-interval make sense here? Some repositories may be busily churning new objects and for them 5-minute interval may be too infrequent, while other repositories may be relatively slow and once a day may be sufficient for them. I wonder if we can somehow auto-tune this. > To avoid wasteful repeated fruitless gcs, when gc.log is present, the > subsequent "gc --auto" would die after print its contents. Most git s/print/&ing/ > commands, such as "git fetch", ignore the exit status from "git gc > --auto" so all is well at this point: the user gets to see the error > message, and the fetch succeeds, without a wasteful additional attempt > at an automatic gc. The above, by saying "Most git commands", leaves readers wanting to know "then what are minority git commands that do not correctly do so?" If you are not going to answer that question, it probably is better not to even say "Most". > External tools like repo[1], though, do care about the exit status > from "git gc --auto".
In non-daemonized mode, the exit status is > straightforward: if there is an error, it is nonzero, but after a > warning like the above, the status is zero. The daemonized mode, as a > side effect of the other properties provided, offers a very strange > exit code convention: > > - if no housekeeping was required, the exit status is 0 OK. > - the first real run, after forking into the background, returns exit >status 0 unconditionally. The parent process has no way to know >whether gc will succeed. Understandable; allowing the parent to exit before we know the outcome of the gc is the whole point of backgrounding. > - if there is any diagnostic output in gc.log, subsequent runs return >a nonzero exit status to indicate that gc was not triggered. This is unfortunate. > There's nothing for the calling program to act on on the basis of that > error. Use status 0 consistently instead, to indicate that we decided > not to run a gc (just like if no housekeeping was required). s/.$/ in the last case./ And I fully agree with the reasoning. > This > way, repo and similar tools can get the benefit of the same behavior > as tools like "git fetch" that ignore the exit status from gc --auto. > > Once the period of time described by gc.pruneExpire elapses, the > unreachable loose objects will be removed by "git gc --auto" > automatically. > > [1] https://gerrit-review.googlesource.com/c/git-repo/+/10598/ > > Reported-by: Andrii Dehtiarov > Helped-by: Jeff King > Signed-off-by: Jonathan Nieder > --- > Documentation/config.txt | 3 ++- > builtin/gc.c | 16 +++- > t/t6500-gc.sh| 6 +++--- > 3 files changed, 16 insertions(+), 9 deletions(-) > > diff --git a/Documentation/config.txt b/Documentation/config.txt > index 1cc18a828c..5eaa4aaa7d 100644 > --- a/Documentation/config.txt > +++ b/Documentation/config.txt > @@ -1649,7 +1649,8 @@ will be repacked. After this the number of packs should > go below > gc.autoPackLimit and gc.bigPackThreshold should be respected again. 
> > gc.logExpiry:: > - If the file gc.log exists, then `git gc --auto` won't run > + If the file gc.log exists, then `git gc --auto` will print > + its content and exit with status zero instead of running > unless that file is more than 'gc.logExpiry' old. Default is > "1.day". See `gc.pruneExpire` for more ways to specify its > value. > diff --git a/builtin/gc.c b/builtin/gc.c > index 2bebc52bda..484ab21b8c 100644 > --- a/builtin/gc.c > +++ b/builtin/gc.c > @@ -438,7 +438,7 @@ static const char *lock_repo_for_gc(int force, pid_t* > ret_pid) > return NULL; > } > > -static void report_last_gc_error(void) > +static int report_last_gc_error(void) > { > struct strbuf sb = STRBUF_INIT; > ssize_t ret; > @@ -449,7 +449,7 @@ static void report_last_gc_error(void) > if (errno == ENOENT) > goto done; > > - die_errno(_("cannot stat '%s'"), gc_log_path); > + return error_errno(_
Re: [PATCH] gc: do not warn about too many loose objects
On Tue, Jul 17, 2018 at 3:53 AM Jonathan Nieder wrote: > Subject: gc: do not return error for prior errors in daemonized mode > > Some build machines started failing to fetch updated source using > "repo sync", with error > > error: The last gc run reported the following. Please correct the root cause > and remove /build/.repo/projects/tools/git.git/gc.log. > Automatic cleanup will not be performed until the file is removed. > > warning: There are too many unreachable loose objects; run 'git prune' to > remove them. > > The cause takes some time to describe. > > In v2.0.0-rc0~145^2 (gc: config option for running --auto in > background, 2014-02-08), "git gc --auto" learned to run in the > background instead of blocking the invoking command. In this mode, it > closed stderr to avoid interleaving output with any subsequent > commands, causing warnings like the above to be swallowed; v2.6.3~24^2 > (gc: save log from daemonized gc --auto and print it next time, > 2015-09-19) addressed this by storing any diagnostic output in > .git/gc.log and allowing the next "git gc --auto" run to print it. > > To avoid wasteful repeated fruitless gcs, when gc.log is present, the > subsequent "gc --auto" would die after print its contents. Most git > commands, such as "git fetch", ignore the exit status from "git gc > --auto" so all is well at this point: the user gets to see the error > message, and the fetch succeeds, without a wasteful additional attempt > at an automatic gc. > > External tools like repo[1], though, do care about the exit status > from "git gc --auto". In non-daemonized mode, the exit status is > straightforward: if there is an error, it is nonzero, but after a > warning like the above, the status is zero. 
The daemonized mode, as a > side effect of the other properties provided, offers a very strange > exit code convention: > > - if no housekeeping was required, the exit status is 0 > > - the first real run, after forking into the background, returns exit > status 0 unconditionally. The parent process has no way to know > whether gc will succeed. > > - if there is any diagnostic output in gc.log, subsequent runs return > a nonzero exit status to indicate that gc was not triggered. > > There's nothing for the calling program to act on on the basis of that > error. Use status 0 consistently instead, to indicate that we decided > not to run a gc (just like if no housekeeping was required). This > way, repo and similar tools can get the benefit of the same behavior > as tools like "git fetch" that ignore the exit status from gc --auto. This background gc thing is pretty much designed for interactive use. When it's scripted, you probably should avoid it. Then you can fork a new process by yourself and have much better control if you still want "background" gc. So an alternative here is to turn on or off gc.autodetach based on interactiveness (i.e. having tty). We can add a new (and default) value "auto" to gc.autodetach for this purpose. In automated scripts, it will always run in non-daemonized mode. You will get non-zero exit code when real errors occur. You don't worry about hanging processes. Rate limit is also thrown out in this mode if I'm not mistaken, but it could be added back and more tailored for server needs. > Once the period of time described by gc.pruneExpire elapses, the > unreachable loose objects will be removed by "git gc --auto" > automatically.
> > [1] https://gerrit-review.googlesource.com/c/git-repo/+/10598/ > > Reported-by: Andrii Dehtiarov > Helped-by: Jeff King > Signed-off-by: Jonathan Nieder > --- > Documentation/config.txt | 3 ++- > builtin/gc.c | 16 +++- > t/t6500-gc.sh| 6 +++--- > 3 files changed, 16 insertions(+), 9 deletions(-) > > diff --git a/Documentation/config.txt b/Documentation/config.txt > index 1cc18a828c..5eaa4aaa7d 100644 > --- a/Documentation/config.txt > +++ b/Documentation/config.txt > @@ -1649,7 +1649,8 @@ will be repacked. After this the number of packs should > go below > gc.autoPackLimit and gc.bigPackThreshold should be respected again. > > gc.logExpiry:: > - If the file gc.log exists, then `git gc --auto` won't run > + If the file gc.log exists, then `git gc --auto` will print > + its content and exit with status zero instead of running > unless that file is more than 'gc.logExpiry' old. Default is > "1.day". See `gc.pruneExpire` for more ways to specify its > value. > diff --git a/builtin/gc.c b/builtin/gc.c > index 2bebc52bda..484ab21b8c 100644 > --- a/builtin/gc.c > +++ b/builtin/gc.c > @@ -438,7 +438,7 @@ static const char *lock_repo_for_gc(int force, pid_t* > ret_pid) > return NULL; > } > > -static void report_last_gc_error(void) > +static int report_last_gc_error(void) > { > struct strbuf sb = STRBUF_INIT; > ssize_t ret; > @@ -449,7 +449,7 @@ static void report_last_gc_error(void) > if (err
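The interactiveness idea above is only a suggestion (no "auto" value for gc.autoDetach exists), but a caller can emulate it today by checking whether stderr is a terminal and forcing synchronous mode otherwise, so scripts get a real exit code:

```shell
# Print the gc.autoDetach value to use: daemonize only for a human at a
# terminal; run synchronously (meaningful exit code) in scripts.
gc_auto_detach() {
	if [ -t 2 ]; then echo true; else echo false; fi
}

# Usage in a wrapper script (illustrative):
#   git -c gc.autoDetach=$(gc_auto_detach) gc --auto || echo "gc failed" >&2
```

As noted, the gc.log-based rate limit does not apply to synchronous runs, so a busy automated caller may want its own throttling on top.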
Re: [PATCH] gc: do not warn about too many loose objects
On Tue, Jul 17 2018, Jonathan Nieder wrote: > Hi Ævar, > > Ævar Arnfjörð Bjarmason wrote: >> On Tue, Jul 17 2018, Jonathan Nieder wrote: > >>> That suggests a possible improvement. If all callers should be >>> ignoring the exit code, can we make the exit code in daemonize mode >>> unconditionally zero in this kind of case? >> >> That doesn't make sense to me. Just because git itself is happily >> ignoring the exit code I don't think that should mean there shouldn't be >> a meaningful exit code. Why don't you just ignore the exit code in the >> repo tool as well? > > I don't maintain the repo tool. I am just trying to improve git to > make some users' lives better. > > repo worked fine for years, until gc's daemonized mode broke it. I > don't think it makes sense for us to declare that it has been holding > git gc wrong for all that time before then. Before the sequence of commits noted at the bottom of my https://public-inbox.org/git/87inc89j38@evledraar.gmail.com/ plus a831c06a2b ("gc: ignore old gc.log files", 2017-02-10) we wouldn't do that, that's true. We'd just happily run GC again pointlessly even though it wasn't going to accomplish anything, in cases where you really did have ~>6700 loose objects that weren't going to be pruned. I don't think it makes sense to revert those patches and go back to the much more naïve (and user waiting there twiddling his thumbs while it runs) method, but I also don't think making no distinction between the following states: 1. gc --auto has nothing to do 2. gc --auto has something to do, will fork and try to do it 3. gc --auto has something to do, but notices that gc has been failing before and can't do anything now. I think #3 should exit with non-zero. Yes, before this whole backgrounding etc. we wouldn't have exited with non-zero either, we'd just print a thing to STDERR. 
But with backgrounding we implicitly introduced a new state, which is we won't do *any* maintenance at all, including consolidating packfiles (very important for servers), so I think it makes sense to exit with non-zero since gc can't run at all. >> Now e.g. automation I have to see if git-gc ---auto is having issues >> can't just be 'git gc --auto || echo fail @ {hostname}' across a fleet >> of servers, but will need to start caring if stderr was emitted to. > > Thanks for bringing this up. The thing is, the current exit code is > not useful for that purpose. It doesn't report a failure until the > *next* `git gc --auto` run, when it is too late to be useful. > > See the commit message on the proposed patch > https://public-inbox.org/git/20180717065740.gd177...@aiede.svl.corp.google.com/ > for more on that subject. Right. I know. What I mean is now I can (and do) use it to run 'git gc --auto' across our server fleet and see whether I have any of #3, or whether it's all #1 or #2. If there's nothing to do in #1 that's fine, and it just so happens that I'll run gc due to #2 that's also fine, but I'd like to see if gc really is stuck. This of course relies on them having other users / scripts doing normal git commands which would trigger previous 'git gc --auto' runs. >> I don't care if we e.g. have a 'git gc --auto --exit-code' similar to >> what git-diff does, but it doesn't make sense to me that we *know* we >> can't background the gc due to a previous error and then always return >> 0. Having to parse STDERR to see if it *really* failed is un-unixy, >> let's use exit codes. That's what they're for. > > Do you know of anyone trying to use the exit code from gc --auto in > this way? > > It sounds to me like for what you're proposing, a separate > > git gc --auto --last-run-failed > > command that a tool could use to check the outcome from the previous > gc --auto run without triggering a new one would be best. 
Yeah this is admittedly a bit of a poweruser thing I'm doing, so I don't mind if it's hidden behind something like that in principle, but it seems wrong to exit with zero in this (#3) case: $ git gc --auto; echo $? Auto packing the repository in background for optimum performance. See "git help gc" for manual housekeeping. error: The last gc run reported the following. Please correct the root cause and remove .git/gc.log. Automatic cleanup will not be performed until the file is removed. warning: There are too many unreachable loose objects; run 'git prune' to remove them. 255 As a previous (good) patch in this series notes that shouldn't be 255, let's fix that, but I don't see how emitting an "error" and "warning" saying we're broken and can't gc at all here in conjunction with exiting with zero makes sense. > [...] >> I think you're conflating two things here in a way that (if I'm reading >> this correctly) produces a pretty bad regression for users. > > The patch doesn't touch the "ratelimiting" behavior at all, so I'm not > sure what I'm doing to regress users. Sorry about being unclear again, this is not a comm
Re: [PATCH] gc: do not warn about too many loose objects
Hi Ævar, Ævar Arnfjörð Bjarmason wrote: > On Tue, Jul 17 2018, Jonathan Nieder wrote: >> That suggests a possible improvement. If all callers should be >> ignoring the exit code, can we make the exit code in daemonize mode >> unconditionally zero in this kind of case? > > That doesn't make sense to me. Just because git itself is happily > ignoring the exit code I don't think that should mean there shouldn't be > a meaningful exit code. Why don't you just ignore the exit code in the > repo tool as well? I don't maintain the repo tool. I am just trying to improve git to make some users' lives better. repo worked fine for years, until gc's daemonized mode broke it. I don't think it makes sense for us to declare that it has been holding git gc wrong for all that time before then. > Now e.g. automation I have to see if git-gc --auto is having issues > can't just be 'git gc --auto || echo fail @ {hostname}' across a fleet > of servers, but will need to start caring if stderr was emitted to. Thanks for bringing this up. The thing is, the current exit code is not useful for that purpose. It doesn't report a failure until the *next* `git gc --auto` run, when it is too late to be useful. See the commit message on the proposed patch https://public-inbox.org/git/20180717065740.gd177...@aiede.svl.corp.google.com/ for more on that subject. > I don't care if we e.g. have a 'git gc --auto --exit-code' similar to > what git-diff does, but it doesn't make sense to me that we *know* we > can't background the gc due to a previous error and then always return > 0. Having to parse STDERR to see if it *really* failed is un-unixy, > let's use exit codes. That's what they're for. Do you know of anyone trying to use the exit code from gc --auto in this way? It sounds to me like for what you're proposing, a separate git gc --auto --last-run-failed command that a tool could use to check the outcome from the previous gc --auto run without triggering a new one would be best. [...]
> I think you're conflating two things here in a way that (if I'm reading > this correctly) produces a pretty bad regression for users. The patch doesn't touch the "ratelimiting" behavior at all, so I'm not sure what I'm doing to regress users. [...] >> To solve (3), we could >> introduce a gc.lastrun file that is touched whenever "gc --auto" runs >> successfully and make "gc --auto" a no-op for a while after each run. > > This would work around my concern with b) above in most cases by > introducing a purely time-based rate limit, but I'm uneasy about this > change in git-gc semantics. [..] > With these proposed semantics we'd skip a needed GC (potentially for > days, depending on the default) just because we recently ran one. > > In many environments, such as on busy servers, we could have tens of > thousands of packs by this point, since this facility would (presumably) > bypass both gc.autoPackLimit and gc.auto in favor of only running gc at > a maximum of every N minutes, similarly in a local checkout I could have > a crapload of loose objects because I ran a big rebase or a > filter-branch on one of my branches. I think we all agree that getting rid of the hack that 'explodes' unreachable objects into loose objects is the best way forward, long term. Even in that future, some ratelimiting may be useful, though. I also suspect that we're never going to achieve a perfect set of defaults that works well for both humans and servers, though we can try. Thanks and hope that helps, Jonathan
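The gc.lastrun idea being weighed here can be sketched in a few lines. To be clear, this is the proposal under discussion, not behavior git has; the marker filename and the use of stat(1) to read its mtime are illustrative assumptions:

```shell
# gc_ratelimited GITDIR INTERVAL: succeed when a gc.lastrun marker in
# GITDIR was touched less than INTERVAL seconds ago, i.e. a time-based
# limiter would skip this "gc --auto" run.
gc_ratelimited() {
	marker=$1/gc.lastrun
	test -e "$marker" || return 1   # never ran: do not limit
	now=$(date +%s)
	# mtime via GNU stat, falling back to BSD stat
	last=$(stat -c %Y "$marker" 2>/dev/null || stat -f %m "$marker")
	test $((now - last)) -lt "$2"
}

# A successful "gc --auto" run would then refresh the marker:
#   touch "$gitdir/gc.lastrun"
```

Ævar's objection is visible in the sketch: the decision consults only the clock, never too_many_packs() or too_many_loose_objects(), so a repository that degrades quickly after a run stays unmaintained until the interval expires.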
Re: [PATCH] gc: do not warn about too many loose objects
On Tue, Jul 17 2018, Jonathan Nieder wrote: > Hi, > > Jeff King wrote: >> On Mon, Jul 16, 2018 at 03:56:39PM -0700, Jonathan Nieder wrote: > >>> The calling command in the motivating example is Android's "repo" tool: >>> >>> bare_git.gc('--auto') >>> >>> from https://gerrit-review.googlesource.com/c/git-repo/+/10598/. I >>> think it's reasonable that it expects a status code of 0 in the normal >>> case. So life is less simple than I hoped. >> >> IMHO it should ignore the return code, since that's what Git does >> itself. And at any rate, you'd miss errors from daemonized gc's (at >> least until the _next_ one runs and propagates the error code). I've read this whole thread, and as Peff noted I've barked up this particular tree before[1] without coming up with a solution myself. So please don't take the following as critique of any way of moving forward, I'm just trying to poke holes in what you're doing to make sure we don't have regressions to the currently (sucky) logic. > That suggests a possible improvement. If all callers should be > ignoring the exit code, can we make the exit code in daemonize mode > unconditionally zero in this kind of case? That doesn't make sense to me. Just because git itself is happily ignoring the exit code I don't think that should mean there shouldn't be a meaningful exit code. Why don't you just ignore the exit code in the repo tool as well? Now e.g. automation I have to see if git-gc --auto is having issues can't just be 'git gc --auto || echo fail @ {hostname}' across a fleet of servers, but will need to start caring if stderr was emitted to. I don't care if we e.g. have a 'git gc --auto --exit-code' similar to what git-diff does, but it doesn't make sense to me that we *know* we can't background the gc due to a previous error and then always return 0. Having to parse STDERR to see if it *really* failed is un-unixy, let's use exit codes. That's what they're for. > That doesn't really solve the problem: > > 1.
"gc --auto" would produce noise *every time it runs* until gc.log > is removed, for example via expiry > > 2. "gc --auto" would not do any garbage collection until gc.log is > removed, for example by expiry > > 3. "gc --auto" would still not ratelimit itself, for example when > there are a large number of loose unreachable objects that is not > large enough to hit the loose object threshold. > > but maybe it's better than the present state. > > To solve (1) and (2), we could introduce a gc.warnings file that > behaves like gc.log except that it only gets printed once and then > self-destructs, allowing gc --auto to proceed. I think you're conflating two things here in a way that (if I'm reading this correctly) produces a pretty bad regression for users. a) If we have something in the gc.log we keep yelling at users until we reach the gc.logExpiry, this was the subject of my thread back in January. This sucks, and should be fixed somehow. Maybe we should only emit the warning once, e.g. creating a .git/gc.log.wasemitted marker and as long as its ctime is later than the mtime on .git/gc.log we don't emit it again). But that also creates the UX problem that it's easy to miss this message in the middle of some huge "fetch" output. Perhaps we should just start emitting this as part of "git status" or something (or solve the root cause, as Peff notes...). b) We *also* use this presence of a gc.log as a marker for "we failed too recently, let's not try again until after a day". The semantics of b) are very useful, and just because they're tied up with a) right now, let's not throw out b) just because we're trying to solve a). We have dev machines with limited I/O & CPU/memory that occasionally break the gc.auto limit, I really don't want those churning a background "git gc" on every fetch/commit etc. until we're finally below the gc.auto limit (possibly days later), which would be a side-effect of printing the .git/gc.log once and then removing it. 
> To solve (3), we could
> introduce a gc.lastrun file that is touched whenever "gc --auto" runs
> successfully and make "gc --auto" a no-op for a while after each run.

This would work around my concern with b) above in most cases by
introducing a purely time-based rate limit, but I'm uneasy about this
change in git-gc semantics.

Right now one thing we do right is we always try to look at the actual
state of the repo with too_many_packs() and too_many_loose_objects(). We
don't assume the state of your repo hasn't drastically changed recently.
That means that we do the right thing and gc when the repo needs it, not
just because we GC'd recently enough.

With these proposed semantics we'd skip a needed GC (potentially for
days, depending on the default) just because we recently ran one. In
many environments, such as on busy servers, we could have tens of
thousands of packs by this point, since this facility would (presumably)
bypass b
Re: [PATCH] gc: do not warn about too many loose objects
Hi,

Jeff King wrote:
> On Mon, Jul 16, 2018 at 03:56:39PM -0700, Jonathan Nieder wrote:
>> The calling command in the motivating example is Android's "repo" tool:
>>
>>     bare_git.gc('--auto')
>>
>> from https://gerrit-review.googlesource.com/c/git-repo/+/10598/. I
>> think it's reasonable that it expects a status code of 0 in the normal
>> case. So life is less simple than I hoped.
>
> IMHO it should ignore the return code, since that's what Git does
> itself. And at any rate, you'd miss errors from daemonized gc's (at
> least until the _next_ one runs and propagates the error code).

That suggests a possible improvement. If all callers should be
ignoring the exit code, can we make the exit code in daemonize mode
unconditionally zero in this kind of case?

That doesn't really solve the problem:

 1. "gc --auto" would produce noise *every time it runs* until gc.log
    is removed, for example via expiry

 2. "gc --auto" would not do any garbage collection until gc.log is
    removed, for example by expiry

 3. "gc --auto" would still not ratelimit itself, for example when
    there are a large number of loose unreachable objects that is not
    large enough to hit the loose object threshold.

but maybe it's better than the present state.

To solve (1) and (2), we could introduce a gc.warnings file that
behaves like gc.log except that it only gets printed once and then
self-destructs, allowing gc --auto to proceed. To solve (3), we could
introduce a gc.lastrun file that is touched whenever "gc --auto" runs
successfully and make "gc --auto" a no-op for a while after each run.

-- >8 --
Subject: gc: do not return error for prior errors in daemonized mode

Some build machines started failing to fetch updated source using
"repo sync", with error

    error: The last gc run reported the following. Please correct the root cause
    and remove /build/.repo/projects/tools/git.git/gc.log.
    Automatic cleanup will not be performed until the file is removed.
    warning: There are too many unreachable loose objects; run 'git prune' to remove them.

The cause takes some time to describe.

In v2.0.0-rc0~145^2 (gc: config option for running --auto in
background, 2014-02-08), "git gc --auto" learned to run in the
background instead of blocking the invoking command. In this mode, it
closed stderr to avoid interleaving output with any subsequent
commands, causing warnings like the above to be swallowed;
v2.6.3~24^2 (gc: save log from daemonized gc --auto and print it next
time, 2015-09-19) addressed this by storing any diagnostic output in
.git/gc.log and allowing the next "git gc --auto" run to print it.

To avoid wasteful repeated fruitless gcs, when gc.log is present, the
subsequent "gc --auto" would die after printing its contents. Most git
commands, such as "git fetch", ignore the exit status from "git gc
--auto" so all is well at this point: the user gets to see the error
message, and the fetch succeeds, without a wasteful additional attempt
at an automatic gc.

External tools like repo[1], though, do care about the exit status
from "git gc --auto". In non-daemonized mode, the exit status is
straightforward: if there is an error, it is nonzero, but after a
warning like the above, the status is zero. The daemonized mode, as a
side effect of the other properties provided, offers a very strange
exit code convention:

 - if no housekeeping was required, the exit status is 0

 - the first real run, after forking into the background, returns
   exit status 0 unconditionally. The parent process has no way to
   know whether gc will succeed.

 - if there is any diagnostic output in gc.log, subsequent runs
   return a nonzero exit status to indicate that gc was not
   triggered.

There's nothing for the calling program to act on on the basis of
that error. Use status 0 consistently instead, to indicate that we
decided not to run a gc (just like if no housekeeping was required).
This way, repo and similar tools can get the benefit of the same
behavior as tools like "git fetch" that ignore the exit status from
gc --auto.

Once the period of time described by gc.pruneExpire elapses, the
unreachable loose objects will be removed by "git gc --auto"
automatically.

[1] https://gerrit-review.googlesource.com/c/git-repo/+/10598/

Reported-by: Andrii Dehtiarov
Helped-by: Jeff King
Signed-off-by: Jonathan Nieder
---
 Documentation/config.txt |  3 ++-
 builtin/gc.c             | 16 +++-
 t/t6500-gc.sh            |  6 +++---
 3 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 1cc18a828c..5eaa4aaa7d 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1649,7 +1649,8 @@ will be repacked. After this the number of packs should go below
 gc.autoPackLimit and gc.bigPackThreshold should be respected again.

 gc.logExpiry::
-	If the file gc.log exists, then `git gc --auto` won't run
+	If the file gc.log exists, th
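The exit-status convention described in the patch above can be summarized in a small model (an illustration, not git's actual implementation; the -1 is what later wraps around to 255 via exit()):

```python
def daemonized_gc_exit_code(housekeeping_needed, gc_log_has_diagnostics,
                            patched=False):
    """Model of 'git gc --auto' exit status with gc.autoDetach,
    before and after the patch above (a simplification)."""
    if not housekeeping_needed:
        return 0                   # nothing to do
    if gc_log_has_diagnostics:
        # Before the patch: refuse to run and signal an error (-1,
        # which exit() turns into 255).  After the patch: report 0,
        # just as if no housekeeping had been required.
        return 0 if patched else -1
    return 0  # forked into the background; parent cannot know the outcome
```

Under the old convention, repo's `bare_git.gc('--auto')` sees the nonzero status on the second and later runs and fails the sync; under the new one it always sees 0.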
Re: [PATCH] gc: do not warn about too many loose objects
On Mon, Jul 16, 2018 at 03:56:39PM -0700, Jonathan Nieder wrote:
> The calling command in the motivating example is Android's "repo" tool:
>
>     bare_git.gc('--auto')
>
> from https://gerrit-review.googlesource.com/c/git-repo/+/10598/. I
> think it's reasonable that it expects a status code of 0 in the normal
> case. So life is less simple than I hoped.

IMHO it should ignore the return code, since that's what Git does
itself. And at any rate, you'd miss errors from daemonized gc's (at
least until the _next_ one runs and propagates the error code).

> Interesting! It looks like that thread anticipated the problems we've
> seen here. Three years without having to have fixed it is a good run,
> I suppose.
>
> The discussion of stopping there appears to be primarily about
> stopping in the error case, not rate-limiting in the success or
> warning case.

I think the two are essentially the same. The point is that we cannot
make further progress by re-running the gc again, so we should not.

Amusingly, the warning we're talking about is the exact reason that the
logging in that thread was added. 329e6e8794 (gc: save log from
daemonized gc --auto and print it next time, 2015-09-19) says:

    The latest in this set is, as the result of daemonizing, stderr is
    closed and all warnings are lost. This warning at the end of cmd_gc()
    is particularly important because it tells the user how to avoid "gc
    --auto" running repeatedly. Because stderr is closed, the user does
    not know, naturally they complain about 'gc --auto' wasting CPU.

> -- >8 --
> Subject: gc: exit with status 128 on failure
>
> A value of -1 returned from cmd_gc gets propagated to exit(),
> resulting in an exit status of 255. Use die instead for a clearer
> error message and a controlled exit.

I agree it's better to not pass -1 to exit(). But I thought the
original motivation was the fact that we were returning non-zero in
this case at all? (And I thought you and I both agreed with that
motivation.)
So is this meant to be a preparatory fix until somebody is interested
in fixing that?

> -static int gc_before_repack(void)
> +static void gc_before_repack(void)
>  {
>  	if (pack_refs && run_command_v_opt(pack_refs_cmd.argv, RUN_GIT_CMD))
> -		return error(FAILED_RUN, pack_refs_cmd.argv[0]);
> +		die(FAILED_RUN, pack_refs_cmd.argv[0]);
>
>  	if (prune_reflogs && run_command_v_opt(reflog.argv, RUN_GIT_CMD))
> -		return error(FAILED_RUN, reflog.argv[0]);
> +		die(FAILED_RUN, reflog.argv[0]);

Dscho is going to yell at you about replacing error returns with
die(). ;)

I wonder if it would be simpler to just reinterpret the "-1" into "128"
in cmd_gc(). I thought we had talked about having run_builtin() do that
at one point, but I don't think we ever did.

I dunno. I'm fine with it either way.

-Peff
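As an illustration (not git's code), Peff's alternative of reinterpreting the -1 at the dispatch layer rather than converting every error() into die() would amount to something like this sketch; the point is that exit(-1) is taken modulo 256 and becomes 255, while die() conventionally exits with 128:

```python
def run_builtin(cmd):
    """Hypothetical sketch of translating a builtin's error() return
    (-1) into die()'s conventional exit status (128) at one central
    place, instead of letting exit(-1) wrap around to 255."""
    status = cmd()
    if status < 0:
        return 128  # what die() would have produced
    return status
```

This would let gc_before_repack() keep its `return error(...)` style while still giving the shell a sane exit status.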
Re: [PATCH] gc: do not warn about too many loose objects
On Mon, Jul 16, 2018 at 3:55 PM, Jeff King wrote: > On Mon, Jul 16, 2018 at 03:07:34PM -0700, Elijah Newren wrote: > >> > If we were to delete those objects, wouldn't it be exactly the same as >> > running "git prune"? Or setting your gc.pruneExpire to "now" or even "5 >> > minutes"? Or are you concerned with taking other objects along for the >> > ride that weren't part of old reflogs? I think that's a valid concern, >> >> Yes, I was worried about taking other objects along for the ride that >> weren't part of old reflogs. >> >> > but it's also an issue for objects which were previously referenced in >> > a reflog, but are part of another current operation. >> >> I'm not certain what you're referring to here. > > I mean that an ongoing operation could refer to a blob that just > recently became unreachable via reflog pruning. And then we would delete > it, leaving the repository corrupt. > > One of the protections we have against that is that if I ask to write > blob XYZ and we find that we already have the object, Git will freshen > the mtime of the loose object or pack that contains it, protecting it > from pruning. But with your suggestion, we'd delete it immediately, > regardless of the mtime of the containing pack. > > Another way to think of it is this: a reflog mentioning an object does > not mean it is the exclusive user of that object. So when we expire it, > that does not mean that it is OK to delete it immediately; there may be > other users. > >> > Also, what do you do if there weren't reflogs in the repo? Or the >> > reflogs were deleted (e.g., because branch deletion drops the associated >> > reflog entirely)? >> >> Yes, there are issues this rule won't help with, but in my experience >> it was a primary (if not sole) actual cause in practice. (I believe I >> even said elsewhere in this thread that I knew there were unreachable >> objects for other reasons and they might also become large in number). 
>> At $DAYJOB we've had multiple people including myself hit the "too >> many unreachable loose objects" nasty loop issue (some of us multiple >> different times), and as far as I can tell, most (perhaps even all) of >> them would have been avoided by just "properly" deleting garbage as >> per my object-age-is-reflog-age-if-not-otherwise-referenced rule. > > I agree with you that this is a frequent cause, and probably even the > primary one. But my concern is that your loosening increases the risk of > corruption for other cases. > >> > I assume by "these objects" you mean ones which used to be reachable >> > from a reflog, but that reflog entry just expired. I think you'd be >> > sacrificing some race-safety in that case. >> >> Is that inherently any more race unsafe than 'git prune >> --expire=2.weeks.ago'? I thought it'd be racy in the same ways, and >> thus a tradeoff folks are already accepting (at least implicitly) when >> running git-gc. Since these objects are actually 90 days old rather >> than a mere two weeks, it actually seemed more safe to me. But maybe >> I'm overlooking something with the pack->loose transition that makes >> it more racy? > > I think it's worse in terms of race-safety because you're losing one of > the signals that users of the objects can provide to git-prune to tell > it the object is useful: updating the mtime. So yes, you think of the > objects as "90 days old", but we don't know if there are other users. > Has somebody else been accessing them in the meantime? > > To be honest, I'm not sure how meaningful that distinction is in > practice. The current scheme is still racy, even if the windows are > shorter in some circumstances. But it seems like cruft packfiles are > a similar amount of work to your scheme, cover more cases, and are > slightly safer. And possibly even give us a long-term route to true > race-free pruning (via the "garbage pack" mark that Jonathan mentioned). 
> > Assuming you buy into the idea that objects in a cruft-pack are not > hurting anything aside from a little wasted storage for up to 2 weeks > (which it sounds like you're at least partially on board with ;) ). Ah, I see now. Thanks (to you and Jonathan) for the thorough explanations. I'm totally on board now.
Re: [PATCH] gc: do not warn about too many loose objects
Jeff King wrote:
> On Mon, Jul 16, 2018 at 03:03:06PM -0700, Jonathan Nieder wrote:
>> Oh, good point. In non-daemon mode, we don't let "gc --auto" failure
>> cause the invoking command to fail, but in daemon mode we do. That
>> should be a straightforward fix; patch coming in a moment.
>
> OK, that definitely sounds like a bug. I'm still confused how that could
> happen, though, since from the caller's perspective they ignore git-gc's
> exit code either way. I guess I'll see in your patch. :)

Alas, I just misremembered. What I was remembering is that gc --auto
does

	if (auto_gc && too_many_loose_objects())
		warning(_("There are too many unreachable loose objects; "
			"run 'git prune' to remove them."));

	return 0;

which means that too_many_loose_objects is not an error in undaemonized
mode, while it is in daemonized mode. But we've already discussed that.

The calling command in the motivating example is Android's "repo" tool:

	bare_git.gc('--auto')

from https://gerrit-review.googlesource.com/c/git-repo/+/10598/. I
think it's reasonable that it expects a status code of 0 in the normal
case. So life is less simple than I hoped.

[...]
>> Can you point me to some discussion about building that rate-limiting?
>> The commit message for v2.12.2~17^2 (gc: ignore old gc.log files,
>> 2017-02-10) definitely doesn't describe that as its intent.
>
> I think that commit is a loosening of the rate-limiting (because we'd
> refuse to progress for something that was actually time-based). But the
> original stopping comes from this discussion, I think:
>
>   https://public-inbox.org/git/xmqqlhijznpm@gitster.dls.corp.google.com/

Interesting! It looks like that thread anticipated the problems we've
seen here. Three years without having to have fixed it is a good run,
I suppose.

The discussion of stopping there appears to be primarily about
stopping in the error case, not rate-limiting in the success or
warning case.

Here's a patch for the 'return -1' thing.
-- >8 --
Subject: gc: exit with status 128 on failure

A value of -1 returned from cmd_gc gets propagated to exit(),
resulting in an exit status of 255. Use die instead for a clearer
error message and a controlled exit.

Signed-off-by: Jonathan Nieder
---
 builtin/gc.c  | 36
 t/t6500-gc.sh |  2 +-
 2 files changed, 17 insertions(+), 21 deletions(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index ccfb1ceaeb..2bebc52bda 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -438,10 +438,10 @@ static const char *lock_repo_for_gc(int force, pid_t* ret_pid)
 	return NULL;
 }

-static int report_last_gc_error(void)
+static void report_last_gc_error(void)
 {
 	struct strbuf sb = STRBUF_INIT;
-	int ret = 0;
+	ssize_t ret;
 	struct stat st;
 	char *gc_log_path = git_pathdup("gc.log");

@@ -449,16 +449,17 @@ static int report_last_gc_error(void)
 		if (errno == ENOENT)
 			goto done;

-		ret = error_errno(_("Can't stat %s"), gc_log_path);
-		goto done;
+		die_errno(_("cannot stat '%s'"), gc_log_path);
 	}

 	if (st.st_mtime < gc_log_expire_time)
 		goto done;

 	ret = strbuf_read_file(&sb, gc_log_path, 0);
+	if (ret < 0)
+		die_errno(_("cannot read '%s'"), gc_log_path);
 	if (ret > 0)
-		ret = error(_("The last gc run reported the following. "
+		die(_("The last gc run reported the following. "
 			      "Please correct the root cause\n"
 			      "and remove %s.\n"
 			      "Automatic cleanup will not be performed "
@@ -468,20 +469,18 @@
 	strbuf_release(&sb);
 done:
 	free(gc_log_path);
-	return ret;
 }

-static int gc_before_repack(void)
+static void gc_before_repack(void)
 {
 	if (pack_refs && run_command_v_opt(pack_refs_cmd.argv, RUN_GIT_CMD))
-		return error(FAILED_RUN, pack_refs_cmd.argv[0]);
+		die(FAILED_RUN, pack_refs_cmd.argv[0]);

 	if (prune_reflogs && run_command_v_opt(reflog.argv, RUN_GIT_CMD))
-		return error(FAILED_RUN, reflog.argv[0]);
+		die(FAILED_RUN, reflog.argv[0]);

 	pack_refs = 0;
 	prune_reflogs = 0;
-	return 0;
 }

 int cmd_gc(int argc, const char **argv, const char *prefix)
@@ -562,13 +561,11 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 			fprintf(stderr, _("See \"git help gc\" for manual housekeeping.\n"));
 	}
 	if (detach_auto) {
-		if (report_last_gc_error())
-			return -1;
+		report_last_gc_error(); /* dies on error */

 		if (lock_repo_for_gc(force, &pid))
Re: [PATCH] gc: do not warn about too many loose objects
On Mon, Jul 16, 2018 at 03:07:34PM -0700, Elijah Newren wrote: > > If we were to delete those objects, wouldn't it be exactly the same as > > running "git prune"? Or setting your gc.pruneExpire to "now" or even "5 > > minutes"? Or are you concerned with taking other objects along for the > > ride that weren't part of old reflogs? I think that's a valid concern, > > Yes, I was worried about taking other objects along for the ride that > weren't part of old reflogs. > > > but it's also an issue for objects which were previously referenced in > > a reflog, but are part of another current operation. > > I'm not certain what you're referring to here. I mean that an ongoing operation could refer to a blob that just recently became unreachable via reflog pruning. And then we would delete it, leaving the repository corrupt. One of the protections we have against that is that if I ask to write blob XYZ and we find that we already have the object, Git will freshen the mtime of the loose object or pack that contains it, protecting it from pruning. But with your suggestion, we'd delete it immediately, regardless of the mtime of the containing pack. Another way to think of it is this: a reflog mentioning an object does not mean it is the exclusive user of that object. So when we expire it, that does not mean that it is OK to delete it immediately; there may be other users. > > Also, what do you do if there weren't reflogs in the repo? Or the > > reflogs were deleted (e.g., because branch deletion drops the associated > > reflog entirely)? > > Yes, there are issues this rule won't help with, but in my experience > it was a primary (if not sole) actual cause in practice. (I believe I > even said elsewhere in this thread that I knew there were unreachable > objects for other reasons and they might also become large in number). 
> At $DAYJOB we've had multiple people including myself hit the "too > many unreachable loose objects" nasty loop issue (some of us multiple > different times), and as far as I can tell, most (perhaps even all) of > them would have been avoided by just "properly" deleting garbage as > per my object-age-is-reflog-age-if-not-otherwise-referenced rule. I agree with you that this is a frequent cause, and probably even the primary one. But my concern is that your loosening increases the risk of corruption for other cases. > > I assume by "these objects" you mean ones which used to be reachable > > from a reflog, but that reflog entry just expired. I think you'd be > > sacrificing some race-safety in that case. > > Is that inherently any more race unsafe than 'git prune > --expire=2.weeks.ago'? I thought it'd be racy in the same ways, and > thus a tradeoff folks are already accepting (at least implicitly) when > running git-gc. Since these objects are actually 90 days old rather > than a mere two weeks, it actually seemed more safe to me. But maybe > I'm overlooking something with the pack->loose transition that makes > it more racy? I think it's worse in terms of race-safety because you're losing one of the signals that users of the objects can provide to git-prune to tell it the object is useful: updating the mtime. So yes, you think of the objects as "90 days old", but we don't know if there are other users. Has somebody else been accessing them in the meantime? To be honest, I'm not sure how meaningful that distinction is in practice. The current scheme is still racy, even if the windows are shorter in some circumstances. But it seems like cruft packfiles are a similar amount of work to your scheme, cover more cases, and are slightly safer. And possibly even give us a long-term route to true race-free pruning (via the "garbage pack" mark that Jonathan mentioned). 
Assuming you buy into the idea that objects in a cruft-pack are not hurting anything aside from a little wasted storage for up to 2 weeks (which it sounds like you're at least partially on board with ;) ). -Peff
Re: [PATCH] gc: do not warn about too many loose objects
On Mon, Jul 16, 2018 at 03:03:06PM -0700, Jonathan Nieder wrote: > Jeff King wrote: > > > I don't think any command should report failure of its _own_ operation > > if "gc --auto" failed. And grepping around the source code shows that we > > typically ignore it. > > Oh, good point. In non-daemon mode, we don't let "gc --auto" failure > cause the invoking command to fail, but in daemon mode we do. That > should be a straightforward fix; patch coming in a moment. OK, that definitely sounds like a bug. I'm still confused how that could happen, though, since from the caller's perspective they ignore git-gc's exit code either way. I guess I'll see in your patch. :) > > What I was trying to say earlier is that we _did_ build this > > rate-limiting, and I think it is a bug that the non-daemon case does not > > rate-limit (but nobody noticed, because the default is daemonizing). > > > > So the fix is not "rip out the rate-limiting in daemon mode", but rather > > "extend it to the non-daemon case". > > Can you point me to some discussion about building that rate-limiting? > The commit message for v2.12.2~17^2 (gc: ignore old gc.log files, > 2017-02-10) definitely doesn't describe that as its intent. I think that commit is a loosening of the rate-limiting (because we'd refuse to progress for something that was actually time-based). But the original stopping comes from this discussion, I think: https://public-inbox.org/git/xmqqlhijznpm@gitster.dls.corp.google.com/ (I didn't read the whole thread, but that was what I hit by blaming the log code and then tracing that back to the list). > This is the kind of review that Dscho often complains about, where > someone tries to fix something small but significant to users and gets > told to build something larger that was not their itch instead. I don't know how to say more emphatically that I am not asking anyone to build something larger (like cruft packfiles). 
I'm just trying to bring up an impact that the author didn't consider (and that IMHO would be a regression). Isn't that what reviews are for? I only mention packfiles because as the discussion turns to "well, all of these solutions are mediocre hacks" (because they absolutely are), it's important to realize that there _is_ a right solution, and we even already know about it. Even if we don't work on it now, knowing that it's there makes it easier to decide about the various hacks. > The comments about the "Why is 'git commit' so slow?" experience and > how having the warning helps with that are well taken. I think we > should be able to find a way to keep the warning in a v2 of this > patch. But the rest about rate-limiting and putting unreachable > objects in packs etc as a blocker for this are demoralizing, since > they gives the feeling that even if I handle the cases that are > handled today well, it will never be enough for the project unless I > solve the larger problems that were already there. I really don't know why we are having such trouble communicating. I've tried to make it clear several times that the pack thing is not something I expect your or Jonathan Tan to work on, but obviously I failed. I'd be _delighted_ if you wanted to work on it, but AFAICT this patch is purely motivated by: 1. there's a funny exit code thing going on (0 on the first run, -1 on the second) 2. the warning is not actionable by users I disagree with the second, and I think we've discussed easy solutions for the first. -Peff
Re: [PATCH] gc: do not warn about too many loose objects
On Mon, Jul 16, 2018 at 2:21 PM, Jeff King wrote: > On Mon, Jul 16, 2018 at 02:09:43PM -0700, Elijah Newren wrote: >> The point of gc is to: expire reflogs, repack referenced objects, and >> delete loose objects that (1) are no longer referenced and (2) are >> "old enough". >> >> The "old enough" bit is the special part here. Before the gc, those >> objects were referenced (only by old reflog entries) and were found in >> a pack. git currently writes those objects out to disk and gives them >> the age of the packfile they are contained in, which I think is the >> source of the bug. We have a reference for those objects from the >> reflogs, know they aren't referenced anywhere else, so those objects >> to me are the age of those reflog entries: 90 days. As such, they are >> "old enough" and should be deleted. > > OK, I see what you're saying, but... > >> I never got around to fixing it properly, though, because 'git prune' >> is such a handy workaround that for now. Having people nuke all their >> loose objects is a bit risky, but the only other workaround people had >> was to re-clone (waiting the requisite 20+ minutes for the repo to >> download) and throw away their old clone. (Though some people even >> went that route, IIRC.) > > If we were to delete those objects, wouldn't it be exactly the same as > running "git prune"? Or setting your gc.pruneExpire to "now" or even "5 > minutes"? Or are you concerned with taking other objects along for the > ride that weren't part of old reflogs? I think that's a valid concern, Yes, I was worried about taking other objects along for the ride that weren't part of old reflogs. > but it's also an issue for objects which were previously referenced in > a reflog, but are part of another current operation. I'm not certain what you're referring to here. > Also, what do you do if there weren't reflogs in the repo? Or the > reflogs were deleted (e.g., because branch deletion drops the associated > reflog entirely)? 
Yes, there are issues this rule won't help with, but in my experience
it was a primary (if not sole) actual cause in practice. (I believe I
even said elsewhere in this thread that I knew there were unreachable
objects for other reasons and they might also become large in number.)

At $DAYJOB we've had multiple people including myself hit the "too
many unreachable loose objects" nasty loop issue (some of us multiple
different times), and as far as I can tell, most (perhaps even all) of
them would have been avoided by just "properly" deleting garbage as
per my object-age-is-reflog-age-if-not-otherwise-referenced rule.

>> With big repos, it's easy to get into situations where there are well
>> more than 10,000 objects satisfying these conditions. In fact, it's
>> not all that rare that the repo has far more loose objects after a git
>> gc than before.
>
> Yes, this is definitely a wart and I think is worth addressing.
>
>> I totally agree with your general plan to put unreferenced loose
>> objects into a pack. However, I don't think these objects should be
>> part of that pack; they should just be deleted instead.
>
> I assume by "these objects" you mean ones which used to be reachable
> from a reflog, but that reflog entry just expired. I think you'd be
> sacrificing some race-safety in that case.

Is that inherently any more race-unsafe than 'git prune
--expire=2.weeks.ago'? I thought it'd be racy in the same ways, and
thus a tradeoff folks are already accepting (at least implicitly) when
running git-gc. Since these objects are actually 90 days old rather
than a mere two weeks, it actually seemed more safe to me. But maybe
I'm overlooking something with the pack->loose transition that makes
it more racy?

> If the objects went into a pack under a race-proof scheme, would that
> actually bother you? Is it the 10,000 objects that's a problem, or is it
> the horrible I/O from exploding them coupled with the annoying auto-gc
> behavior?

Yeah, good point.
It's mostly the annoying auto-gc behavior and the horrible I/O of future git operations from having lots of loose objects. They've already been paying the cost of storing those objects in packed form for 90 days; a few more won't hurt much. I'd be slightly annoyed knowing that we're storing garbage we don't need to be, but I don't think it's a real issue.
Re: [PATCH] gc: do not warn about too many loose objects
Jeff King wrote:
> I don't think any command should report failure of its _own_ operation
> if "gc --auto" failed. And grepping around the source code shows that we
> typically ignore it.

Oh, good point. In non-daemon mode, we don't let "gc --auto" failure
cause the invoking command to fail, but in daemon mode we do. That
should be a straightforward fix; patch coming in a moment.

> On Mon, Jul 16, 2018 at 02:40:03PM -0700, Jonathan Nieder wrote:
>> For comparison, in non-daemon mode, there is nothing enforcing the
>> kind of ratelimiting you are talking about. It is an accident of
>> history. If we want this kind of ratelimiting, I'd rather we build it
>> deliberately instead of relying on this accident.
>
> What I was trying to say earlier is that we _did_ build this
> rate-limiting, and I think it is a bug that the non-daemon case does not
> rate-limit (but nobody noticed, because the default is daemonizing).
>
> So the fix is not "rip out the rate-limiting in daemon mode", but rather
> "extend it to the non-daemon case".

Can you point me to some discussion about building that rate-limiting?
The commit message for v2.12.2~17^2 (gc: ignore old gc.log files,
2017-02-10) definitely doesn't describe that as its intent.

This is the kind of review that Dscho often complains about, where
someone tries to fix something small but significant to users and gets
told to build something larger that was not their itch instead.

The comments about the "Why is 'git commit' so slow?" experience and
how having the warning helps with that are well taken. I think we
should be able to find a way to keep the warning in a v2 of this
patch. But the rest about rate-limiting and putting unreachable
objects in packs etc. as a blocker for this is demoralizing, since it
gives the feeling that even if I handle the cases that are handled
today well, it will never be enough for the project unless I solve the
larger problems that were already there.

Jonathan
Re: [PATCH] gc: do not warn about too many loose objects
On Mon, Jul 16, 2018 at 02:40:03PM -0700, Jonathan Nieder wrote: > > The key thing is that the presence of the warning/log still suppress > > the actual gc for the gc.logExpiry period. > > Ah, now I think I see the source of the misunderstanding. > > When I initially read the docs, I also assumed that > > If the file gc.log exists, then git gc --auto won’t run unless > that file is more than gc.logExpiry old. > > meant "git gc --auto" would be skipped (and thus silently succeed) when > the file is less than gc.logExpiry old. But that assumption was wrong: > it errors out! Right. That's the mysterious "-1" error code, and I agree that part is confusing. > This makes scripting difficult, since arbitrary commands that > incidentally run "git gc --auto" will fail in this state, until gc.log > expires (but at that point they'll fail again anyway). I don't think any command should report failure of its _own_ operation if "gc --auto" failed. And grepping around the source code shows that we typically ignore it. > For comparison, in non-daemon mode, there is nothing enforcing the > kind of ratelimiting you are talking about. It is an accident of > history. If we want this kind of ratelimiting, I'd rather we build it > deliberately instead of relying on this accident. What I was trying to say earlier is that we _did_ build this rate-limiting, and I think it is a bug that the non-daemon case does not rate-limit (but nobody noticed, because the default is daemonizing). So the fix is not "rip out the rate-limiting in daemon mode", but rather "extend it to the non-daemon case". -Peff
Re: [PATCH] gc: do not warn about too many loose objects
Jeff King wrote: > On Mon, Jul 16, 2018 at 01:37:53PM -0700, Jonathan Nieder wrote: >> Jeff King wrote: >>> With the current code, that produces a bunch of annoying warnings for >>> every commit ("I couldn't gc because the last gc reported a warning"). >>> But after Jonathan's patch, every single commit will do a full gc of the >>> repository. In this tiny repository that's relatively quick, but in a >>> real repo it's a ton of CPU and I/O, all for nothing. >> >> I see. Do I understand correctly that if we find a way to print the >> warning but not error out, that would be good enough for you? > > Yes. I thought that's what I proposed originally. ;) > > The key thing is that the presence of the warning/log still suppresses > the actual gc for the gc.logExpiry period. Ah, now I think I see the source of the misunderstanding. When I initially read the docs, I also assumed that If the file gc.log exists, then git gc --auto won’t run unless that file is more than gc.logExpiry old. meant "git gc --auto" would be skipped (and thus silently succeed) when the file is less than gc.logExpiry old. But that assumption was wrong: it errors out! This makes scripting difficult, since arbitrary commands that incidentally run "git gc --auto" will fail in this state, until gc.log expires (but at that point they'll fail again anyway). For comparison, in non-daemon mode, there is nothing enforcing the kind of ratelimiting you are talking about. It is an accident of history. If we want this kind of ratelimiting, I'd rather we build it deliberately instead of relying on this accident. Jonathan
Re: [PATCH] gc: do not warn about too many loose objects
Elijah Newren wrote: > I totally agree with your general plan to put unreferenced loose > objects into a pack. However, I don't think these objects should be > part of that pack; they should just be deleted instead. This might be the wrong thread to discuss it, but did you follow the reference/prune race that Peff mentioned? The simplest cure for it that I'm aware of does involve writing those objects to a pack. The idea is to enforce a straightforward contract: There are two kinds of packs: GC and UNREACHABLE_GARBAGE. Every object in a GC pack has a minimum lifetime of (let's say "3 days") from the time they are read. If you start making use of an object from a GC pack (e.g. by creating a new object referencing it), you have three days to ensure it's referenced. Each UNREACHABLE_GARBAGE pack has a lifetime of (let's say "3 days") from the time it is created. Objects in an UNREACHABLE_GARBAGE pack have no minimum ttl from the time they are read. If you want to start making use of an object from an UNREACHABLE_GARBAGE pack (e.g. by creating a new object referencing it), then copy it and everything it references to a GC pack. To avoid a proliferation of UNREACHABLE_GARBAGE packs, there's a rule for coalescing them, but that's not relevant here. It is perfectly possible for an object in a GC pack to reference an object in an UNREACHABLE_GARBAGE pack via writes racing with gc, but that's fine --- on the next gc run, the unreachable garbage objects get copied to a GC pack. We've been using this on a JGit DfsRepository based server for > 2 years now and it's been working well. More details are in the "Loose objects and unreachable objects" section in Documentation/technical/ mentioned before. Thanks, Jonathan
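To make that contract concrete, a toy decision function (purely illustrative shell, not git or JGit code; the 3-day ttl mirrors the example numbers in the mail above, and in the real scheme GC-pack objects are only dropped by a later gc, so the toy never drops them):

```shell
# Toy model of the two-pack contract: whether an unreferenced object
# may be dropped depends on the kind of pack holding it and, for
# garbage packs, the age of the pack itself.
can_drop() {
  kind=$1 age_days=$2
  case "$kind" in
  GC) echo no ;;   # retained; a later gc decides its fate
  UNREACHABLE_GARBAGE)
    if [ "$age_days" -ge 3 ]; then echo yes; else echo no; fi ;;
  esac
}
can_drop GC 10                   # no
can_drop UNREACHABLE_GARBAGE 1   # no: within the pack's 3-day lifetime
can_drop UNREACHABLE_GARBAGE 5   # yes: the whole pack has expired
```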
Re: [PATCH] gc: do not warn about too many loose objects
On Mon, Jul 16, 2018 at 02:09:43PM -0700, Elijah Newren wrote: > >> Um, except it doesn't actually do that. The testcase I provided shows > >> that it leaves around 10,000 objects that are totally deletable and > >> were only previously referenced by reflog entries -- entries that gc > >> removed without removing the corresponding objects. > > > > What's your definition of "totally deletable"? > > The point of gc is to: expire reflogs, repack referenced objects, and > delete loose objects that (1) are no longer referenced and (2) are > "old enough". > > The "old enough" bit is the special part here. Before the gc, those > objects were referenced (only by old reflog entries) and were found in > a pack. git currently writes those objects out to disk and gives them > the age of the packfile they are contained in, which I think is the > source of the bug. We have a reference for those objects from the > reflogs, know they aren't referenced anywhere else, so those objects > to me are the age of those reflog entries: 90 days. As such, they are > "old enough" and should be deleted. OK, I see what you're saying, but... > I never got around to fixing it properly, though, because 'git prune' > is such a handy workaround for now. Having people nuke all their > loose objects is a bit risky, but the only other workaround people had > was to re-clone (waiting the requisite 20+ minutes for the repo to > download) and throw away their old clone. (Though some people even > went that route, IIRC.) If we were to delete those objects, wouldn't it be exactly the same as running "git prune"? Or setting your gc.pruneExpire to "now" or even "5 minutes"? Or are you concerned with taking other objects along for the ride that weren't part of old reflogs? I think that's a valid concern, but it's also an issue for objects which were previously referenced in a reflog, but are part of another current operation. Also, what do you do if there weren't reflogs in the repo?
Or the reflogs were deleted (e.g., because branch deletion drops the associated reflog entirely)? > With big repos, it's easy to get into situations where there are well > more than 1 objects satisfying these conditions. In fact, it's > not all that rare that the repo has far more loose objects after a git > gc than before. Yes, this is definitely a wart and I think is worth addressing. > I totally agree with your general plan to put unreferenced loose > objects into a pack. However, I don't think these objects should be > part of that pack; they should just be deleted instead. I assume by "these objects" you mean ones which used to be reachable from a reflog, but that reflog entry just expired. I think you'd be sacrificing some race-safety in that case. If the objects went into a pack under a race-proof scheme, would that actually bother you? Is it the 10,000 objects that's a problem, or is it the horrible I/O from exploding them coupled with the annoying auto-gc behavior? -Peff
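Peff's warning about gc.pruneExpire="now" racing with other operations is easy to see directly: with no grace period, an unreachable object written a moment ago is deleted immediately. A minimal sketch (throwaway directory, stock git):

```shell
# With --expire=now there is no grace period at all: a just-written
# unreachable object is deleted right away, which is exactly why "now"
# can race with a concurrent commit that is about to reference it.
set -e
cd "$(mktemp -d)"
git init -q .
blob=$(echo scratch | git hash-object -w --stdin)
git cat-file -e "$blob"        # the loose object exists...
git prune --expire=now
! git cat-file -e "$blob"      # ...and is already gone
```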
Re: [PATCH] gc: do not warn about too many loose objects
On Mon, Jul 16, 2018 at 01:56:46PM -0700, Jonathan Nieder wrote: > I don't believe we are very good at transitively propagating freshness > today, by the way. Perhaps I don't understand what you mean by transitive here. But we should be traversing from any fresh object and marking any non-fresh ones that are reachable from it to be saved. If you know of a case where we're not, there's a bug, and I'd be happy to look into it. The only problem I know about is the utter lack of atomicity. -Peff
Re: [PATCH] gc: do not warn about too many loose objects
On Mon, Jul 16, 2018 at 1:38 PM, Jeff King wrote: > On Mon, Jul 16, 2018 at 01:16:45PM -0700, Elijah Newren wrote: > >> >> The basic problem here, at least for us, is that gc has enough >> >> information to know it could expunge some objects, but because of how >> >> it is structured in terms of several substeps (reflog expiration, >> >> repack, prune), the information is lost between the steps and it >> >> instead writes them out as unreachable objects. If we could prune (or >> >> avoid exploding) loose objects that are only reachable from reflog >> >> entries that we are expiring, then the problem goes away for us. (I >> >> totally understand that other repos may have enough unreachable >> >> objects for other reasons that Peff's suggestion to just pack up >> >> unreachable objects is still a really good idea. But on its own, it >> >> seems like a waste since it's packing stuff that we know we could just >> >> expunge.) >> > >> > No, we should have expunged everything that could be during the "repack" >> > and "prune" steps. We feed the expiration time to repack, so that it >> > knows to drop objects entirely instead of exploding them loose. >> >> Um, except it doesn't actually do that. The testcase I provided shows >> that it leaves around 10,000 objects that are totally deletable and >> were only previously referenced by reflog entries -- entries that gc >> removed without removing the corresponding objects. > > What's your definition of "totally deletable"? The point of gc is to: expire reflogs, repack referenced objects, and delete loose objects that (1) are no longer referenced and (2) are "old enough". The "old enough" bit is the special part here. Before the gc, those objects were referenced (only by old reflog entries) and were found in a pack. git currently writes those objects out to disk and gives them the age of the packfile they are contained in, which I think is the source of the bug.
We have a reference for those objects from the reflogs, know they aren't referenced anywhere else, so those objects to me are the age of those reflog entries: 90 days. As such, they are "old enough" and should be deleted. With big repos, it's easy to get into situations where there are well more than 10,000 objects satisfying these conditions. In fact, it's not all that rare that the repo has far more loose objects after a git gc than before. I never got around to fixing it properly, though, because 'git prune' is such a handy workaround for now. Having people nuke all their loose objects is a bit risky, but the only other workaround people had was to re-clone (waiting the requisite 20+ minutes for the repo to download) and throw away their old clone. (Though some people even went that route, IIRC.) > If the pack they were in is less than 2 weeks old, then they are > unreachable but "fresh", and are intentionally retained. If it was older > than 2 weeks, they should have been deleted. That's what's currently implemented, yes, but as above I think that logic is bad. > If you don't like that policy, you can set gc.pruneExpire to something > lower (including "now", but watch out, as that can race with other > operations!). But I don't think that's an implementation short-coming. > It's intentional to keep those objects. It's just that the format > they're in is inefficient and confuses later gc runs. I'm aware of pruneExpire, but I think it's the wrong way to handle this. I totally agree with your general plan to put unreferenced loose objects into a pack. However, I don't think these objects should be part of that pack; they should just be deleted instead.
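The explosion being discussed is easy to reproduce with plain repack. A minimal sketch (throwaway repo; gc itself runs `repack -A -d` with an additional `--unpack-unreachable=<time>` cutoff, which this sketch omits): an object packed while reachable comes back out as a loose file once its last ref goes away.

```shell
# An object that was packed while reachable gets exploded back out as
# a loose file once it becomes unreachable and "repack -A -d" runs.
set -e
cd "$(mktemp -d)"
git init -q .
git config user.email you@example.com   # throwaway identity
git config user.name you
git commit -q --allow-empty -m base
blob=$(echo cruft | git hash-object -w --stdin)
git tag keep "$blob"                    # make the blob reachable
git repack -a -d -q                     # now it lives in a pack
git tag -d keep >/dev/null              # ...and is unreachable again
git repack -A -d -q                     # -A explodes unreachable objects loose
loose=.git/objects/$(echo "$blob" | sed 's,^..,&/,')
test -f "$loose" && echo exploded
```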
Re: [PATCH] gc: do not warn about too many loose objects
On Mon, Jul 16, 2018 at 01:37:53PM -0700, Jonathan Nieder wrote: > >and I think the > > right solution is to pack the unreachable objects instead of exploding > > them. > > That seems like a huge widening in scope relative to what this > original patch tackled. I'm not at all saying that Jonathan needs to work on that. It's a big topic that we've been putting off for years. I _am_ saying that I don't think his patch as-is should be merged, as it makes the daemonized behavior here worse. > I'm not aware of a way to do this without > breaking compatibility with the current (broken) race prevention. If > you're saying that breaking compatibility in that way is okay with > you, then great! It depends what you're trying to fix. If you're trying to fix auto-gc outputting an annoying message (or repeatedly doing useless invocations of pack-objects), you don't have to break compatibility at all. You just have to put the objects into a pack instead of exploding them loose. Worst case if you use an old implementation to run "git gc", it may accidentally freshen a cruft pack's timestamp. If you want to improve the raciness, then yes, I think you need stronger rules around marking packs as cruft/garbage, and making operations that want to use those objects copy them out to a non-cruft location (like you describe in the hash transition document). There you risk corruption if an old implementation fails to make the non-cruft copy. But that's really no worse than we are today. So I don't even really see it as a backwards-compatibility break. But even if it were, I'd probably still be fine with it. This is a local-repo thing, and in the absolute worst case we could bump core.repositoryformatversion (though again, I don't even think that would be necessary since it would degrade to behaving just as the current code does). > > With the current code, that produces a bunch of annoying warnings for > > every commit ("I couldn't gc because the last gc reported a warning"). 
> > But after Jonathan's patch, every single commit will do a full gc of the > > repository. In this tiny repository that's relatively quick, but in a > > real repo it's a ton of CPU and I/O, all for nothing. > > I see. Do I understand correctly that if we find a way to print the > warning but not error out, that would be good enough for you? Yes. I thought that's what I proposed originally. ;) The key thing is that the presence of the warning/log still suppresses the actual gc for the gc.logExpiry period. > >> Have you looked over the discussion in "Loose objects and unreachable > >> objects" in Documentation/technical/hash-function-transition.txt? Do > >> you have thoughts on it (preferably in a separate thread)? > > > > It seems to propose putting the unreachable objects into a pack. Which > > yes, I absolutely agree with (as I thought I'd been saying in every > > single email in this thread). > > I figured you were proposing something like > https://public-inbox.org/git/20180113100734.ga30...@sigill.intra.peff.net/, > which is still racy (because it does not handle the freshening in a safe > way). That message is just me telling somebody that their idea _won't_ work. ;) But I assume you mean the general idea of putting things in a cruft pack[1]. Yes, I was only suggesting going that far as a solution to the auto-gc woes. Solving races is another issue entirely (though obviously it may make sense to build on the single-pack work, if it exists). [1] The best discussion on that is, I think: https://public-inbox.org/git/20170610080626.sjujpmgkli4mu...@sigill.intra.peff.net/ which I think I linked earlier, so possibly that is even what you meant. :) > What decides it for me is that the user did not invoke "git gc --auto" > explicitly, so anything "git gc --auto" prints is tangential to what > the user was trying to do. If the gc failed, that is worth telling > them, but if e.g.
it encountered a disk I/O error while reading and > succeeded on retry (to make up a fake example), then that's likely > worth logging to syslog but it's not something the user asked to be > directly informed about. I'm not sure I agree. If a repack discovered that you had a bit corruption on disk, but you happened to have another copy of the object available that made the repack succeed, I'd like to know that I'm getting bit corruptions, and earlier is better. I think we need to surface that information to the user eventually, and I don't think we can count on syslog being universally available. By definition we're daemonized here, so we can't count on even having access to the user's terminal. But it would make more sense to me if the logic were: - at the end of a "gc --auto" run, gc should write a machine-readable indication of its exit code (either as a final line in the log file, or to a gc.exitcode file). The warnings/errors remain intermingled in the logfile. - on the
Re: [PATCH] gc: do not warn about too many loose objects
Jeff King wrote: > On Mon, Jul 16, 2018 at 01:21:40PM -0700, Elijah Newren wrote: >> Jonathan Nieder wrote: >>> My understanding is that exploding the objects is intentional behavior, >>> to avoid a race where objects are newly referenced while they are being >>> pruned. [...] >> Ah, that's good to know and at least makes sense. It seems somewhat >> odd, though; loose objects that are two weeks old are just as >> susceptible to being referenced anew by new commits, so the default of >> running 'git prune --expire=2.weeks.ago' as gc currently does would >> also be unsafe, wouldn't it? Why is that any more or less unsafe than >> pruning objects only referenced by reflog entries that are more than >> 90 days old? Part of the answer is that this safety feature applies even when reflogs are not in use. Another part is that as you say, you can start referencing an object at the same time as the reflog entry pointing to it is expiring. [...] > That's why we retain non-fresh > objects that are referenced from fresh ones (so as long as you made the > new commit recently, it transitively infers freshness on the old blob), > and why we freshen mtimes when we elide a write for an existing object. > > That's _still_ not race-proof, because none of these operations is > atomic. Indeed. One of the advantages of using a packfile for unreachable objects is that metadata associated with that packfile can be updated atomically. I don't believe we are very good at transitively propagating freshness today, by the way. Thanks, Jonathan
Re: [PATCH] gc: do not warn about too many loose objects
On Mon, Jul 16, 2018 at 01:16:45PM -0700, Elijah Newren wrote: > >> The basic problem here, at least for us, is that gc has enough > >> information to know it could expunge some objects, but because of how > >> it is structured in terms of several substeps (reflog expiration, > >> repack, prune), the information is lost between the steps and it > >> instead writes them out as unreachable objects. If we could prune (or > >> avoid exploding) loose objects that are only reachable from reflog > >> entries that we are expiring, then the problem goes away for us. (I > >> totally understand that other repos may have enough unreachable > >> objects for other reasons that Peff's suggestion to just pack up > >> unreachable objects is still a really good idea. But on its own, it > >> seems like a waste since it's packing stuff that we know we could just > >> expunge.) > > > > No, we should have expunged everything that could be during the "repack" > > and "prune" steps. We feed the expiration time to repack, so that it > > knows to drop objects entirely instead of exploding them loose. > > Um, except it doesn't actually do that. The testcase I provided shows > that it leaves around 10,000 objects that are totally deletable and > were only previously referenced by reflog entries -- entries that gc > removed without removing the corresponding objects. What's your definition of "totally deletable"? If the pack they were in is less than 2 weeks old, then they are unreachable but "fresh", and are intentionally retained. If it was older than 2 weeks, they should have been deleted. If you don't like that policy, you can set gc.pruneExpire to something lower (including "now", but watch out, as that can race with other operations!). But I don't think that's an implementation short-coming. It's intentional to keep those objects. It's just that the format they're in is inefficient and confuses later gc runs. -Peff
Re: [PATCH] gc: do not warn about too many loose objects
Jeff King wrote: > On Mon, Jul 16, 2018 at 12:54:31PM -0700, Jonathan Nieder wrote: >> Even restricting attention to the warning, not the exit code, I think >> you are saying the current behavior is acceptable. I do not believe >> it is. > > I didn't say that at all. The current situation sucks, Thanks for clarifying. That helps. > and I think the > right solution is to pack the unreachable objects instead of exploding > them. That seems like a huge widening in scope relative to what this original patch tackled. I'm not aware of a way to do this without breaking compatibility with the current (broken) race prevention. If you're saying that breaking compatibility in that way is okay with you, then great! [...] >> I think you are saying Jonathan's patch makes the behavior >> worse, and I'm not seeing it. Can you describe an example user >> interaction where the current behavior would be better than the >> behavior after Jonathan's patch? That might help make this discussion >> more concrete. > > It makes it worse because there is nothing to throttle a "gc --auto" > that is making no progress. [...] > With the current code, that produces a bunch of annoying warnings for > every commit ("I couldn't gc because the last gc reported a warning"). > But after Jonathan's patch, every single commit will do a full gc of the > repository. In this tiny repository that's relatively quick, but in a > real repo it's a ton of CPU and I/O, all for nothing. I see. Do I understand correctly that if we find a way to print the warning but not error out, that would be good enough for you? [...] >> Have you looked over the discussion in "Loose objects and unreachable >> objects" in Documentation/technical/hash-function-transition.txt? Do >> you have thoughts on it (preferably in a separate thread)? > > It seems to propose putting the unreachable objects into a pack. Which > yes, I absolutely agree with (as I thought I'd been saying in every > single email in this thread).
I figured you were proposing something like https://public-inbox.org/git/20180113100734.ga30...@sigill.intra.peff.net/, which is still racy (because it does not handle the freshening in a safe way). [...] > Even if we were going to remove this message to help the > daemonized case, I think we'd want to retain it for the non-daemon case. Interesting. That should be doable, e.g. following the approach described below. [...] >> A simple way to do that without changing formats is to truncate the >> file when exiting with status 0. > > That's a different behavior than what we do now (and what was suggested > earlier), in that it assumes that anything written to stderr is OK for > gc to hide from the user if the process exits with code zero. > > That's probably OK in most cases, though I wonder if there are corner > cases. For example, if you have a corrupt ref, we used to say "error: > refs/heads/foo does not point to a valid object!" but otherwise ignore > it. These days git-gc sets REF_PARANOIA=1, so we'll actually barf on a > corrupt ref. But I wonder if there are other cases lurking. What decides it for me is that the user did not invoke "git gc --auto" explicitly, so anything "git gc --auto" prints is tangential to what the user was trying to do. If the gc failed, that is worth telling them, but if e.g. it encountered a disk I/O error while reading and succeeded on retry (to make up a fake example), then that's likely worth logging to syslog but it's not something the user asked to be directly informed about. Thanks, Jonathan
Re: [PATCH] gc: do not warn about too many loose objects
On Mon, Jul 16, 2018 at 01:21:40PM -0700, Elijah Newren wrote: > > My understanding is that exploding the objects is intentional behavior, > > to avoid a race where objects are newly referenced while they are being > > pruned. > > > > I am not a fan of that behavior. It's still racy. But when we've > > brought it up in the past, the consensus seems to have been that it's > > better than nothing. Documentation/technical/hash-function-transition.txt > > section "Loose objects and unreachable objects" describes a way to > > eliminate the race. > > Ah, that's good to know and at least makes sense. It seems somewhat > odd, though; loose objects that are two weeks old are just as > susceptible to being referenced anew by new commits, so the default of > running 'git prune --expire=2.weeks.ago' as gc currently does would > also be unsafe, wouldn't it? Why is that any more or less unsafe than > pruning objects only referenced by reflog entries that are more than > 90 days old? The 2-week safety isn't primarily about things which just became unreferenced. It's about things which are in the act of being referenced. Imagine a "git commit" racing with a "git prune". The commit has to create an object, and then it will update a ref to point to it. But between those two actions, prune may racily delete the object! The mtime grace period is what makes that work. Using 2 weeks is sort of ridiculous for that. But it also helps with manual recovery (e.g., imagine a blob added to the index but never committed; 3 days later you may want to try to recover your old work). And you're correct that a new git-commit may still reference an old object (e.g., a blob that's 5 seconds shy of being 2 weeks old that you're including in a new commit). That's why we retain non-fresh objects that are referenced from fresh ones (so as long as you made the new commit recently, it transitively infers freshness on the old blob), and why we freshen mtimes when we elide a write for an existing object.
That's _still_ not race-proof, because none of these operations is atomic. git-prune can decide the blob is unfresh at the exact moment you're creating the commit object. -Peff
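The grace period Peff describes is driven purely by loose-object mtimes, which is easy to observe directly. A minimal sketch (throwaway repo; `touch -d` as used here is GNU coreutils): a backdated unreachable object is pruned, while an identical-but-fresh one survives.

```shell
# git prune honors mtimes: past the --expire cutoff an unreachable
# object is deleted, while a fresh one is kept for possible recovery
# or for a racing operation that is about to reference it.
set -e
cd "$(mktemp -d)"
git init -q .
old=$(echo old-data | git hash-object -w --stdin)
new=$(echo new-data | git hash-object -w --stdin)
oldpath=.git/objects/$(echo "$old" | sed 's,^..,&/,')
touch -d '3 weeks ago' "$oldpath"   # pretend it has been lying around
git prune --expire=2.weeks.ago
! git cat-file -e "$old"            # past the grace period: pruned
git cat-file -e "$new"              # still fresh: retained
```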
Re: [PATCH] gc: do not warn about too many loose objects
On Mon, Jul 16, 2018 at 12:54:31PM -0700, Jonathan Nieder wrote: > Even restricting attention to the warning, not the exit code, I think > you are saying the current behavior is acceptable. I do not believe > it is. I didn't say that at all. The current situation sucks, and I think the right solution is to pack the unreachable objects instead of exploding them. > I think you are saying Jonathan's patch makes the behavior > worse, and I'm not seeing it. Can you describe an example user > interaction where the current behavior would be better than the > behavior after Jonathan's patch? That might help make this discussion > more concrete. It makes it worse because there is nothing to throttle a "gc --auto" that is making no progress. Try this:

-- >8 --
#!/bin/sh
rm -rf repo
# new mostly-empty repo
git init repo
cd repo
git commit --allow-empty -m base
# make a bunch of unreachable blobs
for i in $(seq 7000); do
	echo "blob"
	echo "data < [...]

> [...] > > See the thread I linked earlier on putting unreachable objects into a > pack, which I think is the real solution. > Have you looked over the discussion in "Loose objects and unreachable > objects" in Documentation/technical/hash-function-transition.txt? Do > you have thoughts on it (preferably in a separate thread)? It seems to propose putting the unreachable objects into a pack. Which yes, I absolutely agree with (as I thought I'd been saying in every single email in this thread). > > I mean that if you do not write a persistent log, then "gc --auto" will > > do an unproductive gc every time it is invoked. I.e., it will see "oh, > > there are too many loose objects", and then waste a bunch of CPU every > > time you commit. > > If so, then this would already be the behavior in non-daemon mode. > Are you saying this accidental effect of daemon mode is in fact > intentional? I'm not sure if I'd call it intentional, since I don't recall ever seeing this side effect discussed.
But since daemonizing is the default, I think that's the case people have focused on when they hit annoying cases. E.g., a831c06a2b (gc: ignore old gc.log files, 2017-02-10). I'd consider the fact that "gc --auto" with gc.autoDetach=false will repeatedly do useless work to be a minor bug. But I think Jonathan's patch makes it even worse because we do not even tell people that useless work is being done (while making them wait for each gc to finish!). Even if we were going to remove this message to help the daemonized case, I think we'd want to retain it for the non-daemon case. > > A daemonized git-gc runs a bunch of sub-commands (e.g., "git prune") > > with their stderr redirected into the logfile. If you want to have > > warnings go somewhere else, then either: > > > > - you need some way to tell those sub-commands to send the warnings > > elsewhere (i.e., _not_ stderr) > > > > or > > > > - you have to post-process the output they send to separate warnings > > from other errors. Right now that means scraping. If you are > > proposing a system of machine-readable output, it would need to work > > not just for git-gc, but for every sub-command it runs. > > or > > - you can simply record and trust the exit code > > A simple way to do that without changing formats is to truncate the > file when exiting with status 0. That's a different behavior than what we do now (and what was suggested earlier), in that it assumes that anything written to stderr is OK for gc to hide from the user if the process exits with code zero. That's probably OK in most cases, though I wonder if there are corner cases. For example, if you have a corrupt ref, we used to say "error: refs/heads/foo does not point to a valid object!" but otherwise ignore it. These days git-gc sets REF_PARANOIA=1, so we'll actually barf on a corrupt ref. But I wonder if there are other cases lurking. -Peff
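Jonathan's "truncate the file when exiting with status 0" idea can be sketched as a wrapper (hypothetical, not git behavior; the point is that only a failed gc leaves anything in gc.log for later runs to report):

```shell
# Hypothetical wrapper around "gc --auto": clear gc.log on success so
# that a stale log cannot suppress or fail future runs.
set -e
cd "$(mktemp -d)"
git init -q .
gitdir=$(git rev-parse --git-dir)
if git gc --auto --quiet; then
  : > "$gitdir/gc.log"     # success: clear any stale log
fi
test -s "$gitdir/gc.log" || echo "gc.log is empty"
```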
Re: [PATCH] gc: do not warn about too many loose objects
Hi, On Mon, Jul 16, 2018 at 12:19 PM, Jonathan Nieder wrote: > Elijah Newren wrote: > >> The basic problem here, at least for us, is that gc has enough >> information to know it could expunge some objects, but because of how >> it is structured in terms of several substeps (reflog expiration, >> repack, prune), the information is lost between the steps and it >> instead writes them out as unreachable objects. If we could prune (or >> avoid exploding) loose objects that are only reachable from reflog >> entries that we are expiring, then the problem goes away for us. > > My understanding is that exploding the objects is intentional behavior, > to avoid a race where objects are newly referenced while they are being > pruned. > > I am not a fan of that behavior. It's still racy. But when we've > brought it up in the past, the consensus seems to have been that it's > better than nothing. Documentation/technical/hash-function-transition.txt > section "Loose objects and unreachable objects" describes a way to > eliminate the race. Ah, that's good to know and at least makes sense. It seems somewhat odd, though; loose objects that are two weeks old are just as susceptible to being referenced anew by new commits, so the default of running 'git prune --expire=2.weeks.ago' as gc currently does would also be unsafe, wouldn't it? Why is that any more or less unsafe than pruning objects only referenced by reflog entries that are more than 90 days old? Elijah
Re: [PATCH] gc: do not warn about too many loose objects
On Mon, Jul 16, 2018 at 12:52 PM, Jeff King wrote: > On Mon, Jul 16, 2018 at 12:15:05PM -0700, Elijah Newren wrote: > >> The basic problem here, at least for us, is that gc has enough >> information to know it could expunge some objects, but because of how >> it is structured in terms of several substeps (reflog expiration, >> repack, prune), the information is lost between the steps and it >> instead writes them out as unreachable objects. If we could prune (or >> avoid exploding) loose objects that are only reachable from reflog >> entries that we are expiring, then the problem goes away for us. (I >> totally understand that other repos may have enough unreachable >> objects for other reasons that Peff's suggestion to just pack up >> unreachable objects is still a really good idea. But on its own, it >> seems like a waste since it's packing stuff that we know we could just >> expunge.) > > No, we should have expunged everything that could be during the "repack" > and "prune" steps. We feed the expiration time to repack, so that it > knows to drop objects entirely instead of exploding them loose. Um, except it doesn't actually do that. The testcase I provided shows that it leaves around 10,000 objects that are totally deletable and were only previously referenced by reflog entries -- entries that gc removed without removing the corresponding objects. I will note that my testcase was slightly out-of-date; with current git it needs a call to 'wait_for_background_gc_to_finish' right before the 'git gc --quiet' to avoid erroring out. > You could literally just do:
>
>     find .git/objects/?? -type f |
>     perl -lne 'm{(..)/(.{38})$} and print "$1$2"' |
>     git pack-objects .git/objects/pack/cruft-pack
>
> But: > > - that will explode them out only to repack them, which is inefficient > (if they're already packed, you can probably reuse deltas, not to > mention the I/O savings) > > - there's the question of how to handle timestamps.
Some of those > objects may have been _about_ to expire, but now you've just put > them in a brand-new pack that adds another 2 weeks to their life > > - the find above is sloppy, and will race with somebody adding new > objects to the repo > > So probably you want to have pack-objects write out the list of objects > it _would_ explode, rather than exploding them. And then before > git-repack deletes the old packs, put those into a new cruft pack. That > _just_ leaves the timestamp issue (which is discussed at length in the > thread I linked earlier). > >> git_actual_garbage_collect() { >> GITDIR=$(git rev-parse --git-dir) >> >> # Record all revisions stored in reflog before and after gc >> git rev-list --no-walk --reflog >$GITDIR/gc.original-refs >> git gc --auto >> wait_for_background_gc_to_finish >> git rev-list --no-walk --reflog >$GITDIR/gc.final-refs >> >> # Find out which reflog entries were removed >> DELETED_REFS=$(comm -23 <(sort $GITDIR/gc.original-refs) <(sort >> $GITDIR/gc.final-refs)) > > This is too detailed, I think. There are other reasons to have > unreachable objects than expired reflogs. I think you really just want > to consider all unreachable objects (like the pack-objects thing I > mentioned above). Yes, like I said, coarse workaround and I never had time to create a real fix. But I thought the testcase might be useful as a demonstration of how git gc leaves around loose objects that were previously referenced by reflogs that gc itself pruned.
Re: [PATCH] gc: do not warn about too many loose objects
Jeff King wrote: > On Mon, Jul 16, 2018 at 12:09:49PM -0700, Jonathan Nieder wrote: >> Jeff King wrote: >>> Er, the action is to run "git prune", like the warning says. :) >> >> I don't think we want to recommend that, especially when "git gc --auto" >> does the right thing automatically. > > But that's the point. This warning is written literally after running > "git gc --auto" _didn't_ do the right thing. Yes, it would be nicer if > it could do the right thing. But it doesn't yet know how to. I think we fundamentally disagree, and I would like your help getting past this impasse. Even restricting attention to the warning, not the exit code, I think you are saying the current behavior is acceptable. I do not believe it is. I think you are saying Jonathan's patch makes the behavior worse, and I'm not seeing it. Can you describe an example user interaction where the current behavior would be better than the behavior after Jonathan's patch? That might help make this discussion more concrete. [...] > See the thread I linked earlier on putting unreachable objects into a > pack, which I think is the real solution. Have you looked over the discussion in "Loose objects and unreachable objects" in Documentation/technical/hash-function-transition.txt? Do you have thoughts on it (preferably in a separate thread)? [...] >> This sounds awful. It sounds to me like you're saying "git gc --auto" >> is saying "I just did the wrong thing, and here is how you can clean >> up after me". That's not how I want a program to behave. > > Sure, it would be nice if it did the right thing. Nobody has written > that yet. Until they do, we have to deal with the fallout. "git gc --auto" is already doing the right thing. [...] > I mean that if you do not write a persistent log, then "gc --auto" will > do an unproductive gc every time it is invoked. I.e., it will see "oh, > there are too many loose objects", and then waste a bunch of CPU every > time you commit. 
If so, then this would already be the behavior in non daemon mode. Are you saying this accidental effect of daemon mode is in fact intentional? [...] >>> But in practice, I think you have to >>> resort to scraping anyway, if you are interested in treating warnings >>> from sub-processes the same way. >> >> Can you say more about this for me? I am not understanding what >> you're saying necessitates scraping the output. I would strongly >> prefer to avoid scraping the output. > > A daemonized git-gc runs a bunch of sub-commands (e.g., "git prune") > with their stderr redirected into the logfile. If you want to have > warnings go somewhere else, then either: > > - you need some way to tell those sub-commands to send the warnings > elsewhere (i.e., _not_ stderr) > > or > > - you have to post-process the output they send to separate warnings > from other errors. Right now that means scraping. If you are > proposing a system of machine-readable output, it would need to work > not just for git-gc, but for every sub-command it runs. or - you can simply record and trust the exit code A simple way to do that without changing formats is to truncate the file when exiting with status 0. Thanks, Jonathan
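[Editorial note: Jonathan's "truncate the file when exiting with status 0" idea could look roughly like the sketch below. `run_logged` is a hypothetical wrapper invented for illustration, not anything git implements; the point is that a non-empty log then reliably means the last run failed, without scraping.]

```shell
# Sketch of truncate-on-success: run a command with stderr captured in
# a log file, and empty the log whenever the command exits 0, so that
# "log file is non-empty" means "last run actually failed".
run_logged() {
    logfile=$1; shift
    "$@" 2>"$logfile"
    rc=$?
    if [ "$rc" -eq 0 ]; then
        : >"$logfile"   # success: any warnings logged were not errors
    fi
    return $rc
}
```

A later reader (such as the next `gc --auto`) can then trust the file's emptiness instead of inspecting its contents.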
Re: [PATCH] gc: do not warn about too many loose objects
On Mon, Jul 16, 2018 at 12:15:05PM -0700, Elijah Newren wrote: > The basic problem here, at least for us, is that gc has enough > information to know it could expunge some objects, but because of how > it is structured in terms of several substeps (reflog expiration, > repack, prune), the information is lost between the steps and it > instead writes them out as unreachable objects. If we could prune (or > avoid exploding) loose objects that are only reachable from reflog > entries that we are expiring, then the problem goes away for us. (I > totally understand that other repos may have enough unreachable > objects for other reasons that Peff's suggestion to just pack up > unreachable objects is still a really good idea. But on its own, it > seems like a waste since it's packing stuff that we know we could just > expunge.) No, we should have expunged everything that could be during the "repack" and "prune" steps. We feed the expiration time to repack, so that it knows to drop objects entirely instead of exploding them loose. So the leftovers really are objects that cannot be deleted yet. You could literally just do: find .git/objects/?? -type f | perl -lne 'm{(..)/(.{38})$} and print "$1$2"' | git pack-objects .git/objects/pack/cruft-pack But: - that will explode them out only to repack them, which is inefficient (if they're already packed, you can probably reuse deltas, not to mention the I/O savings) - there's the question of how to handle timestamps. Some of those objects may have been _about_ to expire, but now you've just put them in a brand-new pack that adds another 2 weeks to their life - the find above is sloppy, and will race with somebody adding new objects to the repo So probably you want to have pack-objects write out the list of objects it _would_ explode, rather than exploding them. And then before git-repack deletes the old packs, put those into a new cruft pack. 
That _just_ leaves the timestamp issue (which is discussed at length in the thread I linked earlier). > git_actual_garbage_collect() { > GITDIR=$(git rev-parse --git-dir) > > # Record all revisions stored in reflog before and after gc > git rev-list --no-walk --reflog >$GITDIR/gc.original-refs > git gc --auto > wait_for_background_gc_to_finish > git rev-list --no-walk --reflog >$GITDIR/gc.final-refs > > # Find out which reflog entries were removed > DELETED_REFS=$(comm -23 <(sort $GITDIR/gc.original-refs) <(sort > $GITDIR/gc.final-refs)) This is too detailed, I think. There are other reasons to have unreachable objects than expired reflogs. I think you really just want to consider all unreachable objects (like the pack-objects thing I mentioned above). -Peff
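[Editorial note: Peff's "consider all unreachable objects" could be approximated today by parsing `git fsck --unreachable`, whose output lines have the form `unreachable <type> <id>`. The `unreachable_ids` helper below is an illustrative sketch, not git's own code, and inherits fsck's cost on large repositories.]

```shell
# Sketch: extract object ids of all unreachable objects from fsck
# output lines of the form "unreachable <type> <id>". This catches
# every unreachable object, not just those freed by reflog expiry.
unreachable_ids() {
    awk '$1 == "unreachable" { print $3 }'
}

# git fsck --unreachable | unreachable_ids
```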
Re: [PATCH] gc: do not warn about too many loose objects
On Mon, Jul 16, 2018 at 12:09:49PM -0700, Jonathan Nieder wrote: > >>> So while I completely agree that it's not a great thing to encourage > >>> users to blindly run "git prune", I think it _is_ actionable. > >> > >> To flesh this out a little more: what user action do you suggest? Could > >> we carry out that action automatically? > > > > Er, the action is to run "git prune", like the warning says. :) > > I don't think we want to recommend that, especially when "git gc --auto" > does the right thing automatically. But that's the point. This warning is written literally after running "git gc --auto" _didn't_ do the right thing. Yes, it would be nicer if it could do the right thing. But it doesn't yet know how to. See the thread I linked earlier on putting unreachable objects into a pack, which I think is the real solution. > > The warning that is deleted by this patch is: you _just_ ran gc, and hey > > we even did it automatically for you, but we're still in a funky state > > afterwards. You might need to clean up this state. > > This sounds awful. It sounds to me like you're saying "git gc --auto" > is saying "I just did the wrong thing, and here is how you can clean > up after me". That's not how I want a program to behave. Sure, it would be nice if it did the right thing. Nobody has written that yet. Until they do, we have to deal with the fallout. > > If you do that without anything further, then it will break the > > protection against repeatedly running auto-gc, as I described in the > > previous email. > > Do you mean ratelimiting for the message, or do you actually mean > repeatedly running auto-gc itself? > > If we suppress warnings, there would still be a gc.log while "git gc > --auto" is running, just as though there had been no warnings at all. > I believe this is close to the intended behavior; it's the same as > what you'd get without daemon mode, except that you lose the warning. 
I mean that if you do not write a persistent log, then "gc --auto" will do an unproductive gc every time it is invoked. I.e., it will see "oh, there are too many loose objects", and then waste a bunch of CPU every time you commit. > > Any of those would work similarly to the "just detect warnings" I > > suggested earlier, with respect to keeping the "1 day" expiration > > intact, so I'd be OK with them. In theory they'd be more robust than > > scraping the "warning:" prefix. But in practice, I think you have to > > resort to scraping anyway, if you are interested in treating warnings > > from sub-processes the same way. > > Can you say more about this for me? I am not understanding what > you're saying necessitates scraping the output. I would strongly > prefer to avoid scraping the output. A daemonized git-gc runs a bunch of sub-commands (e.g., "git prune") with their stderr redirected into the logfile. If you want to have warnings go somewhere else, then either: - you need some way to tell those sub-commands to send the warnings elsewhere (i.e., _not_ stderr) or - you have to post-process the output they send to separate warnings from other errors. Right now that means scraping. If you are proposing a system of machine-readable output, it would need to work not just for git-gc, but for every sub-command it runs. -Peff
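[Editorial note: the "post-process the output" option Peff describes — scraping the conventional `warning:` prefix out of a gc.log — might look like this sketch. `split_gc_log` is a hypothetical name, and the approach is fragile by design: it only works while every sub-command keeps using that prefix.]

```shell
# Sketch of the scraping option: split a gc.log into warning lines and
# everything else, keying on git's conventional "warning:" prefix.
split_gc_log() {
    logfile=$1
    grep '^warning:' "$logfile" > "$logfile.warnings" || true
    grep -v '^warning:' "$logfile" > "$logfile.errors" || true
}
```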
Re: [PATCH] gc: do not warn about too many loose objects
Hi, Elijah Newren wrote: > The basic problem here, at least for us, is that gc has enough > information to know it could expunge some objects, but because of how > it is structured in terms of several substeps (reflog expiration, > repack, prune), the information is lost between the steps and it > instead writes them out as unreachable objects. If we could prune (or > avoid exploding) loose objects that are only reachable from reflog > entries that we are expiring, then the problem goes away for us. My understanding is that exploding the objects is intentional behavior, to avoid a race where objects are newly referenced while they are being pruned. I am not a fan of that behavior. It's still racy. But when we've brought it up in the past, the consensus seems to have been that it's better than nothing. Documentation/technical/hash-function-transition.txt section "Loose objects and unreachable objects" describes a way to eliminate the race. Thanks and hope that helps, Jonathan
Re: [PATCH] gc: do not warn about too many loose objects
On Mon, Jul 16, 2018 at 10:27 AM, Jonathan Tan wrote: > In a087cc9819 ("git-gc --auto: protect ourselves from accumulated > cruft", 2007-09-17), the user was warned if there were too many > unreachable loose objects. This made sense at the time, because gc > couldn't prune them safely. But subsequently, git prune learned the > ability to not prune recently created loose objects, making pruning able > to be done more safely, and gc was made to automatically prune old > unreachable loose objects in 25ee9731c1 ("gc: call "prune --expire > 2.weeks.ago" by default", 2008-03-12). ... > > --- > This was noticed when a daemonized gc run wrote this warning to the log > file, and returned 0; but a subsequent run merely read the log file, saw > that it is non-empty and returned -1 (which is inconsistent in that such > a run should return 0, as it did the first time). Yeah, we've hit this several times too. I even created a testcase and a workaround, though I never got it into proper submission form. The basic problem here, at least for us, is that gc has enough information to know it could expunge some objects, but because of how it is structured in terms of several substeps (reflog expiration, repack, prune), the information is lost between the steps and it instead writes them out as unreachable objects. If we could prune (or avoid exploding) loose objects that are only reachable from reflog entries that we are expiring, then the problem goes away for us. (I totally understand that other repos may have enough unreachable objects for other reasons that Peff's suggestion to just pack up unreachable objects is still a really good idea. But on its own, it seems like a waste since it's packing stuff that we know we could just expunge.) Anyway, my very rough testcase is below. The workaround is the git_actual_garbage_collect() function (minus the call to wait_for_background_gc_to_finish). 
Elijah --- wait_for_background_gc_to_finish() { while ( ps -ef | grep -v grep | grep --quiet git.gc.--auto ); do sleep 1; done } git_standard_garbage_collect() { # Current git gc sprays unreachable objects back in loose form; this is # fine in many cases, but is annoying when done with objects which # newly become unreachable because of something else git-gc does and # git-gc doesn't clean them up. git gc --auto wait_for_background_gc_to_finish } git_actual_garbage_collect() { GITDIR=$(git rev-parse --git-dir) # Record all revisions stored in reflog before and after gc git rev-list --no-walk --reflog >$GITDIR/gc.original-refs git gc --auto wait_for_background_gc_to_finish git rev-list --no-walk --reflog >$GITDIR/gc.final-refs # Find out which reflog entries were removed DELETED_REFS=$(comm -23 <(sort $GITDIR/gc.original-refs) <(sort $GITDIR/gc.final-refs)) # Get the list of objects which used to be reachable, but were made # unreachable due to gc's reflog expiration. To get these, I need # the intersection of things reachable from $DELETED_REFS and things # which are unreachable now. 
git rev-list --objects $DELETED_REFS --not --all --reflog | awk '{print $1}' >$GITDIR/gc.previously-reachable-objects git prune --expire=now --dry-run | awk '{print $1}' > $GITDIR/gc.unreachable-objects # Delete all the previously-reachable-objects made unreachable by the # reflog expiration done by git gc comm -12 <(sort $GITDIR/gc.unreachable-objects) <(sort $GITDIR/gc.previously-reachable-objects) | sed -e "s#^\(..\)#$GITDIR/objects/\1/#" | xargs rm } test -d testcase && { echo "testcase exists; exiting"; exit 1; } git init testcase/ cd testcase # Create a basic commit echo hello >world git add world git commit -q -m "Initial" # Create a commit with lots of files for i in {..}; do echo $i >$i; done git add [0-9]* git commit --quiet -m "Lots of numbers" # Pack it all up git gc --quiet # Stop referencing the garbage git reset --quiet --hard HEAD~1 # Pretend we did all the above stuff 30 days ago for rlog in $(find .git/logs -type f); do # Subtract 3E6 (just over 30 days) from every date (assuming dates have # exactly 10 digits, which just happens to be valid...right now at least) perl -i -ne '/(.*?)(\b[0-9]{10}\b)(.*)/ && print $1,$2-3000000,$3,"\n"' $rlog done # HOWEVER: note that the pack is new; if we make the pack old, the old objects # will get pruned for us. But it is quite common to have new packfiles with # old-and-soon-to-be-unreferenced-objects because frequent gc's mean moving # the objects to new packfiles often, and eventually the reflog is expired. # If you want to test them being part of an old packfile, uncomment this: # find .git/objects/pack -type f | xargs touch -t 21010101 # Create 50 packfiles in the current repo so that 'git gc --auto' will # trigger `git repack -A -d -l` instead of just `git repack -d -l` for i in {01..50}; do git fast-export master | sed -e s/Initial/In
Re: [PATCH] gc: do not warn about too many loose objects
Hi, Jeff King wrote: > On Mon, Jul 16, 2018 at 11:22:07AM -0700, Jonathan Nieder wrote: >> Jeff King wrote: >>> So while I completely agree that it's not a great thing to encourage >>> users to blindly run "git prune", I think it _is_ actionable. >> >> To flesh this out a little more: what user action do you suggest? Could >> we carry out that action automatically? > > Er, the action is to run "git prune", like the warning says. :) I don't think we want to recommend that, especially when "git gc --auto" does the right thing automatically. Can you convince me I'm wrong? [...] >> I mentally make a distinction between this warning from "git gc >> --auto" and the warning from commands like "git gui". [...] > I don't think those warnings are the same. The warning from "git gui" > is: you may benefit from running gc. > > The warning that is deleted by this patch is: you _just_ ran gc, and hey > we even did it automatically for you, but we're still in a funky state > afterwards. You might need to clean up this state. This sounds awful. It sounds to me like you're saying "git gc --auto" is saying "I just did the wrong thing, and here is how you can clean up after me". That's not how I want a program to behave. But there's a simpler explanation for what "git gc --auto" was trying to say, which Jonathan described. [...] >> Yes, this is a real problem, and it would affect any other warning >> (or even GIT_TRACE=1 output) produced by gc --auto as well. I think we >> should consider it a serious bug, separate from the symptom Jonathan is >> fixing. >> >> Unfortunately I don't have a great idea about how to fix it. Should >> we avoid writing warnings to gc.log in daemon mode? That would >> prevent the user from ever seeing the warnings, but because of the >> nature of a warning, that might be reasonable. > > If you do that without anything further, then it will break the > protection against repeatedly running auto-gc, as I described in the > previous email. 
Do you mean ratelimiting for the message, or do you actually mean repeatedly running auto-gc itself? If we suppress warnings, there would still be a gc.log while "git gc --auto" is running, just as though there had been no warnings at all. I believe this is close to the intended behavior; it's the same as what you'd get without daemon mode, except that you lose the warning. [...] >> Should we put warnings >> in a separate log file with different semantics? Should we change the >> format of gc.log to allow more structured information (include 'gc' >> exit code) and perform a two-stage migration to the new format (first >> learn to read the new format, then switch to writing it)? > > Any of those would work similarly to the "just detect warnings" I > suggested earlier, with respect to keeping the "1 day" expiration > intact, so I'd be OK with them. In theory they'd be more robust than > scraping the "warning:" prefix. But in practice, I think you have to > resort to scraping anyway, if you are interested in treating warnings > from sub-processes the same way. Can you say more about this for me? I am not understanding what you're saying necessitates scraping the output. I would strongly prefer to avoid scraping the output. Thanks, Jonathan
Re: [PATCH] gc: do not warn about too many loose objects
On Mon, Jul 16, 2018 at 11:22:07AM -0700, Jonathan Nieder wrote: > > I'm not sure if this tells the whole story. You may still run into a > > case where auto-gc runs over and over during the grace period, because > > you have a bunch of objects which are not reachable and not yet ready to > > be expired. So a gc cannot make forward progress until the 2-week > > expiration, and the way to break the cycle is to run an immediate > > "prune". > > > > So while I completely agree that it's not a great thing to encourage > > users to blindly run "git prune", I think it _is_ actionable. > > To flesh this out a little more: what user action do you suggest? Could > we carry out that action automatically? Er, the action is to run "git prune", like the warning says. :) And no, I do not think we should run it automatically. It deletes objects without respect to the grace period, which may not be what the user wants. (And yes, the existing warning is terrible because it does not at all make clear that following its advice may be dangerous). > I mentally make a distinction between this warning from "git gc > --auto" and the warning from commands like "git gui". The former was > saying "Please run 'git prune', because I'm not going to do it", and > as Jonathan noticed, that's not true any more. The latter says > > This repository currently has approximately %i loose objects. > > To maintain optimal performance it is strongly recommended > that you compress the database. > > Compress the database now? > > which is relevant to the current operation (slowly reading the > repository) and easy to act on (choose "yes"). I don't think those warnings are the same. The warning from "git gui" is: you may benefit from running gc. The warning that is deleted by this patch is: you _just_ ran gc, and hey we even did it automatically for you, but we're still in a funky state afterwards. You might need to clean up this state. 
(As an aside, I think the git-gui warning pre-dates "gc --auto", and probably should just run that rather than implementing its own heuristic). > > I agree that the "-1" return is a little funny. Perhaps on the reading > > side we should detect that the log contains only a "warning" line and > > adjust our exit code accordingly. > > Yes, this is a real problem, and it would affect any other warning > (or even GIT_TRACE=1 output) produced by gc --auto as well. I think we > should consider it a serious bug, separate from the symptom Jonathan is > fixing. > > Unfortunately I don't have a great idea about how to fix it. Should > we avoid writing warnings to gc.log in daemon mode? That would > prevent the user from ever seeing the warnings, but because of the > nature of a warning, that might be reasonable. If you do that without anything further, then it will break the protection against repeatedly running auto-gc, as I described in the previous email. > Should we put warnings > in a separate log file with different semantics? Should we change the > format of gc.log to allow more structured information (include 'gc' > exit code) and perform a two-stage migration to the new format (first > learn to read the new format, then switch to writing it)? Any of those would work similarly to the "just detect warnings" I suggested earlier, with respect to keeping the "1 day" expiration intact, so I'd be OK with them. In theory they'd be more robust than scraping the "warning:" prefix. But in practice, I think you have to resort to scraping anyway, if you are interested in treating warnings from sub-processes the same way. -Peff
Re: [PATCH] gc: do not warn about too many loose objects
Hi, Jeff King wrote: > On Mon, Jul 16, 2018 at 10:27:17AM -0700, Jonathan Tan wrote: >> In a087cc9819 ("git-gc --auto: protect ourselves from accumulated >> cruft", 2007-09-17), the user was warned if there were too many >> unreachable loose objects. This made sense at the time, because gc >> couldn't prune them safely. But subsequently, git prune learned the >> ability to not prune recently created loose objects, making pruning able >> to be done more safely, and gc was made to automatically prune old >> unreachable loose objects in 25ee9731c1 ("gc: call "prune --expire >> 2.weeks.ago" by default", 2008-03-12). >> >> This makes the warning unactionable by the user, as any loose objects >> left are not deleted yet because of safety, and "git prune" is not a >> command that the user is recommended to run directly anyway. > > I'm not sure if this tells the whole story. You may still run into a > case where auto-gc runs over and over during the grace period, because > you have a bunch of objects which are not reachable and not yet ready to > be expired. So a gc cannot make forward progress until the 2-week > expiration, and the way to break the cycle is to run an immediate > "prune". > > So while I completely agree that it's not a great thing to encourage > users to blindly run "git prune", I think it _is_ actionable. To flesh this out a little more: what user action do you suggest? Could we carry out that action automatically? I mentally make a distinction between this warning from "git gc --auto" and the warning from commands like "git gui". The former was saying "Please run 'git prune', because I'm not going to do it", and as Jonathan noticed, that's not true any more. The latter says This repository currently has approximately %i loose objects. To maintain optimal performance it is strongly recommended that you compress the database. Compress the database now? 
which is relevant to the current operation (slowly reading the repository) and easy to act on (choose "yes"). [...] > I agree that the "-1" return is a little funny. Perhaps on the reading > side we should detect that the log contains only a "warning" line and > adjust our exit code accordingly. Yes, this is a real problem, and it would affect any other warning (or even GIT_TRACE=1 output) produced by gc --auto as well. I think we should consider it a serious bug, separate from the symptom Jonathan is fixing. Unfortunately I don't have a great idea about how to fix it. Should we avoid writing warnings to gc.log in daemon mode? That would prevent the user from ever seeing the warnings, but because of the nature of a warning, that might be reasonable. Should we put warnings in a separate log file with different semantics? Should we change the format of gc.log to allow more structured information (include 'gc' exit code) and perform a two-stage migration to the new format (first learn to read the new format, then switch to writing it)? Thanks, Jonathan
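[Editorial note: Jonathan's structured-gc.log idea could be sketched as below. The format — a first line recording the exit status — is entirely hypothetical, as are the helper names; a reader that tolerates both formats would be the first stage of the two-stage migration he describes.]

```shell
# Hypothetical structured gc.log: first line records the exit status,
# the remaining lines are the captured output.
write_gc_log() {   # write_gc_log <logfile> <status> <message...>
    logfile=$1; status=$2; shift 2
    { echo "status: $status"; printf '%s\n' "$@"; } > "$logfile"
}

read_gc_status() {  # print the recorded status; old-format logs yield 0
    case $(head -n 1 "$1" 2>/dev/null) in
        status:\ *) head -n 1 "$1" | cut -d' ' -f2 ;;
        *) echo 0 ;;  # old free-form format: no status was recorded
    esac
}
```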
Re: [PATCH] gc: do not warn about too many loose objects
On Mon, Jul 16, 2018 at 10:27:17AM -0700, Jonathan Tan wrote: > In a087cc9819 ("git-gc --auto: protect ourselves from accumulated > cruft", 2007-09-17), the user was warned if there were too many > unreachable loose objects. This made sense at the time, because gc > couldn't prune them safely. But subsequently, git prune learned the > ability to not prune recently created loose objects, making pruning able > to be done more safely, and gc was made to automatically prune old > unreachable loose objects in 25ee9731c1 ("gc: call "prune --expire > 2.weeks.ago" by default", 2008-03-12). > > This makes the warning unactionable by the user, as any loose objects > left are not deleted yet because of safety, and "git prune" is not a > command that the user is recommended to run directly anyway. I'm not sure if this tells the whole story. You may still run into a case where auto-gc runs over and over during the grace period, because you have a bunch of objects which are not reachable and not yet ready to be expired. So a gc cannot make forward progress until the 2-week expiration, and the way to break the cycle is to run an immediate "prune". So while I completely agree that it's not a great thing to encourage users to blindly run "git prune", I think it _is_ actionable. But... > This was noticed when a daemonized gc run wrote this warning to the log > file, and returned 0; but a subsequent run merely read the log file, saw > that it is non-empty and returned -1 (which is inconsistent in that such > a run should return 0, as it did the first time). Yes, this got much worse with daemonized gc. The warning blocks further runs. And then even after the 2-week period, you still cannot make further progress until the user steps in! I think we dealt with that in a831c06a2b (gc: ignore old gc.log files, 2017-02-10). So now we won't run gc for a day, but eventually we may make further progress. 
So the warning _is_ still doing something useful there (it's blocking immediate auto-gc and kicking in the 1-day threshold). I agree that the "-1" return is a little funny. Perhaps on the reading side we should detect that the log contains only a "warning" line and adjust our exit code accordingly. Ultimately, I think Git should avoid keeping unreachable objects as loose in the first place, which would make this whole problem go away. There's some discussion in this thread: https://public-inbox.org/git/87inc89j38@evledraar.gmail.com/ -Peff
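[Editorial note: Peff's reading-side suggestion — detect that the log contains only warning lines and adjust the exit code — could be sketched like this. `log_is_warnings_only` is an illustrative name; the actual detection git might use is not specified in the thread.]

```shell
# Sketch: succeed only if the gc.log is non-empty and every line
# carries the "warning:" prefix, i.e. the previous run warned but did
# not fail. A caller could then report the log yet still exit 0,
# reserving the failing exit code for real errors.
log_is_warnings_only() {
    [ -s "$1" ] && ! grep -qv '^warning:' "$1"
}
```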