Re: gc and repack ignore .git/*HEAD when checking reachability
On Wed, Jul 13, 2016 at 8:59 PM, Johannes Schindelinwrote: >> No. > > Oh? So you do not see that we are already heading into serious trouble > ourselves? I do see problems, that have solutions. But I have not yet seen that we are heading to a dead end. >> I prefer we have a single interface for dealing with _any_ worktree. > > I agree. So far, I did not see an interface, though, but the idea that we > should somehow fake things so that there does not *have* to be an > interface. Calling it "fake" is a bit too strong. I'd call it an abstraction. We have always had the one-repo/one-worktree relationship, now we're breaking that assumption. Some operations work at repo level and require a "repo view" while others work at worktree level. We do it in a way that a program designed with these separate views can still work correctly in the traditional one-worktree-one-repo configuration, where both views are the same. - For accessing data in $GIT_DIR, you do not peek directly into it any more. You use "git rev-parse --git-path" to retrieve a path in $GIT_DIR (instead of doing `git rev-parse --git-dir`/some/path), then you can do something about it. This is the equivalent of git_path() at C level. - We have a set of rules to define what part of $GIT_DIR is shared and what is not. When you add new stuff and follow this rule, it will work in single or multiple worktree config. So far pretty much every unknown thing is per-worktree. $GIT_DIR/common will be the shared place for external scripts, soon. - It's the same thing for refs: we may reserve a portion of it for per-worktree, and the rest is shared. - We provide means for one worktree to access data of any other worktree if needed (e.g. $GIT_COMMON_DIR, or the new ref storage API) - Because the majority of operations is per-worktree, that has been the default view so far when working in multiple worktrees. Repo-level operations such as git-gc, rev-list --all, fsck... need to "switch view". I missed this and this seemed to cause big problem for you. My only excuse is, this is an experimental feature. The idea of single config file with separate "worktree namespace" (eg. core.worktree vs worktree..worktree) was shot down because it would result in a lot of changes. And it's the same thing why we start to split views this way (and make default view "per-worktree") instead of rewriting every piece of code to deal with multiple worktrees explicitly (by carrying the worktree id everywhere). There are two loose ends that are on my mind, but I have not clear ideas yet: the incorporation of ref namespace and odb alternates. Imagine when you you submodules, all submodule repos (refs and odb) can be stored in the top-level .git repo, separated by namespaces. After all a repo is just a container, of a bunch of refs. This make it much easier to peek into another submodule from a supermodule, and makes it safer to "rm -r a-submodule" when you get mad at something. I hope this shows that it is less of a fake thing. -- Duy -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: gc and repack ignore .git/*HEAD when checking reachability
Hi Duy, On Wed, 13 Jul 2016, Duy Nguyen wrote: > On Wed, Jul 13, 2016 at 10:20 AM, Johannes Schindelin >wrote: > > > > On Tue, 12 Jul 2016, Duy Nguyen wrote: > > > >> On Tue, Jul 12, 2016 at 5:51 PM, Jeff King wrote: > >> > On Tue, Jul 12, 2016 at 05:46:12PM +0200, Duy Nguyen wrote: > >> > > >> >> I'm not opposed to letting one worktree see everything, but this move > >> >> makes it harder to write new scripts (or new builtin commands, even) > >> >> that works with both single and multiple worktrees because you refer > >> >> to one ref (in current worktree perspective) differently. If we kill > >> >> of the main worktree (i.e. git init always creates a linked worktree) > >> >> then it's less of a problem, but still a nuisance to write > >> >> refs/worktree/$CURRENT/ everywhere. > >> > > >> > True. I gave a suggestion for the reading side, but the writing side > >> > would still remain tedious. > >> > > >> > I wonder if, in a worktree, we could simply convert requests to read or > >> > write names that do not begin with "refs/" as "refs/worktree/$CURRENT/"? > >> > That makes it a read/write-time alias conversion, but the actual storage > >> > is just vanilla (so the ref storage doesn't need to care, and > >> > reachability just works). > >> > >> A conversion like that is already happening, but it works at > >> git_path() level instead and maps anything outside refs/ to > >> worktrees/$CURRENT. > > > > Wouldn't you agree that the entire discussion goes into a direction that > > reveals that it might simply be a better idea to require commands that want > > to have per-worktree refs to do that explicitly? > > No. Oh? So you do not see that we are already heading into serious trouble ourselves? > I prefer we have a single interface for dealing with _any_ worktree. I agree. So far, I did not see an interface, though, but the idea that we should somehow fake things so that there does not *have* to be an interface. > > The same holds true for the config, BTW. I really have no love for the > > idea to make the config per-worktree. It just holds too many nasty > > opportunities for violate the Law of Least Surprises. > > > > Just to name one: imagine you check out a different branch in worktree A, > > then switch worktree B to the branch that A had, and all of a sudden you > > may end up with a different upstream! > > Everything in moderation. You wouldn't want to enable sparse checkout > on one worktree and it suddenly affects all worktrees because > core.sparsecheckout is shared. And submodules are known not to work > when core.worktree is still shared. We have precendence for config variables that are global but also allow per- overrides. Think e.g. the http.* variables. The point is: this solution still uses *one* config per repo. > I will not enforce any rule (unless it's very obvious that the other > way is wrong, like core.worktree). I will give you a rifle and you can > either hunt for food or shoot your foot. In other words, you should be > able to share everything if you like it that way while someone else > can share just some config vars, or even nothing in config file. You gave me a rifle alright, and I shot into my foot (by losing objects to auto-gc). I just did not expect it to be a rifle. To keep with the analogy: let's not build arms, but a kick-ass tool. And I seriously disagree that per-worktree refs, reflogs or config are part of said tool. Ciao, Dscho -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: gc and repack ignore .git/*HEAD when checking reachability
On Wed, Jul 13, 2016 at 10:20 AM, Johannes Schindelinwrote: > Hi Duy, > > On Tue, 12 Jul 2016, Duy Nguyen wrote: > >> On Tue, Jul 12, 2016 at 5:51 PM, Jeff King wrote: >> > On Tue, Jul 12, 2016 at 05:46:12PM +0200, Duy Nguyen wrote: >> > >> >> I'm not opposed to letting one worktree see everything, but this move >> >> makes it harder to write new scripts (or new builtin commands, even) >> >> that works with both single and multiple worktrees because you refer >> >> to one ref (in current worktree perspective) differently. If we kill >> >> of the main worktree (i.e. git init always creates a linked worktree) >> >> then it's less of a problem, but still a nuisance to write >> >> refs/worktree/$CURRENT/ everywhere. >> > >> > True. I gave a suggestion for the reading side, but the writing side >> > would still remain tedious. >> > >> > I wonder if, in a worktree, we could simply convert requests to read or >> > write names that do not begin with "refs/" as "refs/worktree/$CURRENT/"? >> > That makes it a read/write-time alias conversion, but the actual storage >> > is just vanilla (so the ref storage doesn't need to care, and >> > reachability just works). >> >> A conversion like that is already happening, but it works at >> git_path() level instead and maps anything outside refs/ to >> worktrees/$CURRENT. > > Wouldn't you agree that the entire discussion goes into a direction that > reveals that it might simply be a better idea to require commands that want > to have per-worktree refs to do that explicitly? No. To me that's equivalent to let people deal explicitly with file-based and lmdb refs backends everywhere. Unless the main worktree concept will die (I doubt it) it may remain the common use case that people care about and extra worktrees become second citizen that's rarely tested. I prefer we have a single interface for dealing with _any_ worktree. If there are fallouts, we deal with them. > The same holds true for the config, BTW. I really have no love for the > idea to make the config per-worktree. It just holds too many nasty > opportunities for violate the Law of Least Surprises. > > Just to name one: imagine you check out a different branch in worktree A, > then switch worktree B to the branch that A had, and all of a sudden you > may end up with a different upstream! Everything in moderation. You wouldn't want to enable sparse checkout on one worktree and it suddenly affects all worktrees because core.sparsecheckout is shared. And submodules are known not to work when core.worktree is still shared. I will not enforce any rule (unless it's very obvious that the other way is wrong, like core.worktree). I will give you a rifle and you can either hunt for food or shoot your foot. In other words, you should be able to share everything if you like it that way while someone else can share just some config vars, or even nothing in config file. -- Duy -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: gc and repack ignore .git/*HEAD when checking reachability
Hi Duy, On Tue, 12 Jul 2016, Duy Nguyen wrote: > On Tue, Jul 12, 2016 at 5:51 PM, Jeff Kingwrote: > > On Tue, Jul 12, 2016 at 05:46:12PM +0200, Duy Nguyen wrote: > > > >> I'm not opposed to letting one worktree see everything, but this move > >> makes it harder to write new scripts (or new builtin commands, even) > >> that works with both single and multiple worktrees because you refer > >> to one ref (in current worktree perspective) differently. If we kill > >> of the main worktree (i.e. git init always creates a linked worktree) > >> then it's less of a problem, but still a nuisance to write > >> refs/worktree/$CURRENT/ everywhere. > > > > True. I gave a suggestion for the reading side, but the writing side > > would still remain tedious. > > > > I wonder if, in a worktree, we could simply convert requests to read or > > write names that do not begin with "refs/" as "refs/worktree/$CURRENT/"? > > That makes it a read/write-time alias conversion, but the actual storage > > is just vanilla (so the ref storage doesn't need to care, and > > reachability just works). > > A conversion like that is already happening, but it works at > git_path() level instead and maps anything outside refs/ to > worktrees/$CURRENT. Wouldn't you agree that the entire discussion goes into a direction that reveals that it might simply be a better idea to require commands that want to have per-worktree refs to do that explicitly? I mean, it looks to me that the harder we try to avoid that, the more problems crop up, some of that as serious as my reported data loss. I do not see any indication that trying even harder to "protect" commands from knowing that they are running in one of many worktrees is making things easier. To the contrary, I expect that direction to hold many more awful surprises for us. The same holds true for the config, BTW. I really have no love for the idea to make the config per-worktree. It just holds too many nasty opportunities for violate the Law of Least Surprises. Just to name one: imagine you check out a different branch in worktree A, then switch worktree B to the branch that A had, and all of a sudden you may end up with a different upstream! Ciao, Dscho -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: gc and repack ignore .git/*HEAD when checking reachability
On Tue, Jul 12, 2016 at 5:51 PM, Jeff Kingwrote: > On Tue, Jul 12, 2016 at 05:46:12PM +0200, Duy Nguyen wrote: > >> I'm not opposed to letting one worktree see everything, but this move >> makes it harder to write new scripts (or new builtin commands, even) >> that works with both single and multiple worktrees because you refer >> to one ref (in current worktree perspective) differently. If we kill >> of the main worktree (i.e. git init always creates a linked worktree) >> then it's less of a problem, but still a nuisance to write >> refs/worktree/$CURRENT/ everywhere. > > True. I gave a suggestion for the reading side, but the writing side > would still remain tedious. > > I wonder if, in a worktree, we could simply convert requests to read or > write names that do not begin with "refs/" as "refs/worktree/$CURRENT/"? > That makes it a read/write-time alias conversion, but the actual storage > is just vanilla (so the ref storage doesn't need to care, and > reachability just works). A conversion like that is already happening, but it works at git_path() level instead and maps anything outside refs/ to worktrees/$CURRENT. Reorganizing all refs in a single ref storage is probably possible, but... > The trickiest thing, I think, is FETCH_HEAD, which is not really a > ref (because it may have a bunch of values, and contain extra > information). Yeah.. I think David and Junio touched this when lmdb backend was discussed, which resulted in leaving per-worktree refs to filesystem backend even when shared refs are in lmdb. We probably can still make it work, I think refs subsystem has special case for FETCH_HEAD already. But 'refs' stuff is really not my area, I should stop writing now before making too many wrong statements. -- Duy -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: gc and repack ignore .git/*HEAD when checking reachability
On Tue, Jul 12, 2016 at 05:46:12PM +0200, Duy Nguyen wrote: > I'm not opposed to letting one worktree see everything, but this move > makes it harder to write new scripts (or new builtin commands, even) > that works with both single and multiple worktrees because you refer > to one ref (in current worktree perspective) differently. If we kill > of the main worktree (i.e. git init always creates a linked worktree) > then it's less of a problem, but still a nuisance to write > refs/worktree/$CURRENT/ everywhere. True. I gave a suggestion for the reading side, but the writing side would still remain tedious. I wonder if, in a worktree, we could simply convert requests to read or write names that do not begin with "refs/" as "refs/worktree/$CURRENT/"? That makes it a read/write-time alias conversion, but the actual storage is just vanilla (so the ref storage doesn't need to care, and reachability just works). The trickiest thing, I think, is FETCH_HEAD, which is not really a ref (because it may have a bunch of values, and contain extra information). -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: gc and repack ignore .git/*HEAD when checking reachability
On Tue, Jul 12, 2016 at 5:26 PM, Jeff Kingwrote: > Likewise for other per-worktree items. If we used refs/MERGE_HEAD and > refs/worktree/foo/MERGE_HEAD, then you could access them independently > by using the fully qualified names. I'm not opposed to letting one worktree see everything, but this move makes it harder to write new scripts (or new builtin commands, even) that works with both single and multiple worktrees because you refer to one ref (in current worktree perspective) differently. If we kill of the main worktree (i.e. git init always creates a linked worktree) then it's less of a problem, but still a nuisance to write refs/worktree/$CURRENT/ everywhere. > The only downside I see is that the existing names are sometimes > well-known. I wonder if we could simply add: > > refs/worktree//%s > > to the dwim ref-lookup when a command is running in a worktree. -- Duy -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: gc and repack ignore .git/*HEAD when checking reachability
On Tue, Jul 12, 2016 at 12:47:06PM +0200, Johannes Schindelin wrote: > > Not so fast; it cuts both ways. > > > > People who want multiple worktrees with branches checked out to work > > in would want to do per-worktree things like bisection, which needs > > tons more state than we'd be comfortable having directly under > > $GIT_DIR (e.g. they may also want "git merge" or "git pull", which > > would use MERGE_HEAD and FETCH_HEAD that are per-worktree and not > > under refs/; "git bisect" would want to mark number of commits to > > denote the perimeter of the area of the history being bisected and > > they live refs/bisect/). > > Sure, `git bisect` would need to realize that it is running in a worktree > separate from the original one and use a different ref. I am mostly a bystander in all of these worktree discussions, but your comment here makes a lot of sense to me. We currently have a global ref namespace, but the current proposals seem to want to slice it on invisible lines into per-worktree and global bits, where "refs/bisect" is no longer a global name. But we could also retain a global view, and just let worktrees carve out their own portion of the namespace ("refs/worktree/foo/bisect", or even organize it by application area, "refs/bisect/foo/bad", etc). Besides being conceptually simpler in the code (global reachability Just Works, because you see all of the refs), it would also let you access individual refs between worktrees if you wanted. So for example, if you are bisecting in worktree "foo", you can access its results from another worktree with "git show bisect/foo/bad". Likewise for other per-worktree items. If we used refs/MERGE_HEAD and refs/worktree/foo/MERGE_HEAD, then you could access them independently by using the fully qualified names. The only downside I see is that the existing names are sometimes well-known. I wonder if we could simply add: refs/worktree//%s to the dwim ref-lookup when a command is running in a worktree. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: gc and repack ignore .git/*HEAD when checking reachability
Hi Junio, On Mon, 11 Jul 2016, Junio C Hamano wrote: > Johannes Schindelinwrites: > > >> No, the point is, refs subsystem needs to know which refs is per-repo, > >> which is per-worktree. So far the rules are "everything under refs, > >> except a few known exceptions, is per-repo" and "everything directly > >> under $GIT_DIR is per-worktree", which work fine. But if you allow to > >> move per-worktree to "refs" freely, then the "known exceptions" will > >> have to be updated every time a new per-worktree ref appears. It'll be > >> easier to modify the first rule as "everything under refs, except some > >> legacy exceptions and refs/worktree, is per-repo". > > > > Given the substantial pain and suffering I have due to per-worktree > > reflogs (and those are *just* HEAD's reflogs!), it appears to me that > > per-worktree refs would be a particularly poor feature. > > > > I agree that HEAD needs to be per-worktree, but already the fact that the > > HEADs of the worktrees, along with their reflogs, are *not* visible to > > all other worktrees causes substantial trouble. > > Not so fast; it cuts both ways. > > People who want multiple worktrees with branches checked out to work > in would want to do per-worktree things like bisection, which needs > tons more state than we'd be comfortable having directly under > $GIT_DIR (e.g. they may also want "git merge" or "git pull", which > would use MERGE_HEAD and FETCH_HEAD that are per-worktree and not > under refs/; "git bisect" would want to mark number of commits to > denote the perimeter of the area of the history being bisected and > they live refs/bisect/). Sure, `git bisect` would need to realize that it is running in a worktree separate from the original one and use a different ref. > And when you are bisecting in the worktree dedicated for a topic, > it is a feature that your other worktrees do not need to see how > much history you narrowed down in that topic. If you intentionally hide bisections from other worktrees, you will invariably end up with the same problems I faced with auto-gc: in a worktree, it is *much, much easier* to forget a bisect in progress. In other words, your comments make me even more certain that per-worktree refs are undesirable. Ciao, Dscho -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: gc and repack ignore .git/*HEAD when checking reachability
Johannes Schindelinwrites: >> No, the point is, refs subsystem needs to know which refs is per-repo, >> which is per-worktree. So far the rules are "everything under refs, >> except a few known exceptions, is per-repo" and "everything directly >> under $GIT_DIR is per-worktree", which work fine. But if you allow to >> move per-worktree to "refs" freely, then the "known exceptions" will >> have to be updated every time a new per-worktree ref appears. It'll be >> easier to modify the first rule as "everything under refs, except some >> legacy exceptions and refs/worktree, is per-repo". > > Given the substantial pain and suffering I have due to per-worktree > reflogs (and those are *just* HEAD's reflogs!), it appears to me that > per-worktree refs would be a particularly poor feature. > > I agree that HEAD needs to be per-worktree, but already the fact that the > HEADs of the worktrees, along with their reflogs, are *not* visible to > all other worktrees causes substantial trouble. Not so fast; it cuts both ways. People who want multiple worktrees with branches checked out to work in would want to do per-worktree things like bisection, which needs tons more state than we'd be comfortable having directly under $GIT_DIR (e.g. they may also want "git merge" or "git pull", which would use MERGE_HEAD and FETCH_HEAD that are per-worktree and not under refs/; "git bisect" would want to mark number of commits to denote the perimeter of the area of the history being bisected and they live refs/bisect/). And when you are bisecting in the worktree dedicated for a topic, it is a feature that your other worktrees do not need to see how much history you narrowed down in that topic. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: gc and repack ignore .git/*HEAD when checking reachability
Hi Duy, On Sun, 10 Jul 2016, Duy Nguyen wrote: > We will be able to see refs from all worktrees if we choose to. What I tried to say is: even if we make it technically feasible to have per-worktree refs or reflogs, the downsides are too prohibitive. We should simply not introduce support for that (and resolve the per-worktree HEAD issue accordingly). Ciao, Dscho -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: gc and repack ignore .git/*HEAD when checking reachability
On Sun, Jul 10, 2016 at 4:16 PM, Johannes Schindelinwrote: >> No, the point is, refs subsystem needs to know which refs is per-repo, >> which is per-worktree. So far the rules are "everything under refs, >> except a few known exceptions, is per-repo" and "everything directly >> under $GIT_DIR is per-worktree", which work fine. But if you allow to >> move per-worktree to "refs" freely, then the "known exceptions" will >> have to be updated every time a new per-worktree ref appears. It'll be >> easier to modify the first rule as "everything under refs, except some >> legacy exceptions and refs/worktree, is per-repo". > > Given the substantial pain and suffering I have due to per-worktree > reflogs (and those are *just* HEAD's reflogs!), I apologize for that. I clearly did not see all the consequences of my simple-minded approach. Watch out for the shared config issue too if you start to rely on multiple worktrees. It may take even longer time to address than this one. > it appears to me that per-worktree refs would be a particularly poor feature. > > I agree that HEAD needs to be per-worktree, but already the fact that the > HEADs of the worktrees, along with their reflogs, are *not* visible to > all other worktrees causes substantial trouble. > > In my mind, a much more natural design would have been to map > transparently each worktree's HEAD to refs/worktree//HEAD *and keep > those refs and reflogs visible across all worktrees*. We will be able to see refs from all worktrees if we choose to. There is no question about that. It's needed for full-repo operations, gc included. Whether we expose all worktree namespaces via refs/worktree (as a logical view, not storage one), remains to be seen. Michael's new ref API may be flexible enough that we could do without. I'm not sure yet. > With that design, I would not have irretrievably lost reflogs to auto-gc > nor dozens of hours trying to figure out how to have *any* auto-gc succeed > again (because I need to reduce the horrible slowness incurred by too many > loose objects). My interactive rebases would also not have started to > print the auto-gc error after every friggin' pick. -- Duy -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: gc and repack ignore .git/*HEAD when checking reachability
Hi Duy, On Sun, 10 Jul 2016, Duy Nguyen wrote: > On Sun, Jul 10, 2016 at 12:59 PM, Johannes Schindelin >wrote: > >> >> Now, how about special-casing *just* these legacy files in gc: HEAD, > >> >> FETCH_HEAD, MERGE_HEAD and CHERRY_PICK_HEAD? Any new transient refs > >> >> should > >> >> live in the refs/ namespace, which is already handled. > >> > > >> > That seems workable as well; in that case, we should also document this > >> > (in the git-gc manpage at a minimum), and explicitly suggest creating > >> > refs in refs/ but outside of refs/heads/ and refs/tags/, rather than > >> > directly in .git/. > >> > >> Not just outside refs/heads and refs/tags. It has to be in a specified > >> namespace like refs/worktree/ or something (we are close to be ready > >> for that). We could update the man page about git-gc shortcomings now, > >> but I think we should wait until refs/worktree (or something like > >> that) becomes true before suggesting more. > > > > We have a precedent for a ref that is directly underneath refs/: > > refs/stash. > > > > IMO that is okay: depending on the use case, we would need multiple refs > > (like refs/notes/*) or a single ref (like refs/stash). > > > > The important part is that the new refs start with refs/, and if they are > > to be transient, start neither with refs/heads/ nor with refs/tags/. > > No, the point is, refs subsystem needs to know which refs is per-repo, > which is per-worktree. So far the rules are "everything under refs, > except a few known exceptions, is per-repo" and "everything directly > under $GIT_DIR is per-worktree", which work fine. But if you allow to > move per-worktree to "refs" freely, then the "known exceptions" will > have to be updated every time a new per-worktree ref appears. It'll be > easier to modify the first rule as "everything under refs, except some > legacy exceptions and refs/worktree, is per-repo". Given the substantial pain and suffering I have due to per-worktree reflogs (and those are *just* HEAD's reflogs!), it appears to me that per-worktree refs would be a particularly poor feature. I agree that HEAD needs to be per-worktree, but already the fact that the HEADs of the worktrees, along with their reflogs, are *not* visible to all other worktrees causes substantial trouble. In my mind, a much more natural design would have been to map transparently each worktree's HEAD to refs/worktree//HEAD *and keep those refs and reflogs visible across all worktrees*. With that design, I would not have irretrievably lost reflogs to auto-gc nor dozens of hours trying to figure out how to have *any* auto-gc succeed again (because I need to reduce the horrible slowness incurred by too many loose objects). My interactive rebases would also not have started to print the auto-gc error after every friggin' pick. Ciao, Dscho -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: gc and repack ignore .git/*HEAD when checking reachability
On Sun, Jul 10, 2016 at 12:59 PM, Johannes Schindelinwrote: >> >> Now, how about special-casing *just* these legacy files in gc: HEAD, >> >> FETCH_HEAD, MERGE_HEAD and CHERRY_PICK_HEAD? Any new transient refs should >> >> live in the refs/ namespace, which is already handled. >> > >> > That seems workable as well; in that case, we should also document this >> > (in the git-gc manpage at a minimum), and explicitly suggest creating >> > refs in refs/ but outside of refs/heads/ and refs/tags/, rather than >> > directly in .git/. >> >> Not just outside refs/heads and refs/tags. It has to be in a specified >> namespace like refs/worktree/ or something (we are close to be ready >> for that). We could update the man page about git-gc shortcomings now, >> but I think we should wait until refs/worktree (or something like >> that) becomes true before suggesting more. > > We have a precedent for a ref that is directly underneath refs/: > refs/stash. > > IMO that is okay: depending on the use case, we would need multiple refs > (like refs/notes/*) or a single ref (like refs/stash). > > The important part is that the new refs start with refs/, and if they are > to be transient, start neither with refs/heads/ nor with refs/tags/. No, the point is, refs subsystem needs to know which refs is per-repo, which is per-worktree. So far the rules are "everything under refs, except a few known exceptions, is per-repo" and "everything directly under $GIT_DIR is per-worktree", which work fine. But if you allow to move per-worktree to "refs" freely, then the "known exceptions" will have to be updated every time a new per-worktree ref appears. It'll be easier to modify the first rule as "everything under refs, except some legacy exceptions and refs/worktree, is per-repo". -- Duy -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: gc and repack ignore .git/*HEAD when checking reachability
Hi Duy, On Sat, 9 Jul 2016, Duy Nguyen wrote: > On Sat, Jul 9, 2016 at 4:09 PM, Josh Triplettwrote: > > On Sat, Jul 09, 2016 at 09:35:24AM +0200, Johannes Schindelin wrote: > >> On Fri, 8 Jul 2016, Junio C Hamano wrote: > >> > Josh Triplett writes: > >> > > >> > > That sounds reasonable. And if they *do* end up taking any time to > >> > > traverse, it's because they weren't reachable from other anchoring > >> > > points, so taking the extra time to traverse them seems fine. > >> > > >> > The only thing that is hard is to clearly define _what_ are the new > >> > anchoring points. > >> > > >> > It cannot be "anything directly under .git that has all-caps name > >> > that ends with _HEAD". The ones we write we know are going to be > >> > removed at some point in time (e.g. "git reset", "git bisect reset", > >> > "git merge --abort", etc.). We do not have any control on random > >> > ones that the users and third-party tools leave behind, holding onto > >> > irrelevant objects forever. > >> > >> Please note that bisect already uses the (transient) refs/bisect/ > >> namespace. So I do not think we need to take specific care of the > >> BISECT_* files. > >> > >> If we had thought of it back then, we could have used such a transient > >> namespace also for FETCH_HEAD, CHERRY_PICK_HEAD and also for detached > >> HEADs (which we should have called "unnamed branches"). > >> > >> Now, how about special-casing *just* these legacy files in gc: HEAD, > >> FETCH_HEAD, MERGE_HEAD and CHERRY_PICK_HEAD? Any new transient refs should > >> live in the refs/ namespace, which is already handled. > > > > That seems workable as well; in that case, we should also document this > > (in the git-gc manpage at a minimum), and explicitly suggest creating > > refs in refs/ but outside of refs/heads/ and refs/tags/, rather than > > directly in .git/. > > Not just outside refs/heads and refs/tags. It has to be in a specified > namespace like refs/worktree/ or something (we are close to be ready > for that). We could update the man page about git-gc shortcomings now, > but I think we should wait until refs/worktree (or something like > that) becomes true before suggesting more. We have a precedent for a ref that is directly underneath refs/: refs/stash. IMO that is okay: depending on the use case, we would need multiple refs (like refs/notes/*) or a single ref (like refs/stash). The important part is that the new refs start with refs/, and if they are to be transient, start neither with refs/heads/ nor with refs/tags/. Ciao, Dscho -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: gc and repack ignore .git/*HEAD when checking reachability
On Sat, Jul 9, 2016 at 4:09 PM, Josh Triplettwrote: > On Sat, Jul 09, 2016 at 09:35:24AM +0200, Johannes Schindelin wrote: >> On Fri, 8 Jul 2016, Junio C Hamano wrote: >> > Josh Triplett writes: >> > >> > > That sounds reasonable. And if they *do* end up taking any time to >> > > traverse, it's because they weren't reachable from other anchoring >> > > points, so taking the extra time to traverse them seems fine. >> > >> > The only thing that is hard is to clearly define _what_ are the new >> > anchoring points. >> > >> > It cannot be "anything directly under .git that has all-caps name >> > that ends with _HEAD". The ones we write we know are going to be >> > removed at some point in time (e.g. "git reset", "git bisect reset", >> > "git merge --abort", etc.). We do not have any control on random >> > ones that the users and third-party tools leave behind, holding onto >> > irrelevant objects forever. >> >> Please note that bisect already uses the (transient) refs/bisect/ >> namespace. So I do not think we need to take specific care of the >> BISECT_* files. >> >> If we had thought of it back then, we could have used such a transient >> namespace also for FETCH_HEAD, CHERRY_PICK_HEAD and also for detached >> HEADs (which we should have called "unnamed branches"). >> >> Now, how about special-casing *just* these legacy files in gc: HEAD, >> FETCH_HEAD, MERGE_HEAD and CHERRY_PICK_HEAD? Any new transient refs should >> live in the refs/ namespace, which is already handled. > > That seems workable as well; in that case, we should also document this > (in the git-gc manpage at a minimum), and explicitly suggest creating > refs in refs/ but outside of refs/heads/ and refs/tags/, rather than > directly in .git/. Not just outside refs/heads and refs/tags. It has to be in a specified namespace like refs/worktree/ or something (we are close to be ready for that). We could update the man page about git-gc shortcomings now, but I think we should wait until refs/worktree (or something like that) becomes true before suggesting more. -- Duy -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: gc and repack ignore .git/*HEAD when checking reachability
On Sat, Jul 09, 2016 at 09:35:24AM +0200, Johannes Schindelin wrote: > On Fri, 8 Jul 2016, Junio C Hamano wrote: > > Josh Triplettwrites: > > > > > That sounds reasonable. And if they *do* end up taking any time to > > > traverse, it's because they weren't reachable from other anchoring > > > points, so taking the extra time to traverse them seems fine. > > > > The only thing that is hard is to clearly define _what_ are the new > > anchoring points. > > > > It cannot be "anything directly under .git that has all-caps name > > that ends with _HEAD". The ones we write we know are going to be > > removed at some point in time (e.g. "git reset", "git bisect reset", > > "git merge --abort", etc.). We do not have any control on random > > ones that the users and third-party tools leave behind, holding onto > > irrelevant objects forever. > > Please note that bisect already uses the (transient) refs/bisect/ > namespace. So I do not think we need to take specific care of the > BISECT_* files. > > If we had thought of it back then, we could have used such a transient > namespace also for FETCH_HEAD, CHERRY_PICK_HEAD and also for detached > HEADs (which we should have called "unnamed branches"). > > Now, how about special-casing *just* these legacy files in gc: HEAD, > FETCH_HEAD, MERGE_HEAD and CHERRY_PICK_HEAD? Any new transient refs should > live in the refs/ namespace, which is already handled. That seems workable as well; in that case, we should also document this (in the git-gc manpage at a minimum), and explicitly suggest creating refs in refs/ but outside of refs/heads/ and refs/tags/, rather than directly in .git/. > BTW this issue is getting much more problematic when you have a lot of > worktrees, some of which operate on detached HEADs. Which I do. > > Ciao, > Dscho -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: gc and repack ignore .git/*HEAD when checking reachability
Hi Junio, On Fri, 8 Jul 2016, Junio C Hamano wrote: > Josh Triplettwrites: > > > That sounds reasonable. And if they *do* end up taking any time to > > traverse, it's because they weren't reachable from other anchoring > > points, so taking the extra time to traverse them seems fine. > > The only thing that is hard is to clearly define _what_ are the new > anchoring points. > > It cannot be "anything directly under .git that has all-caps name > that ends with _HEAD". The ones we write we know are going to be > removed at some point in time (e.g. "git reset", "git bisect reset", > "git merge --abort", etc.). We do not have any control on random > ones that the users and third-party tools leave behind, holding onto > irrelevant objects forever. Please note that bisect already uses the (transient) refs/bisect/ namespace. So I do not think we need to take specific care of the BISECT_* files. If we had thought of it back then, we could have used such a transient namespace also for FETCH_HEAD, CHERRY_PICK_HEAD and also for detached HEADs (which we should have called "unnamed branches"). Now, how about special-casing *just* these legacy files in gc: HEAD, FETCH_HEAD, MERGE_HEAD and CHERRY_PICK_HEAD? Any new transient refs should live in the refs/ namespace, which is already handled. BTW this issue is getting much more problematic when you have a lot of worktrees, some of which operate on detached HEADs. Which I do. Ciao, Dscho -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: gc and repack ignore .git/*HEAD when checking reachability
On Fri, Jul 08, 2016 at 07:50:53PM -0400, Theodore Ts'o wrote: > On Fri, Jul 08, 2016 at 01:30:06PM -0700, Junio C Hamano wrote: > > > > I can imagine I'd start hacking on a project that I rarely touch, give up > > resolving a "git pull" from an unconfigured place (hence, some stuff is > > only reachable from FETCH_HEAD), and after 2*N days, come back > > to the repository and find that I cannot continue working on it. > > Sure, but that's something that could happen today, and no one has > really complained, have they? Until now. :) > > Turning the rule to "*_HEAD we know about, and those we don't that > > are young" would not change the situation, as I may be depending on > > some third-party tool that uses its OWN_HEAD just like we use > > FETCH_HEAD in the above example. > > > > So I dunno if that is a good solution. If we are going to declare that > > transient stuff will now be kept, i.e. keeping them alive is no longer > > end user's responsibility, then probably we should make it end user's > > responsibility to clean things up. > > Well, the question is what does "transient" stuff really mean? If we > keep them forever, then are they really any different from stuff under > refs/heads? No, they're not. And I don't think they should be; HEAD itself is *not* transient, as a detached HEAD can reference valuable non-transient objects. In an ideal world, HEAD and all other such refs would live somewhere under refs, to avoid the special case. > Maybe pester the user if there is stale *_HEAD files, but don't > actually get rid of the objects? Why pester at all? Just leave them, and if the user has large objects they don't care about and wants to decrease the size of the repository, they can follow the advice from the git-gc manpage: "If you are expecting some objects to be collected and they aren’t, check all of those locations and decide whether it makes sense in your case to remove those references." (Where "those locations" will need to expand to mention "*HEAD".) (Also, I'd suggest "*HEAD", possibly limited to characters matching [-_A-Z], rather than "*_HEAD".) - Josh Triplett -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: gc and repack ignore .git/*HEAD when checking reachability
On Fri, Jul 08, 2016 at 01:30:06PM -0700, Junio C Hamano wrote: > > I can imagine I'd start hacking on a project that I rarely touch, give up > resolving a "git pull" from an unconfigured place (hence, some stuff is > only reachable from FETCH_HEAD), and after 2*N days, come back > to the repository and find that I cannot continue working on it. Sure, but that's something that could happen today, and no one has really complained, have they? > Turning the rule to "*_HEAD we know about, and those we don't that > are young" would not change the situation, as I may be depending on > some third-party tool that uses its OWN_HEAD just like we use > FETCH_HEAD in the above example. > > So I dunno if that is a good solution. If we are going to declare that > transient stuff will now be kept, i.e. keeping them alive is no longer > end user's responsibility, then probably we should make it end user's > responsibility to clean things up. Well, the question is what does "transient" stuff really mean? If we keep them forever, then are they really any different from stuff under refs/heads? Maybe pester the user if there is stale *_HEAD files, but don't actually get rid of the objects? - Ted -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: gc and repack ignore .git/*HEAD when checking reachability
On Fri, Jul 8, 2016 at 12:25 PM, Theodore Ts'owrote: > On Fri, Jul 08, 2016 at 10:14:33AM -0700, Junio C Hamano wrote: >> >> It cannot be "anything directly under .git that has all-caps name >> that ends with _HEAD". The ones we write we know are going to be >> removed at some point in time (e.g. "git reset", "git bisect reset", >> "git merge --abort", etc.). We do not have any control on random >> ones that the users and third-party tools leave behind, holding onto >> irrelevant objects forever. > > What about anything that is all-caps and ends in _HEAD which has a > mod-time within the last N days? (Where N is 2-7 days.) If it's > older than that, it's almost certainly stale... I can imagine I'd start hacking on a project that I rarely touch, give up resolving a "git pull" from an unconfigured place (hence, some stuff is only reachable from FETCH_HEAD), and after 2*N days, come back to the repository and find that I cannot continue working on it. Turning the rule to "*_HEAD we know about, and those we don't that are young" would not change the situation, as I may be depending on some third-party tool that uses its OWN_HEAD just like we use FETCH_HEAD in the above example. So I dunno if that is a good solution. If we are going to declare that transient stuff will now be kept, i.e. keeping them alive is no longer end user's responsibility, then probably we should make it end user's responsibility to clean things up. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: gc and repack ignore .git/*HEAD when checking reachability
On Fri, Jul 08, 2016 at 10:14:33AM -0700, Junio C Hamano wrote: > Josh Triplettwrites: > > > That sounds reasonable. And if they *do* end up taking any time to > > traverse, it's because they weren't reachable from other anchoring > > points, so taking the extra time to traverse them seems fine. > > The only thing that is hard is to clearly define _what_ are the new > anchoring points. > > It cannot be "anything directly under .git that has all-caps name > that ends with _HEAD". The ones we write we know are going to be > removed at some point in time (e.g. "git reset", "git bisect reset", > "git merge --abort", etc.). We do not have any control on random > ones that the users and third-party tools leave behind, holding onto > irrelevant objects forever. "We don't understand it, so it must not be important" does not seem like a safe approach. An object matching "[_A-Z]*HEAD" might act more like a detached HEAD, referencing objects the user wants to hold onto permanently. And even FETCH_HEAD might point to objects the user expected to stick around until they chose to ignore them; what makes FETCH_HEAD a less legitimate head to work with than refs/heads/scratch? (Or, for that matter, what about an ugly merge in MERGE_HEAD that the user put off until after a vacation?) The only downside of "holding onto irrelevant objects forever" is storage space, which the user can easily reclaim by removing HEADs they don't want. The downside of pruning objects the user wanted is data loss. Why not just mark all those objects as roots for reachability permanently, and let the user remove them if they want to prune those objects? - Josh Triplett -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: gc and repack ignore .git/*HEAD when checking reachability
On Fri, Jul 08, 2016 at 10:14:33AM -0700, Junio C Hamano wrote: > > It cannot be "anything directly under .git that has all-caps name > that ends with _HEAD". The ones we write we know are going to be > removed at some point in time (e.g. "git reset", "git bisect reset", > "git merge --abort", etc.). We do not have any control on random > ones that the users and third-party tools leave behind, holding onto > irrelevant objects forever. What about anything that is all-caps and ends in _HEAD which has a mod-time within the last N days? (Where N is 2-7 days.) If it's older than that, it's almost certainly stale... - Ted -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: gc and repack ignore .git/*HEAD when checking reachability
Josh Triplettwrites: > That sounds reasonable. And if they *do* end up taking any time to > traverse, it's because they weren't reachable from other anchoring > points, so taking the extra time to traverse them seems fine. The only thing that is hard is to clearly define _what_ are the new anchoring points. It cannot be "anything directly under .git that has all-caps name that ends with _HEAD". The ones we write we know are going to be removed at some point in time (e.g. "git reset", "git bisect reset", "git merge --abort", etc.). We do not have any control on random ones that the users and third-party tools leave behind, holding onto irrelevant objects forever. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: gc and repack ignore .git/*HEAD when checking reachability
On Thu, Jul 07, 2016 at 09:34:02PM -0700, Junio C Hamano wrote: > Josh Triplettwrites: > > This could result in data loss, if a user expected that having an object > > referenced from those places would protect it from pruning. > > Yeah, luckily, nobody expects such. I do not think any of our > document says nothing other than HEAD like CHERRY_PICK_HEAD is > reachability anchoring point; they are designed to be transient. I can imagine at least one scenario that would result in data loss here: git pull a URL (not referenced via any ref other than FETCH_HEAD/MERGE_HEAD), get a merge conflict, get halfway through resolving it, set that repository aside for a while, do something that triggers a gc, then attempt to finish and commit. Unlikely, but not impossible. Same reason the reachability logic looks at the index. (I originally encountered this because I intended to add another HEAD-like ref in .git, so I started investigating the logic around such HEADs.) > Because they are designed to be transient, I do not think there is > any downside (other than the initial start-up cost) to including > them in reachability computation. Because they are meant to be > transient, the objects anchored by them would be reachable from > other anchoring points anyway. That sounds reasonable. And if they *do* end up taking any time to traverse, it's because they weren't reachable from other anchoring points, so taking the extra time to traverse them seems fine. - Josh Triplett -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: gc and repack ignore .git/*HEAD when checking reachability
Josh Triplettwrites: > This could result in data loss, if a user expected that having an object > referenced from those places would protect it from pruning. Yeah, luckily, nobody expects such. I do not think any of our document says nothing other than HEAD like CHERRY_PICK_HEAD is reachability anchoring point; they are designed to be transient. Because they are designed to be transient, I do not think there is any downside (other than the initial start-up cost) to including them in reachability computation. Because they are meant to be transient, the objects anchored by them would be reachable from other anchoring points anyway. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html