Re: [RFC/PATCH] Disallow commands from within unpopulated submodules.
On Sat, Jan 21, 2017 at 2:17 AM, Jeff King wrote: > On Thu, Jan 19, 2017 at 11:30:23AM -0800, Stefan Beller wrote: > >> Now let's ask the same question for "git -C sub status ." (which is a >> command that is only reading and not writing to the repository) >> >> 1) If the submodule is populated, the user clearly intended to know >>more about the submodules status >> 2) It is unclear if the user wanted to learn about the submodules state >>(So ideally: "The submodule 'sub' is not initialized. To init ...") >>or the status check should be applied to the superproject instead. >> >> Avoid the confusion in 2) as well and just error out for now. Later on >> we may want to add another flag to git.c to allow commands to be run >> inside unpopulated submodules and each command reacts appropriately. > > I like the general idea of catching commands in unpopulated submodules, > but I'm somewhat uncomfortable with putting an unconditional check into > git.c, for two reasons: > > 1. Reading the index can be expensive. You would not want "git > rev-parse" to incur this cost. > > 2. How does this interact with commands which do interact with the > index? Don't they expect to find the_index unpopulated? > > (I notice that it's effectively tied to RUN_SETUP, which is good. > But that also means that many commands, like "diff", won't get the > benefit. Not to mention non-builtins). > > I'd rather see it in the commands themselves. Especially given the > "ideal" in your status example, which requires command-specific > knowledge. I agree. It's already bad enough for pathspec code to peek into the index, adding a hidden dependency between parse_pathspec() and read_cache(). And I still think parse_pathspec() is not the right place to check submodule paths. Worktree should be checked as well, in the case that the submodule is not yet registered in the index. The right place to do that is per-command, with their consent so to speak, because they may need to set things up (index, .git/config and stuff) properly before explicitly doing this check. -- Duy
Re: [RFC/PATCH] Disallow commands from within unpopulated submodules.
On Fri, Jan 20, 2017 at 1:58 PM, Junio C Hamano wrote: > Junio C Hamano writes: > >> Jeff King writes: >> And in my current understanding of submodules the check in .gitmodules ought to be enough, too. >>> >>> Yeah, that probably makes sense. You can have a gitlink without a >>> .gitmodules file, but I don't quite know what that would mean in terms >>> of submodules (I guess it's not a submodule but "something else"). >> >> That may be a lot better than reading the index unconditionally, but >> I'd rather not to see "git rev-parse" read ".gitmodules" at all. It >> would discourage scripted use of Git for no good reason. > > Thinking about this more, I suspect that > > cd sub && git anything > > when the index of the top-level thinks "sub" must be a submodule and > the user is not interested in "sub" (hence it hasn't gone through > "git submodule init" or "update") should get the same error as you > would get if you did > > cd /var/tmp/ && git anything > > when none of /, /var, /var/tmp/ is controlled by any Git repository. > I.e. "fatal: Not a git repository". I agree. The idea with a tombstone sounds great from a performance perspective as you do not need to do extra work in the superproject at all, because any gitlink is detected early in the discovery. The big BUT is however the following: How do current users know if a submodule is e.g. populated? (From say a third party script). Most likely they use something like test -e ${sub}/.git as that just worked. So if we go with the tombstone idea, we may break things. (i.e. the fictional third party script confirms any submodule to be there, but it is not) I do really like the idea though, so maybe we also need to provide some submodule plumbing that we opine to be the "correct" way to see the submodules state[1] to make the transition easier for the script writers? [1] c.f. submodule states in https://github.com/gitster/git/commit/e2b51b9df618ceeff7c4ec044e20f5ce9a87241e > > Perhaps we can update two things and make it cheap. > > - checking out the top-level working tree without populating the >working tree of a submodule learns to do a bit more than just >creating an empty directory. Instead, it creates the third kind >of ".git" (we currently support two kinds of ".git", one that is >a repository itself, and another that is points at a repository), >that tells us that there is not (yet) a repository there. > > - the "discovering the root of the working tree" logic learns to >notice the third kind of ".git" and stop with "Not a git >repository".
Re: [RFC/PATCH] Disallow commands from within unpopulated submodules.
On Fri, Jan 20, 2017 at 01:58:02PM -0800, Junio C Hamano wrote: > Junio C Hamano writes: > > > Jeff King writes: > > > >>> And in my current understanding of submodules the check in > >>> .gitmodules ought to be enough, too. > >> > >> Yeah, that probably makes sense. You can have a gitlink without a > >> .gitmodules file, but I don't quite know what that would mean in terms > >> of submodules (I guess it's not a submodule but "something else"). > > > > That may be a lot better than reading the index unconditionally, but > > I'd rather not to see "git rev-parse" read ".gitmodules" at all. It > > would discourage scripted use of Git for no good reason. > > Thinking about this more, I suspect that > > cd sub && git anything > > when the index of the top-level thinks "sub" must be a submodule and > the user is not interested in "sub" (hence it hasn't gone through > "git submodule init" or "update") should get the same error as you > would get if you did > > cd /var/tmp/ && git anything > > when none of /, /var, /var/tmp/ is controlled by any Git repository. > I.e. "fatal: Not a git repository". Not sure if our emails just crossed, but yes, I agree completely. > Perhaps we can update two things and make it cheap. > > - checking out the top-level working tree without populating the >working tree of a submodule learns to do a bit more than just >creating an empty directory. Instead, it creates the third kind >of ".git" (we currently support two kinds of ".git", one that is >a repository itself, and another that is points at a repository), >that tells us that there is not (yet) a repository there. > > - the "discovering the root of the working tree" logic learns to >notice the third kind of ".git" and stop with "Not a git >repository". Yeah, I thought about suggesting something like that earlier. It's slightly more efficient than the "find the root and then complain" thing Stefan and I were talking about, but I'd worry it comes with weird corner cases. E.g., are there tools (or other parts of git) that care about it literally being an empty directory? E.g., would parts of "checkout" need to know that it's OK to blow it away?
Re: [RFC/PATCH] Disallow commands from within unpopulated submodules.
Junio C Hamano writes: > Jeff King writes: > >>> And in my current understanding of submodules the check in >>> .gitmodules ought to be enough, too. >> >> Yeah, that probably makes sense. You can have a gitlink without a >> .gitmodules file, but I don't quite know what that would mean in terms >> of submodules (I guess it's not a submodule but "something else"). > > That may be a lot better than reading the index unconditionally, but > I'd rather not to see "git rev-parse" read ".gitmodules" at all. It > would discourage scripted use of Git for no good reason. Thinking about this more, I suspect that cd sub && git anything when the index of the top-level thinks "sub" must be a submodule and the user is not interested in "sub" (hence it hasn't gone through "git submodule init" or "update") should get the same error as you would get if you did cd /var/tmp/ && git anything when none of /, /var, /var/tmp/ is controlled by any Git repository. I.e. "fatal: Not a git repository". Perhaps we can update two things and make it cheap. - checking out the top-level working tree without populating the working tree of a submodule learns to do a bit more than just creating an empty directory. Instead, it creates the third kind of ".git" (we currently support two kinds of ".git", one that is a repository itself, and another that is points at a repository), that tells us that there is not (yet) a repository there. - the "discovering the root of the working tree" logic learns to notice the third kind of ".git" and stop with "Not a git repository".
Re: [RFC/PATCH] Disallow commands from within unpopulated submodules.
On Fri, Jan 20, 2017 at 01:41:54PM -0800, Junio C Hamano wrote: > Jeff King writes: > > >> And in my current understanding of submodules the check in > >> .gitmodules ought to be enough, too. > > > > Yeah, that probably makes sense. You can have a gitlink without a > > .gitmodules file, but I don't quite know what that would mean in terms > > of submodules (I guess it's not a submodule but "something else"). > > That may be a lot better than reading the index unconditionally, but > I'd rather not to see "git rev-parse" read ".gitmodules" at all. It > would discourage scripted use of Git for no good reason. Why is that? Just because it makes rev-parse seem more bloated? I think Stefan's putting it into git.c is confusing the issue a bit. This is fundamentally about figuring out which git repository we're in, and that procedure is the right place to put the check. IOW, when we call setup_git_repository() we are already walking up the tree and looking at .git/HEAD, .git/config, etc to see if we are in a valid git repository. It doesn't seem unreasonable to me to make this part of that check. I.e.: - if we we walked up from the working tree (so we have a non-NULL prefix); and - if there is a .gitmodules file; and - if the .gitmodules file shows that we were inside what _should_ have been a submodule; then - complain and do not accept the outer repository as a valid repo. That adds only an extra failed open() for people who do not use submodules, and an extra config-file parse for people who do. And then only when they are not in the top-level of the working tree (so scripts, etc that cd_to_toplevel wouldn't pay per-invocation). -Peff
Re: [RFC/PATCH] Disallow commands from within unpopulated submodules.
Jeff King writes: >> And in my current understanding of submodules the check in >> .gitmodules ought to be enough, too. > > Yeah, that probably makes sense. You can have a gitlink without a > .gitmodules file, but I don't quite know what that would mean in terms > of submodules (I guess it's not a submodule but "something else"). That may be a lot better than reading the index unconditionally, but I'd rather not to see "git rev-parse" read ".gitmodules" at all. It would discourage scripted use of Git for no good reason.
Re: [RFC/PATCH] Disallow commands from within unpopulated submodules.
On Fri, Jan 20, 2017 at 12:00 PM, Jeff King wrote: > On Fri, Jan 20, 2017 at 11:53:01AM -0800, Stefan Beller wrote: > >> > One alternative would be to make the check cheaper. Could we reliably >> > tell from the submodule.foo.* block in the config that path "foo" is a >> > submodule? I think that would work after "submodule init" but not right >> > after "git clone". So the index really is the source of truth there. >> >> Well we can check if there is a .gitmodules file that has a >> submodule.*.path equal to the last part of $CWD, no need to look >> at the git config. >> >> And that would also work right after git clone (in an >> unpopulated/uninitialized submodule as I call it). >> >> And in my current understanding of submodules the check in >> .gitmodules ought to be enough, too. > > Yeah, that probably makes sense. You can have a gitlink without a > .gitmodules file, but I don't quite know what that would mean in terms > of submodules (I guess it's not a submodule but "something else"). yeah, I agree it could be git series[1] at work, or as you said "something else", and we have no idea what to do. I think this could actually be implemented top-down, because the check is cheap as we're beginning with lstat(.gitmodules), and no further pursue checking this corner case in case the .gitmodules is not found. I'll see if I can make a patch that passes the test suite. [1] https://github.com/git-series/git-series/blob/master/INTERNALS.md > > -Peff
Re: [RFC/PATCH] Disallow commands from within unpopulated submodules.
On Fri, Jan 20, 2017 at 11:53:01AM -0800, Stefan Beller wrote: > > One alternative would be to make the check cheaper. Could we reliably > > tell from the submodule.foo.* block in the config that path "foo" is a > > submodule? I think that would work after "submodule init" but not right > > after "git clone". So the index really is the source of truth there. > > Well we can check if there is a .gitmodules file that has a > submodule.*.path equal to the last part of $CWD, no need to look > at the git config. > > And that would also work right after git clone (in an > unpopulated/uninitialized submodule as I call it). > > And in my current understanding of submodules the check in > .gitmodules ought to be enough, too. Yeah, that probably makes sense. You can have a gitlink without a .gitmodules file, but I don't quite know what that would mean in terms of submodules (I guess it's not a submodule but "something else"). -Peff
Re: [RFC/PATCH] Disallow commands from within unpopulated submodules.
On Fri, Jan 20, 2017 at 11:42 AM, Jeff King wrote: > On Fri, Jan 20, 2017 at 11:33:45AM -0800, Stefan Beller wrote: > >> > I'd rather see it in the commands themselves. Especially given the >> > "ideal" in your status example, which requires command-specific >> > knowledge. >> >> So you rather want to go bottom up, i.e. add it to each command individually >> for which it makes sense, instead of rather first having a catch-it-all like >> this and then we can have a flag similar to RUN_SETUP, e.g. >> ALLOW_IN_UNPOP_SUBMODULE, which allows commands to >> take over the responsibility to act responsibly in this case? > > Yes. I know it's "less safe" in the sense that commands have to make an > effort to detect the situation, but I feel like only they'll know what > the sensible behavior is. And they can also do the check at a time when > they would be reading the index anyway. > >> status may be the first command for going that route; I wonder if we'd >> want to add this feature unconditionally or only in the porcelain case. >> (In plumbing you're supposed to know what you're doing... so there is >> no need as well as our promise to not change it) > > Yeah. The reason that it would be so painful to load the index > for every rev-parse is not just that it probably doesn't otherwise need > the index, but that scripts may make a _ton_ of rev-parse (or other > plumbing) calls. > > One alternative would be to make the check cheaper. Could we reliably > tell from the submodule.foo.* block in the config that path "foo" is a > submodule? I think that would work after "submodule init" but not right > after "git clone". So the index really is the source of truth there. Well we can check if there is a .gitmodules file that has a submodule.*.path equal to the last part of $CWD, no need to look at the git config. And that would also work right after git clone (in an unpopulated/uninitialized submodule as I call it). And in my current understanding of submodules the check in .gitmodules ought to be enough, too. > > I guess there could be an index extension "these are the gitlinks I > contain" and in theory we could read just that extension. I dunno. > > -Peff
Re: [RFC/PATCH] Disallow commands from within unpopulated submodules.
On Fri, Jan 20, 2017 at 11:33:45AM -0800, Stefan Beller wrote: > > I'd rather see it in the commands themselves. Especially given the > > "ideal" in your status example, which requires command-specific > > knowledge. > > So you rather want to go bottom up, i.e. add it to each command individually > for which it makes sense, instead of rather first having a catch-it-all like > this and then we can have a flag similar to RUN_SETUP, e.g. > ALLOW_IN_UNPOP_SUBMODULE, which allows commands to > take over the responsibility to act responsibly in this case? Yes. I know it's "less safe" in the sense that commands have to make an effort to detect the situation, but I feel like only they'll know what the sensible behavior is. And they can also do the check at a time when they would be reading the index anyway. > status may be the first command for going that route; I wonder if we'd > want to add this feature unconditionally or only in the porcelain case. > (In plumbing you're supposed to know what you're doing... so there is > no need as well as our promise to not change it) Yeah. The reason that it would be so painful to load the index for every rev-parse is not just that it probably doesn't otherwise need the index, but that scripts may make a _ton_ of rev-parse (or other plumbing) calls. One alternative would be to make the check cheaper. Could we reliably tell from the submodule.foo.* block in the config that path "foo" is a submodule? I think that would work after "submodule init" but not right after "git clone". So the index really is the source of truth there. I guess there could be an index extension "these are the gitlinks I contain" and in theory we could read just that extension. I dunno. -Peff
Re: [RFC/PATCH] Disallow commands from within unpopulated submodules.
On Fri, Jan 20, 2017 at 11:17 AM, Jeff King wrote: > On Thu, Jan 19, 2017 at 11:30:23AM -0800, Stefan Beller wrote: > >> Now let's ask the same question for "git -C sub status ." (which is a >> command that is only reading and not writing to the repository) >> >> 1) If the submodule is populated, the user clearly intended to know >>more about the submodules status >> 2) It is unclear if the user wanted to learn about the submodules state >>(So ideally: "The submodule 'sub' is not initialized. To init ...") >>or the status check should be applied to the superproject instead. >> >> Avoid the confusion in 2) as well and just error out for now. Later on >> we may want to add another flag to git.c to allow commands to be run >> inside unpopulated submodules and each command reacts appropriately. > > I like the general idea of catching commands in unpopulated submodules, > but I'm somewhat uncomfortable with putting an unconditional check into > git.c, for two reasons: > > 1. Reading the index can be expensive. You would not want "git > rev-parse" to incur this cost. Well, I would want rev-parse to not be run in the wrong repo. (intended to rev-parse something in the submodule, but got results for the superproject). Talking about rev-parse, I was about to propose an extension in reply to "[PATCH] git-prompt.sh: add submodule indicator" that rev-parse could learn a flag similar to --show-toplevel, named: --show-superproject-if-any or --indicate-if-in-submodule-possibly which would help out there. > > 2. How does this interact with commands which do interact with the > index? Don't they expect to find the_index unpopulated? That is another sloppiness in this RFC patch, as I haven't nailed down the corner cases yet. > > (I notice that it's effectively tied to RUN_SETUP, which is good. > But that also means that many commands, like "diff", won't get the > benefit. Not to mention non-builtins). > > I'd rather see it in the commands themselves. Especially given the > "ideal" in your status example, which requires command-specific > knowledge. So you rather want to go bottom up, i.e. add it to each command individually for which it makes sense, instead of rather first having a catch-it-all like this and then we can have a flag similar to RUN_SETUP, e.g. ALLOW_IN_UNPOP_SUBMODULE, which allows commands to take over the responsibility to act responsibly in this case? status may be the first command for going that route; I wonder if we'd want to add this feature unconditionally or only in the porcelain case. (In plumbing you're supposed to know what you're doing... so there is no need as well as our promise to not change it) Thanks, Stefan > > -Peff
Re: [RFC/PATCH] Disallow commands from within unpopulated submodules.
On Thu, Jan 19, 2017 at 11:30:23AM -0800, Stefan Beller wrote: > Now let's ask the same question for "git -C sub status ." (which is a > command that is only reading and not writing to the repository) > > 1) If the submodule is populated, the user clearly intended to know >more about the submodules status > 2) It is unclear if the user wanted to learn about the submodules state >(So ideally: "The submodule 'sub' is not initialized. To init ...") >or the status check should be applied to the superproject instead. > > Avoid the confusion in 2) as well and just error out for now. Later on > we may want to add another flag to git.c to allow commands to be run > inside unpopulated submodules and each command reacts appropriately. I like the general idea of catching commands in unpopulated submodules, but I'm somewhat uncomfortable with putting an unconditional check into git.c, for two reasons: 1. Reading the index can be expensive. You would not want "git rev-parse" to incur this cost. 2. How does this interact with commands which do interact with the index? Don't they expect to find the_index unpopulated? (I notice that it's effectively tied to RUN_SETUP, which is good. But that also means that many commands, like "diff", won't get the benefit. Not to mention non-builtins). I'd rather see it in the commands themselves. Especially given the "ideal" in your status example, which requires command-specific knowledge. -Peff