Re: [RFC/PATCH] Disallow commands from within unpopulated submodules.

2017-01-21 Thread Duy Nguyen
On Sat, Jan 21, 2017 at 2:17 AM, Jeff King  wrote:
> On Thu, Jan 19, 2017 at 11:30:23AM -0800, Stefan Beller wrote:
>
>> Now let's ask the same question for "git -C sub status ." (which is a
>> command that is only reading and not writing to the repository)
>>
>> 1) If the submodule is populated, the user clearly intended to know
>>more about the submodules status
>> 2) It is unclear if the user wanted to learn about the submodules state
>>(So ideally: "The submodule 'sub' is not initialized. To init ...")
>>or the status check should be applied to the superproject instead.
>>
>> Avoid the confusion in 2) as well and just error out for now. Later on
>> we may want to add another flag to git.c to allow commands to be run
>> inside unpopulated submodules and each command reacts appropriately.
>
> I like the general idea of catching commands in unpopulated submodules,
> but I'm somewhat uncomfortable with putting an unconditional check into
> git.c, for two reasons:
>
>   1. Reading the index can be expensive. You would not want "git
>  rev-parse" to incur this cost.
>
>   2. How does this interact with commands which do interact with the
>  index? Don't they expect to find the_index unpopulated?
>
>  (I notice that it's effectively tied to RUN_SETUP, which is good.
>   But that also means that many commands, like "diff", won't get the
>   benefit. Not to mention non-builtins).
>
> I'd rather see it in the commands themselves. Especially given the
> "ideal" in your status example, which requires command-specific
> knowledge.

I agree. It's already bad enough for pathspec code to peek into the
index, adding a hidden dependency between parse_pathspec() and
read_cache(). And I still think parse_pathspec() is not the right
place to check submodule paths. Worktree should be checked as well, in
the case that the submodule is not yet registered in the index. The
right place to do that is per-command, with their consent so to speak,
because they may need to set things up (index, .git/config and stuff)
properly before explicitly doing this check.
-- 
Duy


Re: [RFC/PATCH] Disallow commands from within unpopulated submodules.

2017-01-20 Thread Stefan Beller
On Fri, Jan 20, 2017 at 1:58 PM, Junio C Hamano  wrote:
> Junio C Hamano  writes:
>
>> Jeff King  writes:
>>
 And in my current understanding of submodules the check in
 .gitmodules ought to be enough, too.
>>>
>>> Yeah, that probably makes sense. You can have a gitlink without a
>>> .gitmodules file, but I don't quite know what that would mean in terms
>>> of submodules (I guess it's not a submodule but "something else").
>>
>> That may be a lot better than reading the index unconditionally, but
>> I'd rather not to see "git rev-parse" read ".gitmodules" at all.  It
>> would discourage scripted use of Git for no good reason.
>
> Thinking about this more, I suspect that
>
> cd sub && git anything
>
> when the index of the top-level thinks "sub" must be a submodule and
> the user is not interested in "sub" (hence it hasn't gone through
> "git submodule init" or "update") should get the same error as you
> would get if you did
>
> cd /var/tmp/ && git anything
>
> when none of /, /var, /var/tmp/ is controlled by any Git repository.
> I.e. "fatal: Not a git repository".

I agree. The idea with a tombstone sounds great from a
performance perspective as you do not need to do extra work
in the superproject at all, because any gitlink is detected early
in the discovery.

The big BUT is however the following:
How do current users know if a submodule is e.g. populated?
(From say a third party script). Most likely they use something like

test -e ${sub}/.git

as that just worked. So if we go with the tombstone idea, we
may break things. (i.e. the fictional third party script confirms any
submodule to be there, but it is not)

I do really like the idea though, so maybe we also need to provide
some submodule plumbing that we opine to be the "correct" way
to see the submodules state[1] to make the transition easier for the
script writers?

[1] c.f. submodule states in
https://github.com/gitster/git/commit/e2b51b9df618ceeff7c4ec044e20f5ce9a87241e

>
> Perhaps we can update two things and make it cheap.
>
>  - checking out the top-level working tree without populating the
>working tree of a submodule learns to do a bit more than just
>creating an empty directory.  Instead, it creates the third kind
>of ".git" (we currently support two kinds of ".git", one that is
>a repository itself, and another that is points at a repository),
>that tells us that there is not (yet) a repository there.
>
>  - the "discovering the root of the working tree" logic learns to
>notice the third kind of ".git" and stop with "Not a git
>repository".


Re: [RFC/PATCH] Disallow commands from within unpopulated submodules.

2017-01-20 Thread Jeff King
On Fri, Jan 20, 2017 at 01:58:02PM -0800, Junio C Hamano wrote:

> Junio C Hamano  writes:
> 
> > Jeff King  writes:
> >
> >>> And in my current understanding of submodules the check in
> >>> .gitmodules ought to be enough, too.
> >>
> >> Yeah, that probably makes sense. You can have a gitlink without a
> >> .gitmodules file, but I don't quite know what that would mean in terms
> >> of submodules (I guess it's not a submodule but "something else").
> >
> > That may be a lot better than reading the index unconditionally, but
> > I'd rather not to see "git rev-parse" read ".gitmodules" at all.  It
> > would discourage scripted use of Git for no good reason.
> 
> Thinking about this more, I suspect that
> 
>   cd sub && git anything
> 
> when the index of the top-level thinks "sub" must be a submodule and
> the user is not interested in "sub" (hence it hasn't gone through
> "git submodule init" or "update") should get the same error as you
> would get if you did
> 
>   cd /var/tmp/ && git anything
> 
> when none of /, /var, /var/tmp/ is controlled by any Git repository.
> I.e. "fatal: Not a git repository".

Not sure if our emails just crossed, but yes, I agree completely.

> Perhaps we can update two things and make it cheap.
> 
>  - checking out the top-level working tree without populating the
>working tree of a submodule learns to do a bit more than just
>creating an empty directory.  Instead, it creates the third kind
>of ".git" (we currently support two kinds of ".git", one that is
>a repository itself, and another that is points at a repository),
>that tells us that there is not (yet) a repository there.
> 
>  - the "discovering the root of the working tree" logic learns to
>notice the third kind of ".git" and stop with "Not a git
>repository".

Yeah, I thought about suggesting something like that earlier. It's
slightly more efficient than the "find the root and then complain" thing
Stefan and I were talking about, but I'd worry it comes with weird
corner cases. E.g., are there tools (or other parts of git) that care
about it literally being an empty directory? E.g., would parts of
"checkout" need to know that it's OK to blow it away?


Re: [RFC/PATCH] Disallow commands from within unpopulated submodules.

2017-01-20 Thread Junio C Hamano
Junio C Hamano  writes:

> Jeff King  writes:
>
>>> And in my current understanding of submodules the check in
>>> .gitmodules ought to be enough, too.
>>
>> Yeah, that probably makes sense. You can have a gitlink without a
>> .gitmodules file, but I don't quite know what that would mean in terms
>> of submodules (I guess it's not a submodule but "something else").
>
> That may be a lot better than reading the index unconditionally, but
> I'd rather not to see "git rev-parse" read ".gitmodules" at all.  It
> would discourage scripted use of Git for no good reason.

Thinking about this more, I suspect that

cd sub && git anything

when the index of the top-level thinks "sub" must be a submodule and
the user is not interested in "sub" (hence it hasn't gone through
"git submodule init" or "update") should get the same error as you
would get if you did

cd /var/tmp/ && git anything

when none of /, /var, /var/tmp/ is controlled by any Git repository.
I.e. "fatal: Not a git repository".

Perhaps we can update two things and make it cheap.

 - checking out the top-level working tree without populating the
   working tree of a submodule learns to do a bit more than just
   creating an empty directory.  Instead, it creates the third kind
   of ".git" (we currently support two kinds of ".git", one that is
   a repository itself, and another that is points at a repository),
   that tells us that there is not (yet) a repository there.

 - the "discovering the root of the working tree" logic learns to
   notice the third kind of ".git" and stop with "Not a git
   repository".




Re: [RFC/PATCH] Disallow commands from within unpopulated submodules.

2017-01-20 Thread Jeff King
On Fri, Jan 20, 2017 at 01:41:54PM -0800, Junio C Hamano wrote:

> Jeff King  writes:
> 
> >> And in my current understanding of submodules the check in
> >> .gitmodules ought to be enough, too.
> >
> > Yeah, that probably makes sense. You can have a gitlink without a
> > .gitmodules file, but I don't quite know what that would mean in terms
> > of submodules (I guess it's not a submodule but "something else").
> 
> That may be a lot better than reading the index unconditionally, but
> I'd rather not to see "git rev-parse" read ".gitmodules" at all.  It
> would discourage scripted use of Git for no good reason.

Why is that? Just because it makes rev-parse seem more bloated?

I think Stefan's putting it into git.c is confusing the issue a bit.
This is fundamentally about figuring out which git repository we're in,
and that procedure is the right place to put the check.

IOW, when we call setup_git_repository() we are already walking up the
tree and looking at .git/HEAD, .git/config, etc to see if we are in a
valid git repository. It doesn't seem unreasonable to me to make this
part of that check. I.e.:

  - if we we walked up from the working tree (so we have a non-NULL
prefix); and

  - if there is a .gitmodules file; and

  - if the .gitmodules file shows that we were inside what _should_ have
been a submodule; then

  - complain and do not accept the outer repository as a valid repo.

That adds only an extra failed open() for people who do not use
submodules, and an extra config-file parse for people who do. And then
only when they are not in the top-level of the working tree (so scripts,
etc that cd_to_toplevel wouldn't pay per-invocation).

-Peff


Re: [RFC/PATCH] Disallow commands from within unpopulated submodules.

2017-01-20 Thread Junio C Hamano
Jeff King  writes:

>> And in my current understanding of submodules the check in
>> .gitmodules ought to be enough, too.
>
> Yeah, that probably makes sense. You can have a gitlink without a
> .gitmodules file, but I don't quite know what that would mean in terms
> of submodules (I guess it's not a submodule but "something else").

That may be a lot better than reading the index unconditionally, but
I'd rather not to see "git rev-parse" read ".gitmodules" at all.  It
would discourage scripted use of Git for no good reason.




Re: [RFC/PATCH] Disallow commands from within unpopulated submodules.

2017-01-20 Thread Stefan Beller
On Fri, Jan 20, 2017 at 12:00 PM, Jeff King  wrote:
> On Fri, Jan 20, 2017 at 11:53:01AM -0800, Stefan Beller wrote:
>
>> > One alternative would be to make the check cheaper. Could we reliably
>> > tell from the submodule.foo.* block in the config that path "foo" is a
>> > submodule? I think that would work after "submodule init" but not right
>> > after "git clone". So the index really is the source of truth there.
>>
>> Well we can check if there is a .gitmodules file that has a
>> submodule.*.path equal to the last part of $CWD, no need to look
>> at the git config.
>>
>> And that would also work right after git clone (in an
>> unpopulated/uninitialized submodule as I call it).
>>
>> And in my current understanding of submodules the check in
>> .gitmodules ought to be enough, too.
>
> Yeah, that probably makes sense. You can have a gitlink without a
> .gitmodules file, but I don't quite know what that would mean in terms
> of submodules (I guess it's not a submodule but "something else").

yeah, I agree it could be git series[1] at work, or as you said
"something else", and we have no idea what to do.

I think this could actually be implemented top-down, because the
check is cheap as we're beginning with lstat(.gitmodules), and no further
pursue checking this corner case in case the .gitmodules is not found.

I'll see if I can make a patch that passes the test suite.

[1] https://github.com/git-series/git-series/blob/master/INTERNALS.md

>
> -Peff


Re: [RFC/PATCH] Disallow commands from within unpopulated submodules.

2017-01-20 Thread Jeff King
On Fri, Jan 20, 2017 at 11:53:01AM -0800, Stefan Beller wrote:

> > One alternative would be to make the check cheaper. Could we reliably
> > tell from the submodule.foo.* block in the config that path "foo" is a
> > submodule? I think that would work after "submodule init" but not right
> > after "git clone". So the index really is the source of truth there.
> 
> Well we can check if there is a .gitmodules file that has a
> submodule.*.path equal to the last part of $CWD, no need to look
> at the git config.
> 
> And that would also work right after git clone (in an
> unpopulated/uninitialized submodule as I call it).
> 
> And in my current understanding of submodules the check in
> .gitmodules ought to be enough, too.

Yeah, that probably makes sense. You can have a gitlink without a
.gitmodules file, but I don't quite know what that would mean in terms
of submodules (I guess it's not a submodule but "something else").

-Peff


Re: [RFC/PATCH] Disallow commands from within unpopulated submodules.

2017-01-20 Thread Stefan Beller
On Fri, Jan 20, 2017 at 11:42 AM, Jeff King  wrote:
> On Fri, Jan 20, 2017 at 11:33:45AM -0800, Stefan Beller wrote:
>
>> > I'd rather see it in the commands themselves. Especially given the
>> > "ideal" in your status example, which requires command-specific
>> > knowledge.
>>
>> So you rather want to go bottom up, i.e. add it to each command individually
>> for which it makes sense, instead of rather first having a catch-it-all like
>> this and then we can have a flag similar to RUN_SETUP, e.g.
>> ALLOW_IN_UNPOP_SUBMODULE, which allows commands to
>> take over the responsibility to act responsibly in this case?
>
> Yes. I know it's "less safe" in the sense that commands have to make an
> effort to detect the situation, but I feel like only they'll know what
> the sensible behavior is. And they can also do the check at a time when
> they would be reading the index anyway.
>
>> status may be the first command for going that route; I wonder if we'd
>> want to add this feature unconditionally or only in the porcelain case.
>> (In plumbing you're supposed to know what you're doing... so there is
>> no need as well as our promise to not change it)
>
> Yeah. The reason that it would be so painful to load the index
> for every rev-parse is not just that it probably doesn't otherwise need
> the index, but that scripts may make a _ton_ of rev-parse (or other
> plumbing) calls.
>
> One alternative would be to make the check cheaper. Could we reliably
> tell from the submodule.foo.* block in the config that path "foo" is a
> submodule? I think that would work after "submodule init" but not right
> after "git clone". So the index really is the source of truth there.

Well we can check if there is a .gitmodules file that has a
submodule.*.path equal to the last part of $CWD, no need to look
at the git config.

And that would also work right after git clone (in an
unpopulated/uninitialized submodule as I call it).

And in my current understanding of submodules the check in
.gitmodules ought to be enough, too.

>
> I guess there could be an index extension "these are the gitlinks I
> contain" and in theory we could read just that extension. I dunno.
>
> -Peff


Re: [RFC/PATCH] Disallow commands from within unpopulated submodules.

2017-01-20 Thread Jeff King
On Fri, Jan 20, 2017 at 11:33:45AM -0800, Stefan Beller wrote:

> > I'd rather see it in the commands themselves. Especially given the
> > "ideal" in your status example, which requires command-specific
> > knowledge.
> 
> So you rather want to go bottom up, i.e. add it to each command individually
> for which it makes sense, instead of rather first having a catch-it-all like
> this and then we can have a flag similar to RUN_SETUP, e.g.
> ALLOW_IN_UNPOP_SUBMODULE, which allows commands to
> take over the responsibility to act responsibly in this case?

Yes. I know it's "less safe" in the sense that commands have to make an
effort to detect the situation, but I feel like only they'll know what
the sensible behavior is. And they can also do the check at a time when
they would be reading the index anyway.

> status may be the first command for going that route; I wonder if we'd
> want to add this feature unconditionally or only in the porcelain case.
> (In plumbing you're supposed to know what you're doing... so there is
> no need as well as our promise to not change it)

Yeah. The reason that it would be so painful to load the index
for every rev-parse is not just that it probably doesn't otherwise need
the index, but that scripts may make a _ton_ of rev-parse (or other
plumbing) calls.

One alternative would be to make the check cheaper. Could we reliably
tell from the submodule.foo.* block in the config that path "foo" is a
submodule? I think that would work after "submodule init" but not right
after "git clone". So the index really is the source of truth there.

I guess there could be an index extension "these are the gitlinks I
contain" and in theory we could read just that extension. I dunno.

-Peff


Re: [RFC/PATCH] Disallow commands from within unpopulated submodules.

2017-01-20 Thread Stefan Beller
On Fri, Jan 20, 2017 at 11:17 AM, Jeff King  wrote:
> On Thu, Jan 19, 2017 at 11:30:23AM -0800, Stefan Beller wrote:
>
>> Now let's ask the same question for "git -C sub status ." (which is a
>> command that is only reading and not writing to the repository)
>>
>> 1) If the submodule is populated, the user clearly intended to know
>>more about the submodules status
>> 2) It is unclear if the user wanted to learn about the submodules state
>>(So ideally: "The submodule 'sub' is not initialized. To init ...")
>>or the status check should be applied to the superproject instead.
>>
>> Avoid the confusion in 2) as well and just error out for now. Later on
>> we may want to add another flag to git.c to allow commands to be run
>> inside unpopulated submodules and each command reacts appropriately.
>
> I like the general idea of catching commands in unpopulated submodules,
> but I'm somewhat uncomfortable with putting an unconditional check into
> git.c, for two reasons:
>
>   1. Reading the index can be expensive. You would not want "git
>  rev-parse" to incur this cost.

Well, I would want rev-parse to not be run in the wrong repo.
(intended to rev-parse something in the submodule, but got results for
the superproject).

Talking about rev-parse, I was about to propose an extension in reply to
"[PATCH] git-prompt.sh: add submodule indicator" that rev-parse could
learn a flag similar to --show-toplevel, named:
--show-superproject-if-any or
--indicate-if-in-submodule-possibly
which would help out there.

>
>   2. How does this interact with commands which do interact with the
>  index? Don't they expect to find the_index unpopulated?

That is another sloppiness in this RFC patch, as I haven't nailed down
the corner cases yet.

>
>  (I notice that it's effectively tied to RUN_SETUP, which is good.
>   But that also means that many commands, like "diff", won't get the
>   benefit. Not to mention non-builtins).
>
> I'd rather see it in the commands themselves. Especially given the
> "ideal" in your status example, which requires command-specific
> knowledge.

So you rather want to go bottom up, i.e. add it to each command individually
for which it makes sense, instead of rather first having a catch-it-all like
this and then we can have a flag similar to RUN_SETUP, e.g.
ALLOW_IN_UNPOP_SUBMODULE, which allows commands to
take over the responsibility to act responsibly in this case?

status may be the first command for going that route; I wonder if we'd
want to add this feature unconditionally or only in the porcelain case.
(In plumbing you're supposed to know what you're doing... so there is
no need as well as our promise to not change it)

Thanks,
Stefan

>
> -Peff


Re: [RFC/PATCH] Disallow commands from within unpopulated submodules.

2017-01-20 Thread Jeff King
On Thu, Jan 19, 2017 at 11:30:23AM -0800, Stefan Beller wrote:

> Now let's ask the same question for "git -C sub status ." (which is a
> command that is only reading and not writing to the repository)
> 
> 1) If the submodule is populated, the user clearly intended to know
>more about the submodules status
> 2) It is unclear if the user wanted to learn about the submodules state
>(So ideally: "The submodule 'sub' is not initialized. To init ...")
>or the status check should be applied to the superproject instead.
> 
> Avoid the confusion in 2) as well and just error out for now. Later on
> we may want to add another flag to git.c to allow commands to be run
> inside unpopulated submodules and each command reacts appropriately.

I like the general idea of catching commands in unpopulated submodules,
but I'm somewhat uncomfortable with putting an unconditional check into
git.c, for two reasons:

  1. Reading the index can be expensive. You would not want "git
 rev-parse" to incur this cost.

  2. How does this interact with commands which do interact with the
 index? Don't they expect to find the_index unpopulated?

 (I notice that it's effectively tied to RUN_SETUP, which is good.
  But that also means that many commands, like "diff", won't get the
  benefit. Not to mention non-builtins).

I'd rather see it in the commands themselves. Especially given the
"ideal" in your status example, which requires command-specific
knowledge.

-Peff