On Tue, 18 Sep 2018 19:12:57 +0200
SZEDER Gábor <szeder....@gmail.com> wrote:

[...]
> On Mon, Sep 17, 2018 at 04:09:40PM +0200, Antonio Ospite wrote:
> > When the .gitmodules file is not available in the working tree, try
> > using the content from the index and from the current branch.
> 
> "from the index and from the current branch" of which repository?
>

I took a look, some comments below.

> > diff --git a/submodule-config.c b/submodule-config.c
> > index 61a555e920..bdb1d0e2c9 100644
> > --- a/submodule-config.c
> > +++ b/submodule-config.c
> 
> > @@ -603,8 +604,21 @@ static void submodule_cache_check_init(struct 
> > repository *repo)
> >  static void config_from_gitmodules(config_fn_t fn, struct repository 
> > *repo, void *data)
> >  {
[...]
> > +           file = repo_worktree_path(repo, GITMODULES_FILE);
> > +           if (file_exists(file))
> > +                   config_source.file = file;
> > +           else if (get_oid(GITMODULES_INDEX, &oid) >= 0)
> > +                   config_source.blob = GITMODULES_INDEX;
> > +           else if (get_oid(GITMODULES_HEAD, &oid) >= 0)
> > +                   config_source.blob = GITMODULES_HEAD;
> > +
> 
> The repository used in t7814 contains nested submodules, which means
> that config_from_gitmodules() is invoked three times.
> 
> Now, the first two of those calls look at the superproject and at
> 'submodule', and find the existing files '.../trash
> directory.t7814-grep-recurse-submodules/.gitmodules' and '.../trash
> directory.t7814-grep-recurse-submodules/submodule/.gitmodules',
> respectively.  So far so good.
> 
> The third call, however, looks at the nested submodule at
> 'submodule/sub', which doesn't contain a '.gitmodules' file.  So this
> function goes on with the second condition and calls
> get_oid(GITMODULES_INDEX, &oid), which then appears to find the blob
> in the _superproject's_ index.
>

You are correct.

This is a limitation of the object store in git, there is no equivalent
of get_oid() to get the oid from a specific repository and this affects
config_with_options too when the config source is a blob.

This does not affect commands called via "git -C submodule_dir cmd"
because in that case the chdir happens before the_repository is set up,
for instance "git-submodule $SOMETHING --recursive" commands seem to
change the working directory before the recursion.

> > +           config_with_options(fn, data, &config_source, &opts);
> > +
> >             free(file);
> >     }
> >  }
> I'm no expert on submodules, but my gut feeling says that this can't
> be right.  But if it _is_ right, then I would say that the commit
> message should explain in detail, why it is right.
> 

I agree it isn't right, I didn't consider the case of nested
submodules, but even if I did the current git design does not
allow to correctly solve the problem: "read config from blob of an
arbitrary repository".

So what to do for the time being?

The issue is there but in a "normal" scenario it is not causing any real
harm because:

  1. currently it is exposed "only" by git grep and nested submodules.

  2. the new mechanism works fine when reading the submodule config for
     the root repository.

  3. the new mechanism does not "usually" impact non-leaf nested
     submodules, because the .gitmodules file is "normally" there.

  4. git grep never *uses* the GITMODULES_INDEX erroneously read
     from the root project when scanning the _leaf_ nested submodule
     because there are no further submodules down the line, the
     following check fails in builtin/grep.c:

       if (recurse_submodules && S_ISGITLINK(entry.mode) ...
       ...

In fact, because of 4. the test suite passes even if the gitmodule
config is not correct for the leaf submodule.

Actually 4. makes me think that the repo_read_gitmodules() call in
builtin/grep.c might not be strictly necessary, its purpose seems to be
to *anticipate* reading the config of *child* submodules while still
processing the *current* submodule, the config for the *current*
submodule was already read from the superproject by the immediately
preceding repo_submodule_init(), via:

  repo_submodule_init()
    submodule_from_path()
      gitmodules_read_check()
        repo_read_gitmodules()

And this would happen anyway also for child submodules down the
recursion path if we removed repo_read_gitmodules() in builtin/grep.c,
the operation would be not protected by grep_read_lock() tho.

The test suite passes even after removing repo_read_gitmodules()
entirely from builtin/grep.c, but I am still not confident that I get
all the implication of why that call was originally added in commit
f9ee2fcdfa (grep: recurse in-process using 'struct repository',
2017-08-02).

Anyways, even if we removed the call we would prevent the problem from
happening in the test suite, but not in the real world, in case non-leaf
submodules without .gitmodules in their working tree.

To recap:
  - patch 9/9 exposes a problem with the object store but for now it's
    only a potential problem in the future case that someone wanted to
    use *nested* submodules without .gitmodules in the working tree.
  - the new code should not affect current users which
    assume .gitmodules to be in the working tree for nested submodules;
  - even if we removed the call to repo_read_gitmodules() call in
    builtin/grep.c we would not avoid the problem entirely, just avoid
    it for the the _leaf_ submodules in case of nested submodules.

Selfishly, I'd propose to still merge the changes (I can send a v6 with
the locking fix in) and I'll write a test_expect_failure snippet to
document the problem Gábor spotted so we remember about it and fix it
when the object store can be accessed per-repository.

I am afraid I cannot look into the core issue about the object store in
my free time, however if someone wanted to sponsor some time I might
consider taking a stab at it.

Ciao,
   Antonio

-- 
Antonio Ospite
https://ao2.it
https://twitter.com/ao2it

A: Because it messes up the order in which people normally read text.
   See http://en.wikipedia.org/wiki/Posting_style
Q: Why is top-posting such a bad thing?

Reply via email to