On 27 Nov 2013, at 11:43, Junio C Hamano <gits...@pobox.com> wrote:

> Nick Townsend <nick.towns...@mac.com> writes:
> 
>> On 26 Nov 2013, at 14:18, Junio C Hamano <gits...@pobox.com> wrote:
>> 
>>> Even if the code is run inside a repository with a working tree,
>>> when producing a tarball out of an ancient commit that had a
>>> submodule not at its current location, --recurse-submodules option
>>> should do the right thing, so asking for working tree location of
>>> that submodule to find its repository is wrong, I think.  It may
>>> happen to find one if the archived revision is close enough to what
>>> is currently checked out, but that may not necessarily be the case.
>>> 
>>> At that point when the code discovers an S_ISGITLINK entry, it
>>> should have both a pathname to the submodule relative to the
>>> toplevel and the commit object name bound to that submodule
>>> location.  What it should do, when it does not find the repository
>>> at the given path (maybe because there is no working tree, or the
>>> submodule directory has moved over time) is roughly:
>>> 
>>> - Read from .gitmodules at the top-level from the tree it is
>>>  creating the tarball out of;
>>> 
>>> - Find "submodule.$name.path" entry that records that path to the
>>>  submodule; and then
>>> 
>>> - Using that $name, find the stashed-away location of the submodule
>>>  repository in $GIT_DIR/modules/$name.
>>> 
>>> or something like that.
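A minimal POSIX shell sketch of the three steps above. It assumes the .gitmodules blob from the archived tree has already been written to a file (for example with `git cat-file blob <tree-ish>:.gitmodules`). The helper names are hypothetical, and the parser only understands the simple `[submodule "NAME"]` / `path = ...` layout used in this thread, not the full git-config syntax (quoted values, escapes, case-insensitive keys).

```shell
# Hypothetical helpers sketching the name lookup; not existing git code.

# trim STRING: print STRING without leading/trailing whitespace.
trim () {
	s=$1
	s=${s#"${s%%[![:space:]]*}"}   # drop leading whitespace
	s=${s%"${s##*[![:space:]]}"}   # drop trailing whitespace
	printf '%s' "$s"
}

# submodule_name_for_path GITMODULES_FILE PATH
# Print the NAME of the [submodule "NAME"] section whose path equals PATH;
# fail if no section records that path.
submodule_name_for_path () {
	current=
	while IFS= read -r line || [ -n "$line" ]; do
		line=$(trim "$line")
		case $line in
		'[submodule "'*'"]')
			# remember which section we are in
			current=${line#'[submodule "'}
			current=${current%'"]'}
			;;
		path=*|path[[:space:]]*=*)
			if [ "$(trim "${line#*=}")" = "$2" ]; then
				printf '%s\n' "$current"
				return 0
			fi
			;;
		esac
	done < "$1"
	return 1
}

# submodule_gitdir GIT_DIR NAME: where the new layout stashes the repository.
submodule_gitdir () {
	printf '%s/modules/%s\n' "$1" "$2"
}
```

With the `[submodule "bar"]` example further down in this thread saved as a file, `submodule_name_for_path` on path `barDir` prints `bar`, and `submodule_gitdir .git bar` prints `.git/modules/bar`.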
>>> 
>>> This is a related tangent, but when used in a repository that people
>>> often use as their remote, the repository discovery may have to
>>> interact with the relative URL.  People often ship .gitmodules with
>>> 
>>>     [submodule "bar"]
>>>             URL = ../bar.git
>>>             path = barDir
>>> 
>>> for a top-level project "foo" that can be cloned thusly:
>>> 
>>>     git clone git://site.xz/foo.git
>>> 
>>> and host bar.git to be clonable with
>>> 
>>>     git clone git://site.xz/bar.git barDir/
>>> 
>>> inside the working tree of the foo project.  In such a case, when
>>> "archive --recurse-submodules" is running, it would find the
>>> repository for the "bar" submodule at "../bar.git", I would think.
>>> 
>>> So this part needs a bit more thought, I am afraid.
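A rough sketch of the relative-URL case. It assumes the location of the superproject repository itself is known: a path such as /srv/git/foo.git (a made-up example) when archiving on the hosting site, or its remote URL. Each leading `../` drops one trailing component from that base before the rest is appended, which loosely mirrors how `git submodule` resolves relative `url` values against the superproject's remote. The function name is hypothetical, and scp-style `host:path` URLs are not handled.

```shell
# resolve_relative_url_sketch BASE URL   (hypothetical helper)
# BASE: where the superproject lives, e.g. /srv/git/foo.git or
#       git://site.xz/foo.git (example values only).
# URL:  the submodule's url from .gitmodules, e.g. ../bar.git.
resolve_relative_url_sketch () {
	case $2 in
	./*|../*) ;;                        # relative: resolve against BASE
	*) printf '%s\n' "$2"; return 0 ;;  # absolute: use as-is
	esac
	base=${1%/}                         # ignore a trailing slash on BASE
	url=$2
	while :; do
		case $url in
		../*) url=${url#../}; base=${base%/*} ;;  # go up one component
		./*)  url=${url#./} ;;                    # "./" changes nothing
		*)    break ;;
		esac
	done
	printf '%s/%s\n' "$base" "$url"
}
```

For example, `resolve_relative_url_sketch git://site.xz/foo.git ../bar.git` prints `git://site.xz/bar.git`, matching the two `git clone` commands above; run against a bare foo.git on the server, the same rule points at its sibling bar.git directory.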
>> 
>> I see that there is a lot of potential complexity around setting up a 
>> submodule:
> 
> No question about it.
> 
>> * The .gitmodules file can be dirty (easy to flag, but should we
>> allow archive to proceed?)
> 
> As we are discussing "archive", which takes a tree object from the
> top-level project that is recorded in the object database, the
> information _about_ the submodule in question should come from the
> given tree being archived.  There is no reason for the .gitmodules
> file that happens to be sitting in the working tree of the top-level
> project to be involved in the decision, so its dirtiness should not
> matter, I think.  If the tree being archived has a submodule whose
> name is "kernel" at path "linux/" (relative to the top-level
> project), its repository should be at .git/modules/kernel in the
> layout recent git-submodule prepares, and we should find that
> path-and-name mapping from .gitmodules recorded in that tree object
> we are archiving. The version that happens to be checked out to the
> working tree may have moved the submodule to a new path "linux-3.0/"
> and "linux-3.0/.git" may have "gitdir: .git/modules/kernel" in it,
> but when archiving a tree that has the submodule at "linux/", it
> would not help---we would not know to look at "linux-3.0/.git" to
> learn that information anyway because .gitmodules in the working
> tree would say that the submodule at path "linux-3.0/" is with name
> "kernel", and would not tell us anything about "linux/".
> 
>> * Users can mess with settings both prior to git submodule init
>> and before git submodule update.
> 
> I think this is irrelevant for exactly the same reason as above.
> 
> What makes this trickier, however, is how to deal with an old-style
> repository, where the submodule repositories are embedded in the
> working tree that happens to be checked out.  In that case, we may
> have to read .gitmodules from two places, i.e.
> 
> (1) We are archiving a tree with a submodule at "linux/";
> 
> (2) We read .gitmodules from that tree and learn that the submodule
>     has name "kernel";
> 
> (3) There is no ".git/modules/kernel" because the repository uses
>     the old layout (if the user never was interested in this
>     submodule, .git/modules/kernel may also be missing, and we
>     should tell these two cases apart by checking .git/config to
>     see if a corresponding entry for the "kernel" submodule exists
>     there);
> 
> (4) In a repository that uses the old layout, there must be the
>     repository somewhere embedded in the current working tree (this
>     inability to remove the submodule directory without also losing
>     its repository is why we use the new layout these days).
>     We can learn where it is by looking at .gitmodules in the
>     working tree---take the name "kernel" we learned earlier and
>     map it to the current path ("linux-3.0/" if you have been
>     following this example so far).
> 
> And in that fallback context, I would say that reading from a dirty
> (or "messed with by the user") .gitmodules is the right thing to
> do.  Perhaps the user may be in the process of moving the submodule
> in his working tree with
> 
>    $ mv linux-3.0 linux-3.2
>    $ git config -f .gitmodules submodule.kernel.path linux-3.2
> 
> but hasn't committed the change yet.
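The fallback in steps (1) to (4) can be sketched as one decision function. This is a hypothetical helper rather than existing git code. It assumes it runs from the top of the working tree, that NAME already came from the archived tree's .gitmodules, and that CURRENT_PATH is whatever the working-tree .gitmodules (dirty or not) currently maps NAME to, or empty if it has no entry. The `$GIT_DIR/config` check is a plain text match on the section header, a simplification of asking `git config`.

```shell
# locate_submodule_repo GIT_DIR NAME CURRENT_PATH   (hypothetical helper)
# Prints where the submodule repository lives, or fails if there is none.
locate_submodule_repo () {
	git_dir=$1 name=$2 current_path=$3

	# Step (3), new layout: the repository is stashed in $GIT_DIR/modules/NAME.
	if [ -d "$git_dir/modules/$name" ]; then
		printf '%s\n' "$git_dir/modules/$name"
		return 0
	fi

	# No [submodule "NAME"] section in $GIT_DIR/config: the user never
	# initialized this submodule, so there is no repository to find.
	if ! grep -qF "[submodule \"$name\"]" "$git_dir/config" 2>/dev/null; then
		echo "submodule '$name' is not initialized" >&2
		return 1
	fi

	# Step (4), old layout: the repository is embedded in the working tree
	# at the path the working-tree .gitmodules currently maps NAME to.
	if [ -n "$current_path" ] && [ -d "$current_path/.git" ]; then
		printf '%s\n' "$current_path/.git"
		return 0
	fi

	echo "cannot find a repository for submodule '$name'" >&2
	return 1
}
```

Keeping the "not initialized" case separate from "initialized but missing" reflects the distinction drawn in step (3): the first is not an error for archive, while the second probably is.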
> 
>> For those reasons I deliberately decided not to reproduce the
>> above logic all by myself.
> 
> As I already hinted, I agree that the "how to find the location of
> submodule repository, given a particular tree in the top-level
> project the submodule belongs to and the path to the submodule in
> question" deserves a separate thread to discuss with area experts.

As per my email to Heiko on this thread, I’m happy to start such
a discussion - I’ll use your notes as a starting point. I’m much more
comfortable using a wiki for this - is this common, or should I start a
new mail thread with RFC in the title or similar?

I did complete my work on my version of git-archive (for internal use) and
added some regression tests for current behaviour. Also, the
add_submodule_odb patch should IMHO be incorporated anyway. I’ll resubmit
those two for consideration in a new thread.

Kind Regards
Nick Townsend
