Re: gitmodules below root directory
On Wed, Sep 6, 2017 at 12:58 PM, Junio C Hamano wrote:
> The current gitlink implementation records only the "what commit
> from the subproject history is to be checked out at this path?" and
> nothing else, by storing a single SHA-1 that happens to be the name
> of the commit object (but the superproject does not even care the
> fact that it is a commit or a random string). We could substitute
> that with the name of a blob object that belongs to the superproject
> history and records the information about the submodule at the path
> (e.g. "which repository the upstream project recommends to clone the
> subproject from?", "what commit object is to be checked out").
>
> When you see a single tree of a superproject, you need to see what
> commit is to be checked out from the tree object and everything else
> needs to be read from the .gitmodules file in that tree in the
> current system, but it does not have to be that way.

IMO, the approach described here (point the gitlink at a blob which describes the full contents, URL, etc.) would make more sense. The trickiest parts, I think, are (a) it really requires tooling to change the git module vs. just editing a file, and (b) we'd need to prevent the blobs from getting garbage collected.

I think it makes each individual submodule a bit more robust, since the actual submodule pointer always points directly to the full data about that submodule (its recommended URL, its path, etc.), and changes to those things *are* changes to the submodule pointer.

Thanks,
Jake
Re: gitmodules below root directory
Robert Dailey writes:

> The gitmodules documentation[1] states that the .gitmodules file is at
> the root. However, it would be nice if this could be supported in any
> directory similar to how .gitignore works.

I have a mild suspicion that there would be a huge impedance mismatch between what gitmodules file is meant to do and the way ignore/attribute setting is done.

When the mechanism is primarily about expressing a few generic traits that are shared by things that can be grouped by paths (e.g. "all paths whose pathnames match '*.py' pattern contain text", "all paths in sub/ directory are ignored"), it may make sense to spread the information across multiple .gitignore files and make the closest one take precedence over the further ones. Even though allowing multiple sources of information spread over the tree leads to end-user confusion (e.g. "why is this path ignored?", which triggered the debugging aid "git check-ignore"), such a grouping by pattern matching on paths (which is what makes "closest file take precedence" meaningful) to assign generic traits (e.g. "it's text") makes it worthwhile by allowing to express the rules more concisely.

Compared to that, what .gitmodules file expresses is more specific to each submodule---no two submodules in your single superproject would share the same URL, unless you are doing something quite unusual, for example. Having a single file also means that updating is much simpler---"git submodule add" and other things do not have to choose among .gitmodules, a/.gitmodules and a/b/.gitmodules when they update an entry for the submodule at path "a/b/c".

Having said that, I do not think the current ".gitmodules must be at the top and nothing else matters" is ideal. A possible change that I suspect may make more sense is to get rid of .gitmodules file, instead of spreading more of them all over the tree.

The current gitlink implementation records only the "what commit from the subproject history is to be checked out at this path?" and nothing else, by storing a single SHA-1 that happens to be the name of the commit object (but the superproject does not even care the fact that it is a commit or a random string). We could substitute that with the name of a blob object that belongs to the superproject history and records the information about the submodule at the path (e.g. "which repository the upstream project recommends to clone the subproject from?", "what commit object is to be checked out").

When you see a single tree of a superproject, you need to see what commit is to be checked out from the tree object and everything else needs to be read from the .gitmodules file in that tree in the current system, but it does not have to be that way.
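
To make the blob-based idea a little more concrete, here is a minimal Python sketch. It assumes the SHA-1 object format, where a blob's name is the SHA-1 of a "blob <size>\0" header followed by the content. The descriptor layout produced by `describe_submodule` is purely hypothetical (the thread does not define one), and the URLs and commit name are made up for illustration.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Name Git gives a blob under SHA-1: hash of 'blob <size>' + NUL + content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def describe_submodule(url: str, commit: str) -> bytes:
    """Hypothetical descriptor blob; this layout is an assumption, not a Git format."""
    return f"url = {url}\ncommit = {commit}\n".encode()


# Made-up values, for illustration only.
commit = "1234567890abcdef1234567890abcdef12345678"
old = blob_id(describe_submodule("https://example.com/sub.git", commit))
new = blob_id(describe_submodule("https://example.org/sub.git", commit))

# Same commit, different URL: the name stored in the tree entry differs.
print(old != new)
```

Under such a scheme, changing only the recommended URL would change the name recorded in the superproject's tree, even though the submodule commit stays the same.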
Re: gitmodules below root directory
On Wed, Sep 6, 2017 at 6:53 AM, Robert Dailey wrote:
> The gitmodules documentation[1] states that the .gitmodules file is at
> the root. However, it would be nice if this could be supported in any
> directory similar to how .gitignore works. Right now git-subrepo does
> not support submodules inside of a subrepo[2] (I suspect subtrees
> would have the same problem, but I did not verify). I think this is a
> limitation of git, rather than subrepo itself. Perhaps there are
> reasons why .gitmodules must be at the root, but I at least wanted to
> point it out and see if this could be supported.
>
> [1]: https://git-scm.com/docs/gitmodules
> [2]: https://github.com/ingydotnet/git-subrepo/issues/262

I agree that subtree likely suffers the same problem. And at first it seems reasonable to want to have .gitmodules at deeper trees supported, as that would fix subtree and subrepo (and others) with ease.

Historically, the need to store submodule URLs was the motivation for having the .gitmodules file. An absolute URL for a submodule would work fine no matter where the .gitmodules file is located. Relative URLs are currently defined as relative to the top level of the project, so we would need to examine whether the root is a well-chosen anchor or whether we would want to allow anchoring the relative URL within the tree. (This is no reason against .gitmodules in deep trees, just pointing out the work required.)

But does the URL still make sense? For absolute URLs this is likely the case; for relative URLs my bets are off. Maybe?

It turned out that people want to e.g. move, delete and re-introduce submodules, which is why the git directory of a submodule may now be located either inside the working tree (to keep supporting existing git repos with submodules) or inside the git directory of the superproject.
In the example given in [2], the git dir of the submodule ("folder B") may be located at .git/modules/nameB as seen from the root of RepoX:

    RepoX
      + folder A
      + folder B (submodule)
      + .gitmodules
      + .git # regular RepoX git dir
        + modules/

An important mechanism of the .gitmodules file is the resolution of the "name" and the "path" of the submodule. (Given the path of a gitlink entry, where do I find the git repository for the submodule? The reverse is slightly less relevant: given this git repository deep inside my own git directory, where is the working tree?)

So in the example we'd have

    RepoY
      + RepoX (subrepo)
        + folder A
        + folder B (submodule)
        + .gitmodules

The path entry in the .gitmodules file would not change via subtree/subrepo merge, such that Git would need to know that the actual path to the submodule is the concatenation of 'path to tree in which the .gitmodules file is' and the given path inside the .gitmodules file. Seems doable so far.

What about the name of a submodule? The .gitmodules file follows the syntax of git config files, such that names cannot occur twice, as the names are stored as the section name:

    [submodule "nameB"]
        path = "folder B"

And I would think the property of having unique names is important, such that each submodule has its unique place to put its git dir inside the superproject's "$GIT_DIR/modules/". With multiple .gitmodules files, we would lose the uniqueness property. (It may not be too bad, maybe even a clever hack, haven't thought about it deeply, but it seems ugly at first.)

As said above, the name<->path resolution is important (and shall be unique, deterministic and simple), so how do we do it? What about the case where we have

    .gitmodules     "name" -> dir/path
    dir/.gitmodules "name" -> ./path

In this case we'd have the same mapping, but using this mechanism we can map multiple names to the same path, and we could choose to resolve a given path in different .gitmodules files, which is cumbersome.

    anotherdir/.gitmodules "name" -> ../dir/path

seems crazy, too.

What about moving submodules? Consider the example as in [2] again:

    $ git mv RepoX/folderB dir/sub
    $ git commit -m "move submodule"
    # ok fine, we can come up with a plan
    # where to put the submodule configuration,
    # maybe in dir/.gitmodules?
    $ git rm -r RepoX
    $ git commit -m "don't need the rest of RepoX"
    # observation: we would not want
    # RepoX/.gitmodules to still have impact on
    # the submodule.
    $ git revert HEAD^ # undo the initial move
    # we'd move the .gitmodules file back to RepoX/.

tl;dr: I think this idea produces lots of interesting corner cases in the data model; let's not go there without having an idea how to solve them.

From an implementation standpoint: the submodule-config API could easily be enhanced to support reading multiple .gitmodules files (in case their location is well defined; we would not want to walk the whole tree recursively). This API is only easily accessible from within C, such that currently implementing this idea in git-submodule.sh would be a hassle.
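
The name and path concerns above can be illustrated with a small Python sketch. It is not how Git's submodule-config API works; it only assumes the concatenation rule discussed in this message (directory holding the .gitmodules file joined with the recorded path). The function names and the input layout (a dict from directory to `{name: path}` entries) are hypothetical.

```python
import posixpath
from collections import defaultdict


def resolve_submodules(gitmodules_files):
    """Map names and effective paths across several .gitmodules files.

    gitmodules_files: {directory holding .gitmodules ("" for the top level):
                       {submodule name: path as written in that file}}
    """
    by_name = defaultdict(set)   # name -> {(directory, effective path)}
    by_path = defaultdict(set)   # effective path -> {(directory, name)}
    for directory, entries in gitmodules_files.items():
        for name, path in entries.items():
            # Effective path = location of the .gitmodules file + recorded path.
            effective = posixpath.normpath(posixpath.join(directory, path))
            by_name[name].add((directory, effective))
            by_path[effective].add((directory, name))
    return by_name, by_path


def conflicts(gitmodules_files):
    """Return (names defined in several files, paths claimed by several entries)."""
    by_name, by_path = resolve_submodules(gitmodules_files)
    duplicate_names = sorted(n for n, v in by_name.items() if len(v) > 1)
    contested_paths = sorted(p for p, v in by_path.items() if len(v) > 1)
    return duplicate_names, contested_paths


# The three mappings discussed above all land on the same path.
files = {
    "": {"name": "dir/path"},
    "dir": {"name": "./path"},
    "anotherdir": {"name": "../dir/path"},
}
duplicate_names, contested_paths = conflicts(files)
print(duplicate_names, contested_paths)
```

With a single top-level file neither list can be non-empty for names, since config section names are unique within one file; with several files both kinds of conflict become possible and some tie-breaking rule would have to be defined.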
gitmodules below root directory
The gitmodules documentation[1] states that the .gitmodules file is at the root. However, it would be nice if this could be supported in any directory similar to how .gitignore works. Right now git-subrepo does not support submodules inside of a subrepo[2] (I suspect subtrees would have the same problem, but I did not verify). I think this is a limitation of git, rather than subrepo itself. Perhaps there are reasons why .gitmodules must be at the root, but I at least wanted to point it out and see if this could be supported.

[1]: https://git-scm.com/docs/gitmodules
[2]: https://github.com/ingydotnet/git-subrepo/issues/262