Re: [RFC/PATCH] submodules: overhaul documentation
On Mon, Jun 19, 2017 at 11:10 AM, Brandon Williams wrote:
> I would probably change the first sentence to:
>
>   A submodule is another Git repository tracked at an arbitrary place
>   inside the working tree.

The tracking doesn't happen at an arbitrary place, but in the
gitlink/.gitmodules file. The location of the submodule is at an
arbitrary place within the working tree.

In a resend, I'll reword it completely (again) to focus more on the
structure.
Re: [RFC/PATCH] submodules: overhaul documentation
On Tue, Jun 20, 2017 at 11:18 AM, Jonathan Tanwrote: >> +DESCRIPTION >> +--- >> + >> +A submodule is another Git repository tracked in a subdirectory of your >> +repository. The tracked repository has its own history, which does not >> +interfere with the history of the current repository. >> + >> +Submodules are composed from a so-called `gitlink` tree entry >> +in the main repository that refers to a particular commit object >> +within the inner repository. >> + >> +Additionally to the gitlink entry the `.gitmodules` file (see >> +linkgit:gitmodules[5]) at the root of the source tree contains >> +information needed for submodules. The only required information >> +is the path setting, which estabishes a logical name for the submodule. > > I know that this was copied over, but this is confusing to me, so it > might be worthwhile to change it. In particular, `gitlink` is, as far as > I know, the type of the child object, correct > not any sort of tree entry. I Well a tree references (a) other trees (b) blobs or (c) gitlinks, so calling a gitlink a tree-entry is correct, but maybe confusing. > would rewrite this as: > > A submodule consists of a tracking subdirectory and an entry in the > `.gitmodules` file (see linkgit:gitmodules[5]). A submodule consists of a tracking subdirectory [in the working directory] and an entry in the `.gitmodules` file (see linkgit:gitmodules[5]). > The tracking subdirectory appears in the main repository as a > `gitlink` object (instead of a `tree` object). The parent of the > tracking subdirectory links to this `gitlink` object through its > hash, just like linking to a `tree` or `blob`. This `gitlink` object > contains the hash of a particular commit object of the submodule. > I think this is confusing, too. :) That is because a subdirectory exists on the FS, whereas the gitlink appears in gits representation of the world (in the tree). 
Maybe:

  The tracking subdirectory appears in the main repository at the point
  where it is tracked via the gitlink in the tree. It is empty when the
  submodule is not populated, otherwise it contains the content of the
  submodule repository. The gitlink object contains the hash of a
  particular commit object of the submodule.

  The .gitmodules file establishes a relationship between the path,
  which is where the gitlink is in the tree, and the logical name, which
  is used for the location of the submodule's git directory. The
  .gitmodules file has the same syntax as the $GIT_DIR/config file. The
  mapping of path to name is done via setting

    submodule.<name>.path = <path>

  The submodule's git directory is found in the main repository's
  '$GIT_DIR/modules/<name>' or inside the tracking subdirectory, but
  the latter is deprecated.

> The entry in the `.gitmodules` file establishes the name of the
> submodule (to be used as a reference by other entries and commands)
> and references the tracking subdirectory as follows:
>
>   submodule.my_submodule_name.path = path/to/my_submodule
>
> There might also be a need to mention that when the submodule is
> populated (or whatever the right term is), the tracking subdirectory in
> the working tree contains both the submodule's working tree and the
> submodule's Git directory.

See the alternative proposal above; the git directory is best kept not
inside the tracking subdirectory, but inside the superproject's git dir
itself.

>
>> +The usual git configuration (see linkgit:git-config[1]) can be used to
>> +override settings given by the `.gitmodules` file.
>
> Not sure if this is relevant here.

This is relevant for overriding e.g. the submodule.NAME.url setting.

>
>> +Submodules can be used for two different use cases:
>
> A creative person might come up with more, so this might be better as:
>
>   Submodules can be used for at least these use cases:

ok.

>> +1. Using another project that stands on its own.
>> + When you want to use a third party library, submodules allow you to >> + have a clean history for your own project as well as for the library. >> + This also allows for updating the third party library as needed. > > Probably better as: > > 1. Using another project while maintaining independent history. > Submodules allow you to contain the working tree of another project > within your own working tree while keeping the history of both > projects separate. Also, since submodules are fixed to a hash, the "fixed to an arbitrary version" instead? > other project can be independently developed without affecting the > parent project, allowing the parent project to fix itself to new > versions only whenever desired. >> +2. Artificially split a (logically single) project into multiple >> + repositories and tying them back together. This can be used to >> + overcome deficiences in the data model of Git, such as: >
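For readers following the path-to-name discussion in this message, a minimal shell sketch may help. It assumes `git` is on PATH; the name `my_submodule_name` and the path `path/to/my_submodule` are the illustrative values from the quoted proposal, and the scratch directory is hypothetical. It only writes and reads a `.gitmodules` file and does not clone or populate anything.

```shell
#!/bin/sh
set -e

# Hypothetical scratch directory, used only for this illustration.
scratch=$(mktemp -d)
cd "$scratch"

# .gitmodules uses the same syntax as $GIT_DIR/config, so plain
# `git config -f <file>` can write and read it.
git config -f .gitmodules submodule.my_submodule_name.path path/to/my_submodule

# Look the path up again by its logical name.
sub_path=$(git config -f .gitmodules --get submodule.my_submodule_name.path)
echo "path: $sub_path"

# List every name/path pair recorded in the file.
pairs=$(git config -f .gitmodules --get-regexp '^submodule\..*\.path$')
echo "$pairs"
```

The logical name is the subsection (`my_submodule_name`), while the path is an ordinary value under it, which is why the two can differ.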
Re: [RFC/PATCH] submodules: overhaul documentation
On Wed, 7 Jun 2017 11:53:54 -0700 Stefan Beller wrote:

[snip]

> +DESCRIPTION
> +---
> +
> +A submodule is another Git repository tracked in a subdirectory of your
> +repository. The tracked repository has its own history, which does not
> +interfere with the history of the current repository.
> +
> +Submodules are composed from a so-called `gitlink` tree entry
> +in the main repository that refers to a particular commit object
> +within the inner repository.
> +
> +Additionally to the gitlink entry the `.gitmodules` file (see
> +linkgit:gitmodules[5]) at the root of the source tree contains
> +information needed for submodules. The only required information
> +is the path setting, which estabishes a logical name for the submodule.

I know that this was copied over, but this is confusing to me, so it
might be worthwhile to change it. In particular, `gitlink` is, as far as
I know, the type of the child object, not any sort of tree entry. I
would rewrite this as:

  A submodule consists of a tracking subdirectory and an entry in the
  `.gitmodules` file (see linkgit:gitmodules[5]).

  The tracking subdirectory appears in the main repository as a
  `gitlink` object (instead of a `tree` object). The parent of the
  tracking subdirectory links to this `gitlink` object through its
  hash, just like linking to a `tree` or `blob`. This `gitlink` object
  contains the hash of a particular commit object of the submodule.

  The entry in the `.gitmodules` file establishes the name of the
  submodule (to be used as a reference by other entries and commands)
  and references the tracking subdirectory as follows:

    submodule.my_submodule_name.path = path/to/my_submodule

There might also be a need to mention that when the submodule is
populated (or whatever the right term is), the tracking subdirectory in
the working tree contains both the submodule's working tree and the
submodule's Git directory.

> +The usual git configuration (see linkgit:git-config[1]) can be used to > +override settings given by the `.gitmodules` file. Not sure if this is relevant here. > +Submodules can be used for two different use cases: A creative person might come up with more, so this might be better as: Submodules can be used for at least these use cases: > +1. Using another project that stands on its own. > + When you want to use a third party library, submodules allow you to > + have a clean history for your own project as well as for the library. > + This also allows for updating the third party library as needed. Probably better as: 1. Using another project while maintaining independent history. Submodules allow you to contain the working tree of another project within your own working tree while keeping the history of both projects separate. Also, since submodules are fixed to a hash, the other project can be independently developed without affecting the parent project, allowing the parent project to fix itself to new versions only whenever desired. > +2. Artificially split a (logically single) project into multiple > + repositories and tying them back together. This can be used to > + overcome deficiences in the data model of Git, such as: This should match the gerund used in point 1: 2. Splitting a (logically single) project into multiple repositories. This can be used to overcome deficiencies in the data model of Git, such as: > +* To have finer grained access control. > + The design principles of Git do not allow for partial repositories to be > + checked out or transferred. A repository is the smallest unit that a user > + can be given access to. Submodules are separate repositories, such that > + you can restrict access to parts of your project via the use of submodules. Not sure about this point - if the project is logically single, you would probably need to see the entire project. 
If this is about different teams independently working on different subcomponents, this seems more like point 1 (inclusion of other projects). > +* In its current form Git scales up poorly for very large repositories that > + change a lot, as the history grows very large. For that you may want to > look > + at shallow clone, sparse checkout, or git-LFS. > + However you can also use submodules to e.g. hold large binary assets > + and these repositories are then shallowly cloned such that you do not > + have a large history locally. > + > +The data model > +-- > + > +A submodule can be considered its own autonomous repository, that has a > +worktree and a git directory at a different place than the superproject. Isn't the worktree inside the superproject's worktree? I would write this as: A submodule is its own repository, having its own working tree and Git directory. > +The superproject only records the commit object name in its tree, such that > +any other information, e.g. where to obtain a copy
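The gitlink question raised in this message can be checked with plumbing commands. The following is a sketch only: it assumes `git` is on PATH, the identity values and the path `sub` are placeholders, and the gitlink is staged by hand with `git update-index --cacheinfo` rather than with `git submodule add`, so no `.gitmodules` entry is created.

```shell
#!/bin/sh
set -e

# Placeholder identity so that commit-tree can run anywhere.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org

inner=$(mktemp -d)
super=$(mktemp -d)
git init -q "$inner"
git init -q "$super"

# Create one commit in the inner repository (an empty tree is enough).
inner_tree=$(git -C "$inner" write-tree)
inner_commit=$(git -C "$inner" commit-tree "$inner_tree" -m "inner commit")

# Stage that commit in the superproject as a gitlink (mode 160000).
git -C "$super" update-index --add --cacheinfo "160000,$inner_commit,sub"

# Write the index out as a tree object and list its single entry.
super_tree=$(git -C "$super" write-tree)
entry=$(git -C "$super" ls-tree "$super_tree")
echo "$entry"
```

The listed entry has mode 160000 and type `commit`, and its hash is the inner commit. That is consistent with both descriptions in the thread: it is an entry in the superproject's tree, and what it points to is a commit object of the submodule.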
Re: [RFC/PATCH] submodules: overhaul documentation
On 06/13, Stefan Beller wrote: > Adding two native speakers as we start word smithing. > > On Tue, Jun 13, 2017 at 12:29 PM, Junio C Hamanowrote: > > >> + > >> +A submodule is another Git repository tracked in a subdirectory of your > >> +repository. The tracked repository has its own history, which does not > >> +interfere with the history of the current repository. > > > > "tracked in a subdirectory" sounds as if your top-level superproject > > has a dedicated submodules/ directory and in it there live a bunch > > of submodules. Which obviously is not what you meant. If phrased > > "tracked as a subdirectory", I think the sentence makes sense. > > Given this explanation "as a" also sounds wrong[1], maybe we need to > separate (1) where it is put/mounted and (2) the fact that is tracked, > i.e. the superproject has an idea of what should be there at a given > revision. (I shortly thought about /s/as a/using/ in the above, but): > > A submodule is another Git repository at an arbitrary place inside > the working tree, and also tracked. The tracked repository has its > own history, which does not interfere with the history of the current > repository. I would probably change the first sentence to: A submodule is another Git repository tracked at an arbitrary place inside the working tree. > > [1] http://www.thesaurus.com/browse/as > > > > > While "which does not interfere" may be technically correct, I am > > not sure what the value of saying that is. > > I think we can drop it here. When writing I wanted to separate it from > subtrees, but this is the wrong place for that. > > > > >> +Submodules are composed from a so-called `gitlink` tree entry > >> +in the main repository that refers to a particular commit object > >> +within the inner repository. > > > > Correct, but it may be unclear to the readers why we do so. Perhaps > > > > ... and this way, the tree of each commit in the main repository > > "knows" which commit from the submodule's history is "tied" to it. 
> >
> > or something like that?
>
> sounds good to me.
>
> >
> >> +Additionally to the gitlink entry the `.gitmodules` file (see
> >> +linkgit:gitmodules[5]) at the root of the source tree contains
> >> +information needed for submodules.
> >
> > Is that really true? Each submodule do not *need* what is in
> > .gitmodules; the top-level superproject needs to learn about
> > its submodules from the contents of that file, though.
>
> Ha! The elided words in my mind were:
>
>   ... information needed for submodules [to work in the superproject].
>
> But maybe we need to reword that as
>
>   Additionally to the gitlink entry the `.gitmodules` file (see
>   linkgit:gitmodules[5]) at the root of the source tree contains
>   information on how to handle submodules.

This sounds slightly awkward. Maybe:

  In addition to the gitlink entry, the `.gitmodules` file (see
  linkgit:gitmodules[5]) at the root of the source tree contains
  information on how to handle submodules.

-- 
Brandon Williams
Re: [RFC/PATCH] submodules: overhaul documentation
Adding two native speakers as we start word smithing. On Tue, Jun 13, 2017 at 12:29 PM, Junio C Hamanowrote: >> + >> +A submodule is another Git repository tracked in a subdirectory of your >> +repository. The tracked repository has its own history, which does not >> +interfere with the history of the current repository. > > "tracked in a subdirectory" sounds as if your top-level superproject > has a dedicated submodules/ directory and in it there live a bunch > of submodules. Which obviously is not what you meant. If phrased > "tracked as a subdirectory", I think the sentence makes sense. Given this explanation "as a" also sounds wrong[1], maybe we need to separate (1) where it is put/mounted and (2) the fact that is tracked, i.e. the superproject has an idea of what should be there at a given revision. (I shortly thought about /s/as a/using/ in the above, but): A submodule is another Git repository at an arbitrary place inside the working tree, and also tracked. The tracked repository has its own history, which does not interfere with the history of the current repository. [1] http://www.thesaurus.com/browse/as > > While "which does not interfere" may be technically correct, I am > not sure what the value of saying that is. I think we can drop it here. When writing I wanted to separate it from subtrees, but this is the wrong place for that. > >> +Submodules are composed from a so-called `gitlink` tree entry >> +in the main repository that refers to a particular commit object >> +within the inner repository. > > Correct, but it may be unclear to the readers why we do so. Perhaps > > ... and this way, the tree of each commit in the main repository > "knows" which commit from the submodule's history is "tied" to it. > > or something like that? sounds good to me. > >> +Additionally to the gitlink entry the `.gitmodules` file (see >> +linkgit:gitmodules[5]) at the root of the source tree contains >> +information needed for submodules. > > Is that really true? 
Each submodule do not *need* what is in
> .gitmodules; the top-level superproject needs to learn about
> its submodules from the contents of that file, though.

Ha! The elided words in my mind were:

  ... information needed for submodules [to work in the superproject].

But maybe we need to reword that as

  Additionally to the gitlink entry the `.gitmodules` file (see
  linkgit:gitmodules[5]) at the root of the source tree contains
  information on how to handle submodules.

I'd like to keep this part short and not go into detail.

>
>> +The only required information
>> +is the path setting, which estabishes a logical name for the submodule.
>
> The phrase "the path setting" feels a bit unfortunate. Is that
> "only" thing we need? Without URL we have no way to populate it,
> no?

    git config -f .gitmodules submodule.foo.path foo
    git config submodule.foo.url example.org/foo
    git submodule update --init

ought to work just fine. It is not the recommended way of working,
but it should work.

I think (in the far future) we actually should only have the path
information in-tree and *any* other information outside the tree,
which includes the URL. See [2], where I state how I'd like to shape
the future:

    $ cat .gitmodules
    [submodule "sub42"]
        path = foo
    # path only in tree!

    $ cat .git/config
    ...
    [submodule]
        active = .
        active = :(exclude)Irrelevant/submodules/for/my/usecase/*
    # note how this is user centric

    $ git show refs/meta/magic/for/refs/heads/master:.gitmodules
    [submodule "sub42"]
        url = https://example.org/foo
        branch = .
    # Note how this is neither centering on the in-tree
    # contents, nor the user. Instead it focuses on the
    # project or group. It is *workflow* centric.
    # Workflows may change over time, e.g. the url could
    # be repointed to k.org or an in-house mirror without tree
    # changes.

Jonathan pointed out the ref name is chosen poorly, but conceptually
I would want to keep the URL setting outside the tree.
The URL may change over time, independently from the history currently checked out (think of bisect, that includes an "submodule update --init" to bisect across a fully populated superproject 'at the time') [2] https://public-inbox.org/git/cagz79kbbtwqicvkrs51fv91r_7zhdtc+fr8z-sqzrpf2cjf...@mail.gmail.com/ > >> +The usual git configuration (see linkgit:git-config[1]) can be used to >> +override settings given by the `.gitmodules` file. >> + >> +Submodules can be used for two different use cases: >> + >> +1. Using another project that stands on its own. >> + When you want to use a third party library, submodules allow you to >> + have a clean history for your own project as well as for the library. >> + This also allows for updating the third party library as needed. >> + >> +2. Artificially split a (logically single) project into multiple >> + repositories and tying them back together. This can be used to >> +
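The idea of overriding a `.gitmodules` setting from local configuration, mentioned in this message, can be sketched in shell. The helper `effective_url` is hypothetical and is not how Git resolves the value internally (`git submodule init` normally copies the URL into the local config); it only illustrates a "local setting wins, in-tree value is the default" lookup. The name `foo` comes from the message above, and both URLs are placeholders.

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# In-tree defaults, as they would be shared with every clone.
git config -f .gitmodules submodule.foo.path foo
git config -f .gitmodules submodule.foo.url https://example.org/foo

# Hypothetical helper: prefer the local config, fall back to .gitmodules.
effective_url() {
    git config --get "submodule.$1.url" ||
        git config -f .gitmodules --get "submodule.$1.url"
}

before=$(effective_url foo)

# Local override, e.g. repointing to an in-house mirror without
# touching anything that is tracked in the tree.
git config submodule.foo.url https://mirror.example.com/foo
after=$(effective_url foo)

echo "before: $before"
echo "after:  $after"
```

Because the override lives in `.git/config`, it does not show up in the superproject's history, which is the property the message argues for when it says the URL may change independently of the checked-out revision.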
Re: [RFC/PATCH] submodules: overhaul documentation
Stefan Bellerwrites: > @@ -149,15 +119,17 @@ deinit [-f|--force] (--all|[--] ...):: > tree. Further calls to `git submodule update`, `git submodule foreach` > and `git submodule sync` will skip any unregistered submodules until > they are initialized again, so use this command if you don't want to > - have a local checkout of the submodule in your working tree anymore. If > - you really want to remove a submodule from the repository and commit > - that use linkgit:git-rm[1] instead. > + have a local checkout of the submodule in your working tree anymore. > + > When the command is run without pathspec, it errors out, > instead of deinit-ing everything, to prevent mistakes. > + > If `--force` is specified, the submodule's working tree will > be removed even if it contains local modifications. > ++ > +If you really want to remove a submodule from the repository and commit > +that use linkgit:git-rm[1] instead. See linkgit:gitsubmodules[7] for removal > +options. Good reorganization. > diff --git a/Documentation/gitsubmodules.txt b/Documentation/gitsubmodules.txt > new file mode 100644 > index 00..2bf3149b68 > --- /dev/null > +++ b/Documentation/gitsubmodules.txt > @@ -0,0 +1,214 @@ > +gitsubmodules(7) > + > + > +NAME > + > +gitsubmodules - mounting one repository inside another > + > +SYNOPSIS > + > +.gitmodules, $GIT_DIR/config > +-- > +git submodule > +git --recurse-submodules > +-- > + > +DESCRIPTION > +--- > + > +A submodule is another Git repository tracked in a subdirectory of your > +repository. The tracked repository has its own history, which does not > +interfere with the history of the current repository. "tracked in a subdirectory" sounds as if your top-level superproject has a dedicated submodules/ directory and in it there live a bunch of submodules. Which obviously is not what you meant. If phrased "tracked as a subdirectory", I think the sentence makes sense. 
While "which does not interfere" may be technically correct, I am not sure what the value of saying that is. > +Submodules are composed from a so-called `gitlink` tree entry > +in the main repository that refers to a particular commit object > +within the inner repository. Correct, but it may be unclear to the readers why we do so. Perhaps ... and this way, the tree of each commit in the main repository "knows" which commit from the submodule's history is "tied" to it. or something like that? > +Additionally to the gitlink entry the `.gitmodules` file (see > +linkgit:gitmodules[5]) at the root of the source tree contains > +information needed for submodules. Is that really true? Each submodule do not *need* what is in .gitmodules; the top-level superproject needs to learn about its submodules from the contents of that file, though. > +The only required information > +is the path setting, which estabishes a logical name for the submodule. The phrase "the path setting" feels a bit unfortunate. Is that "only" thing we need? Without URL we have no way to populate it, no? > +The usual git configuration (see linkgit:git-config[1]) can be used to > +override settings given by the `.gitmodules` file. > + > +Submodules can be used for two different use cases: > + > +1. Using another project that stands on its own. > + When you want to use a third party library, submodules allow you to > + have a clean history for your own project as well as for the library. > + This also allows for updating the third party library as needed. > + > +2. Artificially split a (logically single) project into multiple > + repositories and tying them back together. This can be used to > + overcome deficiences in the data model of Git, such as: s/deficiences in the data model/current limitations/ perhaps? > +* To have finer grained access control. > + The design principles of Git do not allow for partial repositories to be > + checked out or transferred. 
A repository is the smallest unit that a user > + can be given access to. Submodules are separate repositories, such that > + you can restrict access to parts of your project via the use of submodules. Some servers implement per-branch access control that seems to work rather well. Given that "shallow history" is possible (i.e. you could give one commit without exposing older parts of the history), I think the limitation this paragrah refers to is that "a tree is the smallest unit that the user can be given access to." > +* In its current form Git scales up poorly for very large repositories that > + change a lot, as the history grows very large. For that you may want to > look > + at shallow clone, sparse checkout, or git-LFS. > + However you can also use submodules to e.g. hold large binary assets > + and these repositories are then shallowly cloned such that you do not > + have a large history locally. This is why I suggest
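The shallow-clone point in the quoted paragraph can be illustrated with a small shell sketch. It assumes `git` is on PATH; the identity values and directory names are placeholders, and three empty commits stand in for a large history. It demonstrates a plain shallow clone of one repository, not a full submodule setup.

```shell
#!/bin/sh
set -e

# Placeholder identity so that commits can be created anywhere.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org

src=$(mktemp -d)
dst=$(mktemp -d)
git init -q "$src"

# Three empty commits stand in for a long history of large assets.
for n in 1 2 3; do
    git -C "$src" commit -q --allow-empty -m "commit $n"
done

# A file:// URL is used because --depth is ignored for plain local paths.
git clone -q --depth 1 "file://$src" "$dst/shallow"

full=$(git -C "$src" rev-list --count HEAD)
shallow=$(git -C "$dst/shallow" rev-list --count HEAD)
echo "full history: $full commits, shallow clone: $shallow commit(s)"
```

For an actual submodule, `git submodule update --init --depth 1` applies the same truncation when the submodule is cloned.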
[RFC/PATCH] submodules: overhaul documentation
This patch aims to detangle (a) the usage of `git-submodule`
from (b) the concept of submodules and (c) how the actual
implementation looks like, such as where they are configured
and (d) what the best practices are.

To do so, move the conceptual parts of the 'git-submodule'
man page to a new man page gitsubmodules(7). This new page is just
like gitmodules(5), gitattributes(5), gitcredentials(7),
gitnamespaces(7), gittutorial(7), which introduce a concept
rather than explaining a specific command.

The moved part of text has been slightly restructured:

* Rewrite first paragraph ("allows" is wrong. For example you can
  keep untracked repos as well, submodules enable tracking across
  versions)
  (Also remove short example as we have examples later)
* Remove "that is completely separate" from the second sentence as
  that was said in the first sentence.
* Introduce the gitmodules file in the third paragraph, mention
  name as the basic requirement. The URL is optional though strongly
  suggested. Leave it out as gitmodules(5) explains the url.
* The paragraphs about other mechanisms and implementation details
  are moved further down, as they are not as relevant to the concept
  of gitmodules.

Signed-off-by: Stefan Beller
---

This is kind of a resend from
  [RFC-PATCHv2] submodules: add a background story
  https://public-inbox.org/git/20170209020855.23486-1-sbel...@google.com/
but the new man page is completely reworked, so I'd expect it go over
better for the first half at least.
(In the "data model" section it begins to differ from reality, as it mentions a new not-yet-implemented place where to put submodule related config) Thanks, Stefan Documentation/Makefile | 1 + Documentation/git-rm.txt| 4 +- Documentation/git-submodule.txt | 44 ++--- Documentation/gitsubmodules.txt | 214 4 files changed, 227 insertions(+), 36 deletions(-) create mode 100644 Documentation/gitsubmodules.txt diff --git a/Documentation/Makefile b/Documentation/Makefile index b5be2e2d3f..2415e0d657 100644 --- a/Documentation/Makefile +++ b/Documentation/Makefile @@ -31,6 +31,7 @@ MAN7_TXT += giteveryday.txt MAN7_TXT += gitglossary.txt MAN7_TXT += gitnamespaces.txt MAN7_TXT += gitrevisions.txt +MAN7_TXT += gitsubmodules.txt MAN7_TXT += gittutorial-2.txt MAN7_TXT += gittutorial.txt MAN7_TXT += gitworkflows.txt diff --git a/Documentation/git-rm.txt b/Documentation/git-rm.txt index f1efc116eb..db444693dd 100644 --- a/Documentation/git-rm.txt +++ b/Documentation/git-rm.txt @@ -152,8 +152,8 @@ Ignored files are deemed expendable and won't stop a submodule's work tree from being removed. If you only want to remove the local checkout of a submodule from your -work tree without committing the removal, -use linkgit:git-submodule[1] `deinit` instead. +work tree without committing the removal, use linkgit:git-submodule[1] `deinit` +instead. Also see linkgit:gitsubmodules[7] for details on submodule removal. EXAMPLES diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt index 74bc6200d5..032590d828 100644 --- a/Documentation/git-submodule.txt +++ b/Documentation/git-submodule.txt @@ -24,37 +24,7 @@ DESCRIPTION --- Inspects, updates and manages submodules. -A submodule allows you to keep another Git repository in a subdirectory -of your repository. The other repository has its own history, which does not -interfere with the history of the current repository. This can be used to -have external dependencies such as third party libraries for example. 
- -When cloning or pulling a repository containing submodules however, -these will not be checked out by default; the 'init' and 'update' -subcommands will maintain submodules checked out and at -appropriate revision in your working tree. - -Submodules are composed from a so-called `gitlink` tree entry -in the main repository that refers to a particular commit object -within the inner repository that is completely separate. -A record in the `.gitmodules` (see linkgit:gitmodules[5]) file at the -root of the source tree assigns a logical name to the submodule and -describes the default URL the submodule shall be cloned from. -The logical name can be used for overriding this URL within your -local repository configuration (see 'submodule init'). - -Submodules are not to be confused with remotes, which are other -repositories of the same project; submodules are meant for -different projects you would like to make part of your source tree, -while the history of the two projects still stays completely -independent and you cannot modify the contents of the submodule -from within the main project. -If you want to merge the project histories and want to treat the -aggregated whole as a single project from then on, you may want to -add a remote for the other project and use the 'subtree' merge strategy, -instead of treating the other project as a submodule.