Re: [RFC/PATCH] submodules: overhaul documentation

2017-06-20 Thread Stefan Beller
On Mon, Jun 19, 2017 at 11:10 AM, Brandon Williams  wrote:

> I would probably change the first sentence to:
>
>   A submodule is another Git repository tracked at an arbitrary place
>   inside the working tree.

The tracking doesn't happen at an arbitrary place, but in
the gitlink/.gitmodules file. The location of the submodule is
at an arbitrary place within the working tree.

In a resend, I'll reword it completely (again) to focus more on the structure.


Re: [RFC/PATCH] submodules: overhaul documentation

2017-06-20 Thread Stefan Beller
On Tue, Jun 20, 2017 at 11:18 AM, Jonathan Tan  wrote:
>> +DESCRIPTION
>> +---
>> +
>> +A submodule is another Git repository tracked in a subdirectory of your
>> +repository. The tracked repository has its own history, which does not
>> +interfere with the history of the current repository.
>> +
>> +Submodules are composed from a so-called `gitlink` tree entry
>> +in the main repository that refers to a particular commit object
>> +within the inner repository.
>> +
>> +Additionally to the gitlink entry the `.gitmodules` file (see
>> +linkgit:gitmodules[5]) at the root of the source tree contains
>> +information needed for submodules. The only required information
>> +is the path setting, which estabishes a logical name for the submodule.
>
> I know that this was copied over, but this is confusing to me, so it
> might be worthwhile to change it. In particular, `gitlink` is, as far as
> I know, the type of the child object,

correct

> not any sort of tree entry. I

Well, a tree references (a) other trees, (b) blobs, or (c) gitlinks, so calling
a gitlink a tree entry is correct, but maybe confusing.

> would rewrite this as:
>
> A submodule consists of a tracking subdirectory and an entry in the
> `.gitmodules` file (see linkgit:gitmodules[5]).

A submodule consists of a tracking subdirectory [in the working directory]
and an entry in the `.gitmodules` file (see linkgit:gitmodules[5]).

> The tracking subdirectory appears in the main repository as a
> `gitlink` object (instead of a `tree` object). The parent of the
> tracking subdirectory links to this `gitlink` object through its
> hash, just like linking to a `tree` or `blob`. This `gitlink` object
> contains the hash of a particular commit object of the submodule.
>

I think this is confusing, too. :)
That is because a subdirectory exists on the FS, whereas the gitlink
appears in Git's representation of the world (in the tree). Maybe:

The tracking subdirectory appears in the main repository at
the point where it is tracked via the gitlink in the tree.
It is empty when the submodule is not populated, otherwise
it contains the content of the submodule repository.

The gitlink object contains the hash of a particular commit
object of the submodule.

The .gitmodules file establishes a relationship between the
path, which is where the gitlink is in the tree, and the logical
name, which is used for the location of the submodule's Git
directory. The .gitmodules file has the same syntax as the
$GIT_DIR/config file. The mapping of path to name
is done via the setting submodule.<name>.path = <path>.

The submodule's Git directory is found in the main repository's
'$GIT_DIR/modules/<name>' or inside the tracking subdirectory,
but the latter is deprecated.
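
A minimal sketch of the layout described above, not part of the patch.
It assumes a throwaway superproject, a hypothetical submodule named
"foo" at the path lib/foo, and a stand-in object id where a real
gitlink would name a commit of the submodule. It only shows that the
gitlink is an index/tree entry with mode 160000 and that .gitmodules
holds the path-to-name mapping:

```shell
#!/bin/sh
set -e

# Throwaway superproject; every name below is made up for illustration.
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Stand-in object id of the right length for this repository. A real
# gitlink would name a commit in the submodule's history instead.
stand_in_id=$(printf 'placeholder' | git hash-object --stdin)

# Record a gitlink (mode 160000) at the path lib/foo in the index.
git update-index --add --cacheinfo 160000,"$stand_in_id",lib/foo

# Map that path to the logical name "foo" in .gitmodules.
git config -f .gitmodules submodule.foo.path lib/foo

# The index entry carries mode 160000; .gitmodules carries the mapping.
entry=$(git ls-files --stage -- lib/foo)
mapped_path=$(git config -f .gitmodules --get submodule.foo.path)
printf '%s\n%s\n' "$entry" "$mapped_path"
```

Nothing is cloned here, so the tracking subdirectory stays unpopulated;
the point is only where each piece of information is recorded.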

> The entry in the `.gitmodules` file establishes the name of the
> submodule (to be used as a reference by other entries and commands)
> and references the tracking subdirectory as follows:
>
> submodule.my_submodule_name.path = path/to/my_submodule
>
> There might also be a need to mention that when the submodule is
> populated (or whatever the right term is), the tracking subdirectory in
> the working tree contains both the submodule's working tree and the
> submodule's Git directory.

See the alternative proposal above; the Git directory is best kept not
inside the tracking subdirectory but inside the superproject's
Git directory itself.

>
>> +The usual git configuration (see linkgit:git-config[1]) can be used to
>> +override settings given by the `.gitmodules` file.
>
> Not sure if this is relevant here.

This is relevant for overriding e.g. the submodule.NAME.url setting.
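
As a rough illustration of that override (a sketch only; the submodule
name "foo" and both URLs are hypothetical): the in-tree .gitmodules
carries one URL, and the local repository configuration carries another
without touching the tree.

```shell
#!/bin/sh
set -e

# Throwaway repository; names and URLs are made up for illustration.
repo=$(mktemp -d)
cd "$repo"
git init -q .

# In-tree defaults, as they would be committed in .gitmodules.
git config -f .gitmodules submodule.foo.path foo
git config -f .gitmodules submodule.foo.url https://example.org/foo

# Local override, stored in $GIT_DIR/config only.
git config submodule.foo.url https://mirror.example.com/foo

in_tree_url=$(git config -f .gitmodules --get submodule.foo.url)
local_url=$(git config --get submodule.foo.url)
printf 'in-tree: %s\nlocal:   %s\n' "$in_tree_url" "$local_url"
```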

>
>> +Submodules can be used for two different use cases:
>
> A creative person might come up with more, so this might be better as:
>
> Submodules can be used for at least these use cases:

ok.

>> +1. Using another project that stands on its own.
>> +  When you want to use a third party library, submodules allow you to
>> +  have a clean history for your own project as well as for the library.
>> +  This also allows for updating the third party library as needed.
>
> Probably better as:
>
> 1. Using another project while maintaining independent history.
> Submodules allow you to contain the working tree of another project
> within your own working tree while keeping the history of both
> projects separate. Also, since submodules are fixed to a hash, the

"fixed to an arbitrary version" instead?

> other project can be independently developed without affecting the
> parent project, allowing the parent project to fix itself to new
> versions only whenever desired.

>> +2. Artificially split a (logically single) project into multiple
>> +   repositories and tying them back together. This can be used to
>> +   overcome deficiences in the data model of Git, such as:
>

Re: [RFC/PATCH] submodules: overhaul documentation

2017-06-20 Thread Jonathan Tan
On Wed,  7 Jun 2017 11:53:54 -0700
Stefan Beller  wrote:

[snip]

> +DESCRIPTION
> +---
> +
> +A submodule is another Git repository tracked in a subdirectory of your
> +repository. The tracked repository has its own history, which does not
> +interfere with the history of the current repository.
> +
> +Submodules are composed from a so-called `gitlink` tree entry
> +in the main repository that refers to a particular commit object
> +within the inner repository.
> +
> +Additionally to the gitlink entry the `.gitmodules` file (see
> +linkgit:gitmodules[5]) at the root of the source tree contains
> +information needed for submodules. The only required information
> +is the path setting, which estabishes a logical name for the submodule.

I know that this was copied over, but this is confusing to me, so it
might be worthwhile to change it. In particular, `gitlink` is, as far as
I know, the type of the child object, not any sort of tree entry. I
would rewrite this as:

A submodule consists of a tracking subdirectory and an entry in the
`.gitmodules` file (see linkgit:gitmodules[5]).

The tracking subdirectory appears in the main repository as a
`gitlink` object (instead of a `tree` object). The parent of the
tracking subdirectory links to this `gitlink` object through its
hash, just like linking to a `tree` or `blob`. This `gitlink` object
contains the hash of a particular commit object of the submodule.

The entry in the `.gitmodules` file establishes the name of the
submodule (to be used as a reference by other entries and commands)
and references the tracking subdirectory as follows:

submodule.my_submodule_name.path = path/to/my_submodule

There might also be a need to mention that when the submodule is
populated (or whatever the right term is), the tracking subdirectory in
the working tree contains both the submodule's working tree and the
submodule's Git directory.

> +The usual git configuration (see linkgit:git-config[1]) can be used to
> +override settings given by the `.gitmodules` file.

Not sure if this is relevant here.

> +Submodules can be used for two different use cases:

A creative person might come up with more, so this might be better as:

Submodules can be used for at least these use cases:

> +1. Using another project that stands on its own.
> +  When you want to use a third party library, submodules allow you to
> +  have a clean history for your own project as well as for the library.
> +  This also allows for updating the third party library as needed.

Probably better as:

1. Using another project while maintaining independent history.
Submodules allow you to contain the working tree of another project
within your own working tree while keeping the history of both
projects separate. Also, since submodules are fixed to a hash, the
other project can be independently developed without affecting the
parent project, allowing the parent project to fix itself to new
versions only whenever desired.

> +2. Artificially split a (logically single) project into multiple
> +   repositories and tying them back together. This can be used to
> +   overcome deficiences in the data model of Git, such as:

This should match the gerund used in point 1:

2. Splitting a (logically single) project into multiple
repositories. This can be used to overcome deficiencies in the data
model of Git, such as:

> +* To have finer grained access control.
> +  The design principles of Git do not allow for partial repositories to be
> +  checked out or transferred. A repository is the smallest unit that a user
> +  can be given access to. Submodules are separate repositories, such that
> +  you can restrict access to parts of your project via the use of submodules.

Not sure about this point - if the project is logically single, you
would probably need to see the entire project. If this is about
different teams independently working on different subcomponents, this
seems more like point 1 (inclusion of other projects).

> +* In its current form Git scales up poorly for very large repositories that
> +  change a lot, as the history grows very large. For that you may want to look
> +  at shallow clone, sparse checkout, or git-LFS.
> +  However you can also use submodules to e.g. hold large binary assets
> +  and these repositories are then shallowly cloned such that you do not
> +  have a large history locally.
> +
> +The data model
> +--
> +
> +A submodule can be considered its own autonomous repository, that has a
> +worktree and a git directory at a different place than the superproject.

Isn't the worktree inside the superproject's worktree? I would write
this as:

A submodule is its own repository, having its own working tree and
Git directory.

> +The superproject only records the commit object name in its tree, such that
> +any other information, e.g. where to obtain a copy 

Re: [RFC/PATCH] submodules: overhaul documentation

2017-06-19 Thread Brandon Williams
On 06/13, Stefan Beller wrote:
> Adding two native speakers as we start word smithing.
> 
> On Tue, Jun 13, 2017 at 12:29 PM, Junio C Hamano  wrote:
> 
> >> +
> >> +A submodule is another Git repository tracked in a subdirectory of your
> >> +repository. The tracked repository has its own history, which does not
> >> +interfere with the history of the current repository.
> >
> > "tracked in a subdirectory" sounds as if your top-level superproject
> > has a dedicated submodules/ directory and in it there live a bunch
> > of submodules.  Which obviously is not what you meant.  If phrased
> > "tracked as a subdirectory", I think the sentence makes sense.
> 
> Given this explanation "as a" also sounds wrong[1], maybe we need to
> separate (1) where it is put/mounted and (2) the fact that it is tracked,
> i.e. the superproject has an idea of what should be there at a given
> revision. (I briefly thought about /s/as a/using/ in the above, but):
> 
>   A submodule is another Git repository at an arbitrary place inside
>   the working tree, and also tracked. The tracked repository has its
>   own history, which does not interfere with the history of the current
>   repository.

I would probably change the first sentence to:

  A submodule is another Git repository tracked at an arbitrary place
  inside the working tree.

> 
> [1] http://www.thesaurus.com/browse/as
> 
> >
> > While "which does not interfere" may be technically correct, I am
> > not sure what the value of saying that is.
> 
> I think we can drop it here. When writing I wanted to separate it from
> subtrees, but this is the wrong place for that.
> 
> >
> >> +Submodules are composed from a so-called `gitlink` tree entry
> >> +in the main repository that refers to a particular commit object
> >> +within the inner repository.
> >
> > Correct, but it may be unclear to the readers why we do so.  Perhaps
> >
> > ... and this way, the tree of each commit in the main repository
> > "knows" which commit from the submodule's history is "tied" to it.
> >
> > or something like that?
> 
> sounds good to me.
> 
> >
> >> +Additionally to the gitlink entry the `.gitmodules` file (see
> >> +linkgit:gitmodules[5]) at the root of the source tree contains
> >> +information needed for submodules.
> >
> > > Is that really true?  Each submodule does not *need* what is in
> > > .gitmodules; the top-level superproject needs to learn about
> > > its submodules from the contents of that file, though.
> > 
> > Ha! The elided words in my mind were:
> 
>  ... information needed for submodules [to work in the superproject].
> 
> But maybe we need to reword that as
> 
>   Additionally to the gitlink entry the `.gitmodules` file (see
>   linkgit:gitmodules[5]) at the root of the source tree contains
>   information on how to handle submodules.

This sounds slightly awkward.  Maybe:

In addition to the gitlink entry, the `.gitmodules` file (see
linkgit:gitmodules[5]) at the root of the source tree contains
information on how to handle submodules.


-- 
Brandon Williams


Re: [RFC/PATCH] submodules: overhaul documentation

2017-06-13 Thread Stefan Beller
Adding two native speakers as we start word smithing.

On Tue, Jun 13, 2017 at 12:29 PM, Junio C Hamano  wrote:

>> +
>> +A submodule is another Git repository tracked in a subdirectory of your
>> +repository. The tracked repository has its own history, which does not
>> +interfere with the history of the current repository.
>
> "tracked in a subdirectory" sounds as if your top-level superproject
> has a dedicated submodules/ directory and in it there live a bunch
> of submodules.  Which obviously is not what you meant.  If phrased
> "tracked as a subdirectory", I think the sentence makes sense.

Given this explanation "as a" also sounds wrong[1], maybe we need to
separate (1) where it is put/mounted and (2) the fact that it is tracked,
i.e. the superproject has an idea of what should be there at a given
revision. (I briefly thought about /s/as a/using/ in the above, but):

  A submodule is another Git repository at an arbitrary place inside
  the working tree, and also tracked. The tracked repository has its
  own history, which does not interfere with the history of the current
  repository.

[1] http://www.thesaurus.com/browse/as

>
> While "which does not interfere" may be technically correct, I am
> not sure what the value of saying that is.

I think we can drop it here. When writing I wanted to separate it from
subtrees, but this is the wrong place for that.

>
>> +Submodules are composed from a so-called `gitlink` tree entry
>> +in the main repository that refers to a particular commit object
>> +within the inner repository.
>
> Correct, but it may be unclear to the readers why we do so.  Perhaps
>
> ... and this way, the tree of each commit in the main repository
> "knows" which commit from the submodule's history is "tied" to it.
>
> or something like that?

sounds good to me.

>
>> +Additionally to the gitlink entry the `.gitmodules` file (see
>> +linkgit:gitmodules[5]) at the root of the source tree contains
>> +information needed for submodules.
>
> Is that really true?  Each submodule does not *need* what is in
> .gitmodules; the top-level superproject needs to learn about
> its submodules from the contents of that file, though.

Ha! The elided words in my mind were:

 ... information needed for submodules [to work in the superproject].

But maybe we need to reword that as

  Additionally to the gitlink entry the `.gitmodules` file (see
  linkgit:gitmodules[5]) at the root of the source tree contains
  information on how to handle submodules.

I'd like to keep this part short and not go into detail.

>
>> +The only required information
>> +is the path setting, which estabishes a logical name for the submodule.
>
> The phrase "the path setting" feels a bit unfortunate.  Is that the
> "only" thing we need?  Without a URL we have no way to populate it,
> no?

git config -f .gitmodules submodule.foo.path foo
git config submodule.foo.url example.org/foo
git submodule update --init

ought to work just fine. It is not the recommended way of working,
but it should work.

I think (in the far future) we actually should only have the path information
in-tree and *any* other information outside the tree, which includes the URL.

See [2], where I state how I'd like to shape the future:

  $ cat .gitmodules
  [submodule "sub42"]
      path = foo
  # path only in tree!

  $ cat .git/config
  ...
  [submodule]
      active = .
      active = :(exclude)Irrelevant/submodules/for/my/usecase/*
  # note how this is user centric

  $ git show refs/meta/magic/for/refs/heads/master:.gitmodules
  [submodule "sub42"]
      url = https://example.org/foo
      branch = .
  # Note how this is neither centering on the in-tree
  # contents, nor the user. Instead it focuses on the
  # project or group. It is *workflow* centric.
  # Workflows may change over time, e.g. the url could
  # be repointed to k.org or an in-house mirror without tree
  # changes.

Jonathan pointed out the ref name is chosen poorly, but conceptually
I would want to keep the URL setting outside the tree. The URL may
change over time, independently from the history currently checked out
(think of bisect, which includes a "submodule update --init" to bisect across
a fully populated superproject 'at the time').
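
For the user-centric part of the listing above, a sketch of how the two
`active` values could be written into the local configuration. This
assumes a throwaway repository and reuses the pathspecs from the
listing verbatim; it only shows that the setting is multi-valued and
lives in $GIT_DIR/config, not how Git interprets it.

```shell
#!/bin/sh
set -e

# Throwaway repository, used only to hold the local configuration.
repo=$(mktemp -d)
cd "$repo"
git init -q .

# First value: a pathspec matching everything.
git config submodule.active .
# Second value: appended with --add so the first one is kept.
git config --add submodule.active ':(exclude)Irrelevant/submodules/for/my/usecase/*'

# Read back every value of the multi-valued setting, one per line.
active_values=$(git config --get-all submodule.active)
count=$(printf '%s\n' "$active_values" | wc -l)
printf '%s\n' "$active_values"
```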

[2] https://public-inbox.org/git/cagz79kbbtwqicvkrs51fv91r_7zhdtc+fr8z-sqzrpf2cjf...@mail.gmail.com/




>
>> +The usual git configuration (see linkgit:git-config[1]) can be used to
>> +override settings given by the `.gitmodules` file.
>> +
>> +Submodules can be used for two different use cases:
>> +
>> +1. Using another project that stands on its own.
>> +  When you want to use a third party library, submodules allow you to
>> +  have a clean history for your own project as well as for the library.
>> +  This also allows for updating the third party library as needed.
>> +
>> +2. Artificially split a (logically single) project into multiple
>> +   repositories and tying them back together. This can be used to
>> +   

Re: [RFC/PATCH] submodules: overhaul documentation

2017-06-13 Thread Junio C Hamano
Stefan Beller  writes:

> @@ -149,15 +119,17 @@ deinit [-f|--force] (--all|[--] <path>...)::
>   tree. Further calls to `git submodule update`, `git submodule foreach`
>   and `git submodule sync` will skip any unregistered submodules until
>   they are initialized again, so use this command if you don't want to
> - have a local checkout of the submodule in your working tree anymore. If
> - you really want to remove a submodule from the repository and commit
> - that use linkgit:git-rm[1] instead.
> + have a local checkout of the submodule in your working tree anymore.
>  +
>  When the command is run without pathspec, it errors out,
>  instead of deinit-ing everything, to prevent mistakes.
>  +
>  If `--force` is specified, the submodule's working tree will
>  be removed even if it contains local modifications.
> ++
> +If you really want to remove a submodule from the repository and commit
> +that use linkgit:git-rm[1] instead. See linkgit:gitsubmodules[7] for removal
> +options.

Good reorganization.

> diff --git a/Documentation/gitsubmodules.txt b/Documentation/gitsubmodules.txt
> new file mode 100644
> index 00..2bf3149b68
> --- /dev/null
> +++ b/Documentation/gitsubmodules.txt
> @@ -0,0 +1,214 @@
> +gitsubmodules(7)
> +
> +
> +NAME
> +
> +gitsubmodules - mounting one repository inside another
> +
> +SYNOPSIS
> +
> +.gitmodules, $GIT_DIR/config
> +--
> +git submodule
> +git <command> --recurse-submodules
> +--
> +
> +DESCRIPTION
> +---
> +
> +A submodule is another Git repository tracked in a subdirectory of your
> +repository. The tracked repository has its own history, which does not
> +interfere with the history of the current repository.

"tracked in a subdirectory" sounds as if your top-level superproject
has a dedicated submodules/ directory and in it there live a bunch
of submodules.  Which obviously is not what you meant.  If phrased
"tracked as a subdirectory", I think the sentence makes sense.

While "which does not interfere" may be technically correct, I am
not sure what the value of saying that is.

> +Submodules are composed from a so-called `gitlink` tree entry
> +in the main repository that refers to a particular commit object
> +within the inner repository.

Correct, but it may be unclear to the readers why we do so.  Perhaps

... and this way, the tree of each commit in the main repository
"knows" which commit from the submodule's history is "tied" to it.

or something like that?

> +Additionally to the gitlink entry the `.gitmodules` file (see
> +linkgit:gitmodules[5]) at the root of the source tree contains
> +information needed for submodules.

Is that really true?  Each submodule does not *need* what is in
.gitmodules; the top-level superproject needs to learn about
its submodules from the contents of that file, though.

> +The only required information
> +is the path setting, which estabishes a logical name for the submodule.

The phrase "the path setting" feels a bit unfortunate.  Is that the
"only" thing we need?  Without a URL we have no way to populate it,
no?

> +The usual git configuration (see linkgit:git-config[1]) can be used to
> +override settings given by the `.gitmodules` file.
> +
> +Submodules can be used for two different use cases:
> +
> +1. Using another project that stands on its own.
> +  When you want to use a third party library, submodules allow you to
> +  have a clean history for your own project as well as for the library.
> +  This also allows for updating the third party library as needed.
> +
> +2. Artificially split a (logically single) project into multiple
> +   repositories and tying them back together. This can be used to
> +   overcome deficiences in the data model of Git, such as:

s/deficiences in the data model/current limitations/ perhaps?

> +* To have finer grained access control.
> +  The design principles of Git do not allow for partial repositories to be
> +  checked out or transferred. A repository is the smallest unit that a user
> +  can be given access to. Submodules are separate repositories, such that
> +  you can restrict access to parts of your project via the use of submodules.

Some servers implement per-branch access control that seems to work
rather well.  Given that "shallow history" is possible (i.e. you
could give one commit without exposing older parts of the history),
I think the limitation this paragraph refers to is that "a tree is
the smallest unit that the user can be given access to."

> +* In its current form Git scales up poorly for very large repositories that
> +  change a lot, as the history grows very large. For that you may want to look
> +  at shallow clone, sparse checkout, or git-LFS.
> +  However you can also use submodules to e.g. hold large binary assets
> +  and these repositories are then shallowly cloned such that you do not
> +  have a large history locally.

This is why I suggest 

[RFC/PATCH] submodules: overhaul documentation

2017-06-07 Thread Stefan Beller
This patch aims to detangle (a) the usage of `git-submodule`
from (b) the concept of submodules, (c) what the actual
implementation looks like, such as where they are configured,
and (d) what the best practices are.

To do so, move the conceptual parts of the 'git-submodule'
man page to a new man page gitsubmodules(7). This new page
is just like gitmodules(5), gitattributes(5), gitcredentials(7),
gitnamespaces(7), gittutorial(7), which introduce a concept
rather than explaining a specific command.

The moved part of text has been slightly restructured:
* Rewrite first paragraph ("allows" is wrong. For example you can keep
  untracked repos as well, submodules enable tracking across versions)
  (Also remove short example as we have examples later)

* Remove "that is completely separate" from the second sentence as
  that was said in the first sentence.

* Introduce the gitmodules file in the third paragraph, mention name
  as the basic requirement. The URL is optional though strongly
  suggested. Leave it out as gitmodules(5) explains the url.

* The paragraphs about other mechanisms and implementation details
  are moved further down, as they are not as relevant to the concept of
  gitmodules.

Signed-off-by: Stefan Beller 
---

This is kind of a resend from [RFC-PATCHv2] submodules: add a background story
https://public-inbox.org/git/20170209020855.23486-1-sbel...@google.com/
but the new man page is completely reworked, so I'd expect it to go over better
for the first half at least.

(In the "data model" section it begins to differ from reality,
as it mentions a new not-yet-implemented place where to put submodule
related config)

Thanks,
Stefan

 Documentation/Makefile  |   1 +
 Documentation/git-rm.txt|   4 +-
 Documentation/git-submodule.txt |  44 ++---
 Documentation/gitsubmodules.txt | 214 
 4 files changed, 227 insertions(+), 36 deletions(-)
 create mode 100644 Documentation/gitsubmodules.txt

diff --git a/Documentation/Makefile b/Documentation/Makefile
index b5be2e2d3f..2415e0d657 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -31,6 +31,7 @@ MAN7_TXT += giteveryday.txt
 MAN7_TXT += gitglossary.txt
 MAN7_TXT += gitnamespaces.txt
 MAN7_TXT += gitrevisions.txt
+MAN7_TXT += gitsubmodules.txt
 MAN7_TXT += gittutorial-2.txt
 MAN7_TXT += gittutorial.txt
 MAN7_TXT += gitworkflows.txt
diff --git a/Documentation/git-rm.txt b/Documentation/git-rm.txt
index f1efc116eb..db444693dd 100644
--- a/Documentation/git-rm.txt
+++ b/Documentation/git-rm.txt
@@ -152,8 +152,8 @@ Ignored files are deemed expendable and won't stop a submodule's work
 tree from being removed.
 
 If you only want to remove the local checkout of a submodule from your
-work tree without committing the removal,
-use linkgit:git-submodule[1] `deinit` instead.
+work tree without committing the removal, use linkgit:git-submodule[1] `deinit`
+instead. Also see linkgit:gitsubmodules[7] for details on submodule removal.
 
 EXAMPLES
 
diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt
index 74bc6200d5..032590d828 100644
--- a/Documentation/git-submodule.txt
+++ b/Documentation/git-submodule.txt
@@ -24,37 +24,7 @@ DESCRIPTION
 ---
 Inspects, updates and manages submodules.
 
-A submodule allows you to keep another Git repository in a subdirectory
-of your repository. The other repository has its own history, which does not
-interfere with the history of the current repository. This can be used to
-have external dependencies such as third party libraries for example.
-
-When cloning or pulling a repository containing submodules however,
-these will not be checked out by default; the 'init' and 'update'
-subcommands will maintain submodules checked out and at
-appropriate revision in your working tree.
-
-Submodules are composed from a so-called `gitlink` tree entry
-in the main repository that refers to a particular commit object
-within the inner repository that is completely separate.
-A record in the `.gitmodules` (see linkgit:gitmodules[5]) file at the
-root of the source tree assigns a logical name to the submodule and
-describes the default URL the submodule shall be cloned from.
-The logical name can be used for overriding this URL within your
-local repository configuration (see 'submodule init').
-
-Submodules are not to be confused with remotes, which are other
-repositories of the same project; submodules are meant for
-different projects you would like to make part of your source tree,
-while the history of the two projects still stays completely
-independent and you cannot modify the contents of the submodule
-from within the main project.
-If you want to merge the project histories and want to treat the
-aggregated whole as a single project from then on, you may want to
-add a remote for the other project and use the 'subtree' merge strategy,
-instead of treating the other project as a submodule.