Re: [RFD] Long term plan with submodule refs?

2017-11-10 Thread Jacob Keller
On Fri, Nov 10, 2017 at 12:01 PM, Stefan Beller  wrote:
>>
>>>> Basically, a workflow where it's easier to have each submodule checked
>>>> out at master, and we can still keep track of historical relationship
>>>> of what commit was the submodule at some time ago, but without causing
>>>> some of these headaches.
>>>
>>> So essentially a repo or otherwise parallel workflow just with the
>>> versioning happening magically behind your back?
>>
>> Ideally, my developers would like to just have each submodule checked
>> out at master.
>>
>> Ideally, I'd like to be able to checkout an old version of the parent
>> project and have it recorded what version of the shared submodule was
>> at at the time.
>
> This sounds as if a "passive superproject" would work best for you, i.e.
> each commit in a submodule is bubbled up into the superproject,
> making a commit potentially even behind the scenes, such that the
> user interaction with the superproject would be none.
>
> However this approach also sounds careless, as there is no precondition
> that e.g. the superproject builds with all the submodules as is; it is a mere
> tracking of "at this time we have the submodules arranged as such",
> whereas for the versioning aspect, you would want to have commit messages
> in the superproject saying *why* you bumped up a specific submodule.
> The user may not like to give such an explanation as they already wrote
> a commit message for the individual project.
>
> Also this approach sounds like a local approach, as it is not clear to me,
> why you'd want to share the superproject history.
>
>> Ideally, my developers don't want to have to worry about knowing that
>> they shouldn't "git add -A" or "git commit -a" when they have a
>> submodule checked out at a different location from the parent project's
>> gitlink.
>>
>> Thanks,
>> Jake
>>


It doesn't need to be totally passive, in that some (or one
maintainer) can manage when the submodule pointer is actually updated,
but ideally other users don't have to worry about that and can
"pretend" to always keep each submodule at master, as they have always
done in the past.

Thanks,
Jake


Re: [RFD] Long term plan with submodule refs?

2017-11-10 Thread Stefan Beller
>
>>> Basically, a workflow where it's easier to have each submodule checked
>>> out at master, and we can still keep track of historical relationship
>>> of what commit was the submodule at some time ago, but without causing
>>> some of these headaches.
>>
>> So essentially a repo or otherwise parallel workflow just with the versioning
>> happening magically behind your back?
>
> Ideally, my developers would like to just have each submodule checked
> out at master.
>
> Ideally, I'd like to be able to checkout an old version of the parent
> project and have it recorded what version of the shared submodule was
> at at the time.

This sounds as if a "passive superproject" would work best for you, i.e.
each commit in a submodule is bubbled up into the superproject,
making a commit potentially even behind the scenes, such that the
user interaction with the superproject would be none.

However this approach also sounds careless, as there is no precondition
that e.g. the superproject builds with all the submodules as is; it is a mere
tracking of "at this time we have the submodules arranged as such",
whereas for the versioning aspect, you would want to have commit messages
in the superproject saying *why* you bumped up a specific submodule.
The user may not like to give such an explanation as they already wrote
a commit message for the individual project.

Also this approach sounds like a local approach, as it is not clear to me,
why you'd want to share the superproject history.

> Ideally, my developers don't want to have to worry about knowing that
> they shouldn't "git add -A" or "git commit -a" when they have a
> submodule checked out at a different location from the parent project's
> gitlink.
>
> Thanks,
> Jake
>


Re: [RFD] Long term plan with submodule refs?

2017-11-09 Thread Jacob Keller
On Thu, Nov 9, 2017 at 12:16 PM, Stefan Beller  wrote:
> On Wed, Nov 8, 2017 at 10:54 PM, Jacob Keller  wrote:
>> On Wed, Nov 8, 2017 at 4:10 PM, Stefan Beller  wrote:
>>>> The relationship is indeed currently useful, but if the long term plan
>>>> is to strongly discourage detached submodule HEAD, then I would think
>>>> that these patches are in the wrong direction. (If the long term plan is
>>>> to end up supporting both detached and linked submodule HEAD, then these
>>>> patches are fine, of course.) So I think that the plan referenced in
>>>> Junio's email (that you linked above) still needs to be discussed.
>>>
>>
>>> New type of symbolic refs
>>> =========================
>>> A symbolic ref can currently only point at a ref or another symbolic ref.
>>> This proposal showcases different scenarios on how this could change in the
>>> future.
>>>
>>> HEAD pointing at the superprojects index
>>> ----------------------------------------
>>> Introduce a new symbolic ref that points at the superprojects
>>> index of the gitlink. The format is
>>>
>>>   "repo:"  '\0'  '\0'
>>>
>>> Just like existing symrefs, the content of the ref will be read and followed.
>>> On reading "repo:", the sha1 will be obtained equivalent to:
>>>
>>> git -C  ls-files -s  | awk '{ print $2}'
>>>
>>> Ref write operations driven by the submodule, affecting symrefs
>>>   e.g. git checkout  (in the submodule)
>>>
>>> In this scenario only the HEAD is optionally attached to the superproject,
>>> so we can rewrite the HEAD to be anything else, such as a branch just fine.
>>> Once the HEAD is not pointing at the superproject any more, we'll leave the
>>> submodule alone in operations driven by the superproject.
>>> To get back on the superproject branch, we’d need to invent new UX, such as
>>>    git checkout --attach-superproject
>>> as that is similar to --detach
>>>
>>
>> Some of the idea trimmed for brevity, but I like this aspect the most.
>> Currently, I work on several projects which have multiple
>> repositories, which are essentially submodules.
>>
>> However, historically, we kept them separate. 99% of the time, you can
>> use all 3 projects on "master" and everything works. But if you go
>> back in time, there's no correlation to "what did the parent project
>> want this "COMMON" folder to be at?
>
> So an environment where "git submodule update --remote" is not that
> harmful, but rather brings the joy of being up to date in each project?
>
>> I started promoting using submodules for this, since it seemed quite natural.
>>
>> The core problem, is that several developers never quite understood or
>> grasped how submodules worked. There's problems like "but what if I
>> wanna work on master?" or people assume submodules need to be checked
>> out at master instead of in a detached HEAD state.
>
> So the documentation sucks?
>
> It is intentional that from the superproject's perspective the gitlink
> must be one exact value, and that we rely on the submodule to get to and
> keep that state.
>
> (I think we once discussed whether setting the gitlink value to 00...00, or
> otherwise signaling that we actually want "the most recent tip of the X
> branch", would be a good idea, but I do not think it is, as it misses the
> point of versioning.)
>
>> So we often get people who don't run git submodule update and thus are
>> confused about why their submodules are often out of date. (This can
>> be solved by recursive options to commands to more often recurse into
>> submodules and checkout and update them).
>>
>> We also often get people who accidentally commit the old version of
>> the repository, or commit an update to the parent project pointing the
>> submodule at some commit which isn't yet in the upstream of the common
>> repository.
>
> Would an upstream prereceive hook (maybe even builtin and accessible via
> 'receive.denyUnreachableSubmodules') help? (It would require submodules
> to be defined with relative URLs in the .gitmodules file and then the receive
> command can check for the gitlink value present in this other repository)
>
>> The proposal here seems to match the intuition about how submodules
>> should work, with the ability to "attach" or "detach" the submodule
>> when working on the submodule directly.
>
> Well I think the big picture discussion is how easy this attaching or
> detaching is. Whether only the HEAD is attached or detached, or if we
> invent a new refstore that is a complete new submodule thing, which
> cannot be detached from the superproject at all.
>
>> Ideally, I'd like for more ways to say "ignore what my submodule is
>> checked out at, since I will have something else checked out, and
>> don't intend to commit just yet."
>
> This is in the superproject, when doing a git add . ?
>

Yes.

>> Basically, a workflow where it's easier to have each submodule checked
>> out at master, and we can still keep track of historical relationship
>> of what commit was the submodule at some time ago, but without causing
>> som

Re: [RFD] Long term plan with submodule refs?

2017-11-09 Thread Stefan Beller
On Wed, Nov 8, 2017 at 10:54 PM, Jacob Keller  wrote:
> On Wed, Nov 8, 2017 at 4:10 PM, Stefan Beller  wrote:
>>> The relationship is indeed currently useful, but if the long term plan
>>> is to strongly discourage detached submodule HEAD, then I would think
>>> that these patches are in the wrong direction. (If the long term plan is
>>> to end up supporting both detached and linked submodule HEAD, then these
>>> patches are fine, of course.) So I think that the plan referenced in
>>> Junio's email (that you linked above) still needs to be discussed.
>>
>
>> New type of symbolic refs
>> =========================
>> A symbolic ref can currently only point at a ref or another symbolic ref.
>> This proposal showcases different scenarios on how this could change in the
>> future.
>>
>> HEAD pointing at the superprojects index
>> ----------------------------------------
>> Introduce a new symbolic ref that points at the superprojects
>> index of the gitlink. The format is
>>
>>   "repo:"  '\0'  '\0'
>>
>> Just like existing symrefs, the content of the ref will be read and followed.
>> On reading "repo:", the sha1 will be obtained equivalent to:
>>
>> git -C  ls-files -s  | awk '{ print $2}'
>>
>> Ref write operations driven by the submodule, affecting symrefs
>>   e.g. git checkout  (in the submodule)
>>
>> In this scenario only the HEAD is optionally attached to the superproject,
>> so we can rewrite the HEAD to be anything else, such as a branch just fine.
>> Once the HEAD is not pointing at the superproject any more, we'll leave the
>> submodule alone in operations driven by the superproject.
>> To get back on the superproject branch, we’d need to invent new UX, such as
>>    git checkout --attach-superproject
>> as that is similar to --detach
>>
>
> Some of the idea trimmed for brevity, but I like this aspect the most.
> Currently, I work on several projects which have multiple
> repositories, which are essentially submodules.
>
> However, historically, we kept them separate. 99% of the time, you can
> use all 3 projects on "master" and everything works. But if you go
> back in time, there's no correlation to "what did the parent project
> want this "COMMON" folder to be at?

So an environment where "git submodule update --remote" is not that
harmful, but rather brings the joy of being up to date in each project?

> I started promoting using submodules for this, since it seemed quite natural.
>
> The core problem, is that several developers never quite understood or
> grasped how submodules worked. There's problems like "but what if I
> wanna work on master?" or people assume submodules need to be checked
> out at master instead of in a detached HEAD state.

So the documentation sucks?

It is intentional that from the superproject's perspective the gitlink
must be one exact value, and that we rely on the submodule to get to and
keep that state.

(I think we once discussed whether setting the gitlink value to 00...00, or
otherwise signaling that we actually want "the most recent tip of the X
branch", would be a good idea, but I do not think it is, as it misses the
point of versioning.)

> So we often get people who don't run git submodule update and thus are
> confused about why their submodules are often out of date. (This can
> be solved by recursive options to commands to more often recurse into
> submodules and checkout and update them).
>
> We also often get people who accidentally commit the old version of
> the repository, or commit an update to the parent project pointing the
> submodule at some commit which isn't yet in the upstream of the common
> repository.

Would an upstream prereceive hook (maybe even builtin and accessible via
'receive.denyUnreachableSubmodules') help? (It would require submodules
to be defined with relative URLs in the .gitmodules file and then the receive
command can check for the gitlink value present in this other repository)
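
A minimal sketch of the check such a hook would perform, assuming the
submodule's repository is available on the server at a known path. The
function name gitlink_is_reachable and the paths are hypothetical, and
'receive.denyUnreachableSubmodules' is only an idea floated here, not an
existing setting:

```shell
#!/bin/sh
# Sketch only: succeed if commit $2 is contained in at least one branch of
# the repository at $1. A pre-receive hook in the superproject could run
# this for every gitlink value it is about to accept.
gitlink_is_reachable() {
    repo=$1 sha=$2
    # --contains lists only refs whose history includes $sha; if the object
    # is missing entirely, for-each-ref fails and we treat that as "no refs".
    refs=$(git -C "$repo" for-each-ref --contains "$sha" refs/heads 2>/dev/null) || refs=
    [ -n "$refs" ]
}

# Hypothetical usage inside a hook:
#   gitlink_is_reachable /srv/git/common.git "$gitlink_sha" ||
#       { echo "submodule commit not published" >&2; exit 1; }
```

This only answers "is the commit on some branch of that repository"; how
the hook locates the repository from a relative URL in .gitmodules is left
open.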

> The proposal here seems to match the intuition about how submodules
> should work, with the ability to "attach" or "detach" the submodule
> when working on the submodule directly.

Well I think the big picture discussion is how easy this attaching or
detaching is. Whether only the HEAD is attached or detached, or if we
invent a new refstore that is a complete new submodule thing, which
cannot be detached from the superproject at all.

> Ideally, I'd like for more ways to say "ignore what my submodule is
> checked out at, since I will have something else checked out, and
> don't intend to commit just yet."

This is in the superproject, when doing a git add . ?

> Basically, a workflow where it's easier to have each submodule checked
> out at master, and we can still keep track of historical relationship
> of what commit was the submodule at some time ago, but without causing
> some of these headaches.

So essentially a repo or otherwise parallel workflow just with the versioning
happening magically behind your back?

> I've often tried to use the "--skip-worktree" bit 

Re: [RFD] Long term plan with submodule refs?

2017-11-09 Thread Stefan Beller
On Wed, Nov 8, 2017 at 9:08 PM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>>> The relationship is indeed currently useful, but if the long term plan
>>> is to strongly discourage detached submodule HEAD, then I would think
>>> that these patches are in the wrong direction. (If the long term plan is
>>> to end up supporting both detached and linked submodule HEAD, then these
>>> patches are fine, of course.) So I think that the plan referenced in
>>> Junio's email (that you linked above) still needs to be discussed.
>>
>> This email presents different approaches.
>>
>> Objective
>> =========
>> This document should summarize the current situation of Git submodules
>> and start a discussion of where it can be headed long term.
>> Show different ways in which submodule refs could evolve.
>>
>> Background
>> ==========
>> Submodules in Git are considered as an independent repository currently.
>> This is okay for current workflows, such as utilizing a library that is
>> rarely updated. Other workflows that require a tighter integration between
>> submodule and superproject are possible, but cumbersome as there is an
>> additional step that has to be performed, which is the update of the gitlink
>> pointer in the superproject.
>
> I do not think "rarely updated" is an issue.
>
> The problem is that we may want to make it easier to use a
> superproject and its submodules as if the combined whole were a
> single project, which currently is not easy, primarily because
> submodules are separate entities with different set of branches that
> can be checked out independently from what branch the superproject
> is working on.

Well, this fact does not seem to be a problem in the current use of submodules,
precisely because the workflow either (a) is not too cumbersome or (b) is
executed too rarely to be bothersome.

> These are good starting points for copying such a combined whole to
> your local machine and start working on it.  The more interesting,
> important, and potentially difficult part is how the result of such
> work is shared back to where you started from.  "push --recursive"
> may be a simple phrase, but a sensible definition of how it should
> work won't be that simple.
...
>
> We should make detached HEAD safe against gc if it is not,
> regardless of the use of submodules.  I thought it already was made
> safe long time ago.

The detached HEAD itself is protected via its reflog (which is around
for say 2 weeks?)

If I were to develop using detached HEAD only in today's world of
submodules using different branches in the superproject, I run the risk
of losing some commits in the submodule, as they are not the detached
HEAD all the time, but might even be loose tips.
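
To make the risk concrete: commits made on a detached HEAD are referenced
only by the HEAD reflog, whose entries expire (by default after 90 days via
gc.reflogExpire, or 30 days for unreachable entries via
gc.reflogExpireUnreachable). A minimal sketch of one way to guard against
that today is to pin the commit with a branch before moving away. The
helper name and the "rescue/" prefix are assumptions made for this example:

```shell
#!/bin/sh
# Sketch only: if the repository at $1 has a detached HEAD, create a branch
# at that commit so it stays reachable independent of reflog expiry.
pin_detached_head() {
    repo=$1
    # symbolic-ref -q exits non-zero (silently) when HEAD is detached.
    if git -C "$repo" symbolic-ref -q HEAD >/dev/null; then
        return 0    # HEAD is on a branch; nothing to do
    fi
    sha=$(git -C "$repo" rev-parse --short HEAD) || return 1
    git -C "$repo" branch -f "rescue/$sha" HEAD
}

# Hypothetical usage before switching the superproject branch:
#   pin_detached_head path/to/submodule
```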

This combined with the previous paragraph brings in another important
concern:
Some projects would have a very different history when used as a
submodule compared to when used as a standalone project.
Other projects may be closely aligned between their branches and
what the superproject records.

So the more we deviate from the traditional branch model, the easier
we make it to have the submodule tips be very different from the
standalone tips, which may overexpose us to the gc issues as well as
the general question how much these projects have in common.

>> Use replicate refs in submodules
>> --------------------------------
>> This approach will replicate the superproject refs into the submodule
>> ref namespace, e.g. git-branch learns about --recurse-submodules, which
>> creates a branch of a given name in all submodules. These (topic) branches
>> should be kept in sync with the superproject
>>
>> Pros:
>>  * This seemed intuitive to Gerrit users
>>  * 'quick' to implement, most of the commands are already there,
>>    just git-branch is needed to have the workflows mentioned above complete.
>> Cons:
>>  * What does "git checkout -b A B" mean? (special case: B == HEAD)
>
> The command ran at which level?  In the superproject, or in a single
> submodule?

In the superproject, with --recurse-submodules, as the A and B would recurse
as strings, and not change meaning depending on the gitlink value.

>
>>    Is the branch name replicated as a string into the submodule operation,
>>    or do we dereference the superprojects gitlink and walk from there?
>
> If they are "kept in sync with the superproject", then there should
> be no difference between the two, so I do not see any room for
> wondering about that.

Except you can still break out by issuing commands in the submodule
to change the submodule refs to be different from the superproject.

This was also more along the lines of thinking about the (Gerrit) remote,
which does an okay, but not stellar, job in keeping the remote branches
for superproject and submodule in sync. I'd expect glitches there.

> In other words, if there is need to worry
> about the differences between the above two, then it probably is
> fundamentally impossible to keep these in sync, and a design that
>

Re: [RFD] Long term plan with submodule refs?

2017-11-08 Thread Jacob Keller
On Wed, Nov 8, 2017 at 4:10 PM, Stefan Beller  wrote:
>> The relationship is indeed currently useful, but if the long term plan
>> is to strongly discourage detached submodule HEAD, then I would think
>> that these patches are in the wrong direction. (If the long term plan is
>> to end up supporting both detached and linked submodule HEAD, then these
>> patches are fine, of course.) So I think that the plan referenced in
>> Junio's email (that you linked above) still needs to be discussed.
>

> New type of symbolic refs
> =========================
> A symbolic ref can currently only point at a ref or another symbolic ref.
> This proposal showcases different scenarios on how this could change in the
> future.
>
> HEAD pointing at the superprojects index
> ----------------------------------------
> Introduce a new symbolic ref that points at the superprojects
> index of the gitlink. The format is
>
>   "repo:"  '\0'  '\0'
>
> Just like existing symrefs, the content of the ref will be read and followed.
> On reading "repo:", the sha1 will be obtained equivalent to:
>
> git -C  ls-files -s  | awk '{ print $2}'
>
> Ref write operations driven by the submodule, affecting symrefs
>   e.g. git checkout  (in the submodule)
>
> In this scenario only the HEAD is optionally attached to the superproject,
> so we can rewrite the HEAD to be anything else, such as a branch just fine.
> Once the HEAD is not pointing at the superproject any more, we'll leave the
> submodule alone in operations driven by the superproject.
> To get back on the superproject branch, we’d need to invent new UX, such as
>    git checkout --attach-superproject
> as that is similar to --detach
>

Some of the idea trimmed for brevity, but I like this aspect the most.
Currently, I work on several projects which have multiple
repositories, which are essentially submodules.

However, historically, we kept them separate. 99% of the time, you can
use all 3 projects on "master" and everything works. But if you go
back in time, there's no correlation to "what did the parent project
want this 'COMMON' folder to be at?"

I started promoting using submodules for this, since it seemed quite natural.

The core problem is that several developers never quite understood or
grasped how submodules worked. There are problems like "but what if I
wanna work on master?" or people assume submodules need to be checked
out at master instead of in a detached HEAD state.

So we often get people who don't run git submodule update and thus are
confused about why their submodules are often out of date. (This can
be solved by recursive options to commands to more often recurse into
submodules and checkout and update them).

We also often get people who accidentally commit the old version of
the repository, or commit an update to the parent project pointing the
submodule at some commit which isn't yet in the upstream of the common
repository.

The proposal here seems to match the intuition about how submodules
should work, with the ability to "attach" or "detach" the submodule
when working on the submodule directly.

Ideally, I'd like for more ways to say "ignore what my submodule is
checked out at, since I will have something else checked out, and
don't intend to commit just yet."

Basically, a workflow where it's easier to have each submodule checked
out at master, and we can still keep track of historical relationship
of what commit was the submodule at some time ago, but without causing
some of these headaches.

I've often tried to use the "--skip-worktree" bit to have people set
their repository to ignore the submodule. Unfortunately, this is
pretty complex, and most of the time, developers never remember to do
this again on a fresh clone.
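
For reference, the per-clone setup is roughly the following sketch. The
submodule path "common" is a placeholder and the helper names are
hypothetical; the bit lives in the local index, which is why it has to be
set again after every fresh clone:

```shell
#!/bin/sh
# Sketch only: tell the superproject at $1 to ignore local changes to the
# gitlink at path $2 by setting the skip-worktree bit in its index.
ignore_submodule_pointer() {
    git -C "$1" update-index --skip-worktree -- "$2"
}

# Undo the above, e.g. for the maintainer who bumps the pointer.
track_submodule_pointer() {
    git -C "$1" update-index --no-skip-worktree -- "$2"
}

# Entries with the bit set are listed with an "S" tag:
#   git -C superproject ls-files -v -- common
```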

Thanks,
Jake


Re: [RFD] Long term plan with submodule refs?

2017-11-08 Thread Junio C Hamano
Jonathan Tan  writes:

> What if, in the submodule, we have a new ref backend that mirrors the
> superproject? When initializing the submodule, its original refs are not
> cloned at all, but instead virtual refs are used.
> ...
> These rules seem straightforward to me (although I have been working
> with Git for a while, so perhaps I'm not the best judge), and I think
> they lead to a good workflow, as discussed below.

Indeed this is intriguing.

> The above rules allow the following workflow:
>  - "checkout --recurse-submodules" the branch you want on the
>    superproject
>  - make whatever changes you want in each submodule
>  - commit each individual submodule (which updates the index of the
>    superproject), then commit the superproject (we can introduce a
>    commit --recurse-submodules to make this more convenient)

The "recurse" option would also give users an extra atomicity, and
would not be merely for convenience; when a user wants to treat a
superproject and its two submodules as if the combined whole were a
single repository, there shouldn't be two separate commits in the
history of the superproject only because two submodules made one
commit each to work on a single theme that spans all of them.



Re: [RFD] Long term plan with submodule refs?

2017-11-08 Thread Junio C Hamano
Stefan Beller  writes:

>> The relationship is indeed currently useful, but if the long term plan
>> is to strongly discourage detached submodule HEAD, then I would think
>> that these patches are in the wrong direction. (If the long term plan is
>> to end up supporting both detached and linked submodule HEAD, then these
>> patches are fine, of course.) So I think that the plan referenced in
>> Junio's email (that you linked above) still needs to be discussed.
>
> This email presents different approaches.
>
> Objective
> =========
> This document should summarize the current situation of Git submodules
> and start a discussion of where it can be headed long term.
> Show different ways in which submodule refs could evolve.
>
> Background
> ==========
> Submodules in Git are considered as an independent repository currently.
> This is okay for current workflows, such as utilizing a library that is
> rarely updated. Other workflows that require a tighter integration between
> submodule and superproject are possible, but cumbersome as there is an
> additional step that has to be performed, which is the update of the gitlink
> pointer in the superproject.

I do not think "rarely updated" is an issue.

The problem is that we may want to make it easier to use a
superproject and its submodules as if the combined whole were a
single project, which currently is not easy, primarily because
submodules are separate entities with different set of branches that
can be checked out independently from what branch the superproject
is working on.

> Workflows
> =========
> * Obtaining a copy of the Superproject tightly coupled with submodules
>   solved via git clone --recurse-submodules=
> * Changing the submodule selection
>   solved via submodule.active flags
> * Changing the remote / Interacting with a different remote for all submodules
>   -> need to be solved, not core issue of this discussion
> * Syncing to the latest upstream
>   solved via git pull --recurse  
> * Working on a local feature in one submodule
>   -> How do refs work spanning superproject/submodule?
> * Working on a feature spanning multiple submodules
>   -> How do refs work spanning multiple repos?
> * Working on a bug fix (Changing the feature that you currently work on,
>   branches)
>   -> How does switching branches in the superproject affect submodules

These are good starting points for copying such a combined whole to
your local machine and start working on it.  The more interesting,
important, and potentially difficult part is how the result of such
work is shared back to where you started from.  "push --recursive"
may be a simple phrase, but a sensible definition of how it should
work won't be that simple.

> Possible data models and workflow implications
> ==============================================
> In the following different data models are presented, which aid a submodule
> heavy workflow each giving pros and cons.
>
> Keep everything as is, superproject and submodule have their own refs
> ---------------------------------------------------------------------
> ...
> Cons:
>  * Current tools that manage multiple repositories (e.g. repo, git-slave)
>    have "branches in parallel", i.e. each repo has a branch of the same
>    name, instead of using a superproject to manage the state of all repos
>    involved. So users of such tools may be confused by submodules.
>  * when using a detached HEAD in the submodule, we may run into git-gc issues.

We should make detached HEAD safe against gc if it is not,
regardless of the use of submodules.  I thought it already was made
safe long time ago.

> Use replicate refs in submodules
> --------------------------------
> This approach will replicate the superproject refs into the submodule
> ref namespace, e.g. git-branch learns about --recurse-submodules, which
> creates a branch of a given name in all submodules. These (topic) branches
> should be kept in sync with the superproject
>
> Pros:
>  * This seemed intuitive to Gerrit users
>  * 'quick' to implement, most of the commands are already there,
>    just git-branch is needed to have the workflows mentioned above complete.
> Cons:
>  * What does "git checkout -b A B" mean? (special case: B == HEAD)

The command ran at which level?  In the superproject, or in a single
submodule?

>    Is the branch name replicated as a string into the submodule operation,
>    or do we dereference the superprojects gitlink and walk from there?

If they are "kept in sync with the superproject", then there should
be no difference between the two, so I do not see any room for
wondering about that.  In other words, if there is need to worry
about the differences between the above two, then it probably is
fundamentally impossible to keep these in sync, and a design that
assumes it is possible would have to expose glitches to the end-user
experience.

I do not know if glitches resulting from there would be so severe to
be show-stoppers, though.  It might be possible t

Re: [RFD] Long term plan with submodule refs?

2017-11-08 Thread Jonathan Tan
On Wed,  8 Nov 2017 16:10:07 -0800
Stefan Beller  wrote:

I thought of a possible alternative and how it would work.

> Possible data models and workflow implications
> ==============================================
> In the following different data models are presented, which aid a submodule
> heavy workflow each giving pros and cons.

What if, in the submodule, we have a new ref backend that mirrors the
superproject? When initializing the submodule, its original refs are not
cloned at all, but instead virtual refs are used.

Creation of brand-new refs is forbidden in the submodule.

When reading a ref in the submodule, if that ref is the current branch
in the superproject, read the corresponding gitlink entry in the index
(which may be dirty); otherwise read the gitlink in the tree of the tip
commit.
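
The read rule can be approximated with today's plumbing. This is only a
sketch of the lookup, not of a ref backend; the function name
virtual_ref_value and its arguments are hypothetical:

```shell
#!/bin/sh
# Sketch only: print the submodule commit that branch $2 would resolve to
# under the read rule, for the gitlink at path $3 in superproject $1.
virtual_ref_value() {
    super=$1 branch=$2 path=$3
    # Empty when the superproject itself has a detached HEAD.
    current=$(git -C "$super" symbolic-ref -q --short HEAD) || current=
    if [ "$branch" = "$current" ]; then
        # Current branch: read the (possibly dirty) index entry.
        # ls-files -s prints "<mode> <sha> <stage><TAB><path>".
        git -C "$super" ls-files -s -- "$path" | awk '{ print $2 }'
    else
        # Any other branch: read the gitlink recorded in its tip commit.
        # ls-tree prints "<mode> <type> <sha><TAB><path>".
        git -C "$super" ls-tree "refs/heads/$branch" -- "$path" | awk '{ print $3 }'
    fi
}

# Hypothetical usage:
#   virtual_ref_value /path/to/superproject topic common
```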

When updating a ref in the submodule, if that ref is the current branch
in the superproject, update the index; otherwise, create a commit on top
of the tip and update the ref to point to the new tip.

No synchronicity is enforced between superproject and submodule in terms
of HEAD, though: If a submodule is currently checked out to a branch,
and the gitlink for that branch is updated through whatever means, that
is equivalent to a "git reset --soft" in the submodule.

These rules seem straightforward to me (although I have been working
with Git for a while, so perhaps I'm not the best judge), and I think
they lead to a good workflow, as discussed below.

> Workflows
> =========
> * Obtaining a copy of the Superproject tightly coupled with submodules
>   solved via git clone --recurse-submodules=
> * Changing the submodule selection
>   solved via submodule.active flags
> * Changing the remote / Interacting with a different remote for all submodules
>   -> need to be solved, not core issue of this discussion
> * Syncing to the latest upstream
>   solved via git pull --recurse  

(skipping the above, since they are either solved or not a core issue)

> * Working on a local feature in one submodule
>   -> How do refs work spanning superproject/submodule?

This is perhaps one weak point of my proposal - you can't work on a
submodule as if it were independent. You can checkout a branch and make
commits, but (i) they will automatically affect the superproject, and
(ii) the "origin/foo" etc. branches are those of the superproject. (But
if you checkout a detached HEAD, everything should still work.)

> * Working on a feature spanning multiple submodules
>   -> How do refs work spanning multiple repos?

The above rules allow the following workflow:
 - "checkout --recurse-submodules" the branch you want on the
   superproject
 - make whatever changes you want in each submodule
 - commit each individual submodule (which updates the index of the
   superproject), then commit the superproject (we can introduce a
   commit --recurse-submodules to make this more convenient)
 - a "push --recurse-submodules" can be implemented to push the
   superproject and its submodules independently (and the same refspec
   can be legitimately used both when pushing the superproject and when
   pushing a submodule, since the ref names are the same, and not by
   coincidence)
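
With today's Git, the steps above can be approximated as follows. This
is only a sketch: the repository names "lib" and "super", the branch
"topic" and the "g" helper are made up for illustration, and the
explicit "git add lib" stands in for the automatic index update
proposed here.

```shell
set -e
work=$(mktemp -d) && cd "$work"
# Hypothetical helper: git with a throwaway identity, and local-path
# submodule clones allowed (newer Git versions refuse them by default).
g() { git -c user.name=Editor -c user.email=editor@example.com \
          -c protocol.file.allow=always "$@"; }

# Setup: a library repository used as a submodule of a superproject.
git init -q lib
g -C lib commit -q --allow-empty -m "lib: initial"
git init -q super
g -C super submodule --quiet add "$work/lib" lib
g -C super commit -q -m "super: add lib"
cd super

# 1. Check out the branch you want in the superproject.
g checkout -q --recurse-submodules -b topic
# 2. Make changes in the submodule, on a branch of the same name.
g -C lib checkout -q -b topic
echo "new feature" > lib/feature.txt
# 3. Commit the submodule, then the superproject.
g -C lib add feature.txt
g -C lib commit -q -m "lib: add feature"
g add lib          # manual today; automatic under the proposal
g commit -q -m "super: bump lib for feature"
# 4. Pushing is left out; "git push --recurse-submodules=on-demand"
#    is the closest existing option.

recorded=$(git rev-parse HEAD:lib)    # gitlink stored in the superproject
actual=$(git -C lib rev-parse HEAD)   # tip of the submodule branch
```

After the last commit, the gitlink recorded in the superproject and the
tip of the submodule branch name the same commit.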

If the user insists on making changes on a non-current branch (i.e. by
creating commits in submodules then using "git update-ref" or
equivalent), possibly multiple commits would be created in the
superproject, but the user can still squash them later if desired.

> * Working on a bug fix (Changing the feature that you currently work on,
>   branches)
>   -> How does switching branches in the superproject affect submodules

You will have to stash or commit your changes. (Which reminds me...GC in
the subproject will need to consult the reflog of the superproject too.)

> New type of symbolic refs
> =========================
> A symbolic ref can currently only point at a ref or another symbolic ref.
> This proposal showcases different scenarios on how this could change in the
> future.
> 
> HEAD pointing at the superprojects index
> ----------------------------------------

Assuming we don't need synchronicity, the existing HEAD format can be
retained. To clarify what happens during ref writes, I'll reuse the
scenarios Stefan described:

> Ref write operations driven by the submodule, affecting target ref
>   e.g. git commit, reset --hard, update-ref (in the submodule)
> 
> The HEAD stays the same, pointing at the superproject.
> The gitlink is changed to the target sha1, using
> 
>   git -C  update-index --add \
>   --cacheinfo 160000,$SHA1,
> 
> This will affect the superprojects index, such that then a commit in
> the superproject is needed.

In this proposal, the HEAD also stays the same (pointing at the branch).

Either the index is updated or a commit is needed. If a commit is
needed, it is automatically performed.
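
For reference, the update-index call quoted above can be tried with
stock Git. A minimal sketch, assuming a throwaway repository and a
hypothetical submodule path "sub"; the commit recorded as the gitlink
is just a stand-in for a real submodule commit:

```shell
set -e
tmp=$(mktemp -d)
git init -q "$tmp/super"
cd "$tmp/super"

# Any commit id will do for the demonstration; a real setup would use
# the new tip of the submodule here.
git -c user.name=Editor -c user.email=editor@example.com \
    commit -q --allow-empty -m "stand-in commit"
SHA1=$(git rev-parse HEAD)

# 160000 is the mode Git uses for gitlink (submodule) index entries.
git update-index --add --cacheinfo 160000,"$SHA1",sub

mode=$(git ls-files --stage sub | cut -d' ' -f1)
recorded=$(git ls-files --stage sub | cut -d' ' -f2)
```

The entry only lives in the index at this point, which is why a commit
in the superproject is still needed afterwards.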

> Ref write operations driven by the superproject, changing the gitlink
>   e.g. git checkout , git reset --hard (in the superproject)
> 
> This will

[RFD] Long term plan with submodule refs?

2017-11-08 Thread Stefan Beller
> The relationship is indeed currently useful, but if the long term plan
> is to strongly discourage detached submodule HEAD, then I would think
> that these patches are in the wrong direction. (If the long term plan is
> to end up supporting both detached and linked submodule HEAD, then these
> patches are fine, of course.) So I think that the plan referenced in
> Junio's email (that you linked above) still needs to be discussed.

This email presents different approaches.

Objective
=========
This document should summarize the current situation of Git submodules
and start a discussion of where it can be headed long term.
It shows different ways in which submodule refs could evolve.

Background
==========
Submodules in Git are currently treated as independent repositories.
This is okay for current workflows, such as utilizing a library that is
rarely updated. Other workflows that require a tighter integration between
submodule and superproject are possible, but cumbersome as there is an
additional step that has to be performed, which is the update of the gitlink
pointer in the superproject.

Other discussions of the past:
"Re-attach HEAD?"
  https://public-inbox.org/git/20170501180058.8063-1-sbel...@google.com/
"Semantics of checkout --recursive for submodules on a branch"
  https://public-inbox.org/git/20170630003851.17288-1-sbel...@google.com/
"A new type of symref?"
  https://public-inbox.org/git/xmqqvamqg2fy@gitster.mtv.corp.google.com/

Workflows
=========
* Obtaining a copy of the Superproject tightly coupled with submodules
  solved via git clone --recurse-submodules=
* Changing the submodule selection
  solved via submodule.active flags
* Changing the remote / Interacting with a different remote for all submodules
  -> need to be solved, not core issue of this discussion
* Syncing to the latest upstream
  solved via git pull --recurse  
* Working on a local feature in one submodule
  -> How do refs work spanning superproject/submodule?
* Working on a feature spanning multiple submodules
  -> How do refs work spanning multiple repos?
* Working on a bug fix (Changing the feature that you currently work on,
  branches)
  -> How does switching branches in the superproject affect submodules

This discussion should revolve around how refs are handled in submodules in
relation to a superproject.

Possible data models and workflow implications
==============================================
The following sections present different data models, each of which could
support a submodule-heavy workflow, along with their pros and cons.

Keep everything as is, superproject and submodule have their own refs
---------------------------------------------------------------------
In this alternative we'd just make existing commands nicer, e.g.
git-status and git-log would give information about the superproject's
gitlink, similar to how they give information about a remote branch.

We might want to introduce an option that triggers adding the submodule
to the superproject once a commit is done in the submodule.

Pros:
 * easiest to implement
 * easy to understand when having a git background already
 
Cons:
 * Current tools that manage multiple repositories (e.g. repo, git-slave)
   have "branches in parallel", i.e. each repo has a branch of the same
   name, instead of using a superproject to manage the state of all repos
   involved. So users of such tools may be confused by submodules.
 * when using a detached HEAD in the submodule, we may run into git-gc issues.
 

Use replicated refs in submodules
---------------------------------
This approach will replicate the superproject refs into the submodule
ref namespace, e.g. git-branch learns about --recurse-submodules, which
creates a branch of a given name in all submodules. These (topic) branches
should be kept in sync with the superproject.
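
Until git-branch learns such an option, the replication can be
approximated with "git submodule foreach". A sketch, assuming made-up
repositories "lib" and "super", a made-up branch name "topic" and a
hypothetical "g" helper; each branch is simply created at the
submodule's current HEAD:

```shell
set -e
work=$(mktemp -d) && cd "$work"
# Hypothetical helper: git with a throwaway identity, and local-path
# submodule clones allowed (newer Git versions refuse them by default).
g() { git -c user.name=Editor -c user.email=editor@example.com \
          -c protocol.file.allow=always "$@"; }

# Setup: a superproject with one submodule.
git init -q lib
g -C lib commit -q --allow-empty -m "lib: initial"
git init -q super
g -C super submodule --quiet add "$work/lib" lib
g -C super commit -q -m "super: add lib"
cd super

# Create the branch in the superproject, then a branch of the same
# name in every checked-out submodule.
name=topic
git branch "$name"
git submodule --quiet foreach "git branch $name"

in_super=$(git rev-parse --verify --quiet "refs/heads/$name")
in_sub=$(git -C lib rev-parse --verify --quiet "refs/heads/$name")
sub_head=$(git -C lib rev-parse HEAD)
```

Nothing keeps the two branches in sync afterwards; that part would
still be up to the user or to new tooling.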

Pros:
 * This seemed intuitive to Gerrit users
 * 'quick' to implement, most of the commands are already there;
   only git-branch support is missing to complete the workflows
   mentioned above.
Cons:
 * What does "git checkout -b A B" mean? (special case: B == HEAD)
   Is the branch name replicated as a string into the submodule operation,
   or do we dereference the superprojects gitlink and walk from there?
   When taking the superprojects gitlink, then why do we have the branches
   in the submodule in the first place? When taking the string as-is,
   then it might confuse users.
 * refs are not updated atomically across superproject and submodule,
   by design; this relies on superproject and submodule staying in
   sync via hope.

No submodule refstore at all
----------------------------
Use refs and commits in the superproject to stitch submodule changes
together. Disallow branches in the submodule. This restriction applies only
to the submodule working tree inside the superproject, so the output of
git-branch changes depending on whether the working tree is inside or
outside the superproject working tree.

The messages of git-status inside the superproject working tree are changed
as "detached