Re: [PATCH] technical doc: add a design doc for the evolve command

2018-11-21 Thread Stefan Xenos
>   I don't have a strong opinion about whether this would go in the
> design doc.  I suppose the doc could have an "implementation plan"
> section describing temporary stopping points on the way to the final
> result, but it's not necessary to include that.

As long as this is something I'm just doing for fun and nobody needs
to coordinate anything with me, I was planning to just document the
endpoint and then work on whatever seems interesting at any given
moment. Of course, if I found a job/team that would let me do this as
my day job, I'd be more willing to commit to deliverables.

  - Stefan
On Tue, Nov 20, 2018 at 5:33 PM Jonathan Nieder wrote:
>
> Stefan Xenos wrote:
> > Jonathan Nieder wrote:
>
> >> putting it in the commit message is a way to
> >> experiment with the workflow without changing the object format
> >
> > As long as we're talking about a temporary state of affairs for users
> > that have opted in, and we're explicit about the fact that future
> > versions of git won't understand the change graphs that are produced
> > during that temporary state of affairs, I'm fine with using the commit
> > message. We can move it to the header prior to enabling the feature by
> > default.
>
> Yay!  I think that addresses both my and Ævar's concerns.  Also, if
> you run into an issue that requires changing the object format
> earlier, that's fine and we can handle the situation when it comes.
>
> I don't have a strong opinion about whether this would go in the
> design doc.  I suppose the doc could have an "implementation plan"
> section describing temporary stopping points on the way to the final
> result, but it's not necessary to include that.
>
> Thanks for the quick and thoughtful replies.
>
> Jonathan


Re: [PATCH] technical doc: add a design doc for the evolve command

2018-11-21 Thread Phillip Wood
Hi Stefan

On 20/11/2018 20:24, Stefan Xenos wrote:
>> If a merge has been cherry-picked we cannot update it as we don't record
>> which parent was used for the pick, however it is probably not a problem
>> in practice - I think it is unusual to amend merges.
> 
> I've read and reread that sentence several times and don't fully
> understand it. Could you elaborate?

Sorry if I wasn't very clear. To cherry-pick (or revert) a merge commit
one has to specify a parent of the commit being picked with -m, which
cherry-pick uses as the merge base for the three-way merge that
creates the new commit. If the original merge is updated then evolve
won't know which parent to use as the merge base when evolving the
cherry-picked version of the merge, as the parent is not recorded in the
meta commit.
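
To make the missing piece concrete, here is a minimal Python sketch of the problem. It is a toy model, not git's implementation: the `Commit` class and the `pick_base` helper are made up for illustration. The only real detail it relies on is that cherry-pick's -m option takes a 1-based parent number.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Commit:
    """Toy commit: a name and an ordered list of parent names."""
    name: str
    parents: List[str]


def pick_base(commit: Commit, mainline: Optional[int] = None) -> str:
    """Return the parent to use as merge base when picking `commit`.

    `mainline` mirrors the 1-based parent number given to cherry-pick -m.
    """
    if len(commit.parents) == 1:
        return commit.parents[0]
    if mainline is None:
        # The situation evolve would be in: the meta commit (as currently
        # specified) does not record which parent number was used.
        raise ValueError(f"{commit.name} is a merge but no mainline was recorded")
    return commit.parents[mainline - 1]


merge = Commit("M", ["A", "B"])
print(pick_base(merge, mainline=2))  # B
```

Without the recorded number, the last call has no way to choose between A and B, which is the case described above.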

> It sounds scary, though. With the evolve command, amending merges will
> need to be supported.

Evolving a merge should be fine, I was just referring to merges that
have been cherry-picked.


Best Wishes

Phillip

(Thanks for your reply to my other message, I'm still digesting it at
the moment, once I've done that and found the references to mercurial
using commit obsolescence information in rebase and pull I'll reply.)

> If you create a merge and then amend one of its
> parent commits, the evolve command will need to rebase the merge and
> point one or both parents to the replacement instead.
> 
>   - Stefan
> On Tue, Nov 20, 2018 at 5:03 AM Phillip Wood wrote:
>>
>> On 15/11/2018 00:55, sxe...@google.com wrote:
>>> From: Stefan Xenos 
>>>
>>> +Obsolescence across cherry-picks
>>> +
>>> +By default the evolve command will treat cherry-picks and squash merges as being
>>> +completely separate from the original. Further amendments to the original commit
>>> +will have no effect on the cherry-picked copy. However, this behavior may not be
>>> +desirable in all circumstances.
>>> +
>>> +The evolve command may at some point support an option to look for cases where
>>> +the source of a cherry-pick or squash merge has itself been amended, and
>>> +automatically apply that same change to the cherry-picked copy. In such cases,
>>> +it would traverse origin edges rather than ignoring them, and would treat a
>>> +commit with origin edges as being obsolete if any of its origins were obsolete.
>>
>> If a merge has been cherry-picked we cannot update it as we don't record
>> which parent was used for the pick, however it is probably not a problem
>> in practice - I think it is unusual to amend merges.
>>
>> Best Wishes
>>
>> Phillip



Re: [PATCH] technical doc: add a design doc for the evolve command

2018-11-20 Thread Jonathan Nieder
Stefan Xenos wrote:
> Jonathan Nieder wrote:

>> putting it in the commit message is a way to
>> experiment with the workflow without changing the object format
>
> As long as we're talking about a temporary state of affairs for users
> that have opted in, and we're explicit about the fact that future
> versions of git won't understand the change graphs that are produced
> during that temporary state of affairs, I'm fine with using the commit
> message. We can move it to the header prior to enabling the feature by
> default.

Yay!  I think that addresses both my and Ævar's concerns.  Also, if
you run into an issue that requires changing the object format
earlier, that's fine and we can handle the situation when it comes.

I don't have a strong opinion about whether this would go in the
design doc.  I suppose the doc could have an "implementation plan"
section describing temporary stopping points on the way to the final
result, but it's not necessary to include that.

Thanks for the quick and thoughtful replies.

Jonathan


Re: [PATCH] technical doc: add a design doc for the evolve command

2018-11-20 Thread Stefan Xenos
> putting it in the commit message is a way to
> experiment with the workflow without changing the object format

As long as we're talking about a temporary state of affairs for users
that have opted in, and we're explicit about the fact that future
versions of git won't understand the change graphs that are produced
during that temporary state of affairs, I'm fine with using the commit
message. We can move it to the header prior to enabling the feature by
default.

- Stefan



On Tue, Nov 20, 2018 at 2:06 PM Jonathan Nieder wrote:
>
> Stefan Xenos wrote:
> > On Tue, Nov 20, 2018 at 1:43 AM Ævar Arnfjörð Bjarmason wrote:
>
> >> I think it sounds better to just make it, in the header:
> >>
> >> x-evolve-pt content
> >> x-evolve-pt obsolete
> >> x-evolve-pt origin
> >>
> >> Where "pt = parent-type", we could of course spell that out too, but in
> >> this case "x-evolve-pt" is the exact same number of bytes as
> >> "parent-type", so nobody can object that it takes more space:)
> >>
> >> We'd then carry some documentation where we say everything except "x-*-"
> >> is reserved, and that we'd like to know about new "*" there before it's
> >> used, so it can be documented.
> [...]
> >  that should
> > probably be the subject of a separate proposal (who owns the content
> > of a namespace, what is the process for adding a new namespace or a
> > new attribute within a namespace, what order should the header
> > attributes appear in, what problem is namespacing there to solve, when
> > do we use a namespaced attribute versus a "reserved" attribute, etc.).
>
> Agreed.  There are reasons that I prefer not to go in this direction,
> but regardless, it would be the subject of a separate thread if you want
> to pursue it.
>
> >> Putting it in the commit message just sounds like a hack around not
> >> having namespaced headers. If we'd like to keep this then tools would
> >> need to parse both (potentially unpacking a lot of the commit message
> >> object, it can be quite big in some cases...).
>
> On the contrary: putting it in the commit message is a way to
> experiment with the workflow without changing the object format at
> all.
>
> I don't think we should underestimate the value of that ability.
>
> I don't understand what you're referring to by parsing both.  Are you
> saying that if the experiment proves successful, we wouldn't be able
> to migrate completely to a new format?  That sounds worrying to me ---
> I want the ability to experiment and to act on what we learn from an
> experiment, including when it touches on formats.
>
> Thanks,
> Jonathan


Re: [PATCH] technical doc: add a design doc for the evolve command

2018-11-20 Thread Jonathan Nieder
Stefan Xenos wrote:
> On Tue, Nov 20, 2018 at 1:43 AM Ævar Arnfjörð Bjarmason wrote:

>> I think it sounds better to just make it, in the header:
>>
>> x-evolve-pt content
>> x-evolve-pt obsolete
>> x-evolve-pt origin
>>
>> Where "pt = parent-type", we could of course spell that out too, but in
>> this case "x-evolve-pt" is the exact same number of bytes as
>> "parent-type", so nobody can object that it takes more space:)
>>
>> We'd then carry some documentation where we say everything except "x-*-"
>> is reserved, and that we'd like to know about new "*" there before it's
>> used, so it can be documented.
[...]
>  that should
> probably be the subject of a separate proposal (who owns the content
> of a namespace, what is the process for adding a new namespace or a
> new attribute within a namespace, what order should the header
> attributes appear in, what problem is namespacing there to solve, when
> do we use a namespaced attribute versus a "reserved" attribute, etc.).

Agreed.  There are reasons that I prefer not to go in this direction,
but regardless, it would be the subject of a separate thread if you want
to pursue it.

>> Putting it in the commit message just sounds like a hack around not
>> having namespaced headers. If we'd like to keep this then tools would
>> need to parse both (potentially unpacking a lot of the commit message
>> object, it can be quite big in some cases...).

On the contrary: putting it in the commit message is a way to
experiment with the workflow without changing the object format at
all.

I don't think we should underestimate the value of that ability.
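
As a rough illustration of what "in the commit message" could look like during such an experiment, here is a minimal Python sketch. The `parent-type:` line syntax is purely hypothetical (nothing in this thread defines it), and `parent_types_from_message` is an invented helper name.

```python
from typing import List

# Hypothetical key: the thread does not define the commit-message syntax.
KEY = "parent-type:"


def parent_types_from_message(message: str) -> List[str]:
    """Collect parent types from lines of the form 'parent-type: <type>'."""
    types = []
    for line in message.splitlines():
        if line.lower().startswith(KEY):
            types.append(line[len(KEY):].strip())
    return types


msg = "Amend: fix typo\n\nparent-type: content\nparent-type: obsolete\n"
print(parent_types_from_message(msg))  # ['content', 'obsolete']
```

Because this only reads the message text, an experimental format like this can change freely without touching how commit objects are parsed or checked.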

I don't understand what you're referring to by parsing both.  Are you
saying that if the experiment proves successful, we wouldn't be able
to migrate completely to a new format?  That sounds worrying to me ---
I want the ability to experiment and to act on what we learn from an
experiment, including when it touches on formats.

Thanks,
Jonathan


Re: [PATCH] technical doc: add a design doc for the evolve command

2018-11-20 Thread Stefan Xenos
> If a merge has been cherry-picked we cannot update it as we don't record
> which parent was used for the pick, however it is probably not a problem
> in practice - I think it is unusual to amend merges.

I've read and reread that sentence several times and don't fully
understand it. Could you elaborate?

It sounds scary, though. With the evolve command, amending merges will
need to be supported. If you create a merge and then amend one of its
parent commits, the evolve command will need to rebase the merge and
point one or both parents to the replacement instead.
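
A minimal sketch of that parent rewrite, assuming the obsolescence information has already been flattened into a simple "old commit -> replacement" mapping (the `replaced_by` dict and both helper names are illustrative, not part of the proposal):

```python
from typing import Dict, List


def resolve(commit: str, replaced_by: Dict[str, str]) -> str:
    """Follow obsolescence links until reaching a non-obsolete commit."""
    seen = set()
    while commit in replaced_by:
        if commit in seen:
            raise ValueError("cycle in obsolescence map")
        seen.add(commit)
        commit = replaced_by[commit]
    return commit


def new_parents(parents: List[str], replaced_by: Dict[str, str]) -> List[str]:
    """Return the parent list a rebased merge should point at."""
    return [resolve(p, replaced_by) for p in parents]


# Parent A was amended twice; parent B was never touched.
replaced_by = {"A": "A2", "A2": "A3"}
print(new_parents(["A", "B"], replaced_by))  # ['A3', 'B']
```

The real command would also have to redo the merge itself on top of the new parents; this only shows which parents the result should reference.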

  - Stefan
On Tue, Nov 20, 2018 at 5:03 AM Phillip Wood wrote:
>
> On 15/11/2018 00:55, sxe...@google.com wrote:
> > From: Stefan Xenos 
> >
> > +Obsolescence across cherry-picks
> > +
> > +By default the evolve command will treat cherry-picks and squash merges as being
> > +completely separate from the original. Further amendments to the original commit
> > +will have no effect on the cherry-picked copy. However, this behavior may not be
> > +desirable in all circumstances.
> > +
> > +The evolve command may at some point support an option to look for cases where
> > +the source of a cherry-pick or squash merge has itself been amended, and
> > +automatically apply that same change to the cherry-picked copy. In such cases,
> > +it would traverse origin edges rather than ignoring them, and would treat a
> > +commit with origin edges as being obsolete if any of its origins were obsolete.
>
> If a merge has been cherry-picked we cannot update it as we don't record
> which parent was used for the pick, however it is probably not a problem
> in practice - I think it is unusual to amend merges.
>
> Best Wishes
>
> Phillip


Re: [PATCH] technical doc: add a design doc for the evolve command

2018-11-20 Thread Stefan Xenos
> This explains why we have 'origin' fields in the meta commits, it might
> be worth putting a forward reference or note earlier on to explain why
> recording the origin is useful. (I didn't find gerrit needs it very
> convincing on its own but it is actually more general than gerrit's
> specific use case)

I'll add the forward reference.

TBH, gerrit is the main reason I added it - so I'm interested in why
you didn't find the gerrit use-case convincing. Can you elaborate? (If
there's some other way around the gerrit requirement, we might not
need the origin parents)

> Should this be meta/mychange:refs/for/master or have I missed something?

It should be metas/mychange/ It's already fixed in the v2 patch.

I really wanted to use the namespace "changes", but gerrit is
squatting on that. I tried "change", but that breaks the plural naming
scheme and may get confused with gerrit's namespace, so I settled on
"metas".

> I think it would make sense to have this next to the sections on commit
> --amend and merge I was wondering what about rebase when I was reading
> those sections.

Will do.

> I'm a bit confused why it is creating a meta ref per commit rather than
> one for the current branch.

I tried to explain that later in the doc. meta refs serve two purposes
- they act as stable names for changes (or at least the commits at the
head of each change) and they point to the metacommits that are
currently in use. For both purposes, we need a ref per commit. For the
"stable name" case, this should be obvious - something that just
points to a branch couldn't provide different names for each commit on
that branch. The metacommit case is less obvious - the set of
metacommits for one change aren't connected to the metacommits for any
other change. The "parents" of a metacommit are older versions of the
same change. They don't point to the metacommits from the parent
change. That means that there is no single ref we could create for a
branch that would reach all the necessary metacommits.
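
A small Python model of the second point, under the assumption that a metacommit's parents are only older versions of the same change. The metacommit names and the two ref names under refs/metas are invented for the example.

```python
from typing import Dict, List, Set

# metacommit -> older metacommits of the same change (its "parents")
meta_parents: Dict[str, List[str]] = {
    "m1.v2": ["m1.v1"],
    "m1.v1": [],
    "m2.v2": ["m2.v1"],
    "m2.v1": [],
}

# One ref per change, each pointing at that change's newest metacommit.
refs = {"refs/metas/change1": "m1.v2", "refs/metas/change2": "m2.v2"}


def reachable(start: str) -> Set[str]:
    """Return every metacommit reachable from `start` via parent links."""
    todo, seen = [start], set()
    while todo:
        m = todo.pop()
        if m not in seen:
            seen.add(m)
            todo.extend(meta_parents[m])
    return seen


for ref, tip in refs.items():
    print(ref, sorted(reachable(tip)))
```

The two reachable sets never overlap, so dropping either ref would leave that change's metacommits unreferenced even if both changes sit on the same branch.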

> I got the impression they had put quite a lot of effort
> into having evolve automatically run and resolve divergences when
> pulling and rebasing, is there a long term plan for git to do the same?

IMO, we should add anything to the plan if doing so improves the
workflow of our users... but it sounds like you're referring to
mercurial features I've never used. Could you point me to specific
docs on the feature you want and/or make a concrete suggestion about
how it might work?

I never use pull so it slipped my mind. It would probably make sense
to have the option of doing an automatic evolve after pull (actually,
once the feature is stable, most users would probably want it to be
the default). How do you think it should be triggered? "git pull
--evolve"? or perhaps "git pull --rebase=evolve"? We should probably
also introduce a new "evolve" enum value for the branch.<name>.rebase
config value. I'll use "--evolve" for now. It may make sense to add
"--evolve" to every git command that performs an automatic evolve when
done.

> What happens if the original commit are currently checked out with local
> changes?

For a start, I'll probably just display an error message if the
current working tree is dirty ("Please stash"). Long term, I'd like it
to work like rebase --autostash. It should stash your changes, do the
evolve, return to the evolved version of the original change, and
reapply the stash. I'll add this to the doc.

> Can I suggest using refs/remote//metas. I

Ooh! Great idea! I'll update the doc.

> I think this could be useful (although I guess you can get the branches
> you've been working on recently from HEAD's reflog quite easily).

The changes list is different from the reflog. It's a list of all your
unsubmitted patches - regardless of their age or what branch they're
on. They may not have corresponding branches: you may have been
working on them with a detached head, or there may be multiple changes
on the same branch. You might not have visited them recently, in which
case they wouldn't be in the reflog at all. You may have reset to an
older version of the change, in which case they'd be in the reflog but
the reflog and change point to different places. If you've used gerrit
before, the "changes" list will contain pretty much the same content
as the gerrit dashboard, except that it works locally.

>> +Much like a merge conflict, divergence is a situation that requires user
>> +intervention to resolve. The evolve command will stop when it encounters
>> +divergence and prompt the user to resolve the problem. Users can solve the
>> +problem in several ways:
>> +
>> +- Discard one of the changes (by deleting its change branch).
>> +- Merge the two changes (producing a single change branch).
>
>I assume this won't create merge commits for the actual commits though,
>just merge the meta branches and create some new commits that are each
>the result of something like 'merge-recursive original-commit
>our-new-version 

Re: [PATCH] technical doc: add a design doc for the evolve command

2018-11-20 Thread Stefan Xenos
This sounds like a proposal for general namespacing. I like it - that
would pave the way for other header extensions - but that should
probably be the subject of a separate proposal (who owns the content
of a namespace, what is the process for adding a new namespace or a
new attribute within a namespace, what order should the header
attributes appear in, what problem is namespacing there to solve, when
do we use a namespaced attribute versus a "reserved" attribute, etc.).

x-evolve-pt seems reasonable to me. If you're keen on this and want to
document the namespacing proposal, I'll conform to it. However, if we
don't have formal rules for namespaces in place yet it might be better
to avoid the use of an x- prefix for now, just in case I accidentally
squat on a name that breaks whatever namespacing rules we eventually
come up with.

Since we're talking bytes, a more compact representation of
parent-type could use single-letter codes:
x-evolve-pt c r o
(where c=content, r=replace/obsolete, o=origin)
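
For illustration only, a Python sketch of encoding and decoding that compact form. The single-line layout and the `encode`/`decode` names are assumptions made for this example; nothing here is an agreed format.

```python
from typing import List

# Letter codes as listed above: c=content, r=replace/obsolete, o=origin.
CODES = {"c": "content", "r": "obsolete", "o": "origin"}
HEADER = "x-evolve-pt"


def decode(header_line: str) -> List[str]:
    """Expand 'x-evolve-pt c r o' into one parent type per parent."""
    key, *codes = header_line.split()
    if key != HEADER:
        raise ValueError(f"not an {HEADER} header: {header_line!r}")
    return [CODES[c] for c in codes]


def encode(types: List[str]) -> str:
    """Build the compact header line from a list of parent types."""
    reverse = {name: code for code, name in CODES.items()}
    return HEADER + " " + " ".join(reverse[t] for t in types)


print(decode("x-evolve-pt c r o"))    # ['content', 'obsolete', 'origin']
print(encode(["content", "origin"]))  # x-evolve-pt c o
```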

  - Stefan
On Tue, Nov 20, 2018 at 1:43 AM Ævar Arnfjörð Bjarmason wrote:
>
>
> On Tue, Nov 20 2018, Jonathan Nieder wrote:
>
> > Ævar Arnfjörð Bjarmason wrote:
> >> On Thu, Nov 15 2018, sxe...@google.com wrote:
> >
> >>> +Parent-type
> >>> +---
> >>> +The “parent-type” field in the commit header identifies a commit as a
> >>> +meta-commit and indicates the meaning for each of its parents. It is never
> >>> +present for normal commits.
> > [...]
> >> I think it's worth pointing out for those that are rusty on commit
> >> object details (but I checked) is that the reason for it not being:
> >>
> >> tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
> >> parent aa7ce55545bf2c14bef48db91af1a74e2347539a
> >> parent-type content
> >> parent d64309ee51d0af12723b6cb027fc9f195b15a5e9
> >> parent-type obsolete
> >> parent 7e1bbcd3a0fa854a7a9eac9bf1eea6465de98136
> >> parent-type origin
> >> author Stefan Xenos  1540841596 -0700
> >> committer Stefan Xenos  1540841596 -0700
> >>
> >> Which would be easier to read, is that we're very sensitive to the order
> >> of the first few fields (tree -> parent -> author -> committer) and fsck
> >> will error out if we interjected a new field.
> >
> > By the way, in the spirit of limiting the initial scope, I wonder
> > whether the parent-type fields can be stored in the commit message
> > initially.
> >
> > Elsewhere in this thread it was mentioned that the parent-type is a
> > field to allow tools like "git fsck" to understand the meaning of
> > these parent relationships (for example, to forbid a commit
> > referencing a meta-commit).  The same could be done using special
> > commit message text, though.
> >
> > The advantage of such an approach would be that we could experiment
> > without changing the official object format at all.  If experiments
> > revealed a different set of information to store, we could update the
> > format without having to maintain the memory of the older format in
> > "git fsck"'s understanding of commit object fields.  So even though I
> > think that in the end we would want to put this information in the
> > commit object header, I'm tempted to suspect that the benefits of
> > putting it in the commit message to start outweigh the costs (in
> > particular, of having to migrate to another format later).
>
> I think it sounds better to just make it, in the header:
>
> x-evolve-pt content
> x-evolve-pt obsolete
> x-evolve-pt origin
>
> Where "pt = parent-type", we could of course spell that out too, but in
> this case "x-evolve-pt" is the exact same number of bytes as
> "parent-type", so nobody can object that it takes more space:)
>
> We'd then carry some documentation where we say everything except "x-*-"
> is reserved, and that we'd like to know about new "*" there before it's
> used, so it can be documented.
>
> Putting it in the commit message just sounds like a hack around not
> having namespaced headers. If we'd like to keep this then tools would
> need to parse both (potentially unpacking a lot of the commit message
> object, it can be quite big in some cases...).


Re: [PATCH] technical doc: add a design doc for the evolve command

2018-11-20 Thread Stefan Xenos
> I was trying to see if this is something we can leave out to limit
> the initial scope.

Oh, in that case, "yes". :-) If there's a need to cut something,
origin parents would be a viable candidate.

I was thinking that this file could document the final goal so that if
anyone else wanted to contribute to the implementation, we would be
heading in the same direction. It seems reasonable that an early
implementation may omit origin parents. Since the actual
implementation will lag behind the spec, I'll add a status section to
the top of the document where we can describe the delta between plan
and implementation.

Also, I'm now convinced we're talking about the same thing. :-)

> > Are you claiming that this is undesirable, or are you claiming that
> > this could be accomplished without origin parents?
>
> I was trying to see if this is something we can leave out to limit
> the initial scope.


Re: [PATCH] technical doc: add a design doc for the evolve command

2018-11-20 Thread Phillip Wood

On 15/11/2018 00:55, sxe...@google.com wrote:
> From: Stefan Xenos
>
> +Obsolescence across cherry-picks
> +
> +By default the evolve command will treat cherry-picks and squash merges as being
> +completely separate from the original. Further amendments to the original commit
> +will have no effect on the cherry-picked copy. However, this behavior may not be
> +desirable in all circumstances.
> +
> +The evolve command may at some point support an option to look for cases where
> +the source of a cherry-pick or squash merge has itself been amended, and
> +automatically apply that same change to the cherry-picked copy. In such cases,
> +it would traverse origin edges rather than ignoring them, and would treat a
> +commit with origin edges as being obsolete if any of its origins were obsolete.


If a merge has been cherry-picked we cannot update it as we don't record 
which parent was used for the pick, however it is probably not a problem 
in practice - I think it is unusual to amend merges.
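
For reference, the optional behaviour in the quoted section (treat a commit with origin edges as obsolete if any of its origins is obsolete) could be sketched like this in Python. The function and its arguments are invented for the example, and it assumes origin edges form a DAG, as commit references always do.

```python
from typing import Dict, List, Set


def is_obsolete(commit: str,
                directly_obsolete: Set[str],
                origins: Dict[str, List[str]],
                follow_origins: bool) -> bool:
    """Decide whether `commit` counts as obsolete.

    With follow_origins=False (the default behaviour in the doc) origin
    edges are ignored; with True they are traversed.
    """
    if commit in directly_obsolete:
        return True
    if not follow_origins:
        return False
    return any(is_obsolete(o, directly_obsolete, origins, True)
               for o in origins.get(commit, []))


origins = {"pick": ["orig"]}   # "pick" was cherry-picked from "orig"
directly_obsolete = {"orig"}   # "orig" has since been amended

print(is_obsolete("pick", directly_obsolete, origins, follow_origins=False))  # False
print(is_obsolete("pick", directly_obsolete, origins, follow_origins=True))   # True
```

This only answers "is it obsolete?". Actually updating a cherry-picked merge would additionally need the parent number discussed above.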


Best Wishes

Phillip


Re: [PATCH] technical doc: add a design doc for the evolve command

2018-11-20 Thread Phillip Wood

On 20/11/2018 12:18, Phillip Wood wrote:
> On 15/11/2018 00:55, sxe...@google.com wrote:
>> From: Stefan Xenos
>> +Divergence
>> +----------
>> +From the user’s perspective, two changes are divergent if they both ask for
>> +different replacements to the same commit. More precisely, a target commit is
>> +considered divergent if there is more than one commit at the head of a change in
>> +refs/metas that leads to the target commit via an unbroken chain of “obsolete”
>> +edges.
>> +
>> +Much like a merge conflict, divergence is a situation that requires user
>> +intervention to resolve. The evolve command will stop when it encounters
>> +divergence and prompt the user to resolve the problem. Users can solve the
>> +problem in several ways:
>> +
>> +- Discard one of the changes (by deleting its change branch).
>> +- Merge the two changes (producing a single change branch).
>
> I assume this won't create merge commits for the actual commits though,
> just merge the meta branches and create some new commits that are each
> the result of something like 'merge-recursive original-commit
> our-new-version their-new-version'



That should have been

merge-recursive original-commit^ -- our-new-version their-new-version
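
The divergence definition quoted above can be sketched in a few lines of Python. This is a toy model: `heads` stands in for the commits at the head of each change in refs/metas, and `obsoletes` maps a commit to the commits it replaces (the "obsolete" edges). Both names are made up for the example.

```python
from collections import defaultdict
from typing import Dict, List, Set


def divergent_commits(heads: Dict[str, str],
                      obsoletes: Dict[str, List[str]]) -> Set[str]:
    """Return commits reached from more than one change head via
    unbroken chains of obsolete edges."""
    reached_by = defaultdict(set)
    for head in heads.values():
        todo, seen = list(obsoletes.get(head, [])), set()
        while todo:
            c = todo.pop()
            if c in seen:
                continue
            seen.add(c)
            reached_by[c].add(head)
            todo.extend(obsoletes.get(c, []))
    return {c for c, hs in reached_by.items() if len(hs) > 1}


# Two changes each claim to replace A1.
heads = {"ours": "A2", "theirs": "A3"}
obsoletes = {"A2": ["A1"], "A3": ["A1"]}
print(divergent_commits(heads, obsoletes))  # {'A1'}
```

In that example A1 plays the role of original-commit, and A2 and A3 are the two new versions that the three-way merge above would combine.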

Best Wishes

Phillip



Re: [PATCH] technical doc: add a design doc for the evolve command

2018-11-20 Thread Phillip Wood

Hi Stefan

Thanks for working on this, I think it could be a really useful addition
to git. I'd echo Gábor's comments about making commands descriptive and
easy for the user to find. Git has aliases, accepts abbreviated option
names and has shell completion, so I don't think typing is really such a
problem. From your reply it looks like you've taken those concerns on
board. I've got some more comments below.


On 15/11/2018 00:55, sxe...@google.com wrote:

From: Stefan Xenos 

This document describes what an obsolescence graph for
git would look like, the behavior of the evolve command,
and the changes planned for other commands.

Signed-off-by: Stefan Xenos 
---
  Documentation/technical/evolve.txt | 885 +
  1 file changed, 885 insertions(+)
  create mode 100644 Documentation/technical/evolve.txt

diff --git a/Documentation/technical/evolve.txt b/Documentation/technical/evolve.txt
new file mode 100644
index 00..88470eada3
--- /dev/null
+++ b/Documentation/technical/evolve.txt
@@ -0,0 +1,885 @@
+Git Obsolescence Graph
+======================
+
+Objective
+---------
+Track the edits to a commit over time in an obsolescence graph.
+
+Background
+----------
+Imagine you have three dependent changes up for review and you receive feedback
+that requires editing all three changes. While you're editing one, more feedback
+arrives on one of the others. What do you do?
+
+The evolve command is a convenient way to work with chains of commits that are
+under review. Whenever you rebase or amend a commit, the repository remembers
+that the old commit is obsolete and has been replaced by the new one. Then, at
+some point in the future, you can run "git evolve" and the correct sequence of
+rebases will occur in the correct order such that no commit has an obsolete
+parent.
+
+Part of making the "evolve" command work involves tracking the edits to a commit
+over time, which is why we need an obsolescence graph. However, the obsolescence
+graph will also bring other benefits:
+
+- Users can view the history of a commit directly (the sequence of amends and
+  rebases it has undergone, orthogonal to the history of the branch it is on).
+- It will be possible to quickly locate and list all the changes the user
+  currently has in progress.
+- It can be used as part of other high-level commands that combine or split
+  changes.
+- It can be used to decorate commits (in git log, gitk, etc) that are either
+  obsolete or are the tip of a work in progress.
+- By pushing and pulling the obsolescence graph, users can collaborate more
+  easily on changes-in-progress. This is better than pushing and pulling the
+  changes themselves since the obsolescence graph can be used to locate a more
+  specific merge base, allowing for better merges between different versions of
+  the same change.
+- It could be used to correctly rebase local changes and other local branches
+  after running git-filter-branch.
+- It can replace the change-id footer used by gerrit.
+
+Similar technologies
+
+There are some other technologies that address the same end-user problem.
+
+Rebase -i can be used to solve the same problem, but users can't easily switch
+tasks midway through an interactive rebase or have more than one interactive
+rebase going on at the same time. It can't handle the case where you have
+multiple changes sharing the same parent when that parent needs to be rebased
+and won't let you collaborate with others on resolving a complicated interactive
+rebase. You can think of rebase -i as a top-down approach and the evolve command
+as the bottom-up approach to the same problem.
+
+Several patch queue managers have been built on top of git (such as topgit,
+stgit, and quilt). They address the same user need. However they also rely on
+state managed outside git that needs to be kept in sync. Such state can be
+easily damaged when running a git native command that is unaware of the patch
+queue. They also typically require an explicit initialization step to be done by
+the user which creates workflow problems.
+
+Replacements (refs/replace) are superficially similar to obsolescences in that
+they describe that one commit should be replaced by another. However, they
+differ in both how they are created and how they are intended to be used.
+Obsolescences are created automatically by the commands a user runs, and they
+describe the user’s intent to perform a future rebase. Obsolete commits still
+appear in branches, logs, etc like normal commits (possibly with an extra
+decoration that marks them as obsolete). Replacements are typically created
+explicitly by the user, they are meant to be kept around for a long time, and
+they describe a replacement to be applied at read-time rather than as the input
+to a future operation. When a replaced commit is queried, it is typically hidden
+and swapped out with its replacement as though the replacement has already
+occurred.
+
+Goals

Re: [PATCH] technical doc: add a design doc for the evolve command

2018-11-20 Thread Ævar Arnfjörð Bjarmason


On Tue, Nov 20 2018, Jonathan Nieder wrote:

> Ævar Arnfjörð Bjarmason wrote:
>> On Thu, Nov 15 2018, sxe...@google.com wrote:
>
>>> +Parent-type
>>> +---
>>> +The “parent-type” field in the commit header identifies a commit as a
>>> +meta-commit and indicates the meaning for each of its parents. It is never
>>> +present for normal commits.
> [...]
>> I think it's worth pointing out for those that are rusty on commit
>> object details (but I checked) is that the reason for it not being:
>>
>> tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
>> parent aa7ce55545bf2c14bef48db91af1a74e2347539a
>> parent-type content
>> parent d64309ee51d0af12723b6cb027fc9f195b15a5e9
>> parent-type obsolete
>> parent 7e1bbcd3a0fa854a7a9eac9bf1eea6465de98136
>> parent-type origin
>> author Stefan Xenos  1540841596 -0700
>> committer Stefan Xenos  1540841596 -0700
>>
>> Which would be easier to read, is that we're very sensitive to the order
>> of the first few fields (tree -> parent -> author -> committer) and fsck
>> will error out if we interjected a new field.
>
> By the way, in the spirit of limiting the initial scope, I wonder
> whether the parent-type fields can be stored in the commit message
> initially.
>
> Elsewhere in this thread it was mentioned that the parent-type is a
> field to allow tools like "git fsck" to understand the meaning of
> these parent relationships (for example, to forbid a commit
> referencing a meta-commit).  The same could be done using special
> commit message text, though.
>
> The advantage of such an approach would be that we could experiment
> without changing the official object format at all.  If experiments
> revealed a different set of information to store, we could update the
> format without having to maintain the memory of the older format in
> "git fsck"'s understanding of commit object fields.  So even though I
> think that in the end we would want to put this information in the
> commit object header, I'm tempted to suspect that the benefits of
> putting it in the commit message to start outweigh the costs (in
> particular, of having to migrate to another format later).

I think it sounds better to just make it, in the header:

x-evolve-pt content
x-evolve-pt obsolete
x-evolve-pt origin

Where "pt = parent-type". We could of course spell that out too, but in
this case "x-evolve-pt" is the exact same number of bytes as
"parent-type", so nobody can object that it takes more space :)

We'd then carry some documentation where we say everything except "x-*-"
is reserved, and that we'd like to know about new "*" there before it's
used, so it can be documented.

Putting it in the commit message just sounds like a hack around not
having namespaced headers. If we'd like to keep this then tools would
need to parse both (potentially unpacking a lot of the commit message
object, it can be quite big in some cases...).
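
As an illustration of the parsing argument, here is a minimal Python
sketch (not git's actual parser). It assumes, hypothetically, that each
"x-evolve-pt" header appears after "committer" and describes the parent
at the same position. The name parse_parent_types and the placeholder
identity in the sample are invented for this example. Because headers
end at the first blank line, the message body is never scanned:

```python
def parse_parent_types(raw_commit: str) -> list[tuple[str, str]]:
    """Pair each parent hash with its x-evolve-pt value, by position."""
    # Headers end at the first blank line; the message is never inspected.
    header_block, _, _message = raw_commit.partition("\n\n")
    parents, types = [], []
    for line in header_block.split("\n"):
        key, _, value = line.partition(" ")
        if key == "parent":
            parents.append(value)
        elif key == "x-evolve-pt":  # assumed header name from this thread
            types.append(value)
    if types and len(types) != len(parents):
        raise ValueError("expected one x-evolve-pt header per parent")
    return list(zip(parents, types))


# Sample object text; the author/committer identity is a placeholder.
EXAMPLE = "\n".join([
    "tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904",
    "parent aa7ce55545bf2c14bef48db91af1a74e2347539a",
    "parent d64309ee51d0af12723b6cb027fc9f195b15a5e9",
    "parent 7e1bbcd3a0fa854a7a9eac9bf1eea6465de98136",
    "author A U Thor <author@example.com> 1540841596 -0700",
    "committer A U Thor <author@example.com> 1540841596 -0700",
    "x-evolve-pt content",
    "x-evolve-pt obsolete",
    "x-evolve-pt origin",
    "",
    "x-evolve-pt text in the message body is ignored",
])

PARENT_TYPES = parse_parent_types(EXAMPLE)
```

An ordinary commit yields an empty list here, which matches the rule
that the field is never present for normal commits.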


Re: [PATCH] technical doc: add a design doc for the evolve command

2018-11-19 Thread Jonathan Nieder
Ævar Arnfjörð Bjarmason wrote:
> On Thu, Nov 15 2018, sxe...@google.com wrote:

>> +Parent-type
>> +---
>> +The “parent-type” field in the commit header identifies a commit as a
>> +meta-commit and indicates the meaning for each of its parents. It is never
>> +present for normal commits.
[...]
> I think it's worth pointing out, for those that are rusty on commit
> object details (but I checked), that the reason for it not being:
>
> tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
> parent aa7ce55545bf2c14bef48db91af1a74e2347539a
> parent-type content
> parent d64309ee51d0af12723b6cb027fc9f195b15a5e9
> parent-type obsolete
> parent 7e1bbcd3a0fa854a7a9eac9bf1eea6465de98136
> parent-type origin
> author Stefan Xenos  1540841596 -0700
> committer Stefan Xenos  1540841596 -0700
>
> Which would be easier to read, is that we're very sensitive to the order
> of the first few fields (tree -> parent -> author -> committer) and fsck
> will error out if we interjected a new field.

By the way, in the spirit of limiting the initial scope, I wonder
whether the parent-type fields can be stored in the commit message
initially.

Elsewhere in this thread it was mentioned that the parent-type is a
field to allow tools like "git fsck" to understand the meaning of
these parent relationships (for example, to forbid a commit
referencing a meta-commit).  The same could be done using special
commit message text, though.

The advantage of such an approach would be that we could experiment
without changing the official object format at all.  If experiments
revealed a different set of information to store, we could update the
format without having to maintain the memory of the older format in
"git fsck"'s understanding of commit object fields.  So even though I
think that in the end we would want to put this information in the
commit object header, I'm tempted to suspect that the benefits of
putting it in the commit message to start outweigh the costs (in
particular, of having to migrate to another format later).

Thanks,
Jonathan


Re: [PATCH] technical doc: add a design doc for the evolve command

2018-11-19 Thread Jonathan Nieder
Hi,

Stefan Xenos wrote:

> But since several comments have focused on the commands, let's brainstorm!
>
> Here's some options that occur to me:
>
> 1. Three commands: evolve + change + obslog as top-level commands (the
> current proposal). Change gets a bunch of subcommands for manipulating
> the change graph and metas/ namespace.
>
> 2. All top-level: evolve + lschange + mkchange + rmchange +
> prunechange + obslog. None of the commands get subcommands.
>
> 3. Everything under change: "change evolve", "change obslog" become
> new change subcommands.
>
> 4. Evolve as a rebase argument, obslog as a log argument. Use "rebase
> --evolve" to initiate evolve and use "log --obslog" to initiate
> obslog. The change command stays as it is in the proposal.
>
> 5. Two commands: evolve + change. obslog becomes a "log" argument.
>
> Note that there will be more "evolve"-specific arguments in the
> future. For most transformations that evolve uses, there will be a
> matching argument to enable or disable that transformation and as we
> add transformations we'll also add arguments.
>
> If we go with option 3, it would make for a very cluttered help page.
> For example, if you're looking for information on how to use evolve,
> you'd have to scroll past a bunch of formatting information that is
> only relevant to obslog... and if you're looking for the formatting
> options, you'd have to scroll past a bunch of
> transformation-enablement options that are only relevant to evolve.
>
> Based on your log feedback above, I'm thinking #5 may make sense.

(5) sounds good to me, too.  Thanks, both, for your thoughtfulness.

Jonathan


Re: [PATCH] technical doc: add a design doc for the evolve command

2018-11-19 Thread Junio C Hamano
Stefan Xenos  writes:

>> But it is not immediately obvious to me how it would help to have
>> "Z was cherry-picked from W" in "evolve".
>
> The evolve command would use it for handling the
> obsolescence-over-cherry-pick (OOCP) feature. If someone cherry-picks
> a commit and then amends the original, the evolve command would give
> you the option of applying the same amendment to the cherry-picked
> version.

Yeah, I missed that case when I was formulating my thought on how we
can start smaller and simpler to get the ball rolling.  And for
"this commit and anything built on top of it need to be adjusted
since that other commit, which this commit was made by cherry-picking
it, has been obsoleted" to work, the "origin" commit pointed at by
the meta commit must be made available.

> Are you claiming that this is undesirable, or are you claiming that
> this could be accomplished without origin parents?

I was trying to see if this is something we can leave out to limit
the initial scope.


Re: [PATCH] technical doc: add a design doc for the evolve command

2018-11-19 Thread Stefan Xenos
> Subcommand names and --long-options are just as descriptive.

Yeah, I'm convinced about the descriptiveness. If you check the latest
version of the patch, I've already updated the "change" command to use
subcommands rather than lettered arguments.

> If a user wants to deal with reflogs, then there is 'git help reflog'

I guess it depends on whether you prefer having a single big help page
(risk: user may see irrelevant content), or a number of small help
pages (risk: user may need to follow cross-references). My guess is
that we should probably try to hit the sweet spot that minimizes the
amount of irrelevant information on a help page, the probability of
needing to follow a cross-reference to understand context, and the
amount of content that needs to be duplicated between pages.

But assuming we add a bunch of formatting options to obslog that match
log, it may make sense to just have an "--obslog" argument to log.

> I think 'git obslog' should allow the same when showing the log of a change.

Sounds good. We should probably also change the default formatting for
the obslog command to be some sort of description of what changed
since the commit message will probably be very similar for every
entry. I'll update the proposal to mention formatting options once we
sort out where obslog will actually live.

> By adding several new commands users will have to consult the manpages
> of 'evolve', 'change', 'obslog', etc., even though the commands and
> the concepts are closely related.

I'm not sure that's the case. There is some common background material
you'd need to understand in order to use any of those commands ("what
are changes?"), but the same could be said of pretty much any git
command ("what are commits?"). Assuming the user knows what a change
is, I'm pretty sure I could write a self-contained description for
evolve, change, or obslog that doesn't require cross-referencing any
of the other commands... and the evolve command could probably be
understood even without understanding changes.

But since several comments have focused on the commands, let's brainstorm!

Here's some options that occur to me:
1. Three commands: evolve + change + obslog as top-level commands (the
current proposal). Change gets a bunch of subcommands for manipulating
the change graph and metas/ namespace.
2. All top-level: evolve + lschange + mkchange + rmchange +
prunechange + obslog. None of the commands get subcommands.
3. Everything under change: "change evolve", "change obslog" become
new change subcommands.
4. Evolve as a rebase argument, obslog as a log argument. Use "rebase
--evolve" to initiate evolve and use "log --obslog" to initiate
obslog. The change command stays as it is in the proposal.
5. Two commands: evolve + change. obslog becomes a "log" argument.

Note that there will be more "evolve"-specific arguments in the
future. For most transformations that evolve uses, there will be a
matching argument to enable or disable that transformation and as we
add transformations we'll also add arguments.

If we go with option 3, it would make for a very cluttered help page.
For example, if you're looking for information on how to use evolve,
you'd have to scroll past a bunch of formatting information that is
only relevant to obslog... and if you're looking for the formatting
options, you'd have to scroll past a bunch of
transformation-enablement options that are only relevant to evolve.

Based on your log feedback above, I'm thinking #5 may make sense.

  - Stefan
On Mon, Nov 19, 2018 at 7:55 AM SZEDER Gábor  wrote:
>
> On Sat, Nov 17, 2018 at 12:30:58PM -0800, Stefan Xenos wrote:
> > > Further, I see that this document tries to suggest a proliferation
> > > of new commands
> >
> > It does. Let me explain a bit about the reasoning behind this
> > breakdown of commands. My main priority was to keep the commands as
> > consistent with existing git commands as possible. Secondary goals
> > were:
> > - Mapping a single intent to a single command where possible makes it
> > easier to explain what that command does.
> > - Having lots of simpler commands as opposed to a few complex commands
> > makes them easier to type.
> > - Command names are more descriptive than lettered arguments.
>
> Subcommand names and --long-options are just as descriptive.
>
>
> > Git already has a "log" and "reflog" command for displaying two
> > different types of log,
>
> No, there is 'git log' for displaying logs in various ways, and there
> is 'git reflog' which not only displays reflogs, but also operates on
> them, e.g. deletes specific reflog entries or expires old entries,
> necessitating and justifying the dedicated 'git reflog' command.
>
> > so putting "obslog" on its own command makes
> > it consistent with the existing logs, easier to type, and keeps the
> > command simple.
>
> > - We could turn "obslog" into an extra option on the "log" command,
> > but that would be inconsistent with reflog and would complicate the
> > 

Re: [PATCH] technical doc: add a design doc for the evolve command

2018-11-19 Thread Jonathan Nieder
Hi,

Stefan Xenos wrote:

> Let's explore the "when" question. I think there's a compelling reason
> to add them as soon as possible - namely, gerrit. If and when we come
> to some sort of agreement on this proposal, gerrit could start adding
> tooling to understand change graphs as an alternative to change-id
> footers. That work could proceed in parallel with the work in git-core
> once we know what the data structures look like, but it can't start
> until the data structures are sufficient to address all the use cases
> that were previously covered by change-id. At the moment, meta-commits
> without origin parents would not cover all of gerrit's use-cases so
> this would block adoption in gerrit.

By this, are you referring to the "Cherry-picks" list in the Gerrit
web UI?

Thanks,
Jonathan


Re: [PATCH] technical doc: add a design doc for the evolve command

2018-11-19 Thread Stefan Xenos
> But it is not immediately obvious to me how it would help to have
> "Z was cherry-picked from W" in "evolve".

The evolve command would use it for handling the
obsolescence-over-cherry-pick (OOCP) feature. If someone cherry-picks
a commit and then amends the original, the evolve command would give
you the option of applying the same amendment to the cherry-picked
version.

Are you claiming that this is undesirable, or are you claiming that
this could be accomplished without origin parents?

> the developer wanted to use the change between W^ and W in a context
> that is quite different from

I guess that depends on the reason for doing the cherry-pick. A very
common scenario I see for cherry-picks is cherry-picking a bugfix from
a development branch to a maintenance branch. In that situation, if
there was a better version of the original bugfix you'd also want to
update the cherry-pick on the maintenance branch to use the better
version of the fix. That's what OOCP does.

> make no sense to "evolve" anything that was built on top of W on top of Z.

Agreed. But that's not what evolve would do with the origin edges. It
would be looking for amendments of W, not children of W.
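
To make the distinction concrete, here is a minimal Python sketch of
that lookup. It is a toy model, not the proposed implementation: the
MetaCommit class, the stale_cherry_picks function and the commit names
(W2 standing for an amended W) are all hypothetical. The walk follows
"obsolete" edges from the origin W to its newest amendment and never
looks at W's children:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetaCommit:
    content: str          # the commit this metacommit describes
    obsolete: tuple = ()  # commits that `content` replaces
    origin: tuple = ()    # commits that `content` was cherry-picked from


def stale_cherry_picks(metas):
    """Yield (cherry_pick, old_origin, newest_origin) for outdated picks."""
    # Map every obsoleted commit to the commit that replaced it.
    replaced_by = {old: m.content for m in metas for old in m.obsolete}
    for m in metas:
        for source in m.origin:
            newest = source
            seen = {source}
            # Follow amendments of the origin; the `seen` set guards
            # against a malformed graph that contains a cycle.
            while newest in replaced_by and replaced_by[newest] not in seen:
                newest = replaced_by[newest]
                seen.add(newest)
            if newest != source:
                yield (m.content, source, newest)


# Z was cherry-picked from W, and W was later amended to W2.
METAS = [
    MetaCommit(content="W2", obsolete=("W",)),
    MetaCommit(content="Z", origin=("W",)),
]
STALE = list(stale_cherry_picks(METAS))
```

In this model STALE names Z as a candidate for receiving the W to W2
amendment. Without the metacommit for W2, nothing would be reported.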

> It is of course OK to build a different feature that can take
> advantage of the cherry-pick information on top of the same meta
> commit concept in later steps

All valid points - we could build a useful "evolve" command without
origin edges (and without OOCP), we could easily add origin parents
later to a design that just supported obsolete and content parents,
and the decision about /when/ to add origin parents is orthogonal to
the decision about /if/ to add them.

Let's explore the "when" question. I think there's a compelling reason
to add them as soon as possible - namely, gerrit. If and when we come
to some sort of agreement on this proposal, gerrit could start adding
tooling to understand change graphs as an alternative to change-id
footers. That work could proceed in parallel with the work in git-core
once we know what the data structures look like, but it can't start
until the data structures are sufficient to address all the use cases
that were previously covered by change-id. At the moment, meta-commits
without origin parents would not cover all of gerrit's use-cases so
this would block adoption in gerrit.

  - Stefan
On Sun, Nov 18, 2018 at 8:15 PM Junio C Hamano  wrote:
>
> Stefan Xenos  writes:
>
> > The scenario you describe would not produce an origin edge in the
> > metacommit graph. If the user amended X, there would be no origin
> > edges - just a replacement. If you cherry-picked Z you'd get no
> > replacements and just an origin. In neither case would you get both
> > types of parent.
>
> OK, that makes things a lot simpler.
>
> I can see why we want to record "commit X obsoletes commit Y" to
> help the "evolve" feature, which was the original motivation that
> started this whole discussion.  But it is not immediately obvious to
> me how it would help to have "Z was cherry-picked from W" in
> "evolve".
>
> The whole point of cherry-picking an old commit W to produce a new
> commit Z is because the developer wanted to use the change between
> W^ and W in a context that is quite different from W^, so it would
> make no sense to "evolve" anything that was built on top of W on top
> of Z.
>
> It is of course OK to build a different feature that can take
> advantage of the cherry-pick information on top of the same meta
> commit concept in later steps, and to ensure that is doable, the
> initial meta commit design must be done in a way that is flexible
> enough to be extended, but it is not clear to me if this "origin"
> thing is "while this does not have much to do with 'evolve', let's
> throw in fields that would help another feature while we are at it"
> or "in addition to 'X obsoletes Y', we need the cherry-pick
> information for 'evolve' feature because..." (and because it is not
> clear, I am assuming that it is the former).  If we can design the
> "evolve" thing with only the "contents" and "obsoletes", that would
> allow us to limit the scope of discussion we need to have around
> meta commit and have something that works earlier, wouldn't it?
>
> Thanks.


Re: [PATCH] technical doc: add a design doc for the evolve command

2018-11-19 Thread SZEDER Gábor
On Sat, Nov 17, 2018 at 12:30:58PM -0800, Stefan Xenos wrote:
> > Further, I see that this document tries to suggest a proliferation
> > of new commands
> 
> It does. Let me explain a bit about the reasoning behind this
> breakdown of commands. My main priority was to keep the commands as
> consistent with existing git commands as possible. Secondary goals
> were:
> - Mapping a single intent to a single command where possible makes it
> easier to explain what that command does.
> - Having lots of simpler commands as opposed to a few complex commands
> makes them easier to type.
> - Command names are more descriptive than lettered arguments.

Subcommand names and --long-options are just as descriptive.


> Git already has a "log" and "reflog" command for displaying two
> different types of log,

No, there is 'git log' for displaying logs in various ways, and there
is 'git reflog' which not only displays reflogs, but also operates on
them, e.g. deletes specific reflog entries or expires old entries,
necessitating and justifying the dedicated 'git reflog' command.

> so putting "obslog" on its own command makes
> it consistent with the existing logs, easier to type, and keeps the
> command simple.

> - We could turn "obslog" into an extra option on the "log" command,
> but that would be inconsistent with reflog and would complicate the
> already-complex log command.

On one hand, it's unclear to me what additional operations the
proposed 'git obslog' command will perform besides showing the log of
a change.  If there are no such operations, then it can't really be
compared to 'git reflog' to justify a dedicated 'git obslog' command.

OTOH, note that 'git log' already has a '--walk-reflogs' option, and
indeed 'git reflog [show]' is implemented via the common log
machinery.  And this is not a mere implementation detail, because "git
reflog show accepts any of the options accepted by git log" (quoting
its manpage), making it possible to filter, limit and format reflog
entries, e.g.:

  git reflog --format='%h %cd %s' --author=szeder -5 branch file

I think 'git obslog' should allow the same when showing the log of a
change.


> Personally, I don't
> consider a proliferation of new commands to be inherently bad (or
> inherently good, really). Is there a reason new commands should be
> avoided?

If a user wants to deal with reflogs, then there is 'git help reflog'
which in one manpage describes the concept, and how to list and
expire reflogs and delete individual entries from a reflog using the
various subcommands.  If a user wants to deal with stashes, then there
is 'git help stash', which in one manpage describes the concept, and
how to create, list, show, apply, delete, etc. stashes using the
various subcommands.  See where this is going?  The same applies to
bisect, notes, remotes, rerere, submodules, worktree; maybe there are
more.  This is a Good Thing.

By adding several new commands users will have to consult the manpages
of 'evolve', 'change', 'obslog', etc., even though the commands and
the concepts are closely related.




Re: [PATCH] technical doc: add a design doc for the evolve command

2018-11-18 Thread Junio C Hamano
Stefan Xenos  writes:

> The scenario you describe would not produce an origin edge in the
> metacommit graph. If the user amended X, there would be no origin
> edges - just a replacement. If you cherry-picked Z you'd get no
> replacements and just an origin. In neither case would you get both
> types of parent.

OK, that makes things a lot simpler.

I can see why we want to record "commit X obsoletes commit Y" to
help the "evolve" feature, which was the original motivation that
started this whole discussion.  But it is not immediately obvious to
me how it would help to have "Z was cherry-picked from W" in
"evolve".

The whole point of cherry-picking an old commit W to produce a new
commit Z is because the developer wanted to use the change between
W^ and W in a context that is quite different from W^, so it would
make no sense to "evolve" anything that was built on top of W on top
of Z.

It is of course OK to build a different feature that can take
advantage of the cherry-pick information on top of the same meta
commit concept in later steps, and to ensure that is doable, the
initial meta commit design must be done in a way that is flexible
enough to be extended, but it is not clear to me if this "origin"
thing is "while this does not have much to do with 'evolve', let's
throw in fields that would help another feature while we are at it"
or "in addition to 'X obsoletes Y', we need the cherry-pick
information for 'evolve' feature because..." (and because it is not
clear, I am assuming that it is the former).  If we can design the
"evolve" thing with only the "contents" and "obsoletes", that would
allow us to limit the scope of discussion we need to have around
meta commit and have something that works earlier, wouldn't it?

Thanks.


Re: [PATCH] technical doc: add a design doc for the evolve command

2018-11-18 Thread Junio C Hamano
Stefan Xenos  writes:

>> I meant the project's history, not the meta-graph thing.
>
> In that case, we agree. The proposal suggests that "origin" should be
> reachable from the meta-graph for the cherry-picked commit, NOT the
> cherry-picked commit itself. Does that resolve our disagreement, or is
> reachability from the meta-graph also undesirable for you?

Sorry, I confused myself.

Yes, I do mind the "origin" thing in the meta history pinning
the old commit whose contents were cherry-picked to create a new
commit, which is separate from the old commit that was rewritten to
create a new commit.  The latter (i.e. the old one) I do not mind to
get retrieved when such a meta commit is fetched, and all of us of
course would want the new one, too (which is the whole point of
adding the meta commit to help other commits built on the old one
migrate to the new one).  But I simply do not see the point of
having to drag the history leading to "origin", and that is why I am
moderately against recording "the change in this came from that
commit via cherry-pick" in a meta commit.


Re: [PATCH] technical doc: add a design doc for the evolve command

2018-11-18 Thread Stefan Xenos
> I meant the project's history, not the meta-graph thing.

In that case, we agree. The proposal suggests that "origin" should be
reachable from the meta-graph for the cherry-picked commit, NOT the
cherry-picked commit itself. Does that resolve our disagreement, or is
reachability from the meta-graph also undesirable for you?

> By having a "this was cherry-picked from that commit" in a commit
> that is not GC'ed, the original commit that no longer has any
> relevance (because the newer one that is the result of the
> cherry-pick is the surviving version people will be building on) is
> kept reachable.  It is very much deliberate that "cherry-pick -x"
> does not make the "origin" reachable and merely notes it in the
> human readable form that is ignored by the reachability machinery.

Hmm. It sounds like you may be arguing against reachability from the
cherry-picked commit (which we agree on). I'm arguing for reachability
ONLY from the meta-graph. From your reply it's not completely clear to
me whether you also disapprove of reachability from the meta-graph or
if you thought the origin edges would be present on the cherry-picked
commit itself. Could you clarify? I suspect it may be the latter,
which suggests ambiguity in the proposal. If you could point to the
text that gave the impression origin parents would be present in the
cherry-picked commits themselves, I'll fix it.

> This is where we differ.  If commit X was rewritten (perhaps with
> help from change cherry-picked from commit Z, or without any) to
> produce Y, I do agree that it would be logical to keep X around
> until every commit that depends on it is migrated to be on top of Y.

The scenario you describe would not produce an origin edge in the
metacommit graph. If the user amended X, there would be no origin
edges - just a replacement. If you cherry-picked Z you'd get no
replacements and just an origin. In neither case would you get both
types of parent. I'd suggest we focus on the cherry-pick scenario
since it's the simplest real-world use case that produces origin
parents. All the more complex scenarios involving both parent types
only occur if you start from that simple case, so if you convince me
that the origin-only use case is unnecessary or undesirable, it would
also follow that the more complex origin-plus-obsolete-parent use case
is unnecessary.

So, if you don't mind - let me simplify that use-case: "If commit Z is
cherry-picked to produce Y, is there any need to keep Z around?". I
don't think we need X in the example to answer that question.

> But we do not need Z to transplant what used to be on X on top of Y,
> do we?

That's correct. The origin parent would be used to incorporate amended
versions of Z into Y, not to transplant things. It would also be used
to locate ancestors when merging code based on Z with code based on Y.

> So I do agree that in such a situation they want the
> relevant parts of the history retained, but I do not agree that
> "origin" is among them.

You may be entirely right, but at this point I'm not certain whether
we're disagreeing or miscommunicating. :-(


Re: [PATCH] technical doc: add a design doc for the evolve command

2018-11-18 Thread Junio C Hamano
Stefan Xenos  writes:

>> And the other half is that while I consider the "origin" thing
>> unnecessary for the above reasons, having it means we need to not
>> just transfer the history leading to aa7ce555 and d64309ee5 (which
>> are necessary anyway while we have histories to transplant from
>> d64309ee5 to aa7ce555) but also have to pull in the history leading
>> to 7e1bbcd and we cannot discard it.
>
> I'll assume that by "history" you're referring to the change graph
> (the metacommits) and not the branches (the commits), which would have
> no origin edges or connection between replacements.

I meant the project's history, not the meta-graph thing.

By having a "this was cherry-picked from that commit" in a commit
that is not GC'ed, the original commit that no longer has any
relevance (because the newer one that is the result of the
cherry-pick is the surviving version people will be building on) is
kept reachable.  It is very much deliberate that "cherry-pick -x"
does not make the "origin" reachable and merely notes it in the
human readable form that is ignored by the reachability machinery.

> If the user has kept a change around in their metas namespace, it's an
> indication that they (or their collaborators) are still working on it
> and want its history to be retained.

This is where we differ.  If commit X was rewritten (perhaps with
help from change cherry-picked from commit Z, or without any) to
produce Y, I do agree that it would be logical to keep X around
until every commit that depends on it is migrated to be on top of Y.
But we do not need Z to transplant what used to be on X on top of Y,
do we?  So I do agree that in such a situation they want the
relevant parts of the history retained, but I do not agree that
"origin" is among them.

Side note.  As long as we have commits that are still on X and
yet to be migrated to be on Y, we do not need the meta-commit
to be protecting X from getting GC'ed, as X is reachable from
these "need to be updated" branch tips anyway.  What we gain
from extra reachability brought in by the meta commits is
that by fetching the "change", we get Y (and its ancestors),
even if we are not following any branch that contains it, so
that we can migrate those that are still based on X to it.






Re: [PATCH] technical doc: add a design doc for the evolve command

2018-11-18 Thread Stefan Xenos
> Am I correct to understand that the reason why a commit object is
> (ab|re)used to represent a meta-commit is because by doing so we
> would get connectivity (i.e. fetching & pushing would transfer all
> the associated objects along) for free, and by not representing it
> as a new and different object type, existing implementations can
> just pass them along without understanding what they are, and as
> long as these are not mixed as parts of the main history of the
> project (e.g. when enumerating commits that have aa7ce5 as their
> parent, because somebody else obsoleted aa7ce5 and you want to
> evolve anything that built on it, you do not want to mistake the
> above "meta" commit as a commit that is part of the ordinary history
> and rebuild on top of the new version of aa7ce5, which would lead to
> a disaster), everything would work just fine?

Yes, sir. That's it exactly. My first draft of the proposal suggested
creating a new top-level object type, but when I started digging
through the code I realized that the new object was so similar to a
commit that there was no need for a new type.

> Perhaps you'd use something like "presence of parent-type header
> marks that a commit is a meta-commit and not part of the main
> history".

Yes, that's called out explicitly as part of the proposal (see the
first sentence in the Parent-type subsection). Fsck would enforce this
invariant.

> How are these meta commits anchored so that they won't be reclaimed by
> repack?

They would either be anchored by a ref in the metas/ namespace (for
active changes currently under consideration by evolve) or by the
reflog (for recently deleted changes).

> I do not see any "parent" field used to chain them together,

They point to one another using the usual "parent" field present in
all commit objects. For an example of what the raw struct would look
like with parent pointers, see the top of the "Detailed design"
section or search the doc for the string . For
examples of how the metacommits in a change graph would be connected
after various operations, see the "Commit" section and the "Merge"
section. Please let me know if any of these examples are
insufficiently explained or if there are any other examples you'd like
to see.

> but I do not think we can afford to spend one ref per meta
> commit, as refs are not designed to point into each and every object
> in the repository.

Agreed. This is actually one of the reasons I'm proposing the use of
chains of meta-commits as opposed to using a purely ref-based
approach. I describe several other ref-based approaches in the "Other
options considered" section, and I made essentially the same point
there. We only create refs in the metas/ namespace to point to the
head of each change, and the rest of the commits and metacommits used
by the graph are reached via the parent pointers in the metacommits.
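
A small Python sketch of that reachability argument follows. It is a
toy model under stated assumptions: the object names, the ref name
"metas/my-change" and the parent lists are invented for illustration,
and real metacommits may be linked differently. A single ref at the
head of the change is enough to keep every older version alive:

```python
def reachable(ref_heads, parents):
    """Return every object reachable from the given ref heads."""
    seen = set()
    stack = list(ref_heads)
    while stack:
        obj = stack.pop()
        if obj in seen:
            continue
        seen.add(obj)
        # Objects without an entry (root commits) have no parents.
        stack.extend(parents.get(obj, ()))
    return seen


# Hypothetical graph: meta2 is the head of the change. It points at the
# current commit A2 and at the previous metacommit meta1, which in turn
# points at the older versions A1 and A0.
PARENTS = {
    "meta2": ["A2", "meta1"],
    "meta1": ["A1", "A0"],
    "A2": ["base"],
    "A1": ["base"],
    "A0": ["base"],
    "orphan": ["base"],  # referenced by nothing, so it can be collected
}
REFS = {"metas/my-change": "meta2"}

KEPT = reachable(REFS.values(), PARENTS)
```

Only one ref exists in this model, yet A0, A1 and A2 all stay
reachable, while "orphan" does not.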

> I have a moderately strong opposition against the "origin" thing.  If
> aa7ce555 replaces d64309ee5, in order for the tool to be able to
> "evolve" other histories that build on top of d64309ee5, it only
> needs the history between aa7ce555 and d64309ee5 and it would not
> matter how aa7ce555 was built relative to its parent.

I see I haven't justified the "origin" thing well enough. I'll
elaborate in the document, but here's the short version. The "origin"
edges are needed to address several use-cases:

1. Gerrit needs to know about cherry picks.

This is one of the lesser-known things that it uses the change-id
footers for and if we want to be able to eliminate the gerrit
change-id footers we need to record and communicate information about
cherry-picks somehow. This is the main reason for the origin edges -
the early drafts of this proposal didn't have them but it came up when
I asked a kind Gerrit maintainer whether the proposal would be
sufficient to eliminate gerrit's change-ids. However there may be
alternatives I didn't think of. If we were to omit the origin edges,
can you suggest an alternative way for git to record the fact that one
commit was cherry-picked from another and communicate this fact to
gerrit?

I see that I forgot to call out "replacing gerrit change-ids" as an
explicit goal. I'll add that to the doc.

2. Obsolescence across cherry-picks.

In your example, it *may* actually matter how aa7ce55 was constructed.
One such scenario is what I'm calling obsolescence across
cherry-picks. Let me describe the use-case for it:

Alice creates commit A1.

Bob cherry-picks A1 to another branch, producing B1. At this point,
Bob has a metacommit saying that A1 is the origin of B1.

Alice amends A1, producing A2. She shares this with Bob.

At this point, Bob probably wants to amend B1 to include whatever
bugfix Alice did in A2 since the thing he cherry-picked is now out of
date. That's what the obsolescence across cherry-picks feature does.
If Bob runs evolve with this option enabled, the evolve command will
produce B2 by amending B1 with whatever diff Alice did between A1 and

Re: [PATCH] technical doc: add a design doc for the evolve command

2018-11-18 Thread Junio C Hamano
Stefan Xenos  writes:

>> I don't think this counts as a typical modification and is probably hard to 
>> detect automatically.
>
> Clever use of commands! (side: wouldn't it just be easier to just use
> git commit --amend, though?)

When an original commit is mostly an early part of a feature, mixed
with a small but urgent bugfix, it is not unusual to start your
work from "reset HEAD^" (or "reset --soft HEAD^") and recreate a
commit that has the main part of the change from the original,
leaving the remainder in the working tree to be worked into another
bugfix commit, most likely to be on a new branch forked from an
earlier point in the history, i.e.

    git reset HEAD^
    git add -p
    git commit -c @{1}
    git checkout -m -b a-small-bugfix-split-out master
    edit
    git commit -a

I agree with both of you that we want to have a way to mark that the
first commit we made by partially committing what was in the
original came from the original one, and also that the second one
has contents from the same original one.

It is unclear, without human involvement, if we can mechanically
infer that anything that used to be built on top of the original
commit would want to be rebuilt on top of the first half of the
split commit (i.e. the early part of the feature with the bugfix
separated out) but not on the other half (i.e. the bugfix alone).


Re: [PATCH] technical doc: add a design doc for the evolve command

2018-11-18 Thread Stefan Xenos
> This breaks the "git change" symmetry with "git branch", but after
> responding to other messages regarding that command, I'm starting to
> think that's not really a problem.

Sorry, I appended that sentence to the wrong paragraph. It should have
gone with the previous one regarding the "git change" command.
Re: [PATCH] technical doc: add a design doc for the evolve command

2018-11-18 Thread Stefan Xenos
> I don't think this counts as a typical modification and is probably hard to 
> detect automatically.

Clever use of commands! (side: wouldn't it just be easier to just use
git commit --amend, though?)

Either way, I agree that there should be a way to manually create a
change graph or modify one into any possible shape. I've updated the
"change" command to do what you want - the new version will have
subcommands for creating arbitrary change graphs.

> subject line will change over time and the original one may become irrelevant.

There's a section on change naming further down the document. My
criteria for name selection were that good names should be unique,
short to type, and memorable to the user. Being relevant to the commit
wasn't actually a requirement for me except insofar as it helps make
them memorable. If we agree that these are reasonable criteria, commit
hashes wouldn't be as good a choice since they'd satisfy the
uniqueness criterion but would not be short or memorable. I expect that
whatever criteria we select probably won't be optimal for all users
which is why the design also includes a new hook for name selection. I
believe that selected words from the commit comment should cover all
three criteria in the majority of cases, and that the hook and the
"change rename" command should cover the remaining corner cases. This
breaks the "git change" symmetry with "git branch", but after
responding to other messages regarding that command, I'm starting to
think that's not really a problem.
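
To make the naming criteria above concrete, here is a minimal Python
sketch of how a name might be derived from a commit subject. It is
only an illustration: the function name suggest_change_name, the word
limit, and the numeric-suffix rule for collisions are assumptions of
this sketch, not something the proposal specifies.

```python
import re

def suggest_change_name(subject, existing, max_words=6):
    """Derive a short, unique metas/ name from a commit subject.

    Hypothetical rule: keep the first few lowercase words of the
    subject, join them with underscores, and append a numeric
    suffix if the name is already taken.
    """
    words = re.findall(r"[a-z0-9]+", subject.lower())[:max_words]
    base = "metas/" + ("_".join(words) or "change")
    name, n = base, 2
    while name in existing:
        name = f"{base}_{n}"
        n += 1
    return name

taken = set()
for subject in ["This is a test", "This is also a test", "More testing"]:
    taken.add(suggest_change_name(subject, taken))
```

A name-selection hook like the one mentioned above could replace this
function wholesale for users whose subjects do not make good names.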

> How do we group changes of a topic together? I think branch-diff could take 
> advantage of that.

Could you clarify your use-case for me? I'm not sure what you mean by
"changes of a topic". Are you referring to gerrit topics here? Topic
branches? Or are you asking for some way for end-users to classify and
organize their unsubmitted changes?

> Could we just organize it like a normal history?
> Basically all commits will be linked in a new merge history.

From what I can tell, you're suggesting the following changes:
1. Reorder the parents such that the content parent comes last rather
than first.
2. Move parent-type from the structured portion of the header to the
unstructured portion of the commit message.

I'm fine with 1 if that makes something easier.

Regarding 2, I can see some good reasons to put parent-type in the
header rather than the user-readable portion of the commit message:
- fsck can rely on them when checking the database for validity (for
example, it can assert that the current repository version doesn't
attach a non-empty tree, that the content parent always points to a
real commit, the commit message is empty, that the number of
parent-types matches the number of parents, that the enum values are
valid, that the parent orders are correct, etc.).
- accidental collisions are impossible (users can't accidentally
corrupt their database by adding or removing the word "parent-type" in
a commit message).
- it doesn't spam the user-readable region with machine-readable
repository internals.
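
The fsck checks listed above can be sketched in a few lines of Python.
This is a rough model under stated assumptions, not git's fsck: the
function name check_metacommit is hypothetical, the author/committer
lines are left out of the sample for brevity, the rule "exactly one
content parent, listed first" is my reading of the current parent
order, and the check that the content parent points to a real commit
is omitted because it needs access to the object database.

```python
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
PARENT_TYPES = {"content", "obsolete", "origin"}

def check_metacommit(raw):
    """Return a list of problems found in a raw meta-commit (empty if none)."""
    header, _, message = raw.partition("\n\n")
    trees, parents, types = [], [], []
    for line in header.splitlines():
        key, _, value = line.partition(" ")
        if key == "tree":
            trees.append(value)
        elif key == "parent":
            parents.append(value)
        elif key == "parent-type":
            types.append(value)
    problems = []
    if trees != [EMPTY_TREE]:
        problems.append("tree is not the empty tree")
    if message.strip():
        problems.append("commit message is not empty")
    if len(types) != len(parents):
        problems.append("parent-type count does not match parent count")
    if any(t not in PARENT_TYPES for t in types):
        problems.append("unknown parent-type value")
    if types[:1] != ["content"] or types.count("content") != 1:
        problems.append("expected exactly one content parent, listed first")
    return problems

# Sample modeled on the meta-commit shown in the design doc.
GOOD = "\n".join([
    "tree " + EMPTY_TREE,
    "parent aa7ce55545bf2c14bef48db91af1a74e2347539a",
    "parent d64309ee51d0af12723b6cb027fc9f195b15a5e9",
    "parent 7e1bbcd3a0fa854a7a9eac9bf1eea6465de98136",
    "parent-type content",
    "parent-type obsolete",
    "parent-type origin",
    "", "",
])
```

Because only the header is parsed, a commit message that happens to
contain the word "parent-type" cannot affect any of these checks.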

> This makes it possible to just use "git log --first-parent
> --patch" (or "git log --oneline --graph") to examine the change.

The "git log --oneline --graph" thing should work fine with the
proposal as it currently is, but I'm not sure that the --first-parent
--patch thing would be very useful no matter how we order the parents.
The metacommits have empty trees and commit messages, so such a log
would just list the metacommit hashes and nothing else. That certainly
has some utility, but I'd guess it's probably not what you were going
for. Were you intending to suggest that the metacommit should also use
the same tree and commit message as its content commit? If so, we
briefly considered this option while preparing this proposal. That
would make some commands do approximately the right thing for free.
However, when we started working through the use-cases (for example,
checking out a metacommit) we found that all the ones we looked at
would still need special cases for metacommits and those special cases
wouldn't be much simpler than they'd be with an empty tree and
message. Admittedly, git log wasn't one of the use-cases we worked
through.

  - Stefan

Re: [PATCH] technical doc: add a design doc for the evolve command

2018-11-17 Thread Stefan Xenos
Resending this without HTML enabled... sorry if you receive it twice.

> The file name and the title are in a mismatch.

Good point. However, the focus of this proposal really is supposed to
be on the underlying data structure, not just the evolve command
(which is the driving use-case for the graph but not the only
important one). I think I'll fix the mismatch by renaming both the
title and document to "change graph" if that seems acceptable. I'll
also expand the "objective" paragraph to mention the evolve command.

> Perhaps "three sequential patches"?

I've added a quick informal definition of the word "change", along
with a cross-reference to the precise definition later in the
document.

> These two paragraphs could be moved lower, under a "Semi-Related Work"

Good point. I'll keep the patch queue managers here since they really
can be used to solve the same problem that evolve addresses, but I'll
move the replacements paragraph down to a new section on semi-related
work. There was also a request to discuss git-imerge which I'll insert
there.

> Instead, I would try to use the term "patch" to describe a change to the 
> codebase

I know you didn't finish the document but later on I define the term
"change" to have essentially this meaning. I've moved the definition
earlier in the document to make the earlier sections easier to
understand. Given the choice of the word "patch" or "change" for this
definition, I prefer to use "change" since gerrit already defines it
in this way and the word "patch" already has a meaning in git (a file
containing a diff).

> (Making a note so I come back to this. I hope to learn what you mean by this
> "more specific merge base".)

Let's say we have commits:

P <- C

Then two people amend C in different ways producing:

P <- C
P <- C1
P <- C2

...then we try to resolve the divergence by merging C1 and C2. Without
the change graph, the closest merge-base (ancestor) would be P. With
the change graph, the closest merge base would be C.
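
The same example can be worked through in a small Python model. This
is a simplification for illustration only: it treats "obsoletes" as a
direct edge between commits rather than going through metacommits,
and the function names are hypothetical. A merge base here is a
common ancestor that is not itself an ancestor of another common
ancestor.

```python
def reachable(start, edges):
    """All nodes reachable from start (including start) by following edges."""
    seen, stack = {start}, [start]
    while stack:
        for parent in edges.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def merge_bases(a, b, edges):
    """Common ancestors of a and b that no other common ancestor descends from."""
    common = reachable(a, edges) & reachable(b, edges)
    return {c for c in common
            if not any(c != o and c in reachable(o, edges) for o in common)}

# Ordinary history: C, C1 and C2 all have P as their parent.
commit_parents = {"C": ["P"], "C1": ["P"], "C2": ["P"]}
# Change graph (simplified): C1 and C2 each obsolete C.
obsoletes = {"C1": ["C"], "C2": ["C"]}
with_change_graph = {n: commit_parents.get(n, []) + obsoletes.get(n, [])
                     for n in commit_parents}

print(merge_bases("C1", "C2", commit_parents))     # {'P'}
print(merge_bases("C1", "C2", with_change_graph))  # {'C'}
```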

> If we GC'd commit A but still have the newer A', I can either think that

I'm not sure I followed that. Are you suggesting a change to the
proposal or asking for a clarification?


Re: [PATCH] technical doc: add a design doc for the evolve command

2018-11-17 Thread Stefan Xenos
> I am not sure that we necessarily need this to be a graph. I think part
> of the problems with not being able to GC *any* of this is by this
> requirement to have it stored in a graph, rather than having mappings from
> which you could reconstruct any non-GC'ed parts of that graph, if you
> really want.

Sorry, I'm not sure what GC problem you're alluding to here. As far as
I'm aware, this proposal should permit us to GC or retain any subset
of commits that we want. We create a chain of metacommits pointing to
the commits we want to retain, and put a ref in the metas namespace to
cause the chain itself to be retained. If we want to GC a different
subset, we can build a different chain of metacommits and move the ref
(or delete the ref entirely to permit the whole chain to be gc'd).
Could you be more specific about which use-case is problematic?
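
To illustrate what I mean, here is a toy Python model of the mark
phase. The object names (A1, A2, M1, M2, M2_short), the ref name
metas/foo, and the exact shape of the chain are made up for this
sketch; the only point is that whatever the ref reaches is retained,
and moving or deleting the ref changes that set.

```python
def live_objects(refs, parents):
    """Everything reachable from any ref by following parent pointers."""
    live, stack = set(), list(refs.values())
    while stack:
        obj = stack.pop()
        if obj not in live:
            live.add(obj)
            stack.extend(parents.get(obj, ()))
    return live

parents = {
    "M1": ["A1"],        # metacommit whose content is commit A1
    "M2": ["A2", "M1"],  # metacommit whose content is A2, chained to M1
    "M2_short": ["A2"],  # alternative chain that no longer mentions A1
}

refs = {"metas/foo": "M2"}
keep_all = live_objects(refs, parents)               # A1 is retained via M1
refs["metas/foo"] = "M2_short"
keep_latest = live_objects(refs, parents)            # A1 and M1 may now be gc'd
del refs["metas/foo"]
keep_none = live_objects(refs, parents)              # the whole chain may be gc'd
```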

> Why is this missing most notably `hg evolve`?

Good point. I'll add a brief description and comparison to the doc.

> Also, please do not forget `git imerge`.

Thanks for directing me to this. It looks fantastic! I'm not sure it's
really an alternative to this work, but I could see adding an argument
to "git evolve" that allows you to use imerge for resolving merge
conflicts at any given step.

> Further, I see that this document tries to suggest a proliferation of new 
> commands

It does. Let me explain a bit about the reasoning behind this
breakdown of commands. My main priority was to keep the commands as
consistent with existing git commands as possible. Secondary goals
were:
- Mapping a single intent to a single command where possible makes it
easier to explain what that command does.
- Having lots of simpler commands as opposed to a few complex commands
makes them easier to type.
- Command names are more descriptive than lettered arguments.

Git already has a "log" and "reflog" command for displaying two
different types of log, so putting "obslog" on its own command makes
it consistent with the existing logs, easier to type, and keeps the
command simple.

The "evolve" command updates changes to give them up-to-date parents.
This is a new type of user intent that didn't exist previously in git,
so putting it on its own command keeps things simpler for users. The
relationship between the evolve and change commands is a lot like the
relationship between the rebase command and the branch commands.
They could technically be combined into one command but I'm not sure
this would help with usability.

The "change" command combines many user intents (create a change,
rename a change, delete a change, etc.). If I were to design it from
scratch, I'd prefer to have all of these things on separate commands.
However, since changes are very similar to branches and users are
presumably already familiar with the branch command, I intentionally
made the change command as close as possible to the branch command -
using the same arguments for the same purpose. In this case, I
sacrificed the single-intent and simple commands goals in order to
retain consistency.

Anyway, that was my reasoning behind the selection of commands. Of
course, I'd welcome feedback - a good UX is the one that was built by
listening to feedback from its intended users. Personally, I don't
consider a proliferation of new commands to be inherently bad (or
inherently good, really). Is there a reason new commands should be
avoided?

Some other alternatives to consider:

- We could turn "obslog" into an extra option on the "log" command,
but that would be inconsistent with reflog and would complicate the
already-complex log command.
- If we were to combine "evolve" with another command, "git rebase
--evolve" would probably be the best candidate. However, this is
longer to type and I tend to prefer lots of simple commands over a few
complex ones. Also, the evolve command will get additional options in
the future (to enable stuff like amend-over-cherry-pick, various
automatic resolution strategies for divergence, etc.)... and putting
it on rebase would mean we'd end up with a lot of extra arguments
whose doc says "this argument is only used if you're also using
--evolve".
- We could break the "change" command into a bunch of simpler ones:
"lschange", "mkchange", "rmchange", "mvchange", etc. I actually like
this a lot, but this would make it diverge from the "branch" command
so I'm not sure we should do it unless enough of us feel the same way.
- We could combine the "change" command with the "branch" command. The
branch command could look for the "metas" prefix to determine whether
its argument is a branch or a change -- or it could just search one
namespace followed by the other. This would make for fewer commands,
but I'm concerned it may create confusion by making changes resemble
branches too closely. If you're not already familiar with the
distinction, you may see unexpected behavior when the "branch" you
think you're manipulating turns out to be a change.

  - Stefan

On Thu, Nov 15, 2018 at 4:52 

Re: [PATCH] technical doc: add a design doc for the evolve command

2018-11-16 Thread Junio C Hamano
sxe...@google.com writes:

> +Detailed design
> +===============
> +Obsolescence information is stored as a graph of meta-commits. A meta-commit is
> +a specially-formatted merge commit that describes how one commit was created
> +from others.
> +
> +Meta-commits look like this:
> +
> +$ git cat-file -p 
> +tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
> +parent aa7ce55545bf2c14bef48db91af1a74e2347539a
> +parent d64309ee51d0af12723b6cb027fc9f195b15a5e9
> +parent 7e1bbcd3a0fa854a7a9eac9bf1eea6465de98136
> +author Stefan Xenos  1540841596 -0700
> +committer Stefan Xenos  1540841596 -0700
> +parent-type content
> +parent-type obsolete
> +parent-type origin
> +
> +This says “commit aa7ce555 makes commit d64309ee obsolete. It was created by
> +cherry-picking commit 7e1bbcd3”.
> +
> +The tree for meta-commits is always the empty tree whose hash matches
> +4b825dc642cb6eb9a060e54bf8d69288fbee4904 exactly, but future versions of git may
> +attach other trees here. For forward-compatibility fsck should ignore such trees
> +if found on future repository versions. Similarly, current versions of git
> +should always fill in an empty commit comment and tools like fsck should ignore
> +the content of the commit comment if present in a future repository version.
> +This will allow future versions of git to add metadata to the meta-commit
> +comments or tree without breaking forwards compatibility.

Am I correct to understand that the reason why a commit object is
(ab|re)used to represent a meta-commit is because by doing so we
would get connectivity (i.e. fetching & pushing would transfer all
the associated objects along) for free, and by not representing it
as a new and different object type, existing implementations can
just pass them along without understanding what they are, and as
long as these are not mixed as parts of the main history of the
project (e.g. when enumerating commits that have aa7ce5 as their
parent, because somebody else obsoleted aa7ce5 and you want to
evolve anything that built on it, you do not want to mistake the
above "meta" commit as a commit that is part of the ordinary history
and rebuild on top of the new version of aa7ce5, which would lead to
a disaster), everything would work just fine?

Perhaps you'd use something like "presence of parent-type header
marks that a commit is a meta-commit and not part of the main
history".
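
A minimal Python sketch of that rule, assuming raw commit objects are
available as text: a commit counts as a meta-commit if its header
(everything before the first blank line) carries a parent-type line,
and such commits are skipped when looking for ordinary children of a
given commit. The function names and the sample object ids are
hypothetical.

```python
def is_metacommit(raw):
    """True if the commit header contains at least one parent-type line."""
    header = raw.partition("\n\n")[0]
    return any(line.startswith("parent-type ") for line in header.splitlines())

def ordinary_children(target, objects):
    """Ids of non-meta commits that list target as a parent."""
    found = []
    for oid, raw in objects.items():
        header = raw.partition("\n\n")[0]
        parents = [line.split(" ", 1)[1] for line in header.splitlines()
                   if line.startswith("parent ")]
        if target in parents and not is_metacommit(raw):
            found.append(oid)
    return sorted(found)

objects = {
    # An ordinary commit built on aa7ce5; its message mentions parent-type.
    "child": "tree t1\nparent aa7ce5\n\nparent-type is only a word here\n",
    # A meta-commit that also has aa7ce5 as a parent.
    "meta": ("tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
             "parent aa7ce5\nparent d64309\n"
             "parent-type content\nparent-type obsolete\n\n"),
}
```

With these objects, only "child" would be picked up for rebuilding on
top of a new version of aa7ce5.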

How are these meta commits anchored so that they won't be reclaimed by
repack?  I do not see any "parent" field used to chain them
together, but I do not think we can afford to spend one ref per meta
commit, as refs are not designed to point into each and every object
in the repository.

I have a moderately strong opposition against "origin" thing.  If
aa7ce555 replaces d64309ee, in order for the tool to be able to
"evolve" other histories that build on top of d64309ee, it only
needs the history between aa7ce555 and d64309ee and it would not
matter how aa7ce555 was built relative to its parent.  The user may
have typed/developed it from scratch, the user may have borrowed 70%
of its change from 7e1bbcd while remaining 30% was done from
scratch, or it was a concatenation of the change made in 7e1bbcd and
another commit.  

One half of my point is that we can do _without_ it: in all cases,
if recording the fact that aa7ce555 was derived from 7e1bbcd is so
important, its log message can mention how it relates to the
"origin" thing.

And the other half is that, while I consider the "origin" thing
unnecessary for the above reasons, having it means we need to not
just transfer the history leading to aa7ce555 and d64309ee (which
are necessary anyway while we have histories to transplant from
d64309ee to aa7ce555) but also have to pull in the history leading
to 7e1bbcd and we cannot discard it.



Re: [PATCH] technical doc: add a design doc for the evolve command

2018-11-16 Thread Duy Nguyen
On Thu, Nov 15, 2018 at 2:00 AM  wrote:
> +Goals
> +-
> +Legend: Goals marked with P0 are required. Goals marked with Pn should be
> +attempted unless they interfere with goals marked with Pn-1.
> +
> +P0. All commands that modify commits (such as the normal commit --amend or
> +rebase command) should mark the old commit as being obsolete and replaced by
> +the new one. No additional commands should be required to keep the
> +obsolescence graph up-to-date.

I sometimes "modify" a commit by "git reset @^", pick up the changes
then "git commit -c @{1}". I don't think this counts as a typical
modification and is probably hard to detect automatically. But I hope
there's some way for me to tell git "yes this is a modified commit of
that one, record that!".

> +Example usage
> +-
> +# First create three dependent changes
> +$ echo foo>bar.txt && git add .
> +$ git commit -m "This is a test"
> +created change metas/this_is_a_test

I guess as an example, how the name metas/this_is_a_test is
constructed does not matter much. But it's probably better to stick
with some sort of id because subject line will change over time and
the original one may become irrelevant. Perhaps we could use the
original commit id as name.

> +$ echo foo2>bar2.txt && git add .
> +$ git commit -m "This is also a test"
> +created change metas/this_is_also_a_test
> +$ echo foo3>bar3.txt && git add .
> +$ git commit -m "More testing"
> +created change metas/more_testing
> +
> +# List all our changes in progress
> +$ git change -l
> +metas/this_is_a_test
> +metas/this_is_also_a_test
> +* metas/more_testing
> +metas/some_change_already_merged_upstream
> +
> +# Now modify the earliest change, using its stable name
> +$ git reset --hard metas/this_is_a_test
> +$ echo morefoo>>bar.txt && git add . && git commit --amend --no-edit
> +
> +# Use git-evolve to fix up any dependent changes
> +$ git evolve
> +rebasing metas/this_is_also_a_test onto metas/this_is_a_test
> +rebasing metas/more_testing onto metas/this_is_also_a_test
> +Done
> +
> +# Use git-obslog to view the history of the this_is_a_test change
> +$ git obslog
> +93f110 metas/this_is_a_test@{0} commit (amend): This is a test
> +930219 metas/this_is_a_test@{1} commit: This is a test
> +
> +# Now create an unrelated change
> +$ git reset --hard origin/master
> +$ echo newchange>unrelated.txt && git add .
> +$ git commit -m "Unrelated change"
> +created change metas/unrelated_change
> +
> +# Fetch the latest code from origin/master and use git-evolve
> +# to rebase all dependent changes.
> +$ git fetch origin master
> +$ git evolve origin/master
> +deleting metas/some_change_already_merged_upstream
> +rebasing metas/this_is_a_test onto origin/master
> +rebasing metas/this_is_also_a_test onto metas/this_is_a_test
> +rebasing metas/more_testing onto metas/this_is_also_a_test
> +rebasing metas/unrelated_change onto origin/master
> +Conflict detected! Resolve it and then use git evolve --continue to resume.
> +
> +# Sort out the conflict
> +$ git mergetool
> +$ git evolve --continue
> +Done
> +
> +# Share the full history of edits for the this_is_a_test change
> +# with a review server
> +$ git push origin metas/this_is_a_test:refs/for/master
> +# Share the latest commit for “Unrelated change”, without history
> +$ git push origin HEAD:refs/for/master

How do we group changes of a topic together? I think branch-diff could
take advantage of that.

> +Detailed design
> +===============
> +Obsolescence information is stored as a graph of meta-commits. A meta-commit is
> +a specially-formatted merge commit that describes how one commit was created
> +from others.
> +
> +Meta-commits look like this:
> +
> +$ git cat-file -p 
> +tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
> +parent aa7ce55545bf2c14bef48db91af1a74e2347539a
> +parent d64309ee51d0af12723b6cb027fc9f195b15a5e9
> +parent 7e1bbcd3a0fa854a7a9eac9bf1eea6465de98136
> +author Stefan Xenos  1540841596 -0700
> +committer Stefan Xenos  1540841596 -0700
> +parent-type content
> +parent-type obsolete
> +parent-type origin
> +
> +This says “commit aa7ce555 makes commit d64309ee obsolete. It was created by
> +cherry-picking commit 7e1bbcd3”.

This feels a bit forced. Could we just organize it like a normal
history? Something like

*
|\
| * last version of the commit
*
|\
| * second last version of the commit
*
|\

Basically all commits will be linked in a new merge history. Real
commits are on the second parent; the first parent links changes
together. This makes it possible to just use "git log --first-parent
--patch" (or "git log --oneline --graph") to examine the change. More
details (e.g. parent-type) could be stored as normal trailers in the
commit message of these merges.
-- 
Duy


Re: [PATCH] technical doc: add a design doc for the evolve command

2018-11-16 Thread Derrick Stolee

On 11/14/2018 7:55 PM, sxe...@google.com wrote:

From: Stefan Xenos 

This document describes what an obsolescence graph for
git would look like, the behavior of the evolve command,
and the changes planned for other commands.


Thanks for putting this together!


diff --git a/Documentation/technical/evolve.txt b/Documentation/technical/evolve.txt

...

+Git Obsolescence Graph
+======================
+
+Objective
+---------
+Track the edits to a commit over time in an obsolescence graph.


The file name and the title do not match.

I'd prefer if the title was "Git Evolve Design Document" and this opening
paragraph was about the reasons we want a 'git evolve' command. Here is my
attempt:

  The proposed 'git evolve' command will help users craft a high-quality
  commit history in their topic branches. By working to improve commits one
  at a time, then running 'git evolve', users can rewrite recent history with
  more options than interactive rebase. The core benefit is that users can
  pause their progress and move to other branches before returning to where
  they left off. Users can also share progress with others using standard
  'push', 'fetch', and 'format-patch' commands.


+Background
+----------


Perhaps you can call this "Example"?


+Imagine you have three dependent changes up for review and you receive feedback
+that requires editing all three changes. While you're editing one, more feedback
+arrives on one of the others. What do you do?


"three dependent changes" sounds vague enough to me to possibly confuse
readers. Perhaps "three sequential patches"?


+- Users can view the history of a commit directly (the sequence of amends and
+  rebases it has undergone, orthogonal to the history of the branch it is on).


"the history of a commit" doesn't semantically work, as a commit is an
immutable Git object.

Instead, I would try to use the term "patch" to describe a change to the
codebase, one that takes the form of a list of commits that improve on each
other (but don't actually have each other in their commit history). This
means that the lifetime of a patch is described by the commits that are
amended or rebased.


+- By pushing and pulling the obsolescence graph, users can collaborate more
+  easily on changes-in-progress. This is better than pushing and pulling the
+  changes themselves since the obsolescence graph can be used to locate a more
+  specific merge base, allowing for better merges between different versions of
+  the same change.


(Making a note so I come back to this. I hope to learn what you mean by this
"more specific merge base".)


+
+Similar technologies
+--------------------
... It can't handle the case where you have
+multiple changes sharing the same parent when that parent needs to be rebased


Perhaps this could be made more concrete by describing commit history and a
specific workflow change using 'git evolve'.

Suppose we have two topic branches, topic1 and topic2, that point to commits A
and B, respectively. Suppose further that A and B have a common parent C with
parent D. If we rebase topic1 relative to D, then we create new commits C' and
A' that are newer versions of commits C and A. It would be nice to easily
update topic2 to be on a new commit B' with parent C'. Currently, a user needs
to know that C updated to C', and use 'git rebase --onto C' C topic2'. Instead,
if we have a marker showing that C' is an updated version of C, 'git log
topic2' would show that topic2 can be updated, and the 'git evolve' command
would perform the correct action to make B' with parent C'.
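
The topic1/topic2 scenario can be sketched in a few lines of Python. This is only a toy model under stated assumptions: commits are plain strings, every commit has a single parent, and the "marker" is a dictionary from an obsolete commit to its replacement. The function name `evolve_plan` is made up for illustration and is not part of the proposal.

```python
def evolve_plan(parents, successor):
    """Return (commit, new_parent) rebase steps for commits with an obsolete parent.

    parents:   maps each commit name to its single parent (None for a root).
    successor: maps an obsolete commit to the commit that replaced it.
    """
    def latest(commit):
        # Follow the obsolescence markers to the newest known version.
        while commit in successor:
            commit = successor[commit]
        return commit

    plan = []
    for commit, parent in parents.items():
        if commit in successor:
            continue  # obsolete commits are not rebased themselves
        if parent in successor:
            plan.append((commit, latest(parent)))
    return plan


# History after rebasing topic1: C became C', A became A', B was left behind.
parents = {"D": None, "C": "D", "A": "C", "B": "C", "C'": "D", "A'": "C'"}
successor = {"C": "C'", "A": "A'"}

plan = evolve_plan(parents, successor)
print(plan)
```

The sketch makes a single pass; a real command would presumably repeat the process, since rebasing B creates B' and may leave descendants of B with an obsolete parent in turn.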

(This paragraph above is an example of "what can happen now is complicated and
demands that the user keep some information in their memory" and "the new
workflow is simpler and helps users make the right decision". I think we could
use more of these at the start to sell the idea.)



+and won't let you collaborate with others on resolving a complicated interactive
+rebase.


In the same sentence, we have an even more complicated workflow mentioned as an
aside. This could be fleshed out more concretely. It could help to describe
that the current model is for users to share "fixup!" commits and then one
performs an interactive rebase to apply those fixups in the correct order. If a
user instead shares an amended commit, then we are in a difficult state to
apply those changes. The new workflow would be to share amended commits and
'git evolve' inserts the correct amended commits in the right order.

I'm a big proponent of the teaching philosophy of "examples first". It's easier
to talk abstractly after going through some concrete examples.


  You can think of rebase -i as a top-down approach and the evolve command
+as the bottom-up approach to the same problem.


This comparison is important. Perhaps it is more specific to say "interactive
rebase splits a plan to rewrite history into independent units of work, while
evolve collects independent


Re: [PATCH] technical doc: add a design doc for the evolve command

2018-11-15 Thread Ævar Arnfjörð Bjarmason


On Thu, Nov 15 2018, sxe...@google.com wrote:

> +Detailed design
> +===============
> +Obsolescence information is stored as a graph of meta-commits. A meta-commit is
> +a specially-formatted merge commit that describes how one commit was created
> +from others.
> +
> +Meta-commits look like this:
> +
> +$ git cat-file -p 
> +tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
> +parent aa7ce55545bf2c14bef48db91af1a74e2347539a
> +parent d64309ee51d0af12723b6cb027fc9f195b15a5e9
> +parent 7e1bbcd3a0fa854a7a9eac9bf1eea6465de98136
> +author Stefan Xenos  1540841596 -0700
> +committer Stefan Xenos  1540841596 -0700
> +parent-type content
> +parent-type obsolete
> +parent-type origin
> +
> +This says “commit aa7ce555 makes commit d64309ee obsolete. It was created by
> +cherry-picking commit 7e1bbcd3”.
> +
> +The tree for meta-commits is always the empty tree whose hash matches
> +4b825dc642cb6eb9a060e54bf8d69288fbee4904 exactly, but future versions of git may
> +attach other trees here. For forward-compatibility fsck should ignore such trees
> +if found on future repository versions. Similarly, current versions of git
> +should always fill in an empty commit comment and tools like fsck should ignore
> +the content of the commit comment if present in a future repository version.
> +This will allow future versions of git to add metadata to the meta-commit
> +comments or tree without breaking forwards compatibility.
> +
> +Parent-type
> +-----------
> +The “parent-type” field in the commit header identifies a commit as a
> +meta-commit and indicates the meaning for each of its parents. It is never
> +present for normal commits. It is a list of enum values whose order matches the
> +order of the parents. Possible parent types are:
> +
> +- content: the content parent identifies the commit that this meta-commit is
> +  describing.
> +- obsolete: indicates that this parent is made obsolete by the content parent.
> +- origin: indicates that this parent was generated from the given commit.
> +
> +There must be exactly one content parent for each meta-commit and it is always
> +the first parent. The content commit will always be a normal commit and not a
> +meta-commit. However, future versions of git may create meta-commits for other
> +meta-commits and the fsck tool must be aware of this for forwards compatibility.
> +
> +A meta-commit can have zero or more obsolete parents. An amend operation creates
> +a single obsolete parent. A merge used to resolve divergence (see divergence,
> +below) will create multiple obsolete parents. A meta-commit may have zero
> +obsolete parents if it describes a cherry-pick or squash merge that copies one
> +or more commits but does not replace them.
> +
> +A meta-commit can have zero or more origin parents. A cherry-pick creates a
> +single origin parent. Certain types of squash merge will create multiple origin
> +parents.
> +
> +An obsolete parent or origin parent may be either a normal commit (indicating
> +the oldest-known version of a change) or another meta-commit (for a change that
> +has already been modified one or more times).

I think it's worth pointing out, for those who are rusty on commit
object details (I did check), that the reason for it not being:

tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
parent aa7ce55545bf2c14bef48db91af1a74e2347539a
parent-type content
parent d64309ee51d0af12723b6cb027fc9f195b15a5e9
parent-type obsolete
parent 7e1bbcd3a0fa854a7a9eac9bf1eea6465de98136
parent-type origin
author Stefan Xenos  1540841596 -0700
committer Stefan Xenos  1540841596 -0700

which would be easier to read, is that we're very sensitive to the order
of the first few fields (tree -> parent -> author -> committer) and fsck
will error out if we interject a new field.
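
For illustration, here is a minimal Python sketch of how a reader could pair the trailing "parent-type" headers with the "parent" headers by position, and check the rules quoted above (exactly one content parent, and it comes first). It is not how git or fsck implement anything; the function name and the author/committer identity in the sample are placeholders.

```python
VALID_TYPES = {"content", "obsolete", "origin"}


def parse_meta_commit(raw):
    """Pair each 'parent' header with the 'parent-type' at the same position.

    Only meaningful for meta-commits; a normal commit has no parent-type
    headers and is rejected here.
    """
    header = raw.split("\n\n", 1)[0]  # headers end at the first blank line
    parents, types = [], []
    for line in header.splitlines():
        key, _, value = line.partition(" ")
        if key == "parent":
            parents.append(value)
        elif key == "parent-type":
            types.append(value)
    if len(parents) != len(types):
        raise ValueError("parent and parent-type counts differ")
    if not types or types[0] != "content" or types.count("content") != 1:
        raise ValueError("need exactly one content parent, listed first")
    unknown = set(types) - VALID_TYPES
    if unknown:
        raise ValueError(f"unknown parent types: {sorted(unknown)}")
    return list(zip(parents, types))


# Sample object text; the identity lines are placeholders.
raw = "\n".join([
    "tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904",
    "parent aa7ce55545bf2c14bef48db91af1a74e2347539a",
    "parent d64309ee51d0af12723b6cb027fc9f195b15a5e9",
    "parent 7e1bbcd3a0fa854a7a9eac9bf1eea6465de98136",
    "author A U Thor <author@example.com> 1540841596 -0700",
    "committer A U Thor <author@example.com> 1540841596 -0700",
    "parent-type content",
    "parent-type obsolete",
    "parent-type origin",
]) + "\n\n"

typed_parents = parse_meta_commit(raw)
```

Because the types are matched purely by order, a reader has to collect all parents before it can interpret any of them, which is the readability cost of keeping the leading fields untouched.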


Re: [PATCH] technical doc: add a design doc for the evolve command

2018-11-15 Thread Johannes Schindelin
Hi Stefan,

On Wed, 14 Nov 2018, sxe...@google.com wrote:

> From: Stefan Xenos 
> 
> This document describes what an obsolescence graph for
> git would look like, the behavior of the evolve command,
> and the changes planned for other commands.

Thanks, this is a good discussion starter.

> +Objective
> +---------
> +Track the edits to a commit over time in an obsolescence graph.

I am not sure that we necessarily need this to be a graph. I think part of
the problem with not being able to GC *any* of this is caused by the
requirement to store it as a graph, rather than as mappings from which you
could reconstruct any non-GC'ed parts of that graph, if you really want.
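
To sketch what I mean, in Python and under my own assumptions: the mappings are plain records of "this commit replaced those commits", kept as data rather than as object references, so they do not keep anything alive. The names below are made up for illustration.

```python
def surviving_predecessors(commit, predecessors, existing):
    """Walk old-version mappings from `commit`, reporting only commits that still exist.

    predecessors: maps a commit id to the ids it replaced (plain data).
    existing:     set of commit ids still present in the object store.
    """
    seen = set()
    stack = list(predecessors.get(commit, []))
    result = []
    while stack:
        old = stack.pop()
        if old in seen:
            continue
        seen.add(old)
        if old in existing:
            result.append(old)
        # Keep walking even through commits that were garbage-collected.
        stack.extend(predecessors.get(old, []))
    return result


predecessors = {"v3": ["v2"], "v2": ["v1"]}
existing = {"v3", "v1"}  # v2 has been garbage-collected

survivors = surviving_predecessors("v3", predecessors, existing)
```

Here the chain v3 -> v2 -> v1 can still be followed after v2 is gone, because the mapping entries themselves are independent of the objects.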

> +Background
> +----------
> +Imagine you have three dependent changes up for review and you receive feedback
> +that requires editing all three changes. While you're editing one, more feedback
> +arrives on one of the others. What do you do?
> +
> +The evolve command is a convenient way to work with chains of commits that are
> +under review. Whenever you rebase or amend a commit, the repository remembers
> +that the old commit is obsolete and has been replaced by the new one. Then, at
> +some point in the future, you can run "git evolve" and the correct sequence of
> +rebases will occur in the correct order such that no commit has an obsolete
> +parent.
> +
> +Part of making the "evolve" command work involves tracking the edits to a commit
> +over time, which is why we need an obsolescence graph. However, the obsolescence
> +graph will also bring other benefits:
> +
> +- Users can view the history of a commit directly (the sequence of amends and
> +  rebases it has undergone, orthogonal to the history of the branch it is on).
> +- It will be possible to quickly locate and list all the changes the user
> +  currently has in progress.
> +- It can be used as part of other high-level commands that combine or split
> +  changes.
> +- It can be used to decorate commits (in git log, gitk, etc) that are either
> +  obsolete or are the tip of a work in progress.
> +- By pushing and pulling the obsolescence graph, users can collaborate more
> +  easily on changes-in-progress. This is better than pushing and pulling the
> +  changes themselves since the obsolescence graph can be used to locate a more
> +  specific merge base, allowing for better merges between different versions of
> +  the same change.
> +- It could be used to correctly rebase local changes and other local branches
> +  after running git-filter-branch.
> +- It can replace the change-id footer used by gerrit.

Okay.

> +Similar technologies
> +--------------------
> +There are some other technologies that address the same end-user problem.
> +
> +Rebase -i can be used to solve the same problem, but users can't easily switch
> +tasks midway through an interactive rebase or have more than one interactive
> +rebase going on at the same time. It can't handle the case where you have
> +multiple changes sharing the same parent when that parent needs to be rebased
> +and won't let you collaborate with others on resolving a complicated interactive
> +rebase. You can think of rebase -i as a top-down approach and the evolve command
> +as the bottom-up approach to the same problem.
> +
> +Several patch queue managers have been built on top of git (such as topgit,
> +stgit, and quilt). They address the same user need. However they also rely on
> +state managed outside git that needs to be kept in sync. Such state can be
> +easily damaged when running a git native command that is unaware of the patch
> +queue. They also typically require an explicit initialization step to be done by
> +the user which creates workflow problems.
> +
> +Replacements (refs/replace) are superficially similar to obsolescences in that
> +they describe that one commit should be replaced by another. However, they
> +differ in both how they are created and how they are intended to be used.
> +Obsolescences are created automatically by the commands a user runs, and they
> +describe the user’s intent to perform a future rebase. Obsolete commits still
> +appear in branches, logs, etc like normal commits (possibly with an extra
> +decoration that marks them as obsolete). Replacements are typically created
> +explicitly by the user, they are meant to be kept around for a long time, and
> +they describe a replacement to be applied at read-time rather than as the input
> +to a future operation. When a replaced commit is queried, it is typically hidden
> +and swapped out with its replacement as though the replacement has already
> +occurred.

Why is `hg evolve`, most notably, missing from this list? Also, there should
be *at least* a brief introduction to how `hg evolve` works. They do have the
benefit of real-world testing, and probably encountered problems and came
up with solutions, and we would be remiss if we did not learn from them.

Also, please do not forget `git imerge`.


[PATCH] technical doc: add a design doc for the evolve command

2018-11-14 Thread sxenos
From: Stefan Xenos 

This document describes what an obsolescence graph for
git would look like, the behavior of the evolve command,
and the changes planned for other commands.

Signed-off-by: Stefan Xenos 
---
 Documentation/technical/evolve.txt | 885 +
 1 file changed, 885 insertions(+)
 create mode 100644 Documentation/technical/evolve.txt

diff --git a/Documentation/technical/evolve.txt b/Documentation/technical/evolve.txt
new file mode 100644
index 0000000000..88470eada3
--- /dev/null
+++ b/Documentation/technical/evolve.txt
@@ -0,0 +1,885 @@
+Git Obsolescence Graph
+======================
+
+Objective
+---------
+Track the edits to a commit over time in an obsolescence graph.
+
+Background
+----------
+Imagine you have three dependent changes up for review and you receive feedback
+that requires editing all three changes. While you're editing one, more feedback
+arrives on one of the others. What do you do?
+
+The evolve command is a convenient way to work with chains of commits that are
+under review. Whenever you rebase or amend a commit, the repository remembers
+that the old commit is obsolete and has been replaced by the new one. Then, at
+some point in the future, you can run "git evolve" and the correct sequence of
+rebases will occur in the correct order such that no commit has an obsolete
+parent.
+
+Part of making the "evolve" command work involves tracking the edits to a commit
+over time, which is why we need an obsolescence graph. However, the obsolescence
+graph will also bring other benefits:
+
+- Users can view the history of a commit directly (the sequence of amends and
+  rebases it has undergone, orthogonal to the history of the branch it is on).
+- It will be possible to quickly locate and list all the changes the user
+  currently has in progress.
+- It can be used as part of other high-level commands that combine or split
+  changes.
+- It can be used to decorate commits (in git log, gitk, etc) that are either
+  obsolete or are the tip of a work in progress.
+- By pushing and pulling the obsolescence graph, users can collaborate more
+  easily on changes-in-progress. This is better than pushing and pulling the
+  changes themselves since the obsolescence graph can be used to locate a more
+  specific merge base, allowing for better merges between different versions of
+  the same change.
+- It could be used to correctly rebase local changes and other local branches
+  after running git-filter-branch.
+- It can replace the change-id footer used by gerrit.
+
+Similar technologies
+--------------------
+There are some other technologies that address the same end-user problem.
+
+Rebase -i can be used to solve the same problem, but users can't easily switch
+tasks midway through an interactive rebase or have more than one interactive
+rebase going on at the same time. It can't handle the case where you have
+multiple changes sharing the same parent when that parent needs to be rebased
+and won't let you collaborate with others on resolving a complicated interactive
+rebase. You can think of rebase -i as a top-down approach and the evolve command
+as the bottom-up approach to the same problem.
+
+Several patch queue managers have been built on top of git (such as topgit,
+stgit, and quilt). They address the same user need. However they also rely on
+state managed outside git that needs to be kept in sync. Such state can be
+easily damaged when running a git native command that is unaware of the patch
+queue. They also typically require an explicit initialization step to be done by
+the user which creates workflow problems.
+
+Replacements (refs/replace) are superficially similar to obsolescences in that
+they describe that one commit should be replaced by another. However, they
+differ in both how they are created and how they are intended to be used.
+Obsolescences are created automatically by the commands a user runs, and they
+describe the user’s intent to perform a future rebase. Obsolete commits still
+appear in branches, logs, etc like normal commits (possibly with an extra
+decoration that marks them as obsolete). Replacements are typically created
+explicitly by the user, they are meant to be kept around for a long time, and
+they describe a replacement to be applied at read-time rather than as the input
+to a future operation. When a replaced commit is queried, it is typically hidden
+and swapped out with its replacement as though the replacement has already
+occurred.
+
+Goals
+-----
+Legend: Goals marked with P0 are required. Goals marked with Pn should be
+attempted unless they interfere with goals marked with Pn-1.
+
+P0. All commands that modify commits (such as the normal commit --amend or
+rebase command) should mark the old commit as being obsolete and replaced by
+the new one. No additional commands should be required to keep the
+obsolescence graph up-to-date.
+P0. Any commit that may be involved in a future evolve