Re: [PATCH] technical doc: add a design doc for the evolve command
> I don't have a strong opinion about whether this would go in the > design doc. I suppose the doc could have an "implementation plan" > section describing temporary stopping points on the way to the final > result, but it's not necessary to include that. As long as this is something I'm just doing for fun and nobody needs to coordinate anything with me, I was planning to just document the endpoint and then work on whatever seems interesting at any given moment. Of course, if I found a job/team that would let me do this as my day job, I'd be more willing to commit to deliverables. - Stefan On Tue, Nov 20, 2018 at 5:33 PM Jonathan Nieder wrote: > > Stefan Xenos wrote: > > Jonathan Nieder wrote: > > >> putting it in the commit message is a way to > >> experiment with the workflow without changing the object format > > > > As long as we're talking about a temporary state of affairs for users > > that have opted in, and we're explicit about the fact that future > > versions of git won't understand the change graphs that are produced > > during that temporary state of affairs, I'm fine with using the commit > > message. We can move it to the header prior to enabling the feature by > > default. > > Yay! I think that addresses both my and Ævar's concerns. Also, if > you run into an issue that requires changing the object format > earlier, that's fine and we can handle the situation when it comes. > > I don't have a strong opinion about whether this would go in the > design doc. I suppose the doc could have an "implementation plan" > section describing temporary stopping points on the way to the final > result, but it's not necessary to include that. > > Thanks for the quick and thoughtful replies. > > Jonathan
Re: [PATCH] technical doc: add a design doc for the evolve command
Hi Stefan On 20/11/2018 20:24, Stefan Xenos wrote: >> If a merge has been cherry-picked we cannot update it as we don't record >> which parent was used for the pick, however it is probably not a problem >> in practice - I think it is unusual to amend merges. > > I've read and reread that sentence several times and don't fully > understand it. Could you elaborate? Sorry if I wasn't very clear. To cherry-pick (or revert) a merge commit one has to specify a parent of the commit being picked with -m for cherry-pick to use as the merge base for the three-way merge that creates the new commit. If the original merge is updated then evolve won't know which parent to use as the merge base when evolving the cherry-picked version of the merge, because the parent is not recorded in the meta commit. > It sounds scary, though. With the evolve command, amending merges will > need to be supported. Evolving a merge should be fine; I was just referring to merges that have been cherry-picked. Best Wishes Phillip (Thanks for your reply to my other message. I'm still digesting it at the moment; once I've done that and found the references to mercurial using commit obsolescence information in rebase and pull I'll reply.) > If you create a merge and then amend one of its > parent commits, the evolve command will need to rebase the merge and > point one or both parents to the replacement instead. > > - Stefan > On Tue, Nov 20, 2018 at 5:03 AM Phillip Wood > wrote: >> >> On 15/11/2018 00:55, sxe...@google.com wrote: >>> From: Stefan Xenos >>> >>> +Obsolescence across cherry-picks >>> + >>> +By default the evolve command will treat cherry-picks and squash merges as >>> being >>> +completely separate from the original. Further amendments to the original >>> commit >>> +will have no effect on the cherry-picked copy. However, this behavior may >>> not be >>> +desirable in all circumstances. 
>>> + >>> +The evolve command may at some point support an option to look for cases >>> where >>> +the source of a cherry-pick or squash merge has itself been amended, and >>> +automatically apply that same change to the cherry-picked copy. In such >>> cases, >>> +it would traverse origin edges rather than ignoring them, and would treat a >>> +commit with origin edges as being obsolete if any of its origins were >>> obsolete. >> >> If a merge has been cherry-picked we cannot update it as we don't record >> which parent was used for the pick, however it is probably not a problem >> in practice - I think it is unusual to amend merges. >> >> Best Wishes >> >> Phillip
Re: [PATCH] technical doc: add a design doc for the evolve command
Stefan Xenos wrote: > Jonathan Nieder wrote: >> putting it in the commit message is a way to >> experiment with the workflow without changing the object format > > As long as we're talking about a temporary state of affairs for users > that have opted in, and we're explicit about the fact that future > versions of git won't understand the change graphs that are produced > during that temporary state of affairs, I'm fine with using the commit > message. We can move it to the header prior to enabling the feature by > default. Yay! I think that addresses both my and Ævar's concerns. Also, if you run into an issue that requires changing the object format earlier, that's fine and we can handle the situation when it comes. I don't have a strong opinion about whether this would go in the design doc. I suppose the doc could have an "implementation plan" section describing temporary stopping points on the way to the final result, but it's not necessary to include that. Thanks for the quick and thoughtful replies. Jonathan
Re: [PATCH] technical doc: add a design doc for the evolve command
> putting it in the commit message is a way to > experiment with the workflow without changing the object format As long as we're talking about a temporary state of affairs for users that have opted in, and we're explicit about the fact that future versions of git won't understand the change graphs that are produced during that temporary state of affairs, I'm fine with using the commit message. We can move it to the header prior to enabling the feature by default. - Stefan On Tue, Nov 20, 2018 at 2:06 PM Jonathan Nieder wrote: > > Stefan Xenos wrote: > > On Tue, Nov 20, 2018 at 1:43 AM Ævar Arnfjörð Bjarmason > > wrote: > > >> I think it sounds better to just make it, in the header: > >> > >> x-evolve-pt content > >> x-evolve-pt obsolete > >> x-evolve-pt origin > >> > >> Where "pt = parent-type", we could of course spell that out too, but in > >> this case it's "x-evolve-pt" is the exact same number of bytes as > >> "parent-type", so nobody can object that it takes more space:) > >> > >> We'd then carry some documentation where we say everything except "x-*-" > >> is reserved, and that we'd like to know about new "*" there before it's > >> used, so it can be documented. > [...] > > that should > > probably be the subject of a separate proposal (who owns the content > > of a namespace, what is the process for adding a new namespace or a > > new attribute within a namespace, what order should the header > > attributes appear in, what problem is namespacing there to solve, when > > do we use a namespaced attribute versus a "reserved" attribute, etc.). > > Agreed. There are reasons that I prefer not to go in this direction, > but regardless, it would be the subject of a separate thread if you want > to pursue it. > > >> Putting it in the commit message just sounds like a hack around not > >> having namespaced headers. 
If we'd like to keep this then tools would > >> need to parse both (potentially unpacking a lot of the commit message > >> object, it can be quite big in some cases...). > > On the contrary: putting it in the commit message is a way to > experiment with the workflow without changing the object format at > all. > > I don't think we should underestimate the value of that ability. > > I don't understand what you're referring to by parsing both. Are you > saying that if the experiment proves successful, we wouldn't be able > to migrate completely to a new format? That sounds worrying to me --- > I want the ability to experiment and to act on what we learn from an > experiment, including when it touches on formats. > > Thanks, > Jonathan
Re: [PATCH] technical doc: add a design doc for the evolve command
Stefan Xenos wrote: > On Tue, Nov 20, 2018 at 1:43 AM Ævar Arnfjörð Bjarmason > wrote: >> I think it sounds better to just make it, in the header: >> >> x-evolve-pt content >> x-evolve-pt obsolete >> x-evolve-pt origin >> >> Where "pt = parent-type", we could of course spell that out too, but in >> this case it's "x-evolve-pt" is the exact same number of bytes as >> "parent-type", so nobody can object that it takes more space:) >> >> We'd then carry some documentation where we say everything except "x-*-" >> is reserved, and that we'd like to know about new "*" there before it's >> used, so it can be documented. [...] > that should > probably be the subject of a separate proposal (who owns the content > of a namespace, what is the process for adding a new namespace or a > new attribute within a namespace, what order should the header > attributes appear in, what problem is namespacing there to solve, when > do we use a namespaced attribute versus a "reserved" attribute, etc.). Agreed. There are reasons that I prefer not to go in this direction, but regardless, it would be the subject of a separate thread if you want to pursue it. >> Putting it in the commit message just sounds like a hack around not >> having namespaced headers. If we'd like to keep this then tools would >> need to parse both (potentially unpacking a lot of the commit message >> object, it can be quite big in some cases...). On the contrary: putting it in the commit message is a way to experiment with the workflow without changing the object format at all. I don't think we should underestimate the value of that ability. I don't understand what you're referring to by parsing both. Are you saying that if the experiment proves successful, we wouldn't be able to migrate completely to a new format? That sounds worrying to me --- I want the ability to experiment and to act on what we learn from an experiment, including when it touches on formats. Thanks, Jonathan
Re: [PATCH] technical doc: add a design doc for the evolve command
> If a merge has been cherry-picked we cannot update it as we don't record > which parent was used for the pick, however it is probably not a problem > in practice - I think it is unusual to amend merges. I've read and reread that sentence several times and don't fully understand it. Could you elaborate? It sounds scary, though. With the evolve command, amending merges will need to be supported. If you create a merge and then amend one of its parent commits, the evolve command will need to rebase the merge and point one or both parents to the replacement instead. - Stefan On Tue, Nov 20, 2018 at 5:03 AM Phillip Wood wrote: > > On 15/11/2018 00:55, sxe...@google.com wrote: > > From: Stefan Xenos > > > > +Obsolescence across cherry-picks > > + > > +By default the evolve command will treat cherry-picks and squash merges as > > being > > +completely separate from the original. Further amendments to the original > > commit > > +will have no effect on the cherry-picked copy. However, this behavior may > > not be > > +desirable in all circumstances. > > + > > +The evolve command may at some point support an option to look for cases > > where > > +the source of a cherry-pick or squash merge has itself been amended, and > > +automatically apply that same change to the cherry-picked copy. In such > > cases, > > +it would traverse origin edges rather than ignoring them, and would treat a > > +commit with origin edges as being obsolete if any of its origins were > > obsolete. > > If a merge has been cherry-picked we cannot update it as we don't record > which parent was used for the pick, however it is probably not a problem > in practice - I think it is unusual to amend merges. > > Best Wishes > > Phillip
Re: [PATCH] technical doc: add a design doc for the evolve command
> This explains why we have 'origin' fields in the meta commits, it might > be worth putting a forward reference or note earlier on to explain why > recording the origin is useful. (I didn't find gerrit needs it very > convincing on its own but it is actually more general than gerrit's > specific use case) I'll add the forward reference. TBH, gerrit is the main reason I added it - so I'm interested in why you didn't find the gerrit use-case convincing. Can you elaborate? (If there's some other way around the gerrit requirement, we might not need the origin parents.) > Should this be meta/mychange:refs/for/master or have I missed something? It should be metas/mychange/ It's already fixed in the v2 patch. I really wanted to use the namespace "changes", but gerrit is squatting on that. I tried "change", but that breaks the plural naming scheme and may get confused with gerrit's namespace, so I settled on "metas". > I think it would make sense to have this next to the sections on commit > --amend and merge I was wondering what about rebase when I was reading > those sections. Will do. > I'm a bit confused why it is creating a meta ref per commit rather than > one for the current branch. I tried to explain that later in the doc. Meta refs serve two purposes - they act as stable names for changes (or at least the commits at the head of each change) and they point to the metacommits that are currently in use. For both purposes, we need a ref per commit. For the "stable name" case, this should be obvious - something that just points to a branch couldn't provide different names for each commit on that branch. The metacommit case is less obvious - the metacommits for one change aren't connected to the metacommits for any other change. The "parents" of a metacommit are older versions of the same change. They don't point to the metacommits from the parent change. 
That means that there is no single ref we could create for a branch that would reach all the necessary metacommits. > I got the impression they had put quite a lot of effort > into having evolve automatically run and resolve divergences when > pulling and rebasing, is there a long term plan for git to do the same? IMO, we should add anything to the plan if doing so improves the workflow of our users... but it sounds like you're referring to mercurial features I've never used. Could you point me to specific docs on the feature you want and/or make a concrete suggestion about how it might work? I never use pull so it slipped my mind. It would probably make sense to have the option of doing an automatic evolve after pull (actually, once the feature is stable, most users would probably want it to be the default). How do you think it should be triggered? "git pull --evolve"? Or perhaps "git pull --rebase=evolve"? We should probably also introduce a new "evolve" enum value for the branch.<name>.rebase config value. I'll use "--evolve" for now. It may make sense to add "--evolve" to every git command that performs an automatic evolve when done. > What happens if the original commit are currently checked out with local > changes? For a start, I'll probably just display an error message if the current working tree is dirty ("Please stash"). Long term, I'd like it to work like rebase --autostash. It should stash your changes, do the evolve, return to the evolved version of the original change, and reapply the stash. I'll add this to the doc. > Can I suggest using refs/remote//metas. I Ooh! Great idea! I'll update the doc. > I think this could be useful (although I guess you can get the branches > you've been working on recently from HEAD's reflog quite easily). The changes list is different from the reflog. It's a list of all your unsubmitted patches - regardless of their age or what branch they're on. 
They may not have corresponding branches: you may have been working on them with a detached head, or there may be multiple changes on the same branch. You might not have visited them recently, in which case they wouldn't be in the reflog at all. You may have reset to an older version of the change, in which case they'd be in the reflog but the reflog and change point to different places. If you've used gerrit before, the "changes" list will contain pretty much the same content as the gerrit dashboard, except that it works locally. >> +Much like a merge conflict, divergence is a situation that requires user >> +intervention to resolve. The evolve command will stop when it encounters >> +divergence and prompt the user to resolve the problem. Users can solve the >> +problem in several ways: >> + >> +- Discard one of the changes (by deleting its change branch). >> +- Merge the two changes (producing a single change branch). > >I assume this wont create merge commits for the actual commits though, >just merge the meta branches and create some new commits that are each >the result of something like 'merge-recursive original-commit >our-new-version
Re: [PATCH] technical doc: add a design doc for the evolve command
This sounds like a proposal for general namespacing. I like it - that would pave the way for other header extensions - but that should probably be the subject of a separate proposal (who owns the content of a namespace, what is the process for adding a new namespace or a new attribute within a namespace, what order should the header attributes appear in, what problem is namespacing there to solve, when do we use a namespaced attribute versus a "reserved" attribute, etc.). x-evolve-pt seems reasonable to me. If you're keen on this and want to document the namespacing proposal, I'll conform to it. However, if we don't have formal rules for namespaces in place yet, it might be better to avoid the use of an x- prefix for now, just in case I accidentally squat on a name that breaks whatever namespacing rules we eventually come up with. Since we're talking bytes, a more compact representation of parent-type could use single-letter codes: x-evolve-pt c r o (where c=content, r=replace/obsolete, o=origin) - Stefan On Tue, Nov 20, 2018 at 1:43 AM Ævar Arnfjörð Bjarmason wrote: > > > On Tue, Nov 20 2018, Jonathan Nieder wrote: > > > Ævar Arnfjörð Bjarmason wrote: > >> On Thu, Nov 15 2018, sxe...@google.com wrote: > > > >>> +Parent-type > >>> +--- > >>> +The “parent-type” field in the commit header identifies a commit as a > >>> +meta-commit and indicates the meaning for each of its parents. It is > >>> never > >>> +present for normal commits. > > [...] 
> >> I think it's worth pointing out for those that are rusty on commit > >> object details (but I checked) is that the reason for it not being: > >> > >> tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904 > >> parent aa7ce55545bf2c14bef48db91af1a74e2347539a > >> parent-type content > >> parent d64309ee51d0af12723b6cb027fc9f195b15a5e9 > >> parent-type obsolete > >> parent 7e1bbcd3a0fa854a7a9eac9bf1eea6465de98136 > >> parent-type origin > >> author Stefan Xenos 1540841596 -0700 > >> committer Stefan Xenos 1540841596 -0700 > >> > >> Which would be easier to read, is that we're very sensitive to the order > >> of the first few fields (tree -> parent -> author -> committer) and fsck > >> will error out if we interjected a new field. > > > > By the way, in the spirit of limiting the initial scope, I wonder > > whether the parent-type fields can be stored in the commit message > > initially. > > > > Elsewhere in this thread it was mentioned that the parent-type is a > > field to allow tools like "git fsck" to understand the meaning of > > these parent relationships (for example, to forbid a commit > > referencing a meta-commit). The same could be done using special > > commit message text, though. > > > > The advantage of such an approach would be that we could experiment > > without changing the official object format at all. If experiments > > revealed a different set of information to store, we could update the > > format without having to maintain the memory of the older format in > > "git fsck"'s understanding of commit object fields. So even though I > > think that in the end we would want to put this information in the > > commit object header, I'm tempted to suspect that the benefits of > > putting it in the commit message to start outweigh the costs (in > > particular, of having to migrate to another format later). 
> > I think it sounds better to just make it, in the header: > > x-evolve-pt content > x-evolve-pt obsolete > x-evolve-pt origin > > Where "pt = parent-type", we could of course spell that out too, but in > this case it's "x-evolve-pt" is the exact same number of bytes as > "parent-type", so nobody can object that it takes more space:) > > We'd then carry some documentation where we say everything except "x-*-" > is reserved, and that we'd like to know about new "*" there before it's > used, so it can be documented. > > Putting it in the commit message just sounds like a hack around not > having namespaced headers. If we'd like to keep this then tools would > need to parse both (potentially unpacking a lot of the commit message > object, it can be quite big in some cases...).
Re: [PATCH] technical doc: add a design doc for the evolve command
> I was trying to see if this is something we can leave out to limit the > initial scope. Oh, in that case, "yes". :-) If there's a need to cut something, origin parents would be a viable candidate. I was thinking that this file could document the final goal so that if anyone else wanted to contribute to the implementation, we would be heading in the same direction. It seems reasonable that an early implementation may omit origin parents. Since the actual implementation will lag behind the spec, I'll add a status section to the top of the document where we can describe the delta between plan and implementation. Also, I'm now convinced we're talking about the same thing. :-) > > Are you claiming that this is undesirable, or are you claiming that > > this could be accomplished without origin parents? > > I was trying to see if this is something we can leave out to limit > the initial scope.
Re: [PATCH] technical doc: add a design doc for the evolve command
On 15/11/2018 00:55, sxe...@google.com wrote: From: Stefan Xenos +Obsolescence across cherry-picks + +By default the evolve command will treat cherry-picks and squash merges as being +completely separate from the original. Further amendments to the original commit +will have no effect on the cherry-picked copy. However, this behavior may not be +desirable in all circumstances. + +The evolve command may at some point support an option to look for cases where +the source of a cherry-pick or squash merge has itself been amended, and +automatically apply that same change to the cherry-picked copy. In such cases, +it would traverse origin edges rather than ignoring them, and would treat a +commit with origin edges as being obsolete if any of its origins were obsolete. If a merge has been cherry-picked we cannot update it as we don't record which parent was used for the pick, however it is probably not a problem in practice - I think it is unusual to amend merges. Best Wishes Phillip
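The optional behaviour described in the quoted patch text ("treat a commit with origin edges as being obsolete if any of its origins were obsolete") can be illustrated with a small Python sketch. This is not code from the patch. The function name `is_obsolete`, the `follow_origins` flag, and the plain dict/set representation of the graph are all assumptions made for illustration only.

```python
def is_obsolete(commit, obsolete, origins, follow_origins=False, _seen=None):
    """Hypothetical check, not git's implementation.

    obsolete: set of commits already known to be obsolete
    origins:  dict mapping a commit to the commits it was cherry-picked
              or squash-merged from (its "origin" edges)
    """
    if commit in obsolete:
        return True
    if not follow_origins:
        # Default behaviour in the doc: origin edges are ignored.
        return False
    seen = _seen if _seen is not None else set()
    if commit in seen:
        return False  # guard against revisiting the same commit
    seen.add(commit)
    # Optional behaviour: obsolete if any origin is (transitively) obsolete.
    return any(
        is_obsolete(o, obsolete, origins, True, seen)
        for o in origins.get(commit, ())
    )


obsolete = {"A1"}
origins = {"P": ["A1"]}  # P is a cherry-pick of A1
print(is_obsolete("P", obsolete, origins))                       # default mode
print(is_obsolete("P", obsolete, origins, follow_origins=True))  # opt-in mode
```

Note that the sketch only answers "is this copy stale?". It does not address the concern raised above about cherry-picked merges, where the parent chosen with -m is not recorded.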
Re: [PATCH] technical doc: add a design doc for the evolve command
On 20/11/2018 12:18, Phillip Wood wrote: On 15/11/2018 00:55, sxe...@google.com wrote: From: Stefan Xenos +Divergence +-- +From the user’s perspective, two changes are divergent if they both ask for +different replacements to the same commit. More precisely, a target commit is +considered divergent if there is more than one commit at the head of a change in +refs/metas that leads to the target commit via an unbroken chain of “obsolete” +edges. + +Much like a merge conflict, divergence is a situation that requires user +intervention to resolve. The evolve command will stop when it encounters +divergence and prompt the user to resolve the problem. Users can solve the +problem in several ways: + +- Discard one of the changes (by deleting its change branch). +- Merge the two changes (producing a single change branch). I assume this wont create merge commits for the actual commits though, just merge the meta branches and create some new commits that are each the result of something like 'merge-recursive original-commit our-new-version their-new-version' That should have been merge-recursive original-commit^ -- our-new-version their-new-version Best Wishes Phillip
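To make the divergence rule from the quoted patch concrete (more than one change head reaching the same target through an unbroken chain of "obsolete" edges), here is a minimal Python sketch. It is a simplification and not taken from the patch: it assumes obsolete edges can be modelled as a plain mapping between commits, and the names `find_divergent`, `change_heads` and `obsoletes` are hypothetical.

```python
from collections import defaultdict


def find_divergent(change_heads, obsoletes):
    """Hypothetical divergence check, not git's implementation.

    change_heads: dict mapping a change name (e.g. a ref under refs/metas)
                  to the commit at the head of that change
    obsoletes:    dict mapping a commit to the commits it replaces
                  (its "obsolete" edges)
    Returns a dict: target commit -> sorted list of change names reaching it,
    restricted to targets reached from more than one change.
    """
    reached_by = defaultdict(set)
    for name, head in change_heads.items():
        seen = set()
        stack = list(obsoletes.get(head, ()))
        while stack:
            commit = stack.pop()
            if commit in seen:
                continue
            seen.add(commit)
            reached_by[commit].add(name)
            # Keep following the chain of obsolete edges.
            stack.extend(obsoletes.get(commit, ()))
    return {c: sorted(n) for c, n in reached_by.items() if len(n) > 1}


# Two changes both claim to replace A1, so A1 (and its predecessor A0)
# are reported as divergent.
heads = {"metas/a": "A2", "metas/b": "A2-alt"}
edges = {"A2": ["A1"], "A2-alt": ["A1"], "A1": ["A0"]}
print(find_divergent(heads, edges))
```

Resolving the divergence (discarding one change, or merging with something like the merge-recursive invocation suggested above) is outside the scope of this sketch.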
Re: [PATCH] technical doc: add a design doc for the evolve command
Hi Stefan Thanks for working on this, I think it could be a really useful addition to git. I'd echo Gábor's comments about making commands descriptive and easy for the user to find, git has aliases, accepts abbreviated option names and has shell completion so I don't think typing is really such a problem. From your reply it looks like you've taken those concerns on board. I've got some more comments below. On 15/11/2018 00:55, sxe...@google.com wrote: From: Stefan Xenos This document describes what an obsolescence graph for git would look like, the behavior of the evolve command, and the changes planned for other commands. Signed-off-by: Stefan Xenos --- Documentation/technical/evolve.txt | 885 + 1 file changed, 885 insertions(+) create mode 100644 Documentation/technical/evolve.txt diff --git a/Documentation/technical/evolve.txt b/Documentation/technical/evolve.txt new file mode 100644 index 00..88470eada3 --- /dev/null +++ b/Documentation/technical/evolve.txt @@ -0,0 +1,885 @@ +Git Obsolescence Graph +== + +Objective +- +Track the edits to a commit over time in an obsolescence graph. + +Background +-- +Imagine you have three dependent changes up for review and you receive feedback +that requires editing all three changes. While you're editing one, more feedback +arrives on one of the others. What do you do? + +The evolve command is a convenient way to work with chains of commits that are +under review. Whenever you rebase or amend a commit, the repository remembers +that the old commit is obsolete and has been replaced by the new one. Then, at +some point in the future, you can run "git evolve" and the correct sequence of +rebases will occur in the correct order such that no commit has an obsolete +parent. + +Part of making the "evolve" command work involves tracking the edits to a commit +over time, which is why we need an obsolescence graph. 
However, the obsolescence +graph will also bring other benefits: + +- Users can view the history of a commit directly (the sequence of amends and + rebases it has undergone, orthogonal to the history of the branch it is on). +- It will be possible to quickly locate and list all the changes the user + currently has in progress. +- It can be used as part of other high-level commands that combine or split + changes. +- It can be used to decorate commits (in git log, gitk, etc) that are either + obsolete or are the tip of a work in progress. +- By pushing and pulling the obsolescence graph, users can collaborate more + easily on changes-in-progress. This is better than pushing and pulling the + changes themselves since the obsolescence graph can be used to locate a more + specific merge base, allowing for better merges between different versions of + the same change. +- It could be used to correctly rebase local changes and other local branches + after running git-filter-branch. +- It can replace the change-id footer used by gerrit. + +Similar technologies + +There are some other technologies that address the same end-user problem. + +Rebase -i can be used to solve the same problem, but users can't easily switch +tasks midway through an interactive rebase or have more than one interactive +rebase going on at the same time. It can't handle the case where you have +multiple changes sharing the same parent when that parent needs to be rebased +and won't let you collaborate with others on resolving a complicated interactive +rebase. You can think of rebase -i as a top-down approach and the evolve command +as the bottom-up approach to the same problem. + +Several patch queue managers have been built on top of git (such as topgit, +stgit, and quilt). They address the same user need. However they also rely on +state managed outside git that needs to be kept in sync. Such state can be +easily damaged when running a git native command that is unaware of the patch +queue. 
They also typically require an explicit initialization step to be done by +the user which creates workflow problems. + +Replacements (refs/replace) are superficially similar to obsolescences in that +they describe that one commit should be replaced by another. However, they +differ in both how they are created and how they are intended to be used. +Obsolescences are created automatically by the commands a user runs, and they +describe the user’s intent to perform a future rebase. Obsolete commits still +appear in branches, logs, etc like normal commits (possibly with an extra +decoration that marks them as obsolete). Replacements are typically created +explicitly by the user, they are meant to be kept around for a long time, and +they describe a replacement to be applied at read-time rather than as the input +to a future operation. When a replaced commit is queried, it is typically hidden +and swapped out with its replacement as though the replacement has already +occurred. + +Goals
Re: [PATCH] technical doc: add a design doc for the evolve command
On Tue, Nov 20 2018, Jonathan Nieder wrote: > Ævar Arnfjörð Bjarmason wrote: >> On Thu, Nov 15 2018, sxe...@google.com wrote: > >>> +Parent-type >>> +--- >>> +The “parent-type” field in the commit header identifies a commit as a >>> +meta-commit and indicates the meaning for each of its parents. It is never >>> +present for normal commits. > [...] >> I think it's worth pointing out for those that are rusty on commit >> object details (but I checked) is that the reason for it not being: >> >> tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904 >> parent aa7ce55545bf2c14bef48db91af1a74e2347539a >> parent-type content >> parent d64309ee51d0af12723b6cb027fc9f195b15a5e9 >> parent-type obsolete >> parent 7e1bbcd3a0fa854a7a9eac9bf1eea6465de98136 >> parent-type origin >> author Stefan Xenos 1540841596 -0700 >> committer Stefan Xenos 1540841596 -0700 >> >> Which would be easier to read, is that we're very sensitive to the order >> of the first few fields (tree -> parent -> author -> committer) and fsck >> will error out if we interjected a new field. > > By the way, in the spirit of limiting the initial scope, I wonder > whether the parent-type fields can be stored in the commit message > initially. > > Elsewhere in this thread it was mentioned that the parent-type is a > field to allow tools like "git fsck" to understand the meaning of > these parent relationships (for example, to forbid a commit > referencing a meta-commit). The same could be done using special > commit message text, though. > > The advantage of such an approach would be that we could experiment > without changing the official object format at all. If experiments > revealed a different set of information to store, we could update the > format without having to maintain the memory of the older format in > "git fsck"'s understanding of commit object fields. 
So even though I > think that in the end we would want to put this information in the > commit object header, I'm tempted to suspect that the benefits of > putting it in the commit message to start outweigh the costs (in > particular, of having to migrate to another format later). I think it sounds better to just make it, in the header: x-evolve-pt content x-evolve-pt obsolete x-evolve-pt origin Where "pt = parent-type", we could of course spell that out too, but in this case "x-evolve-pt" is exactly the same number of bytes as "parent-type", so nobody can object that it takes more space :) We'd then carry some documentation where we say everything except "x-*-" is reserved, and that we'd like to know about new "*" there before it's used, so it can be documented. Putting it in the commit message just sounds like a hack around not having namespaced headers. If we'd like to keep this then tools would need to parse both (potentially unpacking a lot of the commit message object, it can be quite big in some cases...).
Re: [PATCH] technical doc: add a design doc for the evolve command
Ævar Arnfjörð Bjarmason wrote: > On Thu, Nov 15 2018, sxe...@google.com wrote: >> +Parent-type >> +--- >> +The “parent-type” field in the commit header identifies a commit as a >> +meta-commit and indicates the meaning for each of its parents. It is never >> +present for normal commits. [...] > I think it's worth pointing out for those that are rusty on commit > object details (but I checked) is that the reason for it not being: > > tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904 > parent aa7ce55545bf2c14bef48db91af1a74e2347539a > parent-type content > parent d64309ee51d0af12723b6cb027fc9f195b15a5e9 > parent-type obsolete > parent 7e1bbcd3a0fa854a7a9eac9bf1eea6465de98136 > parent-type origin > author Stefan Xenos 1540841596 -0700 > committer Stefan Xenos 1540841596 -0700 > > Which would be easier to read, is that we're very sensitive to the order > of the first few fields (tree -> parent -> author -> committer) and fsck > will error out if we interjected a new field. By the way, in the spirit of limiting the initial scope, I wonder whether the parent-type fields can be stored in the commit message initially. Elsewhere in this thread it was mentioned that the parent-type is a field to allow tools like "git fsck" to understand the meaning of these parent relationships (for example, to forbid a commit referencing a meta-commit). The same could be done using special commit message text, though. The advantage of such an approach would be that we could experiment without changing the official object format at all. If experiments revealed a different set of information to store, we could update the format without having to maintain the memory of the older format in "git fsck"'s understanding of commit object fields. 
So even though I think that in the end we would want to put this information in the commit object header, I'm tempted to suspect that the benefits of putting it in the commit message to start outweigh the costs (in particular, of having to migrate to another format later). Thanks, Jonathan
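As a rough illustration of the "special commit message text" idea, here is a minimal Python sketch that reads parent types from trailer-style lines in the last paragraph of a commit message. The trailer name "Parent-type" and both function names are hypothetical; the thread does not settle on any message format.

```python
from typing import List

# Hypothetical trailer name; nothing in the thread fixes this spelling.
TRAILER_KEY = "Parent-type"


def parent_types_from_message(message: str) -> List[str]:
    """Collect parent types from the last paragraph of a commit message."""
    paragraphs = [p for p in message.strip().split("\n\n") if p.strip()]
    if not paragraphs:
        return []
    types = []
    for line in paragraphs[-1].splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == TRAILER_KEY:
            types.append(value.strip())
    return types


def is_meta_commit(message: str) -> bool:
    """Treat a commit as a meta-commit if it carries any parent type."""
    return bool(parent_types_from_message(message))
```

A checker could use something like `is_meta_commit` to refuse an ordinary commit that references a meta-commit, without any change to the object format during the experiment.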
Re: [PATCH] technical doc: add a design doc for the evolve command
Hi, Stefan Xenos wrote: > But since several comments have focused on the commands, let's brainstorm! > > Here's some options that occur to me: > > 1. Three commands: evolve + change + obslog as top-level commands (the > current proposal). Change gets a bunch of subcommands for manipulating > the change graph and metas/ namespace. > > 2. All top-level: evolve + lschange + mkchange + rmchange + > prunechange + obslog. None of the commands get subcommands. > > 3. Everything under change: "change evolve", "change obslog" become > new change subcommands. > > 4. Evolve as a rebase argument, obslog as a log argument. Use "rebase > --evolve" to initiate evolve and use "log --obslog" to initiate > obslog. The change command stays as it is in the proposal. > > 5. Two commands: evolve + change. obslog becomes a "log" argument. > > Note that there will be more "evolve"-specific arguments in the > future. For most transformations that evolve uses, there will be a > matching argument to enable or disable that transformation and as we > add transformations we'll also add arguments. > > If we go with option 3, it would make for a very cluttered help page. > For example, if you're looking for information on how to use evolve, > you'd have to scroll past a bunch of formatting information that are > only relevant to obslog... and if you're looking for the formatting > options, you'd have to scroll past a bunch of > transformation-enablement options that are only relevant to evolve. > > Based on your log feedback above, I'm thinking #5 may make sense. (5) sounds good to me, too. Thanks, both, for your thoughtfulness. Jonathan
Re: [PATCH] technical doc: add a design doc for the evolve command
Stefan Xenos writes: >> But it is not immediately obvious to me how it would help to have >> "Z was cherry-picked from W" in "evolve". > > The evolve command would use it for handling the > obsolescence-over-cherry-pick (OOCP) feature. If someone cherry-picks > a commit and then amends the original, the evolve command would give > you the option of applying the same amendment to the cherry-picked > version. Yeah, I missed that case when I was formulating my thought on how we can start smaller and simpler to get the ball rolling. And for "this commit and anything built on top of it need to be adjusted since that other commit, which this commit was made by cherry-picking it, has been obsoleted" to work, the "origin" commit pointed at by the meta commit must be made available. > Are you claiming that this is undesirable, or are you claiming that > this could be accomplished without origin parents? I was trying to see if this is something we can leave out to limit the initial scope.
Re: [PATCH] technical doc: add a design doc for the evolve command
> Subcommand names and --long-options are just as descriptive. Yeah, I'm convinced about the descriptiveness. If you check the latest version of the patch, I've already updated the "change" command to use subcommands rather than lettered arguments. > If a user wants to deal with reflogs, then there is 'git help reflog' I guess it depends on whether you prefer having a single big help page (risk: user may see irrelevant content), or a number of small help pages (risk: user may need to follow cross-references). My guess is that we should probably try to hit the sweet spot that minimizes the amount of irrelevant information on a help page, the probability of needing to follow a cross-reference to understand context, and the amount of content that needs to be duplicated between pages. But assuming we add a bunch of formatting options to obslog that match log, it may make sense to just have an "--obslog" argument to log. > I think 'git obslog' should allow the same when showing the log of a change. Sounds good. We should probably also change the default formatting for the obslog command to be some sort of description of what changed since the commit message will probably be very similar for every entry. I'll update the proposal to mention formatting options once we sort out where obslog will actually live. > By adding several new commands users will have to consult the manpages of > 'evolve', > 'change', 'obslog', etc., even though the commands and the concepts are > closely related. I'm not sure that's the case. There is some common background material you'd need to understand in order to use any of those commands ("what are changes?"), but the same could be said of pretty much any git command ("what are commits?"). Assuming the user knows what a change is, I'm pretty sure I could write a self-contained description for evolve, change, or obslog that doesn't require cross-referencing any of the other commands... 
and the evolve command could probably be understood even without understanding changes. But since several comments have focused on the commands, let's brainstorm! Here's some options that occur to me: 1. Three commands: evolve + change + obslog as top-level commands (the current proposal). Change gets a bunch of subcommands for manipulating the change graph and metas/ namespace. 2. All top-level: evolve + lschange + mkchange + rmchange + prunechange + obslog. None of the commands get subcommands. 3. Everything under change: "change evolve", "change obslog" become new change subcommands. 4. Evolve as a rebase argument, obslog as a log argument. Use "rebase --evolve" to initiate evolve and use "log --obslog" to initiate obslog. The change command stays as it is in the proposal. 5. Two commands: evolve + change. obslog becomes a "log" argument. Note that there will be more "evolve"-specific arguments in the future. For most transformations that evolve uses, there will be a matching argument to enable or disable that transformation and as we add transformations we'll also add arguments. If we go with option 3, it would make for a very cluttered help page. For example, if you're looking for information on how to use evolve, you'd have to scroll past a bunch of formatting information that are only relevant to obslog... and if you're looking for the formatting options, you'd have to scroll past a bunch of transformation-enablement options that are only relevant to evolve. Based on your log feedback above, I'm thinking #5 may make sense. - Stefan On Mon, Nov 19, 2018 at 7:55 AM SZEDER Gábor wrote: > > On Sat, Nov 17, 2018 at 12:30:58PM -0800, Stefan Xenos wrote: > > > Further, I see that this document tries to suggest a proliferation of new > > > commands > > > > It does. Let me explain a bit about the reasoning behind this > > breakdown of commands. My main priority was to keep the commands as > > consistent with existing git commands as possible. 
Secondary goals > > were: > > - Mapping a single intent to a single command where possible makes it > > easier to explain what that command does. > > - Having lots of simpler commands as opposed to a few complex commands > > makes them easier to type. > > - Command names are more descriptive than lettered arguments. > > Subcommand names and --long-options are just as descriptive. > > > > Git already has a "log" and "reflog" command for displaying two > > different types of log, > > No, there is 'git log' for displaying logs in various ways, and there > is 'git reflog' which not only displays reflogs, but also operates on > them, e.g. deletes specific reflog entries or expires old entries, > necessitating and justifying the dedicated 'git reflog' command. > > > so putting "obslog" on its own command makes > > it consistent with the existing logs, easier to type, and keeps the > > command simple. > > > - We could turn "obslog" into an extra option on the "log" command, > > but that would be inconsistent with reflog and would complicate the > >
Re: [PATCH] technical doc: add a design doc for the evolve command
Hi, Xenos wrote: > Lets explore the "when" question. I think there's a compelling reason > to add them as soon as possible - namely, gerrit. If and when we come > to some sort of agreement on this proposal, gerrit could start adding > tooling to understand change graphs as an alternative to change-id > footers. That work could proceed in parallel with the work in git-core > once we know what the data structures look like, but it can't start > until the data structures are sufficient to address all the use cases > that were previously covered by change-id. At the moment, meta-commits > without origin parents would not cover all of gerrit's use-cases so > this would block adoption in gerrit. By this, are you referring to the "Cherry-picks" list in the Gerrit web UI? Thanks, Jonathan
Re: [PATCH] technical doc: add a design doc for the evolve command
> But it is not immediately obvious to me how it would help to have "Z was > cherry-picked from W" in "evolve". The evolve command would use it for handling the obsolescence-over-cherry-pick (OOCP) feature. If someone cherry-picks a commit and then amends the original, the evolve command would give you the option of applying the same amendment to the cherry-picked version. Are you claiming that this is undesirable, or are you claiming that this could be accomplished without origin parents? > the developer wanted to use the change between W^ and W in a context that is > quite different from I guess that depends on the reason for doing the cherry-pick. A very common scenario I see for cherry-picks is cherry-picking a bugfix from a development branch to a maintenance branch. In that situation, if there was a better version of the original bugfix you'd also want to update the cherry-pick on the maintenance branch to use the better version of the fix. That's what OOCP does. > make no sense to "evolve" anything that was built on top of W on top of Z. Agreed. But that's not what evolve would do with the origin edges. It would be looking for amendments of W, not children of W. > It is of course OK to build a different feature that can take advantage of > the cherry-pick information on top of the same meta commit concept in later > steps All valid points - we could build a useful "evolve" command without origin edges (and without OOCP), we could easily add origin parents later to a design that just supported obsolete and content parents, and the decision about /when/ to add origin parents is orthogonal to the decision about /if/ to add them. Lets explore the "when" question. I think there's a compelling reason to add them as soon as possible - namely, gerrit. If and when we come to some sort of agreement on this proposal, gerrit could start adding tooling to understand change graphs as an alternative to change-id footers. 
That work could proceed in parallel with the work in git-core once we know what the data structures look like, but it can't start until the data structures are sufficient to address all the use cases that were previously covered by change-id. At the moment, meta-commits without origin parents would not cover all of gerrit's use-cases so this would block adoption in gerrit. - Stefan On Sun, Nov 18, 2018 at 8:15 PM Junio C Hamano wrote: > > Stefan Xenos writes: > > > The scenario you describe would not produce an origin edge in the > > metacommit graph. If the user amended X, there would be no origin > > edges - just a replacement. If you cherry-picked Z you'd get no > > replacements and just an origin. In neither case would you get both > > types of parent. > > OK, that makes things a lot simpler. > > I can see why we want to record "commit X obsoletes commit Y" to > help the "evolve" feature, which was the original motivation this > started the whole discussion. But it is not immediately obvious to > me how it would help to have "Z was cherry-picked from W" in > "evolve". > > The whole point of cherry-picking an old commit W to produce a new > commit Z is because the developer wanted to use the change between > W^ and W in a context that is quite different from W^, so it would > make no sense to "evolve" anything that was built on top of W on top > of Z. > > It is of course OK to build a different feature that can take > advantage of the cherry-pick information on top of the same meta > commit concept in later steps, and to ensure that is doable, the > initial meta commit design must be done in a way that is flexible > enough to be extended, but it is not clear to me if this "origin" > thing is "while this does not have much to do with 'evolve', let's > throw in fields that would help another feature while we are at it" > or "in addition to 'X obsoletes Y', we need the cherry-pick > information for 'evolve' feature because..." 
(and because it is not > clear, I am assuming that it is the former). If we can design the > "evolve" thing with only the "contents" and "obsoletes", that would > allow us to limit the scope of discussion we need to have around > meta commit and have something that works earlier, wouldn't it? > > Thanks.
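The obsolescence-over-cherry-pick case discussed in this message can be sketched as a small lookup. This is a simplified model, not the proposed implementation: the two dictionaries stand in for the origin and obsolete edges of the change graph, and the names `latest_version` and `stale_cherry_picks` are invented for the example. Note that it looks for amendments of the origin, not for children of it.

```python
from typing import Dict, List, Tuple


def latest_version(commit: str, replaced_by: Dict[str, str]) -> str:
    """Follow replacement edges until reaching a commit that was not amended."""
    seen = set()
    while commit in replaced_by and commit not in seen:
        seen.add(commit)  # guard against accidental cycles in the sample data
        commit = replaced_by[commit]
    return commit


def stale_cherry_picks(
    origin_of: Dict[str, str], replaced_by: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """List (cherry-pick, recorded origin, newest origin) for outdated picks."""
    stale = []
    for pick, origin in sorted(origin_of.items()):
        newest = latest_version(origin, replaced_by)
        if newest != origin:
            stale.append((pick, origin, newest))
    return stale
```

With a bugfix W cherry-picked to a maintenance branch as Z, `origin_of = {"Z": "W"}`; once W is amended to W2, `replaced_by = {"W": "W2"}` and the sketch reports Z as a candidate for receiving the same amendment.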
Re: [PATCH] technical doc: add a design doc for the evolve command
On Sat, Nov 17, 2018 at 12:30:58PM -0800, Stefan Xenos wrote: > > Further, I see that this document tries to suggest a proliferation of new > > commands > > It does. Let me explain a bit about the reasoning behind this > breakdown of commands. My main priority was to keep the commands as > consistent with existing git commands as possible. Secondary goals > were: > - Mapping a single intent to a single command where possible makes it > easier to explain what that command does. > - Having lots of simpler commands as opposed to a few complex commands > makes them easier to type. > - Command names are more descriptive than lettered arguments. Subcommand names and --long-options are just as descriptive. > Git already has a "log" and "reflog" command for displaying two > different types of log, No, there is 'git log' for displaying logs in various ways, and there is 'git reflog' which not only displays reflogs, but also operates on them, e.g. deletes specific reflog entries or expires old entries, necessitating and justifying the dedicated 'git reflog' command. > so putting "obslog" on its own command makes > it consistent with the existing logs, easier to type, and keeps the > command simple. > - We could turn "obslog" into an extra option on the "log" command, > but that would be inconsistent with reflog and would complicate the > already-complex log command. On one hand, it's unclear to me what additional operations the proposed 'git obslog' command will perform besides showing the log of a change. If there are no such operations, then it can't really be compared to 'git reflog' to justify a dedicated 'git obslog' command. OTOH, note that 'git log' already has a '--walk-reflogs' option, and indeed 'git reflog [show]' is implemented via the common log machinery. 
And this is not a mere implementation detail, because "git reflog show accepts any of the options accepted by git log" (quoting its manpage), making it possible to filter, limit and format reflog entries, e.g.:

    git reflog --format='%h %cd %s' --author=szeder -5 branch file

I think 'git obslog' should allow the same when showing the log of a change.

> Personally, I don't > consider a proliferation of new commands to be inherently bad (or > inherently good, really). Is there a reason new commands should be > avoided?

If a user wants to deal with reflogs, then there is 'git help reflog' which in one manpage describes the concept, and how to list and expire reflogs and delete individual entries from a reflog using the various subcommands. If a user wants to deal with stashes, then there is 'git help stash', which in one manpage describes the concept, and how to create, list, show, apply, delete, etc. stashes using the various subcommands. See where this is going? The same applies to bisect, notes, remotes, rerere, submodules, worktree; maybe there are more. This is a Good Thing.

By adding several new commands users will have to consult the manpages of 'evolve', 'change', 'obslog', etc., even though the commands and the concepts are closely related.
Re: [PATCH] technical doc: add a design doc for the evolve command
Stefan Xenos writes: > The scenario you describe would not produce an origin edge in the > metacommit graph. If the user amended X, there would be no origin > edges - just a replacement. If you cherry-picked Z you'd get no > replacements and just an origin. In neither case would you get both > types of parent. OK, that makes things a lot simpler. I can see why we want to record "commit X obsoletes commit Y" to help the "evolve" feature, which was the original motivation this started the whole discussion. But it is not immediately obvious to me how it would help to have "Z was cherry-picked from W" in "evolve". The whole point of cherry-picking an old commit W to produce a new commit Z is because the developer wanted to use the change between W^ and W in a context that is quite different from W^, so it would make no sense to "evolve" anything that was built on top of W on top of Z. It is of course OK to build a different feature that can take advantage of the cherry-pick information on top of the same meta commit concept in later steps, and to ensure that is doable, the initial meta commit design must be done in a way that is flexible enough to be extended, but it is not clear to me if this "origin" thing is "while this does not have much to do with 'evolve', let's throw in fields that would help another feature while we are at it" or "in addition to 'X obsoletes Y', we need the cherry-pick information for 'evolve' feature because..." (and because it is not clear, I am assuming that it is the former). If we can design the "evolve" thing with only the "contents" and "obsoletes", that would allow us to limit the scope of discussion we need to have around meta commit and have something that works earlier, wouldn't it? Thanks.
Re: [PATCH] technical doc: add a design doc for the evolve command
Stefan Xenos writes: >> I meant the project's history, not the meta-graph thing. > > In that case, we agree. The proposal suggests that "origin" should be > reachable from the meta-graph for the cherry-picked commit, NOT the > cherry-picked commit itself. Does that resolve our disagreement, or is > reachability from the meta-graph also undesirable for you? Sorry, I confused myself. Yes, I do mind that the "origin" thing in the meta history to pin the old commit whose contents were cherry picked to create a new commit, which is separate from the old commit that was rewritten to create a new commit. The latter (i.e. the old one) I do not mind to get retrieved when such a meta commit is fetched, and all of us of course would want the new one, too (which is the whole point of adding the meta commit to help other commits built on the old one migrate to the new one). But I simply do not see the point of having to drag the history leading to "origin", and that is why I am moderately against recording "the change in this came from that commit via cherry-pick" in a meta commit.
Re: [PATCH] technical doc: add a design doc for the evolve command
> I meant the project's history, not the meta-graph thing. In that case, we agree. The proposal suggests that "origin" should be reachable from the meta-graph for the cherry-picked commit, NOT the cherry-picked commit itself. Does that resolve our disagreement, or is reachability from the meta-graph also undesirable for you? > By having a "this was cherry-picked from that commit" in a commit > that is not GC'ed, the original commit that has no longer have any > relevance (because the newer one that is the result of the > cherry-pick is the surviving version people will be building on) is > kept reachable. It is very much delierate that "cherry-pick -x" > does not make the "origin" reachable and merely notes it in the > human readable form that is ignored by the reachablity machinery. Hmm. It sounds like you may be arguing against reachability from the cherry-picked commit (which we agree on). I'm arguing for reachability ONLY from the meta-graph. From your reply it's not completely clear to me whether you also disapprove of reachability from the meta-graph or if you thought the origin edges would be present on the cherry-picked commit itself. Could you clarify? I suspect it may be the latter, which suggests ambiguity in the proposal. If you could point to the text that gave the impression origin parents would be present in the cherry-picked commits themselves, I'll fix it. > This is where we differ. If commit X was rewritten (perhaps with > help from change cherry-picked from commit Z, or without any) to > produce Y, I do agree that it would be logical to keep X around > until every dependent commit on it are migrated to be on top of Y. The scenario you describe would not produce an origin edge in the metacommit graph. If the user amended X, there would be no origin edges - just a replacement. If you cherry-picked Z you'd get no replacements and just an origin. In neither case would you get both types of parent. 
I'd suggest we focus on the cherry-pick scenario since it's the simplest real-world use case that produces origin parents. All the more complex scenarios involving both parent types only occur if you start from that simple case, so if you convince me that the origin-only use case is unnecessary or undesirable, it would also follow that the more complex origin-plus-obsolete-parent use case is unnecessary. So, if you don't mind - let me simplify that use-case: "If commit Z is cherry-picked to produce Y, is there any need to keep Z around?". I don't think we need X in the example to answer that question. > But we do not need Z to transplant what used to be on X on top of Y, > do we? That's correct. The origin parent would be used to incorporate amended versions of Z into Y, not to transplant things. It would also be used to locate ancestors when merging code based on Z with code based on Y. > So I do agree that in such a situation they want the > relevant parts of the history retained, but I do not agree that > "origin" is among them. You may be entirely right, but at this point I'm not certain whether we're disagreeing or miscommunicating. :-(
Re: [PATCH] technical doc: add a design doc for the evolve command
Stefan Xenos writes: >> And the other half is that while I consider the "origin" thing is >> unnecessary for the above reasons, having it means we need to not >> just transfer the history reading to aa7ce555 and d664309ee (which >> are necessary anyway while we have histories to transplant from >> d664309ee to aa7ce555) but also have to pull in the history leading >> to 7e1bbcd and we cannot discard it. > > I'll assume that by "history" you're referring to the change graph > (the metacommits) and not the branches (the commits), which would have > no origin edges or connection between replacements. I meant the project's history, not the meta-graph thing. By having a "this was cherry-picked from that commit" in a commit that is not GC'ed, the original commit that no longer has any relevance (because the newer one that is the result of the cherry-pick is the surviving version people will be building on) is kept reachable. It is very much deliberate that "cherry-pick -x" does not make the "origin" reachable and merely notes it in the human readable form that is ignored by the reachability machinery. > If the user has kept a change around in their metas namespace, it's an > indication that they (or their collaborators) are still working on it > and want its history to be retained. This is where we differ. If commit X was rewritten (perhaps with help from change cherry-picked from commit Z, or without any) to produce Y, I do agree that it would be logical to keep X around until every commit that depends on it is migrated to be on top of Y. But we do not need Z to transplant what used to be on X on top of Y, do we? So I do agree that in such a situation they want the relevant parts of the history retained, but I do not agree that "origin" is among them. Side note.
As long as we have commits that are still on X and yet to be migrated to Y, we do not need the meta-commit to protect X from getting GC'ed, as X is reachable from these "need to be updated" branch tips anyway. What we gain from the extra reachability brought in by the meta commits is that by fetching the "change", we get Y (and its ancestors), even if we are not following any branch that contains it, so that we can migrate those that are still based on X to it.
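The reachability concern in this message can be made concrete with a toy graph walk. This is a sketch under assumptions: the commit names (P0, Q0, Y, Z, M) are invented, the parent lists are plain dictionaries rather than real objects, and the walk only models "everything reachable from a tip is kept and transferred".

```python
from typing import Dict, Iterable, List, Set


def reachable(tips: Iterable[str], parents: Dict[str, List[str]]) -> Set[str]:
    """Return every object reachable from the given tips via parent edges."""
    seen: Set[str] = set()
    stack = list(tips)
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, []))
    return seen


# Toy history: Z lives on another line of history rooted at Q0, and was
# cherry-picked onto P0 to produce Y. M is the meta-commit for that change.
HISTORY = {"Y": ["P0"], "Z": ["Q0"], "P0": [], "Q0": []}

# Meta-commit with a content parent (Y) and an origin parent (Z).
WITH_ORIGIN = dict(HISTORY, M=["Y", "Z"])

# Same meta-commit if only the content parent were recorded.
WITHOUT_ORIGIN = dict(HISTORY, M=["Y"])
```

Walking from M in `WITH_ORIGIN` pulls in Z and Q0 as well as Y and P0, which is the "history leading to origin" that would have to be fetched and kept; in `WITHOUT_ORIGIN` the walk stops at Y and P0.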
Re: [PATCH] technical doc: add a design doc for the evolve command
> Am I correct to understand that the reason why a commit object is > (ab|re)used to represent a meta-commit is because by doing so we > would get connectivity (i.e. fetching & pushing would transfer all > the associated objects along) for free, and by not representing it > as a new and different object type, existing implementations can > just pass them along without understanding what they are, and as > long as these are not mixed as parts of the main history of the > project (e.g. when enumerating commits that has aa7ce5 as its > parents, because somebody else obsoleted aa7ce5 and you want to > evolve anything that built on it, you do not want to mistake the > above "meta" commit as a commit that is part of the ordinary history > and rebuild on top of the new version of aa7ce5, which would lead to > a disaster), everything would work just fine? Yes, sir. That's it exactly. My first draft of the proposal suggested creating a new top-level object type, but when I started digging through the code I realized that the new object was so similar to a commit that there was no need for a new type. > Perhaps you'd use something like "presence of parent-type header > marks that a commit is a meta-commit and not part of the main > history". Yes, that's called out explicitly as part of the proposal (see the first sentence in the Parent-type subsection). Fsck would enforce this invariant. > How are these meta commits anchored so that it won't be reclaimed by > repack? They would either be anchored by a ref in the metas/ namespace (for active changes currently under consideration by evolve) or by the reflog (for recently deleted changes). > I do not see any "parent" field used to chain them together, They point to one another using the usual "parent" field present in all commit objects. For an example of what the raw struct would look like with parent pointers, see the top of the "Detailed design" section or search the doc for the string . 
For examples of how the metacommits in a change graph would be connected after various operations, see the "Commit" section and the "Merge" section. Please let me know if any of these examples are insufficiently explained or if there's any other examples you'd like to see. > but I do not think we can afford to spend one ref per meta > commit, as refs are not designed to point into each and every object > in the repository. Agreed. This is actually one of the reasons I'm proposing the use of chains of meta-commits as opposed to using a purely ref-based approach. I describe several other ref-based approaches in the "Other options considered" section, and I made essentially the same point there. We only create refs in the metas/ namespace to point to the head of each change, and the rest of the commits and metacommits used by the graph are reached via the parent pointers in the metacommits. > I have a moderately strong opposition against "origin" thing. If > aa7ce555 replaces d664309ee, in order for the tool to be able to > "evolve" other histories that build on top of d664309ee, it only > needs the history between aa7ce555 and d664309ee and it would not > matter how aa7ce555 was built relative to its parent. I see I haven't justified the "origin" thing well enough. I'll elaborate in the document, but here's the short version. The "origin" edges are needed to address several use-cases: 1. Gerrit needs to know about cherry picks. This is one of the lesser-known things that it uses the change-id footers for and if we want to be able to eliminate the gerrit change-id footers we need to record and communicate information about cherry-picks somehow. This is the main reason for the origin edges - the early drafts of this proposal didn't have them but it came up when I asked a kind Gerrit maintainer to whether the proposal would be sufficient to eliminate gerrit's change-ids. However there may be alternatives I didn't think of. 
If we were to omit the origin edges, can you suggest an alternative way for git to record the fact that one commit was cherry-picked from another and communicate this fact to gerrit? I see that I forgot to call out "replacing gerrit change-ids" as an explicit goal. I'll add that to the doc. 2. Obsolescence across cherry-picks. In your example, it *may* actually matter how aa7ce55 was constructed. One such scenario is what I'm calling obsolescence across cherry-picks. Let me describe the use-case for it: Alice creates commit A1. Bob cherry-picks A1 to another branch, producing B1. At this point, Bob has a metacommit saying that A1 is the origin of B1. Alice amends A1, producing A2. She shares this with Bob. At this point, Bob probably wants to amend B1 to include whatever bugfix Alice did in A2 since the thing he cherry-picked is now out of date. That's what the obsolescence across cherry-picks feature does. If bob runs evolve with this option enabled, the evolve command will produce B2 by amending B1 with whatever diff Alice did between A1 and
Re: [PATCH] technical doc: add a design doc for the evolve command
Stefan Xenos writes: >> I don't think this counts as a typical modification and is probably hard to >> detect automatically. > > Clever use of commands! (side: wouldn't it just be easier to just use > git commit --amend, though?)

When an original commit is mostly an early part of a feature, mixed with a small but urgent bugfix, it is not unusual to start your work from "reset HEAD^" (or "reset --soft HEAD^") and recreate a commit that has the main part of the change from the original, leaving the remainder in the working tree to be worked into another bugfix commit, most likely to be on a new branch forked from an earlier point in the history, i.e.

    git reset HEAD^
    git add -p
    git commit -c @{1}
    git checkout -m -b a-small-bugfix-split-out master
    edit
    git commit -a

I agree with both of you that we want to have a way to mark that the first commit we made by partially committing what was in the original came from the original one, and also that the second one has contents from the same original one. It is unclear, without human involvement, if we can mechanically infer that anything that used to be built on top of the original commit would want to be rebuilt on top of the first half of the split commit (i.e. the early part of the feature with the bugfix separated out) but not on the other half (i.e. the bugfix alone).
Re: [PATCH] technical doc: add a design doc for the evolve command
> This breaks the "git change" symmetry with "git branch", but after > responding to other messages regarding that command, I'm starting to > think that's not really a problem. Sorry, I appended that sentence to the wrong paragraph. It should have gone with the previous one that regarding the "git change" command. On Sun, Nov 18, 2018 at 2:27 PM Stefan Xenos wrote: > > > I don't think this counts as a typical modification and is probably hard to > > detect automatically. > > Clever use of commands! (side: wouldn't it just be easier to just use > git commit --amend, though?) > > Either way, I agree that there should be a way to manually create a > change graph or modify one into any possible shape. I've updated the > "change" command to do what you want - the new version will have > subcommands for creating arbitrary change graphs. > > > subject line will change over time and the original one may become > > irrelevant. > > There's a section on change naming further down the document. My > criteria for name selection was that good names should be unique, > short to type, and memorable to the user. Being relevant to the commit > wasn't actually a requirement for me except insofar as it helps make > them memorable. If we agree that these are reasonable criteria, commit > hashes wouldn't be as good a choice since they'd satisfy the > uniqueness criteria but would not be short or memorable. I expect that > whatever criteria we select probably won't be optimal for all users > which is why the design also includes a new hook for name selection. I > believe that selected words from the commit comment should cover all > three criteria in the majority of cases, and that the hook and the > "change rename" command should cover the remaining corner cases. This > breaks the "git change" symmetry with "git branch", but after > responding to other messages regarding that command, I'm starting to > think that's not really a problem. > > > How do we group changes of a topic together? 
I think branch-diff could take > > advantage of that. > > Could you clarify your use-case for me? I'm not sure what you mean by > "changes of a topic". Are you referring to gerrit topics here? Topic > branches? Or are you asking for some way for end-users to classify and > organize their unsubmitted changes? > > > Could we just organize it like a normal history? > > Basically all commits will be linked in a new merge history. > > From what I can tell, you're suggesting the following changes: > 1. Reorder the parents such that the content parent comes last rather > than first. > 2. Move parent-type from the structured portion of the header to the > unstructured portion of the commit message. > > I'm fine with 1 if that makes something easier. > > Regarding 2, I can see some good reasons to put parent-type in the > header rather than the user-readable portion of the commit message > - fsck can rely on them when checking the database for validity (for > example, it can assert that the current repository version doesn't > attach a non-empty tree, that the content parent always points to a > real commit, the commit message is empty, that the number of > parent-types matches the number of parents, that the enum values are > valid, that the parent orders are correct, etc.). > - accidental collisions are impossible (users can't accidentally > corrupt their database by adding or removing the word "parent-type" in > a commit message). > - it doesn't spam the user-readable region with machine-readable > repository internals. > > > This makes it possible to just use "git log --first-parent > > --patch" (or "git log --oneline --graph") to examine the change. > > The "git log --oneline --graph" thing should work fine with the > proposal as it currently is, but I'm not sure that the --first-parent > --patch thing would be very useful no matter how we order the parents. 
> The metacommits have empty trees and commit messages, so such a log > would just list the metacommit hashes and nothing else. That certainly > has some utility, but I'd guess it's probably not what you were going > for. Were you intending to suggest that the metacommit should also use > the same tree and commit message as its content commit? If so, we > briefly considered this option while preparing this proposal. That > would make some commands do approximately the right thing for free. > However, when we started working through the use-cases (for example, > checking out a metacommit) we found that all the ones we looked at > would still need special cases for metacommits and those special cases > wouldn't be much simpler than they'd be with an empty tree and > message. Admittedly, git log wasn't one of the use-cases we worked > through. > > - Stefan > > On Fri, Nov 16, 2018 at 10:07 PM Duy Nguyen wrote: > > > > On Thu, Nov 15, 2018 at 2:00 AM wrote: > > > +Goals > > > +- > > > +Legend: Goals marked with P0 are required. Goals marked with Pn should be > > > +attempted unless
Re: [PATCH] technical doc: add a design doc for the evolve command
> I don't think this counts as a typical modification and is probably hard to > detect automatically. Clever use of commands! (side: wouldn't it just be easier to just use git commit --amend, though?) Either way, I agree that there should be a way to manually create a change graph or modify one into any possible shape. I've updated the "change" command to do what you want - the new version will have subcommands for creating arbitrary change graphs. > subject line will change over time and the original one may become irrelevant. There's a section on change naming further down the document. My criteria for name selection was that good names should be unique, short to type, and memorable to the user. Being relevant to the commit wasn't actually a requirement for me except insofar as it helps make them memorable. If we agree that these are reasonable criteria, commit hashes wouldn't be as good a choice since they'd satisfy the uniqueness criteria but would not be short or memorable. I expect that whatever criteria we select probably won't be optimal for all users which is why the design also includes a new hook for name selection. I believe that selected words from the commit comment should cover all three criteria in the majority of cases, and that the hook and the "change rename" command should cover the remaining corner cases. This breaks the "git change" symmetry with "git branch", but after responding to other messages regarding that command, I'm starting to think that's not really a problem. > How do we group changes of a topic together? I think branch-diff could take > advantage of that. Could you clarify your use-case for me? I'm not sure what you mean by "changes of a topic". Are you referring to gerrit topics here? Topic branches? Or are you asking for some way for end-users to classify and organize their unsubmitted changes? > Could we just organize it like a normal history? > Basically all commits will be linked in a new merge history. 
>From what I can tell, you're suggesting the following changes: 1. Reorder the parents such that the content parent comes last rather than first. 2. Move parent-type from the structured portion of the header to the unstructured portion of the commit message. I'm fine with 1 if that makes something easier. Regarding 2, I can see some good reasons to put parent-type in the header rather than the user-readable portion of the commit message - fsck can rely on them when checking the database for validity (for example, it can assert that the current repository version doesn't attach a non-empty tree, that the content parent always points to a real commit, the commit message is empty, that the number of parent-types matches the number of parents, that the enum values are valid, that the parent orders are correct, etc.). - accidental collisions are impossible (users can't accidentally corrupt their database by adding or removing the word "parent-type" in a commit message). - it doesn't spam the user-readable region with machine-readable repository internals. > This makes it possible to just use "git log --first-parent > --patch" (or "git log --oneline --graph") to examine the change. The "git log --oneline --graph" thing should work fine with the proposal as it currently is, but I'm not sure that the --first-parent --patch thing would be very useful no matter how we order the parents. The metacommits have empty trees and commit messages, so such a log would just list the metacommit hashes and nothing else. That certainly has some utility, but I'd guess it's probably not what you were going for. Were you intending to suggest that the metacommit should also use the same tree and commit message as its content commit? If so, we briefly considered this option while preparing this proposal. That would make some commands do approximately the right thing for free. 
However, when we started working through the use-cases (for example, checking out a metacommit) we found that all the ones we looked at would still need special cases for metacommits and those special cases wouldn't be much simpler than they'd be with an empty tree and message. Admittedly, git log wasn't one of the use-cases we worked through. - Stefan On Fri, Nov 16, 2018 at 10:07 PM Duy Nguyen wrote: > > On Thu, Nov 15, 2018 at 2:00 AM wrote: > > +Goals > > +- > > +Legend: Goals marked with P0 are required. Goals marked with Pn should be > > +attempted unless they interfere with goals marked with Pn-1. > > + > > +P0. All commands that modify commits (such as the normal commit --amend or > > +rebase command) should mark the old commit as being obsolete and > > replaced by > > +the new one. No additional commands should be required to keep the > > +obsolescence graph up-to-date. > > I sometimes "modify" a commit by "git reset @^", pick up the changes > then "git commit -c @{1}". I don't think this counts as a typical > modification and is probably hard to detect automatically. But I
Re: [PATCH] technical doc: add a design doc for the evolve command
Resending this without HTML enabled... sorry if you receive it twice.

> The file name and the title are in a mismatch.

Good point. However, the focus of this proposal really is supposed to be on the underlying data structure, not just the evolve command (which is the driving use-case for the graph but not the only important one). I think I'll fix the mismatch by renaming both the title and document to "change graph" if that seems acceptable. I'll also expand the "objective" paragraph to mention the evolve command.

> Perhaps "three sequential patches"?

I've added a quick informal definition of the word "change", along with a cross-reference to the precise definition later in the document.

> These two paragraphs could be moved lower, under a "Semi-Related Work"

Good point. I'll keep the patch queue managers here since they really can be used to solve the same problem that evolve addresses, but I'll move the replacements paragraph down to a new section on semi-related work. There was also a request to discuss git-imerge, which I'll insert there.

> Instead, I would try to use the term "patch" to describe a change to the
> codebase

I know you didn't finish the document, but later on I define the term "change" to have essentially this meaning. I've moved the definition earlier in the document to make the earlier sections easier to understand. Given the choice of the word "patch" or "change" for this definition, I prefer to use "change" since gerrit already defines it in this way and the word "patch" already has a meaning in git (a file containing a diff).

> Making a note so I come back to this. I hope to learn what you mean by this
> "more specific merge base".)

Let's say we have commits:

P <- C

Then two people amend C in different ways, producing:

P <- C
P <- C1
P <- C2

...then we try to resolve the divergence by merging C1 and C2. Without the change graph, the closest merge-base (ancestor) would be P. With the change graph, the closest merge base would be C.
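To make the merge-base point concrete, here is a minimal Python sketch. It is an illustration under simplifying assumptions only: the change graph is modeled as extra "obsoletes" edges added directly to the commits (the proposal stores them on meta-commits instead), and the names `ancestors` and `merge_bases` are made up for this example rather than taken from git.

```python
def ancestors(start, edges):
    """Return the set of commits reachable from start (start included)."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for parent in edges.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def merge_bases(a, b, edges):
    """Common ancestors of a and b that are not ancestors of another one."""
    common = ancestors(a, edges) & ancestors(b, edges)
    return {
        c for c in common
        if not any(c != other and c in ancestors(other, edges)
                   for other in common)
    }


# Ordinary history: C, C1 and C2 all have P as their only parent.
history = {"C": ["P"], "C1": ["P"], "C2": ["P"]}

# Assumption: treat "C1 obsoletes C" and "C2 obsoletes C" as extra edges.
with_change_graph = {"C": ["P"], "C1": ["P", "C"], "C2": ["P", "C"]}

print(merge_bases("C1", "C2", history))            # {'P'}
print(merge_bases("C1", "C2", with_change_graph))  # {'C'}
```

With only the ordinary parent edges the best common ancestor is P; once the obsolescence edges are considered, C is also a common ancestor and P is discarded because it is an ancestor of C.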
> If we GC'd commit A but still have the newer A', I can either thinkthat I'm not sure I followed that. Are you suggesting a change to the proposal or asking for a clarification? On Fri, Nov 16, 2018 at 1:36 PM Derrick Stolee wrote: > > On 11/14/2018 7:55 PM, sxe...@google.com wrote: > > From: Stefan Xenos > > > > This document describes what an obsolescence graph for > > git would look like, the behavior of the evolve command, > > and the changes planned for other commands. > > Thanks for putting this together! > > > diff --git a/Documentation/technical/evolve.txt > > b/Documentation/technical/evolve.txt > ... > > +Git Obsolescence Graph > > +== > > + > > +Objective > > +- > > +Track the edits to a commit over time in an obsolescence graph. > > The file name and the title are in a mismatch. > > I'd prefer if the title was "Git Evolve Design Document" and this > opening paragraph > was about the reasons we want a 'git evolve' command. Here is my attempt: > >The proposed 'git evolve' command will help users craft a > high-quality commit >history in their topic branches. By working to improve commits one at > a time, >then running 'git evolve', users can rewrite recent history with more > options >than interactive rebase. The core benefit is that users can pause > their progress >and move to other branches before returning to where they left off. > Users can >also share progress with others using standard 'push', 'fetch', and > 'format-patch' >commands. > > > +Background > > +-- > > Perhaps you can call this "Example"? > > > +Imagine you have three dependent changes up for review and you receive > > feedback > > +that requires editing all three changes. While you're editing one, more > > feedback > > +arrives on one of the others. What do you do? > > "three dependent changes" sounds a bit vague enough to me to possibly > confuse readers. Perhaps > "three sequential patches"? 
> > > +- Users can view the history of a commit directly (the sequence of amends > > and > > + rebases it has undergone, orthogonal to the history of the branch it is > > on). > > "the history of a commit" doesn't semantically work, as a commit is an > immutable Git object. > > Instead, I would try to use the term "patch" to describe a change to the > codebase, and that > takes the form as a list of commits that are improving on each other > (but don't actually > have each other in their commit history). This means that the lifetime > of a patch is described > by the commits that are amended or rebased. > > > +- By pushing and pulling the obsolescence graph, users can collaborate more > > + easily on changes-in-progress. This is better than pushing and pulling > > the > > + changes themselves since the obsolescence graph can be used to locate a > > more > > + specific merge base, allowing for better merges between different > > versions of > > + the same change. > > (Making a note so I come back to
Re: [PATCH] technical doc: add a design doc for the evolve command
> I am not sure that we necessarily need this to be a graph. I think part > of the problems with not being able to GC *any* of this is by this > requirement to have it stored in a graph, rather than having mappings from > which you could reconstruct any non-GC'ed parts of that graph, if you > really want. Sorry, I'm not sure what GC problem you're alluding to here. As far as I'm aware, this proposal should permit us to GC or retain any subset of commits that we want. We create a chain of metacommits pointing to the commits we want to retain, and put a ref in the metas namespace to cause the chain itself to be retained. If we want to GC a different subset, we can build a different chain of metacommits and move the ref (or delete the ref entirely to permit the whole chain to be gc'd). Could you be more specific about which use-case is problematic? > Why is this missing most notably `hg evolve`? Good point. I'll add a brief description and comparison to the doc. > Also, please do not forget `git imerge`. Thanks for directing me to this. It looks fantastic! I'm not sure it's really an alternative to this work, but I could see adding an argument to "git evolve" that allows you to use imerge for resolving merge conflicts at any given step. > Further, I see that this document tries to suggest a proliferation of new > commands It does. Let me explain a bit about the reasoning behind this breakdown of commands. My main priority was to keep the commands as consistent with existing git commands as possible. Secondary goals were: - Mapping a single intent to a single command where possible makes it easier to explain what that command does. - Having lots of simpler commands as opposed to a few complex commands makes them easier to type. - Command names are more descriptive than lettered arguments. 
Git already has a "log" and "reflog" command for displaying two different types of log, so putting "obslog" on its own command makes it consistent with the existing logs, easier to type, and keeps the command simple.

The "evolve" command updates changes to give them up-to-date parents. This is a new type of user intent that didn't exist previously in git, so putting it on its own command keeps things simpler for users. The relationship between the evolve and change commands is a lot like the relationship between the rebase command and the branch commands. They could technically be combined into one command but I'm not sure this would help with usability.

The "change" command combines many user intents (create a change, rename a change, delete a change, etc.) If I were to design it from scratch, I'd prefer to have all of these things on separate commands. However, since changes are very similar to branches and users are presumably already familiar with the branch command, I intentionally made the change command as close as possible to the branch command - using the same arguments for the same purpose. In this case, I sacrificed the single-intent and simple commands goals in order to retain consistency.

Anyway, that was my reasoning behind the selection of commands. Of course, I'd welcome feedback - a good UX is the one that was built by listening to feedback from its intended users. Personally, I don't consider a proliferation of new commands to be inherently bad (or inherently good, really). Is there a reason new commands should be avoided?

Some other alternatives to consider:

- We could turn "obslog" into an extra option on the "log" command, but that would be inconsistent with reflog and would complicate the already-complex log command.
- If we were to combine "evolve" with another command, "git rebase --evolve" would probably be the best candidate. However, this is longer to type and I tend to prefer lots of simple commands over a few complex ones.
Also, the evolve command will get additional options in the future (to enable stuff like amend-over-cherry-pick, various automatic resolution strategies for divergence, etc.)... and putting it on rebase would mean we'd end up with a lot of extra arguments whose doc says "this argument is only used if you're also using --evolve". - We could break the "change" command into a bunch of simpler ones "lschange", "mkchange", "rmchange", "mvchange", etc. I actually like this a lot, but this would make it diverge from the "branch" command so I'm not sure we should do it unless enough of us feel the same way. - We could combine the "change" command with the "branch" command. The branch command could look for the "metas" prefix to determine whether its argument is a branch or a change -- or it could just search one namespace followed by the other. This would make for fewer commands, but I'm concerned it may create confusion by making changes resemble branches too closely. If you're not already familiar with the distinction, you may see unexpected behavior when the "branch" you think you're manipulating turns out to be a change. - Stefan On Thu, Nov 15, 2018 at 4:52
Re: [PATCH] technical doc: add a design doc for the evolve command
sxe...@google.com writes: > +Detailed design > +=== > +Obsolescence information is stored as a graph of meta-commits. A meta-commit > is > +a specially-formatted merge commit that describes how one commit was created > +from others. > + > +Meta-commits look like this: > + > +$ git cat-file -p > +tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904 > +parent aa7ce55545bf2c14bef48db91af1a74e2347539a > +parent d64309ee51d0af12723b6cb027fc9f195b15a5e9 > +parent 7e1bbcd3a0fa854a7a9eac9bf1eea6465de98136 > +author Stefan Xenos 1540841596 -0700 > +committer Stefan Xenos 1540841596 -0700 > +parent-type content > +parent-type obsolete > +parent-type origin > + > +This says “commit aa7ce555 makes commit d64309ee obsolete. It was created by > +cherry-picking commit 7e1bbcd3”. > + > +The tree for meta-commits is always the empty tree whose hash matches > +4b825dc642cb6eb9a060e54bf8d69288fbee4904 exactly, but future versions of git > may > +attach other trees here. For forward-compatibility fsck should ignore such > trees > +if found on future repository versions. Similarly, current versions of git > +should always fill in an empty commit comment and tools like fsck should > ignore > +the content of the commit comment if present in a future repository version. > +This will allow future versions of git to add metadata to the meta-commit > +comments or tree without breaking forwards compatibility. Am I correct to understand that the reason why a commit object is (ab|re)used to represent a meta-commit is because by doing so we would get connectivity (i.e. fetching & pushing would transfer all the associated objects along) for free, and by not representing it as a new and different object type, existing implementations can just pass them along without understanding what they are, and as long as these are not mixed as parts of the main history of the project (e.g. 
when enumerating commits that have aa7ce5 as their parent, because somebody else obsoleted aa7ce5 and you want to evolve anything that built on it, you do not want to mistake the above "meta" commit as a commit that is part of the ordinary history and rebuild on top of the new version of aa7ce5, which would lead to a disaster), everything would work just fine? Perhaps you'd use something like "presence of parent-type header marks that a commit is a meta-commit and not part of the main history".

How are these meta commits anchored so that they won't be reclaimed by repack? I do not see any "parent" field used to chain them together, but I do not think we can afford to spend one ref per meta commit, as refs are not designed to point into each and every object in the repository.

I have a moderately strong opposition against the "origin" thing. If aa7ce555 replaces d64309ee, in order for the tool to be able to "evolve" other histories that build on top of d64309ee, it only needs the history between aa7ce555 and d64309ee and it would not matter how aa7ce555 was built relative to its parent. The user may have typed/developed it from scratch, the user may have borrowed 70% of its change from 7e1bbcd while the remaining 30% was done from scratch, or it was a concatenation of the change made in 7e1bbcd and another commit.

One half of my point is that we can do _without_ it, and in all cases aa7ce555, if recording the fact that it was derived from 7e1bbcd is so important, can mention in its log message how it relates to the "origin" thing.

And the other half is that while I consider the "origin" thing unnecessary for the above reasons, having it means we need to not just transfer the history leading to aa7ce555 and d64309ee (which are necessary anyway while we have histories to transplant from d64309ee to aa7ce555) but also have to pull in the history leading to 7e1bbcd and we cannot discard it.
Re: [PATCH] technical doc: add a design doc for the evolve command
On Thu, Nov 15, 2018 at 2:00 AM wrote: > +Goals > +- > +Legend: Goals marked with P0 are required. Goals marked with Pn should be > +attempted unless they interfere with goals marked with Pn-1. > + > +P0. All commands that modify commits (such as the normal commit --amend or > +rebase command) should mark the old commit as being obsolete and > replaced by > +the new one. No additional commands should be required to keep the > +obsolescence graph up-to-date. I sometimes "modify" a commit by "git reset @^", pick up the changes then "git commit -c @{1}". I don't think this counts as a typical modification and is probably hard to detect automatically. But I hope there's some way for me to tell git "yes this is a modified commit of that one, record that!". > +Example usage > +- > +# First create three dependent changes > +$ echo foo>bar.txt && git add . > +$ git commit -m "This is a test" > +created change metas/this_is_a_test I guess as an example, how the name metas/this_is_a_test is constructed does not matter much. But it's probably better to stick with some sort of id because subject line will change over time and the original one may become irrelevant. Perhaps we could use the original commit id as name. > +$ echo foo2>bar2.txt && git add . > +$ git commit -m "This is also a test" > +created change metas/this_is_also_a_test > +$ echo foo3>bar3.txt && git add . > +$ git commit -m "More testing" > +created change metas/more_testing > + > +# List all our changes in progress > +$ git change -l > +metas/this_is_a_test > +metas/this_is_also_a_test > +* metas/more_testing > +metas/some_change_already_merged_upstream > + > +# Now modify the earliest change, using its stable name > +$ git reset --hard metas/this_is_a_test > +$ echo morefoo>>bar.txt && git add . 
&& git commit --amend --no-edit > + > +# Use git-evolve to fix up any dependent changes > +$ git evolve > +rebasing metas/this_is_also_a_test onto metas/this_is_a_test > +rebasing metas/more_testing onto metas/this_is_also_a_test > +Done > + > +# Use git-obslog to view the history of the this_is_a_test change > +$ git obslog > +93f110 metas/this_is_a_test@{0} commit (amend): This is a test > +930219 metas/this_is_a_test@{1} commit: This is a test > + > +# Now create an unrelated change > +$ git reset --hard origin/master > +$ echo newchange>unrelated.txt && git add . > +$ git commit -m "Unrelated change" > +created change metas/unrelated_change > + > +# Fetch the latest code from origin/master and use git-evolve > +# to rebase all dependent changes. > +$ git fetch origin master > +$ git evolve origin/master > +deleting metas/some_change_already_merged_upstream > +rebasing metas/this_is_a_test onto origin/master > +rebasing metas/this_is_also_a_test onto metas/this_is_a_test > +rebasing metas/more_testing onto metas/this_is_also_a_test > +rebasing metas/unrelated_change onto origin/master > +Conflict detected! Resolve it and then use git evolve --continue to resume. > + > +# Sort out the conflict > +$ git mergetool > +$ git evolve --continue > +Done > + > +# Share the full history of edits for the this_is_a_test change > +# with a review server > +$ git push origin metas/this_is_a_test:refs/for/master > +# Share the lastest commit for “Unrelated change”, without history > +$ git push origin HEAD:refs/for/master How do we group changes of a topic together? I think branch-diff could take advantage of that. > +Detailed design > +=== > +Obsolescence information is stored as a graph of meta-commits. A meta-commit > is > +a specially-formatted merge commit that describes how one commit was created > +from others. 
> + > +Meta-commits look like this: > + > +$ git cat-file -p > +tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904 > +parent aa7ce55545bf2c14bef48db91af1a74e2347539a > +parent d64309ee51d0af12723b6cb027fc9f195b15a5e9 > +parent 7e1bbcd3a0fa854a7a9eac9bf1eea6465de98136 > +author Stefan Xenos 1540841596 -0700 > +committer Stefan Xenos 1540841596 -0700 > +parent-type content > +parent-type obsolete > +parent-type origin > + > +This says “commit aa7ce555 makes commit d64309ee obsolete. It was created by > +cherry-picking commit 7e1bbcd3”. This feels a bit forced. Could we just organize it like a normal history? Something like * |\ | * last version of the commit * |\ | * second last version of the commit * |\ Basically all commits will be linked in a new merge history. Real commits are on the second parent, first parent is to link changes together. This makes it possible to just use "git log --first-parent --patch" (or "git log --oneline --graph") to examine the change. More details (e.g. parent-type) could be stored as normal trailers in the commit message of these merges. -- Duy
Re: [PATCH] technical doc: add a design doc for the evolve command
On 11/14/2018 7:55 PM, sxe...@google.com wrote: From: Stefan Xenos This document describes what an obsolescence graph for git would look like, the behavior of the evolve command, and the changes planned for other commands. Thanks for putting this together! diff --git a/Documentation/technical/evolve.txt b/Documentation/technical/evolve.txt ... +Git Obsolescence Graph +== + +Objective +- +Track the edits to a commit over time in an obsolescence graph. The file name and the title are in a mismatch. I'd prefer if the title was "Git Evolve Design Document" and this opening paragraph was about the reasons we want a 'git evolve' command. Here is my attempt: The proposed 'git evolve' command will help users craft a high-quality commit history in their topic branches. By working to improve commits one at a time, then running 'git evolve', users can rewrite recent history with more options than interactive rebase. The core benefit is that users can pause their progress and move to other branches before returning to where they left off. Users can also share progress with others using standard 'push', 'fetch', and 'format-patch' commands. +Background +-- Perhaps you can call this "Example"? +Imagine you have three dependent changes up for review and you receive feedback +that requires editing all three changes. While you're editing one, more feedback +arrives on one of the others. What do you do? "three dependent changes" sounds a bit vague enough to me to possibly confuse readers. Perhaps "three sequential patches"? +- Users can view the history of a commit directly (the sequence of amends and + rebases it has undergone, orthogonal to the history of the branch it is on). "the history of a commit" doesn't semantically work, as a commit is an immutable Git object. 
Instead, I would try to use the term "patch" to describe a change to the codebase, and that takes the form as a list of commits that are improving on each other (but don't actually have each other in their commit history). This means that the lifetime of a patch is described by the commits that are amended or rebased.

+- By pushing and pulling the obsolescence graph, users can collaborate more
+  easily on changes-in-progress. This is better than pushing and pulling the
+  changes themselves since the obsolescence graph can be used to locate a more
+  specific merge base, allowing for better merges between different versions of
+  the same change.

(Making a note so I come back to this. I hope to learn what you mean by this "more specific merge base".)

+
+Similar technologies
+
... It can't handle the case where you have
+multiple changes sharing the same parent when that parent needs to be rebased

Perhaps this could be made more concrete by describing commit history and a specific workflow change using 'git evolve'.

Suppose we have two topic branches, topic1 and topic2, that point to commits A and B, respectively. Suppose further that A and B have a common parent C with parent D. If we rebase topic1 relative to D, then we create new commits C' and A' that are newer versions of commits C and A. It would be nice to easily update topic2 to be on a new commit B' with parent C'. Currently, a user needs to know that C updated to C', and use 'git rebase --onto C' C topic2'. Instead, if we have a marker showing that C' is an updated version of C, 'git log topic2' would show that topic2 can be updated, and the 'git evolve' command would perform the correct action to make B' with parent C'.

(This paragraph above is an example of "what can happen now is complicated and demands that the user keep some information in their memory" and "the new workflow is simpler and helps users make the right decision". I think we could use more of these at the start to sell the idea.)
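The topic1/topic2 scenario above can be modeled in a few lines of Python. This is only a sketch under stated assumptions: commits are plain strings, the "marker" is a dictionary mapping an obsolete commit to its replacement, and `latest` and `plan_evolve` are hypothetical helper names. A real 'git evolve' would perform actual rebases and would have to deal with conflicts.

```python
def latest(commit, replaced_by):
    """Follow replacement markers until reaching a non-obsolete commit."""
    while commit in replaced_by:
        commit = replaced_by[commit]
    return commit


def plan_evolve(parent_of, replaced_by):
    """List (old, new, new_parent) rebases needed to leave obsolete parents."""
    parent_of = dict(parent_of)      # work on copies, keep inputs intact
    replaced_by = dict(replaced_by)
    plan = []
    progress = True
    while progress:
        progress = False
        for commit, parent in sorted(parent_of.items()):
            # Skip commits that are themselves obsolete or already up to date.
            if commit in replaced_by or parent not in replaced_by:
                continue
            new_commit = commit + "'"
            new_parent = latest(parent, replaced_by)
            parent_of[new_commit] = new_parent
            replaced_by[commit] = new_commit
            plan.append((commit, new_commit, new_parent))
            progress = True
            break                    # the graph changed, so rescan it
    return plan


# State after rebasing topic1: C' and A' replace C and A; B still sits on C.
parent_of = {"A": "C", "B": "C", "C": "D", "C'": "D", "A'": "C'"}
replaced_by = {"C": "C'", "A": "A'"}

print(plan_evolve(parent_of, replaced_by))  # [('B', "B'", "C'")]
```

The only commit that still has an obsolete parent is B, so the plan contains a single step: recreate B as B' on top of C', which is what the manual 'git rebase --onto' invocation achieves today.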
+and won't let you collaborate with others on resolving a complicated interactive +rebase. In the same sentence, we have an even more complicated workflow mentioned as an aside. This could be fleshed out more concretely. It could help to describe that the current model is for users to share "fixup!" commits and then one performs an interactive rebase to apply those fixups in the correct order. If a user instead shares an amended commit, then it is difficult to apply those changes. The new workflow would be to share amended commits and 'git evolve' inserts the correct amended commits in the right order. I'm a big proponent of the teaching philosophy of "examples first". It's easier to talk abstractly after going through some concrete examples. You can think of rebase -i as a top-down approach and the evolve command +as the bottom-up approach to the same problem. This comparison is important. Perhaps it is more specific to say "interactive rebase splits a plan to rewrite history into independent units of work, while evolve collects independent
Re: [PATCH] technical doc: add a design doc for the evolve command
On Thu, Nov 15 2018, sxe...@google.com wrote: > +Detailed design > +=== > +Obsolescence information is stored as a graph of meta-commits. A meta-commit > is > +a specially-formatted merge commit that describes how one commit was created > +from others. > + > +Meta-commits look like this: > + > +$ git cat-file -p > +tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904 > +parent aa7ce55545bf2c14bef48db91af1a74e2347539a > +parent d64309ee51d0af12723b6cb027fc9f195b15a5e9 > +parent 7e1bbcd3a0fa854a7a9eac9bf1eea6465de98136 > +author Stefan Xenos 1540841596 -0700 > +committer Stefan Xenos 1540841596 -0700 > +parent-type content > +parent-type obsolete > +parent-type origin > + > +This says “commit aa7ce555 makes commit d64309ee obsolete. It was created by > +cherry-picking commit 7e1bbcd3”. > + > +The tree for meta-commits is always the empty tree whose hash matches > +4b825dc642cb6eb9a060e54bf8d69288fbee4904 exactly, but future versions of git > may > +attach other trees here. For forward-compatibility fsck should ignore such > trees > +if found on future repository versions. Similarly, current versions of git > +should always fill in an empty commit comment and tools like fsck should > ignore > +the content of the commit comment if present in a future repository version. > +This will allow future versions of git to add metadata to the meta-commit > +comments or tree without breaking forwards compatibility. > + > +Parent-type > +--- > +The “parent-type” field in the commit header identifies a commit as a > +meta-commit and indicates the meaning for each of its parents. It is never > +present for normal commits. It is a list of enum values whose order matches > the > +order of the parents. Possible parent types are: > + > +- content: the content parent identifies the commit that this meta-commit is > + describing. > +- obsolete: indicates that this parent is made obsolete by the content > parent. > +- origin: indicates that this parent was generated from the given commit. 
> + > +There must be exactly one content parent for each meta-commit and it is > always > +be the first parent. The content commit will always be a normal commit and > not a > +meta-commit. However, future versions of git may create meta-commits for > other > +meta-commits and the fsck tool must be aware of this for forwards > compatibility. > + > +A meta-commit can have zero or more obsolete parents. An amend operation > creates > +a single obsolete parent. A merge used to resolve divergence (see divergence, > +below) will create multiple obsolete parents. A meta-commit may have zero > +obsolete parents if it describes a cherry-pick or squash merge that copies > one > +or more commits but does not replace them. > + > +A meta-commit can have zero or more origin parents. A cherry-pick creates a > +single origin parent. Certain types of squash merge will create multiple > origin > +parents. > + > +An obsolete parent or origin parent may be either a normal commit (indicating > +the oldest-known version of a change) or another meta-commit (for a change > that > +has already been modified one or more times).

I think it's worth pointing out, for those who are rusty on commit object details (I checked), the reason for it not being:

    tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
    parent aa7ce55545bf2c14bef48db91af1a74e2347539a
    parent-type content
    parent d64309ee51d0af12723b6cb027fc9f195b15a5e9
    parent-type obsolete
    parent 7e1bbcd3a0fa854a7a9eac9bf1eea6465de98136
    parent-type origin
    author Stefan Xenos 1540841596 -0700
    committer Stefan Xenos 1540841596 -0700

which would be easier to read. The reason is that we're very sensitive to the order of the first few fields (tree -> parent -> author -> committer), and fsck will error out if we interject a new field.
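As an aside, the pairing rule in the quoted design (the parent-type list matches the order of the parents, and there is exactly one content parent, which comes first) can be sketched in a few lines of Python. This is only an illustration of the proposed layout as quoted above, assuming one `parent-type` header line per parent as in the example; `parse_meta_commit` and `MetaCommit` are hypothetical names, and nothing here reflects an actual git implementation.

```python
from typing import List, NamedTuple

VALID_PARENT_TYPES = {"content", "obsolete", "origin"}


class MetaCommit(NamedTuple):
    content: str          # the commit this meta-commit describes
    obsolete: List[str]   # commits made obsolete by the content commit
    origin: List[str]     # commits the content commit was generated from


def parse_meta_commit(raw: str) -> MetaCommit:
    """Pair each 'parent' header with the 'parent-type' at the same index."""
    parents, types = [], []
    for line in raw.splitlines():
        if not line:
            break  # a blank line ends the header section
        key, _, value = line.partition(" ")
        if key == "parent":
            parents.append(value)
        elif key == "parent-type":
            types.append(value)
    if not types:
        raise ValueError("not a meta-commit: no parent-type header")
    if len(types) != len(parents):
        raise ValueError("parent-type count does not match parent count")
    unknown = set(types) - VALID_PARENT_TYPES
    if unknown:
        raise ValueError("unknown parent types: %s" % sorted(unknown))
    if types[0] != "content" or types.count("content") != 1:
        raise ValueError("exactly one content parent is required, and it must be first")
    paired = list(zip(parents, types))
    return MetaCommit(
        content=parents[0],
        obsolete=[p for p, t in paired if t == "obsolete"],
        origin=[p for p, t in paired if t == "origin"],
    )


# The example from the design doc, with the headers in the proposed order.
EXAMPLE = """tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
parent aa7ce55545bf2c14bef48db91af1a74e2347539a
parent d64309ee51d0af12723b6cb027fc9f195b15a5e9
parent 7e1bbcd3a0fa854a7a9eac9bf1eea6465de98136
author Stefan Xenos 1540841596 -0700
committer Stefan Xenos 1540841596 -0700
parent-type content
parent-type obsolete
parent-type origin
"""

if __name__ == "__main__":
    print(parse_meta_commit(EXAMPLE))
```

Because the types are matched to parents purely by position, a reader has to count lines to see which parent is which; that is the readability cost of keeping the tree/parent/author/committer prefix in the order fsck expects.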
Re: [PATCH] technical doc: add a design doc for the evolve command
Hi Stefan, On Wed, 14 Nov 2018, sxe...@google.com wrote: > From: Stefan Xenos > > This document describes what an obsolescence graph for > git would look like, the behavior of the evolve command, > and the changes planned for other commands. Thanks, this is a good discussion starter. > +Objective > +- > +Track the edits to a commit over time in an obsolescence graph. I am not sure that we necessarily need this to be a graph. I think part of the problems with not being able to GC *any* of this is by this requirement to have it stored in a graph, rather than having mappings from which you could reconstruct any non-GC'ed parts of that graph, if you really want. > +Background > +-- > +Imagine you have three dependent changes up for review and you receive > feedback > +that requires editing all three changes. While you're editing one, more > feedback > +arrives on one of the others. What do you do? > + > +The evolve command is a convenient way to work with chains of commits that > are > +under review. Whenever you rebase or amend a commit, the repository remembers > +that the old commit is obsolete and has been replaced by the new one. Then, > at > +some point in the future, you can run "git evolve" and the correct sequence > of > +rebases will occur in the correct order such that no commit has an obsolete > +parent. > + > +Part of making the "evolve" command work involves tracking the edits to a > commit > +over time, which is why we need an obsolescence graph. However, the > obsolescence > +graph will also bring other benefits: > + > +- Users can view the history of a commit directly (the sequence of amends and > + rebases it has undergone, orthogonal to the history of the branch it is > on). > +- It will be possible to quickly locate and list all the changes the user > + currently has in progress. > +- It can be used as part of other high-level commands that combine or split > + changes. 
> +- It can be used to decorate commits (in git log, gitk, etc) that are either > + obsolete or are the tip of a work in progress. > +- By pushing and pulling the obsolescence graph, users can collaborate more > + easily on changes-in-progress. This is better than pushing and pulling the > + changes themselves since the obsolescence graph can be used to locate a > more > + specific merge base, allowing for better merges between different versions > of > + the same change. > +- It could be used to correctly rebase local changes and other local branches > + after running git-filter-branch. > +- It can replace the change-id footer used by gerrit. Okay. > +Similar technologies > + > +There are some other technologies that address the same end-user problem. > + > +Rebase -i can be used to solve the same problem, but users can't easily > switch > +tasks midway through an interactive rebase or have more than one interactive > +rebase going on at the same time. It can't handle the case where you have > +multiple changes sharing the same parent when that parent needs to be rebased > +and won't let you collaborate with others on resolving a complicated > interactive > +rebase. You can think of rebase -i as a top-down approach and the evolve > command > +as the bottom-up approach to the same problem. > + > +Several patch queue managers have been built on top of git (such as topgit, > +stgit, and quilt). They address the same user need. However they also rely on > +state managed outside git that needs to be kept in sync. Such state can be > +easily damaged when running a git native command that is unaware of the patch > +queue. They also typically require an explicit initialization step to be > done by > +the user which creates workflow problems. > + > +Replacements (refs/replace) are superficially similar to obsolescences in > that > +they describe that one commit should be replaced by another. 
However, they > +differ in both how they are created and how they are intended to be used. > +Obsolescences are created automatically by the commands a user runs, and they > +describe the user’s intent to perform a future rebase. Obsolete commits still > +appear in branches, logs, etc like normal commits (possibly with an extra > +decoration that marks them as obsolete). Replacements are typically created > +explicitly by the user, they are meant to be kept around for a long time, and > +they describe a replacement to be applied at read-time rather than as the > input > +to a future operation. When a replaced commit is queried, it is typically > hidden > +and swapped out with its replacement as though the replacement has already > +occurred. Why is this missing most notably `hg evolve`? Also, there should be *at least* a brief introduction how `hg evolve` works. They do have the benefit of real-world testing, and probably encountered problems and came up with solutions, and we would be remiss if we did not learn from them. Also, please do not forget `git imerge`.
[PATCH] technical doc: add a design doc for the evolve command
From: Stefan Xenos This document describes what an obsolescence graph for git would look like, the behavior of the evolve command, and the changes planned for other commands. Signed-off-by: Stefan Xenos --- Documentation/technical/evolve.txt | 885 + 1 file changed, 885 insertions(+) create mode 100644 Documentation/technical/evolve.txt diff --git a/Documentation/technical/evolve.txt b/Documentation/technical/evolve.txt new file mode 100644 index 00..88470eada3 --- /dev/null +++ b/Documentation/technical/evolve.txt @@ -0,0 +1,885 @@ +Git Obsolescence Graph +== + +Objective +- +Track the edits to a commit over time in an obsolescence graph. + +Background +-- +Imagine you have three dependent changes up for review and you receive feedback +that requires editing all three changes. While you're editing one, more feedback +arrives on one of the others. What do you do? + +The evolve command is a convenient way to work with chains of commits that are +under review. Whenever you rebase or amend a commit, the repository remembers +that the old commit is obsolete and has been replaced by the new one. Then, at +some point in the future, you can run "git evolve" and the correct sequence of +rebases will occur in the correct order such that no commit has an obsolete +parent. + +Part of making the "evolve" command work involves tracking the edits to a commit +over time, which is why we need an obsolescence graph. However, the obsolescence +graph will also bring other benefits: + +- Users can view the history of a commit directly (the sequence of amends and + rebases it has undergone, orthogonal to the history of the branch it is on). +- It will be possible to quickly locate and list all the changes the user + currently has in progress. +- It can be used as part of other high-level commands that combine or split + changes. +- It can be used to decorate commits (in git log, gitk, etc) that are either + obsolete or are the tip of a work in progress. 
+- By pushing and pulling the obsolescence graph, users can collaborate more + easily on changes-in-progress. This is better than pushing and pulling the + changes themselves since the obsolescence graph can be used to locate a more + specific merge base, allowing for better merges between different versions of + the same change. +- It could be used to correctly rebase local changes and other local branches + after running git-filter-branch. +- It can replace the change-id footer used by gerrit. + +Similar technologies + +There are some other technologies that address the same end-user problem. + +Rebase -i can be used to solve the same problem, but users can't easily switch +tasks midway through an interactive rebase or have more than one interactive +rebase going on at the same time. It can't handle the case where you have +multiple changes sharing the same parent when that parent needs to be rebased +and won't let you collaborate with others on resolving a complicated interactive +rebase. You can think of rebase -i as a top-down approach and the evolve command +as the bottom-up approach to the same problem. + +Several patch queue managers have been built on top of git (such as topgit, +stgit, and quilt). They address the same user need. However they also rely on +state managed outside git that needs to be kept in sync. Such state can be +easily damaged when running a git native command that is unaware of the patch +queue. They also typically require an explicit initialization step to be done by +the user which creates workflow problems. + +Replacements (refs/replace) are superficially similar to obsolescences in that +they describe that one commit should be replaced by another. However, they +differ in both how they are created and how they are intended to be used. +Obsolescences are created automatically by the commands a user runs, and they +describe the user’s intent to perform a future rebase. 
Obsolete commits still +appear in branches, logs, etc like normal commits (possibly with an extra +decoration that marks them as obsolete). Replacements are typically created +explicitly by the user, they are meant to be kept around for a long time, and +they describe a replacement to be applied at read-time rather than as the input +to a future operation. When a replaced commit is queried, it is typically hidden +and swapped out with its replacement as though the replacement has already +occurred. + +Goals +- +Legend: Goals marked with P0 are required. Goals marked with Pn should be +attempted unless they interfere with goals marked with Pn-1. + +P0. All commands that modify commits (such as the normal commit --amend or +rebase command) should mark the old commit as being obsolete and replaced by +the new one. No additional commands should be required to keep the +obsolescence graph up-to-date. +P0. Any commit that may be involved in a future evolve