Re: The case for two trees in a commit ("How to make rebase less modal")

2018-03-01 Thread Jonathan Nieder
Hi,

Stefan Beller wrote:

> $ git hash-object --stdin -w -t commit < tree c70b4a33a0089f15eb3b38092832388d75293e86
> parent 105d5b91138ced892765a84e771a061ede8d63b8
> author Stefan Beller  1519859216 -0800
> committer Stefan Beller  1519859216 -0800
> tree 5495266479afc9a4bd9560e9feac465ed43fa63a
> test commit
> EOF
> 19abfc3bf1c5d782045acf23abdf7eed81e16669
> $ git fsck |grep 19abfc3bf1c5d782045acf23abdf7eed81e16669
> $
>
> So it is technically possible to create a commit with two tree entries
> and fsck is not complaining.

As others mentioned, this is essentially a fancy way to experiment
with adding a new header (with the same name as an existing header) to
a commit.  It is kind of a scary thing to do because anyone trying to
parse commits, including old versions of git, is likely to get
confused by the multiple trees.  It doesn't affect the reachability
calculation in the way that it should so this ends up being something
that should be straightforward to do with a message in the commit body
instead.

To affect reachability, you could use multiple parent lines instead.
You'd need synthetic commits to hang the trees on.  This is similar to
how "git stash" stores the index state.

In other words, I think what you are trying to do is feasible, but not
in the exact way you described.

[...]
> * porcelain to modify the repo "at larger scale", such as rebase,
> cherrypicking, reverting
>   involving more than 1 commit.
>
> These large scale operations involving multiple commits however
> are all modal in its nature. Before doing anything else, you have to
> finish or abort the rebase or you need expert knowledge how to
> go otherwise.
>
> During the rebase there might be a hard to resolve conflict, which
> you may not want to resolve right now, but defer to later.  Deferring a
> conflict is currently impossible, because precisely one tree is recorded.

Junio mentions you'd want to record:
 - stages of the index, to re-create a conflicted index
 - working tree files, with conflict markers

In addition you may also want to record:
 - state (todo list) from .git/rebase-merge, to allow picking up where
   you left off in such a larger operation
 - similar state for other commands --- e.g. MERGE_MSG

Recording this work-in-progress state is in the spirit of "git stash"
does.  People also sometimes like to record their state in progress with
a "wip commit" at the tip of a branch.  Both of those workflows would
benefit from something like this, I'd think.

So I kind of like this.  Maybe a "git save-wip" command that is like
"git stash" but records state to the current branch?  And similarly
improving "git stash" to record the same richer state.

And in the spirit of "git stash" I think it is possible without
even modifying the commit object format.

[...]
> I'd be advocating for having multiple trees in a commit
> possible locally; it might be a bad idea to publish such trees.

I think such "WIP state" may also be useful for publishing, to allow
collaborating on a thorny rebase or merge.

Thanks and hope that helps,
Jonathan


Re: The case for two trees in a commit ("How to make rebase less modal")

2018-03-01 Thread Junio C Hamano
Jacob Keller  writes:

> How does this let you defer a conflict? A future commit which modified
> blobs in that tree wouldn't know what version of the trees/blobs to
> actually use? Clearly future commits could record their own trees, but
> how would they generate the "correct" tree?
>
> Maybe I am missing something here?

If you write four trees out of each stage in the index and record
them, you could in theory have a new command that reads them and
recreate the conflicted index.  Oh, and then you would need the
fifth tree that records what the working-tree files (with conflict
markers) looked like, in order to reproduce the state seen by the
person who ran "git merge", attempted to resolve and gave up halfway
in the middle.

As a local operation, extending "git stash" somehow so that it can
stash even in a conflicted working tree may be a better approach,
and it does not need cruft headers in commit objects, I would think.


Re: The case for two trees in a commit ("How to make rebase less modal")

2018-03-01 Thread Junio C Hamano
Stefan Beller  writes:

> $ git hash-object --stdin -w -t commit < tree c70b4a33a0089f15eb3b38092832388d75293e86
> parent 105d5b91138ced892765a84e771a061ede8d63b8
> author Stefan Beller  1519859216 -0800
> committer Stefan Beller  1519859216 -0800
> tree 5495266479afc9a4bd9560e9feac465ed43fa63a
> test commit
> EOF
> 19abfc3bf1c5d782045acf23abdf7eed81e16669
> $ git fsck |grep 19abfc3bf1c5d782045acf23abdf7eed81e16669
> $
>
> So it is technically possible to create a commit with two tree entries
> and fsck is not complaining.

The second one is merely a random unauthorized header that is not
interpreted in any way by Git.  It is merely being confusing by
starting with "tree " and having 40-hex after it, but the 40-hex
does not get interpreted as an object name, and does not participate
in reachability computation (i.e. packing, pruning and fsck).

There is not much difference between that and a line of trailer in
the commit log message (other than this one is less discoverable).


Re: The case for two trees in a commit ("How to make rebase less modal")

2018-02-28 Thread Jacob Keller
On Wed, Feb 28, 2018 at 3:30 PM, Stefan Beller  wrote:
> $ git hash-object --stdin -w -t commit < tree c70b4a33a0089f15eb3b38092832388d75293e86
> parent 105d5b91138ced892765a84e771a061ede8d63b8
> author Stefan Beller  1519859216 -0800
> committer Stefan Beller  1519859216 -0800
> tree 5495266479afc9a4bd9560e9feac465ed43fa63a
> test commit
> EOF
> 19abfc3bf1c5d782045acf23abdf7eed81e16669
> $ git fsck |grep 19abfc3bf1c5d782045acf23abdf7eed81e16669
> $
>
> So it is technically possible to create a commit with two tree entries
> and fsck is not complaining.
>
> But why would I want to do that?
>
> There are multiple abstraction levels in Git, I think of them as follows:
> * data structures / object model
> * plumbing
> * porcelain commands to manipulate the repo "at small scale", e.g.
> create a commit/tag
> * porcelain to modify the repo "at larger scale", such as rebase,
> cherrypicking, reverting
>   involving more than 1 commit.
>
> These large scale operations involving multiple commits however
> are all modal in its nature. Before doing anything else, you have to
> finish or abort the rebase or you need expert knowledge how to
> go otherwise.
>
> During the rebase there might be a hard to resolve conflict, which
> you may not want to resolve right now, but defer to later.  Deferring a
> conflict is currently impossible, because precisely one tree is recorded.
>

How does this let you defer a conflict? A future commit which modified
blobs in that tree wouldn't know what version of the trees/blobs to
actually use? Clearly future commits could record their own trees, but
how would they generate the "correct" tree?

Maybe I am missing something here?

Thanks,
Jake

> If we had multiple trees possible in a commit, then all these large scale
> operations would stop being modal and you could just record the unresolved
> merge conflict instead; to come back later and fix it up later.
>
> I'd be advocating for having multiple trees in a commit
> possible locally; it might be a bad idea to publish such trees.
>
> Opinions or other use cases?
>
> Thanks,
> Stefan


Re: The case for two trees in a commit ("How to make rebase less modal")

2018-02-28 Thread Jeff King
On Wed, Feb 28, 2018 at 03:30:27PM -0800, Stefan Beller wrote:

> During the rebase there might be a hard to resolve conflict, which
> you may not want to resolve right now, but defer to later.  Deferring a
> conflict is currently impossible, because precisely one tree is recorded.
> 
> If we had multiple trees possible in a commit, then all these large scale
> operations would stop being modal and you could just record the unresolved
> merge conflict instead; to come back later and fix it up later.
> 
> I'd be advocating for having multiple trees in a commit
> possible locally; it might be a bad idea to publish such trees.
> 
> Opinions or other use cases?

What benefit does it have over adding a new header "unresolved-tree" or
similar? I do not think you are getting any backwards compatibility
here. For instance, "prune" will not traverse it with existing versions
of git, nor "pack-objects" include it in a pack (I didn't actually test
it, so I could be wrong; but those are all based around parse_commit,
which should look at only the first tree).

-Peff


Re: The case for two trees in a commit ("How to make rebase less modal")

2018-02-28 Thread Brandon Williams
On 03/01, Ramsay Jones wrote:
>
>
> On 28/02/18 23:30, Stefan Beller wrote:
> > $ git hash-object --stdin -w -t commit < > tree c70b4a33a0089f15eb3b38092832388d75293e86
> > parent 105d5b91138ced892765a84e771a061ede8d63b8
> > author Stefan Beller  1519859216 -0800
> > committer Stefan Beller  1519859216 -0800
> > tree 5495266479afc9a4bd9560e9feac465ed43fa63a
> > test commit
> > EOF
> > 19abfc3bf1c5d782045acf23abdf7eed81e16669
> > $ git fsck |grep 19abfc3bf1c5d782045acf23abdf7eed81e16669
> > $
> >
> > So it is technically possible to create a commit with two tree entries
> > and fsck is not complaining.
>
> Hmm, it's a while since I looked at that code, but I don't think
> you have a commit with two trees - the second 'tree ' line
> is just part of the commit message, isn't it?
>
> ATB,
> Ramsay Jones
>

Actually it doesn't look like it.  The commit msg doesn't start till
after an empty newline so that commit has an empty commit msg.  Here's
one which you can see the msg when passed to show:

git hash-object --stdin -w -t commit < 1519859216 -0800
committer Brandon Williams  1519859216 -0800
tree 76d269b57d3c4283922216f84a2850e99f561ccc

This is a test commit with multiple trees
EOF

Of course the extra tree is ignored, but fsck doesn't complain and show
happily shows what it knows about.

--
Brandon Williams


Re: The case for two trees in a commit ("How to make rebase less modal")

2018-02-28 Thread Ramsay Jones


On 28/02/18 23:30, Stefan Beller wrote:
> $ git hash-object --stdin -w -t commit < tree c70b4a33a0089f15eb3b38092832388d75293e86
> parent 105d5b91138ced892765a84e771a061ede8d63b8
> author Stefan Beller  1519859216 -0800
> committer Stefan Beller  1519859216 -0800
> tree 5495266479afc9a4bd9560e9feac465ed43fa63a
> test commit
> EOF
> 19abfc3bf1c5d782045acf23abdf7eed81e16669
> $ git fsck |grep 19abfc3bf1c5d782045acf23abdf7eed81e16669
> $
> 
> So it is technically possible to create a commit with two tree entries
> and fsck is not complaining.

Hmm, it's a while since I looked at that code, but I don't think
you have a commit with two trees - the second 'tree ' line
is just part of the commit message, isn't it?

ATB,
Ramsay Jones



The case for two trees in a commit ("How to make rebase less modal")

2018-02-28 Thread Stefan Beller
$ git hash-object --stdin -w -t commit < 1519859216 -0800
committer Stefan Beller  1519859216 -0800
tree 5495266479afc9a4bd9560e9feac465ed43fa63a
test commit
EOF
19abfc3bf1c5d782045acf23abdf7eed81e16669
$ git fsck |grep 19abfc3bf1c5d782045acf23abdf7eed81e16669
$

So it is technically possible to create a commit with two tree entries
and fsck is not complaining.

But why would I want to do that?

There are multiple abstraction levels in Git, I think of them as follows:
* data structures / object model
* plumbing
* porcelain commands to manipulate the repo "at small scale", e.g.
create a commit/tag
* porcelain to modify the repo "at larger scale", such as rebase,
cherrypicking, reverting
  involving more than 1 commit.

These large scale operations involving multiple commits however
are all modal in its nature. Before doing anything else, you have to
finish or abort the rebase or you need expert knowledge how to
go otherwise.

During the rebase there might be a hard to resolve conflict, which
you may not want to resolve right now, but defer to later.  Deferring a
conflict is currently impossible, because precisely one tree is recorded.

If we had multiple trees possible in a commit, then all these large scale
operations would stop being modal and you could just record the unresolved
merge conflict instead; to come back later and fix it up later.

I'd be advocating for having multiple trees in a commit
possible locally; it might be a bad idea to publish such trees.

Opinions or other use cases?

Thanks,
Stefan