Re: [BUG] two-way read-tree can write null sha1s into index

2013-01-07 Thread Jeff King
On Thu, Jan 03, 2013 at 02:33:09PM -0800, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > Oh, I agree it's insane to try to carry through unmerged entries. I'm
> > just concerned that not all code paths are careful enough to check.
> 
> I would actually be surprised if some code path do assume somebody
> might give them an index with conflicting entries in it and guard
> against it.  We have been coding under the "index must exactly match
> the second tree when three-way unpack_trees() begin" requirement
> since day one.  An conflicted entry will appear as "index and HEAD
> not matching" and will cause reject_merge() early in threeway_merge()
> anyway, no?

Hmm. There is code in threeway_merge to handle a few cases:

/*
 * We start with cases where the index is allowed to match
 * something other than the head: #14(ALT) and #2ALT, where it
 * is permitted to match the result instead.
 */
/* #14, #14ALT, #2ALT */
if (remote && !df_conflict_head && head_match && !remote_match) {
if (index && !same(index, remote) && !same(index, head))
return o->gently ? -1 : reject_merge(index, o);
return merged_entry(remote, index, o);
}

but I do not think we have to worry, because we know that the index will
never match remote, either (and merged_entry has already been taught to
be wary of the conflicted bit, anyway). I'm not entirely clear on how
that would get triggered if all of the callers avoid operating on a
modified index.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] two-way read-tree can write null sha1s into index

2013-01-03 Thread Junio C Hamano
Jeff King  writes:

> Oh, I agree it's insane to try to carry through unmerged entries. I'm
> just concerned that not all code paths are careful enough to check.

I would actually be surprised if some code path do assume somebody
might give them an index with conflicting entries in it and guard
against it.  We have been coding under the "index must exactly match
the second tree when three-way unpack_trees() begin" requirement
since day one.  An conflicted entry will appear as "index and HEAD
not matching" and will cause reject_merge() early in threeway_merge()
anyway, no?
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] two-way read-tree can write null sha1s into index

2013-01-03 Thread Jeff King
On Thu, Jan 03, 2013 at 12:34:27PM -0800, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > Or are you suggesting that the three-way case should always be protected
> > by checking that there are no unmerged entries before we start it? That
> > seems sane to me, but I haven't confirmed that that is the case.
> 
> I think the normal (and hopefully only) "-m -u O A B" use case of
> threeway is the bog-standard "git merge", which requires even more:
> your index must exactly match HEAD, even though you are allowed to
> have local changes in the working tree.
> 
> That requirement is not likely to change, as cleanly merged paths
> are automatically added to the index, so "diff --cached" should show
> only the changes from cleanly merged part, while "diff" should show
> paths that still needs user's help.
> 
> If we allowed local modification to the index (let alone conflicted
> entries in it) before the merge begins, the users would not be able
> to tell which paths are in what state after a half-merge stops and
> asks for help.  Updated paths may not have anything to do with the
> merge (i.e. earlier "git add" before the merge started), conflicting
> paths may not have anything to do with the merge (i.e. leftover
> conflicts before the merge started).

Oh, I agree it's insane to try to carry through unmerged entries. I'm
just concerned that not all code paths are careful enough to check. They
probably are, though, as a null sha1 in your index would be the least of
your worries, and somebody would have noticed in the last 7 years.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] two-way read-tree can write null sha1s into index

2013-01-03 Thread Junio C Hamano
Jeff King  writes:

> Or are you suggesting that the three-way case should always be protected
> by checking that there are no unmerged entries before we start it? That
> seems sane to me, but I haven't confirmed that that is the case.

I think the normal (and hopefully only) "-m -u O A B" use case of
threeway is the bog-standard "git merge", which requires even more:
your index must exactly match HEAD, even though you are allowed to
have local changes in the working tree.

That requirement is not likely to change, as cleanly merged paths
are automatically added to the index, so "diff --cached" should show
only the changes from cleanly merged part, while "diff" should show
paths that still needs user's help.

If we allowed local modification to the index (let alone conflicted
entries in it) before the merge begins, the users would not be able
to tell which paths are in what state after a half-merge stops and
asks for help.  Updated paths may not have anything to do with the
merge (i.e. earlier "git add" before the merge started), conflicting
paths may not have anything to do with the merge (i.e. leftover
conflicts before the merge started).
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] two-way read-tree can write null sha1s into index

2013-01-03 Thread Jeff King
On Thu, Jan 03, 2013 at 07:34:53AM -0800, Junio C Hamano wrote:

> > Good point; I was just thinking about the --reset case.
> >
> > With "-m", though, we could in theory carry over the unmerged entries
> > (again, assuming that "old" and "new" are the same; otherwise it is an
> > obvious reject). But those entries would be confused with any new
> > unmerged entries we create. It seems we already protect against this,
> > though: "read-tree -m" will not run at all if you have unmerged entries.
> >
> > Likewise, "checkout" seems to have similar protections.
> >
> > So I think it may be a non-issue.
> 
> Yeah.  Also earlier in the thread you mentioned three-way case, but
> I do not think we ever would want --reset with three trees, so I
> think that too is a non-issue for the same reason.

Yeah, agreed; we should always reject in the three-way case. I would
worry more that it has a bug where something is _not_ rejected, and we
end up putting a bogus null sha1 entry into the index (which is the
actual problem with twoway_merge). IOW, if we have the bogus sha1 in the
index (because we marked it with CE_CONFLICTED), and the two sides and
the common ancestor are all the same, would we blindly carry through the
bogus conflicted entry (which we would prefer, because it has the
up-to-date stat information)?

Or are you suggesting that the three-way case should always be protected
by checking that there are no unmerged entries before we start it? That
seems sane to me, but I haven't confirmed that that is the case.

> I would still feel safer if we expressed the expectation of
> the callee in the code, perhaps like this in the two-way case:
> 
>   if (current->ce_flags & CE_CONFLICTED) {
>   if (!o->reset) {
>   ... either die or fail ...
>   } else {
>   ... your fix ...
>   }
>   }

Agreed. I looked at that, but it seemed like it was going to involve
repeating a lot of the "are the two trees the same" logic. Let me see if
I can refactor it to avoid that.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] two-way read-tree can write null sha1s into index

2013-01-03 Thread Junio C Hamano
Jeff King  writes:

> On Tue, Jan 01, 2013 at 02:24:46PM -0800, Junio C Hamano wrote:
>
>> Jeff King  writes:
>> 
>> > So I think we need to update twoway_merge to recognize unmerged entries,
>> > which gives us two options:
>> >
>> >   1. Reject the merge.
>> >
>> >   2. Throw away the current unmerged entry in favor of the "new" entry
>> >  (when old and new are the same, of course; otherwise we would
>> >  reject).
>> >
>> > I think (2) is the right thing.
>> 
>> As "--reset" in "read-tree --reset -u A B" is a way to say "we know
>> we are based on A and we want to go to B no matter what", I agree we
>> should not reject the merge.
>> 
>> With -m instead of --reset, I am not sure what the right semantics
>> should be, though.
>
> Good point; I was just thinking about the --reset case.
>
> With "-m", though, we could in theory carry over the unmerged entries
> (again, assuming that "old" and "new" are the same; otherwise it is an
> obvious reject). But those entries would be confused with any new
> unmerged entries we create. It seems we already protect against this,
> though: "read-tree -m" will not run at all if you have unmerged entries.
>
> Likewise, "checkout" seems to have similar protections.
>
> So I think it may be a non-issue.

Yeah.  Also earlier in the thread you mentioned three-way case, but
I do not think we ever would want --reset with three trees, so I
think that too is a non-issue for the same reason.

I would still feel safer if we expressed the expectation of
the callee in the code, perhaps like this in the two-way case:

if (current->ce_flags & CE_CONFLICTED) {
if (!o->reset) {
... either die or fail ...
} else {
... your fix ...
}
}

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] two-way read-tree can write null sha1s into index

2013-01-03 Thread Jeff King
On Tue, Jan 01, 2013 at 02:24:46PM -0800, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > So I think we need to update twoway_merge to recognize unmerged entries,
> > which gives us two options:
> >
> >   1. Reject the merge.
> >
> >   2. Throw away the current unmerged entry in favor of the "new" entry
> >  (when old and new are the same, of course; otherwise we would
> >  reject).
> >
> > I think (2) is the right thing.
> 
> As "--reset" in "read-tree --reset -u A B" is a way to say "we know
> we are based on A and we want to go to B no matter what", I agree we
> should not reject the merge.
> 
> With -m instead of --reset, I am not sure what the right semantics
> should be, though.

Good point; I was just thinking about the --reset case.

With "-m", though, we could in theory carry over the unmerged entries
(again, assuming that "old" and "new" are the same; otherwise it is an
obvious reject). But those entries would be confused with any new
unmerged entries we create. It seems we already protect against this,
though: "read-tree -m" will not run at all if you have unmerged entries.

Likewise, "checkout" seems to have similar protections.

So I think it may be a non-issue.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] two-way read-tree can write null sha1s into index

2013-01-01 Thread Junio C Hamano
Jeff King  writes:

> So I think we need to update twoway_merge to recognize unmerged entries,
> which gives us two options:
>
>   1. Reject the merge.
>
>   2. Throw away the current unmerged entry in favor of the "new" entry
>  (when old and new are the same, of course; otherwise we would
>  reject).
>
> I think (2) is the right thing.

As "--reset" in "read-tree --reset -u A B" is a way to say "we know
we are based on A and we want to go to B no matter what", I agree we
should not reject the merge.

With -m instead of --reset, I am not sure what the right semantics
should be, though.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[BUG] two-way read-tree can write null sha1s into index

2012-12-29 Thread Jeff King
On Sat, Dec 29, 2012 at 06:05:41AM -0500, Jeff King wrote:

>   [clear state from last run]
>   $ rm -rf .git/rebase-apply
>   $ git reset --hard
> 
>   [apply the patch; we get a conflict]
>   $ git am -3sc 
> queue-3.2/alsa-usb-audio-fix-missing-autopm-for-midi-input.patch
> 
>   [now run just the read-tree from "am --abort"]
>   $ git.compile read-tree --reset -u HEAD ORIG_HEAD
>   warning: cache entry has null sha1: sound/usb/midi.c
> 
>   [and now check our index]
>   $ git ls-files -s sound/usb/midi.c
>   100644  0 sound/usb/midi.c
> 
>   [yes, this index is bogus]
>   $ git write-tree
>   error: invalid object 100644  for 
> 'sound/usb/midi.c'
>   fatal: git-write-tree: error building trees
> 
> So I think this check may actually be finding a real bug. I have seen
> these null sha1s in the wild, but I was never able to track down the
> actual cause. Maybe this will give us a clue. Now we just need to work
> backwards and figure out who is putting it in the in-memory index and
> why.

I made some progress on this, but I'd like a sanity check from others
(especially Junio). As far as I can tell, this is a bug in read-tree.

When we call "read-tree --reset -u HEAD ORIG_HEAD", the first thing we
do with the index is call read_cache_unmerged. Originally that would
read the index, leaving aside any unmerged entries. However, as of
d1a43f2 (reset --hard/read-tree --reset -u: remove unmerged new paths,
2008-10-15), it actually creates a new cache entry. This serves as a
placeholder, so that we later know to update the working tree.

However, we later noticed that the sha1 of that unmerged entry was
just copied from some higher stage, leaving you with random content in
the index.  That was fixed by e11d7b5 ("reset --merge": fix unmerged
case, 2009-12-31), which instead puts the null sha1 into the newly
created entry, and sets a CE_CONFLICTED flag. At the same time, it
teaches the unpack-trees machinery to pay attention to this flag, so
that oneway_merge throws away the current value.

However, it did not update the code paths for  twoway_merge, which is
where we end up in the read-tree above. We notice that the HEAD and
ORIG_HEAD versions are the same, and say "oh, we can just reuse the
current version". But that's not true. The current version is bogus.

So I think we need to update twoway_merge to recognize unmerged entries,
which gives us two options:

  1. Reject the merge.

  2. Throw away the current unmerged entry in favor of the "new" entry
 (when old and new are the same, of course; otherwise we would
 reject).

I think (2) is the right thing. It fixes the entry of the bogus sha1
into the index, _and_ it solves the problem that "git am --abort" leaves
the conflicted entry as a modification. It should just go away. But
maybe I am forgetting some other case where read-tree should be more
conservative, and (1) is a safer choice.

Something like this patch:

diff --git a/unpack-trees.c b/unpack-trees.c
index 6d96366..e06e01f 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1746,14 +1746,19 @@ int twoway_merge(struct cache_entry **src, struct 
unpack_trees_options *o)
newtree = NULL;
 
if (current) {
-   if ((!oldtree && !newtree) || /* 4 and 5 */
-   (!oldtree && newtree &&
-same(current, newtree)) || /* 6 and 7 */
-   (oldtree && newtree &&
-same(oldtree, newtree)) || /* 14 and 15 */
-   (oldtree && newtree &&
-!same(oldtree, newtree) && /* 18 and 19 */
-same(current, newtree))) {
+   if (current->ce_flags & CE_CONFLICTED) {
+   if (same(oldtree, newtree))
+   return merged_entry(newtree, current, o);
+   return o->gently ? -1 : reject_merge(current, o);
+   }
+   else if ((!oldtree && !newtree) || /* 4 and 5 */
+(!oldtree && newtree &&
+ same(current, newtree)) || /* 6 and 7 */
+(oldtree && newtree &&
+ same(oldtree, newtree)) || /* 14 and 15 */
+(oldtree && newtree &&
+ !same(oldtree, newtree) && /* 18 and 19 */
+ same(current, newtree))) {
return keep_entry(current, o);
}
else if (oldtree && !newtree && same(current, oldtree)) {

I suspect threeway_merge may need a similar update, but I haven't looked
too carefully yet.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html