On Fri, Jun 1, 2018 at 10:03 PM, Duy Nguyen <pclo...@gmail.com> wrote:
> On Fri, Jun 1, 2018 at 8:34 PM, Elijah Newren <new...@gmail.com> wrote:
>> On Fri, Jun 1, 2018 at 9:11 AM, Nguyễn Thái Ngọc Duy <pclo...@gmail.com> 
>> wrote:
>>> This is more of a bug report than an actual fix because I'm not sure
>>> if "o->src_index" is always the correct answer instead of "the_index"
>>> here. But this is very similar to 7db118303a (unpack_trees: fix
>>> breakage when o->src_index != o->dst_index - 2018-04-23) and could
>>> potentially break things again...

I'm pretty sure your patch is correct.  Adding Brandon Williams to the
cc for comment since his patches came up in the analysis below...

>> Actually, I don't think the patch will break anything in the current
>> code.  Currently, all callers of unpack_trees() (even merge recursive
>> which uses different src_index and dst_index now) set src_index =
>> &the_index.  So, these changes should continue to work as before (with
>> a minor possible exception of merge-recursive calling into other
>> functions from unpack-trees.c after unpack_trees() has finished..).
>> That's not to say that your patch is bug free, just that I think any
>> bugs shouldn't be triggerable from the current codebase.
>
> Ah.. I thought merge-recursive would do fancier things and used some
> temporary index. Good to know.

Well, it does does use a temporary index, but for dst_index rather
than src_index.  It then does some fancier things, but not until the
call to unpack_trees() is over.  In particular, at that point, it
swaps the_index and tmp_index, reversing their roles so that now
tmp_index is the original index and the_index becomes the result after
unpack_trees() is run.  That's done because I later want to use the
original index for calling verify_uptodate().  verify_uptodate() is
then used for was_tracked_and_matches() and was_tracked().

Anyway, the whole upshot of this is:
  * merge-recursive uses src_index == &the_index for the unpack_trees() call.
  * merge-recursive uses src_index == o->orig_index for subsequent
calls to verify_uptodate(), but verify_uptodate() doesn't call into
any of the sites you have modified.

Further:
  * Every other existing caller of unpack-trees in the code sets
src_index == &the_index, so this won't break any of them.
  * There are only two callers in the codebase that set dst_index to
something other than &the_index -- diff-lib.c (which sets it to NULL)
and merge-recursive (which does the stuff described above).

So, having done that analysis, I am now pretty convinced your patch
won't break anything.  That's one half...

>> Also, if any of the changes you made are wrong, what was there before
>> was also clearly wrong.  So I think we're at least no worse off.
>>
>> But, I agree, it's not easy to verify what the correct thing should be
>> in all cases.  I'll try to take a closer look in the next couple days.
>
> Thanks. I will also stare at this code some more in the next couple
> days trying to remember what these functions do.

Your patch has two divisible parts:

1) Your modifications to
  * clear_ce_flags_1()
  * clear_ce_flags_dir()
  * clear_ce_flags()
  * mark_new_skip_worktree()
The clear_ce_flags*() functions are only called by each other and by
mark_new_skip_worktree(), which in turn is only called from
unpack_trees().  Also, in all of these, your change ends up only
modifying what index_state is passed to is_excluded_from_list().

2) Your modifications to
  * verify_clean_subdirectory()
  * check_ok_to_remove()
In this case, the former is only called by the latter, and the latter
ends up only being called (via verify_absent_1()) from verify_absent()
and verify_absent_sparse().

I'll address each, in reverse order.

2) The stuff that affects verify_absent()

While verify_absent() is not called from merge-recursive right now, it
was something I wanted to use in the future for very similar reasons
that verify_uptodate() started being called directly from
merge-recursive.  In particular, if the rewrite of merge-recursive[A]
I want to do sets index_only when calling unpack_trees(), then does
the whole merge without touching the worktree, then at the end goes to
update the working tree, it will need to do extra checks.
verify_absent() will come in handy as one of those extra checks.  For
that case, using the_index (the new index just created with lots of
changes) would be wrong in all the same ways that using the_index
caused massive problems for was_tracked() in merge-recursive (e.g. the
blow up of when Junio merged the original directory rename detection
series into master and subsequently reverted it); we'd instead want
src_index (which was the index that existed when merge was called)
instead.  So, with this patch you've fixed some important bugs that I
would have hit later.

[A] sidenote: see
https://public-inbox.org/git/xmqqk1ydkbx0....@gitster.mtv.corp.google.com/
for more details

1) mark_new_skip_worktree() ... is_excluded_from_list().

Sadly, I'm not very familiar with the skip_worktree and sparse
checkout stuff.  However, the fact that mark_new_skip_worktree()
explicitly takes an index_state (and a different one is passed to it
the two different times it is called), and that it is the only caller
of the clear_ce_flags*() family of functions, and that those function
use the cache entries from the index passed to them in all cases other
than the calls to is_excluded_from_list() makes those two look like
oversights.  In fact, a little more digging turns up commit
   fba92be8f7 ("dir: convert is_excluded_from_list to take an index",
2017-05-05)
and before then, those functions didn't use the_index directly.  But
they did use it indirectly, because they called a function in dir.c
that had it hardcoded.  So it looks like Brandon fixed part of the bug
for us, but moved the incorrect hardcoding from dir.c to
unpack-trees.c.  Your patch is just fixing it up.  In fact, a little
more digging turns up:

2c1eb10454 ("dir: convert read_directory to take an index", 2017-05-05)
a0bba65b10 ("dir: convert is_excluded to take an index", 2017-05-05)

which appear to be the culprits behind the other two uses of the_index
called from verify_absent().  It looks like before these commits that
unpack_trees() was carefully using the appropriate index, except that
functions in dir.c had use of the_index hardcoded.  Brandon fix the
functions in dir.c for us, but ended up still hardcoding the_index in
unpack-trees.c.  Basically, he did most of the necessary lifting, and
your patch just finally changes them over to use the correct index.

Brandon: Does anything look off in my analysis above?

Reply via email to