On Tue, Oct 01, 2019 at 11:55:24AM -0700, Elijah Newren wrote:
> Commits 404ebceda01c ("dir: also check directories for matching
> pathspecs", 2019-09-17) and 89a1f4aaf765 ("dir: if our pathspec might
> match files under a dir, recurse into it", 2019-09-17) added calls to
> match_pathspec() and do_match_pathspec() passing along their pathspec
> parameter. Both match_pathspec() and do_match_pathspec() assume the
> pathspec argument they are given is non-NULL. It turns out that
> unpack-tree.c's verify_clean_subdirectory() calls read_directory() with
> pathspec == NULL, and it is possible on case insensitive filesystems for
> that NULL to make it to these new calls to match_pathspec() and
> do_match_pathspec(). Add appropriate checks on the NULLness of pathspec
> to avoid a segfault.
>
> In case the negation throws anyone off (one of the calls was to
> do_match_pathspec() while the other was to !match_pathspec(), yet no
> negation of the NULLness of pathspec is used), there are two ways to
> understand the differences:
> * The code already handled the pathspec == NULL cases before this
> series, and this series only tried to change behavior when there was
> a pathspec, thus we only want to go into the if-block if pathspec is
> non-NULL.
> * One of the calls is for whether to recurse into a subdirectory, the
> other is for after we've recursed into it for whether we want to
> remove the subdirectory itself (i.e. the subdirectory didn't match
> but something under it could have). That difference in situation
> leads to the slight differences in logic used (well, that and the
> slightly unusual fact that we don't want empty pathspecs to remove
> untracked directories by default).
>
> Denton found and analyzed one issue and provided the patch for the
> match_pathspec() call, SZEDER figured out why the issue only reproduced
> for some folks and not others and provided the testcase, and I looked
> through the remainder of the series and noted the do_match_pathspec()
> call that should have the same check.
>
> Co-authored-by: Denton Liu <[email protected]>
> Co-authored-by: SZEDER Gábor <[email protected]>
> Signed-off-by: Elijah Newren <[email protected]>
> ---
> Note: Applies on top of en/clean-nested-with-ignored, in next.
>
> As with v1, the authorship is really mixed, so I don't know if I
> should use Co-authored-by (highlighted as a possibility by Denton), or
> the far more common Helped-by (as suggested by Junio but based on a
> more limited summary of the different contributions), or if perhaps
> Denton or SZEDER should be marked as the author and I be marked as
> Helped-by or Co-authored-by. Since Denton commented on round 1, I
> used his suggestion for attribution in this round, but I'm open to
> changing it to whatever works best.
>
> Changes since v2:
> - This time actually removed the entire unnecessary comment
>
> Range-diff:
> 1: c495b9303c ! 1: 40392c6bba dir: special case check for the possibility
> that pathspec is NULL
> @@ t/t0050-filesystem.sh: $test_unicode 'merge (silent unicode
> normalization)' '
> + git reset --hard &&
> + mkdir -p gitweb/subdir &&
> + >gitweb/subdir/file &&
> -+ # it is not strictly necessary to add and commit the
> + git add gitweb &&
> + git commit -m "add gitweb/subdir/file" &&
> +
>
> dir.c | 8 +++++---
> t/t0050-filesystem.sh | 21 +++++++++++++++++++++
> 2 files changed, 26 insertions(+), 3 deletions(-)
>
> diff --git a/dir.c b/dir.c
> index 7ff79170fc..bd39b86be4 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -1962,8 +1962,9 @@ static enum path_treatment
> read_directory_recursive(struct dir_struct *dir,
> ((state == path_untracked) &&
> (get_dtype(cdir.de, istate, path.buf, path.len) ==
> DT_DIR) &&
> ((dir->flags & DIR_SHOW_IGNORED_TOO) ||
> - do_match_pathspec(istate, pathspec, path.buf,
> path.len,
> - baselen, NULL,
> DO_MATCH_LEADING_PATHSPEC) == MATCHED_RECURSIVELY_LEADING_PATHSPEC))) {
> + (pathspec &&
> + do_match_pathspec(istate, pathspec, path.buf,
> path.len,
> + baselen, NULL,
> DO_MATCH_LEADING_PATHSPEC) == MATCHED_RECURSIVELY_LEADING_PATHSPEC)))) {
> struct untracked_cache_dir *ud;
> ud = lookup_untracked(dir->untracked, untracked,
> path.buf + baselen,
> @@ -1975,7 +1976,8 @@ static enum path_treatment
> read_directory_recursive(struct dir_struct *dir,
> if (subdir_state > dir_state)
> dir_state = subdir_state;
>
> - if (!match_pathspec(istate, pathspec, path.buf,
> path.len,
> + if (pathspec &&
> + !match_pathspec(istate, pathspec, path.buf,
> path.len,
> 0 /* prefix */, NULL,
> 0 /* do NOT special case dirs */))
> state = path_none;
> diff --git a/t/t0050-filesystem.sh b/t/t0050-filesystem.sh
> index 192c94eccd..a840919967 100755
> --- a/t/t0050-filesystem.sh
> +++ b/t/t0050-filesystem.sh
> @@ -131,4 +131,25 @@ $test_unicode 'merge (silent unicode normalization)' '
> git merge topic
> '
>
> +test_expect_success CASE_INSENSITIVE_FS 'checkout with no pathspec and a
> case insensitive fs' '
> + git init repo &&
> + (
> + cd repo &&
> +
> + >Gitweb &&
> + git add Gitweb &&
> + git commit -m "add Gitweb" &&
> +
> + git checkout --orphan todo &&
> + git reset --hard &&
> + mkdir -p gitweb/subdir &&
> + >gitweb/subdir/file &&
> + git add gitweb &&
> + git commit -m "add gitweb/subdir/file" &&
> +
> + git checkout master
> + )
> +'
I don't like this test ;)
I only intended it as a "here is how to reliably reproduce the
segfault without all the clutter of the full git.git repository" that
I wrote way past my bedtime. But I think that:
- it shouldn't have the CASE_INSENSITIVE_FS prereq. Yes, that
segfault could only be triggered on a case insensitive filesystem,
but the given sequence of commands should succeed in a case
sensitive file system just as well.
(Have no idea why I added that prereq in the first place; as I
said above, it was way past my bedtime...)
- it's in the wrong test script; it would be better among other
tests checking what 'git checkout' should or must not overwrite
when switching branches, but not sure which test script that is.
(I think I added it to this test script, because it stood out a
bit when grepping for case insensitive fs in the test suite; I
play the "past my bedtime" card again :)
- it's already satisfied by 'git checkout master' not failing, but
it doesn't check whether the resulting contents of the worktree
are as expected.
- it still bothers me why that additional subdir was necessary to
trigger the segfault. Did you look into it?