Erik Elfström <erik.elfst...@gmail.com> writes:

> Before this change, clean used resolve_gitlink_ref to check for the
> presence of nested git repositories. This had the drawback of creating
> a ref_cache entry for every directory that should potentially be
> cleaned. The linear search through the ref_cache list caused a massive
> performance hit for large number of directories.
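A rough count shows why the slowdown is so dramatic. Assume, as the description implies, that every directory examined adds one new ref_cache entry and that each lookup walks the whole list before missing. The k-th directory then costs k comparisons, so cleaning n directories costs quadratically many. This is an order-of-magnitude estimate under that assumption, not a measurement:

```latex
\sum_{k=0}^{n-1} k = \frac{n(n-1)}{2},
\qquad n = 100000 \;\Rightarrow\; \frac{100000 \cdot 99999}{2} \approx 5 \times 10^{9} \text{ comparisons}
```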

I'd prefer to see the "current state" described in the current tense,
e.g.

	"git clean" uses resolve_gitlink_ref() to check for the presence of
	nested git repositories, but it has the drawback of creating a
	ref_cache entry for every directory that should potentially be
	cleaned. The linear search through the ref_cache list causes a
	massive performance hit for a large number of directories.

> Teach clean.c:remove_dirs to use setup.c:is_git_directory
> instead. is_git_directory will actually open HEAD and parse the HEAD
> ref but this implies a nested git repository and should be rare when
> cleaning.

I am not sure what you wanted to say in this paragraph. What does
its being rare have to do with it? Even if it is not rare (i.e. the
top-level project you are working with has many submodules checked
out without using the more recent "a file .git pointing into
.git/modules/ via 'gitdir: $overThere'" mechanism), if we found a
nested git repository, we treat it as special and exclude it from
being cleaned out, which is a good thing, no?

Doesn't this implementation get confused by modern submodule
checkouts and descend into and clean their working tree, though?
Module M with path P would have a directory P in the working tree of
the top-level project, and P/.git is a regular file that will fail
the "is_git_directory()" test but records the location of the real
submodule repository, i.e. ".git/modules/M", via the "gitdir:"
mechanism.

> Using is_git_directory should give a more standardized check for what
> is and what isn't a git repository but also gives a slight behavioral
> change. We will now detect and respect bare and empty nested git
> repositories (only init run). Update t7300 to reflect this.
>
> The time to clean an untracked directory containing 100000 sub
> directories went from 61s to 1.7s after this change.
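To make the concern about modern submodule checkouts concrete, here is a minimal standalone sketch of what such a check would additionally have to recognize: a P/.git that is a regular file whose first line is "gitdir: <path>". The helper name read_gitdir_pointer() is hypothetical, and the parsing is deliberately simplified (first line only, no resolution of the recorded path); git has its own gitfile reader, read_gitfile() in setup.c, which this does not reproduce.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Hypothetical helper, NOT git's read_gitfile(): if "<dir>/.git" is a
 * regular file whose first line is "gitdir: <path>", copy <path> into
 * 'out' and return 1. Return 0 otherwise (no .git, a .git directory,
 * unexpected contents, or 'out' too small).
 */
static int read_gitdir_pointer(const char *dir, char *out, size_t outsz)
{
	static const char prefix[] = "gitdir: ";
	const size_t plen = sizeof(prefix) - 1;
	char gitfile[4096];
	char line[4096];
	struct stat st;
	size_t len;
	FILE *fp;
	int n;

	n = snprintf(gitfile, sizeof(gitfile), "%s/.git", dir);
	if (n < 0 || (size_t)n >= sizeof(gitfile))
		return 0; /* path too long for this sketch */

	/* A nested repository has a .git directory; a submodule has a .git file. */
	if (stat(gitfile, &st) || !S_ISREG(st.st_mode))
		return 0;

	fp = fopen(gitfile, "r");
	if (!fp)
		return 0;
	if (!fgets(line, sizeof(line), fp)) {
		fclose(fp);
		return 0;
	}
	fclose(fp);

	/* Drop the trailing newline (and a CR, if present). */
	len = strcspn(line, "\r\n");
	line[len] = '\0';

	if (strncmp(line, prefix, plen) != 0)
		return 0;
	if (len - plen + 1 > outsz)
		return 0; /* output buffer too small */
	memcpy(out, line + plen, len - plen + 1); /* includes the NUL */
	return 1;
}
```

A directory that passes this test is a submodule working tree even though "is_git_directory()" rejects both P and P/.git, which is exactly the case the patch would descend into.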
>
> Helped-by: Jeff King <p...@peff.net>
> Signed-off-by: Erik Elfström <erik.elfst...@gmail.com>
> ---
>  builtin/clean.c  | 24 ++++++++++++++++++++----
>  t/t7300-clean.sh |  4 ++--
>  2 files changed, 22 insertions(+), 6 deletions(-)
>
> diff --git a/builtin/clean.c b/builtin/clean.c
> index 98c103f..b679913 100644
> --- a/builtin/clean.c
> +++ b/builtin/clean.c
> @@ -10,7 +10,6 @@
>  #include "cache.h"
>  #include "dir.h"
>  #include "parse-options.h"
> -#include "refs.h"
>  #include "string-list.h"
>  #include "quote.h"
>  #include "column.h"
> @@ -148,6 +147,25 @@ static int exclude_cb(const struct option *opt, const char *arg, int unset)
>  	return 0;
>  }
>  
> +static int is_git_repository(struct strbuf *path)
> +{
> +	int ret = 0;
> +	if (is_git_directory(path->buf))
> +		ret = 1;
> +	else {
> +		size_t orig_path_len = path->len;
> +		assert(orig_path_len != 0);
> +		if (path->buf[orig_path_len - 1] != '/')
> +			strbuf_addch(path, '/');
> +		strbuf_addstr(path, ".git");
> +		if (is_git_directory(path->buf))
> +			ret = 1;
> +		strbuf_setlen(path, orig_path_len);
> +	}
> +
> +	return ret;
> +}
> +
>  static int remove_dirs(struct strbuf *path, const char *prefix, int force_flag,
>  		int dry_run, int quiet, int *dir_gone)
>  {
> @@ -155,13 +173,11 @@ static int remove_dirs(struct strbuf *path, const char *prefix, int force_flag,
>  	struct strbuf quoted = STRBUF_INIT;
>  	struct dirent *e;
>  	int res = 0, ret = 0, gone = 1, original_len = path->len, len;
> -	unsigned char submodule_head[20];
>  	struct string_list dels = STRING_LIST_INIT_DUP;
>  
>  	*dir_gone = 1;
>  
> -	if ((force_flag & REMOVE_DIR_KEEP_NESTED_GIT) &&
> -			!resolve_gitlink_ref(path->buf, "HEAD", submodule_head)) {
> +	if ((force_flag & REMOVE_DIR_KEEP_NESTED_GIT) && is_git_repository(path)) {
>  		if (!quiet) {
>  			quote_path_relative(path->buf, prefix, &quoted);
>  			printf(dry_run ?  _(msg_would_skip_git_dir) : _(msg_skip_git_dir),
> diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
> index 58e6b4a..da294fe 100755
> --- a/t/t7300-clean.sh
> +++ b/t/t7300-clean.sh
> @@ -455,7 +455,7 @@ test_expect_success 'nested git work tree' '
>  	! test -d bar
>  '
>  
> -test_expect_failure 'nested git (only init) should be kept' '
> +test_expect_success 'nested git (only init) should be kept' '
>  	rm -fr foo bar &&
>  	git init foo &&
>  	mkdir bar &&
> @@ -465,7 +465,7 @@ test_expect_failure 'nested git (only init) should be kept' '
>  	test_path_is_missing bar
>  '
>  
> -test_expect_failure 'nested git (bare) should be kept' '
> +test_expect_success 'nested git (bare) should be kept' '
>  	rm -fr foo bar &&
>  	git init --bare foo &&
>  	mkdir bar &&
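For readers without git's internals at hand, the lookup order in the patch's is_git_repository() can be approximated standalone: try the directory itself (a bare repository), then <dir>/.git (a non-bare one). Both function names below are hypothetical, and looks_like_git_dir() is a simplified stand-in for is_git_directory() that only checks that objects/ and refs/ are directories and HEAD is a regular file; the real function also validates what HEAD contains.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Simplified, hypothetical stand-in for setup.c:is_git_directory():
 * objects/ and refs/ must be directories and HEAD a regular file.
 * HEAD's contents are NOT validated here.
 */
static int looks_like_git_dir(const char *dir)
{
	static const char *const subdirs[] = { "objects", "refs" };
	char buf[4096];
	struct stat st;
	size_t i;
	int n;

	for (i = 0; i < sizeof(subdirs) / sizeof(subdirs[0]); i++) {
		n = snprintf(buf, sizeof(buf), "%s/%s", dir, subdirs[i]);
		if (n < 0 || (size_t)n >= sizeof(buf))
			return 0;
		if (stat(buf, &st) || !S_ISDIR(st.st_mode))
			return 0;
	}
	n = snprintf(buf, sizeof(buf), "%s/HEAD", dir);
	if (n < 0 || (size_t)n >= sizeof(buf))
		return 0;
	return !stat(buf, &st) && S_ISREG(st.st_mode);
}

/*
 * Same shape as the patch's is_git_repository(): accept 'path' itself,
 * else 'path/.git', adding the '/' only when 'path' lacks one.
 */
static int is_git_repository_sketch(const char *path)
{
	char buf[4096];
	size_t len = strlen(path);
	int n;

	if (looks_like_git_dir(path))
		return 1;
	if (len == 0)
		return 0;
	n = snprintf(buf, sizeof(buf), "%s%s.git", path,
		     path[len - 1] == '/' ? "" : "/");
	if (n < 0 || (size_t)n >= sizeof(buf))
		return 0;
	return looks_like_git_dir(buf);
}
```

With this order, both a freshly "git init"-ed and a "git init --bare" nested repository are recognized, matching the two t7300 tests flipped to test_expect_success. A submodule whose P/.git is a "gitdir:" file is not recognized, which is the submodule case raised above.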