Erik Elfström <[email protected]> writes:
> Before this change, clean used resolve_gitlink_ref to check for the
> presence of nested git repositories. This had the drawback of creating
> a ref_cache entry for every directory that should potentially be
> cleaned. The linear search through the ref_cache list caused a massive
> performance hit for large number of directories.
I'd prefer to see the "current state" described in the present
tense, e.g.

    "git clean" uses resolve_gitlink_ref() to check for the presence of
    nested git repositories, but it has the drawback of creating a
    ref_cache entry for every directory that should potentially be
    cleaned. The linear search through the ref_cache list causes a
    massive performance hit for a large number of directories.
> Teach clean.c:remove_dirs to use setup.c:is_git_directory
> instead. is_git_directory will actually open HEAD and parse the HEAD
> ref but this implies a nested git repository and should be rare when
> cleaning.
I am not sure what you wanted to say in this paragraph. What does
it being rare have to do with it? Even if it is not rare (i.e. the
top-level project you are working with has many submodules checked
out without using the more recent "a file .git pointing into
.git/modules/ via 'gitdir: $overThere'" mechanism), if we find a
nested git repository, we treat it as special and exclude it from
cleaning, which is a good thing, no?
Doesn't this implementation get confused by modern submodule
checkouts and descend into and clean their working tree, though?
Module M with path P would have a directory P in the working tree of
the top-level project, and P/.git is a regular file that will fail
"is_git_directory()" test but records the location of the real
submodule repository i.e. ".git/modules/M" via the "gitdir:"
mechanism.
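To make the concern concrete, here is a rough, standalone sketch of the
extra check I have in mind. It is not a proposed patch: has_gitdir_marker()
and has_gitfile() are hypothetical names made up for illustration, the
fixed-size buffers are an assumption, and a real change would more likely
build on the existing read_gitfile() in setup.c than open the file by hand.

```c
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Hypothetical helper (not part of git): returns 1 if `contents`
 * begins with the "gitdir: " marker used by gitfiles, 0 otherwise.
 */
static int has_gitdir_marker(const char *contents)
{
	static const char marker[] = "gitdir: ";
	return strncmp(contents, marker, sizeof(marker) - 1) == 0;
}

/*
 * Hypothetical helper (not part of git): returns 1 if "<dir>/.git"
 * is a regular file whose first line starts with "gitdir: ".
 * It does not verify that the target of the gitfile exists.
 */
static int has_gitfile(const char *dir)
{
	char path[4096];
	char line[4096];
	struct stat st;
	FILE *fp;
	int ret = 0;
	int n = snprintf(path, sizeof(path), "%s/.git", dir);

	if (n < 0 || (size_t)n >= sizeof(path))
		return 0;	/* path too long for this sketch */
	if (stat(path, &st) || !S_ISREG(st.st_mode))
		return 0;	/* missing, or a directory rather than a file */

	fp = fopen(path, "r");
	if (!fp)
		return 0;
	if (fgets(line, sizeof(line), fp))
		ret = has_gitdir_marker(line);
	fclose(fp);
	return ret;
}
```

With something along these lines, the check in remove_dirs() could treat
P as a nested repository when either is_git_directory() succeeds or P/.git
is such a file, so that a modern submodule checkout is skipped as well.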
> Using is_git_directory should give a more standardized check for what
> is and what isn't a git repository but also gives a slight behavioral
> change. We will now detect and respect bare and empty nested git
> repositories (only init run). Update t7300 to reflect this.
>
> The time to clean an untracked directory containing 100000 sub
> directories went from 61s to 1.7s after this change.
>
> Helped-by: Jeff King <[email protected]>
> Signed-off-by: Erik Elfström <[email protected]>
> ---
> builtin/clean.c | 24 ++++++++++++++++++++----
> t/t7300-clean.sh | 4 ++--
> 2 files changed, 22 insertions(+), 6 deletions(-)
>
> diff --git a/builtin/clean.c b/builtin/clean.c
> index 98c103f..b679913 100644
> --- a/builtin/clean.c
> +++ b/builtin/clean.c
> @@ -10,7 +10,6 @@
> #include "cache.h"
> #include "dir.h"
> #include "parse-options.h"
> -#include "refs.h"
> #include "string-list.h"
> #include "quote.h"
> #include "column.h"
> @@ -148,6 +147,25 @@ static int exclude_cb(const struct option *opt, const char *arg, int unset)
> return 0;
> }
>
> +static int is_git_repository(struct strbuf *path)
> +{
> + int ret = 0;
> + if (is_git_directory(path->buf))
> + ret = 1;
> + else {
> + size_t orig_path_len = path->len;
> + assert(orig_path_len != 0);
> + if (path->buf[orig_path_len - 1] != '/')
> + strbuf_addch(path, '/');
> + strbuf_addstr(path, ".git");
> + if (is_git_directory(path->buf))
> + ret = 1;
> + strbuf_setlen(path, orig_path_len);
> + }
> +
> + return ret;
> +}
> +
> static int remove_dirs(struct strbuf *path, const char *prefix, int force_flag,
> int dry_run, int quiet, int *dir_gone)
> {
> @@ -155,13 +173,11 @@ static int remove_dirs(struct strbuf *path, const char *prefix, int force_flag,
> struct strbuf quoted = STRBUF_INIT;
> struct dirent *e;
> int res = 0, ret = 0, gone = 1, original_len = path->len, len;
> - unsigned char submodule_head[20];
> struct string_list dels = STRING_LIST_INIT_DUP;
>
> *dir_gone = 1;
>
> - if ((force_flag & REMOVE_DIR_KEEP_NESTED_GIT) &&
> - !resolve_gitlink_ref(path->buf, "HEAD", submodule_head)) {
> + if ((force_flag & REMOVE_DIR_KEEP_NESTED_GIT) && is_git_repository(path)) {
> if (!quiet) {
> quote_path_relative(path->buf, prefix, &quoted);
> printf(dry_run ? _(msg_would_skip_git_dir) : _(msg_skip_git_dir),
> diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
> index 58e6b4a..da294fe 100755
> --- a/t/t7300-clean.sh
> +++ b/t/t7300-clean.sh
> @@ -455,7 +455,7 @@ test_expect_success 'nested git work tree' '
> ! test -d bar
> '
>
> -test_expect_failure 'nested git (only init) should be kept' '
> +test_expect_success 'nested git (only init) should be kept' '
> rm -fr foo bar &&
> git init foo &&
> mkdir bar &&
> @@ -465,7 +465,7 @@ test_expect_failure 'nested git (only init) should be kept' '
> test_path_is_missing bar
> '
>
> -test_expect_failure 'nested git (bare) should be kept' '
> +test_expect_success 'nested git (bare) should be kept' '
> rm -fr foo bar &&
> git init --bare foo &&
> mkdir bar &&