Jeff King <p...@peff.net> writes: > The get_ref_cache code was designed to scale to the actual number of > submodules. I do not mind seeing it become a hash if people really do > have a large number of submodules, but that is not what is happening > here. > ... > So git-clean speculatively asks "what is HEAD in this maybe-submodule?". The > right solution is probably one of: > > 1. In remove_dirs, find out if we have an actual submodule before calling > resolve_gitlink_ref. > > 2. Teach get_ref_cache a "read-only" mode that will not auto-vivify the > cache > if it does not already exist. > > Of the two, I think (1) is probably cleaner (I think the way the ref > code is structured, we have to create the submodule ref_cache in order > to start looking things up in it).
Thanks for a great analysis. I too wondered if we should be growing the per-submodule ref-cache when we are only probing. > It looks like we don't even really care about the value of HEAD. We just > want to know "is it a git directory?". I think in other places (like > "git add"), we just do an existence check for "$dir/.git". That would > not catch a bare repository, but I do not think the current check does > either (it is looking for submodules, which always have a .git). If we wanted to be consistent, perhaps we should be reusing the "is this a git repository?" check used by the auto-discovery codepath (setup.c:is_git_directory(), perhaps?), but the idea looks simple enough and sounds sensible. > Maybe something like (largely untested): > > diff --git a/builtin/clean.c b/builtin/clean.c > index 98c103f..e2cc47b 100644 > --- a/builtin/clean.c > +++ b/builtin/clean.c > @@ -148,6 +148,32 @@ static int exclude_cb(const struct option *opt, const > char *arg, int unset) > return 0; > } > > +static int dir_is_repo(struct strbuf *path) > +{ > + size_t orig = path->len; > + int ret; > + > + strbuf_addstr(path, "/.git"); > + if (!access(path->buf, F_OK)) > + ret = 1; /* definitely */ > + else if (errno == ENOENT) > + ret = 0; /* definitely not */ > + else { > + /* > + * We couldn't tell. It would probably be safer to err > + * on the side of saying "yes" here, because we are > + * deciding what to delete, and are more likely to keep > + * a sub-repo. But it would probably also create annoying > + * false positives, where a directory we do not have > + * permission to read would say something misleading > + * like "not deleting sub-repo foo..." > + */ > + ret = 0; > + } > + strbuf_setlen(path, orig); > + return ret; > +} > + > static int remove_dirs(struct strbuf *path, const char *prefix, int > force_flag, > int dry_run, int quiet, int *dir_gone) > { > @@ -155,13 +181,11 @@ static int remove_dirs(struct strbuf *path, const char > *prefix, int force_flag, > struct strbuf quoted = STRBUF_INIT; > struct dirent *e; > int res = 0, ret = 0, gone = 1, original_len = path->len, len; > - unsigned char submodule_head[20]; > struct string_list dels = STRING_LIST_INIT_DUP; > > *dir_gone = 1; > > - if ((force_flag & REMOVE_DIR_KEEP_NESTED_GIT) && > - !resolve_gitlink_ref(path->buf, "HEAD", > submodule_head)) { > + if ((force_flag & REMOVE_DIR_KEEP_NESTED_GIT) && dir_is_repo(path)) { > if (!quiet) { > quote_path_relative(path->buf, prefix, "ed); > printf(dry_run ? _(msg_would_skip_git_dir) : > _(msg_skip_git_dir), -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html