Re: [PATCH v2 3/9] setup_git_directory(): avoid changing global state during discovery
Hi Peff, On Thu, 2 Mar 2017, Jeff King wrote: > On Fri, Mar 03, 2017 at 03:04:11AM +0100, Johannes Schindelin wrote: > > > For historical reasons, Git searches for the .git/ directory (or the > > .git file) by changing the working directory successively to the > > parent directory of the current directory, until either anything was > > found or until a ceiling or a mount point is hit. > > This is starting to get into the meat of changes we've been putting off > writing for ages. I'll read with my fingers crossed. :) Heh. > > Further global state may be changed, depending on the actual type of > > discovery, e.g. the global variable > > `repository_format_precious_objects` is set in the > > `check_repository_format_gently()` function (which is a bit > > surprising, given the function name). > > It's gentle in the sense that if it does not find a valid repo, it > touches no state. I do suspect the functions you want are the ones it > builds on: read_repository_format() and verify_repository_format(), > which I added not too long ago for the exact purpose of checking repo > config without touching anything global. Okay. But I think that my interpretation of the word "gently" is as valid as Git's source code's. > > We do have a use case where we would like to find the .git/ directory > > without having any global state touched, though: when we read the early > > config e.g. for the pager or for alias expansion. > > > > Let's just rename the function `setup_git_directory_gently_1()` to > > `discover_git_directory()` and move all code that changes any global > > state back into `setup_git_directory_gently()`. > > Given the earlier paragraph, it sounds like you want to move the > global-state-changing parts out of check_repository_format_gently(). But > that wouldn't be right; it's triggered separate from the discovery > process by things like enter_repo(). Oh, right. I really only meant to move the global-state-changing parts out of the discover_git_directory(). > However, I don't see that happening in the patch, which is good. I just > wonder if the earlier paragraph should really be complaining about how > setup_git_directory() (and its callees) touches a lot of global state, > not check_repository_format_gently(), whose use is but one of multiple > global-state modifications. Okay, I'll try my best to rephrase the commit message. For good measure, I will also keep the name setup_git_directory_gently_1() because it won't get exported directly (I made up my mind about wrapping that function to allow for an easier interface that does *not* return the "cdup"). > > Note: the new loop is a *little* tricky, as we have to handle the root > > directory specially: we cannot simply strip away the last component > > including the slash, as the root directory only has that slash. To > > remedy that, we introduce the `min_offset` variable that holds the > > minimal length of an absolute path, and using that to special-case the > > root directory, including an early exit before trying to find the > > parent of the root directory. > > I wondered how t1509 fared with this, as it is the only test of > repositories at the root directory (and it is not run by default because > it involves a bunch of flaky and expensive chroot setup). Oh, thanks. I allowed myself to forget about that test script (and did a lot of testing by hand, but of course *during* the development of v2, not when I had finished...). > Unfortunately it seems to fail with your patch: > > expecting success: > test_cmp_val "foo/" "$(git rev-parse --show-prefix)" > > --- expected > +++ result > @@ -1 +1 @@ > -foo/ > +oo/ > not ok 66 - auto gitdir, foo: prefix I can reproduce this failure here. Side note: it took a while until I realized that the prepare-chroot.sh script has to be run *every time* I change *anything* in either Git's source code or in the test script. > Could the problem be this: > > > + int ceil_offset = -1, min_offset = has_dos_drive_prefix(dir->buf) ? 3 : > > 1; > > [...] > > - if (ceil_offset < 0 && has_dos_drive_prefix(cwd.buf)) > > - ceil_offset = 1; > > + if (ceil_offset < 0) > > + ceil_offset = min_offset - 2; Yes. The previous code did not need cwd.buf[0..offset] to be a valid path, but it needed the offset to point to the trailing slash, if any. > Interestingly, I don't think this is the bug, though. We still correctly > find /.git as a repository. The problem seems to happen later, in > setup_discovered_git_dir(), which does this: > > /* Make "offset" point to past the '/', and add a '/' at the end */ > offset++; > strbuf_addch(cwd, '/'); > return cwd->buf + offset; I fixed this by ensuring that we only increment the offset if it is not already pointing at the end of the first offset (which handles Windows paths correctly, too). > Other than this bug, I very much like the direction that this patch is > taking us. Awesome. I was anxious to
Re: [PATCH v2 3/9] setup_git_directory(): avoid changing global state during discovery
On Fri, Mar 03, 2017 at 03:04:11AM +0100, Johannes Schindelin wrote: > For historical reasons, Git searches for the .git/ directory (or the > .git file) by changing the working directory successively to the parent > directory of the current directory, until either anything was found or > until a ceiling or a mount point is hit. This is starting to get into the meat of changes we've been putting off writing for ages. I'll read with my fingers crossed. :) > Further global state may be changed, depending on the actual type of > discovery, e.g. the global variable `repository_format_precious_objects` > is set in the `check_repository_format_gently()` function (which is a > bit surprising, given the function name). It's gentle in the sense that if it does not find a valid repo, it touches no state. I do suspect the functions you want are the ones it builds on: read_repository_format() and verify_repository_format(), which I added not too long ago for the exact purpose of checking repo config without touching anything global. > We do have a use case where we would like to find the .git/ directory > without having any global state touched, though: when we read the early > config e.g. for the pager or for alias expansion. > > Let's just rename the function `setup_git_directory_gently_1()` to > `discover_git_directory()` and move all code that changes any global > state back into `setup_git_directory_gently()`. Given the earlier paragraph, it sounds like you want to move the global-state-changing parts out of check_repository_format_gently(). But that wouldn't be right; it's triggered separate from the discovery process by things like enter_repo(). However, I don't see that happening in the patch, which is good. I just wonder if the earlier paragraph should really be complaining about how setup_git_directory() (and its callees) touches a lot of global state, not check_repository_format_gently(), whose use is but one of multiple global-state modifications. > Note: the new loop is a *little* tricky, as we have to handle the root > directory specially: we cannot simply strip away the last component > including the slash, as the root directory only has that slash. To remedy > that, we introduce the `min_offset` variable that holds the minimal length > of an absolute path, and using that to special-case the root directory, > including an early exit before trying to find the parent of the root > directory. I wondered how t1509 fared with this, as it is the only test of repositories at the root directory (and it is not run by default because it involves a bunch of flaky and expensive chroot setup). Unfortunately it seems to fail with your patch: expecting success: test_cmp_val "foo/" "$(git rev-parse --show-prefix)" --- expected +++ result @@ -1 +1 @@ -foo/ +oo/ not ok 66 - auto gitdir, foo: prefix Could the problem be this: > + int ceil_offset = -1, min_offset = has_dos_drive_prefix(dir->buf) ? 3 : > 1; > [...] > - if (ceil_offset < 0 && has_dos_drive_prefix(cwd.buf)) > - ceil_offset = 1; > + if (ceil_offset < 0) > + ceil_offset = min_offset - 2; It works the same as before in the dos-drive case, but we'd get ceil_offset = -1 otherwise. Which seems weird, but maybe I don't understand how ceil_offset is supposed to work. Interestingly, I don't think this is the bug, though. We still correctly find /.git as a repository. The problem seems to happen later, in setup_discovered_git_dir(), which does this: /* Make "offset" point to past the '/', and add a '/' at the end */ offset++; strbuf_addch(cwd, '/'); return cwd->buf + offset; Here, "offset" is the length of the working tree path. The root directory case differs from normal in that "offset" already accounts for the trailing slash. So I think the bug comes from: > - ret = setup_discovered_git_dir(gitdirenv, > -&cwd, offset, > -nongit_ok); > [...] > + prefix = setup_discovered_git_dir(gitdir.buf, &cwd, dir.len, > + nongit_ok); The original knew that "offset" took into account the off-by-one for the root, but that's lost when we use dir.len. I haven't studied it enough to know the best solution, but I suspect you will. Other than this bug, I very much like the direction that this patch is taking us. -Peff
[PATCH v2 3/9] setup_git_directory(): avoid changing global state during discovery
For historical reasons, Git searches for the .git/ directory (or the .git file) by changing the working directory successively to the parent directory of the current directory, until either anything was found or until a ceiling or a mount point is hit. Further global state may be changed, depending on the actual type of discovery, e.g. the global variable `repository_format_precious_objects` is set in the `check_repository_format_gently()` function (which is a bit surprising, given the function name). We do have a use case where we would like to find the .git/ directory without having any global state touched, though: when we read the early config e.g. for the pager or for alias expansion. Let's just rename the function `setup_git_directory_gently_1()` to `discover_git_directory()` and move all code that changes any global state back into `setup_git_directory_gently()`. In subsequent patches, we will export the `discover_git_directory()` function and make use of it. Note: the new loop is a *little* tricky, as we have to handle the root directory specially: we cannot simply strip away the last component including the slash, as the root directory only has that slash. To remedy that, we introduce the `min_offset` variable that holds the minimal length of an absolute path, and using that to special-case the root directory, including an early exit before trying to find the parent of the root directory. Signed-off-by: Johannes Schindelin --- setup.c | 187 ++-- 1 file changed, 112 insertions(+), 75 deletions(-) diff --git a/setup.c b/setup.c index 89a0cef9231..edac3c27dc1 100644 --- a/setup.c +++ b/setup.c @@ -816,50 +816,49 @@ static int canonicalize_ceiling_entry(struct string_list_item *item, } } +enum discovery_result { + GIT_DIR_NONE = 0, + GIT_DIR_EXPLICIT, + GIT_DIR_DISCOVERED, + GIT_DIR_BARE, + /* these are errors */ + GIT_DIR_HIT_CEILING = -1, + GIT_DIR_HIT_MOUNT_POINT = -2 +}; + /* * We cannot decide in this function whether we are in the work tree or * not, since the config can only be read _after_ this function was called. + * + * Also, we avoid changing any global state (such as the current working + * directory) to allow early callers. + * + * The directory where the search should start needs to be passed in via the + * `dir` parameter; upon return, the `dir` buffer will contain the path of + * the directory where the search ended, and `gitdir` will contain the path of + * the discovered .git/ directory, if any. This path may be relative against + * `dir` (i.e. *not* necessarily the cwd). */ -static const char *setup_git_directory_gently_1(int *nongit_ok) +static enum discovery_result discover_git_directory(struct strbuf *dir, + struct strbuf *gitdir) { const char *env_ceiling_dirs = getenv(CEILING_DIRECTORIES_ENVIRONMENT); struct string_list ceiling_dirs = STRING_LIST_INIT_DUP; - static struct strbuf cwd = STRBUF_INIT; - const char *gitdirenv, *ret; - char *gitfile; - int offset, offset_parent, ceil_offset = -1; + const char *gitdirenv; + int ceil_offset = -1, min_offset = has_dos_drive_prefix(dir->buf) ? 3 : 1; dev_t current_device = 0; int one_filesystem = 1; /* -* We may have read an incomplete configuration before -* setting-up the git directory. If so, clear the cache so -* that the next queries to the configuration reload complete -* configuration (including the per-repo config file that we -* ignored previously). -*/ - git_config_clear(); - - /* -* Let's assume that we are in a git repository. -* If it turns out later that we are somewhere else, the value will be -* updated accordingly. -*/ - if (nongit_ok) - *nongit_ok = 0; - - if (strbuf_getcwd(&cwd)) - die_errno(_("Unable to read current working directory")); - offset = cwd.len; - - /* * If GIT_DIR is set explicitly, we're not going * to do any discovery, but we still do repository * validation. */ gitdirenv = getenv(GIT_DIR_ENVIRONMENT); - if (gitdirenv) - return setup_explicit_git_dir(gitdirenv, &cwd, nongit_ok); + if (gitdirenv) { + strbuf_addstr(gitdir, gitdirenv); + return GIT_DIR_EXPLICIT; + } if (env_ceiling_dirs) { int empty_entry_found = 0; @@ -867,15 +866,15 @@ static const char *setup_git_directory_gently_1(int *nongit_ok) string_list_split(&ceiling_dirs, env_ceiling_dirs, PATH_SEP, -1); filter_string_list(&ceiling_dirs, 0, canonicalize_ceiling_entry, &empty_entry_found); - ceil_offset = longest_ancestor_length(cwd.buf,