Re: [PATCH v2 3/9] setup_git_directory(): avoid changing global state during discovery

2017-03-03 Thread Johannes Schindelin
Hi Peff,

On Thu, 2 Mar 2017, Jeff King wrote:

> On Fri, Mar 03, 2017 at 03:04:11AM +0100, Johannes Schindelin wrote:
> 
> > For historical reasons, Git searches for the .git/ directory (or the
> > .git file) by changing the working directory successively to the
> > parent directory of the current directory, until either anything was
> > found or until a ceiling or a mount point is hit.
> 
> This is starting to get into the meat of changes we've been putting off
> writing for ages. I'll read with my fingers crossed. :)

Heh.

> > Further global state may be changed, depending on the actual type of
> > discovery, e.g. the global variable
> > `repository_format_precious_objects` is set in the
> > `check_repository_format_gently()` function (which is a bit
> > surprising, given the function name).
> 
> It's gentle in the sense that if it does not find a valid repo, it
> touches no state. I do suspect the functions you want are the ones it
> builds on: read_repository_format() and verify_repository_format(),
> which I added not too long ago for the exact purpose of checking repo
> config without touching anything global.

Okay. But I think that my interpretation of the word "gently" is as valid
as Git's source code's.

> > We do have a use case where we would like to find the .git/ directory
> > without having any global state touched, though: when we read the early
> > config e.g. for the pager or for alias expansion.
> > 
> > Let's just rename the function `setup_git_directory_gently_1()` to
> > `discover_git_directory()` and move all code that changes any global
> > state back into `setup_git_directory_gently()`.
> 
> Given the earlier paragraph, it sounds like you want to move the
> global-state-changing parts out of check_repository_format_gently(). But
> that wouldn't be right; it's triggered separate from the discovery
> process by things like enter_repo().

Oh, right. I really only meant to move the global-state-changing parts out
of the discover_git_directory().

> However, I don't see that happening in the patch, which is good. I just
> wonder if the earlier paragraph should really be complaining about how
> setup_git_directory() (and its callees) touches a lot of global state,
> not check_repository_format_gently(), whose use is but one of multiple
> global-state modifications.

Okay, I'll try my best to rephrase the commit message.

For good measure, I will also keep the name setup_git_directory_gently_1()
because it won't get exported directly (I made up my mind about wrapping
that function to allow for an easier interface that does *not* return the
"cdup").

> > Note: the new loop is a *little* tricky, as we have to handle the root
> > directory specially: we cannot simply strip away the last component
> > including the slash, as the root directory only has that slash. To
> > remedy that, we introduce the `min_offset` variable that holds the
> > minimal length of an absolute path, and using that to special-case the
> > root directory, including an early exit before trying to find the
> > parent of the root directory.
> 
> I wondered how t1509 fared with this, as it is the only test of
> repositories at the root directory (and it is not run by default because
> it involves a bunch of flaky and expensive chroot setup).

Oh, thanks. I allowed myself to forget about that test script (and did a
lot of testing by hand, but of course *during* the development of v2, not
when I had finished...).

> Unfortunately it seems to fail with your patch:
> 
>   expecting success: 
>   test_cmp_val "foo/" "$(git rev-parse --show-prefix)"
>   
>   --- expected
>   +++ result
>   @@ -1 +1 @@
>   -foo/
>   +oo/
>   not ok 66 - auto gitdir, foo: prefix

I can reproduce this failure here.

Side note: it took a while until I realized that the prepare-chroot.sh
script has to be run *every time* I change *anything* in either Git's
source code or in the test script.

> Could the problem be this:
> 
> > +   int ceil_offset = -1, min_offset = has_dos_drive_prefix(dir->buf) ? 3 : 
> > 1;
> > [...]
> > -   if (ceil_offset < 0 && has_dos_drive_prefix(cwd.buf))
> > -   ceil_offset = 1;
> > +   if (ceil_offset < 0)
> > +   ceil_offset = min_offset - 2;

Yes. The previous code did not need cwd.buf[0..offset] to be a valid path,
but it needed the offset to point to the trailing slash, if any.

> Interestingly, I don't think this is the bug, though. We still correctly
> find /.git as a repository. The problem seems to happen later, in
> setup_discovered_git_dir(), which does this:
> 
>   /* Make "offset" point to past the '/', and add a '/' at the end */
>   offset++;
>   strbuf_addch(cwd, '/');
>   return cwd->buf + offset;

I fixed this by ensuring that we only increment the offset if it is not
already pointing at the end of the first offset (which handles Windows
paths correctly, too).

> Other than this bug, I very much like the direction that this patch is
> taking us.

Awesome. I was anxious to 

Re: [PATCH v2 3/9] setup_git_directory(): avoid changing global state during discovery

2017-03-02 Thread Jeff King
On Fri, Mar 03, 2017 at 03:04:11AM +0100, Johannes Schindelin wrote:

> For historical reasons, Git searches for the .git/ directory (or the
> .git file) by changing the working directory successively to the parent
> directory of the current directory, until either anything was found or
> until a ceiling or a mount point is hit.

This is starting to get into the meat of changes we've been putting off
writing for ages. I'll read with my fingers crossed. :)

> Further global state may be changed, depending on the actual type of
> discovery, e.g. the global variable `repository_format_precious_objects`
> is set in the `check_repository_format_gently()` function (which is a
> bit surprising, given the function name).

It's gentle in the sense that if it does not find a valid repo, it
touches no state. I do suspect the functions you want are the ones it
builds on: read_repository_format() and verify_repository_format(),
which I added not too long ago for the exact purpose of checking repo
config without touching anything global.

> We do have a use case where we would like to find the .git/ directory
> without having any global state touched, though: when we read the early
> config e.g. for the pager or for alias expansion.
> 
> Let's just rename the function `setup_git_directory_gently_1()` to
> `discover_git_directory()` and move all code that changes any global
> state back into `setup_git_directory_gently()`.

Given the earlier paragraph, it sounds like you want to move the
global-state-changing parts out of check_repository_format_gently(). But
that wouldn't be right; it's triggered separate from the discovery
process by things like enter_repo().

However, I don't see that happening in the patch, which is good. I just
wonder if the earlier paragraph should really be complaining about how
setup_git_directory() (and its callees) touches a lot of global state,
not check_repository_format_gently(), whose use is but one of multiple
global-state modifications.

> Note: the new loop is a *little* tricky, as we have to handle the root
> directory specially: we cannot simply strip away the last component
> including the slash, as the root directory only has that slash. To remedy
> that, we introduce the `min_offset` variable that holds the minimal length
> of an absolute path, and using that to special-case the root directory,
> including an early exit before trying to find the parent of the root
> directory.

I wondered how t1509 fared with this, as it is the only test of
repositories at the root directory (and it is not run by default because
it involves a bunch of flaky and expensive chroot setup). Unfortunately
it seems to fail with your patch:

  expecting success: 
test_cmp_val "foo/" "$(git rev-parse --show-prefix)"

  --- expected
  +++ result
  @@ -1 +1 @@
  -foo/
  +oo/
  not ok 66 - auto gitdir, foo: prefix

Could the problem be this:

> + int ceil_offset = -1, min_offset = has_dos_drive_prefix(dir->buf) ? 3 : 
> 1;
> [...]
> - if (ceil_offset < 0 && has_dos_drive_prefix(cwd.buf))
> - ceil_offset = 1;
> + if (ceil_offset < 0)
> + ceil_offset = min_offset - 2;

It works the same as before in the dos-drive case, but we'd get
ceil_offset = -1 otherwise.  Which seems weird, but maybe I don't
understand how ceil_offset is supposed to work.

Interestingly, I don't think this is the bug, though. We still correctly
find /.git as a repository. The problem seems to happen later, in
setup_discovered_git_dir(), which does this:

  /* Make "offset" point to past the '/', and add a '/' at the end */
  offset++;
  strbuf_addch(cwd, '/');
  return cwd->buf + offset;

Here, "offset" is the length of the working tree path. The root
directory case differs from normal in that "offset" already accounts for
the trailing slash.

So I think the bug comes from:

> - ret = setup_discovered_git_dir(gitdirenv,
> -&cwd, offset,
> -nongit_ok);
> [...]
> + prefix = setup_discovered_git_dir(gitdir.buf, &cwd, dir.len,
> +   nongit_ok);

The original knew that "offset" took into account the off-by-one for the
root, but that's lost when we use dir.len. I haven't studied it enough
to know the best solution, but I suspect you will.

Other than this bug, I very much like the direction that this patch is
taking us.

-Peff


[PATCH v2 3/9] setup_git_directory(): avoid changing global state during discovery

2017-03-02 Thread Johannes Schindelin
For historical reasons, Git searches for the .git/ directory (or the
.git file) by changing the working directory successively to the parent
directory of the current directory, until either anything was found or
until a ceiling or a mount point is hit.

Further global state may be changed, depending on the actual type of
discovery, e.g. the global variable `repository_format_precious_objects`
is set in the `check_repository_format_gently()` function (which is a
bit surprising, given the function name).

We do have a use case where we would like to find the .git/ directory
without having any global state touched, though: when we read the early
config e.g. for the pager or for alias expansion.

Let's just rename the function `setup_git_directory_gently_1()` to
`discover_git_directory()` and move all code that changes any global
state back into `setup_git_directory_gently()`.

In subsequent patches, we will export the `discover_git_directory()`
function and make use of it.

Note: the new loop is a *little* tricky, as we have to handle the root
directory specially: we cannot simply strip away the last component
including the slash, as the root directory only has that slash. To remedy
that, we introduce the `min_offset` variable that holds the minimal length
of an absolute path, and using that to special-case the root directory,
including an early exit before trying to find the parent of the root
directory.

Signed-off-by: Johannes Schindelin 
---
 setup.c | 187 ++--
 1 file changed, 112 insertions(+), 75 deletions(-)

diff --git a/setup.c b/setup.c
index 89a0cef9231..edac3c27dc1 100644
--- a/setup.c
+++ b/setup.c
@@ -816,50 +816,49 @@ static int canonicalize_ceiling_entry(struct 
string_list_item *item,
}
 }
 
+enum discovery_result {
+   GIT_DIR_NONE = 0,
+   GIT_DIR_EXPLICIT,
+   GIT_DIR_DISCOVERED,
+   GIT_DIR_BARE,
+   /* these are errors */
+   GIT_DIR_HIT_CEILING = -1,
+   GIT_DIR_HIT_MOUNT_POINT = -2
+};
+
 /*
  * We cannot decide in this function whether we are in the work tree or
  * not, since the config can only be read _after_ this function was called.
+ *
+ * Also, we avoid changing any global state (such as the current working
+ * directory) to allow early callers.
+ *
+ * The directory where the search should start needs to be passed in via the
+ * `dir` parameter; upon return, the `dir` buffer will contain the path of
+ * the directory where the search ended, and `gitdir` will contain the path of
+ * the discovered .git/ directory, if any. This path may be relative against
+ * `dir` (i.e. *not* necessarily the cwd).
  */
-static const char *setup_git_directory_gently_1(int *nongit_ok)
+static enum discovery_result discover_git_directory(struct strbuf *dir,
+   struct strbuf *gitdir)
 {
const char *env_ceiling_dirs = getenv(CEILING_DIRECTORIES_ENVIRONMENT);
struct string_list ceiling_dirs = STRING_LIST_INIT_DUP;
-   static struct strbuf cwd = STRBUF_INIT;
-   const char *gitdirenv, *ret;
-   char *gitfile;
-   int offset, offset_parent, ceil_offset = -1;
+   const char *gitdirenv;
+   int ceil_offset = -1, min_offset = has_dos_drive_prefix(dir->buf) ? 3 : 
1;
dev_t current_device = 0;
int one_filesystem = 1;
 
/*
-* We may have read an incomplete configuration before
-* setting-up the git directory. If so, clear the cache so
-* that the next queries to the configuration reload complete
-* configuration (including the per-repo config file that we
-* ignored previously).
-*/
-   git_config_clear();
-
-   /*
-* Let's assume that we are in a git repository.
-* If it turns out later that we are somewhere else, the value will be
-* updated accordingly.
-*/
-   if (nongit_ok)
-   *nongit_ok = 0;
-
-   if (strbuf_getcwd(&cwd))
-   die_errno(_("Unable to read current working directory"));
-   offset = cwd.len;
-
-   /*
 * If GIT_DIR is set explicitly, we're not going
 * to do any discovery, but we still do repository
 * validation.
 */
gitdirenv = getenv(GIT_DIR_ENVIRONMENT);
-   if (gitdirenv)
-   return setup_explicit_git_dir(gitdirenv, &cwd, nongit_ok);
+   if (gitdirenv) {
+   strbuf_addstr(gitdir, gitdirenv);
+   return GIT_DIR_EXPLICIT;
+   }
 
if (env_ceiling_dirs) {
int empty_entry_found = 0;
@@ -867,15 +866,15 @@ static const char *setup_git_directory_gently_1(int 
*nongit_ok)
string_list_split(&ceiling_dirs, env_ceiling_dirs, PATH_SEP, 
-1);
filter_string_list(&ceiling_dirs, 0,
   canonicalize_ceiling_entry, 
&empty_entry_found);
-   ceil_offset = longest_ancestor_length(cwd.buf,