Re: [PATCH/RFC 0/7] Pie-in-the-sky attempt to fix the early config

2016-12-09 Thread Jeff King
On Fri, Dec 09, 2016 at 06:28:10PM +0100, Johannes Schindelin wrote:

> > Great. Thanks for taking a stab at this.
> 
> Well, I figured that I can go through you to get this integrated into
> git.git.

I am not sure what you mean here, but it _sounds_ like you are
continuing to be negative about the chances of fixes going into git.git
here. I really don't think that negativity is merited, but even if it
is, making snide comments does not help and mostly just makes the
conversation less pleasant.

> > I don't think it would be bad to use a global for "we do not want a
> > repo".
> 
> I would think it would be bad, as the entire reason for this patch series
> is that we have global state that gets messed up too early (I am speaking
> from the point of view of somebody who patched Git locally so that it does
> read config variables *before* launching builtins).

I think those are two different things. One is global state that is
munged as a side effect of setup_git_directory(). The other is global
state that the process sets to say "this is an invariant" so it does not
have to deal with passing it through a huge call chain.

> > After all, it's just modifying the _existing_ global for "are we in a
> > repo".
> 
> No it does not.

Perhaps I wasn't clear on my "it" here. I mean that we will continue to
have startup_info->have_repository as a global (and if not that, then
certainly the environment variable GIT_DIR and the cwd are process
globals). I'm proposing a global flag to modify how to interpret those
globals. I don't think that really makes anything worse.

> The read_early_config() function specifically does not leave any traces in
> the global namespace. It calls the git_config_with_options() function
> without touching the_config_set.

I'm not talking about read_early_config() modifying the global state.
I'm talking about main() modifying it so that lower-level functions like
read_early_config() can make use of it.

> I would look more favorably on this idea if we were to teach
> the_config_set to record a little bit more about the state from which it
> was constructed, and to auto-flush-and-re-read when it detects that, say,
> git_dir was changed in the meantime.

I considered that when fixing some bugs in git-init a few months ago,
but I think the cure becomes worse than the problem. Automatic cache
invalidation can have some tricky corner cases, and there really
_aren't_ that many places where we need to do it. Just dropping
git_config_clear() in those places is ugly, but practical.

> > And I do not see that global going away anytime soon.
> 
> And that is really, really sad.

I think we differ on that.

> > As an alternative, I also think it would be OK to just always have the
> > pager config read from local repo, even for init/clone.
> 
> For the purpose of this current discussion, I am utterly uninterested in
> the pager config. What I want to use the early config for is substantially
> different and more relevant: I need to configure some command to run
> before, and after, every single Git command here. And this configuration
> needs to be per-repository. And no, I do not want to hardcode anything.

I do not see how that changes things. If there is the concept of a
"before the command is run" phase during which we would read from the
current-directory config even for git-init, it seems to me that applies
logically to pagers, aliases, _and_ whatever pre-command magic you are
interested in adding.

> > In other words, to loosen the idea that git-init can _never_ look in the
> > current git-dir, and declare that there is a stage before the command is
> > initiated, and during which git may read local-repo config. Aliases
> > would fall into this, too, so:
> > 
> >   git config --local alias.foo init
> >   git foo /some/other/dir
> > 
> > would work (as it must, because we cannot know that "foo" is "init"
> > until we read the config!).
> 
> True.
> 
> But is this a good excuse to just shrug our shoulders and let git-init
> (which we do know very well) fall into the same trap?

I'm not sure I agree it is a trap. There is a consistent and reasonable
mental model where it is the natural thing. I understand that's not the
one you prefer, but it seems like a practical one to me.

> > We already have a config-caching system. If we went with a global
> > "config_discover_refs",
> 
> Why "_refs"?

Er, sorry, slip of the tongue. I think I meant config_discover_git, and
for some reason managed to repeat it over and over. Hopefully you
figured out what I meant.

> > then I think the sequence for something like git-init would become:
> > 
> >   1. When git.c starts, config_discover_refs is set to "true". Pager and
> >  alias lookup may look in .git/config if it's available, even if
> >  they go through the configset cache.
> > 
> >   2. As soon as git-init starts, it disables config_discover_refs, and
> >  it flushes the config cache. Any configset lookups will now examine
> >  the reduced 

Re: [PATCH/RFC 0/7] Pie-in-the-sky attempt to fix the early config

2016-12-09 Thread Johannes Schindelin
Hi Peff,

On Thu, 8 Dec 2016, Jeff King wrote:

> On Thu, Dec 08, 2016 at 04:35:56PM +0100, Johannes Schindelin wrote:
> 
> > The idea here is to discover the .git/ directory gently (i.e. without
> > changing the current working directory), and to use it to read the
> > .git/config file early, before we actually called setup_git_directory()
> > (if we ever do that).
> 
> Great. Thanks for taking a stab at this.

Well, I figured that I can go through you to get this integrated into
git.git.

> > Notes:
> > 
> > - I find the diff pretty ugly: I wish there was a more elegant way to
> >   *disable* discovery of .git/ *just* for `init` and `clone`. I
> >   considered a function `about_to_create_git_dir()` that is called in a
> >   hard-coded manner *only* for `init` and `clone`, but that would
> >   introduce another magic side effect, when all I want is to reduce those.
> 
> It looks like a lot of that ugliness comes from passing around the "are
> we OK to peek at git-dir config" flag through the various pager-related
> calls.

Correct.

> I don't think it would be bad to use a global for "we do not want a
> repo".

I would think it would be bad, as the entire reason for this patch series
is that we have global state that gets messed up too early (I am speaking
from the point of view of somebody who patched Git locally so that it does
read config variables *before* launching builtins).

> After all, it's just modifying the _existing_ global for "are we in a
> repo".

No it does not.

The read_early_config() function specifically does not leave any traces in
the global namespace. It calls the git_config_with_options() function
without touching the_config_set.

Of course we could introduce a new config set, just for early use. It
would share all the downsides with the current config set.

I would look more favorably on this idea if we were to teach
the_config_set to record a little bit more about the state from which it
was constructed, and to auto-flush-and-re-read when it detects that, say,
git_dir was changed in the meantime.

> And I do not see that global going away anytime soon.

And that is really, really sad.

> Sometimes it's good to make incremental steps towards an end goal, but
> in this case it seems like we just pay the cost of the step without any
> real plan for reaping the benefit in the long term.

Fine. Let's roll with this for now, then.

I just hope that we do not repeat the "slam the door shut to a better
solution" mistake that led to this situation, as the chdir() way did.

> As an alternative, I also think it would be OK to just always have the
> pager config read from local repo, even for init/clone.

For the purpose of this current discussion, I am utterly uninterested in
the pager config. What I want to use the early config for is substantially
different and more relevant: I need to configure some command to run
before, and after, every single Git command here. And this configuration
needs to be per-repository. And no, I do not want to hardcode anything.

So now the cat is out of the bag, I have ulterior motives.

These motives serve a greater good, though: designing read_early_config()
around a very specific use case will always fall short of a well-designed
read_early_config() that could then be used for any use case that
requires, well, reading the config early.

> In other words, to loosen the idea that git-init can _never_ look in the
> current git-dir, and declare that there is a stage before the command is
> initiated, and during which git may read local-repo config. Aliases
> would fall into this, too, so:
> 
>   git config --local alias.foo init
>   git foo /some/other/dir
> 
> would work (as it must, because we cannot know that "foo" is "init"
> until we read the config!).

True.

But is this a good excuse to just shrug our shoulders and let git-init
(which we do know very well) fall into the same trap?

> > - For the moment, I do not handle dashed invocations of `init` and
> >   `clone` correctly. The real problem is the continued existence of
> >   the dashed git-init and git-clone, of course.
> 
> I assume you mean setting the CREATES_GIT_DIR flag here? I think it
> would apply to the dashed invocations, too. They strip off the "git-"
> and end up in handle_builtin() just like "git clone" does. I didn't test
> it, though.

You are correct. I misread the code to expect it to spawn another instance
of git.exe.

> > - There is still duplicated code. For the sake of this RFC, I did not
> >   address that yet.
> 
> Yeah, I did not read your discover function very carefully. Because I
> think the one thing we really don't want to do here is introduce a
> separate lookup process that is not shared by setup_git_directory(). The
> only sane path, I think, is to refactor setup_git_directory() to build
> on top of discover_git_directory(), which implies that the latter
> handles all of the cases.

This could maybe happen later, but for now I do not concern myself with
building the 

Re: [PATCH/RFC 0/7] Pie-in-the-sky attempt to fix the early config

2016-12-09 Thread Johannes Schindelin
Hi Duy,

On Fri, 9 Dec 2016, Duy Nguyen wrote:

> On Thu, Dec 8, 2016 at 10:35 PM, Johannes Schindelin
>  wrote:
> > Hopefully these patches will lead to something that we can integrate,
> > and that eventually will make Git's startup sequence much less
> > surprising.
> 
> What did it surprise you with?

I mentioned a couple of WTFs elsewhere:

- that it changes the working directory (personally, I am not surprised,
  just because I was exposed to the Git source code for 10+ years,
  however, I know developers who do not share that experience and find that
  feature... interesting)

- that there are multiple code paths to do essentially the same thing, but
  in an incompatible manner: setup_git_directory() and setup_git_env()
  (and possibly more)

- that some environment variables are set, and need to be unset (or even
  possibly reset to no-longer-known values) to allow subprocesses to
  access different repositories

- that git_path() can be used before setup_git_directory() without
  failure, but wreaks havoc with subsequent git_config() calls.

- that the setup_git_directory() cascade is completely oblivious of
  whatever config has already been read (possibly requiring clearing)

- that setup_git_directory() *returns* the prefix, even stores it in
  startup_info, but a subsequent call to setup_git_directory() returns
  NULL

- that as a consequence, builtins that require the prefix to work if there
  is any, but do not necessarily require a repository to begin with, think
  `git diff`, *must not* be marked with RUN_SETUP_GENTLY

- that check_repository_format() sets have_repository=1

- that setup_git_directory() is a oneliner, calling
  setup_git_directory_gently(NULL), when it would be more logical to
  simply make setup_git_directory() accept the "nongit_ok" parameter

- that setup_git_directory_gently() is not at all gentle when the
  parameter is NULL

- that resolve_gitdir() does not, in fact, resolve any git directory, but
  only tests a specified path whether it refers to a .git/ directory or to
  a .git file

- that resolve_gitdir() actually tests for a .git *file*

- that resolve_gitdir() is not used in setup_git_directory_gently_1()

- that resolve_gitdir() tries the order directory,file and
  setup_git_directory_gently_1() tries the opposite order

- that the handling of the ceiling directories is hardcoded into the
  setup_git_directory_gently_1() function

- that a ceiling directory of /home/hello/world does not prevent Git from
  looking at /home/hello/world/.git/

- that canonicalize_ceiling_entry()' relationship to ceiling directories
  is rather coincidental when the name suggests that it is very specific

- that canonicalize_ceiling_entry() does not, in fact, canonicalize
  non-absolute entries

Need I go on? I could, you know...

> I can see that I disrespect the ceiling directory setting, perhaps
> that's that.

No, I actually see a lot of good reasons for the ceiling directories.

Ciao,
Dscho


Re: [PATCH/RFC 0/7] Pie-in-the-sky attempt to fix the early config

2016-12-09 Thread Duy Nguyen
On Thu, Dec 8, 2016 at 10:35 PM, Johannes Schindelin
 wrote:
> Hopefully these patches will lead to something that we can integrate,
> and that eventually will make Git's startup sequence much less
> surprising.

What did it surprise you with? Just curious. I can see that I
disrespect the ceiling directory setting, perhaps that's that.
-- 
Duy


Re: [PATCH/RFC 0/7] Pie-in-the-sky attempt to fix the early config

2016-12-08 Thread Jeff King
On Thu, Dec 08, 2016 at 04:35:56PM +0100, Johannes Schindelin wrote:

> The idea here is to discover the .git/ directory gently (i.e. without
> changing the current working directory), and to use it to read the
> .git/config file early, before we actually called setup_git_directory()
> (if we ever do that).

Great. Thanks for taking a stab at this.

> Notes:
> 
> - I find the diff pretty ugly: I wish there was a more elegant way to
>   *disable* discovery of .git/ *just* for `init` and `clone`. I
>   considered a function `about_to_create_git_dir()` that is called in a
>   hard-coded manner *only* for `init` and `clone`, but that would
>   introduce another magic side effect, when all I want is to reduce those.

It looks like a lot of that ugliness comes from passing around the "are
we OK to peek at git-dir config" flag through the various pager-related
calls.  I don't think it would be bad to use a global for "we do not
want a repo".  After all, it's just modifying the _existing_ global for
"are we in a repo". And I do not see that global going away anytime
soon. Sometimes it's good to make incremental steps towards an end goal,
but in this case it seems like we just pay the cost of the step without
any real plan for reaping the benefit in the long term.

As an alternative, I also think it would be OK to just always have the
pager config read from local repo, even for init/clone. In other words,
to loosen the idea that git-init can _never_ look in the current
git-dir, and declare that there is a stage before the command is
initiated, and during which git may read local-repo config. Aliases
would fall into this, too, so:

  git config --local alias.foo init
  git foo /some/other/dir

would work (as it must, because we cannot know that "foo" is "init"
until we read the config!).

I have a feeling you may declare that too magical, but I think it's
consistent and practical.

> - For the moment, I do not handle dashed invocations of `init` and
>   `clone` correctly. The real problem is the continued existence of
>   the dashed git-init and git-clone, of course.

I assume you mean setting the CREATES_GIT_DIR flag here? I think it
would apply to the dashed invocations, too. They strip off the "git-"
and end up in handle_builtin() just like "git clone" does. I didn't test
it, though.

> - There is still duplicated code. For the sake of this RFC, I did not
>   address that yet.

Yeah, I did not read your discover function very carefully. Because I
think the one thing we really don't want to do here is introduce a
separate lookup process that is not shared by setup_git_directory(). The
only sane path, I think, is to refactor setup_git_directory() to build
on top of discover_git_directory(), which implies that the latter
handles all of the cases.

> - The read_early_config() function is called multiple times, re-reading
>   all the config files and re-discovering the .git/ directory multiple
>   times, which is quite wasteful. For the sake of this RFC, I did not
>   address that yet.

We already have a config-caching system. If we went with a global
"config_discover_refs", then I think the sequence for something like
git-init would become:

  1. When git.c starts, config_discover_refs is set to "true". Pager and
 alias lookup may look in .git/config if it's available, even if
 they go through the configset cache.

  2. As soon as git-init starts, it disables config_discover_refs, and
 it flushes the config cache. Any configset lookups will now examine
 the reduced config.

  3. When git-init has set up the real repo it is operating on, it can
 reenable config_discover_refs (though it may not even need to; that
 flag probably wouldn't have any effect once we've entered the
 repository and have_git_dir() returns true).

> - t7006 fails and the error message is a bit cryptic (not to mention the
>   involved function trace, which is mind-boggling for what is supposed
>   to be a simply, shell script-based test suite). I did not have time to
>   look into that yet.

Running t7006 I see a lot of old failures turned into successes, which
is good (because running from a subdirectory now actually respects local
pager config). The one failure looks like it is testing the wrong thing.

It is checking that we _don't_ respect a local core.pager in some cases,
as a proxy for whether or not we are breaking things by doing setup too
early. But the whole point of your series is to fix that, and respect
the config without causing the setup breakage. After your patches, the
proxy behavior and the failure case are disconnected. The test should be
flipped, and ideally another one added that confirms we didn't actually
run setup_git_directory(), but I'm not sure how to test that directly.

> - after discover_git_directory_gently() did its work, the code happily
>   uses its result *only* for the current read_early_config() run, and
>   lets setup_git_dir_gently() do the whole work *again*. For the sake of
>   

[PATCH/RFC 0/7] Pie-in-the-sky attempt to fix the early config

2016-12-08 Thread Johannes Schindelin
Hopefully these patches will lead to something that we can integrate,
and that eventually will make Git's startup sequence much less
surprising.

The idea here is to discover the .git/ directory gently (i.e. without
changing the current working directory), and to use it to read the
.git/config file early, before we actually called setup_git_directory()
(if we ever do that).

Notes:

- I find the diff pretty ugly: I wish there was a more elegant way to
  *disable* discovery of .git/ *just* for `init` and `clone`. I
  considered a function `about_to_create_git_dir()` that is called in a
  hard-coded manner *only* for `init` and `clone`, but that would
  introduce another magic side effect, when all I want is to reduce those.

- For the moment, I do not handle dashed invocations of `init` and
  `clone` correctly. The real problem is the continued existence of
  the dashed git-init and git-clone, of course.

- There is still duplicated code. For the sake of this RFC, I did not
  address that yet.

- The read_early_config() function is called multiple times, re-reading
  all the config files and re-discovering the .git/ directory multiple
  times, which is quite wasteful. For the sake of this RFC, I did not
  address that yet.

- t7006 fails and the error message is a bit cryptic (not to mention the
  involved function trace, which is mind-boggling for what is supposed
  to be a simply, shell script-based test suite). I did not have time to
  look into that yet.

- after discover_git_directory_gently() did its work, the code happily
  uses its result *only* for the current read_early_config() run, and
  lets setup_git_dir_gently() do the whole work *again*. For the sake of
  this RFC, I did not address that yet.


Johannes Schindelin (7):
  Make read_early_config() reusable
  read_early_config(): avoid .git/config hack when unneeded
  Mark builtins that create .git/ directories
  read_early_config(): special-case `init` and `clone`
  read_early_config(): really discover .git/
  WIP read_config_early(): respect ceiling directories
  WIP: read_early_config(): add tests

 builtin/am.c|   2 +-
 builtin/blame.c |   2 +-
 builtin/grep.c  |   4 +-
 builtin/log.c   |   4 +-
 builtin/var.c   |   2 +-
 cache.h |   8 ++--
 config.c| 110 
 diff.c  |   4 +-
 git.c   |  25 +--
 pager.c |  44 +++
 t/helper/test-config.c  |  15 +++
 t/t1309-early-config.sh |  50 ++
 12 files changed, 209 insertions(+), 61 deletions(-)
 create mode 100755 t/t1309-early-config.sh


base-commit: 8d7a455ed52e2a96debc080dfc011b6bb00db5d2
Published-As: https://github.com/dscho/git/releases/tag/early-config-v1
Fetch-It-Via: git fetch https://github.com/dscho/git early-config-v1

-- 
2.11.0.rc3.windows.1