Re: RFC GSoC idea: new git config features

2014-03-14 Thread Junio C Hamano
Jeff King <p...@peff.net> writes:

 On Sat, Mar 01, 2014 at 12:01:44PM +0100, Matthieu Moy wrote:

 Jeff King <p...@peff.net> writes:
 
  If we had the keys in-memory, we could reverse this: config code asks
  for keys it cares about, and we can do an optimized lookup (binary
  search, hash, etc).
 
 I'm actually dreaming of a system where a configuration variable could
 be declared in Git's source code, with associated type (list/single
 value, boolean/string/path/...), default value and documentation (and
 then Documentation/config.txt could become a generated file). One could
 imagine a lot of possibilities like

 Yes, I think something like that would be very nice. ...
 ...
 Migrating the whole code to such system would take time, but creating
 the system and applying it to a few examples might be feasible as a GSoC
 project.

 Agreed, as long as we have enough examples to feel confident that the
 infrastructure is sufficient.

I agree that it would give us a lot of enhancement opportunities if
we had a central catalog of what the supported configuration
variables are and what semantics (e.g. type, multi-value-ness, etc.)
they have.

One thing we need to be careful about, though, is that we must still
support random configuration items that git-core itself does not care
about at all, but that scripts (and future versions of git-core) read.


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC GSoC idea: new git config features

2014-03-13 Thread Jeff King
On Sat, Mar 01, 2014 at 12:01:44PM +0100, Matthieu Moy wrote:

 Jeff King <p...@peff.net> writes:
 
  If we had the keys in-memory, we could reverse this: config code asks
  for keys it cares about, and we can do an optimized lookup (binary
  search, hash, etc).
 
 I'm actually dreaming of a system where a configuration variable could
 be declared in Git's source code, with associated type (list/single
 value, boolean/string/path/...), default value and documentation (and
 then Documentation/config.txt could become a generated file). One could
 imagine a lot of possibilities like

Yes, I think something like that would be very nice. I am not a big
fan of code generation, but if we had config queries like
config_get_bool, then I think it would be reasonably pleasant to take
a spec like:

  Key: help.browser
  Type: string
  Description: Specify the browser for help...

and turn it into:

  const char *config_get_help_browser(void)
  {
    return config_get_string("help.browser");
  }

So technically code generation, but all the heavy lifting is done behind
the scenes. We're not saving lines in the result so much as avoiding
repeating ourselves (that is, the generated code is only mapping the
config-type from the spec into a C type and function name that gives us
extra compile-time safety).
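A minimal sketch of that spec-to-accessor step, using the C "X-macro" technique so the spec lives in one table and the preprocessor emits one typed function per key. Everything here is hypothetical: config_get_string(), config_get_bool(), the toy store and the CONFIG_SPECS table are illustrations, not existing git API.

```c
#include <stddef.h>
#include <string.h>

/* Toy in-memory store standing in for the parsed config files (hypothetical). */
struct config_entry {
	const char *key;
	const char *value;
};

static const struct config_entry config_store[] = {
	{ "help.browser", "firefox" },
	{ "status.branch", "true" },
};

/* Hypothetical generic getter: returns NULL when the key is not set. */
static const char *config_get_string(const char *key)
{
	size_t i;

	for (i = 0; i < sizeof(config_store) / sizeof(config_store[0]); i++)
		if (!strcmp(config_store[i].key, key))
			return config_store[i].value;
	return NULL;
}

/* Hypothetical boolean getter: returns def when the key is not set. */
static int config_get_bool(const char *key, int def)
{
	const char *v = config_get_string(key);

	if (!v)
		return def;
	return !strcmp(v, "true") || !strcmp(v, "yes") ||
	       !strcmp(v, "on") || !strcmp(v, "1");
}

/* The "spec": one line per key (accessor name, config key, type). */
#define CONFIG_SPECS(X) \
	X(help_browser,  "help.browser",  string) \
	X(status_branch, "status.branch", boolean)

/* Map each spec type to a C return type and a generic getter call. */
#define CONFIG_RET_string  const char *
#define CONFIG_RET_boolean int
#define CONFIG_CALL_string(key)  config_get_string(key)
#define CONFIG_CALL_boolean(key) config_get_bool((key), 0)

/* Emit one typed accessor per spec line, e.g. config_get_help_browser(). */
#define DEFINE_ACCESSOR(name, key, type) \
	static CONFIG_RET_##type config_get_##name(void) \
	{ \
		return CONFIG_CALL_##type(key); \
	}

CONFIG_SPECS(DEFINE_ACCESSOR)
```

Because each generated accessor has a concrete return type, a caller writing `int b = config_get_help_browser();` draws a compiler diagnostic, which is the compile-time safety described above.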

However, I skimmed through config.txt looking for a key to use in my
example above, and there are a surprising number of one-off semantics
(e.g., things that are mostly bool, but can be "auto" or take some other
special value). We may find that the Type field needs so many variants
that a technique like this becomes annoying. But I'd reserve judgement
until somebody actually tries encoding a significant chunk of the config
keys and we see what it looks like.
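As an illustration of one such one-off semantic, here is a hedged sketch of a parser for a hypothetical "bool or auto" type; parse_bool_or_auto() and enum tristate are invented names. The true/false spellings follow git's documented boolean literals (case-insensitive; an empty value is false, and a key written with no "= value", which callbacks receive as NULL, is true).

```c
#include <ctype.h>
#include <stddef.h>

/* Result of parsing a hypothetical "boolean or auto" config value. */
enum tristate {
	TRI_INVALID = -1,
	TRI_FALSE = 0,
	TRI_TRUE = 1,
	TRI_AUTO = 2
};

/* Case-insensitive string equality using only the standard library. */
static int eq_nocase(const char *a, const char *b)
{
	while (*a && *b) {
		if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
			return 0;
		a++;
		b++;
	}
	return *a == *b;
}

/* Hypothetical parser for a key that is "mostly bool, but can be auto". */
static enum tristate parse_bool_or_auto(const char *value)
{
	static const char *const yes[] = { "true", "yes", "on", "1" };
	static const char *const no[] = { "false", "no", "off", "0", "" };
	size_t i;

	/* A key written with no "= value" arrives as NULL and counts as true. */
	if (!value)
		return TRI_TRUE;
	if (eq_nocase(value, "auto"))
		return TRI_AUTO;
	for (i = 0; i < sizeof(yes) / sizeof(yes[0]); i++)
		if (eq_nocase(value, yes[i]))
			return TRI_TRUE;
	for (i = 0; i < sizeof(no) / sizeof(no[0]); i++)
		if (eq_nocase(value, no[i]))
			return TRI_FALSE;
	return TRI_INVALID;
}
```

Each variant like this needs its own C type (an enum here rather than an int), which is where a Type field could start to sprawl.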

 Migrating the whole code to such system would take time, but creating
 the system and applying it to a few examples might be feasible as a GSoC
 project.

Agreed, as long as we have enough examples to feel confident that the
infrastructure is sufficient.

-Peff


Re: RFC GSoC idea: new git config features

2014-03-03 Thread Junio C Hamano
Jeff King <p...@peff.net> writes:

 Most callbacks would convert to a query system in a pretty
 straightforward way, but some that have side effects might be tricky.
 Converting them all may be too large for a GSoC project, but I think you
 could do it gradually:

   1. Convert the parser to read into an in-memory representation, but
      leave git_config() as a wrapper which iterates over it.

   2. Add query functions like config_string_get() above.

   3. Convert callbacks to query functions one by one.

   4. Eventually drop git_config().

 A GSoC project could take us partway through (3).

I had actually dismissed the idea of reading these config files once
into a preparsed in-memory structure, later to be consumed by repeated
calls to the git_config() callback functions (the only difference from
the current scheme being that the preparsed structure would be reset
whenever the new 'reset to the original' definition is seen), as
obvious and uninteresting.

This is one of those times when I find myself blessed with capable
people around me who can go further, building on top of an idea that
I may have discarded without thinking it through ;-)

Yes, a new abstraction like config_type_get() that can live
alongside the existing "git_config() feeds everything to the callback
chain" interface, and gradually replace it, would be a good way
forward.  Given that we read configuration multiple times anyway for
different purposes, even without the new abstraction the end result
might perform better if we read the files once and reused the result
in later calls to git_config().

Thanks.



Re: RFC GSoC idea: new git config features

2014-03-01 Thread Matthieu Moy
Jeff King <p...@peff.net> writes:

 If we had the keys in-memory, we could reverse this: config code asks
 for keys it cares about, and we can do an optimized lookup (binary
 search, hash, etc).

I'm actually dreaming of a system where a configuration variable could
be declared in Git's source code, with associated type (list/single
value, boolean/string/path/...), default value and documentation (and
then Documentation/config.txt could become a generated file). One could
imagine a lot of possibilities like

$ git config --describe some-variable
Type: boolean
Default value: true
Description: ...

In a sense, do for config variables what has been done for command-line
option parsing.
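A sketch of the central catalog that such a `git config --describe` could print from. The struct layout, describe_config() and the catalog entries (help.browser from Peff's spec example, core.frotz from Junio's illustration) are hypothetical; the option itself is only a proposal in this thread.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

enum config_type { CONFIG_TYPE_BOOL, CONFIG_TYPE_STRING, CONFIG_TYPE_PATH };

/* One entry in a hypothetical central catalog of config variables. */
struct config_desc {
	const char *key;
	enum config_type type;
	int multi_valued;            /* list vs. single value */
	const char *default_value;   /* NULL: no default */
	const char *description;
};

static const struct config_desc config_catalog[] = {
	{ "help.browser", CONFIG_TYPE_STRING, 0, NULL,
	  "Specify the browser for help..." },
	{ "core.frotz", CONFIG_TYPE_BOOL, 0, "true",
	  "Hypothetical key used for illustration." },
};

static const char *type_name(enum config_type t)
{
	switch (t) {
	case CONFIG_TYPE_BOOL:
		return "boolean";
	case CONFIG_TYPE_STRING:
		return "string";
	case CONFIG_TYPE_PATH:
		return "path";
	}
	return "unknown";
}

/* Fill buf with what "--describe key" might print; -1 if key is not cataloged. */
static int describe_config(const char *key, char *buf, size_t len)
{
	size_t i;

	for (i = 0; i < sizeof(config_catalog) / sizeof(config_catalog[0]); i++) {
		const struct config_desc *d = &config_catalog[i];

		if (strcmp(d->key, key))
			continue;
		snprintf(buf, len, "Type: %s%s\nDefault value: %s\nDescription: %s\n",
			 type_name(d->type), d->multi_valued ? " (list)" : "",
			 d->default_value ? d->default_value : "(none)",
			 d->description);
		return 0;
	}
	return -1;
}
```

A catalog miss returns -1 rather than failing hard, which fits the caveat elsewhere in this thread that keys git-core does not know about must still be accepted.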

Migrating the whole code to such system would take time, but creating
the system and applying it to a few examples might be feasible as a GSoC
project.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/


Re: RFC GSoC idea: new git config features

2014-02-28 Thread Junio C Hamano
Michael Haggerty <mhag...@alum.mit.edu> writes:

 I just wrote up another double-idea that has been stewing in my head for
 a while:

 * Allow configuration values to be unset via a config file
 * Fix git config --unset to clean up detritus from sections that are
 left empty.

The former is *way* too large for a GSoC project.  Most
configuration variables are meant to be read sequentially and affect
in-core variables directly, like

    /* file-scope global */
    static int frotz = -1;  /* unset */

    static int parse_config_frotz(const char *key, const char *value, void *cb)
    {
        if (!strcmp(key, "core.frotz"))
            frotz = git_config_int(key, value);
        return 0;
    }

    ... and somewhere ...
    git_config(parse_config_frotz, NULL);

The config parsers are distributed and there is no single registry
that knows how in-core variables owned by each subsystem represent
an unset value.  In the above example, -1 is such a sentinel
value, but in some other contexts, the subsystem may choose to use
INT_MAX.  The only way to allow "resetting to the previous value" is to

 (1) come up with a way to pass "this key is being reset to
     'unspecified'" to existing git_config() callback functions
     (like parse_config_frotz() in the above illustration), which
     may or may not involve changing the function signature of the
     callbacks;

 (2) go through all the git_config() callback functions and make
     them understand the new "reset to 'unspecified'" convention.

which may not sound too bad at first glance (especially since the
first one is almost trivial).

But the side effects these callbacks may cause are not limited to
setting a simple scalar variable (like 'frotz' in the illustration)
but would include things that are hard to undo once done
(e.g. calling a set-up function with a lot of side effects).

The latter, on the other hand, should be a change that is of a
fairly limited scope, and would be a good fit for a GSoC project
(incidentally, it has been one of the items on my leftover-bits list
http://git-blame.blogspot.com/p/leftover-bits.html for quite some
time).
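For the "clean up empty sections" half, a deliberately conservative sketch over an array of config-file lines: drop a section header only when nothing but blank lines follows it before the next header or end of file, and keep any section that still holds a comment, since the comment may matter to the user. drop_empty_sections() and its line-array interface are hypothetical; how git's config writer actually rewrites files is not modeled here.

```c
#include <ctype.h>
#include <stddef.h>

/* True if the line is empty or only whitespace. */
static int is_blank(const char *s)
{
	while (isspace((unsigned char)*s))
		s++;
	return *s == '\0';
}

/* True if the line opens a section, e.g. "[core]" or "[branch \"topic\"]". */
static int is_section_header(const char *s)
{
	while (isspace((unsigned char)*s))
		s++;
	return *s == '[';
}

/*
 * Copy lines to out, skipping any section header (and the blank lines
 * after it) that is followed only by blank lines up to the next header
 * or the end of the file. Comments count as content, so a commented
 * section is kept. Returns the number of lines written to out.
 */
static size_t drop_empty_sections(const char **lines, size_t nr, const char **out)
{
	size_t i, j, n = 0;

	for (i = 0; i < nr; i++) {
		if (is_section_header(lines[i])) {
			for (j = i + 1; j < nr && is_blank(lines[j]); j++)
				;
			if (j == nr || is_section_header(lines[j])) {
				i = j - 1;   /* the loop increment resumes at line j */
				continue;
			}
		}
		out[n++] = lines[i];
	}
	return n;
}
```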

Thanks.


Re: RFC GSoC idea: new git config features

2014-02-28 Thread Michael Haggerty
On 02/28/2014 09:00 PM, Junio C Hamano wrote:
 Michael Haggerty <mhag...@alum.mit.edu> writes:
 
 I just wrote up another double-idea that has been stewing in my head for
 a while:

 * Allow configuration values to be unset via a config file
 * Fix git config --unset to clean up detritus from sections that are
 left empty.
 
 The former is *way* too large for a GSoC project.  Most
 configuration variables are meant to be read sequentially and affect
 in-core variables directly, like
 
     /* file-scope global */
     static int frotz = -1;  /* unset */
 
     static int parse_config_frotz(const char *key, const char *value, void *cb)
     {
         if (!strcmp(key, "core.frotz"))
             frotz = git_config_int(key, value);
         return 0;
     }
 
     ... and somewhere ...
     git_config(parse_config_frotz, NULL);
 
 The config parsers are distributed and there is no single registry
 that knows how in-core variables owned by each subsystem represent
 an unset value.  In the above example, -1 is such a sentinel
 value, but in some other contexts, the subsystem may choose to use
 INT_MAX.  The only way to allow "resetting to the previous value" is to
 
  (1) come up with a way to pass "this key is being reset to
      'unspecified'" to existing git_config() callback functions
      (like parse_config_frotz() in the above illustration), which
      may or may not involve changing the function signature of the
      callbacks;
 
  (2) go through all the git_config() callback functions and make
      them understand the new "reset to 'unspecified'" convention.

I absolutely understand that changing all of the config parsers is not
feasible.  But I had imagined a third route:

(3) parse the config once, storing the raw values to records in
memory.  When an unset is seen, delete any previous records that
have accumulated for that key.  After the whole config has been
read, iterate through the records, feeding the surviving values
into the callback in the order they were originally read (minus
deletions).
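Route (3) can be sketched in a few dozen lines. The names (struct config_set, config_set_add(), config_set_unset(), config_set_replay()) and the callback typedef are hypothetical; the sketch stores borrowed string pointers and compares keys with strcmp(), assuming the parser has already normalized key case. There is no unset syntax in config files today, so config_set_unset() stands for whatever directive the parser would recognize.

```c
#include <stdlib.h>
#include <string.h>

/* Callback shape modeled on git_config() callbacks (hypothetical typedef). */
typedef int (*config_fn)(const char *key, const char *value, void *cb);

struct config_record {
	const char *key;     /* borrowed; assumed already case-normalized */
	const char *value;
	int deleted;         /* set when a later unset cancels this record */
};

struct config_set {
	struct config_record *items;
	size_t nr, alloc;
};

/* Record one "key = value" line in the order it was read. */
static void config_set_add(struct config_set *cs, const char *key, const char *value)
{
	if (cs->nr == cs->alloc) {
		size_t alloc = cs->alloc ? 2 * cs->alloc : 8;
		struct config_record *p = realloc(cs->items, alloc * sizeof(*p));

		if (!p)
			abort();
		cs->items = p;
		cs->alloc = alloc;
	}
	cs->items[cs->nr].key = key;
	cs->items[cs->nr].value = value;
	cs->items[cs->nr].deleted = 0;
	cs->nr++;
}

/* A (proposed) unset directive: drop every earlier record for key. */
static void config_set_unset(struct config_set *cs, const char *key)
{
	size_t i;

	for (i = 0; i < cs->nr; i++)
		if (!strcmp(cs->items[i].key, key))
			cs->items[i].deleted = 1;
}

/* Feed the surviving records, in original order, to an unmodified callback. */
static int config_set_replay(const struct config_set *cs, config_fn fn, void *cb)
{
	size_t i;

	for (i = 0; i < cs->nr; i++) {
		int ret;

		if (cs->items[i].deleted)
			continue;
		ret = fn(cs->items[i].key, cs->items[i].value, cb);
		if (ret < 0)
			return ret;
	}
	return 0;
}

static void config_set_clear(struct config_set *cs)
{
	free(cs->items);
	cs->items = NULL;
	cs->nr = cs->alloc = 0;
}

/* Example callback in the style of parse_config_frotz(): last value wins. */
static int remember_frotz(const char *key, const char *value, void *cb)
{
	if (!strcmp(key, "core.frotz"))
		*(const char **)cb = value;
	return 0;
}
```

The callback never learns that an unset happened; it simply never sees the deleted value, so existing callbacks need no new "reset" convention.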

Do you see any problems with this way of implementing the functionality
(aside from slightly increased overhead)?

And once we have a way to store config records in memory, it might also
make sense to reuse the parsed values for later config inquiries (after
checking that the files have not changed since the last read).  After
this second step the net performance change might even be advantageous.
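One hedged way to do the "have the files changed since the last read" check is to remember each file's existence, mtime and size, and to re-parse when any of them differs. struct file_stamp, stamp_file() and file_changed() are invented for this sketch, which relies on POSIX stat().

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical per-file stamp kept next to the cached, parsed config. */
struct file_stamp {
	int exists;
	time_t mtime;
	off_t size;
};

/* Record the current state of path; a missing file is stamped as absent. */
static void stamp_file(const char *path, struct file_stamp *st)
{
	struct stat sb;

	if (stat(path, &sb) < 0) {
		st->exists = 0;
		st->mtime = 0;
		st->size = 0;
		return;
	}
	st->exists = 1;
	st->mtime = sb.st_mtime;
	st->size = sb.st_size;
}

/* Nonzero if the cached parse of path should be thrown away and re-read. */
static int file_changed(const char *path, const struct file_stamp *st)
{
	struct file_stamp now;

	stamp_file(path, &now);
	if (now.exists != st->exists)
		return 1;
	return now.exists && (now.mtime != st->mtime || now.size != st->size);
}
```

mtime can have coarse (e.g. one-second) granularity, so a same-size rewrite within the same tick would go unnoticed; a real implementation would need to handle that race, much as git already has to for the index ("racy git").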

 which may not sound too bad at first glance (especially since the
 first one is almost trivial).
 
 But the side effects these callbacks may cause are not limited to
 setting a simple scalar variable (like 'frotz' in the illustration)
 but would include things that are hard to undo once done
 (e.g. calling a set-up function with a lot of side effects).
 
 The latter, on the other hand, should be a change that is of a
 fairly limited scope, and would be a good fit for a GSoC project
 (incidentally, it has been one of the items on my leftover-bits list
 http://git-blame.blogspot.com/p/leftover-bits.html for quite some
 time).

But only the latter part would be a bit meager as a GSoC project, don't
you think?

Thanks for the feedback.

Michael

-- 
Michael Haggerty
mhag...@alum.mit.edu
http://softwareswirl.blogspot.com/


Re: RFC GSoC idea: new git config features

2014-02-28 Thread Jeff King
On Sat, Mar 01, 2014 at 01:19:32AM +0100, Michael Haggerty wrote:

 I absolutely understand that changing all of the config parsers is not
 feasible.  But I had imagined a third route:
 
 (3) parse the config once, storing the raw values to records in
 memory.  When an unset is seen, delete any previous records that
 have accumulated for that key.  After the whole config has been
 read, iterate through the records, feeding the surviving values
 into the callback in the order they were originally read (minus
 deletions).
 
 Do you see any problems with this way of implementing the functionality
 (aside from slightly increased overhead)?

Yeah, this is something I have considered many times. It does have some
overhead, but the existing system is not great either. As you noted, we
often read the config several times for a given program invocation.

Moreover, we linearly strcmp each config key we find against each
one we know about. In some cases we return early if a sub-function is
looking for `starts_with(key, "foo.")`, but in most cases we just look
for "foo.bar", "foo.baz", and so on.

If we had the keys in-memory, we could reverse this: config code asks
for keys it cares about, and we can do an optimized lookup (binary
search, hash, etc).
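To illustrate the optimized lookup, a minimal sketch: sort the parsed entries once by key (keeping file order within equal keys, so the last value still wins) and binary-search on each query. struct config_kv, config_index() and config_lookup() are hypothetical, and keys are assumed to be canonicalized (e.g. lowercased) by the parser.

```c
#include <stdlib.h>
#include <string.h>

struct config_kv {
	const char *key;
	const char *value;
	size_t seq;   /* position in the files, so "last one wins" survives sorting */
};

/* Order by key, then by original position. */
static int cmp_kv(const void *a, const void *b)
{
	const struct config_kv *x = a, *y = b;
	int c = strcmp(x->key, y->key);

	if (c)
		return c;
	return (x->seq > y->seq) - (x->seq < y->seq);
}

/* Sort once after parsing: O(n log n). */
static void config_index(struct config_kv *kv, size_t nr)
{
	qsort(kv, nr, sizeof(*kv), cmp_kv);
}

/* O(log n): the last value for key, or NULL if the key is not set. */
static const char *config_lookup(const struct config_kv *kv, size_t nr, const char *key)
{
	size_t lo = 0, hi = nr;

	/* Find the first entry whose key sorts after the requested key. */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (strcmp(kv[mid].key, key) <= 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	/* The entry just before it is the latest one for key, if any. */
	if (lo > 0 && !strcmp(kv[lo - 1].key, key))
		return kv[lo - 1].value;
	return NULL;
}
```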

That also makes many constructs easier to express. Recently we had a
problem where the parsing order of remote.pushdefault and
branch.*.pushremote mattered, because they were read into the same
variable. The solution is to use two variables and reconcile them after
all config is read. But if you can just query the config subsystem
directly, the boilerplate of reading them into strings goes away, and
you can just do:

  x = config_string_getf("branch.%s.pushremote", current_branch);
  if (!x)
    x = config_string_get("remote.pushdefault");
  if (!x)
    x = config_string_getf("branch.%s.remote", current_branch);
  if (!x)
    x = "origin";

As it is now, the code that does this has a lot more boilerplate, and is
split across several functions.
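That lookup chain can be made runnable against a toy store. config_string_get() and config_string_getf() are the hypothetical query functions from the snippet; their bodies here (a reverse linear scan, so later entries win, and vsnprintf() into a fixed 256-byte key buffer) are assumptions of this sketch, as is the push_remote_for() wrapper.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Toy store; later entries override earlier ones (hypothetical). */
struct kv {
	const char *key;
	const char *value;
};

static const struct kv *store;
static size_t store_nr;

/* Hypothetical query: the last value for key, or NULL if unset. */
static const char *config_string_get(const char *key)
{
	size_t i = store_nr;

	while (i-- > 0)
		if (!strcmp(store[i].key, key))
			return store[i].value;
	return NULL;
}

/* Hypothetical printf-style variant for keys like "branch.%s.remote". */
static const char *config_string_getf(const char *fmt, ...)
{
	char key[256];
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(key, sizeof(key), fmt, ap);
	va_end(ap);
	if (n < 0 || (size_t)n >= sizeof(key))
		return NULL;   /* this sketch treats an over-long key as unset */
	return config_string_get(key);
}

/* The lookup chain from the message, wrapped in a function. */
static const char *push_remote_for(const char *current_branch)
{
	const char *x;

	x = config_string_getf("branch.%s.pushremote", current_branch);
	if (!x)
		x = config_string_get("remote.pushdefault");
	if (!x)
		x = config_string_getf("branch.%s.remote", current_branch);
	if (!x)
		x = "origin";
	return x;
}
```

Precedence now lives in one place and no longer depends on the order in which the files happen to mention the keys.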

Another example is the way we have to manage deferred config in
git-status (see 84b4202). This might be clearer if we could simply call
`config_get_bool("status.branch")` at the right moment.

The talk of efficiency is probably a red herring here. I do not think
config-reading is a significant source of slow-down in the current code.
But I'd be in favor of something that reduced boilerplate and made the
code easier to read.

  But the side effects these callbacks may cause are not limited to
  setting a simple scalar variable (like 'frotz' in the illustration)
  but would include things that are hard to undo once done
  (e.g. calling a set-up function with a lot of side effects).

Most callbacks would convert to a query system in a pretty
straightforward way, but some that have side effects might be tricky.
Converting them all may be too large for a GSoC project, but I think you
could do it gradually:

  1. Convert the parser to read into an in-memory representation, but
     leave git_config() as a wrapper which iterates over it.

  2. Add query functions like config_string_get() above.

  3. Convert callbacks to query functions one by one.

  4. Eventually drop git_config().

A GSoC project could take us partway through (3).
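Steps 1 and 2 can be sketched together: the files are "parsed" into memory at most once, the old callback interface becomes a loop over that cache, and a query function reads the same data. All names here (sketch_git_config(), sketch_config_string_get(), ensure_config_loaded(), the static file_contents table and the count_frotz() callback) are hypothetical stand-ins, not git's real functions.

```c
#include <stddef.h>
#include <string.h>

typedef int (*config_fn)(const char *key, const char *value, void *cb);

struct entry {
	const char *key;
	const char *value;
};

/* Hypothetical stand-in for what the real parser would read from disk. */
static const struct entry file_contents[] = {
	{ "core.frotz", "1" },
	{ "status.branch", "true" },
	{ "core.frotz", "2" },
};

static const struct entry *cache;
static size_t cache_nr;
static int parse_count;   /* how many times the "files" were parsed */

/* Step 1: parse into memory at most once per process. */
static void ensure_config_loaded(void)
{
	if (cache)
		return;
	cache = file_contents;
	cache_nr = sizeof(file_contents) / sizeof(file_contents[0]);
	parse_count++;
}

/* Step 1 (cont.): the old callback interface, now a loop over the cache. */
static int sketch_git_config(config_fn fn, void *cb)
{
	size_t i;

	ensure_config_loaded();
	for (i = 0; i < cache_nr; i++) {
		int ret = fn(cache[i].key, cache[i].value, cb);

		if (ret < 0)
			return ret;
	}
	return 0;
}

/* Step 2: a query interface over the same data; the last value wins. */
static const char *sketch_config_string_get(const char *key)
{
	size_t i;

	ensure_config_loaded();
	for (i = cache_nr; i-- > 0; )
		if (!strcmp(cache[i].key, key))
			return cache[i].value;
	return NULL;
}

/* An unconverted callback, as would still exist during step 3. */
static int count_frotz(const char *key, const char *value, void *cb)
{
	(void)value;
	if (!strcmp(key, "core.frotz"))
		++*(int *)cb;
	return 0;
}
```

While step 3 is in progress, converted and unconverted code share one parse: parse_count stays at 1 however many callers use either interface.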

-Peff