Re: [GSOC 2014]idea:Git Configuration API Improvement

2014-03-21 Thread Matthieu Moy
Mustafa Orkun Acar  writes:

> Hi, 
> I have completed my proposal about this project. But one of the previous
> emails says that the aim of the project is not storing configuration data
> in memory instead of making multiple git_config() calls. I
> also understand the project in this way. I need a clarification about it. 
> Thanks.

See my explanations at the bottom of

http://article.gmane.org/gmane.comp.version-control.git/244522

The goal _is_ to keep the configuration in memory, inside a single git
process. It is not to keep it in memory after the process dies (this would
require an additional daemon, which would be overkill in our
case).

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [GSOC 2014]idea:Git Configuration API Improvement

2014-03-21 Thread Matthieu Moy
Yao Zhao  writes:

> Moy, thanks for explaining. You said the API should be hidden. Does
> that mean I should indicate an arbitrary feature in the old version, or
> that a new feature we added should be linked to a manipulation of the
> inner structure? And I need to find the connection to make this abstraction?

Sorry, I do not understand what you mean.

The new code should be backward compatible with the old one, that is:
existing code using git_config() should continue working. There are a
lot of git_config() calls in the codebase, and a GSoC won't have time to
change them all into something new.

This does not mean we can't add new features, both on the file parsing
side (add the ability to unset a key) and on the user API side (allow
getting the value of a key more easily).
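
To make the compatibility point concrete, here is a minimal sketch of both
access styles sitting on top of one in-memory cache. Everything in it is an
assumption made for illustration: demo_cache, demo_config_fn,
demo_config_iterate(), demo_config_get() and remember_editor() are made-up
names, not Git's actual API, and the cache is a fixed array rather than
parsed files.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct demo_entry {
    const char *key;
    const char *value;
};

/* Hypothetical cache contents, as if parsed once from the config files. */
static const struct demo_entry demo_cache[] = {
    { "core.editor", "vi" },
    { "user.name", "A U Thor" },
    { "color.ui", "auto" },
};
#define DEMO_COUNT (sizeof(demo_cache) / sizeof(demo_cache[0]))

/* Callback style: the caller's function is shown every key/value pair. */
typedef int (*demo_config_fn)(const char *key, const char *value, void *data);

int demo_config_iterate(demo_config_fn fn, void *data)
{
    size_t i;

    assert(fn != NULL);
    for (i = 0; i < DEMO_COUNT; i++) {
        int ret = fn(demo_cache[i].key, demo_cache[i].value, data);
        if (ret < 0)
            return ret; /* a callback error stops the walk */
    }
    return 0;
}

/* Getter style: ask for a single key directly. */
const char *demo_config_get(const char *key)
{
    size_t i;

    assert(key != NULL);
    for (i = 0; i < DEMO_COUNT; i++)
        if (!strcmp(demo_cache[i].key, key))
            return demo_cache[i].value;
    return NULL;
}

/* A callback-style caller that picks out the one variable it cares about. */
int remember_editor(const char *key, const char *value, void *data)
{
    if (!strcmp(key, "core.editor"))
        *(const char **)data = value;
    return 0;
}
```

Existing callers keep passing a callback such as remember_editor(), while
new callers can use the getter; both read the same cache, so neither
triggers a re-parse.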

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/


Re: Re: [GSOC 2014]idea:Git Configuration API Improvement

2014-03-20 Thread Yao Zhao
I think I misunderstood the project, as Moy said. I thought the purpose of the
project was to store the configuration in memory between multiple git calls. Sorry.

Moy, thanks for explaining. You said the API should be hidden. Does that mean I
should indicate an arbitrary feature in the old version, or that a new feature we
added should be linked to a manipulation of the inner structure? And I need to
find the connection to make this abstraction?

Besides, maybe I should focus more on the code in my proposal, such as which
parts should be changed for this project?





Re: [GSOC 2014]idea:Git Configuration API Improvement

2014-03-20 Thread Junio C Hamano
Matthieu Moy  writes:

> Why?
>
> (In general, explaining why you chose something is more important than
> explaining what you chose)

Good educational comment.  Thanks.

> A tree (AST, Abstract syntax tree) can be interesting if you have some
> source-to-source transformations to do on the configuration files (i.e.
> edit the config files themselves).
>
> For read-only accesses, I would find it more natural to have a
> data-structure that reflects the configuration variables themselves, not
> the way they appear in the config file.

... and one important thing that was left unsaid is that the
read-only accesses happen far more often than updates, so the data
structure must be optimized for the read-only look-up case.
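
One possible way to favour reads, sketched here purely as an illustration:
sort the parsed key/value pairs once, then answer every lookup with a binary
search. The names config_pair, config_index_build() and
config_index_lookup() are hypothetical, and a hash table would be an equally
valid choice.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct config_pair {
    const char *key;
    const char *value;
};

/* Orders pairs by key; shared by qsort() and bsearch(). */
static int compare_pairs(const void *a, const void *b)
{
    const struct config_pair *pa = a;
    const struct config_pair *pb = b;

    return strcmp(pa->key, pb->key);
}

/* Pay the sorting cost once, right after parsing. */
void config_index_build(struct config_pair *pairs, size_t n)
{
    qsort(pairs, n, sizeof(*pairs), compare_pairs);
}

/* Each read is an O(log n) binary search with no allocation. */
const char *config_index_lookup(const struct config_pair *pairs, size_t n,
                                const char *key)
{
    struct config_pair probe;
    const struct config_pair *hit;

    assert(key != NULL);
    probe.key = key;
    probe.value = NULL;
    hit = bsearch(&probe, pairs, n, sizeof(*pairs), compare_pairs);
    return hit ? hit->value : NULL;
}
```

The update path (re-sorting after a change) is slower than the read path,
which matches the assumption that updates are rare.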


Re: [GSOC 2014]idea:Git Configuration API Improvement

2014-03-20 Thread Matthieu Moy
Hi,

Yao Zhao  writes:

> The first is about when to start reading the configuration file into
> the cache. My idea is to do it when the user first runs a command that
> needs configuration information (i.e. needs to read the configuration file).

I'd actually load the configuration lazily, when Git first requires a
configuration variable's value. Something like

int config_has_been_loaded = 0;

git_config() {
    if (!config_has_been_loaded) {
        load_config();
        config_has_been_loaded = 1;
    } else if (cache_is_outdated()) {
        load_config();
    } else { /* Nothing to do, we're good */ }
    do_something_with_loaded_config();
}
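
A self-contained variant of the sketch above, for illustration only. All
names are assumptions: config_cache, load_config(), cache_is_outdated(),
disk_generation, load_count and lazy_config_get() are stand-ins, not Git
functions, and the loader fills in fixed values instead of parsing files.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory cache: a small fixed array of key/value pairs. */
struct config_entry {
    const char *key;
    const char *value;
};

static struct config_entry config_cache[8];
static size_t config_count;
static int config_has_been_loaded;

/* Stand-in for the files on disk: bump disk_generation to simulate an edit. */
static int disk_generation = 1;
static int loaded_generation;
static int load_count; /* how many times load_config() actually ran */

static void load_config(void)
{
    /* A real loader would parse the config files here. */
    config_cache[0].key = "user.name";
    config_cache[0].value = "A U Thor";
    config_cache[1].key = "core.editor";
    config_cache[1].value = "vi";
    config_count = 2;
    loaded_generation = disk_generation;
    load_count++;
}

static int cache_is_outdated(void)
{
    return loaded_generation != disk_generation;
}

/* Load on first use; reload only if the cache went stale. */
static void ensure_config_loaded(void)
{
    if (!config_has_been_loaded) {
        load_config();
        config_has_been_loaded = 1;
    } else if (cache_is_outdated()) {
        load_config();
    }
}

/* Returns the cached value for key, or NULL if the key is not set. */
const char *lazy_config_get(const char *key)
{
    size_t i;

    assert(key != NULL);
    ensure_config_loaded();
    for (i = 0; i < config_count; i++)
        if (!strcmp(config_cache[i].key, key))
            return config_cache[i].value;
    return NULL;
}
```

Nothing is loaded until the first lazy_config_get() call, and repeated
lookups reuse the cache unless disk_generation has changed in between.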

> The second is about the data structure. I read Peff's email listed on
> the idea page. He described two methods, and I prefer the syntax tree.

Why?

(In general, explaining why you chose something is more important than
explaining what you chose)

> I think there should be three or more syntax trees in the cache: one
> for system, one for global and one for local. If the user specifies a
> file to be the configuration file, add one more tree. Or maybe we can
> build one tree and tag every node to indicate where it belongs.

A tree (AST, Abstract syntax tree) can be interesting if you have some
source-to-source transformations to do on the configuration files (i.e.
edit the config files themselves).

For read-only accesses, I would find it more natural to have a
data-structure that reflects the configuration variables themselves, not
the way they appear in the config file. For example, a map (hashtable)
associating to each config variable the corresponding value (which may
be a scalar value or a list, depending on the variable).

But the really important part here is the API exposed to the user, not
the internal data-structure. A map would be "more efficient" (O(1) or
O(log(n)) access), but traversing the AST for each config request would
not really hurt: this is what we are doing today, except that we
currently re-parse the file each time. OTOH, the API should hide the AST
for most uses. If the user wants the value of configuration variable
"foo", the code to do that should not be much more complex than
get_value_for_config_variable("foo"). (well, I did oversimplify a bit
here).
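
As a rough illustration of the map idea, here is a small chained hash table
behind a one-call lookup. It is a sketch under assumptions, not a proposed
implementation: config_map_set() and the bucket layout are invented for the
example, get_value_for_config_variable() is the placeholder name used above,
and only single-valued variables are handled (multi-valued ones would need a
list per key).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define CONFIG_BUCKETS 64

struct config_node {
    char *key;
    char *value;
    struct config_node *next; /* chain of entries sharing a bucket */
};

static struct config_node *buckets[CONFIG_BUCKETS];

/* Heap copy of a string; aborts on allocation failure to keep the sketch short. */
static char *copy_string(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);

    if (!copy)
        abort();
    memcpy(copy, s, len);
    return copy;
}

/* djb2 string hash, reduced to a bucket index. */
static unsigned int bucket_for(const char *key)
{
    unsigned long h = 5381;

    while (*key)
        h = h * 33 + (unsigned char)*key++;
    return (unsigned int)(h % CONFIG_BUCKETS);
}

/* Insert or overwrite: a later definition of the same key wins. */
void config_map_set(const char *key, const char *value)
{
    unsigned int b = bucket_for(key);
    struct config_node *n;

    for (n = buckets[b]; n; n = n->next) {
        if (!strcmp(n->key, key)) {
            free(n->value);
            n->value = copy_string(value);
            return;
        }
    }
    n = malloc(sizeof(*n));
    if (!n)
        abort();
    n->key = copy_string(key);
    n->value = copy_string(value);
    n->next = buckets[b];
    buckets[b] = n;
}

/* The only call a user of the API needs; the table itself stays hidden. */
const char *get_value_for_config_variable(const char *key)
{
    struct config_node *n;

    assert(key != NULL);
    for (n = buckets[bucket_for(key)]; n; n = n->next)
        if (!strcmp(n->key, key))
            return n->value;
    return NULL;
}
```

Callers never see the buckets, so the internal structure could later be
swapped for a tree or a sorted array without touching them.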

> The third is about when to write back to the file; I am really
> confused about it. I think one way could be when the user leaves the
> git repository using "cd". But I am not sure whether git can detect
> that the user called "cd" to leave the repository.

There seems to be a misunderstanding here. The point of the project is
to have a per-process cache, but Git does not normally store a state in
memory between two calls. IOW, when you run

  git status
  cd ../
  git log

The call to "git status" creates a process, but the process dies before
you run "cd". The call to "git log" is a different process. It can
re-use things that "git status" left on disk, but not in-memory data
structures.
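
The fact that in-memory state belongs to one process can be demonstrated
with a small POSIX sketch. It uses fork() rather than two separate git
invocations, so it is only an analogy, and cache_loaded and
parent_view_after_child_loads() are hypothetical names.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int cache_loaded; /* per-process state, starts at 0 */

/*
 * Lets a child process mark its cache as loaded, waits for the child to
 * exit, then returns the value the parent sees in its own memory.
 */
int parent_view_after_child_loads(void)
{
    int status;
    pid_t pid = fork();

    assert(pid >= 0);
    if (pid == 0) {
        cache_loaded = 1; /* changes only the child's copy */
        _exit(cache_loaded ? 0 : 1);
    }
    waitpid(pid, &status, 0);
    assert(WIFEXITED(status) && WEXITSTATUS(status) == 0);
    return cache_loaded; /* the parent's copy was never touched */
}
```

The child's change disappears with the child; anything that has to outlive
a process must go to disk.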

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/


Re: [GSOC 2014]idea:Git Configuration API Improvement

2014-03-20 Thread Michael Haggerty
On 03/20/2014 08:23 AM, Yao Zhao wrote:
> The third is about when to write back to the file; I am really
> confused about it. I think one way could be when the user leaves the
> git repository using "cd". But I am not sure whether git can detect
> that the user called "cd" to leave the repository.

I don't understand.  The cache would be in memory, and would only live
as long as a single "git" process.  Within that process, if somebody
wants to change the config, they might (for example) call one function
to lock the config file, a second function to change the value(s) in
memory, and then a third function to flush the new config out to disk
and unlock the config file again.  The cache would usually only live for
milliseconds, not minutes/hours, so I don't think your question really
makes sense.
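
The three steps described above (lock, change in memory, flush and unlock)
could look roughly like the following. This is only a sketch under stated
assumptions: struct config_lock, config_lock_acquire() and
config_lock_commit() are made-up names, not Git's API, and the in-memory
change is reduced to the caller building the new file contents as a string.
It relies on the common POSIX pattern of creating "<file>.lock" with O_EXCL
and renaming it over the target.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

struct config_lock {
    char path[256];      /* the real config file */
    char lock_path[264]; /* path plus ".lock" */
    int fd;              /* descriptor open on the lock file, or -1 */
};

/* Step 1: take the lock. Returns -1 if the lock file already exists. */
int config_lock_acquire(struct config_lock *lk, const char *path)
{
    assert(strlen(path) < sizeof(lk->path));
    strcpy(lk->path, path);
    snprintf(lk->lock_path, sizeof(lk->lock_path), "%s.lock", lk->path);
    /* O_EXCL makes creation fail if someone else holds the lock. */
    lk->fd = open(lk->lock_path, O_WRONLY | O_CREAT | O_EXCL, 0666);
    return lk->fd < 0 ? -1 : 0;
}

/*
 * Step 3: write the new contents to the lock file, then rename it over
 * the config file. The rename both publishes the data and releases the lock.
 */
int config_lock_commit(struct config_lock *lk, const char *contents)
{
    size_t len = strlen(contents);
    ssize_t written = write(lk->fd, contents, len);
    int close_failed = close(lk->fd);

    lk->fd = -1;
    if (written != (ssize_t)len || close_failed ||
        rename(lk->lock_path, lk->path) < 0) {
        unlink(lk->lock_path); /* give up and drop the lock */
        return -1;
    }
    return 0;
}
```

Step 2, changing the values in memory, happens between the two calls. A
second config_lock_acquire() on the same path fails until the first holder
commits.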

Michael

-- 
Michael Haggerty
mhag...@alum.mit.edu
http://softwareswirl.blogspot.com/


[GSOC 2014]idea:Git Configuration API Improvement

2014-03-20 Thread Yao Zhao
Hello, Michael, Matthieu and peff,

My name is Yao and I am interested in the Git Configuration API Improvements
project listed on Git's idea page. I came up with some ideas and really want to
discuss them with you.

The first is about when to start reading the configuration file into the cache.
My idea is to do it when the user first runs a command that needs configuration
information (i.e. needs to read the configuration file).

The second is about the data structure. I read Peff's email listed on the idea
page. He described two methods, and I prefer the syntax tree. I think there
should be three or more syntax trees in the cache: one for system, one for
global and one for local. If the user specifies a file to be the configuration
file, add one more tree. Or maybe we can build one tree and tag every node to
indicate where it belongs.

The third is about when to write back to the file; I am really confused about
it. I think one way could be when the user leaves the git repository using "cd".
But I am not sure whether git can detect that the user called "cd" to leave the
repository.

Thank you,

Yao