Re: [PATCH v7 1/2] add `config_set` API for caching config-like files
Junio C Hamano gits...@pobox.com writes: After thinking about it a bit more, I think file, line support needs to be done not as a mere client of the generic, uncached git_config() API that we have had for forever, like the current iteration, but needs to know a bit more about the state the caller of the callback (namely, git_parse_source()), and we obviously do not want to expose that machinery to anybody other than the implementation of the config subsystem (to which the new facility this GSoC project adds belongs to), so in that sense you have to have your code in the same config.c file anyway. I do not buy the argument. Why would callers of the callback-style API not be allowed to give file, line errors? To me, it is a weakness of the API that file, line information is not available to callback functions, regardless of the config-cache. It is somewhat dissapointing that this initial implementation was done as a client of the traditional git_config(), by the way. I would have expected that it would be the other way around, in that the traditional callers of git_config() would behefit automatically by having the cache layer below it. I disagree, and I agree ;-). I disagree that it is disapointing to use git_config(), and I think the beauty of the current implementation is to allow this cache mechanism without changing the parsing code. I agree that the callers of git_config() could benefit from the cache mechanism, i.e. use the in-memory pre-parsed config instead of re-parsing the file each time. But we have to start from somewhere. Perhaps the round after this one can rename the exisiting implementation of git_config() to something else internal to the caching layer and give the existing callers a compatible interface that is backed by this new caching layer in a transparent way. In PATCH v4, there was a git_config_iter function that did exactly that. I didn't notice it was gone for v5, but could be rather easily resurected. I suggest, on top of the current series: PATCH 3 : (re)introduce git_config_iter, compatible with git_config (one variant with a configset argument, another working on the_config_set). PATCH 4 : rename git_config to e.g. git_config_parse, and git_config_iter to git_config. Then, check that tests still pass (PATCH 4 automatically uses the new code essentially in every place where Git deals with config, so existing tests will start to stress the code a lot more), and check with e.g. strace -e open git ... that config files are now parsed only once where they used to be parsed multiple times. I'd do that as a separate series, to let the current one finally reach pu. -- Matthieu Moy http://www-verimag.imag.fr/~moy/ -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v7 1/2] add `config_set` API for caching config-like files
Tanay Abhra tanay...@gmail.com writes: Noted, still I want to add that even when GSoC finishes, I won't leave the work unfinished. I had said that I wanted to continue being part of the Git community and I mean that. :) This is a good thing, but you shouldn't rely on it for your GSoC. After the GSoC finishes, you will have much less time for Git. I have to decide on what to do next on moving the contents to config.c or not. Seeing Junio's comments on the topic seems that he wasn't convinced in the first place that we needed a new file. What should we do, as I am sending the revised patch tomorrow? The move will be trivial, just cutting and pasting the contents. Other approaches you mentioned are also doable but, after a certain amount of retooling. I am open to any method you think would be best. No strong opinion from me here. I like splitting stuff into relatively small files, and to me it makes sense to keep the parsing code and the caching code separate (although config-hash.c is no longer a good name, config-set.c or config-cache.c would be better). But I can for sure live with both in the same file. I guess you'll have to make the decision if others don't give better argument ;-). -- Matthieu Moy http://www-verimag.imag.fr/~moy/ -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v7 1/2] add `config_set` API for caching config-like files
Matthieu Moy matthieu@grenoble-inp.fr writes: Junio C Hamano gits...@pobox.com writes: After thinking about it a bit more, I think file, line support needs to be done not as a mere client of the generic, uncached git_config() API that we have had for forever, like the current iteration, but needs to know a bit more about the state the caller of the callback (namely, git_parse_source()), and we obviously do not want to expose that machinery to anybody other than the implementation of the config subsystem (to which the new facility this GSoC project adds belongs to), so in that sense you have to have your code in the same config.c file anyway. I do not buy the argument. Why would callers of the callback-style API not be allowed to give file, line errors? To me, it is a weakness of the API that file, line information is not available to callback functions, regardless of the config-cache. I agree that it is good to allow scan-all-config-items-with-callback callers to learn file, line, but that is irrelevant. Perhaps you misread what I envision as the endgame of this thing to be and that is where our differences come from. One thing I think would be good to see in the endgame will be to give the benefit of the caching layer to the callback callers without having them change a single line of their code. One possible sequence of changes to make it happen would go like this, roughly in this order: - rename the current git_config() to git_config_raw(), and make it static to the config.c. - write a thin wrapper git_config() around git_config_raw() in config.c, until caching layer learns the iterator. - write caching layer to read from git_config_raw(). - teach git_config_raw() feed its callback functions to learn the file, line information. git_configset_get_type can then start using this in their error reporting. - implement iterator in the caching layer. - rewrite git_config() that was a thin wrapper around git_config_raw() by using the iterator over the cached values. - (optional) think about ways to give file, line information to the callback style callers. Perhaps we need git_config_2(). Perhaps we can rewrite all callers of git_config(). We do not have to decide it now. Between git_parse_source() and git_config_raw() we would need to pass the line-number information, but there is no reason for us to make public (all of these will be implementation details of the config system, including the config caching). My answer to why shouldn't the callbacks have file, line information? is there is no reason why they shouldn't. It is a good addition in the long run. But the optionality of the last step in the above list makes it an irrelevant question. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v7 1/2] add `config_set` API for caching config-like files
Currently `git_config()` uses a callback mechanism and file rereads for config values. Due to this approach, it is not uncommon for the config files to be parsed several times during the run of a git program, with different callbacks picking out different variables useful to themselves. Add a `config_set`, that can be used to construct an in-memory cache for config-like files that the caller specifies (i.e., files like `.gitmodules`, `~/.gitconfig` etc.). Add two external functions `git_configset_get_value` and `git_configset_get_value_multi` for querying from the config sets. `git_configset_get_value` follows `last one wins` semantic (i.e. if there are multiple matches for the queried key in the files of the configset the value returned will be the last entry in `value_list`). `git_configset_get_value_multi` returns a list of values sorted in order of increasing priority (i.e. last match will be at the end of the list). Add type specific query functions like `git_configset_get_bool` and similar. Add a default `config_set`, `the_config_set` to cache all key-value pairs read from usual config files (repo specific .git/config, user wide ~/.gitconfig, XDG config and the global /etc/gitconfig). `the_config_set` is populated using `git_config()`. Add two external functions `git_config_get_value` and `git_config_get_value_multi` for querying in a non-callback manner from `the_config_set`. Also, add type specific query functions that are implemented as a thin wrapper around the `config_set` API. Signed-off-by: Tanay Abhra tanay...@gmail.com --- Documentation/technical/api-config.txt | 134 +++ Makefile | 1 + cache.h| 34 config-hash.c | 295 + config.c | 3 + 5 files changed, 467 insertions(+) create mode 100644 config-hash.c diff --git a/Documentation/technical/api-config.txt b/Documentation/technical/api-config.txt index 230b3a0..65a6717 100644 --- a/Documentation/technical/api-config.txt +++ b/Documentation/technical/api-config.txt @@ -77,6 +77,81 @@ To read a specific file in git-config format, use `git_config_from_file`. This takes the same callback and data parameters as `git_config`. +Querying For Specific Variables +--- + +For programs wanting to query for specific variables in a non-callback +manner, the config API provides two functions `git_config_get_value` +and `git_config_get_value_multi`. They both read values from an internal +cache generated previously from reading the config files. + +`int git_config_get_value(const char *key, const char **value)`:: + + Finds the highest-priority value for the configuration variable `key`, + stores the pointer to it in `value` and returns 0. When the + configuration variable `key` is not found, returns 1 without touching + `value`. The caller should not free or modify `value`, as it is owned + by the cache. + +`const struct string_list *git_config_get_value_multi(const char *key)`:: + + Finds and returns the value list, sorted in order of increasing priority + for the configuration variable `key`. When the configuration variable + `key` is not found, returns NULL. The caller should not free or modify + the returned pointer, as it is owned by the cache. + +`void git_config_clear(void)`:: + + Resets and invalidates the config cache. + +The config API also provides type specific API functions which do conversion +as well as retrieval for the queried variable, including: + +`int git_config_get_int(const char *key, int *dest)`:: + + Finds and parses the value to an integer for the configuration variable + `key`. Dies on error; otherwise, stores the value of the parsed integer in + `dest` and returns 0. When the configuration variable `key` is not found, + returns 1 without touching `dest`. + +`int git_config_get_ulong(const char *key, unsigned long *dest)`:: + + Similar to `git_config_get_int` but for unsigned longs. + +`int git_config_get_bool(const char *key, int *dest)`:: + + Finds and parses the value into a boolean value, for the configuration + variable `key`respecting keywords like true and false. Integer + values are converted into true/false values (when they are non-zero or + zero, respectively). Other values cause a die(). If parsing is successful, + stores the value of the parsed result in `dest` and returns 0. When the + configuration variable `key` is not found, returns 1 without touching + `dest`. + +`int git_config_get_bool_or_int(const char *key, int *is_bool, int *dest)`:: + + Similar to `git_config_get_bool`, except that integers are copied as-is, + and `is_bool` flag is unset. + +`int git_config_get_maybe_bool(const char *key, int *dest)`:: + + Similar to `git_config_get_bool`, except
Re: [PATCH v7 1/2] add `config_set` API for caching config-like files
Tanay Abhra tanay...@gmail.com writes: diff --git a/Documentation/technical/api-config.txt b/Documentation/technical/api-config.txt index 230b3a0..65a6717 100644 --- a/Documentation/technical/api-config.txt +++ b/Documentation/technical/api-config.txt +`int git_config_get_bool(const char *key, int *dest)`:: + + Finds and parses the value into a boolean value, for the configuration + variable `key`respecting keywords like true and false. Integer Missing space after ` = badly formatted in HTML. I didn't find any other formatting error, but you may double-check with cd Documentation/ make technical/api-config.html firefox technical/api-config.html -- Matthieu Moy http://www-verimag.imag.fr/~moy/ -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v7 1/2] add `config_set` API for caching config-like files
On 7/9/2014 5:42 PM, Matthieu Moy wrote: Tanay Abhra tanay...@gmail.com writes: diff --git a/Documentation/technical/api-config.txt b/Documentation/technical/api-config.txt index 230b3a0..65a6717 100644 --- a/Documentation/technical/api-config.txt +++ b/Documentation/technical/api-config.txt +`int git_config_get_bool(const char *key, int *dest)`:: + +Finds and parses the value into a boolean value, for the configuration +variable `key`respecting keywords like true and false. Integer Missing space after ` = badly formatted in HTML. I didn't find any other formatting error, but you may double-check with cd Documentation/ make technical/api-config.html firefox technical/api-config.html Sorry for the wrong version number on the cover letter. Thanks, for the review, I will check it ASAP. Also, I tried implementing Junio's request about saving the line number and the file name for each key-value pair. I had some success, but with some changes, 1. config-hash.c had to be shifted to config.c entirely. 2. string_list util pointer had to be used for each value. I am appending the diff below, the only changes from version 7 is that, a new struct 'key_source' has been added and `config_hash_add_value` has been altered a little. For debugging `git_configset_get_value` prints the line number and file name for each value. What are your views about it, too late for inclusion? too hackey or contrived? I has started on a fresh page for it, but I soon saw that I was reimplementing what was already available in git_config(), so I took this path. One more thing, Karsten's string-intern API can be used for saving file names as they are repeated a lot. -- 8 -- diff --git a/config.c b/config.c index a1aef1c..697ec1c 100644 --- a/config.c +++ b/config.c @@ -9,6 +9,8 @@ #include exec_cmd.h #include strbuf.h #include quote.h +#include string-list.h +#include hashmap.h struct config_source { struct config_source *prev; @@ -33,10 +35,314 @@ struct config_source { long (*do_ftell)(struct config_source *c); }; +struct config_hash_entry { + struct hashmap_entry ent; + char *key; + struct string_list value_list; +}; + +struct key_source { + const char *filename; + int linenr; +}; + static struct config_source *cf; static int zlib_compression_seen; +/* + * Default config_set that contains key-value pairs from the usual set of config + * config files (i.e repo specific .git/config, user wide ~/.gitconfig, XDG + * config file and the global /etc/gitconfig) + */ +static struct config_set the_config_set; + +static int config_hash_add_value(const char *, const char *, struct hashmap *); + +static int config_hash_entry_cmp(const struct config_hash_entry *e1, +const struct config_hash_entry *e2, const void *unused) +{ + return strcmp(e1-key, e2-key); +} + +static void configset_init(struct config_set *cs) +{ + if (!cs-hash_initialized) { + hashmap_init(cs-config_hash, (hashmap_cmp_fn)config_hash_entry_cmp, 0); + cs-hash_initialized = 1; + } +} + +static int config_hash_callback(const char *key, const char *value, void *cb) +{ + struct config_set *cs = cb; + config_hash_add_value(key, value, cs-config_hash); + return 0; +} + +int git_configset_add_file(struct config_set *cs, const char *filename) +{ + int ret = 0; + configset_init(cs); + ret = git_config_from_file(config_hash_callback, filename, cs); + if (!ret) + return 0; + else { + hashmap_free(cs-config_hash, 1); + cs-hash_initialized = 0; + return -1; + } +} + +static struct config_hash_entry *config_hash_find_entry(const char *key, + struct hashmap *config_hash) +{ + struct config_hash_entry k; + struct config_hash_entry *found_entry; + char *normalized_key; + int ret; + /* +* `key` may come from the user, so normalize it before using it +* for querying entries from the hashmap. +*/ + ret = git_config_parse_key(key, normalized_key, NULL); + + if (ret) + return NULL; + + hashmap_entry_init(k, strhash(normalized_key)); + k.key = normalized_key; + found_entry = hashmap_get(config_hash, k, NULL); + free(normalized_key); + return found_entry; +} + +static struct string_list *configset_get_list(const char *key, struct config_set *cs) +{ + struct config_hash_entry *e = config_hash_find_entry(key, cs-config_hash); + return e ? e-value_list : NULL; +} + +static int config_hash_add_value(const char *key, const char *value, struct hashmap *config_hash) +{ + struct config_hash_entry *e; + struct string_list_item *si; + struct key_source *ks = xmalloc(sizeof(*e)); + e = config_hash_find_entry(key,
Re: [PATCH v7 1/2] add `config_set` API for caching config-like files
Tanay Abhra tanay...@gmail.com writes: Also, I tried implementing Junio's request about saving the line number and the file name for each key-value pair. I had some success, but with some changes, My opinion on this: * It's low priority. I think the order of priority should be (high to low) 1) Finish and get the current series into pu (we're almost there, it's not time to back off and restart something new). 2) Work on the other series that uses the new API. We don't need to change _all_ the uses, but we do need a few tens of them to validate the fact that the new API is nice and convenient to use. 3) Get new actual features for the user (tidy up config files, give error messages based on numbers). You probably won't finish everything, so just think: if the GSoC ends between 1) and 2), how serious is it? if it ends between 2) and 3), how serious? If reverse the order, then the risk of having nothing finished and mergeable at the end is high. If it happens, the community may or may not take over and finish it ... * Still, making sure that the (file, line) is doable later without too much change is good. So, if indeed, moving all code to another file is required, then it may make sense to do it now to avoid code move later. 1. config-hash.c had to be shifted to config.c entirely. Why? I guess one reason is the use of struct cf (are there others?), but moving just config_hash_callback config_hash_add_value git_configset_add_file to config.c should do it. Then, config.c could use config-hash.c. But a cleaner way would be to allow the callback to receive the config_file struct without accessing it as a global variable (currently, we have no way to parse two config files in parallel for example). In practice, it should be possible to pass a 4th pointer argument to the callback, and keep compatibility with 3 arguments callback (having too many arguments in not a problem will all ABI I know), but I'm don't think this is allowed in theory. On overall, I'm not convinced we should move the code. When the argument I need to merge these two things otherwise it doesn't compile comes in a discussion, it usually means there's an architecture issue somewhere ;-). One more thing, Karsten's string-intern API can be used for saving file names as they are repeated a lot. This, or storing the list of filenames in struct config_set (more or less as you did in previous patch), and store pointers to the same string in each config_hash_entry. But strdup-ing all filenames seems a bit heavy to me. Even though we're not talking about performance-critical part of Git, I don't like the idea of wasting too much ;-). -- Matthieu Moy http://www-verimag.imag.fr/~moy/ -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v7 1/2] add `config_set` API for caching config-like files
Matthieu Moy matthieu@grenoble-inp.fr writes: My opinion on this: * It's low priority. I think the order of priority should be (high to low) 1) Finish and get the current series into pu (we're almost there, it's not time to back off and restart something new). 2) Work on the other series that uses the new API. We don't need to change _all_ the uses, but we do need a few tens of them to validate the fact that the new API is nice and convenient to use. 3) Get new actual features for the user (tidy up config files, give error messages based on numbers). You probably won't finish everything, so just think: if the GSoC ends between 1) and 2), how serious is it? if it ends between 2) and 3), how serious? If reverse the order, then the risk of having nothing finished and mergeable at the end is high. If it happens, the community may or may not take over and finish it ... * Still, making sure that the (file, line) is doable later without too much change is good. So, if indeed, moving all code to another file is required, then it may make sense to do it now to avoid code move later. Good thinking. As long as the code is prepared, it is a good idea to start small and bite off only we can chew at once, do things incrementally. 1. config-hash.c had to be shifted to config.c entirely. Why? I guess one reason is the use of struct cf (are there others?), but moving just config_hash_callback config_hash_add_value git_configset_add_file to config.c should do it. Then, config.c could use config-hash.c. I am not sure why you guys needed a new file config-hash.c to begin with, actually. Besides, hash in its name is an implementation detail (what it gives us is a way to grab values for configuration variables from a config set) which we would rather not want to see. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v7 1/2] add `config_set` API for caching config-like files
On 7/9/2014 7:49 PM, Matthieu Moy wrote: Tanay Abhra tanay...@gmail.com writes: My opinion on this: * It's low priority. I think the order of priority should be (high to low) 1) Finish and get the current series into pu (we're almost there, it's not time to back off and restart something new). Noted. I will send the revised series tomorrow ASAP. 2) Work on the other series that uses the new API. We don't need to change _all_ the uses, but we do need a few tens of them to validate the fact that the new API is nice and convenient to use. Okay, I have updated the series and will send the new one by friday. Still, I am aiming to rewrite at least half of total calls next weeks when (1) is in pu. 3) Get new actual features for the user (tidy up config files, give error messages based on numbers). You probably won't finish everything, so just think: if the GSoC ends between 1) and 2), how serious is it? if it ends between 2) and 3), how serious? If reverse the order, then the risk of having nothing finished and mergeable at the end is high. If it happens, the community may or may not take over and finish it ... Noted, still I want to add that even when GSoC finishes, I won't leave the work unfinished. I had said that I wanted to continue being part of the Git community and I mean that. :) * Still, making sure that the (file, line) is doable later without too much change is good. So, if indeed, moving all code to another file is required, then it may make sense to do it now to avoid code move later. Yes, the only problem I see is that the future readers of config.c might read two versions of the git_config_type helpers and may get confused, as they have similar names as git_config_*() git_config_get_*(). That was the reason in the first place that I moved the code into a new file. 1. config-hash.c had to be shifted to config.c entirely. Why? I guess one reason is the use of struct cf (are there others?), but Nope, just for using struct cf. All old API functions raise error by accessing cf globally for the file name and line number. moving just config_hash_callback config_hash_add_value git_configset_add_file to config.c should do it. Then, config.c could use config-hash.c. But a cleaner way would be to allow the callback to receive the config_file struct without accessing it as a global variable (currently, we have no way to parse two config files in parallel for example). In practice, it should be possible to pass a 4th pointer argument to the callback, and keep compatibility with 3 arguments callback (having too many arguments in not a problem will all ABI I know), but I'm don't think this is allowed in theory. On overall, I'm not convinced we should move the code. When the argument I need to merge these two things otherwise it doesn't compile comes in a discussion, it usually means there's an architecture issue somewhere ;-). I have to decide on what to do next on moving the contents to config.c or not. Seeing Junio's comments on the topic seems that he wasn't convinced in the first place that we needed a new file. What should we do, as I am sending the revised patch tomorrow? The move will be trivial, just cutting and pasting the contents. Other approaches you mentioned are also doable but, after a certain amount of retooling. I am open to any method you think would be best. Cheers, Tanay Abhra. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v7 1/2] add `config_set` API for caching config-like files
Junio C Hamano gits...@pobox.com writes: * Still, making sure that the (file, line) is doable later without too much change is good. So, if indeed, moving all code to another file is required, then it may make sense to do it now to avoid code move later. Good thinking. As long as the code is prepared, it is a good idea to start small and bite off only we can chew at once, do things incrementally. After thinking about it a bit more, I think file, line support needs to be done not as a mere client of the generic, uncached git_config() API that we have had for forever, like the current iteration, but needs to know a bit more about the state the caller of the callback (namely, git_parse_source()), and we obviously do not want to expose that machinery to anybody other than the implementation of the config subsystem (to which the new facility this GSoC project adds belongs to), so in that sense you have to have your code in the same config.c file anyway. It is somewhat dissapointing that this initial implementation was done as a client of the traditional git_config(), by the way. I would have expected that it would be the other way around, in that the traditional callers of git_config() would behefit automatically by having the cache layer below it. But we have to start from somewhere. Perhaps the round after this one can rename the exisiting implementation of git_config() to something else internal to the caching layer and give the existing callers a compatible interface that is backed by this new caching layer in a transparent way. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html