Re: [PATCH 3/3] grep: recurse in-process using 'struct repository'
On 07/11, Stefan Beller wrote:
> On Tue, Jul 11, 2017 at 3:04 PM, Brandon Williams wrote:
>
> > +	if (repo_submodule_init(&submodule, superproject, path))
> > +		return 0;
>
> What happens if we go through the "return 0", do we rather want to
> print an error ?

Should just indicate that there is no hit in the submodule, but if we
couldn't init the submodule maybe you're right and we should issue a
warning.

> > +	/* add objects to alternates */
> > +	add_to_alternates_memory(submodule.objectdir);
>
> Not trying to make my object series more important than it is... but
> we really don't want to spread this add_to_alternates_memory hack. :/

Nope your object series is definitely important IMO.  As I commented in
my reply to Jonathan, I'm not sure if we want to wait till that becomes
a reality or not.

> I agree with Jacob that a patch with such a diffstat is a joy to review. :)
>
> Thanks,
> Stefan

--
Brandon Williams
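A minimal sketch of the "warn, but still report no hit" behaviour discussed in this exchange. It is not code from the patch: `struct repo_stub`, `stub_submodule_init()` and `grep_submodule_sketch()` are hypothetical stand-ins invented for illustration, and `fprintf(stderr, ...)` stands in for whatever warning helper the real code would use. The only thing taken from the patch is the convention that the init call returns nonzero on failure.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for 'struct repository'; not Git's real type. */
struct repo_stub {
	const char *path;
};

/*
 * Hypothetical stand-in for repo_submodule_init(): like the call in the
 * patch, it returns nonzero on failure.  Treating a NULL or empty path
 * as the failure case is an assumption made purely for illustration.
 */
static int stub_submodule_init(struct repo_stub *sub, const char *path)
{
	if (!path || !*path)
		return -1;
	sub->path = path;
	return 0;
}

/*
 * Returns 1 if the submodule produced a hit and 0 otherwise.  A failed
 * init still counts as "no hit", but it prints a warning and increments
 * *nr_warnings so that the failure is not silent.
 */
int grep_submodule_sketch(const char *path, int pretend_hit, int *nr_warnings)
{
	struct repo_stub submodule;

	assert(nr_warnings);

	if (stub_submodule_init(&submodule, path)) {
		fprintf(stderr, "warning: could not initialize submodule '%s'\n",
			path ? path : "(null)");
		(*nr_warnings)++;
		return 0;
	}

	/* The real code would search the submodule here. */
	return pretend_hit ? 1 : 0;
}
```

Calling `grep_submodule_sketch("", 1, &n)` returns 0 and bumps `n`, while a non-empty path simply passes the pretend hit value through, so callers that only look at the hit count keep working unchanged.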
Re: [PATCH 3/3] grep: recurse in-process using 'struct repository'
On 07/11, Jacob Keller wrote:
> On Tue, Jul 11, 2017 at 3:04 PM, Brandon Williams wrote:
> > Convert grep to use 'struct repository' which enables recursing into
> > submodules to be handled in-process.
> >
> > Signed-off-by: Brandon Williams
> > ---
> >  Documentation/git-grep.txt |   7 -
> >  builtin/grep.c             | 390 +
> >  cache.h                    |   1 -
> >  git.c                      |   2 +-
> >  grep.c                     |  13 --
> >  grep.h                     |   1 -
> >  setup.c                    |  12 +-
> >  7 files changed, 81 insertions(+), 345 deletions(-)
> >
>
> No real indepth comments here, but it's nice to see how much code
> reduction this has enabled!

Yeah overall, with this and the ls-files conversion, I'm really pleased
with how much cleaner the code looks moving to working in-process.

--
Brandon Williams
Re: [PATCH 3/3] grep: recurse in-process using 'struct repository'
On 07/11, Jonathan Nieder wrote:
> Hi,
>
> Brandon Williams wrote:
>
> > Convert grep to use 'struct repository' which enables recursing into
> > submodules to be handled in-process.
>
> \o/
>
> This will be even nicer with the changes described at
> https://public-inbox.org/git/20170706202739.6056-1-sbel...@google.com/.
> Until then, I fear it will cause a regression --- see (*) below.
>
> [...]
> >  Documentation/git-grep.txt |   7 -
> >  builtin/grep.c             | 390 +
> >  cache.h                    |   1 -
> >  git.c                      |   2 +-
> >  grep.c                     |  13 --
> >  grep.h                     |   1 -
> >  setup.c                    |  12 +-
> >  7 files changed, 81 insertions(+), 345 deletions(-)
>
> Yay, tests still pass.
>
> [..]
> > --- a/Documentation/git-grep.txt
> > +++ b/Documentation/git-grep.txt
> > @@ -95,13 +95,6 @@ OPTIONS
> >  	option the prefix of all submodule output will be the name of
> >  	the parent project's object.
> >
> > ---parent-basename ::
> > -	For internal use only.  In order to produce uniform output with the
> > -	--recurse-submodules option, this option can be used to provide the
> > -	basename of a parent's object to a submodule so the submodule
> > -	can prefix its output with the parent's name rather than the SHA1 of
> > -	the submodule.
>
> Being able to get rid of this is a very nice change.
>
> [...]
> > +++ b/builtin/grep.c
> [...]
> > @@ -366,14 +349,10 @@ static int grep_file(struct grep_opt *opt, const char *filename)
> >  {
> >  	struct strbuf buf = STRBUF_INIT;
> >
> > -	if (super_prefix)
> > -		strbuf_addstr(&buf, super_prefix);
> > -	strbuf_addstr(&buf, filename);
> > -
> >  	if (opt->relative && opt->prefix_length) {
> > -		char *name = strbuf_detach(&buf, NULL);
> > -		quote_path_relative(name, opt->prefix, &buf);
> > -		free(name);
> > +		quote_path_relative(filename, opt->prefix, &buf);
> > +	} else {
> > +		strbuf_addstr(&buf, filename);
> >  	}
>
> style micronit: can avoid these braces since both branches are
> single-line.

Didn't realize that with all the deleted lines, I'll fix for the next
version.

>
> [...]
> > @@ -421,284 +400,80 @@ static void run_pager(struct grep_opt *opt, const char *prefix)
> >  	exit(status);
> >  }
> >
> > -static void compile_submodule_options(const struct grep_opt *opt,
> > -				      const char **argv,
> > -				      int cached, int untracked,
> > -				      int opt_exclude, int use_index,
> > -				      int pattern_type_arg)
> > -{
> [...]
> > -	/*
> > -	 * Limit number of threads for child process to use.
> > -	 * This is to prevent potential fork-bomb behavior of git-grep as each
> > -	 * submodule process has its own thread pool.
> > -	 */
> > -	argv_array_pushf(&submodule_options, "--threads=%d",
> > -			 (num_threads + 1) / 2);
>
> Being able to get rid of this is another very nice change.
>
> [...]
> > +	/* add objects to alternates */
> > +	add_to_alternates_memory(submodule.objectdir);
>
> (*) This sets up a single in-memory object store with all the
> processed submodules.  Processed objects are never freed.
> This means that if I run a command like
>
> 	git grep --recurse-submodules -e neverfound HEAD
>
> in a project with many submodules then memory consumption scales in
> the same way as if the project were all one repository.  By contrast,
> without this patch, git is able to take advantage of the implicit
> free() when each child exits to limit its memory usage.
>
> Worse, this increases the number of pack files git has to pay
> attention to the sum of the numbers of pack files in all the
> repositories processed so far.  A single object lookup can take
> O(number of packs * log(number of objects in each pack)) time.  That
> means performance is likely to suffer as the number of submodules
> increases (n^2 performance) even on systems with a lot of memory.
>
> Once the object store is part of the repository struct and freeable,
> those problems go away and this patch becomes a no-brainer.
>
> What should happen until then?  Should this go in "next" so we can get
> experience with it but with care not to let it graduate to "master"?

I agree that this is an issue and that we need to address by having an
object store per repository.  While that is being worked on (by Stefan)
I don't know how long it would take to have it be a reality.  So the
question ends up being do we care more about the state of the code and
cleaning up a lot of 'hacks' that I introduced to get grep working with
submodules, or do we care about the performance more.  I don't know
which is the right answer but I'd personally like to see the hacks I
added to be removed sooner rather than later.  That and I th
Re: [PATCH 3/3] grep: recurse in-process using 'struct repository'
Hi,

Brandon Williams wrote:

> Convert grep to use 'struct repository' which enables recursing into
> submodules to be handled in-process.

\o/

This will be even nicer with the changes described at
https://public-inbox.org/git/20170706202739.6056-1-sbel...@google.com/.
Until then, I fear it will cause a regression --- see (*) below.

[...]
>  Documentation/git-grep.txt |   7 -
>  builtin/grep.c             | 390 +
>  cache.h                    |   1 -
>  git.c                      |   2 +-
>  grep.c                     |  13 --
>  grep.h                     |   1 -
>  setup.c                    |  12 +-
>  7 files changed, 81 insertions(+), 345 deletions(-)

Yay, tests still pass.

[..]
> --- a/Documentation/git-grep.txt
> +++ b/Documentation/git-grep.txt
> @@ -95,13 +95,6 @@ OPTIONS
>  	option the prefix of all submodule output will be the name of
>  	the parent project's object.
>
> ---parent-basename ::
> -	For internal use only.  In order to produce uniform output with the
> -	--recurse-submodules option, this option can be used to provide the
> -	basename of a parent's object to a submodule so the submodule
> -	can prefix its output with the parent's name rather than the SHA1 of
> -	the submodule.

Being able to get rid of this is a very nice change.

[...]
> +++ b/builtin/grep.c
[...]
> @@ -366,14 +349,10 @@ static int grep_file(struct grep_opt *opt, const char *filename)
>  {
>  	struct strbuf buf = STRBUF_INIT;
>
> -	if (super_prefix)
> -		strbuf_addstr(&buf, super_prefix);
> -	strbuf_addstr(&buf, filename);
> -
>  	if (opt->relative && opt->prefix_length) {
> -		char *name = strbuf_detach(&buf, NULL);
> -		quote_path_relative(name, opt->prefix, &buf);
> -		free(name);
> +		quote_path_relative(filename, opt->prefix, &buf);
> +	} else {
> +		strbuf_addstr(&buf, filename);
>  	}

style micronit: can avoid these braces since both branches are
single-line.

[...]
> @@ -421,284 +400,80 @@ static void run_pager(struct grep_opt *opt, const char *prefix)
>  	exit(status);
>  }
>
> -static void compile_submodule_options(const struct grep_opt *opt,
> -				      const char **argv,
> -				      int cached, int untracked,
> -				      int opt_exclude, int use_index,
> -				      int pattern_type_arg)
> -{
[...]
> -	/*
> -	 * Limit number of threads for child process to use.
> -	 * This is to prevent potential fork-bomb behavior of git-grep as each
> -	 * submodule process has its own thread pool.
> -	 */
> -	argv_array_pushf(&submodule_options, "--threads=%d",
> -			 (num_threads + 1) / 2);

Being able to get rid of this is another very nice change.

[...]
> +	/* add objects to alternates */
> +	add_to_alternates_memory(submodule.objectdir);

(*) This sets up a single in-memory object store with all the
processed submodules.  Processed objects are never freed.
This means that if I run a command like

	git grep --recurse-submodules -e neverfound HEAD

in a project with many submodules then memory consumption scales in
the same way as if the project were all one repository.  By contrast,
without this patch, git is able to take advantage of the implicit
free() when each child exits to limit its memory usage.

Worse, this increases the number of pack files git has to pay
attention to the sum of the numbers of pack files in all the
repositories processed so far.  A single object lookup can take
O(number of packs * log(number of objects in each pack)) time.  That
means performance is likely to suffer as the number of submodules
increases (n^2 performance) even on systems with a lot of memory.

Once the object store is part of the repository struct and freeable,
those problems go away and this patch becomes a no-brainer.

What should happen until then?  Should this go in "next" so we can get
experience with it but with care not to let it graduate to "master"?

Aside from those two concerns, this patch looks very good from a quick
skim, though I haven't reviewed it closely line-by-line.  Once we know
how to go forward, I'm happy to look at it again.

Thanks,
Jonathan
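The scaling concern in (*) can be illustrated with a toy cost model. This is only a sketch under simplifying assumptions, not a measurement of Git: every repository is assumed to have the same number of packs and to trigger the same number of object lookups, each lookup is assumed to probe every visible pack (the worst case), and the log factor per pack is treated as a constant and dropped. The function names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Worst-case pack probes when all submodules share one in-memory object
 * store: by the time the i-th submodule is searched, the packs of every
 * submodule processed so far are still visible, so each lookup may
 * probe i * packs_per_repo packs.
 */
uint64_t probes_shared_store(uint64_t nr_submodules,
			     uint64_t packs_per_repo,
			     uint64_t lookups_per_repo)
{
	uint64_t total = 0;
	uint64_t visible_packs = 0;
	uint64_t i;

	for (i = 0; i < nr_submodules; i++) {
		visible_packs += packs_per_repo;
		assert(visible_packs >= packs_per_repo); /* catch wrap-around */
		total += lookups_per_repo * visible_packs;
	}
	return total;
}

/*
 * Worst-case pack probes when each submodule has its own object store
 * that goes away once that submodule has been searched: only its own
 * packs are ever visible.
 */
uint64_t probes_per_repo_store(uint64_t nr_submodules,
			       uint64_t packs_per_repo,
			       uint64_t lookups_per_repo)
{
	return nr_submodules * lookups_per_repo * packs_per_repo;
}
```

With 4 submodules, 2 packs each and 10 lookups each, the shared store gives 10 * 2 * (1 + 2 + 3 + 4) = 200 probes against 4 * 10 * 2 = 80 for per-repository stores. The first figure grows quadratically with the number of submodules and the second linearly, which is the n^2 behaviour described above.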
Re: [PATCH 3/3] grep: recurse in-process using 'struct repository'
On Tue, Jul 11, 2017 at 3:04 PM, Brandon Williams wrote:

> +	if (repo_submodule_init(&submodule, superproject, path))
> +		return 0;

What happens if we go through the "return 0", do we rather want to
print an error ?

> +	/* add objects to alternates */
> +	add_to_alternates_memory(submodule.objectdir);

Not trying to make my object series more important than it is... but
we really don't want to spread this add_to_alternates_memory hack. :/

I agree with Jacob that a patch with such a diffstat is a joy to review. :)

Thanks,
Stefan
Re: [PATCH 3/3] grep: recurse in-process using 'struct repository'
On Tue, Jul 11, 2017 at 3:04 PM, Brandon Williams wrote:
> Convert grep to use 'struct repository' which enables recursing into
> submodules to be handled in-process.
>
> Signed-off-by: Brandon Williams
> ---
>  Documentation/git-grep.txt |   7 -
>  builtin/grep.c             | 390 +
>  cache.h                    |   1 -
>  git.c                      |   2 +-
>  grep.c                     |  13 --
>  grep.h                     |   1 -
>  setup.c                    |  12 +-
>  7 files changed, 81 insertions(+), 345 deletions(-)
>

No real indepth comments here, but it's nice to see how much code
reduction this has enabled!

Thanks,
Jake

> diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
> index 5033483db..720c7850e 100644
> --- a/Documentation/git-grep.txt
> +++ b/Documentation/git-grep.txt
> @@ -95,13 +95,6 @@ OPTIONS
>  	option the prefix of all submodule output will be the name of
>  	the parent project's object.
>
> ---parent-basename ::
> -	For internal use only.  In order to produce uniform output with the
> -	--recurse-submodules option, this option can be used to provide the
> -	basename of a parent's object to a submodule so the submodule
> -	can prefix its output with the parent's name rather than the SHA1 of
> -	the submodule.
> -
>  -a::
>  --text::
>  	Process binary files as if they were text.
> diff --git a/builtin/grep.c b/builtin/grep.c
> index fa351c49f..0b0a8459e 100644
> --- a/builtin/grep.c
> +++ b/builtin/grep.c
> @@ -28,13 +28,7 @@ static char const * const grep_usage[] = {
>  	NULL
>  };
>
> -static const char *super_prefix;
>  static int recurse_submodules;
> -static struct argv_array submodule_options = ARGV_ARRAY_INIT;
> -static const char *parent_basename;
> -
> -static int grep_submodule_launch(struct grep_opt *opt,
> -				 const struct grep_source *gs);
>
>  #define GREP_NUM_THREADS_DEFAULT 8
>  static int num_threads;
> @@ -186,10 +180,7 @@ static void *run(void *arg)
>  			break;
>
>  		opt->output_priv = w;
> -		if (w->source.type == GREP_SOURCE_SUBMODULE)
> -			hit |= grep_submodule_launch(opt, &w->source);
> -		else
> -			hit |= grep_source(opt, &w->source);
> +		hit |= grep_source(opt, &w->source);
>  		grep_source_clear_data(&w->source);
>  		work_done(w);
>  	}
> @@ -327,21 +318,13 @@ static int grep_oid(struct grep_opt *opt, const struct object_id *oid,
>  {
>  	struct strbuf pathbuf = STRBUF_INIT;
>
> -	if (super_prefix) {
> -		strbuf_add(&pathbuf, filename, tree_name_len);
> -		strbuf_addstr(&pathbuf, super_prefix);
> -		strbuf_addstr(&pathbuf, filename + tree_name_len);
> +	if (opt->relative && opt->prefix_length) {
> +		quote_path_relative(filename + tree_name_len, opt->prefix, &pathbuf);
> +		strbuf_insert(&pathbuf, 0, filename, tree_name_len);
>  	} else {
>  		strbuf_addstr(&pathbuf, filename);
>  	}
>
> -	if (opt->relative && opt->prefix_length) {
> -		char *name = strbuf_detach(&pathbuf, NULL);
> -		quote_path_relative(name + tree_name_len, opt->prefix, &pathbuf);
> -		strbuf_insert(&pathbuf, 0, name, tree_name_len);
> -		free(name);
> -	}
> -
>  #ifndef NO_PTHREADS
>  	if (num_threads) {
>  		add_work(opt, GREP_SOURCE_OID, pathbuf.buf, path, oid);
> @@ -366,14 +349,10 @@ static int grep_file(struct grep_opt *opt, const char *filename)
>  {
>  	struct strbuf buf = STRBUF_INIT;
>
> -	if (super_prefix)
> -		strbuf_addstr(&buf, super_prefix);
> -	strbuf_addstr(&buf, filename);
> -
>  	if (opt->relative && opt->prefix_length) {
> -		char *name = strbuf_detach(&buf, NULL);
> -		quote_path_relative(name, opt->prefix, &buf);
> -		free(name);
> +		quote_path_relative(filename, opt->prefix, &buf);
> +	} else {
> +		strbuf_addstr(&buf, filename);
>  	}
>
>  #ifndef NO_PTHREADS
> @@ -421,284 +400,80 @@ static void run_pager(struct grep_opt *opt, const char *prefix)
>  	exit(status);
>  }
>
> -static void compile_submodule_options(const struct grep_opt *opt,
> -				      const char **argv,
> -				      int cached, int untracked,
> -				      int opt_exclude, int use_index,
> -				      int pattern_type_arg)
> -{
> -	struct grep_pat *pattern;
> -
> -	if (recurse_submodules)
> -		argv_array_push(&submodule_options, "--recurse-submodules");
> -
> -	if (cached)
> -		argv_array_push(&submodule_options, "--ca
[PATCH 3/3] grep: recurse in-process using 'struct repository'
Convert grep to use 'struct repository' which enables recursing into
submodules to be handled in-process.

Signed-off-by: Brandon Williams
---
 Documentation/git-grep.txt |   7 -
 builtin/grep.c             | 390 +
 cache.h                    |   1 -
 git.c                      |   2 +-
 grep.c                     |  13 --
 grep.h                     |   1 -
 setup.c                    |  12 +-
 7 files changed, 81 insertions(+), 345 deletions(-)

diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index 5033483db..720c7850e 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -95,13 +95,6 @@ OPTIONS
 	option the prefix of all submodule output will be the name of
 	the parent project's object.
 
---parent-basename ::
-	For internal use only.  In order to produce uniform output with the
-	--recurse-submodules option, this option can be used to provide the
-	basename of a parent's object to a submodule so the submodule
-	can prefix its output with the parent's name rather than the SHA1 of
-	the submodule.
-
 -a::
 --text::
 	Process binary files as if they were text.
diff --git a/builtin/grep.c b/builtin/grep.c
index fa351c49f..0b0a8459e 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -28,13 +28,7 @@ static char const * const grep_usage[] = {
 	NULL
 };
 
-static const char *super_prefix;
 static int recurse_submodules;
-static struct argv_array submodule_options = ARGV_ARRAY_INIT;
-static const char *parent_basename;
-
-static int grep_submodule_launch(struct grep_opt *opt,
-				 const struct grep_source *gs);
 
 #define GREP_NUM_THREADS_DEFAULT 8
 static int num_threads;
@@ -186,10 +180,7 @@ static void *run(void *arg)
 			break;
 
 		opt->output_priv = w;
-		if (w->source.type == GREP_SOURCE_SUBMODULE)
-			hit |= grep_submodule_launch(opt, &w->source);
-		else
-			hit |= grep_source(opt, &w->source);
+		hit |= grep_source(opt, &w->source);
 		grep_source_clear_data(&w->source);
 		work_done(w);
 	}
@@ -327,21 +318,13 @@ static int grep_oid(struct grep_opt *opt, const struct object_id *oid,
 {
 	struct strbuf pathbuf = STRBUF_INIT;
 
-	if (super_prefix) {
-		strbuf_add(&pathbuf, filename, tree_name_len);
-		strbuf_addstr(&pathbuf, super_prefix);
-		strbuf_addstr(&pathbuf, filename + tree_name_len);
+	if (opt->relative && opt->prefix_length) {
+		quote_path_relative(filename + tree_name_len, opt->prefix, &pathbuf);
+		strbuf_insert(&pathbuf, 0, filename, tree_name_len);
 	} else {
 		strbuf_addstr(&pathbuf, filename);
 	}
 
-	if (opt->relative && opt->prefix_length) {
-		char *name = strbuf_detach(&pathbuf, NULL);
-		quote_path_relative(name + tree_name_len, opt->prefix, &pathbuf);
-		strbuf_insert(&pathbuf, 0, name, tree_name_len);
-		free(name);
-	}
-
 #ifndef NO_PTHREADS
 	if (num_threads) {
 		add_work(opt, GREP_SOURCE_OID, pathbuf.buf, path, oid);
@@ -366,14 +349,10 @@ static int grep_file(struct grep_opt *opt, const char *filename)
 {
 	struct strbuf buf = STRBUF_INIT;
 
-	if (super_prefix)
-		strbuf_addstr(&buf, super_prefix);
-	strbuf_addstr(&buf, filename);
-
 	if (opt->relative && opt->prefix_length) {
-		char *name = strbuf_detach(&buf, NULL);
-		quote_path_relative(name, opt->prefix, &buf);
-		free(name);
+		quote_path_relative(filename, opt->prefix, &buf);
+	} else {
+		strbuf_addstr(&buf, filename);
 	}
 
 #ifndef NO_PTHREADS
@@ -421,284 +400,80 @@ static void run_pager(struct grep_opt *opt, const char *prefix)
 	exit(status);
 }
 
-static void compile_submodule_options(const struct grep_opt *opt,
-				      const char **argv,
-				      int cached, int untracked,
-				      int opt_exclude, int use_index,
-				      int pattern_type_arg)
-{
-	struct grep_pat *pattern;
-
-	if (recurse_submodules)
-		argv_array_push(&submodule_options, "--recurse-submodules");
-
-	if (cached)
-		argv_array_push(&submodule_options, "--cached");
-	if (!use_index)
-		argv_array_push(&submodule_options, "--no-index");
-	if (untracked)
-		argv_array_push(&submodule_options, "--untracked");
-	if (opt_exclude > 0)
-		argv_array_push(&submodule_options, "--exclude-standard");
-
-	if (opt->invert)
-		argv_array_push(&submodule_options, "-v");
-	if (opt->ignore_c