Re: git log -S or -G
On Tue, 9 Oct 2018, Ævar Arnfjörð Bjarmason wrote: > > On Tue, Oct 09 2018, Junio C Hamano wrote: > > > Jeff King writes: > > > >> I think that is the best we could do for "-S", though, which is > >> inherently about counting hits. > >> > >> For "-G", we are literally grepping the diff. It does not seem > >> unreasonable to add the ability to grep only "-" or "+" lines, and the > >> interface for that should be pretty straightforward (a tri-state flag to > >> look in remove, added, or both lines). > > > > Yeah, here is a lunchtime hack that hasn't even been compile tested. > > > > diff.c | 4 > > diff.h | 2 ++ > > diffcore-pickaxe.c | 22 -- > > 3 files changed, 26 insertions(+), 2 deletions(-) > > > > diff --git a/diff.c b/diff.c > > index f0c7557b40..d1f2780844 100644 > > --- a/diff.c > > +++ b/diff.c > > @@ -5068,6 +5068,10 @@ int diff_opt_parse(struct diff_options *options, > > } > > else if (!strcmp(arg, "--pickaxe-all")) > > options->pickaxe_opts |= DIFF_PICKAXE_ALL; > > + else if (!strcmp(arg, "--pickaxe-ignore-add")) > > + options->pickaxe_opts |= DIFF_PICKAXE_IGNORE_ADD; > > + else if (!strcmp(arg, "--pickaxe-ignore-del")) > > + options->pickaxe_opts |= DIFF_PICKAXE_IGNORE_DEL; > > else if (!strcmp(arg, "--pickaxe-regex")) > > options->pickaxe_opts |= DIFF_PICKAXE_REGEX; > > else if ((argcount = short_opt('O', av, ))) { > > diff --git a/diff.h b/diff.h > > index a30cc35ec3..147c47ace7 100644 > > --- a/diff.h > > +++ b/diff.h > > @@ -358,6 +358,8 @@ int git_config_rename(const char *var, const char > > *value); > > DIFF_PICKAXE_KIND_OBJFIND) > > > > #define DIFF_PICKAXE_IGNORE_CASE 32 > > +#define DIFF_PICKAXE_IGNORE_ADD64 > > +#define DIFF_PICKAXE_IGNORE_DEL 128 > > > > void diffcore_std(struct diff_options *); > > void diffcore_fix_diff_index(struct diff_options *); > > diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c > > index 800a899c86..826dde6bd4 100644 > > --- a/diffcore-pickaxe.c > > +++ b/diffcore-pickaxe.c > > @@ -16,6 +16,7 @@ typedef int (*pickaxe_fn)(mmfile_t *one, mmfile_t *two, > > > > struct diffgrep_cb { > > regex_t *regexp; > > + struct diff_options *diff_options; > > int hit; > > }; > > > > @@ -23,9 +24,14 @@ static void diffgrep_consume(void *priv, char *line, > > unsigned long len) > > { > > struct diffgrep_cb *data = priv; > > regmatch_t regmatch; > > + unsigned pickaxe_opts = data->diff_options->pickaxe_opts; > > > > if (line[0] != '+' && line[0] != '-') > > return; > > + if ((pickaxe_opts & DIFF_PICKAXE_IGNORE_ADD) && line[0] == '+') > > + return; > > + if ((pickaxe_opts & DIFF_PICKAXE_IGNORE_DEL) && line[0] == '-') > > + return; > > if (data->hit) > > /* > > Looks good, but I wonder if a more general version of this couldn't be > that instead of returning early if the line doesn't start with +/- > above, we have an option to skip that early return. > > Then you can simply specify a regex that starts by matching a + or - at > the start of the line, but you also get the poweruser tool of matching > lines around those lines, as tweaked by the -U option. I.e. this: > > diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c > index 800a899c86..90625a110c 100644 > --- a/diffcore-pickaxe.c > +++ b/diffcore-pickaxe.c > @@ -24,15 +24,13 @@ static void diffgrep_consume(void *priv, char *line, > unsigned long len) > struct diffgrep_cb *data = priv; > regmatch_t regmatch; > > - if (line[0] != '+' && line[0] != '-') > - return; > if (data->hit) > /* > * NEEDSWORK: we should have a way to terminate the > * caller early. > */ > return; > - data->hit = !regexec_buf(data->regexp, line + 1, len - 1, 1, > + data->hit = !regexec_buf(data->regexp, line, len, 1, > , 0); > } > > That patch obviously breaks existing -G, so it would need to be > optional, but allows me e.g. on git.git to do: > > ~/g/git/git --exec-path=/home/avar/g/git log -G'^ .*marc\.info' -p -U2 -- > README.md > > To find a change whose first line of context is a line mentioning > marc.info, and then I can use -G'^\+' to find added lines matching > etc. Do -G's accumulate? I had the impression that only the last one was taken into account, but I didn't check the code on that. julia > > Then the --pickaxe-ignore-add and --pickaxe-ignore-del options in your > patch could just be implemented in terms of that feature, i.e. by > implicitly adding a "^-" or "^\+" to the beginning of the regex, > respectively, and implicitly turning on a new --pickaxe-raw-lines or > whatever we'd call it. > > > * NEEDSWORK: we should have a way to terminate the > > @@ -45,13 +51,20 @@ static int diff_grep(mmfile_t *one,
Re: git log -S or -G
On Tue, Oct 09 2018, Junio C Hamano wrote: > Jeff King writes: > >> I think that is the best we could do for "-S", though, which is >> inherently about counting hits. >> >> For "-G", we are literally grepping the diff. It does not seem >> unreasonable to add the ability to grep only "-" or "+" lines, and the >> interface for that should be pretty straightforward (a tri-state flag to >> look in remove, added, or both lines). > > Yeah, here is a lunchtime hack that hasn't even been compile tested. > > diff.c | 4 > diff.h | 2 ++ > diffcore-pickaxe.c | 22 -- > 3 files changed, 26 insertions(+), 2 deletions(-) > > diff --git a/diff.c b/diff.c > index f0c7557b40..d1f2780844 100644 > --- a/diff.c > +++ b/diff.c > @@ -5068,6 +5068,10 @@ int diff_opt_parse(struct diff_options *options, > } > else if (!strcmp(arg, "--pickaxe-all")) > options->pickaxe_opts |= DIFF_PICKAXE_ALL; > + else if (!strcmp(arg, "--pickaxe-ignore-add")) > + options->pickaxe_opts |= DIFF_PICKAXE_IGNORE_ADD; > + else if (!strcmp(arg, "--pickaxe-ignore-del")) > + options->pickaxe_opts |= DIFF_PICKAXE_IGNORE_DEL; > else if (!strcmp(arg, "--pickaxe-regex")) > options->pickaxe_opts |= DIFF_PICKAXE_REGEX; > else if ((argcount = short_opt('O', av, ))) { > diff --git a/diff.h b/diff.h > index a30cc35ec3..147c47ace7 100644 > --- a/diff.h > +++ b/diff.h > @@ -358,6 +358,8 @@ int git_config_rename(const char *var, const char *value); >DIFF_PICKAXE_KIND_OBJFIND) > > #define DIFF_PICKAXE_IGNORE_CASE 32 > +#define DIFF_PICKAXE_IGNORE_ADD 64 > +#define DIFF_PICKAXE_IGNORE_DEL 128 > > void diffcore_std(struct diff_options *); > void diffcore_fix_diff_index(struct diff_options *); > diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c > index 800a899c86..826dde6bd4 100644 > --- a/diffcore-pickaxe.c > +++ b/diffcore-pickaxe.c > @@ -16,6 +16,7 @@ typedef int (*pickaxe_fn)(mmfile_t *one, mmfile_t *two, > > struct diffgrep_cb { > regex_t *regexp; > + struct diff_options *diff_options; > int hit; > }; > > @@ -23,9 +24,14 @@ static void diffgrep_consume(void *priv, char *line, > unsigned long len) > { > struct diffgrep_cb *data = priv; > regmatch_t regmatch; > + unsigned pickaxe_opts = data->diff_options->pickaxe_opts; > > if (line[0] != '+' && line[0] != '-') > return; > + if ((pickaxe_opts & DIFF_PICKAXE_IGNORE_ADD) && line[0] == '+') > + return; > + if ((pickaxe_opts & DIFF_PICKAXE_IGNORE_DEL) && line[0] == '-') > + return; > if (data->hit) > /* Looks good, but I wonder if a more general version of this couldn't be that instead of returning early if the line doesn't start with +/- above, we have an option to skip that early return. Then you can simply specify a regex that starts by matching a + or - at the start of the line, but you also get the poweruser tool of matching lines around those lines, as tweaked by the -U option. I.e. this: diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c index 800a899c86..90625a110c 100644 --- a/diffcore-pickaxe.c +++ b/diffcore-pickaxe.c @@ -24,15 +24,13 @@ static void diffgrep_consume(void *priv, char *line, unsigned long len) struct diffgrep_cb *data = priv; regmatch_t regmatch; - if (line[0] != '+' && line[0] != '-') - return; if (data->hit) /* * NEEDSWORK: we should have a way to terminate the * caller early. */ return; - data->hit = !regexec_buf(data->regexp, line + 1, len - 1, 1, + data->hit = !regexec_buf(data->regexp, line, len, 1, , 0); } That patch obviously breaks existing -G, so it would need to be optional, but allows me e.g. on git.git to do: ~/g/git/git --exec-path=/home/avar/g/git log -G'^ .*marc\.info' -p -U2 -- README.md To find a change whose first line of context is a line mentioning marc.info, and then I can use -G'^\+' to find added lines matching etc. Then the --pickaxe-ignore-add and --pickaxe-ignore-del options in your patch could just be implemented in terms of that feature, i.e. by implicitly adding a "^-" or "^\+" to the beginning of the regex, respectively, and implicitly turning on a new --pickaxe-raw-lines or whatever we'd call it. >* NEEDSWORK: we should have a way to terminate the > @@ -45,13 +51,20 @@ static int diff_grep(mmfile_t *one, mmfile_t *two, > xpparam_t xpp; > xdemitconf_t xecfg; > > - if (!one) > + if (!one) { > + if (o->pickaxe_opts & DIFF_PICKAXE_IGNORE_ADD) > + return 0; > return !regexec_buf(regexp, two->ptr, two->size, > 1, , 0); > - if (!two) > + } > + if (!two) { > +
Re: git log -S or -G
On Mon, 8 Oct 2018, Jacob Keller wrote: > On Mon, Oct 8, 2018 at 8:22 PM Jeff King wrote: > > > > On Tue, Oct 09, 2018 at 08:09:32AM +0900, Junio C Hamano wrote: > > > > > Julia Lawall writes: > > > > > > >> Doing the same for -S is much harder at the machinery level, as it > > > >> performs its thing without internally running "diff" twice, but just > > > >> counts the number of occurrences of 'foo'---that is sufficient for > > > >> its intended use, and more efficient. > > > > > > > > There is still the question of whether the number of occurrences of foo > > > > decreases or increases. > > > > > > Hmph, taking the changes that makes the number of hits decrease > > > would catch a subset of "changes that removes 'foo' only---I am not > > > interested in the ones that adds 'foo'". It will avoid getting > > > confused by a change that moves an existing 'foo' to another place > > > in the same file (as the number of hits does not change), but at the > > > same time, it will miss a change that genuinely removes an existing > > > 'foo' and happens to add a 'foo' at a different place in the same > > > file that is unrelated to the original 'foo'. Depending on the > > > definition of "I am only interested in removed ones", that may or > > > may not be acceptable. > > > > I think that is the best we could do for "-S", though, which is > > inherently about counting hits. > > > > For "-G", we are literally grepping the diff. It does not seem > > unreasonable to add the ability to grep only "-" or "+" lines, and the > > interface for that should be pretty straightforward (a tri-state flag to > > look in remove, added, or both lines). > > > > -Peff > > Yea. I know I've wanted something like this in the past. It could also be nice to be able to specify multiple patterns, with and and or between them. So -A&-B would be remove A somewhere and remove B somewhere. julia
Re: git log -S or -G
Jeff King writes: > I think that is the best we could do for "-S", though, which is > inherently about counting hits. > > For "-G", we are literally grepping the diff. It does not seem > unreasonable to add the ability to grep only "-" or "+" lines, and the > interface for that should be pretty straightforward (a tri-state flag to > look in remove, added, or both lines). Yeah, here is a lunchtime hack that hasn't even been compile tested. diff.c | 4 diff.h | 2 ++ diffcore-pickaxe.c | 22 -- 3 files changed, 26 insertions(+), 2 deletions(-) diff --git a/diff.c b/diff.c index f0c7557b40..d1f2780844 100644 --- a/diff.c +++ b/diff.c @@ -5068,6 +5068,10 @@ int diff_opt_parse(struct diff_options *options, } else if (!strcmp(arg, "--pickaxe-all")) options->pickaxe_opts |= DIFF_PICKAXE_ALL; + else if (!strcmp(arg, "--pickaxe-ignore-add")) + options->pickaxe_opts |= DIFF_PICKAXE_IGNORE_ADD; + else if (!strcmp(arg, "--pickaxe-ignore-del")) + options->pickaxe_opts |= DIFF_PICKAXE_IGNORE_DEL; else if (!strcmp(arg, "--pickaxe-regex")) options->pickaxe_opts |= DIFF_PICKAXE_REGEX; else if ((argcount = short_opt('O', av, ))) { diff --git a/diff.h b/diff.h index a30cc35ec3..147c47ace7 100644 --- a/diff.h +++ b/diff.h @@ -358,6 +358,8 @@ int git_config_rename(const char *var, const char *value); DIFF_PICKAXE_KIND_OBJFIND) #define DIFF_PICKAXE_IGNORE_CASE 32 +#define DIFF_PICKAXE_IGNORE_ADD64 +#define DIFF_PICKAXE_IGNORE_DEL 128 void diffcore_std(struct diff_options *); void diffcore_fix_diff_index(struct diff_options *); diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c index 800a899c86..826dde6bd4 100644 --- a/diffcore-pickaxe.c +++ b/diffcore-pickaxe.c @@ -16,6 +16,7 @@ typedef int (*pickaxe_fn)(mmfile_t *one, mmfile_t *two, struct diffgrep_cb { regex_t *regexp; + struct diff_options *diff_options; int hit; }; @@ -23,9 +24,14 @@ static void diffgrep_consume(void *priv, char *line, unsigned long len) { struct diffgrep_cb *data = priv; regmatch_t regmatch; + unsigned pickaxe_opts = data->diff_options->pickaxe_opts; if (line[0] != '+' && line[0] != '-') return; + if ((pickaxe_opts & DIFF_PICKAXE_IGNORE_ADD) && line[0] == '+') + return; + if ((pickaxe_opts & DIFF_PICKAXE_IGNORE_DEL) && line[0] == '-') + return; if (data->hit) /* * NEEDSWORK: we should have a way to terminate the @@ -45,13 +51,20 @@ static int diff_grep(mmfile_t *one, mmfile_t *two, xpparam_t xpp; xdemitconf_t xecfg; - if (!one) + if (!one) { + if (o->pickaxe_opts & DIFF_PICKAXE_IGNORE_ADD) + return 0; return !regexec_buf(regexp, two->ptr, two->size, 1, , 0); - if (!two) + } + if (!two) { + if (o->pickaxe_opts & DIFF_PICKAXE_IGNORE_DEL) + return 0; return !regexec_buf(regexp, one->ptr, one->size, 1, , 0); + } + ecbdata.diff_options = o; /* * We have both sides; need to run textual diff and see if * the pattern appears on added/deleted lines. @@ -113,6 +126,11 @@ static int has_changes(mmfile_t *one, mmfile_t *two, { unsigned int one_contains = one ? contains(one, regexp, kws) : 0; unsigned int two_contains = two ? contains(two, regexp, kws) : 0; + + if (o->pickaxe_opts & DIFF_PICKAXE_IGNORE_ADD) + return one_contains > two_contains; + if (o->pickaxe_opts & DIFF_PICKAXE_IGNORE_DEL) + return one_contains < two_contains; return one_contains != two_contains; }
Re: git log -S or -G
On Mon, Oct 8, 2018 at 8:22 PM Jeff King wrote: > > On Tue, Oct 09, 2018 at 08:09:32AM +0900, Junio C Hamano wrote: > > > Julia Lawall writes: > > > > >> Doing the same for -S is much harder at the machinery level, as it > > >> performs its thing without internally running "diff" twice, but just > > >> counts the number of occurrences of 'foo'---that is sufficient for > > >> its intended use, and more efficient. > > > > > > There is still the question of whether the number of occurrences of foo > > > decreases or increases. > > > > Hmph, taking the changes that makes the number of hits decrease > > would catch a subset of "changes that removes 'foo' only---I am not > > interested in the ones that adds 'foo'". It will avoid getting > > confused by a change that moves an existing 'foo' to another place > > in the same file (as the number of hits does not change), but at the > > same time, it will miss a change that genuinely removes an existing > > 'foo' and happens to add a 'foo' at a different place in the same > > file that is unrelated to the original 'foo'. Depending on the > > definition of "I am only interested in removed ones", that may or > > may not be acceptable. > > I think that is the best we could do for "-S", though, which is > inherently about counting hits. > > For "-G", we are literally grepping the diff. It does not seem > unreasonable to add the ability to grep only "-" or "+" lines, and the > interface for that should be pretty straightforward (a tri-state flag to > look in remove, added, or both lines). > > -Peff Yea. I know I've wanted something like this in the past. Thanks, Jake
Re: git log -S or -G
On Tue, Oct 09, 2018 at 08:09:32AM +0900, Junio C Hamano wrote: > Julia Lawall writes: > > >> Doing the same for -S is much harder at the machinery level, as it > >> performs its thing without internally running "diff" twice, but just > >> counts the number of occurrences of 'foo'---that is sufficient for > >> its intended use, and more efficient. > > > > There is still the question of whether the number of occurrences of foo > > decreases or increases. > > Hmph, taking the changes that makes the number of hits decrease > would catch a subset of "changes that removes 'foo' only---I am not > interested in the ones that adds 'foo'". It will avoid getting > confused by a change that moves an existing 'foo' to another place > in the same file (as the number of hits does not change), but at the > same time, it will miss a change that genuinely removes an existing > 'foo' and happens to add a 'foo' at a different place in the same > file that is unrelated to the original 'foo'. Depending on the > definition of "I am only interested in removed ones", that may or > may not be acceptable. I think that is the best we could do for "-S", though, which is inherently about counting hits. For "-G", we are literally grepping the diff. It does not seem unreasonable to add the ability to grep only "-" or "+" lines, and the interface for that should be pretty straightforward (a tri-state flag to look in remove, added, or both lines). -Peff
Re: git log -S or -G
Julia Lawall writes: >> Doing the same for -S is much harder at the machinery level, as it >> performs its thing without internally running "diff" twice, but just >> counts the number of occurrences of 'foo'---that is sufficient for >> its intended use, and more efficient. > > There is still the question of whether the number of occurrences of foo > decreases or increases. Hmph, taking the changes that makes the number of hits decrease would catch a subset of "changes that removes 'foo' only---I am not interested in the ones that adds 'foo'". It will avoid getting confused by a change that moves an existing 'foo' to another place in the same file (as the number of hits does not change), but at the same time, it will miss a change that genuinely removes an existing 'foo' and happens to add a 'foo' at a different place in the same file that is unrelated to the original 'foo'. Depending on the definition of "I am only interested in removed ones", that may or may not be acceptable.
Re: git log -S or -G
On Sun, 7 Oct 2018, Junio C Hamano wrote: > Ævar Arnfjörð Bjarmason writes: > > > On Sat, Oct 6, 2018 at 5:16 PM Julia Lawall wrote: > >> Git log -S or -G make it possible to find commits that have particular > >> words in the changed lines. Sometimes it would be helpful to search for > >> words in the removed lines or in the added lines specifically. From the > >> implementation, I had the impression that this would be easy to implement. > >> The main question would be how to allow the user to specify what is > >> wanted. > > > > As far as I know this isn't possible. The --diff-filter option is > > similar in spirit, but e.g. adding "foo" and then removing it from an > > existing file will both be covered under --diff-filter=M, so that > > isn't what you're looking for. > > I agree with Julia that UI to the feature is harder than the > machinery to implement the feature to add "I am interested in seeing > a patch that contains a hunk that adds 'foo' but am not interested > in removal" (or vice versa) for -G. You tweak > diffcore-pickaxe.c::diffgrep_consume() and you'are done. > > Doing the same for -S is much harder at the machinery level, as it > performs its thing without internally running "diff" twice, but just > counts the number of occurrences of 'foo'---that is sufficient for > its intended use, and more efficient. There is still the question of whether the number of occurrences of foo decreases or increases. julia
Re: git log -S or -G
Ævar Arnfjörð Bjarmason writes: > On Sat, Oct 6, 2018 at 5:16 PM Julia Lawall wrote: >> Git log -S or -G make it possible to find commits that have particular >> words in the changed lines. Sometimes it would be helpful to search for >> words in the removed lines or in the added lines specifically. From the >> implementation, I had the impression that this would be easy to implement. >> The main question would be how to allow the user to specify what is >> wanted. > > As far as I know this isn't possible. The --diff-filter option is > similar in spirit, but e.g. adding "foo" and then removing it from an > existing file will both be covered under --diff-filter=M, so that > isn't what you're looking for. I agree with Julia that UI to the feature is harder than the machinery to implement the feature to add "I am interested in seeing a patch that contains a hunk that adds 'foo' but am not interested in removal" (or vice versa) for -G. You tweak diffcore-pickaxe.c::diffgrep_consume() and you'are done. Doing the same for -S is much harder at the machinery level, as it performs its thing without internally running "diff" twice, but just counts the number of occurrences of 'foo'---that is sufficient for its intended use, and more efficient.
Re: git log -S or -G
On Sat, Oct 6, 2018 at 5:16 PM Julia Lawall wrote: > Git log -S or -G make it possible to find commits that have particular > words in the changed lines. Sometimes it would be helpful to search for > words in the removed lines or in the added lines specifically. From the > implementation, I had the impression that this would be easy to implement. > The main question would be how to allow the user to specify what is > wanted. As far as I know this isn't possible. The --diff-filter option is similar in spirit, but e.g. adding "foo" and then removing it from an existing file will both be covered under --diff-filter=M, so that isn't what you're looking for.
git log -S or -G
Hello, Git log -S or -G make it possible to find commits that have particular words in the changed lines. Sometimes it would be helpful to search for words in the removed lines or in the added lines specifically. From the implementation, I had the impression that this would be easy to implement. The main question would be how to allow the user to specify what is wanted. thanks, julia