[second try, now with text format] Thanks a lot for the reviews. Replying to both.
If I send a follow up, I'll fix the commit description and the help string, remove the shorthand -M, write a more sensible test. 2017-11-23 14:24 GMT-05:00 Eric Sunshine <sunsh...@sunshineco.com>: >> diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt >> @@ -127,6 +128,10 @@ OPTIONS >> +-M<num>:: >> +--max-line-len<num>:: >> + Match the pattern only for line shorter or equal to this length. > > This documentation doesn't explain what it means by a line's length. > This implementation seems to take into consideration only the line's > byte count, however, given that displayed lines are normally > tab-expanded, people might intuitively expect that expansion to count > toward the length. A similar question arises for Unicode characters. > > Should this option take tab-expansion and Unicode into account? > Regardless of the answer to that question, the documentation should > make it clear what "line length" means. Excellent question. I don't have an immediate answer. >> diff --git a/grep.c b/grep.c >> @@ -1273,6 +1275,8 @@ static int match_line(struct grep_opt *opt, char *bol, >> char *eol, >> + if (opt->max_line_length > 0 && eol-bol > opt->max_line_length) >> + return 0; > > If the user specifies "-M0", should that error out or at least warn > the user that the value is non-sensical? What about -1, etc.? (These > are UX-related questions; the implementation obviously doesn't care > one way or the other.) Precedent with -A is to ignore the negative value. I don't have a strong opinion. 2017-11-23 20:44 GMT-05:00 Junio C Hamano <gits...@pobox.com>: > > Marc-Antoine Ruel <mar...@chromium.org> writes: > > > This tells git grep to skip files longer than a specified length, > > which is often the result of generators and not actual source files. > > > > ... > > +-M<num>:: > > +--max-line-len<num>:: > > + Match the pattern only for line shorter or equal to this length. > > + > > All the excellent review comments from Eric I agree with. > > With the name of the option and the above end-user facing > description, it is very clear that the only thing this feature does > is to declare that an overlong line does _not_ match when trying to > check against any pattern. > > That is a much clearer definition and description than random new > features people propose here (and kicked back by reviewers, telling > them to make the specification clearer), and I'd commend you for that. > > But it still leaves at least one thing unclear. How should it > interact with "-v"? If we consider an overlong line never matches, > would "git grep -v <pattern>" should include the line in its output? Ah! No idea. :/ > Speaking of the output, it also makes me wonder if the feature > really wants to include an overlong line as a context line when > showing a near-by line that matches the pattern when -A/-B/-C/-W > options are in use. Even though it is clear that it does from the > above description, is it really the best thing the feature can do to > help the end users? > > Which leads me to suspect that this "feature" might not be the ideal > you wanted to achive, but is an approximate substitution that you > found is "good enough" to simulate what the real thing you wanted to > do, especially when I go back and read the justfication in the > proposed log message that talks about "result of generators". > > Isn't it a property of the entire file, not individual lines, if you > find it uninteresting to see reported by "git grep"? I cannot shake > the suspicion that this feature happened to have ended up in this > shape, instead of "ignore a file with a line this long", only > because your starting point was to use "has overlong lines" as the > heuristic for "not interesting", and because "git grep" code is not > structured to first scan the entire file to decide if it is worth > working on it, and it is extra work to restructure the codeflow to > make it so (which you avoided). > > If your real motivation was either > > (1) whether the file has or does not have the pattern for certain > class of files are uninteresting; do not even run "grep" > processing for them; or > > (2) hits or no-hits may be intereseting but output of overlong > lines from certain class of files I do not wish to see; > > then I can think of two alternatives. > > For (1), can't we tell "result of generators" and other files with > pathspec? If so, perhaps a negative pathspec can rescue. e.g. > > git grep <pattern> -- '*.cc' ':!*-autogen.cc' > > For (2), can't we model this after how users can tell "git diff" > that certain paths are not worth computing and showing textual > patches for, which is to Unset the 'diff' attribute? When you have > > *-autogen.cc -diff > > in your .gitattributes, "git diff" would say "Binary files A and B > differ" instead of explaining line-by-line differences in the patch > form. Perhaps we can also have a 'grep' attribute and squelch the > output if it is Unset? > > It is debatable but one could propose extending the use of existing > 'diff' attribute to cover 'grep' too, with an argument that anything > not worth showing patch (i.e. 'diff' attribute is Unset) is not > worth showing grep hits from. Thanks for the thoughtful analysis. My main motivation was (1), thus filtering with a pathspec is much better than trying to work around the issue. The issues raised in the review are significant enough that committing this patch could cause significant issues; I don't know how to resolve handling with -v and how to handle tabs. After further thinking, what I'd like is a smarter version of the git-gs shortcut wrapper that limits the search space on well known extensions but I'd like it to also limit itself to "source-like" files, similar in some ways to the -I flag. So in some ways this could be better served as a git config but I'm not even sure what kind of heuristic would be generic enough to be valuable to a large number of users. As such I'll drop the patch as I don't see a clear path forward with the current one. Thanks, M-A