On Mon, Nov 06 2017, Jeff King jotted: > On Sun, Nov 05, 2017 at 10:41:17AM +0100, Дилян Палаузов wrote: > >> I understand that the PCRE's stack can get exhausted for some files, but in >> such cases, git grep shall proceed with the other files, and print at the >> end/stderr for which files the pattern was not applied. Such behaviour >> would be more usefull than the current one. > > Yes, I had a similar thought. It does feel a little funny for us to > basically treat an error as "no match" for non-interactive use, but then > the current behavior works out to be more or less the same (we return an > error code which most shell scripts would interpret as failure). > > IOW, I think something like this is probably the right direction: > > diff --git a/grep.c b/grep.c > index ce6a48e634..2c152e5908 100644 > --- a/grep.c > +++ b/grep.c > @@ -427,7 +427,7 @@ static int pcre1match(struct grep_pat *p, const char > *line, const char *eol, > } > > if (ret < 0 && ret != PCRE_ERROR_NOMATCH) > - die("pcre_exec failed with error code %d", ret); > + warning("pcre_exec failed with error code %d", ret); > if (ret > 0) { > ret = 0; > match->rm_so = ovector[0]; > > but possibly: > > 1. It would be nice to report the filename that we couldn't match on. > But we don't know it at this level of the code (and it might not be > a file at all that we are matching). So probably we'd want to pass > the error much further up the call stack. This is tricky as there > are multiple regex libraries we can use, and the return value gets > normalized to 1/0 for hit/not-hit long before we get as far as > something that knows the filename. > > We might need to do something invasive like adding an extra > parameter to hold the error message, and passing it through the > whole stack. > > 2. We should still try to exit with an exit code other than "1" to > indicate we hit an error besides "no lines were found". > > 3. Other regex libraries might need similar treatment. Probably > pcre2match() needs it. It doesn't look like regexec() can ever > return an error besides REG_NOMATCH. > > -Peff
Some replies to the thread in general, didn't want to spread this out into different replies. * Yes this sucks. * Just emitting a warning without an appropriate exit code would suck more, would break batch jobs & whatnot that expcept certain results from grep. * As you point out it would be nice to print out the file name we didn't match on, we'd need to pass the grep_source struct down further, it goes as far as grep_source_1 but stops there and isn't passed to e.g. look_ahead(), which calls patmatch() which calls the engine-specific matcher and would need to report the error. We could just do this, would slow down things a bit (probably trivally) but we could emit better error messages in genreal. * You can adjust these limts in PCRE in Git, although it doesn't help in this case, you just add (*LIMIT_MATCH=NUM) or (*LIMIT_RECURSION=NUM) (or both) to the start of the pattern. See pcresyntax(3) or pcre2syntax(3) man pages depending on what version you have installed. * While regexec() won't return an error its version of dealing with this is (at least under glibc) to balloon CPU/memory use until the OOMkiller kills git (although not on this particular pattern).