On Mon, Nov 06 2017, Jeff King jotted:

> On Mon, Nov 06, 2017 at 12:50:45PM +0100, Ævar Arnfjörð Bjarmason wrote:
>
>> Some replies to the thread in general, didn't want to spread this out
>> into different replies.
>>
>>  * Yes this sucks.
>>
>>  * Just emitting a warning without an appropriate exit code would suck
>>    more, would break batch jobs & whatnot that expcept certain results
>>    from grep.
>
> That was my first thought, too, but something that does:
>
>   git grep foo && echo found
>
> would behave basically the same. Do you mean here scripts that actually
> do:
>
>   git grep foo
>   case "$?" in
>   0) echo found ;;
>   1) echo not found ;;
>   *) echo wtf? ;;
>   esac
>
> I agree it would be nice to at least have _some_ way to distinguish
> between those final two cases.

Maybe we should emit a different exit code, but I just had in mind that
we could continue but the same non-zero as when the grep fails now, but
with an additional warning.

> Though something like "git log --grep" is even more complicated. We
> perhaps _would_ want to distinguish between:
>
>   - match (in which case we print the commit)
>
>   - no match (in which case we do not)
>
>   - error (in which case we do not print, but also mark the exit code as
>     failing)
>
>>  * As you point out it would be nice to print out the file name we
>>    didn't match on, we'd need to pass the grep_source struct down
>>    further, it goes as far as grep_source_1 but stops there and isn't
>>    passed to e.g. look_ahead(), which calls patmatch() which calls the
>>    engine-specific matcher and would need to report the error. We could
>>    just do this, would slow down things a bit (probably trivally) but we
>>    could emit better error messages in genreal.
>
> I'm not sure if the grep_source has enough information for all cases.
> E.g., if you hit an error while grepping in commit headers, you'd
> probably want to mention the oid of the commit. There's an "identifier"
> field in the grep_source, but it's opaque.
>
> The caller may also want to do more things than just print an error
> (like the exit code adjustment I mentioned above). Which implies to me
> we should be passing the error information up, not trying to bring the
> context down.

Indeed, I was just thinking of the case where we're grepping a file in
the tree, not when the engine is used for --grep et al.

>>  * You can adjust these limts in PCRE in Git, although it doesn't help
>>    in this case, you just add (*LIMIT_MATCH=NUM) or
>>    (*LIMIT_RECURSION=NUM) (or both) to the start of the pattern. See
>>    pcresyntax(3) or pcre2syntax(3) man pages depending on what version
>>    you have installed.
>
> I saw that in the pcre manual, but I had the impression you can't
> actually raise the limits above the defaults with that, only lower them.

You can, you just can't set them to anything less conservative than
limits already set via the C API, but we don't set any of those.

I.e. PCRE allows you to say expose a regex field in a web form (as we do
with gitweb) with really conservative settings that can't be overriden
via a (*LIMIT) set in the pattern.

But since we don't use that C API PCRE runs in a mode where the user
gets set whatever limit they want in the pattern (or other
pattern-altering switch), which makes sense for interactive git-grep
use.

>>  * While regexec() won't return an error its version of dealing with
>>    this is (at least under glibc) to balloon CPU/memory use until the
>>    OOMkiller kills git (although not on this particular pattern).
>
> So in a sense our current behavior with pcre is the same. We just have
> to provoke the death ourselves. ;)

Indeed :)

Reply via email to