On Mon, Jun 19, 2017 at 3:32 PM, Ævar Arnfjörð Bjarmason
<ava...@gmail.com> wrote:
>
> On Mon, Jun 19 2017, Stefan Beller jotted:
>
>>> Now, git-grep could make use of the pluggable error facility added in
>>> commit c19a490e37 ("usage: allow pluggable die-recursion checks",
>>> 2013-04-16).
>>
>> I think we should do that instead (though I have not looked at the downsides
>> of this), because...
>
> It makes sense to do that in addition to what I'm doing here (or if the
> approach I'm taking doesn't make sense, some other patch to fix the
> general issue in the default handler).
>
> I'm going to try to get around to fixing the grep behavior in a
> follow-up patch, this is a fix for the overzealous recursion detection
> in the default handler needlessly causing other issues.
>
>>>
>>> So let's just set the recursion limit to a number higher than the
>>> number of threads we're ever likely to spawn. Now we won't lose
>>> errors, and if we have a recursing die handler we'll still die within
>>> microseconds.
>>
>> how are we handling access to that global variable?
>> Do we need to hold a mutex to be correct? or rather hope that
>> it works across threads, not counting on it, because each thread
>> individually would count up to 1024?
>
> It's not guarded by a mutex and the ++ here and the reads from it are
> both racy.
>
> However, for its stated purpose that's fine, even if we're racily
> incrementing it and losing some updates some will get through, which is
> good enough for an infinite recursion detection. We don't really care if
> we die at exactly 1 or exactly 1024.
>
>> I would prefer if we kept the number as low as "at most
>> one screen of lines".
>
> In practice this is the case in git, because the programs that would
> encounter this are going to be spawning less than screenfull of threads,
> assuming (as is the case) that each thread might print out one error.
>
> The semantics of that aren't changed with this patch, the difference is
> that you're going to get e.g. N repeats of a meaningful error instead of
> N repeats of either the meaningful error OR "recursion detected in die
> handler", depending on your luck.
>
> I.e. in current git (after a few runs to get an unlucky one):
>
>     $ git grep -P --threads=10 
> '(*LIMIT_RECURSION=1)(*LIMIT_MATCH=1)-?-?-?---$'
>     fatal: recursion detected in die handler
>     fatal: recursion detected in die handler
>     fatal: recursion detected in die handler
>
> Or if you're lucky at least one of these will be the actual error:
>
>     $ git grep -P --threads=10 
> '(*LIMIT_RECURSION=1)(*LIMIT_MATCH=1)-?-?-?---$'
>     fatal: recursion detected in die handler
>     fatal: pcre_exec failed with error code -8
>     fatal: recursion detected in die handler
>     fatal: recursion detected in die handler
>     fatal: recursion detected in die handler
>
> But with this change:
>
>     $ ~/g/git/git-grep -P --threads=10 
> '(*LIMIT_RECURSION=1)(*LIMIT_MATCH=1)-?-?-?---$'
>     fatal: pcre2_jit_match failed with error code -47: match limit exceeded
>     fatal: pcre2_jit_match failed with error code -47: match limit exceeded
>     fatal: pcre2_jit_match failed with error code -47: match limit exceeded
>
> (The error message is different because I compiled with PCRE v2 locally,
> instead of the system PCRE v1, but that doesn't matter for this example)
>
> Over 1000 runs thish is how that breaks down on my machine, without this
> patch. I've replaced the recursion error with just "R" and the PCRE
> error with "P", and shown them in descending order by occurrence, lines
> without a "P" only printed out the recursion error:
>
>     $ (for i in {1..1000}; do git grep -P --threads=10 
> '(*LIMIT_RECURSION=1)(*LIMIT_MATCH=1)-?-?-?---$' 2>&1|perl -pe 's/^fatal: 
> r.*/R/; s/^fatal: p.*/P/' | tr '\n' ' '; echo; done)|sort|uniq -c|sort 
> -nr|head -n 10
>         247 R R
>         136 P R R
>         122 P R
>         112 R R R
>         108 R
>          59 R P R
>          54 R P
>          54 P
>          31 P R R R
>          21 R R P
>
> There's a long tail I've omitted there of alterations to that. As this
> shows in >10% of cases we don't print out any meaningful error at
> all. But with this change:
>
>     $ (for i in {1..1000}; do ~/g/git/git-grep -P --threads=10 
> '(*LIMIT_RECURSION=1)(*LIMIT_MATCH=1)-?-?-?---$' 2>&1|perl -pe 's/^fatal: 
> r.*/R/; s/^fatal: p.*/P/' | tr '\n' ' '; echo; done)|sort|uniq -c|sort 
> -nr|head -n 10
>         377 P P
>         358 P P P
>         192 P
>          63 P P P P
>           8 P P P P P
>           2 P P P P P P
>
> We will always show a meaningful error, but may of course do so multiple
> times, which is a subject for a fix in git-grep in particular, but the
> point is again, to fix the general case for the default handler.
>
> Something something sorry about the long mail didn't have time to write
> a shorter one :)
>

Actually this convinced me (and it would be lovely to have such observations
in the commit message).

Thanks,
Stefan

Reply via email to