On Wed, Jun 21, 2017 at 1:47 PM, Ævar Arnfjörð Bjarmason <ava...@gmail.com> wrote: > Change the recursion limit for the default die routine from a *very* > low 1 to 1024. This ensures that infinite recursions are broken, but > doesn't lose the meaningful error messages under threaded execution > where threads concurrently start to die. > > The intent of the existing code, as explained in commit > cd163d4b4e ("usage.c: detect recursion in die routines and bail out > immediately", 2012-11-14), is to break infinite recursion in cases > where the die routine itself calls die(), and would thus infinitely > recurse. > > However, doing that very aggressively by immediately printing out > "recursion detected in die handler" if we've already called die() once > means that threaded invocations of git can end up only printing out > the "recursion detected" error, while hiding the meaningful error. > > An example of this is running a threaded grep which dies on execution > against pretty much any repo, git.git will do: > > git grep -P --threads=8 '(*LIMIT_MATCH=1)-?-?-?---$' > > With the current version of git this will print some combination of > multiple PCRE failures that caused the abort and multiple "recursion > detected", some invocations will print out multiple "recursion > detected" errors with no PCRE error at all! > > Before this change, running the above grep command 1000 times against > git.git[1] and taking the top 20 results will on my system yield the > following distribution of actual errors ("E") and recursion > errors ("R"): > > 322 E R > 306 E > 116 E R R > 65 R R > 54 R E > 49 E E > 44 R > 15 E R R R > 9 R R R > 7 R E R > 5 R R E > 3 E R R R R > 2 E E R > 1 R R R R > 1 R R R E > 1 R E R R > > The exact results are obviously random and system-dependent, but this > shows the race condition in this code. Some small part of the time > we're about to print out the actual error ("E") but another thread's > recursion error beats us to it, and sometimes we print out nothing but > the recursion error. > > With this change we get, now with "W" to mean the new warning being > emitted indicating that we've called die() many times: > > 502 E > 160 E W E > 120 E E > 53 E W > 35 E W E E > 34 W E E > 29 W E E E > 16 E E W > 16 E E E > 11 W E E E E > 7 E E W E > 4 W E > 3 W W E E > 2 E W E E E > 1 W W E > 1 W E W E > 1 E W W E E E > 1 E W W E E > 1 E W W E > 1 E W E E W > > Which still sucks a bit, due to a still present race-condition in this > code we're sometimes going to print out several errors still, or > several warnings, or two duplicate errors without the warning. > > But we will never have a case where we completely hide the actual > error as we do now. > > Now, git-grep could make use of the pluggable error facility added in > commit c19a490e37 ("usage: allow pluggable die-recursion checks", > 2013-04-16). There's other threaded code that calls set_die_routine() > or set_die_is_recursing_routine(). > > But this is about fixing the general die() behavior with threading > when we don't have such a custom routine yet. Right now the common > case is not an infinite recursion in the handler, but us losing error > messages by default because we're overly paranoid about our recursion > check. > > So let's just set the recursion limit to a number higher than the > number of threads we're ever likely to spawn. Now we won't lose > errors, and if we have a recursing die handler we'll still die within > microseconds. > > There are race conditions in this code itself, in particular the > "dying" variable is not thread mutexed, so we e.g. won't be dying at > exactly 1024, or for that matter even be able to accurately test > "dying == 2", see the cases where we print out more than one "W" > above. > > But that doesn't really matter, for the recursion guard we just need > to die "soon", not at exactly 1024 calls, and for printing the correct > error and only one warning most of the time in the face of threaded > death this is good enough and a net improvement on the current code. > > 1. for i in {1..1000}; do git grep -P --threads=8 > '(*LIMIT_MATCH=1)-?-?-?---$' 2>&1|perl -pe 's/^fatal: r.*/R/; s/^fatal: > p.*/E/; s/^warning.*/W/' | tr '\n' ' '; echo; done | sort | uniq -c | sort > -nr | head -n 20 > > Signed-off-by: Ævar Arnfjörð Bjarmason <ava...@gmail.com> > ---
Reviewed-by-and-found-no-nits: Stefan Beller <sbel...@google.com> ;) > > This replaces v1 and takes into account the feedback in this thread > (thanks everyone!). > > The commit message is also much improved and includes more rationale > originally in my reply to Stefan in 87podz8v6v....@gmail.com Thanks! Stefan > > usage.c | 18 +++++++++++++++++- > 1 file changed, 17 insertions(+), 1 deletion(-) > > diff --git a/usage.c b/usage.c > index 2f87ca69a8..1ea7df9a20 100644 > --- a/usage.c > +++ b/usage.c > @@ -44,7 +44,23 @@ static void warn_builtin(const char *warn, va_list params) > static int die_is_recursing_builtin(void) > { > static int dying; > - return dying++; > + /* > + * Just an arbitrary number X where "a < x < b" where "a" is > + * "maximum number of pthreads we'll ever plausibly spawn" and > + * "b" is "something less than Inf", since the point is to > + * prevent infinite recursion. > + */ > + static const int recursion_limit = 1024; > + > + dying++; > + if (dying > recursion_limit) { > + return 1; > + } else if (dying == 2) { > + warning("die() called many times. Recursion error or racy > threaded death!"); > + return 0; > + } else { > + return 0; > + } > } > > /* If we are in a dlopen()ed .so write to a global variable would segfault > -- > 2.13.1.611.g7e3b11ae1 >