On Wed, Jun 21, 2017 at 1:47 PM, Ævar Arnfjörð Bjarmason
<ava...@gmail.com> wrote:
> Change the recursion limit for the default die routine from a *very*
> low 1 to 1024. This ensures that infinite recursions are broken, but
> doesn't lose the meaningful error messages under threaded execution
> where threads concurrently start to die.
>
> The intent of the existing code, as explained in commit
> cd163d4b4e ("usage.c: detect recursion in die routines and bail out
> immediately", 2012-11-14), is to break infinite recursion in cases
> where the die routine itself calls die(), and would thus infinitely
> recurse.
>
> However, doing that very aggressively by immediately printing out
> "recursion detected in die handler" if we've already called die() once
> means that threaded invocations of git can end up only printing out
> the "recursion detected" error, while hiding the meaningful error.
>
> An example of this is running a threaded grep which dies on execution
> against pretty much any repo, git.git will do:
>
>     git grep -P --threads=8 '(*LIMIT_MATCH=1)-?-?-?---$'
>
> With the current version of git this will print some combination of
> multiple PCRE failures that caused the abort and multiple "recursion
> detected", some invocations will print out multiple "recursion
> detected" errors with no PCRE error at all!
>
> Before this change, running the above grep command 1000 times against
> git.git[1] and taking the top 20 results will on my system yield the
> following distribution of actual errors ("E") and recursion
> errors ("R"):
>
>     322 E R
>     306 E
>     116 E R R
>      65 R R
>      54 R E
>      49 E E
>      44 R
>      15 E R R R
>       9 R R R
>       7 R E R
>       5 R R E
>       3 E R R R R
>       2 E E R
>       1 R R R R
>       1 R R R E
>       1 R E R R
>
> The exact results are obviously random and system-dependent, but this
> shows the race condition in this code. Some small part of the time
> we're about to print out the actual error ("E") but another thread's
> recursion error beats us to it, and sometimes we print out nothing but
> the recursion error.
>
> With this change we get, now with "W" to mean the new warning being
> emitted indicating that we've called die() many times:
>
>     502 E
>     160 E W E
>     120 E E
>      53 E W
>      35 E W E E
>      34 W E E
>      29 W E E E
>      16 E E W
>      16 E E E
>      11 W E E E E
>       7 E E W E
>       4 W E
>       3 W W E E
>       2 E W E E E
>       1 W W E
>       1 W E W E
>       1 E W W E E E
>       1 E W W E E
>       1 E W W E
>       1 E W E E W
>
> Which still sucks a bit, due to a still present race-condition in this
> code we're sometimes going to print out several errors still, or
> several warnings, or two duplicate errors without the warning.
>
> But we will never have a case where we completely hide the actual
> error as we do now.
>
> Now, git-grep could make use of the pluggable error facility added in
> commit c19a490e37 ("usage: allow pluggable die-recursion checks",
> 2013-04-16). There's other threaded code that calls set_die_routine()
> or set_die_is_recursing_routine().
>
> But this is about fixing the general die() behavior with threading
> when we don't have such a custom routine yet. Right now the common
> case is not an infinite recursion in the handler, but us losing error
> messages by default because we're overly paranoid about our recursion
> check.
>
> So let's just set the recursion limit to a number higher than the
> number of threads we're ever likely to spawn. Now we won't lose
> errors, and if we have a recursing die handler we'll still die within
> microseconds.
>
> There are race conditions in this code itself, in particular the
> "dying" variable is not thread mutexed, so we e.g. won't be dying at
> exactly 1024, or for that matter even be able to accurately test
> "dying == 2", see the cases where we print out more than one "W"
> above.
>
> But that doesn't really matter, for the recursion guard we just need
> to die "soon", not at exactly 1024 calls, and for printing the correct
> error and only one warning most of the time in the face of threaded
> death this is good enough and a net improvement on the current code.
>
> 1. for i in {1..1000}; do git grep -P --threads=8 
> '(*LIMIT_MATCH=1)-?-?-?---$' 2>&1|perl -pe 's/^fatal: r.*/R/; s/^fatal: 
> p.*/E/; s/^warning.*/W/' | tr '\n' ' '; echo; done | sort | uniq -c | sort 
> -nr | head -n 20
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <ava...@gmail.com>
> ---

Reviewed-by-and-found-no-nits: Stefan Beller <sbel...@google.com>
;)

>
> This replaces v1 and takes into account the feedback in this thread
> (thanks everyone!).
>
> The commit message is also much improved and includes more rationale
> originally in my reply to Stefan in 87podz8v6v....@gmail.com

Thanks!
Stefan

>
>  usage.c | 18 +++++++++++++++++-
>  1 file changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/usage.c b/usage.c
> index 2f87ca69a8..1ea7df9a20 100644
> --- a/usage.c
> +++ b/usage.c
> @@ -44,7 +44,23 @@ static void warn_builtin(const char *warn, va_list params)
>  static int die_is_recursing_builtin(void)
>  {
>         static int dying;
> -       return dying++;
> +       /*
> +        * Just an arbitrary number X where "a < x < b" where "a" is
> +        * "maximum number of pthreads we'll ever plausibly spawn" and
> +        * "b" is "something less than Inf", since the point is to
> +        * prevent infinite recursion.
> +        */
> +       static const int recursion_limit = 1024;
> +
> +       dying++;
> +       if (dying > recursion_limit) {
> +               return 1;
> +       } else if (dying == 2) {
> +               warning("die() called many times. Recursion error or racy 
> threaded death!");
> +               return 0;
> +       } else {
> +               return 0;
> +       }
>  }
>
>  /* If we are in a dlopen()ed .so write to a global variable would segfault
> --
> 2.13.1.611.g7e3b11ae1
>

Reply via email to