Re: [PATCH v2] die(): stop hiding errors due to overzealous recursion guard

2017-06-24 Thread Junio C Hamano
Jeff King  writes:

>> One case I'd be worried about would be that the race is so bad that
>> die-is-recursing-builtin never returns 0 even once.  Everybody will
>> just say "recursing" and die, without giving any useful information.
>
> I was trying to think how that would happen. If nobody's actually
> recursing indefinitely, then the value in theory peaks at the number of
> threads (modulo the fact that we're modifying a variable from multiple
> threads without any locking; I'm not sure how reasonable it is to assume
> in practice that sheared writes may cause us to lose an increment but
> not to put nonsense into the variable). If they are, then one thread
> may increment it to 1024 before another thread gets a chance to say
> anything. But in that case, the recursion-die is our expected outcome.
>
> Anyway, it might be reasonable to protect the counter with a mutex.
> Like:
> ...
> To be honest, I'm not sure if it's worth giving it much more time,
> though. I'd be fine with Ævar's patch as-is.

The scenario I had in mind was three or more threads dying
simultaneously, each incrementing the dying counter by one, so that
none of them gets to say "called many times, error or racy threaded
death!" because they all observe three (or more).

But I was misreading the code. In that case, as long as dying is
small enough, we'll return 0 and give at least one of the callers a
chance to print the message that came in "err" from its invocation of
die().

So I do not think it is worth worrying about too deeply.

Thanks.


Re: [PATCH v2] die(): stop hiding errors due to overzealous recursion guard

2017-06-24 Thread Jeff King
On Wed, Jun 21, 2017 at 02:32:16PM -0700, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason   writes:
> 
> > So let's just set the recursion limit to a number higher than the
> > number of threads we're ever likely to spawn. Now we won't lose
> > errors, and if we have a recursing die handler we'll still die within
> > microseconds.
> >
> > There are race conditions in this code itself. In particular, the
> > "dying" variable is not protected by a mutex, so we won't necessarily
> > die at exactly 1024 calls, or even be able to test "dying == 2"
> > accurately; see the cases above where we print out more than one "W".
> 
> One case I'd be worried about would be that the race is so bad that
> die-is-recursing-builtin never returns 0 even once.  Everybody will
> just say "recursing" and die, without giving any useful information.

I was trying to think how that would happen. If nobody's actually
recursing indefinitely, then the value in theory peaks at the number of
threads (modulo the fact that we're modifying a variable from multiple
threads without any locking; I'm not sure how reasonable it is to assume
in practice that sheared writes may cause us to lose an increment but
not to put nonsense into the variable). If they are, then one thread
may increment it to 1024 before another thread gets a chance to say
anything. But in that case, the recursion-die is our expected outcome.
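The "lost increment" point can be made concrete with a C11 atomic counter. With an atomic, no increment can be lost. There is also no data race at all: in C11, an unsynchronized read-modify-write of a plain `int` from several threads is undefined behavior, so sheared writes are not the only hazard. Below is a minimal sketch assuming a C11 compiler with `<stdatomic.h>` and POSIX threads. This is an assumption, because git's portability requirements at the time may not have allowed C11 atomics. `guard_counter`, `die_is_recursing_atomic()` and `run_concurrent_deaths()` are hypothetical names. With eight threads each dying once, the counter should end at exactly eight and no thread should be told it is recursing.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-ins; the v2 patch uses a limit of 1024. */
#define RECURSION_LIMIT 1024
#define NR_THREADS 8

static atomic_int guard_counter;

/*
 * die_is_recursing-style check on an atomic counter.  atomic_fetch_add()
 * returns the previous value, so "+ 1" keeps the v2 patch's numbering
 * (the first caller sees 1).
 */
static int die_is_recursing_atomic(void)
{
	int dying = atomic_fetch_add(&guard_counter, 1) + 1;
	return dying > RECURSION_LIMIT;
}

static void *thread_dies(void *result)
{
	/* Each thread "dies" exactly once and records the guard's verdict. */
	*(int *)result = die_is_recursing_atomic();
	return NULL;
}

/* Run NR_THREADS concurrent "deaths"; return how many were told "recursing". */
static int run_concurrent_deaths(void)
{
	pthread_t threads[NR_THREADS];
	int results[NR_THREADS];
	int i, rc, recursing = 0;

	for (i = 0; i < NR_THREADS; i++) {
		rc = pthread_create(&threads[i], NULL, thread_dies, &results[i]);
		assert(rc == 0);
	}
	for (i = 0; i < NR_THREADS; i++) {
		rc = pthread_join(threads[i], NULL);
		assert(rc == 0);
	}
	(void)rc;
	for (i = 0; i < NR_THREADS; i++)
		recursing += results[i];
	return recursing;
}
```

Peff's mutex achieves the same exact count. The atomic only avoids the lock, and it needs a compiler that provides `<stdatomic.h>`.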

Anyway, it might be reasonable to protect the counter with a mutex.
Like:

diff --git a/usage.c b/usage.c
index fc2b31c54b..34fef0f9fa 100644
--- a/usage.c
+++ b/usage.c
@@ -44,9 +44,19 @@ static void warn_builtin(const char *warn, va_list params)
 	vreportf("warning: ", warn, params);
 }
 
+#ifndef NO_PTHREADS
+static pthread_mutex_t recursion_mutex = PTHREAD_MUTEX_INITIALIZER;
+#define recursion_lock() pthread_mutex_lock(&recursion_mutex)
+#define recursion_unlock() pthread_mutex_unlock(&recursion_mutex)
+#else
+#define recursion_lock()
+#define recursion_unlock()
+#endif
+static int recursion_counter;
+
 static int die_is_recursing_builtin(void)
 {
-	static int dying;
+	int dying;
 	/*
 	 * Just an arbitrary number X where "a < x < b" where "a" is
 	 * "maximum number of pthreads we'll ever plausibly spawn" and
@@ -55,7 +65,10 @@ static int die_is_recursing_builtin(void)
 	 */
 	static const int recursion_limit = 1024;
 
-	dying++;
+	recursion_lock();
+	dying = ++recursion_counter;
+	recursion_unlock();
+
 	if (dying > recursion_limit) {
 		return 1;
 	} else if (dying == 2) {

I can't remember if there are problems on Windows with using constant
mutex initializers, though. If so, I guess common-main would have to
initialize it.
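Constant initializers may turn out to be a problem on some platform. In that case, one option is to initialize the mutex at runtime, before any thread can call die(). This is a minimal sketch under that assumption, using POSIX threads. `init_recursion_guard()` and `bump_recursion_counter()` are hypothetical helpers. Something like common-main's `main()` would need to call the init function once, early, while the process is still single-threaded.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t recursion_mutex;	/* set up at runtime, not statically */
static int recursion_counter;

/* Hypothetical: called exactly once, early in main(), before any threads exist. */
static void init_recursion_guard(void)
{
	int rc = pthread_mutex_init(&recursion_mutex, NULL);
	assert(rc == 0);
	(void)rc;
}

/* Bump the shared counter under the lock and return the new value. */
static int bump_recursion_counter(void)
{
	int dying;

	pthread_mutex_lock(&recursion_mutex);
	dying = ++recursion_counter;
	pthread_mutex_unlock(&recursion_mutex);
	return dying;
}
```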

I left the rest of the logic as-is, but if we switched to post-increment:

  dying = recursion_counter++;

then I think the numbers around "dying" would make more sense (e.g.,
"dying == 2" would make more sense to me as "dying == 1" to check that
we were already dying).
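Here is a sketch of how the check might read with the post-increment, so that the first caller sees 0 and "already dying" becomes `dying == 1`. It assumes POSIX threads. It keeps the limit semantics of Ævar's patch, where the 1025th call is the first one treated as recursion; hence the `>=`. The function name is hypothetical. So is `warning_count`, which stands in for git's `warning()` so the behavior is easy to check.

```c
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t recursion_mutex = PTHREAD_MUTEX_INITIALIZER;
static int recursion_counter;
static int warning_count; /* hypothetical stand-in for git's warning() */

static int die_is_recursing_postinc(void)
{
	static const int recursion_limit = 1024;
	int dying;

	pthread_mutex_lock(&recursion_mutex);
	dying = recursion_counter++;	/* die() calls made before this one */
	pthread_mutex_unlock(&recursion_mutex);

	if (dying >= recursion_limit)
		return 1;	/* the 1025th call onwards: assume real recursion */
	if (dying == 1) {
		/*
		 * Exactly one caller can observe 1 under the lock, so only
		 * one thread ever takes this branch.
		 */
		fprintf(stderr, "warning: die() called many times. "
			"Recursion error or racy threaded death!\n");
		warning_count++;
	}
	return 0;
}
```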

To be honest, I'm not sure if it's worth giving it much more time,
though. I'd be fine with Ævar's patch as-is.

-Peff


Re: [PATCH v2] die(): stop hiding errors due to overzealous recursion guard

2017-06-21 Thread Ævar Arnfjörð Bjarmason

On Wed, Jun 21 2017, Morten Welinder jotted:

> If threading is the issue, how do you get meaningful results from
> reading and updating "dying" with no use of atomic types or locks?
> Other than winning the implied race, of course.

Threading isn't the issue. The issue is that we have an overzealous
recursion guard that will demonstrably cause us to lose errors in the
face of threading.

By amending the guard so that in practice we won't trip it so early
that it hides errors (see the empirical results in the commit message),
we solve *that* issue.

Both the current code and the code I'm adding here suffer from race
conditions and non-atomic updates, but for the reasons explained at the
bottom of the commit message, that's OK.

We're not relying on being able to do x++ and have x be 1, 2, 3 etc. in
the face of threading; we're currently just relying on it being larger
than 1, or, with my patch, eventually larger than 1024.
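The difference can be seen without any threads at all, by simulating two threads that each die once, in sequence. In this sketch, `old_guard()` mirrors the pre-patch `return dying++;`, and `new_guard()` mirrors the v2 limit check with the warning left out. Both are renamed copies for side-by-side comparison, not git's functions.

```c
/* Pre-patch logic: any second call, from any thread, counts as recursion. */
static int old_guard(void)
{
	static int dying;
	return dying++;
}

/* v2 logic (warning omitted): only more than 1024 calls count as recursion. */
static int new_guard(void)
{
	static int dying;
	static const int recursion_limit = 1024;

	dying++;
	return dying > recursion_limit;
}
```

Under the old guard, the second caller gets a nonzero answer. That answer is the "recursion detected" message that hides the second thread's real error. Under the new guard, both callers get 0 and can report their own errors.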

It is possible with my patch that we'll never take the "dying == 2" ->
"warning(..)" branch (and empirical results show that happens), but it's
enough for the purposes of the default die handler (which we really
should be overriding if we're doing threading, but sometimes we're lazy)
that it works most of the time, and that we at least don't hide real
errors, which is the issue with it right now.
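A threaded caller might override the default by installing a different check via set_die_is_recursing_routine(). That check could count per thread rather than per process, so one thread dying never looks like recursion in another. This is a minimal sketch assuming a C11 compiler (for `_Thread_local`) and POSIX threads. `die_is_recursing_per_thread()` is a hypothetical function with the same `int (void)` shape as die_is_recursing_builtin(). It is shown standalone rather than registered with git.

```c
#include <assert.h>
#include <pthread.h>

/* One counter per thread; each new thread starts with its own zero. */
static _Thread_local int thread_dying;

/*
 * Hypothetical per-thread check: like the original guard, any second
 * call counts as recursion, but only when it comes from the same thread.
 */
static int die_is_recursing_per_thread(void)
{
	return thread_dying++ > 0;
}

static void *die_once(void *result)
{
	*(int *)result = die_is_recursing_per_thread();
	return NULL;
}

/* Two threads each die once; returns how many were told "recursing". */
static int two_threads_die_once(void)
{
	pthread_t a, b;
	int ra = -1, rb = -1;
	int rc;

	rc = pthread_create(&a, NULL, die_once, &ra);
	assert(rc == 0);
	rc = pthread_create(&b, NULL, die_once, &rb);
	assert(rc == 0);
	(void)rc;
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return ra + rb;
}
```

A per-thread count only catches recursion within a single thread. That is the case commit cd163d4b4e describes: a die routine that itself calls die().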


Re: [PATCH v2] die(): stop hiding errors due to overzealous recursion guard

2017-06-21 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason   writes:

> So let's just set the recursion limit to a number higher than the
> number of threads we're ever likely to spawn. Now we won't lose
> errors, and if we have a recursing die handler we'll still die within
> microseconds.
>
> There are race conditions in this code itself. In particular, the
> "dying" variable is not protected by a mutex, so we won't necessarily
> die at exactly 1024 calls, or even be able to test "dying == 2"
> accurately; see the cases above where we print out more than one "W".

One case I'd be worried about would be that the race is so bad that
die-is-recursing-builtin never returns 0 even once.  Everybody will
just say "recursing" and die, without giving any useful information.

Will queue, as it is nevertheless an improvement over the current
code.

Thanks.

>  usage.c | 18 +++++++++++++++++-
>  1 file changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/usage.c b/usage.c
> index 2f87ca69a8..1ea7df9a20 100644
> --- a/usage.c
> +++ b/usage.c
> @@ -44,7 +44,23 @@ static void warn_builtin(const char *warn, va_list params)
>  static int die_is_recursing_builtin(void)
>  {
>  	static int dying;
> -	return dying++;
> +	/*
> +	 * Just an arbitrary number X where "a < x < b" where "a" is
> +	 * "maximum number of pthreads we'll ever plausibly spawn" and
> +	 * "b" is "something less than Inf", since the point is to
> +	 * prevent infinite recursion.
> +	 */
> +	static const int recursion_limit = 1024;
> +
> +	dying++;
> +	if (dying > recursion_limit) {
> +		return 1;
> +	} else if (dying == 2) {
> +		warning("die() called many times. Recursion error or racy threaded death!");
> +		return 0;
> +	} else {
> +		return 0;
> +	}
>  }
>  
>  /* If we are in a dlopen()ed .so write to a global variable would segfault


Re: [PATCH v2] die(): stop hiding errors due to overzealous recursion guard

2017-06-21 Thread Morten Welinder
If threading is the issue, how do you get meaningful results from
reading and updating "dying" with no use of atomic types or locks?
Other than winning the implied race, of course.

M.


On Wed, Jun 21, 2017 at 4:47 PM, Ævar Arnfjörð Bjarmason
 wrote:
> Change the recursion limit for the default die routine from a *very*
> low 1 to 1024. This ensures that infinite recursions are broken, but
> doesn't lose the meaningful error messages under threaded execution
> where threads concurrently start to die.
>
> The intent of the existing code, as explained in commit
> cd163d4b4e ("usage.c: detect recursion in die routines and bail out
> immediately", 2012-11-14), is to break infinite recursion in cases
> where the die routine itself calls die(), and would thus infinitely
> recurse.
>
> However, doing that very aggressively by immediately printing out
> "recursion detected in die handler" if we've already called die() once
> means that threaded invocations of git can end up only printing out
> the "recursion detected" error, while hiding the meaningful error.
>
> An example of this is running a threaded grep that dies on execution
> against pretty much any repo (git.git will do):
>
> git grep -P --threads=8 '(*LIMIT_MATCH=1)-?-?-?---$'
>
> With the current version of git this will print some combination of
> multiple PCRE failures that caused the abort and multiple "recursion
> detected" errors; some invocations will print out multiple "recursion
> detected" errors with no PCRE error at all!
>
> Before this change, running the above grep command 1000 times against
> git.git[1] and taking the top 20 results will on my system yield the
> following distribution of actual errors ("E") and recursion
> errors ("R"):
>
> 322 E R
> 306 E
> 116 E R R
>  65 R R
>  54 R E
>  49 E E
>  44 R
>  15 E R R R
>   9 R R R
>   7 R E R
>   5 R R E
>   3 E R R R R
>   2 E E R
>   1 R R R R
>   1 R R R E
>   1 R E R R
>
> The exact results are obviously random and system-dependent, but this
> shows the race condition in this code. Some small part of the time
> we're about to print out the actual error ("E") but another thread's
> recursion error beats us to it, and sometimes we print out nothing but
> the recursion error.
>
> With this change we get, now with "W" to mean the new warning being
> emitted indicating that we've called die() many times:
>
> 502 E
> 160 E W E
> 120 E E
>  53 E W
>  35 E W E E
>  34 W E E
>  29 W E E E
>  16 E E W
>  16 E E E
>  11 W E E E E
>   7 E E W E
>   4 W E
>   3 W W E E
>   2 E W E E E
>   1 W W E
>   1 W E W E
>   1 E W W E E E
>   1 E W W E E
>   1 E W W E
>   1 E W E E W
>
> This still sucks a bit: due to a race condition that is still present
> in this code, we're sometimes going to print out several errors, or
> several warnings, or two duplicate errors without the warning.
>
> But we will never have a case where we completely hide the actual
> error as we do now.
>
> Now, git-grep could make use of the pluggable error facility added in
> commit c19a490e37 ("usage: allow pluggable die-recursion checks",
> 2013-04-16). There's other threaded code that calls set_die_routine()
> or set_die_is_recursing_routine().
>
> But this is about fixing the general die() behavior with threading
> when we don't have such a custom routine yet. Right now the common
> case is not an infinite recursion in the handler, but us losing error
> messages by default because we're overly paranoid about our recursion
> check.
>
> So let's just set the recursion limit to a number higher than the
> number of threads we're ever likely to spawn. Now we won't lose
> errors, and if we have a recursing die handler we'll still die within
> microseconds.
>
> There are race conditions in this code itself. In particular, the
> "dying" variable is not protected by a mutex, so we won't necessarily
> die at exactly 1024 calls, or even be able to test "dying == 2"
> accurately; see the cases above where we print out more than one "W".
>
> But that doesn't really matter: for the recursion guard we just need
> to die "soon", not at exactly 1024 calls, and for printing the correct
> error and only one warning most of the time in the face of threaded
> death, this is good enough and a net improvement on the current code.
>
> 1. for i in {1..1000}; do git grep -P --threads=8 '(*LIMIT_MATCH=1)-?-?-?---$' 2>&1|perl -pe 's/^fatal: r.*/R/; s/^fatal: p.*/E/; s/^warning.*/W/' | tr '\n' ' '; echo; done | sort | uniq -c | sort -nr | head -n 20
>
> Signed-off-by: Ævar Arnfjörð Bjarmason 
> ---
>
> This replaces v1 and takes into account the feedback in this thread
> (thanks everyone!).
>
> The commit message is also much improved and includes more rationale
> originally in my reply to Stefan in 87podz8v6v@gmail.com
>
>  usage.c | 18 +++++++++++++++++-
>  1 file changed, 17 insertions(+), 1 deletion(

Re: [PATCH v2] die(): stop hiding errors due to overzealous recursion guard

2017-06-21 Thread Stefan Beller
On Wed, Jun 21, 2017 at 1:47 PM, Ævar Arnfjörð Bjarmason
 wrote:
> ...

Reviewed-by-and-found-no-nits: Stefan Beller 
;)

>
> This replaces v1 and takes into account the feedback in this thread
> (thanks everyone!).
>
> The commit message is also much improved and includes more rationale
> originally in my reply to Stefan in 87podz8v6v@gmail.com

Thanks!
Stefan

>
>  usage.c | 18 +++++++++++++++++-
>  1 file changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/usage.c b/usage.c
> index 2f87ca69a8..1ea7df9a20 100644
> --- a/usage.c
> +++ b/usage.c
> @@ -44,7 +