Re: [poll / rfc] kdb_stop_cpus

2011-06-04 Thread Attilio Rao
2011/6/4 Andriy Gapon :
> on 03/06/2011 20:57 Robert N. M. Watson said the following:
>>
>> On 3 Jun 2011, at 16:13, Andriy Gapon wrote:
>>
>>> I wonder if anybody uses kdb_stop_cpus with non-default value. If, yes, I
>>> am very interested to learn about your usecase for it.
>>
>> The issue that prompted the sysctl was non-NMI IPIs being used to enter the
>> debugger or reboot following a core hanging with interrupts disabled. With
>> the switch to NMI IPIs in some of those circumstances, life is better -- at
>> least, on hardware that supports non-maskable IPIs. I seem to recall sparc64
>> doesn't, however?
>
> Seems to be so as Nathan has also pointed out for PPC.
> For this I also plan the following change:
>
> commit 458ebd9aca7e91fc6e0825c727c7220ab9f61016
>
>    generic_stop_cpus: move timeout detection code from under DIAGNOSTIC
>
>    ... and also increase it a bit.
>    IMO it's better to detect and report the (rather serious) condition and
>    allow a system to proceed somehow rather than be stuck in an endless
>    loop.
>
> diff --git a/sys/kern/subr_smp.c b/sys/kern/subr_smp.c
> index ae52f4b..4bd766b 100644
> --- a/sys/kern/subr_smp.c
> +++ b/sys/kern/subr_smp.c
> @@ -232,12 +232,10 @@ generic_stop_cpus(cpumask_t map, u_int type)
>                /* spin */
>                cpu_spinwait();
>                i++;
> -#ifdef DIAGNOSTIC
> -               if (i == 10) {
> +               if (i == 1) {
>                        printf("timeout stopping cpus\n");
>                        break;
>                }
> -#endif
>        }
>
>        stopping_cpu = NOCPU;

I'd also add the ability, once the deadlock is detected, to break in
KDB, and put that under DIAGNOSTIC.
I had such a patch and I used it to debug some deadlocks on shutdown
code, but now it seems I can't find it anymore.

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: [poll / rfc] kdb_stop_cpus

2011-06-04 Thread Attilio Rao
2011/6/3 Nathan Whitehorn :
> On 06/03/11 10:13, Andriy Gapon wrote:
>>
>> I wonder if anybody uses kdb_stop_cpus with non-default value.
>> If, yes, I am very interested to learn about your usecase for it.
>>
>> I think that the default kdb behavior is the correct one, so it doesn't
>> make sense
>> to have a knob to turn on incorrect behavior.
>> But I may be missing something obvious.
>>
>> The comment in the code doesn't really satisfy me:
>> /*
>>  * Flag indicating whether or not to IPI the other CPUs to stop them on
>>  * entering the debugger.  Sometimes, this will result in a deadlock as
>>  * stop_cpus() waits for the other cpus to stop, so we allow it to be
>>  * disabled.  In order to maximize the chances of success, use a hard
>>  * stop for that.
>>  */
>>
>> The hard stop should be sufficiently mighty.
>> Yes, I am aware of supposedly extremely rare situations where a deadlock
>> could
>> happen even when using hard stop.  But I'd rather fix that than have this
>> switch.
>>
>> Oh, the commit message (from 2004) explains it:
>>>
>>> Add a new sysctl, debug.kdb.stop_cpus, which controls whether or not we
>>> attempt to IPI other cpus when entering the debugger in order to stop
>>> them while in the debugger.  The default remains to issue the stop;
>>> however, that can result in a hang if another cpu has interrupts disabled
>>> and is spinning, since the IPI won't be received and the KDB will wait
>>> indefinitely.  We probably need to add a timeout, but this is a useful
>>> stopgap in the mean time.
>>
>> But that was before we started using hard stop in this context (in 2009).
>
> Some non-x86 platforms (e.g. PPC) don't support real NMIs, and so this still
> applies.

Well, if I get Andriy's proposal right, he just wants to trim off the
possibility to not stop the CPUs on entering KDB. I'm not entirely
sure why there is a sysctl for disabling that and I really don't want
it.

Note that the missing of the NMI/privileged Interrupt is not going to
be a factor on this request, unless you are worried a lot by the easy
deadlock that a normal stop operation may lead.
If that is the case, I think that the upcoming work on skipping
locking during KDB/panic entering is going to help a lot for this
case. At that point removing the possibility to turn off CPU stopping
will be a good idea, IMHO.

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: [poll / rfc] kdb_stop_cpus

2011-06-04 Thread Robert N. M. Watson

On 4 Jun 2011, at 09:22, Andriy Gapon wrote:

> on 03/06/2011 20:57 Robert N. M. Watson said the following:
>> 
>> On 3 Jun 2011, at 16:13, Andriy Gapon wrote:
>> 
>>> I wonder if anybody uses kdb_stop_cpus with non-default value. If, yes, I
>>> am very interested to learn about your usecase for it.
>> 
>> The issue that prompted the sysctl was non-NMI IPIs being used to enter the
>> debugger or reboot following a core hanging with interrupts disabled. With
>> the switch to NMI IPIs in some of those circumstances, life is better -- at
>> least, on hardware that supports non-maskable IPIs. I seem to recall sparc64
>> doesn't, however?
> 
> Seems to be so as Nathan has also pointed out for PPC.
> For this I also plan the following change:
> 
> commit 458ebd9aca7e91fc6e0825c727c7220ab9f61016
> 
>generic_stop_cpus: move timeout detection code from under DIAGNOSTIC
> 
>... and also increase it a bit.
>IMO it's better to detect and report the (rather serious) condition and
>allow a system to proceed somehow rather than be stuck in an endless
>loop.

Agreed on detecting and reporting. It would be good to confirm that it works in 
practice, however, and also that there are no false positives. I'm not sure 
what the best test scenarios are for that.

Robert

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: [poll / rfc] kdb_stop_cpus

2011-06-04 Thread Andriy Gapon
on 03/06/2011 20:57 Robert N. M. Watson said the following:
> 
> On 3 Jun 2011, at 16:13, Andriy Gapon wrote:
> 
>> I wonder if anybody uses kdb_stop_cpus with non-default value. If, yes, I
>> am very interested to learn about your usecase for it.
> 
> The issue that prompted the sysctl was non-NMI IPIs being used to enter the
> debugger or reboot following a core hanging with interrupts disabled. With
> the switch to NMI IPIs in some of those circumstances, life is better -- at
> least, on hardware that supports non-maskable IPIs. I seem to recall sparc64
> doesn't, however?

Seems to be so as Nathan has also pointed out for PPC.
For this I also plan the following change:

commit 458ebd9aca7e91fc6e0825c727c7220ab9f61016

generic_stop_cpus: move timeout detection code from under DIAGNOSTIC

... and also increase it a bit.
IMO it's better to detect and report the (rather serious) condition and
allow a system to proceed somehow rather than be stuck in an endless
loop.

diff --git a/sys/kern/subr_smp.c b/sys/kern/subr_smp.c
index ae52f4b..4bd766b 100644
--- a/sys/kern/subr_smp.c
+++ b/sys/kern/subr_smp.c
@@ -232,12 +232,10 @@ generic_stop_cpus(cpumask_t map, u_int type)
/* spin */
cpu_spinwait();
i++;
-#ifdef DIAGNOSTIC
-   if (i == 10) {
+   if (i == 1) {
printf("timeout stopping cpus\n");
break;
}
-#endif
}

stopping_cpu = NOCPU;


> Not sure about MIPS, etc. Attilio has since significantly
> improved our shutdown behaviour -- initially, the switch to NMI IPIs broke
> other things (because certain IPIs then improperly preempted threads holding
> spinlocks), but that pretty much all seems worked out now.

-- 
Andriy Gapon
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"