Re: [poll / rfc] kdb_stop_cpus
on 03/06/2011 18:13 Andriy Gapon said the following: > > I wonder if anybody uses kdb_stop_cpus with non-default value. I would like to go ahead and remove kdb_stop_cpus tunable/sysctl if nobody objects. > If, yes, I am very interested to learn about your usecase for it. > > I think that the default kdb behavior is the correct one, so it doesn't make > sense > to have a knob to turn on incorrect behavior. > But I may be missing something obvious. > > The comment in the code doesn't really satisfy me: > /* > * Flag indicating whether or not to IPI the other CPUs to stop them on > * entering the debugger. Sometimes, this will result in a deadlock as > * stop_cpus() waits for the other cpus to stop, so we allow it to be > * disabled. In order to maximize the chances of success, use a hard > * stop for that. > */ > > The hard stop should be sufficiently mighty. > Yes, I am aware of supposedly extremely rare situations where a deadlock could > happen even when using hard stop. But I'd rather fix that than have this > switch. > > Oh, the commit message (from 2004) explains it: >> Add a new sysctl, debug.kdb.stop_cpus, which controls whether or not we >> attempt to IPI other cpus when entering the debugger in order to stop >> them while in the debugger. The default remains to issue the stop; >> however, that can result in a hang if another cpu has interrupts disabled >> and is spinning, since the IPI won't be received and the KDB will wait >> indefinitely. We probably need to add a timeout, but this is a useful >> stopgap in the mean time. > > But that was before we started using hard stop in this context (in 2009). > -- Andriy Gapon ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: [poll / rfc] kdb_stop_cpus
on 04/06/2011 11:22 Andriy Gapon said the following: > commit 458ebd9aca7e91fc6e0825c727c7220ab9f61016 > > generic_stop_cpus: move timeout detection code from under DIAGNOSTIC > > ... and also increase it a bit. > IMO it's better to detect and report the (rather serious) condition and > allow a system to proceed somehow rather than be stuck in an endless > loop. > > diff --git a/sys/kern/subr_smp.c b/sys/kern/subr_smp.c > index ae52f4b..4bd766b 100644 > --- a/sys/kern/subr_smp.c > +++ b/sys/kern/subr_smp.c > @@ -232,12 +232,10 @@ generic_stop_cpus(cpumask_t map, u_int type) > /* spin */ > cpu_spinwait(); > i++; > -#ifdef DIAGNOSTIC > - if (i == 10) { > + if (i == 1) { > printf("timeout stopping cpus\n"); > break; > } > -#endif > } > > stopping_cpu = NOCPU; > > I would like to commit the above, if nobody objects. A to do item is adding some code to aid debugging of the timeout condition. I discussed this with Attilio, he doesn't see this as a show-stopper and he plans to add the code at a later time. -- Andriy Gapon ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: [poll / rfc] kdb_stop_cpus
on 04/06/2011 12:11 Robert N. M. Watson said the following: > > On 4 Jun 2011, at 09:22, Andriy Gapon wrote: >> commit 458ebd9aca7e91fc6e0825c727c7220ab9f61016 >> >> generic_stop_cpus: move timeout detection code from under DIAGNOSTIC >> >> ... and also increase it a bit. IMO it's better to detect and report the >> (rather serious) condition and allow a system to proceed somehow rather than >> be stuck in an endless loop. > > Agreed on detecting and reporting. It would be good to confirm that it works > in > practice, however, What is your concern here? :) The code seems rather simple - the loop is no longer infinite. > and also that there are no false positives. I'm not sure > what the best test scenarios are for that. As to the false positives - I think that that can only be verified by practice (very wide testing), because that would greatly depend on hardware. Maybe we should use some time-based approach instead of the iteration count approach or maybe we should calibrate the iteration count based on hardware characteristics... -- Andriy Gapon ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: [poll / rfc] kdb_stop_cpus
on 05/06/2011 01:35 Attilio Rao said the following: > 2011/6/4 Andriy Gapon : >> commit 458ebd9aca7e91fc6e0825c727c7220ab9f61016 >> >>generic_stop_cpus: move timeout detection code from under DIAGNOSTIC >> >>... and also increase it a bit. >>IMO it's better to detect and report the (rather serious) condition and >>allow a system to proceed somehow rather than be stuck in an endless >>loop. >> >> diff --git a/sys/kern/subr_smp.c b/sys/kern/subr_smp.c >> index ae52f4b..4bd766b 100644 >> --- a/sys/kern/subr_smp.c >> +++ b/sys/kern/subr_smp.c >> @@ -232,12 +232,10 @@ generic_stop_cpus(cpumask_t map, u_int type) >>/* spin */ >>cpu_spinwait(); >>i++; >> -#ifdef DIAGNOSTIC >> - if (i == 10) { >> + if (i == 1) { >>printf("timeout stopping cpus\n"); >>break; >>} >> -#endif >>} >> >>stopping_cpu = NOCPU; > > I'd also add the ability, once the deadlock is detected, to break in > KDB, and put that under DIAGNOSTIC. > I had such a patch and I used it to debug some deadlocks on shutdown > code, but now it seems I can't find it anymore. I think that this could be useful. Of course, it would have to honor KDB_UNATTENDED. However, I am not sure how to implement it safely. E.g. panic() should stop other CPUs before setting panicstr and if some CPU is stuck for good, then we would just be recursively calling panic() until triple-fault. Ditto for kdb_trap(). So if you could dig up your code for implementing this that would be useful. -- Andriy Gapon ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: [poll / rfc] kdb_stop_cpus
2011/6/4 Andriy Gapon : > on 03/06/2011 20:57 Robert N. M. Watson said the following: >> >> On 3 Jun 2011, at 16:13, Andriy Gapon wrote: >> >>> I wonder if anybody uses kdb_stop_cpus with non-default value. If, yes, I >>> am very interested to learn about your usecase for it. >> >> The issue that prompted the sysctl was non-NMI IPIs being used to enter the >> debugger or reboot following a core hanging with interrupts disabled. With >> the switch to NMI IPIs in some of those circumstances, life is better -- at >> least, on hardware that supports non-maskable IPIs. I seem to recall sparc64 >> doesn't, however? > > Seems to be so as Nathan has also pointed out for PPC. > For this I also plan the following change: > > commit 458ebd9aca7e91fc6e0825c727c7220ab9f61016 > > generic_stop_cpus: move timeout detection code from under DIAGNOSTIC > > ... and also increase it a bit. > IMO it's better to detect and report the (rather serious) condition and > allow a system to proceed somehow rather than be stuck in an endless > loop. > > diff --git a/sys/kern/subr_smp.c b/sys/kern/subr_smp.c > index ae52f4b..4bd766b 100644 > --- a/sys/kern/subr_smp.c > +++ b/sys/kern/subr_smp.c > @@ -232,12 +232,10 @@ generic_stop_cpus(cpumask_t map, u_int type) > /* spin */ > cpu_spinwait(); > i++; > -#ifdef DIAGNOSTIC > - if (i == 10) { > + if (i == 1) { > printf("timeout stopping cpus\n"); > break; > } > -#endif > } > > stopping_cpu = NOCPU; I'd also add the ability, once the deadlock is detected, to break in KDB, and put that under DIAGNOSTIC. I had such a patch and I used it to debug some deadlocks on shutdown code, but now it seems I can't find it anymore. Attilio -- Peace can only be achieved by understanding - A. Einstein ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: [poll / rfc] kdb_stop_cpus
2011/6/3 Nathan Whitehorn : > On 06/03/11 10:13, Andriy Gapon wrote: >> >> I wonder if anybody uses kdb_stop_cpus with non-default value. >> If, yes, I am very interested to learn about your usecase for it. >> >> I think that the default kdb behavior is the correct one, so it doesn't >> make sense >> to have a knob to turn on incorrect behavior. >> But I may be missing something obvious. >> >> The comment in the code doesn't really satisfy me: >> /* >> * Flag indicating whether or not to IPI the other CPUs to stop them on >> * entering the debugger. Sometimes, this will result in a deadlock as >> * stop_cpus() waits for the other cpus to stop, so we allow it to be >> * disabled. In order to maximize the chances of success, use a hard >> * stop for that. >> */ >> >> The hard stop should be sufficiently mighty. >> Yes, I am aware of supposedly extremely rare situations where a deadlock >> could >> happen even when using hard stop. But I'd rather fix that than have this >> switch. >> >> Oh, the commit message (from 2004) explains it: >>> >>> Add a new sysctl, debug.kdb.stop_cpus, which controls whether or not we >>> attempt to IPI other cpus when entering the debugger in order to stop >>> them while in the debugger. The default remains to issue the stop; >>> however, that can result in a hang if another cpu has interrupts disabled >>> and is spinning, since the IPI won't be received and the KDB will wait >>> indefinitely. We probably need to add a timeout, but this is a useful >>> stopgap in the mean time. >> >> But that was before we started using hard stop in this context (in 2009). > > Some non-x86 platforms (e.g. PPC) don't support real NMIs, and so this still > applies. Well, if I get Andriy's proposal right, he just wants to trim off the possibility to not stop the CPUs on entering KDB. I'm not entirely sure why there is a sysctl for disabling that and I really don't want it. Note that the missing of the NMI/privileged Interrupt is not going to be a factor on this request, unless you are worried a lot by the easy deadlock that a normal stop operation may lead. If that is the case, I think that the upcoming work on skipping locking during KDB/panic entering is going to help a lot for this case. At that point removing the possibility to turn off CPU stopping will be a good idea, IMHO. Attilio -- Peace can only be achieved by understanding - A. Einstein ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: [poll / rfc] kdb_stop_cpus
On 4 Jun 2011, at 09:22, Andriy Gapon wrote: > on 03/06/2011 20:57 Robert N. M. Watson said the following: >> >> On 3 Jun 2011, at 16:13, Andriy Gapon wrote: >> >>> I wonder if anybody uses kdb_stop_cpus with non-default value. If, yes, I >>> am very interested to learn about your usecase for it. >> >> The issue that prompted the sysctl was non-NMI IPIs being used to enter the >> debugger or reboot following a core hanging with interrupts disabled. With >> the switch to NMI IPIs in some of those circumstances, life is better -- at >> least, on hardware that supports non-maskable IPIs. I seem to recall sparc64 >> doesn't, however? > > Seems to be so as Nathan has also pointed out for PPC. > For this I also plan the following change: > > commit 458ebd9aca7e91fc6e0825c727c7220ab9f61016 > >generic_stop_cpus: move timeout detection code from under DIAGNOSTIC > >... and also increase it a bit. >IMO it's better to detect and report the (rather serious) condition and >allow a system to proceed somehow rather than be stuck in an endless >loop. Agreed on detecting and reporting. It would be good to confirm that it works in practice, however, and also that there are no false positives. I'm not sure what the best test scenarios are for that. Robert ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: [poll / rfc] kdb_stop_cpus
on 03/06/2011 20:57 Robert N. M. Watson said the following: > > On 3 Jun 2011, at 16:13, Andriy Gapon wrote: > >> I wonder if anybody uses kdb_stop_cpus with non-default value. If, yes, I >> am very interested to learn about your usecase for it. > > The issue that prompted the sysctl was non-NMI IPIs being used to enter the > debugger or reboot following a core hanging with interrupts disabled. With > the switch to NMI IPIs in some of those circumstances, life is better -- at > least, on hardware that supports non-maskable IPIs. I seem to recall sparc64 > doesn't, however? Seems to be so as Nathan has also pointed out for PPC. For this I also plan the following change: commit 458ebd9aca7e91fc6e0825c727c7220ab9f61016 generic_stop_cpus: move timeout detection code from under DIAGNOSTIC ... and also increase it a bit. IMO it's better to detect and report the (rather serious) condition and allow a system to proceed somehow rather than be stuck in an endless loop. diff --git a/sys/kern/subr_smp.c b/sys/kern/subr_smp.c index ae52f4b..4bd766b 100644 --- a/sys/kern/subr_smp.c +++ b/sys/kern/subr_smp.c @@ -232,12 +232,10 @@ generic_stop_cpus(cpumask_t map, u_int type) /* spin */ cpu_spinwait(); i++; -#ifdef DIAGNOSTIC - if (i == 10) { + if (i == 1) { printf("timeout stopping cpus\n"); break; } -#endif } stopping_cpu = NOCPU; > Not sure about MIPS, etc. Attilio has since significantly > improved our shutdown behaviour -- initially, the switch to NMI IPIs broke > other things (because certain IPIs then improperly preempted threads holding > spinlocks), but that pretty much all seems worked out now. -- Andriy Gapon ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: [poll / rfc] kdb_stop_cpus
On 3 Jun 2011, at 16:13, Andriy Gapon wrote: > I wonder if anybody uses kdb_stop_cpus with non-default value. > If, yes, I am very interested to learn about your usecase for it. The issue that prompted the sysctl was non-NMI IPIs being used to enter the debugger or reboot following a core hanging with interrupts disabled. With the switch to NMI IPIs in some of those circumstances, life is better -- at least, on hardware that supports non-maskable IPIs. I seem to recall sparc64 doesn't, however? Not sure about MIPS, etc. Attilio has since significantly improved our shutdown behaviour -- initially, the switch to NMI IPIs broke other things (because certain IPIs then improperly preempted threads holding spinlocks), but that pretty much all seems worked out now. Robert > > I think that the default kdb behavior is the correct one, so it doesn't make > sense > to have a knob to turn on incorrect behavior. > But I may be missing something obvious. > > The comment in the code doesn't really satisfy me: > /* > * Flag indicating whether or not to IPI the other CPUs to stop them on > * entering the debugger. Sometimes, this will result in a deadlock as > * stop_cpus() waits for the other cpus to stop, so we allow it to be > * disabled. In order to maximize the chances of success, use a hard > * stop for that. > */ > > The hard stop should be sufficiently mighty. > Yes, I am aware of supposedly extremely rare situations where a deadlock could > happen even when using hard stop. But I'd rather fix that than have this > switch. > > Oh, the commit message (from 2004) explains it: >> Add a new sysctl, debug.kdb.stop_cpus, which controls whether or not we >> attempt to IPI other cpus when entering the debugger in order to stop >> them while in the debugger. The default remains to issue the stop; >> however, that can result in a hang if another cpu has interrupts disabled >> and is spinning, since the IPI won't be received and the KDB will wait >> indefinitely. We probably need to add a timeout, but this is a useful >> stopgap in the mean time. > > But that was before we started using hard stop in this context (in 2009). > > -- > Andriy Gapon ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: [poll / rfc] kdb_stop_cpus
on 03/06/2011 18:28 Nathan Whitehorn said the following: > On 06/03/11 10:13, Andriy Gapon wrote: >> >> I wonder if anybody uses kdb_stop_cpus with non-default value. >> If, yes, I am very interested to learn about your usecase for it. >> >> I think that the default kdb behavior is the correct one, so it doesn't make >> sense >> to have a knob to turn on incorrect behavior. >> But I may be missing something obvious. >> >> The comment in the code doesn't really satisfy me: >> /* >> * Flag indicating whether or not to IPI the other CPUs to stop them on >> * entering the debugger. Sometimes, this will result in a deadlock as >> * stop_cpus() waits for the other cpus to stop, so we allow it to be >> * disabled. In order to maximize the chances of success, use a hard >> * stop for that. >> */ >> >> The hard stop should be sufficiently mighty. >> Yes, I am aware of supposedly extremely rare situations where a deadlock >> could >> happen even when using hard stop. But I'd rather fix that than have this >> switch. >> >> Oh, the commit message (from 2004) explains it: >>> Add a new sysctl, debug.kdb.stop_cpus, which controls whether or not we >>> attempt to IPI other cpus when entering the debugger in order to stop >>> them while in the debugger. The default remains to issue the stop; >>> however, that can result in a hang if another cpu has interrupts disabled >>> and is spinning, since the IPI won't be received and the KDB will wait >>> indefinitely. We probably need to add a timeout, but this is a useful >>> stopgap in the mean time. >> >> But that was before we started using hard stop in this context (in 2009). > > Some non-x86 platforms (e.g. PPC) don't support real NMIs, and so this still > applies. Well, even if it does, there are two things that can be done about that (and, IMO, both are better than the manually controlled knob): - quick and dirty: just let stop_cpus[_hard] timeout; this way good CPUs are stopped and the bad ones are no worse than with kdb_stop_cpus=0. - have a special reserved high priority interrupt, change 'disabling of interrupts' to 'disabling of all interrupts except the special one' by employing various kinds of interrupt priority registers (like it was done for splX stuff); use the special interrupt like an IPI+NMI. What do you think? P.S. I think that the first "quick and dirty" thing should be done anyway, regardless of any other changes and plans. -- Andriy Gapon ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: [poll / rfc] kdb_stop_cpus
On 06/03/11 10:13, Andriy Gapon wrote: I wonder if anybody uses kdb_stop_cpus with non-default value. If, yes, I am very interested to learn about your usecase for it. I think that the default kdb behavior is the correct one, so it doesn't make sense to have a knob to turn on incorrect behavior. But I may be missing something obvious. The comment in the code doesn't really satisfy me: /* * Flag indicating whether or not to IPI the other CPUs to stop them on * entering the debugger. Sometimes, this will result in a deadlock as * stop_cpus() waits for the other cpus to stop, so we allow it to be * disabled. In order to maximize the chances of success, use a hard * stop for that. */ The hard stop should be sufficiently mighty. Yes, I am aware of supposedly extremely rare situations where a deadlock could happen even when using hard stop. But I'd rather fix that than have this switch. Oh, the commit message (from 2004) explains it: Add a new sysctl, debug.kdb.stop_cpus, which controls whether or not we attempt to IPI other cpus when entering the debugger in order to stop them while in the debugger. The default remains to issue the stop; however, that can result in a hang if another cpu has interrupts disabled and is spinning, since the IPI won't be received and the KDB will wait indefinitely. We probably need to add a timeout, but this is a useful stopgap in the mean time. But that was before we started using hard stop in this context (in 2009). Some non-x86 platforms (e.g. PPC) don't support real NMIs, and so this still applies. -Nathan ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"