Re: [patch] serial console vs NMI watchdog
On Mon, 12 Mar 2001 00:27:14 -0800, george anzinger <[EMAIL PROTECTED]> wrote: >Keith Owens wrote: >> kdb uses NMI IPI to get the other cpu's attention. One cpu is in >> control and may or may not be accepting NMI, it depends on the event >> that entered kdb. The other cpus end up in kdb code, spinning waiting >> for a cpu switch. Initially they are not receiving NMI because they >> were invoked via NMI which is masked until they exit. > >Are you actually twiddling the hardware, or just changing what happens >on NMI? No hardware twiddling. One cpu gets an event which triggers kdb, that event may or may not be via NMI. kdb on ix86 then uses NMI to get the attention of the other cpus, even if they are in a disabled spinloop. ix86 hardware will not deliver another NMI to a cpu until the cpu issues iret to return from the NMI handler. Initially all but one cpu is forced into kdb via the NMI handler so no more NMI events will occur on those cpus. The first cpu may or may not be receiving NMI, it depends on how kdb was invoked on the first cpu. To do single stepping of code, kdb allows one cpu out of kdb state so it can execute one instruction at the point it was interrupted. If the cpu was entered via NMI then that means exiting from the NMI handler back to the original code, do single step then back into kdb again. Having exited the NMI handler, that cpu will start receiving NMI events again, even though it is still under kdb control. So some cpus will get NMI, some will not, depending on user actions. kdb uses a software mechanism to selectively disable the NMI watchdog. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] serial console vs NMI watchdog
Keith Owens wrote: > > On Sun, 11 Mar 2001 20:43:16 -0800, > george anzinger <[EMAIL PROTECTED]> wrote: > >Consider this. Why not use the NMI to sync the cpus. Kdb would have a > >function that is called each NMI. > > kdb uses NMI IPI to get the other cpu's attention. One cpu is in > control and may or may not be accepting NMI, it depends on the event > that entered kdb. The other cpus end up in kdb code, spinning waiting > for a cpu switch. Initially they are not receiving NMI because they > were invoked via NMI which is masked until they exit. However if the > user does a cpu switch then single steps the interrupted code, the cpu > has to return from the NMI handler to the interrupted code at which > time this cpu starts receiving NMI again. Are you actually twiddling the hardware, or just changing what happens on NMI? > > The kdb context can change from ignoring NMI to accepting NMI. It is > easier to bring all the cpus into kdb and let the kdb code decide if it > ignores any NMI that is being received. Yes. Exactly. George - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] serial console vs NMI watchdog
On Sun, 11 Mar 2001 20:43:16 -0800, george anzinger <[EMAIL PROTECTED]> wrote: >Consider this. Why not use the NMI to sync the cpus. Kdb would have a >function that is called each NMI. kdb uses NMI IPI to get the other cpu's attention. One cpu is in control and may or may not be accepting NMI, it depends on the event that entered kdb. The other cpus end up in kdb code, spinning waiting for a cpu switch. Initially they are not receiving NMI because they were invoked via NMI which is masked until they exit. However if the user does a cpu switch then single steps the interrupted code, the cpu has to return from the NMI handler to the interrupted code at which time this cpu starts receiving NMI again. The kdb context can change from ignoring NMI to accepting NMI. It is easier to bring all the cpus into kdb and let the kdb code decide if it ignores any NMI that is being received. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] serial console vs NMI watchdog
Keith Owens wrote: > > On Sun, 11 Mar 2001 08:44:24 +0100 (CET), > Ingo Molnar <[EMAIL PROTECTED]> wrote: > >Andrew, > > > >your patch looks too complex, and doesnt cover the case of the serial > >driver deadlocking. Why not add a "touch_nmi_watchdog_counter()" function > >that just changes last_irq_sums instead of adding locking? This way > >deadlocks will be caught in the serial code too. (because touch_nmi() will > >only "postpone" the NMI watchdog lockup event, not disable it.) > > kdb has to completely disable the nmi counter while it is in control. > All interrupts are disabled, all but one cpus are spinning, the control > cpu does busy wait while it polls the input devices. With that model > there is no alternative to a complete disable. > Consider this. Why not use the NMI to sync the cpus. Kdb would have a function that is called each NMI. If it is doing nothing, just return false, else, if waiting for this cpu, well here it is, put it in spin AFTER saving where it came from so the operator can figure out what it is doing. In kgdb I just put the interrupt registers in the task_struct where they are put when a context switch is done. Then the debugger can do a trace, etc. on that task. A global var that the debugger can see is also set to the cpus, "current". If the cpu is already spinning, return to the nmi code with a true flag which will cause it to ignore the nmi. Same thing if it is the cpu that is doing debug i/o. I went to this for kgdb after the system failed to return from the call to force the other cpus to execute a function (which means they have to be alive). For extra safety I also time the sync. If one or more expected cpus, don't show while looping reading the cycle counter, the code just continues with out the sync. George - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] serial console vs NMI watchdog
Keith Owens wrote: > > On Sun, 11 Mar 2001 08:53:40 +0100 (CET), > Ingo Molnar <[EMAIL PROTECTED]> wrote: > >it sure has an alternative. The 'cpus spinning' code calls touch_nmi() > >within the busy loop, the polling code on the control CPU too. This is > >sure more robust and catches lockup bugs in kdb too ... > > Works for me. It even makes kdb simpler. humm... OK, this seems doable in the case of serial console. For CONFIG_LP_CONSOLE (which has the same problem) it looks like we can just call touch_nmi() in lp_console_write(). I'll see what Tim has to say. - - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] serial console vs NMI watchdog
On Sun, 11 Mar 2001, Keith Owens wrote: > kdb has to completely disable the nmi counter while it is in control. > All interrupts are disabled, all but one cpus are spinning, the > control cpu does busy wait while it polls the input devices. With > that model there is no alternative to a complete disable. it sure has an alternative. The 'cpus spinning' code calls touch_nmi() within the busy loop, the polling code on the control CPU too. This is sure more robust and catches lockup bugs in kdb too ... Ingo - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] serial console vs NMI watchdog
On Sun, 11 Mar 2001 08:44:24 +0100 (CET), Ingo Molnar <[EMAIL PROTECTED]> wrote: >Andrew, > >your patch looks too complex, and doesnt cover the case of the serial >driver deadlocking. Why not add a "touch_nmi_watchdog_counter()" function >that just changes last_irq_sums instead of adding locking? This way >deadlocks will be caught in the serial code too. (because touch_nmi() will >only "postpone" the NMI watchdog lockup event, not disable it.) kdb has to completely disable the nmi counter while it is in control. All interrupts are disabled, all but one cpus are spinning, the control cpu does busy wait while it polls the input devices. With that model there is no alternative to a complete disable. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] serial console vs NMI watchdog
On Sun, 11 Mar 2001 08:53:40 +0100 (CET), Ingo Molnar <[EMAIL PROTECTED]> wrote: >it sure has an alternative. The 'cpus spinning' code calls touch_nmi() >within the busy loop, the polling code on the control CPU too. This is >sure more robust and catches lockup bugs in kdb too ... Works for me. It even makes kdb simpler. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] serial console vs NMI watchdog
Andrew, your patch looks too complex, and doesnt cover the case of the serial driver deadlocking. Why not add a "touch_nmi_watchdog_counter()" function that just changes last_irq_sums instead of adding locking? This way deadlocks will be caught in the serial code too. (because touch_nmi() will only "postpone" the NMI watchdog lockup event, not disable it.) This should be a matter of 5 lines ... Ingo - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] serial console vs NMI watchdog
Ion Badulescu wrote: > > On Sat, 10 Mar 2001 01:21:25 +1100, Andrew Morton <[EMAIL PROTECTED]> wrote: > > > +/** > > + * enable_nmi_watchdog - enables/disables NMI watchdog checking. > > + * @yes: If zero, disable > > Ugh. I have a feeling that your chances to get Linus to accept this are > extremely slim. > > Just have two functions, enable_nmi_watchdog and disable_nmi_watchdog. > You can make them inline, or even macros... You're right. --- linux-2.4.2-ac16/include/linux/irq.hFri Mar 9 17:11:17 2001 +++ linux-ac/include/linux/irq.hSat Mar 10 13:34:22 2001 @@ -56,6 +56,21 @@ #include /* the arch dependent stuff */ +/** + * nmi_watchdog_disable - disable NMI watchdog checking. + * + * If the architecture supports the NMI watchdog, nmi_watchdog_disable() may be used + * to temporarily disable it. Use nmi_watchdog_enable() later on. It is implemented + * via an up/down counter, so you must keep the calls balanced. + */ +#ifdef ARCH_HAS_NMI_WATCHDOG +extern void nmi_watchdog_disable(void); +extern void nmi_watchdog_enable(void); +#else +#define nmi_watchdog_disable() do{} while(0) +#define nmi_watchdog_enable() do{} while(0) +#endif + extern int handle_IRQ_event(unsigned int, struct pt_regs *, struct irqaction *); extern int setup_irq(unsigned int , struct irqaction * ); --- linux-2.4.2-ac16/include/asm-i386/irq.h Fri Oct 8 03:17:09 1999 +++ linux-ac/include/asm-i386/irq.h Sat Mar 10 02:17:47 2001 @@ -32,4 +32,8 @@ extern void disable_irq_nosync(unsigned int); extern void enable_irq(unsigned int); +#ifdef CONFIG_X86_LOCAL_APIC +#define ARCH_HAS_NMI_WATCHDOG /* See include/linux/irq.h */ +#endif + #endif /* _ASM_IRQ_H */ --- linux-2.4.2-ac16/drivers/char/sysrq.c Sun Feb 25 17:37:04 2001 +++ linux-ac/drivers/char/sysrq.c Sat Mar 10 13:07:46 2001 @@ -23,6 +23,7 @@ #include #include #include +#include #include @@ -69,6 +70,11 @@ if (!key) return; + /* +* Interrupts are disabled, and serial consoles are slow. So +* Let's suspend the NMI watchdog. +*/ + nmi_watchdog_disable(); console_loglevel = 7; printk(KERN_INFO "SysRq: "); switch (key) { @@ -152,6 +158,7 @@ /* Don't use 'A' as it's handled specially on the Sparc */ } + nmi_watchdog_enable(); console_loglevel = orig_log_level; } --- linux-2.4.2-ac16/arch/i386/kernel/nmi.c Fri Mar 9 17:10:51 2001 +++ linux-ac/arch/i386/kernel/nmi.c Sat Mar 10 13:21:59 2001 @@ -226,6 +226,17 @@ } static spinlock_t nmi_print_lock = SPIN_LOCK_UNLOCKED; +static atomic_t nmi_watchdog_disabled = ATOMIC_INIT(0); + +void nmi_watchdog_disable(void) +{ + atomic_inc(&nmi_watchdog_disabled); +} + +void nmi_watchdog_enable(void) +{ + atomic_dec(&nmi_watchdog_disabled); +} void nmi_watchdog_tick (struct pt_regs * regs) { @@ -255,7 +266,7 @@ sum = apic_timer_irqs[cpu]; - if (last_irq_sums[cpu] == sum) { + if (last_irq_sums[cpu] == sum && atomic_read(&nmi_watchdog_disabled) == 0) { /* * Ayiee, looks like this CPU is stuck ... * wait a few IRQs (5 seconds) before doing the oops ... - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] serial console vs NMI watchdog
On Sat, Mar 10, 2001 at 01:21:25AM +1100, Andrew Morton wrote: > +static atomic_t nmi_watchdog_enabled = ATOMIC_INIT(0); /* 0 == enabled */ > + > +void enable_nmi_watchdog(int yes) > +{ > + if (yes) > + atomic_inc(&nmi_watchdog_enabled); > + else > + atomic_dec(&nmi_watchdog_enabled); > +} > > void nmi_watchdog_tick (struct pt_regs * regs) > { > @@ -255,7 +264,7 @@ > > sum = apic_timer_irqs[cpu]; > > - if (last_irq_sums[cpu] == sum) { > + if (last_irq_sums[cpu] == sum && atomic_read(&nmi_watchdog_enabled) == 0) { Shouldn't that be atomic_read(&nmi_watchdog_enabled) != 0? - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] serial console vs NMI watchdog
On Sat, 10 Mar 2001 01:21:25 +1100, Andrew Morton <[EMAIL PROTECTED]> wrote: > +/** > + * enable_nmi_watchdog - enables/disables NMI watchdog checking. > + * @yes: If zero, disable Ugh. I have a feeling that your chances to get Linus to accept this are extremely slim. Just have two functions, enable_nmi_watchdog and disable_nmi_watchdog. You can make them inline, or even macros... Ion -- It is better to keep your mouth shut and be thought a fool, than to open it and remove all doubt. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch] serial console vs NMI watchdog
SYSRQ-T on serial console can crash the machine. This is because a large amount of output is sent to a slow device while interrupts are disabled. The NMI watchdog triggers. The interrupt disabling happens in pc_keyb.c:keyboard_interrupt(). Changing this code to *not* disable interrupts looks complex. I see two ways of fixing this. One is to do the sysrq stuff outside the spin_lock_irq(), with: static void keyboard_interrupt(int irq, void *dev_id, struct pt_regs *regs) { + extern void (*sysrq_handler)(void); + void (*my_sysrq_handler)(void); spin_lock_irq(&kbd_controller_lock); handle_kbd_event(); + my_sysrq_handler = sysrq_handler; + sysrq_handler = 0; spin_unlock_irq(&kbd_controller_lock); + if (my_sysrq_handler) + (*my_sysrq_handler)(); } But I didn't do that, because I suspect there are other places in the kernel (development and debug stuff) where we want to turn the NMI watchdog handler off for a while. So this patch creates a new API function enable_nmi_watchdog(int yes); and uses it within the sysrq code. BTW: NMI watchdog is now disabled by default in 2.4.3-pre3. The `nmi_watchdog=1' boot option is needed to enable it. --- linux-2.4.2-ac16/include/linux/irq.hFri Mar 9 17:11:17 2001 +++ linux-ac/include/linux/irq.hSat Mar 10 01:02:12 2001 @@ -56,6 +56,20 @@ #include /* the arch dependent stuff */ +/** + * enable_nmi_watchdog - enables/disables NMI watchdog checking. + * @yes: If zero, disable + * + * If the architecture supports the NMI watchdog, enable_nmi_watchdog() may be used + * to temporarily disable it. Calls to enable_nmi_watchdog() may be nested - it is + * implemented as an up/down counter, so the calls must be balanced. + */ +#ifdef ARCH_HAS_NMI_WATCHDOG +extern void enable_nmi_watchdog(int yes); +#else +#define enable_nmi_watchdog(yes) do{} while(0) +#endif + extern int handle_IRQ_event(unsigned int, struct pt_regs *, struct irqaction *); extern int setup_irq(unsigned int , struct irqaction * ); --- linux-2.4.2-ac16/include/asm-i386/irq.h Fri Oct 8 03:17:09 1999 +++ linux-ac/include/asm-i386/irq.h Fri Mar 9 22:59:15 2001 @@ -32,4 +32,8 @@ extern void disable_irq_nosync(unsigned int); extern void enable_irq(unsigned int); +#ifdef CONFIG_X86_LOCAL_APIC +#define ARCH_HAS_NMI_WATCHDOG /* See include/linux/irq.h */ +#endif + #endif /* _ASM_IRQ_H */ --- linux-2.4.2-ac16/drivers/char/sysrq.c Sun Feb 25 17:37:04 2001 +++ linux-ac/drivers/char/sysrq.c Fri Mar 9 23:00:39 2001 @@ -23,6 +23,7 @@ #include #include #include +#include #include @@ -69,6 +70,11 @@ if (!key) return; + /* +* Interrupts are disabled, and serial consoles are slow. So +* Let's suspend the NMI watchdog. +*/ + enable_nmi_watchdog(0); console_loglevel = 7; printk(KERN_INFO "SysRq: "); switch (key) { @@ -152,6 +158,7 @@ /* Don't use 'A' as it's handled specially on the Sparc */ } + enable_nmi_watchdog(1); console_loglevel = orig_log_level; } --- linux-2.4.2-ac16/arch/i386/kernel/nmi.c Fri Mar 9 17:10:51 2001 +++ linux-ac/arch/i386/kernel/nmi.c Sat Mar 10 01:10:50 2001 @@ -226,6 +226,15 @@ } static spinlock_t nmi_print_lock = SPIN_LOCK_UNLOCKED; +static atomic_t nmi_watchdog_enabled = ATOMIC_INIT(0); /* 0 == enabled */ + +void enable_nmi_watchdog(int yes) +{ + if (yes) + atomic_inc(&nmi_watchdog_enabled); + else + atomic_dec(&nmi_watchdog_enabled); +} void nmi_watchdog_tick (struct pt_regs * regs) { @@ -255,7 +264,7 @@ sum = apic_timer_irqs[cpu]; - if (last_irq_sums[cpu] == sum) { + if (last_irq_sums[cpu] == sum && atomic_read(&nmi_watchdog_enabled) == 0) { /* * Ayiee, looks like this CPU is stuck ... * wait a few IRQs (5 seconds) before doing the oops ... - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/