[PATCH] Re: [PATCH] NMI watch dog notify patch
It seems that the subject patch generates a warning (missed it on the compile). Here is a patch to eliminate the warning. -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ Source: MontaVista Software, Inc. George Anzinger Type: Defect Fix Description: This patch eliminates the warning generated in die_nmi() when calling notify_die() by adding "const" to notify_die()'s definition. Signed-off-by: George Anzinger Index: linux-2.6.13-rc/include/asm-i386/kdebug.h === --- linux-2.6.13-rc.orig/include/asm-i386/kdebug.h +++ linux-2.6.13-rc/include/asm-i386/kdebug.h @@ -41,7 +41,7 @@ enum die_val { DIE_PAGE_FAULT, }; -static inline int notify_die(enum die_val val,char *str,struct pt_regs *regs,long err,int trap, int sig) +static inline int notify_die(enum die_val val, const char *str,struct pt_regs *regs,long err,int trap, int sig) { struct die_args args = { .regs=regs, .str=str, .err=err, .trapnr=trap,.signr=sig }; return notifier_call_chain(_chain, val, );
[PATCH] Re: [PATCH] NMI watch dog notify patch
It seems that the subject patch generates a warning (missed it on the compile). Here is a patch to eliminate the warning. -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ Source: MontaVista Software, Inc. George Anzingergeorge@mvista.com Type: Defect Fix Description: This patch eliminates the warning generated in die_nmi() when calling notify_die() by adding const to notify_die()'s definition. Signed-off-by: George Anzinger george@mvista.com Index: linux-2.6.13-rc/include/asm-i386/kdebug.h === --- linux-2.6.13-rc.orig/include/asm-i386/kdebug.h +++ linux-2.6.13-rc/include/asm-i386/kdebug.h @@ -41,7 +41,7 @@ enum die_val { DIE_PAGE_FAULT, }; -static inline int notify_die(enum die_val val,char *str,struct pt_regs *regs,long err,int trap, int sig) +static inline int notify_die(enum die_val val, const char *str,struct pt_regs *regs,long err,int trap, int sig) { struct die_args args = { .regs=regs, .str=str, .err=err, .trapnr=trap,.signr=sig }; return notifier_call_chain(i386die_chain, val, args);
Re: [PATCH] NMI watch dog notify patch
Keith Owens wrote: On Fri, 29 Jul 2005 13:55:23 -0700, George Anzinger wrote: This patch adds a notify to the die_nmi notify that the system is about to be taken down. If the notify is handled with a NOTIFY_STOP return, the system is given a new lease on life. void die_nmi (struct pt_regs *regs, const char *msg) { + if (notify_die(DIE_NMIWATCHDOG, "nmi_watchdog", regs, + 0, 0, SIGINT) == NOTIFY_STOP) + return; + spin_lock(_print_lock); /* * We are in trouble anyway, lets at least try Minor nitpick. die_nmi() already gets a message passed in to distinguish between different types of nmi. Pass that message to notify_die(), on the off chance that the notified routines can use that difference. Excellent idea! Also your patch adds a trailing whitespace on the call to notify_die(). Fixed. This should do it. - George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ Source: MontaVista Software, Inc. George Anzinger Type: Enhancement Description: This patch adds a notify to the die_nmi notify that the system is about to be taken down. If the notify is handled with a NOTIFY_STOP return, the system is given a new lease on life. We also change the nmi watchdog to carry on if die_nmi returns. This give debug code a chance to a) catch watchdog timeouts and b) possibly allow the system to continue, realizing that the time out may be due to debugger activities such as single stepping which is usually done with "other" cpus held. Signed-off-by: George Anzinger nmi.c |5 - traps.c |4 2 files changed, 8 insertions(+), 1 deletion(-) Index: linux-2.6.13-rc/arch/i386/kernel/nmi.c === --- linux-2.6.13-rc.orig/arch/i386/kernel/nmi.c +++ linux-2.6.13-rc/arch/i386/kernel/nmi.c @@ -495,8 +495,11 @@ void nmi_watchdog_tick (struct pt_regs * */ alert_counter[cpu]++; if (alert_counter[cpu] == 5*nmi_hz) + /* +* die_nmi will return ONLY if NOTIFY_STOP happens.. +*/ die_nmi(regs, "NMI Watchdog detected LOCKUP"); - } else { + last_irq_sums[cpu] = sum; alert_counter[cpu] = 0; } Index: linux-2.6.13-rc/arch/i386/kernel/traps.c === --- linux-2.6.13-rc.orig/arch/i386/kernel/traps.c +++ linux-2.6.13-rc/arch/i386/kernel/traps.c @@ -555,6 +555,10 @@ static DEFINE_SPINLOCK(nmi_print_lock); void die_nmi (struct pt_regs *regs, const char *msg) { + if (notify_die(DIE_NMIWATCHDOG, msg, regs, 0, 0, SIGINT) == + NOTIFY_STOP) + return; + spin_lock(_print_lock); /* * We are in trouble anyway, lets at least try
Re: [PATCH] NMI watch dog notify patch
Keith Owens wrote: On Fri, 29 Jul 2005 13:55:23 -0700, George Anzinger george@mvista.com wrote: This patch adds a notify to the die_nmi notify that the system is about to be taken down. If the notify is handled with a NOTIFY_STOP return, the system is given a new lease on life. void die_nmi (struct pt_regs *regs, const char *msg) { + if (notify_die(DIE_NMIWATCHDOG, nmi_watchdog, regs, + 0, 0, SIGINT) == NOTIFY_STOP) + return; + spin_lock(nmi_print_lock); /* * We are in trouble anyway, lets at least try Minor nitpick. die_nmi() already gets a message passed in to distinguish between different types of nmi. Pass that message to notify_die(), on the off chance that the notified routines can use that difference. Excellent idea! Also your patch adds a trailing whitespace on the call to notify_die(). Fixed. This should do it. - George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ Source: MontaVista Software, Inc. George Anzinger george@mvista.com Type: Enhancement Description: This patch adds a notify to the die_nmi notify that the system is about to be taken down. If the notify is handled with a NOTIFY_STOP return, the system is given a new lease on life. We also change the nmi watchdog to carry on if die_nmi returns. This give debug code a chance to a) catch watchdog timeouts and b) possibly allow the system to continue, realizing that the time out may be due to debugger activities such as single stepping which is usually done with other cpus held. Signed-off-by: George Anzingergeorge@mvista.com nmi.c |5 - traps.c |4 2 files changed, 8 insertions(+), 1 deletion(-) Index: linux-2.6.13-rc/arch/i386/kernel/nmi.c === --- linux-2.6.13-rc.orig/arch/i386/kernel/nmi.c +++ linux-2.6.13-rc/arch/i386/kernel/nmi.c @@ -495,8 +495,11 @@ void nmi_watchdog_tick (struct pt_regs * */ alert_counter[cpu]++; if (alert_counter[cpu] == 5*nmi_hz) + /* +* die_nmi will return ONLY if NOTIFY_STOP happens.. +*/ die_nmi(regs, NMI Watchdog detected LOCKUP); - } else { + last_irq_sums[cpu] = sum; alert_counter[cpu] = 0; } Index: linux-2.6.13-rc/arch/i386/kernel/traps.c === --- linux-2.6.13-rc.orig/arch/i386/kernel/traps.c +++ linux-2.6.13-rc/arch/i386/kernel/traps.c @@ -555,6 +555,10 @@ static DEFINE_SPINLOCK(nmi_print_lock); void die_nmi (struct pt_regs *regs, const char *msg) { + if (notify_die(DIE_NMIWATCHDOG, msg, regs, 0, 0, SIGINT) == + NOTIFY_STOP) + return; + spin_lock(nmi_print_lock); /* * We are in trouble anyway, lets at least try
Re: [PATCH] NMI watch dog notify patch
On Fri, 29 Jul 2005 13:55:23 -0700, George Anzinger wrote: > This patch adds a notify to the die_nmi notify that the system > is about to be taken down. If the notify is handled with a > NOTIFY_STOP return, the system is given a new lease on life. > > void die_nmi (struct pt_regs *regs, const char *msg) > { >+ if (notify_die(DIE_NMIWATCHDOG, "nmi_watchdog", regs, >+ 0, 0, SIGINT) == NOTIFY_STOP) >+ return; >+ > spin_lock(_print_lock); > /* > * We are in trouble anyway, lets at least try Minor nitpick. die_nmi() already gets a message passed in to distinguish between different types of nmi. Pass that message to notify_die(), on the off chance that the notified routines can use that difference. Also your patch adds a trailing whitespace on the call to notify_die(). - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] NMI watch dog notify patch
Andrew Morton wrote: Keith Owens <[EMAIL PROTECTED]> wrote: I had though that too, but it does not allow recovery (i.e. lets reset >the watchdog and try again). die_nmi() returns to nmi_watchdog_tick(), nmi_watchdog_tick does the reset and continues. Patch below. >Hmm.. just looked at traps.c. Seems die_nmi is NOT called from the nmi >trap, only from the watchdog. Also, there is a notify in the path to >the other nmi stuff. I was looking at unknown_nmi_panic_callback(), which also calls die_nmi(). traps.c already has several notify_die() calls, nmi.c has none. It is cleaner to keep all the notification in traps.c, with this small change to nmi.c to cope with die_nmi() returning. Index: linux/arch/i386/kernel/nmi.c === --- linux.orig/arch/i386/kernel/nmi.c 2005-07-28 17:22:06.735038510 +1000 +++ linux/arch/i386/kernel/nmi.c2005-07-29 15:19:00.371196596 +1000 @@ -494,8 +494,10 @@ void nmi_watchdog_tick (struct pt_regs * * wait a few IRQs (5 seconds) before doing the oops ... */ alert_counter[cpu]++; - if (alert_counter[cpu] == 5*nmi_hz) + if (alert_counter[cpu] == 5*nmi_hz) { die_nmi(regs, "NMI Watchdog detected LOCKUP"); + alert_counter[cpu] = 0; + } } else { last_irq_sums[cpu] = sum; alert_counter[cpu] = 0; That all makes sense - let's go that way? Looks good to me. Trimed a bit more fat too. Here is the complete patch. - - George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ Source: MontaVista Software, Inc. George Anzinger Type: Enhancement Description: This patch adds a notify to the die_nmi notify that the system is about to be taken down. If the notify is handled with a NOTIFY_STOP return, the system is given a new lease on life. We also change the nmi watchdog to carry on if die_nmi returns. This give debug code a chance to a) catch watchdog timeouts and b) possibly allow the system to continue, realizing that the time out may be due to debugger activities such as single stepping which is usually done with "other" cpus held. Signed-off-by: George Anzinger nmi.c |5 - traps.c |4 2 files changed, 8 insertions(+), 1 deletion(-) Index: linux-2.6.13-rc/arch/i386/kernel/nmi.c === --- linux-2.6.13-rc.orig/arch/i386/kernel/nmi.c +++ linux-2.6.13-rc/arch/i386/kernel/nmi.c @@ -495,8 +495,11 @@ void nmi_watchdog_tick (struct pt_regs * */ alert_counter[cpu]++; if (alert_counter[cpu] == 5*nmi_hz) + /* +* die_nmi will return ONLY if NOTIFY_STOP happens.. +*/ die_nmi(regs, "NMI Watchdog detected LOCKUP"); - } else { + last_irq_sums[cpu] = sum; alert_counter[cpu] = 0; } Index: linux-2.6.13-rc/arch/i386/kernel/traps.c === --- linux-2.6.13-rc.orig/arch/i386/kernel/traps.c +++ linux-2.6.13-rc/arch/i386/kernel/traps.c @@ -555,6 +555,10 @@ static DEFINE_SPINLOCK(nmi_print_lock); void die_nmi (struct pt_regs *regs, const char *msg) { + if (notify_die(DIE_NMIWATCHDOG, "nmi_watchdog", regs, + 0, 0, SIGINT) == NOTIFY_STOP) + return; + spin_lock(_print_lock); /* * We are in trouble anyway, lets at least try
Re: [PATCH] NMI watch dog notify patch
Andrew Morton wrote: Keith Owens [EMAIL PROTECTED] wrote: I had though that too, but it does not allow recovery (i.e. lets reset the watchdog and try again). die_nmi() returns to nmi_watchdog_tick(), nmi_watchdog_tick does the reset and continues. Patch below. Hmm.. just looked at traps.c. Seems die_nmi is NOT called from the nmi trap, only from the watchdog. Also, there is a notify in the path to the other nmi stuff. I was looking at unknown_nmi_panic_callback(), which also calls die_nmi(). traps.c already has several notify_die() calls, nmi.c has none. It is cleaner to keep all the notification in traps.c, with this small change to nmi.c to cope with die_nmi() returning. Index: linux/arch/i386/kernel/nmi.c === --- linux.orig/arch/i386/kernel/nmi.c 2005-07-28 17:22:06.735038510 +1000 +++ linux/arch/i386/kernel/nmi.c2005-07-29 15:19:00.371196596 +1000 @@ -494,8 +494,10 @@ void nmi_watchdog_tick (struct pt_regs * * wait a few IRQs (5 seconds) before doing the oops ... */ alert_counter[cpu]++; - if (alert_counter[cpu] == 5*nmi_hz) + if (alert_counter[cpu] == 5*nmi_hz) { die_nmi(regs, NMI Watchdog detected LOCKUP); + alert_counter[cpu] = 0; + } } else { last_irq_sums[cpu] = sum; alert_counter[cpu] = 0; That all makes sense - let's go that way? Looks good to me. Trimed a bit more fat too. Here is the complete patch. - - George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ Source: MontaVista Software, Inc. George Anzinger george@mvista.com Type: Enhancement Description: This patch adds a notify to the die_nmi notify that the system is about to be taken down. If the notify is handled with a NOTIFY_STOP return, the system is given a new lease on life. We also change the nmi watchdog to carry on if die_nmi returns. This give debug code a chance to a) catch watchdog timeouts and b) possibly allow the system to continue, realizing that the time out may be due to debugger activities such as single stepping which is usually done with other cpus held. Signed-off-by: George Anzingergeorge@mvista.com nmi.c |5 - traps.c |4 2 files changed, 8 insertions(+), 1 deletion(-) Index: linux-2.6.13-rc/arch/i386/kernel/nmi.c === --- linux-2.6.13-rc.orig/arch/i386/kernel/nmi.c +++ linux-2.6.13-rc/arch/i386/kernel/nmi.c @@ -495,8 +495,11 @@ void nmi_watchdog_tick (struct pt_regs * */ alert_counter[cpu]++; if (alert_counter[cpu] == 5*nmi_hz) + /* +* die_nmi will return ONLY if NOTIFY_STOP happens.. +*/ die_nmi(regs, NMI Watchdog detected LOCKUP); - } else { + last_irq_sums[cpu] = sum; alert_counter[cpu] = 0; } Index: linux-2.6.13-rc/arch/i386/kernel/traps.c === --- linux-2.6.13-rc.orig/arch/i386/kernel/traps.c +++ linux-2.6.13-rc/arch/i386/kernel/traps.c @@ -555,6 +555,10 @@ static DEFINE_SPINLOCK(nmi_print_lock); void die_nmi (struct pt_regs *regs, const char *msg) { + if (notify_die(DIE_NMIWATCHDOG, nmi_watchdog, regs, + 0, 0, SIGINT) == NOTIFY_STOP) + return; + spin_lock(nmi_print_lock); /* * We are in trouble anyway, lets at least try
Re: [PATCH] NMI watch dog notify patch
On Fri, 29 Jul 2005 13:55:23 -0700, George Anzinger george@mvista.com wrote: This patch adds a notify to the die_nmi notify that the system is about to be taken down. If the notify is handled with a NOTIFY_STOP return, the system is given a new lease on life. void die_nmi (struct pt_regs *regs, const char *msg) { + if (notify_die(DIE_NMIWATCHDOG, nmi_watchdog, regs, + 0, 0, SIGINT) == NOTIFY_STOP) + return; + spin_lock(nmi_print_lock); /* * We are in trouble anyway, lets at least try Minor nitpick. die_nmi() already gets a message passed in to distinguish between different types of nmi. Pass that message to notify_die(), on the off chance that the notified routines can use that difference. Also your patch adds a trailing whitespace on the call to notify_die(). - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] NMI watch dog notify patch
Keith Owens <[EMAIL PROTECTED]> wrote: > > >I had though that too, but it does not allow recovery (i.e. lets reset > >the watchdog and try again). > > die_nmi() returns to nmi_watchdog_tick(), nmi_watchdog_tick does the > reset and continues. Patch below. > > >Hmm.. just looked at traps.c. Seems die_nmi is NOT called from the nmi > >trap, only from the watchdog. Also, there is a notify in the path to > >the other nmi stuff. > > I was looking at unknown_nmi_panic_callback(), which also calls > die_nmi(). > > traps.c already has several notify_die() calls, nmi.c has none. It is > cleaner to keep all the notification in traps.c, with this small change > to nmi.c to cope with die_nmi() returning. > > Index: linux/arch/i386/kernel/nmi.c > === > --- linux.orig/arch/i386/kernel/nmi.c2005-07-28 17:22:06.735038510 > +1000 > +++ linux/arch/i386/kernel/nmi.c 2005-07-29 15:19:00.371196596 +1000 > @@ -494,8 +494,10 @@ void nmi_watchdog_tick (struct pt_regs * >* wait a few IRQs (5 seconds) before doing the oops ... >*/ > alert_counter[cpu]++; > -if (alert_counter[cpu] == 5*nmi_hz) > +if (alert_counter[cpu] == 5*nmi_hz) { > die_nmi(regs, "NMI Watchdog detected LOCKUP"); > +alert_counter[cpu] = 0; > +} > } else { > last_irq_sums[cpu] = sum; > alert_counter[cpu] = 0; That all makes sense - let's go that way? - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] NMI watch dog notify patch
On Thu, 28 Jul 2005 21:16:56 -0700, George Anzinger wrote: >Keith Owens wrote: >> On Thu, 28 Jul 2005 13:31:58 -0700, >> George Anzinger wrote: >> >>>I have been doing some work on kgdb to pull a few of it "fingers" out of >>>various places in the kernel. This is the final location where we have >>>a kgdb intercept not covered by a notify. >> >> >> I like the idea, but the hook should be in die_nmi(), not in the >> watchdog, using the reason that is already passed into die_nmi. >> die_nmi() is also called for a real NMI. >> >I had though that too, but it does not allow recovery (i.e. lets reset >the watchdog and try again). die_nmi() returns to nmi_watchdog_tick(), nmi_watchdog_tick does the reset and continues. Patch below. >Hmm.. just looked at traps.c. Seems die_nmi is NOT called from the nmi >trap, only from the watchdog. Also, there is a notify in the path to >the other nmi stuff. I was looking at unknown_nmi_panic_callback(), which also calls die_nmi(). traps.c already has several notify_die() calls, nmi.c has none. It is cleaner to keep all the notification in traps.c, with this small change to nmi.c to cope with die_nmi() returning. Index: linux/arch/i386/kernel/nmi.c === --- linux.orig/arch/i386/kernel/nmi.c 2005-07-28 17:22:06.735038510 +1000 +++ linux/arch/i386/kernel/nmi.c2005-07-29 15:19:00.371196596 +1000 @@ -494,8 +494,10 @@ void nmi_watchdog_tick (struct pt_regs * * wait a few IRQs (5 seconds) before doing the oops ... */ alert_counter[cpu]++; - if (alert_counter[cpu] == 5*nmi_hz) + if (alert_counter[cpu] == 5*nmi_hz) { die_nmi(regs, "NMI Watchdog detected LOCKUP"); + alert_counter[cpu] = 0; + } } else { last_irq_sums[cpu] = sum; alert_counter[cpu] = 0; - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] NMI watch dog notify patch
Keith Owens wrote: On Thu, 28 Jul 2005 13:31:58 -0700, George Anzinger wrote: I have been doing some work on kgdb to pull a few of it "fingers" out of various places in the kernel. This is the final location where we have a kgdb intercept not covered by a notify. I like the idea, but the hook should be in die_nmi(), not in the watchdog, using the reason that is already passed into die_nmi. die_nmi() is also called for a real NMI. I had though that too, but it does not allow recovery (i.e. lets reset the watchdog and try again). Hmm.. just looked at traps.c. Seems die_nmi is NOT called from the nmi trap, only from the watchdog. Also, there is a notify in the path to the other nmi stuff. -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] NMI watch dog notify patch
On Thu, 28 Jul 2005 13:31:58 -0700, George Anzinger wrote: >I have been doing some work on kgdb to pull a few of it "fingers" out of >various places in the kernel. This is the final location where we have >a kgdb intercept not covered by a notify. I like the idea, but the hook should be in die_nmi(), not in the watchdog, using the reason that is already passed into die_nmi. die_nmi() is also called for a real NMI. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] NMI watch dog notify patch
George Anzinger wrote: > > Andrew Morton wrote: > > George Anzinger wrote: > > > >>This patch adds a notify to the nmi watchdog to notify that > >>the system is about to be taken down by the watchdog. If the > >>notify is handled with a NOTIFY_STOP return, the system is > >>given a new lease on life. > > > > > > It looks sensible, but as there aren't actually any in-kernel uses for this > > I'd have thought it would be better for it to live out-of-tree? > > I should just bundle it with the kgdb patch then? I spose so, for now. If kdb and/or nlkd could benefit from it then it might simplify life to merge it into mainline. Perhaps you could ping Keith Owens <[EMAIL PROTECTED]> and [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] NMI watch dog notify patch
Andrew Morton wrote: George Anzinger wrote: This patch adds a notify to the nmi watchdog to notify that the system is about to be taken down by the watchdog. If the notify is handled with a NOTIFY_STOP return, the system is given a new lease on life. It looks sensible, but as there aren't actually any in-kernel uses for this I'd have thought it would be better for it to live out-of-tree? I should just bundle it with the kgdb patch then? -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] NMI watch dog notify patch
George Anzinger wrote: > > This patch adds a notify to the nmi watchdog to notify that > the system is about to be taken down by the watchdog. If the > notify is handled with a NOTIFY_STOP return, the system is > given a new lease on life. It looks sensible, but as there aren't actually any in-kernel uses for this I'd have thought it would be better for it to live out-of-tree? - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] NMI watch dog notify patch
Andrew, I have been doing some work on kgdb to pull a few of it "fingers" out of various places in the kernel. This is the final location where we have a kgdb intercept not covered by a notify. On a related issue, I feel very queasy with sending nmi interrupts and non-nmi events to the same notify code. Would you be open to a patch to create a seperate notify list for nmi events? - George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ Source: MontaVista Software, Inc. George Anzinger Type: Enhancement Description: This patch adds a notify to the nmi watchdog to notify that the system is about to be taken down by the watchdog. If the notify is handled with a NOTIFY_STOP return, the system is given a new lease on life. This give debug code a chance to a) catch watchdog timeouts and b) possibly allow the system to continue, realizing that the time out may be due to debugger activities such as single stepping which is usually done with "other" cpus held. Signed-off-by: George Anzinger nmi.c | 15 --- 1 files changed, 12 insertions(+), 3 deletions(-) Index: linux-2.6.13-rc/arch/i386/kernel/nmi.c === --- linux-2.6.13-rc.orig/arch/i386/kernel/nmi.c +++ linux-2.6.13-rc/arch/i386/kernel/nmi.c @@ -26,11 +26,13 @@ #include #include #include +#include #include #include #include #include +#include #include "mach_traps.h" @@ -494,8 +496,15 @@ void nmi_watchdog_tick (struct pt_regs * * wait a few IRQs (5 seconds) before doing the oops ... */ alert_counter[cpu]++; - if (alert_counter[cpu] == 5*nmi_hz) - die_nmi(regs, "NMI Watchdog detected LOCKUP"); + if (alert_counter[cpu] == 5*nmi_hz) { + if (notify_die(DIE_NMIWATCHDOG, "nmi_ipi_watchdog", + regs, 0, 0, SIGINT) == NOTIFY_STOP) { + last_irq_sums[cpu] = sum; + alert_counter[cpu] = 0; + } else { + die_nmi(regs, "NMI Watchdog detected LOCKUP"); + } + } } else { last_irq_sums[cpu] = sum; alert_counter[cpu] = 0; @@ -555,7 +564,7 @@ int proc_unknown_nmi_panic(ctl_table *ta return -EBUSY; } else { set_nmi_callback(unknown_nmi_panic_callback); - } + } } else { release_lapic_nmi(); unset_nmi_callback();
[PATCH] NMI watch dog notify patch
Andrew, I have been doing some work on kgdb to pull a few of it fingers out of various places in the kernel. This is the final location where we have a kgdb intercept not covered by a notify. On a related issue, I feel very queasy with sending nmi interrupts and non-nmi events to the same notify code. Would you be open to a patch to create a seperate notify list for nmi events? - George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ Source: MontaVista Software, Inc. George Anzinger george@mvista.com Type: Enhancement Description: This patch adds a notify to the nmi watchdog to notify that the system is about to be taken down by the watchdog. If the notify is handled with a NOTIFY_STOP return, the system is given a new lease on life. This give debug code a chance to a) catch watchdog timeouts and b) possibly allow the system to continue, realizing that the time out may be due to debugger activities such as single stepping which is usually done with other cpus held. Signed-off-by: George Anzingergeorge@mvista.com nmi.c | 15 --- 1 files changed, 12 insertions(+), 3 deletions(-) Index: linux-2.6.13-rc/arch/i386/kernel/nmi.c === --- linux-2.6.13-rc.orig/arch/i386/kernel/nmi.c +++ linux-2.6.13-rc/arch/i386/kernel/nmi.c @@ -26,11 +26,13 @@ #include linux/nmi.h #include linux/sysdev.h #include linux/sysctl.h +#include linux/notifier.h #include asm/smp.h #include asm/mtrr.h #include asm/mpspec.h #include asm/nmi.h +#include asm/kdebug.h #include mach_traps.h @@ -494,8 +496,15 @@ void nmi_watchdog_tick (struct pt_regs * * wait a few IRQs (5 seconds) before doing the oops ... */ alert_counter[cpu]++; - if (alert_counter[cpu] == 5*nmi_hz) - die_nmi(regs, NMI Watchdog detected LOCKUP); + if (alert_counter[cpu] == 5*nmi_hz) { + if (notify_die(DIE_NMIWATCHDOG, nmi_ipi_watchdog, + regs, 0, 0, SIGINT) == NOTIFY_STOP) { + last_irq_sums[cpu] = sum; + alert_counter[cpu] = 0; + } else { + die_nmi(regs, NMI Watchdog detected LOCKUP); + } + } } else { last_irq_sums[cpu] = sum; alert_counter[cpu] = 0; @@ -555,7 +564,7 @@ int proc_unknown_nmi_panic(ctl_table *ta return -EBUSY; } else { set_nmi_callback(unknown_nmi_panic_callback); - } + } } else { release_lapic_nmi(); unset_nmi_callback();
Re: [PATCH] NMI watch dog notify patch
George Anzinger george@mvista.com wrote: This patch adds a notify to the nmi watchdog to notify that the system is about to be taken down by the watchdog. If the notify is handled with a NOTIFY_STOP return, the system is given a new lease on life. It looks sensible, but as there aren't actually any in-kernel uses for this I'd have thought it would be better for it to live out-of-tree? - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] NMI watch dog notify patch
Andrew Morton wrote: George Anzinger george@mvista.com wrote: This patch adds a notify to the nmi watchdog to notify that the system is about to be taken down by the watchdog. If the notify is handled with a NOTIFY_STOP return, the system is given a new lease on life. It looks sensible, but as there aren't actually any in-kernel uses for this I'd have thought it would be better for it to live out-of-tree? I should just bundle it with the kgdb patch then? -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] NMI watch dog notify patch
George Anzinger george@mvista.com wrote: Andrew Morton wrote: George Anzinger george@mvista.com wrote: This patch adds a notify to the nmi watchdog to notify that the system is about to be taken down by the watchdog. If the notify is handled with a NOTIFY_STOP return, the system is given a new lease on life. It looks sensible, but as there aren't actually any in-kernel uses for this I'd have thought it would be better for it to live out-of-tree? I should just bundle it with the kgdb patch then? I spose so, for now. If kdb and/or nlkd could benefit from it then it might simplify life to merge it into mainline. Perhaps you could ping Keith Owens [EMAIL PROTECTED] and [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] NMI watch dog notify patch
On Thu, 28 Jul 2005 13:31:58 -0700, George Anzinger george@mvista.com wrote: I have been doing some work on kgdb to pull a few of it fingers out of various places in the kernel. This is the final location where we have a kgdb intercept not covered by a notify. I like the idea, but the hook should be in die_nmi(), not in the watchdog, using the reason that is already passed into die_nmi. die_nmi() is also called for a real NMI. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] NMI watch dog notify patch
Keith Owens wrote: On Thu, 28 Jul 2005 13:31:58 -0700, George Anzinger george@mvista.com wrote: I have been doing some work on kgdb to pull a few of it fingers out of various places in the kernel. This is the final location where we have a kgdb intercept not covered by a notify. I like the idea, but the hook should be in die_nmi(), not in the watchdog, using the reason that is already passed into die_nmi. die_nmi() is also called for a real NMI. I had though that too, but it does not allow recovery (i.e. lets reset the watchdog and try again). Hmm.. just looked at traps.c. Seems die_nmi is NOT called from the nmi trap, only from the watchdog. Also, there is a notify in the path to the other nmi stuff. -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] NMI watch dog notify patch
On Thu, 28 Jul 2005 21:16:56 -0700, George Anzinger george@mvista.com wrote: Keith Owens wrote: On Thu, 28 Jul 2005 13:31:58 -0700, George Anzinger george@mvista.com wrote: I have been doing some work on kgdb to pull a few of it fingers out of various places in the kernel. This is the final location where we have a kgdb intercept not covered by a notify. I like the idea, but the hook should be in die_nmi(), not in the watchdog, using the reason that is already passed into die_nmi. die_nmi() is also called for a real NMI. I had though that too, but it does not allow recovery (i.e. lets reset the watchdog and try again). die_nmi() returns to nmi_watchdog_tick(), nmi_watchdog_tick does the reset and continues. Patch below. Hmm.. just looked at traps.c. Seems die_nmi is NOT called from the nmi trap, only from the watchdog. Also, there is a notify in the path to the other nmi stuff. I was looking at unknown_nmi_panic_callback(), which also calls die_nmi(). traps.c already has several notify_die() calls, nmi.c has none. It is cleaner to keep all the notification in traps.c, with this small change to nmi.c to cope with die_nmi() returning. Index: linux/arch/i386/kernel/nmi.c === --- linux.orig/arch/i386/kernel/nmi.c 2005-07-28 17:22:06.735038510 +1000 +++ linux/arch/i386/kernel/nmi.c2005-07-29 15:19:00.371196596 +1000 @@ -494,8 +494,10 @@ void nmi_watchdog_tick (struct pt_regs * * wait a few IRQs (5 seconds) before doing the oops ... */ alert_counter[cpu]++; - if (alert_counter[cpu] == 5*nmi_hz) + if (alert_counter[cpu] == 5*nmi_hz) { die_nmi(regs, NMI Watchdog detected LOCKUP); + alert_counter[cpu] = 0; + } } else { last_irq_sums[cpu] = sum; alert_counter[cpu] = 0; - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] NMI watch dog notify patch
Keith Owens [EMAIL PROTECTED] wrote: I had though that too, but it does not allow recovery (i.e. lets reset the watchdog and try again). die_nmi() returns to nmi_watchdog_tick(), nmi_watchdog_tick does the reset and continues. Patch below. Hmm.. just looked at traps.c. Seems die_nmi is NOT called from the nmi trap, only from the watchdog. Also, there is a notify in the path to the other nmi stuff. I was looking at unknown_nmi_panic_callback(), which also calls die_nmi(). traps.c already has several notify_die() calls, nmi.c has none. It is cleaner to keep all the notification in traps.c, with this small change to nmi.c to cope with die_nmi() returning. Index: linux/arch/i386/kernel/nmi.c === --- linux.orig/arch/i386/kernel/nmi.c2005-07-28 17:22:06.735038510 +1000 +++ linux/arch/i386/kernel/nmi.c 2005-07-29 15:19:00.371196596 +1000 @@ -494,8 +494,10 @@ void nmi_watchdog_tick (struct pt_regs * * wait a few IRQs (5 seconds) before doing the oops ... */ alert_counter[cpu]++; -if (alert_counter[cpu] == 5*nmi_hz) +if (alert_counter[cpu] == 5*nmi_hz) { die_nmi(regs, NMI Watchdog detected LOCKUP); +alert_counter[cpu] = 0; +} } else { last_irq_sums[cpu] = sum; alert_counter[cpu] = 0; That all makes sense - let's go that way? - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/