[PATCH] Re: [PATCH] NMI watch dog notify patch

2005-08-02 Thread George Anzinger
It seems that the subject patch generates a warning (missed it on the 
compile).  Here is a patch to eliminate the warning.

--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/
Source: MontaVista Software, Inc. George Anzinger
Type: Defect Fix 

Description:
This patch eliminates the warning generated in die_nmi() when
calling notify_die() by adding "const" to notify_die()'s
definition.

Signed-off-by: George Anzinger 

Index: linux-2.6.13-rc/include/asm-i386/kdebug.h
===
--- linux-2.6.13-rc.orig/include/asm-i386/kdebug.h
+++ linux-2.6.13-rc/include/asm-i386/kdebug.h
@@ -41,7 +41,7 @@ enum die_val {
DIE_PAGE_FAULT,
 };
 
-static inline int notify_die(enum die_val val,char *str,struct pt_regs 
*regs,long err,int trap, int sig)
+static inline int notify_die(enum die_val val, const char *str,struct pt_regs 
*regs,long err,int trap, int sig)
 {
struct die_args args = { .regs=regs, .str=str, .err=err, 
.trapnr=trap,.signr=sig };
return notifier_call_chain(_chain, val, );


[PATCH] Re: [PATCH] NMI watch dog notify patch

2005-08-02 Thread George Anzinger
It seems that the subject patch generates a warning (missed it on the 
compile).  Here is a patch to eliminate the warning.

--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/
Source: MontaVista Software, Inc. George Anzingergeorge@mvista.com
Type: Defect Fix 

Description:
This patch eliminates the warning generated in die_nmi() when
calling notify_die() by adding const to notify_die()'s
definition.

Signed-off-by: George Anzinger george@mvista.com

Index: linux-2.6.13-rc/include/asm-i386/kdebug.h
===
--- linux-2.6.13-rc.orig/include/asm-i386/kdebug.h
+++ linux-2.6.13-rc/include/asm-i386/kdebug.h
@@ -41,7 +41,7 @@ enum die_val {
DIE_PAGE_FAULT,
 };
 
-static inline int notify_die(enum die_val val,char *str,struct pt_regs 
*regs,long err,int trap, int sig)
+static inline int notify_die(enum die_val val, const char *str,struct pt_regs 
*regs,long err,int trap, int sig)
 {
struct die_args args = { .regs=regs, .str=str, .err=err, 
.trapnr=trap,.signr=sig };
return notifier_call_chain(i386die_chain, val, args);


Re: [PATCH] NMI watch dog notify patch

2005-08-01 Thread George Anzinger

Keith Owens wrote:
On Fri, 29 Jul 2005 13:55:23 -0700, 
George Anzinger  wrote:



This patch adds a notify to the die_nmi notify that the system
is about to be taken down.  If the notify is handled with a
NOTIFY_STOP return, the system is given a new lease on life.

void die_nmi (struct pt_regs *regs, const char *msg)
{
+	if (notify_die(DIE_NMIWATCHDOG, "nmi_watchdog", regs, 
+		   0, 0, SIGINT) == NOTIFY_STOP)

+   return;
+
spin_lock(_print_lock);
/*
* We are in trouble anyway, lets at least try



Minor nitpick.  die_nmi() already gets a message passed in to
distinguish between different types of nmi.  Pass that message to
notify_die(), on the off chance that the notified routines can use that
difference.


Excellent idea!


Also your patch adds a trailing whitespace on the call to notify_die().


Fixed.

This should do it.
-
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/
Source: MontaVista Software, Inc. George Anzinger 
Type: Enhancement 
Description:

This patch adds a notify to the die_nmi notify that the system
is about to be taken down.  If the notify is handled with a
NOTIFY_STOP return, the system is given a new lease on life.

We also change the nmi watchdog to carry on if die_nmi returns.

This give debug code a chance to a) catch watchdog timeouts and
b) possibly allow the system to continue, realizing that 
the time out may be due to debugger activities such as single 
stepping which is usually done with "other" cpus held.

Signed-off-by: George Anzinger

 nmi.c   |5 -
 traps.c |4 
 2 files changed, 8 insertions(+), 1 deletion(-)

Index: linux-2.6.13-rc/arch/i386/kernel/nmi.c
===
--- linux-2.6.13-rc.orig/arch/i386/kernel/nmi.c
+++ linux-2.6.13-rc/arch/i386/kernel/nmi.c
@@ -495,8 +495,11 @@ void nmi_watchdog_tick (struct pt_regs *
 */
alert_counter[cpu]++;
if (alert_counter[cpu] == 5*nmi_hz)
+   /*
+* die_nmi will return ONLY if NOTIFY_STOP happens..
+*/
die_nmi(regs, "NMI Watchdog detected LOCKUP");
-   } else {
+
last_irq_sums[cpu] = sum;
alert_counter[cpu] = 0;
}
Index: linux-2.6.13-rc/arch/i386/kernel/traps.c
===
--- linux-2.6.13-rc.orig/arch/i386/kernel/traps.c
+++ linux-2.6.13-rc/arch/i386/kernel/traps.c
@@ -555,6 +555,10 @@ static DEFINE_SPINLOCK(nmi_print_lock);
 
 void die_nmi (struct pt_regs *regs, const char *msg)
 {
+   if (notify_die(DIE_NMIWATCHDOG, msg, regs, 0, 0, SIGINT) ==
+   NOTIFY_STOP)
+   return;
+
spin_lock(_print_lock);
/*
* We are in trouble anyway, lets at least try


Re: [PATCH] NMI watch dog notify patch

2005-08-01 Thread George Anzinger

Keith Owens wrote:
On Fri, 29 Jul 2005 13:55:23 -0700, 
George Anzinger george@mvista.com wrote:



This patch adds a notify to the die_nmi notify that the system
is about to be taken down.  If the notify is handled with a
NOTIFY_STOP return, the system is given a new lease on life.

void die_nmi (struct pt_regs *regs, const char *msg)
{
+	if (notify_die(DIE_NMIWATCHDOG, nmi_watchdog, regs, 
+		   0, 0, SIGINT) == NOTIFY_STOP)

+   return;
+
spin_lock(nmi_print_lock);
/*
* We are in trouble anyway, lets at least try



Minor nitpick.  die_nmi() already gets a message passed in to
distinguish between different types of nmi.  Pass that message to
notify_die(), on the off chance that the notified routines can use that
difference.


Excellent idea!


Also your patch adds a trailing whitespace on the call to notify_die().


Fixed.

This should do it.
-
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/
Source: MontaVista Software, Inc. George Anzinger george@mvista.com
Type: Enhancement 
Description:

This patch adds a notify to the die_nmi notify that the system
is about to be taken down.  If the notify is handled with a
NOTIFY_STOP return, the system is given a new lease on life.

We also change the nmi watchdog to carry on if die_nmi returns.

This give debug code a chance to a) catch watchdog timeouts and
b) possibly allow the system to continue, realizing that 
the time out may be due to debugger activities such as single 
stepping which is usually done with other cpus held.

Signed-off-by: George Anzingergeorge@mvista.com

 nmi.c   |5 -
 traps.c |4 
 2 files changed, 8 insertions(+), 1 deletion(-)

Index: linux-2.6.13-rc/arch/i386/kernel/nmi.c
===
--- linux-2.6.13-rc.orig/arch/i386/kernel/nmi.c
+++ linux-2.6.13-rc/arch/i386/kernel/nmi.c
@@ -495,8 +495,11 @@ void nmi_watchdog_tick (struct pt_regs *
 */
alert_counter[cpu]++;
if (alert_counter[cpu] == 5*nmi_hz)
+   /*
+* die_nmi will return ONLY if NOTIFY_STOP happens..
+*/
die_nmi(regs, NMI Watchdog detected LOCKUP);
-   } else {
+
last_irq_sums[cpu] = sum;
alert_counter[cpu] = 0;
}
Index: linux-2.6.13-rc/arch/i386/kernel/traps.c
===
--- linux-2.6.13-rc.orig/arch/i386/kernel/traps.c
+++ linux-2.6.13-rc/arch/i386/kernel/traps.c
@@ -555,6 +555,10 @@ static DEFINE_SPINLOCK(nmi_print_lock);
 
 void die_nmi (struct pt_regs *regs, const char *msg)
 {
+   if (notify_die(DIE_NMIWATCHDOG, msg, regs, 0, 0, SIGINT) ==
+   NOTIFY_STOP)
+   return;
+
spin_lock(nmi_print_lock);
/*
* We are in trouble anyway, lets at least try


Re: [PATCH] NMI watch dog notify patch

2005-07-29 Thread Keith Owens
On Fri, 29 Jul 2005 13:55:23 -0700, 
George Anzinger  wrote:
>   This patch adds a notify to the die_nmi notify that the system
>   is about to be taken down.  If the notify is handled with a
>   NOTIFY_STOP return, the system is given a new lease on life.
> 
> void die_nmi (struct pt_regs *regs, const char *msg)
> {
>+  if (notify_die(DIE_NMIWATCHDOG, "nmi_watchdog", regs, 
>+ 0, 0, SIGINT) == NOTIFY_STOP)
>+  return;
>+
>   spin_lock(_print_lock);
>   /*
>   * We are in trouble anyway, lets at least try

Minor nitpick.  die_nmi() already gets a message passed in to
distinguish between different types of nmi.  Pass that message to
notify_die(), on the off chance that the notified routines can use that
difference.

Also your patch adds a trailing whitespace on the call to notify_die().

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] NMI watch dog notify patch

2005-07-29 Thread George Anzinger

Andrew Morton wrote:

Keith Owens <[EMAIL PROTECTED]> wrote:

I had though that too, but it does not allow recovery (i.e. lets reset 


>the watchdog and try again).

die_nmi() returns to nmi_watchdog_tick(), nmi_watchdog_tick does the
reset and continues.  Patch below.

>Hmm.. just looked at traps.c.  Seems die_nmi is NOT called from the nmi 
>trap, only from the watchdog.  Also, there is a notify in the path to 
>the other nmi stuff.


I was looking at unknown_nmi_panic_callback(), which also calls
die_nmi().

traps.c already has several notify_die() calls, nmi.c has none.  It is
cleaner to keep all the notification in traps.c, with this small change
to nmi.c to cope with die_nmi() returning.

Index: linux/arch/i386/kernel/nmi.c
===
--- linux.orig/arch/i386/kernel/nmi.c   2005-07-28 17:22:06.735038510 +1000
+++ linux/arch/i386/kernel/nmi.c2005-07-29 15:19:00.371196596 +1000
@@ -494,8 +494,10 @@ void nmi_watchdog_tick (struct pt_regs *
 * wait a few IRQs (5 seconds) before doing the oops ...
 */
alert_counter[cpu]++;
-   if (alert_counter[cpu] == 5*nmi_hz)
+   if (alert_counter[cpu] == 5*nmi_hz) {
die_nmi(regs, "NMI Watchdog detected LOCKUP");
+   alert_counter[cpu] = 0;
+   }
} else {
last_irq_sums[cpu] = sum;
alert_counter[cpu] = 0;



That all makes sense - let's go that way?


Looks good to me.  Trimed a bit more fat too.  Here is the complete patch.

-

-
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/
Source: MontaVista Software, Inc. George Anzinger 
Type: Enhancement 
Description:

This patch adds a notify to the die_nmi notify that the system
is about to be taken down.  If the notify is handled with a
NOTIFY_STOP return, the system is given a new lease on life.

We also change the nmi watchdog to carry on if die_nmi returns.

This give debug code a chance to a) catch watchdog timeouts and
b) possibly allow the system to continue, realizing that 
the time out may be due to debugger activities such as single 
stepping which is usually done with "other" cpus held.

Signed-off-by: George Anzinger

 nmi.c   |5 -
 traps.c |4 
 2 files changed, 8 insertions(+), 1 deletion(-)

Index: linux-2.6.13-rc/arch/i386/kernel/nmi.c
===
--- linux-2.6.13-rc.orig/arch/i386/kernel/nmi.c
+++ linux-2.6.13-rc/arch/i386/kernel/nmi.c
@@ -495,8 +495,11 @@ void nmi_watchdog_tick (struct pt_regs *
 */
alert_counter[cpu]++;
if (alert_counter[cpu] == 5*nmi_hz)
+   /*
+* die_nmi will return ONLY if NOTIFY_STOP happens..
+*/
die_nmi(regs, "NMI Watchdog detected LOCKUP");
-   } else {
+
last_irq_sums[cpu] = sum;
alert_counter[cpu] = 0;
}
Index: linux-2.6.13-rc/arch/i386/kernel/traps.c
===
--- linux-2.6.13-rc.orig/arch/i386/kernel/traps.c
+++ linux-2.6.13-rc/arch/i386/kernel/traps.c
@@ -555,6 +555,10 @@ static DEFINE_SPINLOCK(nmi_print_lock);
 
 void die_nmi (struct pt_regs *regs, const char *msg)
 {
+   if (notify_die(DIE_NMIWATCHDOG, "nmi_watchdog", regs, 
+  0, 0, SIGINT) == NOTIFY_STOP)
+   return;
+
spin_lock(_print_lock);
/*
* We are in trouble anyway, lets at least try


Re: [PATCH] NMI watch dog notify patch

2005-07-29 Thread George Anzinger

Andrew Morton wrote:

Keith Owens [EMAIL PROTECTED] wrote:

I had though that too, but it does not allow recovery (i.e. lets reset 


the watchdog and try again).

die_nmi() returns to nmi_watchdog_tick(), nmi_watchdog_tick does the
reset and continues.  Patch below.

Hmm.. just looked at traps.c.  Seems die_nmi is NOT called from the nmi 
trap, only from the watchdog.  Also, there is a notify in the path to 
the other nmi stuff.


I was looking at unknown_nmi_panic_callback(), which also calls
die_nmi().

traps.c already has several notify_die() calls, nmi.c has none.  It is
cleaner to keep all the notification in traps.c, with this small change
to nmi.c to cope with die_nmi() returning.

Index: linux/arch/i386/kernel/nmi.c
===
--- linux.orig/arch/i386/kernel/nmi.c   2005-07-28 17:22:06.735038510 +1000
+++ linux/arch/i386/kernel/nmi.c2005-07-29 15:19:00.371196596 +1000
@@ -494,8 +494,10 @@ void nmi_watchdog_tick (struct pt_regs *
 * wait a few IRQs (5 seconds) before doing the oops ...
 */
alert_counter[cpu]++;
-   if (alert_counter[cpu] == 5*nmi_hz)
+   if (alert_counter[cpu] == 5*nmi_hz) {
die_nmi(regs, NMI Watchdog detected LOCKUP);
+   alert_counter[cpu] = 0;
+   }
} else {
last_irq_sums[cpu] = sum;
alert_counter[cpu] = 0;



That all makes sense - let's go that way?


Looks good to me.  Trimed a bit more fat too.  Here is the complete patch.

-

-
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/
Source: MontaVista Software, Inc. George Anzinger george@mvista.com
Type: Enhancement 
Description:

This patch adds a notify to the die_nmi notify that the system
is about to be taken down.  If the notify is handled with a
NOTIFY_STOP return, the system is given a new lease on life.

We also change the nmi watchdog to carry on if die_nmi returns.

This give debug code a chance to a) catch watchdog timeouts and
b) possibly allow the system to continue, realizing that 
the time out may be due to debugger activities such as single 
stepping which is usually done with other cpus held.

Signed-off-by: George Anzingergeorge@mvista.com

 nmi.c   |5 -
 traps.c |4 
 2 files changed, 8 insertions(+), 1 deletion(-)

Index: linux-2.6.13-rc/arch/i386/kernel/nmi.c
===
--- linux-2.6.13-rc.orig/arch/i386/kernel/nmi.c
+++ linux-2.6.13-rc/arch/i386/kernel/nmi.c
@@ -495,8 +495,11 @@ void nmi_watchdog_tick (struct pt_regs *
 */
alert_counter[cpu]++;
if (alert_counter[cpu] == 5*nmi_hz)
+   /*
+* die_nmi will return ONLY if NOTIFY_STOP happens..
+*/
die_nmi(regs, NMI Watchdog detected LOCKUP);
-   } else {
+
last_irq_sums[cpu] = sum;
alert_counter[cpu] = 0;
}
Index: linux-2.6.13-rc/arch/i386/kernel/traps.c
===
--- linux-2.6.13-rc.orig/arch/i386/kernel/traps.c
+++ linux-2.6.13-rc/arch/i386/kernel/traps.c
@@ -555,6 +555,10 @@ static DEFINE_SPINLOCK(nmi_print_lock);
 
 void die_nmi (struct pt_regs *regs, const char *msg)
 {
+   if (notify_die(DIE_NMIWATCHDOG, nmi_watchdog, regs, 
+  0, 0, SIGINT) == NOTIFY_STOP)
+   return;
+
spin_lock(nmi_print_lock);
/*
* We are in trouble anyway, lets at least try


Re: [PATCH] NMI watch dog notify patch

2005-07-29 Thread Keith Owens
On Fri, 29 Jul 2005 13:55:23 -0700, 
George Anzinger george@mvista.com wrote:
   This patch adds a notify to the die_nmi notify that the system
   is about to be taken down.  If the notify is handled with a
   NOTIFY_STOP return, the system is given a new lease on life.
 
 void die_nmi (struct pt_regs *regs, const char *msg)
 {
+  if (notify_die(DIE_NMIWATCHDOG, nmi_watchdog, regs, 
+ 0, 0, SIGINT) == NOTIFY_STOP)
+  return;
+
   spin_lock(nmi_print_lock);
   /*
   * We are in trouble anyway, lets at least try

Minor nitpick.  die_nmi() already gets a message passed in to
distinguish between different types of nmi.  Pass that message to
notify_die(), on the off chance that the notified routines can use that
difference.

Also your patch adds a trailing whitespace on the call to notify_die().

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] NMI watch dog notify patch

2005-07-28 Thread Andrew Morton
Keith Owens <[EMAIL PROTECTED]> wrote:
>
> >I had though that too, but it does not allow recovery (i.e. lets reset 
>  >the watchdog and try again).
> 
>  die_nmi() returns to nmi_watchdog_tick(), nmi_watchdog_tick does the
>  reset and continues.  Patch below.
> 
>  >Hmm.. just looked at traps.c.  Seems die_nmi is NOT called from the nmi 
>  >trap, only from the watchdog.  Also, there is a notify in the path to 
>  >the other nmi stuff.
> 
>  I was looking at unknown_nmi_panic_callback(), which also calls
>  die_nmi().
> 
>  traps.c already has several notify_die() calls, nmi.c has none.  It is
>  cleaner to keep all the notification in traps.c, with this small change
>  to nmi.c to cope with die_nmi() returning.
> 
>  Index: linux/arch/i386/kernel/nmi.c
>  ===
>  --- linux.orig/arch/i386/kernel/nmi.c2005-07-28 17:22:06.735038510 
> +1000
>  +++ linux/arch/i386/kernel/nmi.c 2005-07-29 15:19:00.371196596 +1000
>  @@ -494,8 +494,10 @@ void nmi_watchdog_tick (struct pt_regs *
>* wait a few IRQs (5 seconds) before doing the oops ...
>*/
>   alert_counter[cpu]++;
>  -if (alert_counter[cpu] == 5*nmi_hz)
>  +if (alert_counter[cpu] == 5*nmi_hz) {
>   die_nmi(regs, "NMI Watchdog detected LOCKUP");
>  +alert_counter[cpu] = 0;
>  +}
>   } else {
>   last_irq_sums[cpu] = sum;
>   alert_counter[cpu] = 0;

That all makes sense - let's go that way?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] NMI watch dog notify patch

2005-07-28 Thread Keith Owens
On Thu, 28 Jul 2005 21:16:56 -0700, 
George Anzinger  wrote:
>Keith Owens wrote:
>> On Thu, 28 Jul 2005 13:31:58 -0700, 
>> George Anzinger  wrote:
>> 
>>>I have been doing some work on kgdb to pull a few of it "fingers" out of 
>>>various places in the kernel.  This is the final location where we have 
>>>a kgdb intercept not covered by a notify.
>> 
>> 
>> I like the idea, but the hook should be in die_nmi(), not in the
>> watchdog, using the reason that is already passed into die_nmi.
>> die_nmi() is also called for a real NMI.
>> 
>I had though that too, but it does not allow recovery (i.e. lets reset 
>the watchdog and try again).

die_nmi() returns to nmi_watchdog_tick(), nmi_watchdog_tick does the
reset and continues.  Patch below.

>Hmm.. just looked at traps.c.  Seems die_nmi is NOT called from the nmi 
>trap, only from the watchdog.  Also, there is a notify in the path to 
>the other nmi stuff.

I was looking at unknown_nmi_panic_callback(), which also calls
die_nmi().

traps.c already has several notify_die() calls, nmi.c has none.  It is
cleaner to keep all the notification in traps.c, with this small change
to nmi.c to cope with die_nmi() returning.

Index: linux/arch/i386/kernel/nmi.c
===
--- linux.orig/arch/i386/kernel/nmi.c   2005-07-28 17:22:06.735038510 +1000
+++ linux/arch/i386/kernel/nmi.c2005-07-29 15:19:00.371196596 +1000
@@ -494,8 +494,10 @@ void nmi_watchdog_tick (struct pt_regs *
 * wait a few IRQs (5 seconds) before doing the oops ...
 */
alert_counter[cpu]++;
-   if (alert_counter[cpu] == 5*nmi_hz)
+   if (alert_counter[cpu] == 5*nmi_hz) {
die_nmi(regs, "NMI Watchdog detected LOCKUP");
+   alert_counter[cpu] = 0;
+   }
} else {
last_irq_sums[cpu] = sum;
alert_counter[cpu] = 0;

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] NMI watch dog notify patch

2005-07-28 Thread George Anzinger

Keith Owens wrote:
On Thu, 28 Jul 2005 13:31:58 -0700, 
George Anzinger  wrote:


I have been doing some work on kgdb to pull a few of it "fingers" out of 
various places in the kernel.  This is the final location where we have 
a kgdb intercept not covered by a notify.



I like the idea, but the hook should be in die_nmi(), not in the
watchdog, using the reason that is already passed into die_nmi.
die_nmi() is also called for a real NMI.

I had though that too, but it does not allow recovery (i.e. lets reset 
the watchdog and try again).


Hmm.. just looked at traps.c.  Seems die_nmi is NOT called from the nmi 
trap, only from the watchdog.  Also, there is a notify in the path to 
the other nmi stuff.


--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] NMI watch dog notify patch

2005-07-28 Thread Keith Owens
On Thu, 28 Jul 2005 13:31:58 -0700, 
George Anzinger  wrote:
>I have been doing some work on kgdb to pull a few of it "fingers" out of 
>various places in the kernel.  This is the final location where we have 
>a kgdb intercept not covered by a notify.

I like the idea, but the hook should be in die_nmi(), not in the
watchdog, using the reason that is already passed into die_nmi.
die_nmi() is also called for a real NMI.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] NMI watch dog notify patch

2005-07-28 Thread Andrew Morton
George Anzinger  wrote:
>
> Andrew Morton wrote:
> > George Anzinger  wrote:
> > 
> >>This patch adds a notify to the nmi watchdog to notify that
> >>the system is about to be taken down by the watchdog.  If the
> >>notify is handled with a NOTIFY_STOP return, the system is
> >>given a new lease on life.
> > 
> > 
> > It looks sensible, but as there aren't actually any in-kernel uses for this
> > I'd have thought it would be better for it to live out-of-tree?
> 
> I should just bundle it with the kgdb patch then?

I spose so, for now.  If kdb and/or nlkd could benefit from it then it
might simplify life to merge it into mainline.  Perhaps you could ping
Keith Owens <[EMAIL PROTECTED]> and [EMAIL PROTECTED]

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] NMI watch dog notify patch

2005-07-28 Thread George Anzinger

Andrew Morton wrote:

George Anzinger  wrote:


This patch adds a notify to the nmi watchdog to notify that
the system is about to be taken down by the watchdog.  If the
notify is handled with a NOTIFY_STOP return, the system is
given a new lease on life.



It looks sensible, but as there aren't actually any in-kernel uses for this
I'd have thought it would be better for it to live out-of-tree?


I should just bundle it with the kgdb patch then?
--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] NMI watch dog notify patch

2005-07-28 Thread Andrew Morton
George Anzinger  wrote:
>
>   This patch adds a notify to the nmi watchdog to notify that
>   the system is about to be taken down by the watchdog.  If the
>   notify is handled with a NOTIFY_STOP return, the system is
>   given a new lease on life.

It looks sensible, but as there aren't actually any in-kernel uses for this
I'd have thought it would be better for it to live out-of-tree?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] NMI watch dog notify patch

2005-07-28 Thread George Anzinger

Andrew,
I have been doing some work on kgdb to pull a few of it "fingers" out of 
various places in the kernel.  This is the final location where we have 
a kgdb intercept not covered by a notify.


On a related issue, I feel very queasy with sending nmi interrupts and 
non-nmi events to the same notify code.  Would you be open to a patch to 
create a seperate notify list for nmi events?



-
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/
Source: MontaVista Software, Inc. George Anzinger 
Type: Enhancement 
Description:
This patch adds a notify to the nmi watchdog to notify that
the system is about to be taken down by the watchdog.  If the
notify is handled with a NOTIFY_STOP return, the system is
given a new lease on life.

This give debug code a chance to a) catch watchdog timeouts and
b) possibly allow the system to continue, realizing that 
the time out may be due to debugger activities such as single 
stepping which is usually done with "other" cpus held.

Signed-off-by: George Anzinger

 nmi.c |   15 ---
 1 files changed, 12 insertions(+), 3 deletions(-)

Index: linux-2.6.13-rc/arch/i386/kernel/nmi.c
===
--- linux-2.6.13-rc.orig/arch/i386/kernel/nmi.c
+++ linux-2.6.13-rc/arch/i386/kernel/nmi.c
@@ -26,11 +26,13 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
 #include 
 #include 
+#include 
 
 #include "mach_traps.h"
 
@@ -494,8 +496,15 @@ void nmi_watchdog_tick (struct pt_regs *
 * wait a few IRQs (5 seconds) before doing the oops ...
 */
alert_counter[cpu]++;
-   if (alert_counter[cpu] == 5*nmi_hz)
-   die_nmi(regs, "NMI Watchdog detected LOCKUP");
+   if (alert_counter[cpu] == 5*nmi_hz) {
+   if (notify_die(DIE_NMIWATCHDOG, "nmi_ipi_watchdog", 
+  regs, 0, 0, SIGINT) == NOTIFY_STOP) {
+   last_irq_sums[cpu] = sum;
+   alert_counter[cpu] = 0;
+   } else {
+   die_nmi(regs, "NMI Watchdog detected LOCKUP");
+   }
+   }
} else {
last_irq_sums[cpu] = sum;
alert_counter[cpu] = 0;
@@ -555,7 +564,7 @@ int proc_unknown_nmi_panic(ctl_table *ta
return -EBUSY;
} else {
set_nmi_callback(unknown_nmi_panic_callback);
-   }
+   } 
} else {
release_lapic_nmi();
unset_nmi_callback();


[PATCH] NMI watch dog notify patch

2005-07-28 Thread George Anzinger

Andrew,
I have been doing some work on kgdb to pull a few of it fingers out of 
various places in the kernel.  This is the final location where we have 
a kgdb intercept not covered by a notify.


On a related issue, I feel very queasy with sending nmi interrupts and 
non-nmi events to the same notify code.  Would you be open to a patch to 
create a seperate notify list for nmi events?



-
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/
Source: MontaVista Software, Inc. George Anzinger george@mvista.com
Type: Enhancement 
Description:
This patch adds a notify to the nmi watchdog to notify that
the system is about to be taken down by the watchdog.  If the
notify is handled with a NOTIFY_STOP return, the system is
given a new lease on life.

This give debug code a chance to a) catch watchdog timeouts and
b) possibly allow the system to continue, realizing that 
the time out may be due to debugger activities such as single 
stepping which is usually done with other cpus held.

Signed-off-by: George Anzingergeorge@mvista.com

 nmi.c |   15 ---
 1 files changed, 12 insertions(+), 3 deletions(-)

Index: linux-2.6.13-rc/arch/i386/kernel/nmi.c
===
--- linux-2.6.13-rc.orig/arch/i386/kernel/nmi.c
+++ linux-2.6.13-rc/arch/i386/kernel/nmi.c
@@ -26,11 +26,13 @@
 #include linux/nmi.h
 #include linux/sysdev.h
 #include linux/sysctl.h
+#include linux/notifier.h
 
 #include asm/smp.h
 #include asm/mtrr.h
 #include asm/mpspec.h
 #include asm/nmi.h
+#include asm/kdebug.h
 
 #include mach_traps.h
 
@@ -494,8 +496,15 @@ void nmi_watchdog_tick (struct pt_regs *
 * wait a few IRQs (5 seconds) before doing the oops ...
 */
alert_counter[cpu]++;
-   if (alert_counter[cpu] == 5*nmi_hz)
-   die_nmi(regs, NMI Watchdog detected LOCKUP);
+   if (alert_counter[cpu] == 5*nmi_hz) {
+   if (notify_die(DIE_NMIWATCHDOG, nmi_ipi_watchdog, 
+  regs, 0, 0, SIGINT) == NOTIFY_STOP) {
+   last_irq_sums[cpu] = sum;
+   alert_counter[cpu] = 0;
+   } else {
+   die_nmi(regs, NMI Watchdog detected LOCKUP);
+   }
+   }
} else {
last_irq_sums[cpu] = sum;
alert_counter[cpu] = 0;
@@ -555,7 +564,7 @@ int proc_unknown_nmi_panic(ctl_table *ta
return -EBUSY;
} else {
set_nmi_callback(unknown_nmi_panic_callback);
-   }
+   } 
} else {
release_lapic_nmi();
unset_nmi_callback();


Re: [PATCH] NMI watch dog notify patch

2005-07-28 Thread Andrew Morton
George Anzinger george@mvista.com wrote:

   This patch adds a notify to the nmi watchdog to notify that
   the system is about to be taken down by the watchdog.  If the
   notify is handled with a NOTIFY_STOP return, the system is
   given a new lease on life.

It looks sensible, but as there aren't actually any in-kernel uses for this
I'd have thought it would be better for it to live out-of-tree?
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] NMI watch dog notify patch

2005-07-28 Thread George Anzinger

Andrew Morton wrote:

George Anzinger george@mvista.com wrote:


This patch adds a notify to the nmi watchdog to notify that
the system is about to be taken down by the watchdog.  If the
notify is handled with a NOTIFY_STOP return, the system is
given a new lease on life.



It looks sensible, but as there aren't actually any in-kernel uses for this
I'd have thought it would be better for it to live out-of-tree?


I should just bundle it with the kgdb patch then?
--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] NMI watch dog notify patch

2005-07-28 Thread Andrew Morton
George Anzinger george@mvista.com wrote:

 Andrew Morton wrote:
  George Anzinger george@mvista.com wrote:
  
 This patch adds a notify to the nmi watchdog to notify that
 the system is about to be taken down by the watchdog.  If the
 notify is handled with a NOTIFY_STOP return, the system is
 given a new lease on life.
  
  
  It looks sensible, but as there aren't actually any in-kernel uses for this
  I'd have thought it would be better for it to live out-of-tree?
 
 I should just bundle it with the kgdb patch then?

I spose so, for now.  If kdb and/or nlkd could benefit from it then it
might simplify life to merge it into mainline.  Perhaps you could ping
Keith Owens [EMAIL PROTECTED] and [EMAIL PROTECTED]

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] NMI watch dog notify patch

2005-07-28 Thread Keith Owens
On Thu, 28 Jul 2005 13:31:58 -0700, 
George Anzinger george@mvista.com wrote:
I have been doing some work on kgdb to pull a few of it fingers out of 
various places in the kernel.  This is the final location where we have 
a kgdb intercept not covered by a notify.

I like the idea, but the hook should be in die_nmi(), not in the
watchdog, using the reason that is already passed into die_nmi.
die_nmi() is also called for a real NMI.

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] NMI watch dog notify patch

2005-07-28 Thread George Anzinger

Keith Owens wrote:
On Thu, 28 Jul 2005 13:31:58 -0700, 
George Anzinger george@mvista.com wrote:


I have been doing some work on kgdb to pull a few of it fingers out of 
various places in the kernel.  This is the final location where we have 
a kgdb intercept not covered by a notify.



I like the idea, but the hook should be in die_nmi(), not in the
watchdog, using the reason that is already passed into die_nmi.
die_nmi() is also called for a real NMI.

I had though that too, but it does not allow recovery (i.e. lets reset 
the watchdog and try again).


Hmm.. just looked at traps.c.  Seems die_nmi is NOT called from the nmi 
trap, only from the watchdog.  Also, there is a notify in the path to 
the other nmi stuff.


--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] NMI watch dog notify patch

2005-07-28 Thread Keith Owens
On Thu, 28 Jul 2005 21:16:56 -0700, 
George Anzinger george@mvista.com wrote:
Keith Owens wrote:
 On Thu, 28 Jul 2005 13:31:58 -0700, 
 George Anzinger george@mvista.com wrote:
 
I have been doing some work on kgdb to pull a few of it fingers out of 
various places in the kernel.  This is the final location where we have 
a kgdb intercept not covered by a notify.
 
 
 I like the idea, but the hook should be in die_nmi(), not in the
 watchdog, using the reason that is already passed into die_nmi.
 die_nmi() is also called for a real NMI.
 
I had though that too, but it does not allow recovery (i.e. lets reset 
the watchdog and try again).

die_nmi() returns to nmi_watchdog_tick(), nmi_watchdog_tick does the
reset and continues.  Patch below.

Hmm.. just looked at traps.c.  Seems die_nmi is NOT called from the nmi 
trap, only from the watchdog.  Also, there is a notify in the path to 
the other nmi stuff.

I was looking at unknown_nmi_panic_callback(), which also calls
die_nmi().

traps.c already has several notify_die() calls, nmi.c has none.  It is
cleaner to keep all the notification in traps.c, with this small change
to nmi.c to cope with die_nmi() returning.

Index: linux/arch/i386/kernel/nmi.c
===
--- linux.orig/arch/i386/kernel/nmi.c   2005-07-28 17:22:06.735038510 +1000
+++ linux/arch/i386/kernel/nmi.c2005-07-29 15:19:00.371196596 +1000
@@ -494,8 +494,10 @@ void nmi_watchdog_tick (struct pt_regs *
 * wait a few IRQs (5 seconds) before doing the oops ...
 */
alert_counter[cpu]++;
-   if (alert_counter[cpu] == 5*nmi_hz)
+   if (alert_counter[cpu] == 5*nmi_hz) {
die_nmi(regs, NMI Watchdog detected LOCKUP);
+   alert_counter[cpu] = 0;
+   }
} else {
last_irq_sums[cpu] = sum;
alert_counter[cpu] = 0;

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] NMI watch dog notify patch

2005-07-28 Thread Andrew Morton
Keith Owens [EMAIL PROTECTED] wrote:

 I had though that too, but it does not allow recovery (i.e. lets reset 
  the watchdog and try again).
 
  die_nmi() returns to nmi_watchdog_tick(), nmi_watchdog_tick does the
  reset and continues.  Patch below.
 
  Hmm.. just looked at traps.c.  Seems die_nmi is NOT called from the nmi 
  trap, only from the watchdog.  Also, there is a notify in the path to 
  the other nmi stuff.
 
  I was looking at unknown_nmi_panic_callback(), which also calls
  die_nmi().
 
  traps.c already has several notify_die() calls, nmi.c has none.  It is
  cleaner to keep all the notification in traps.c, with this small change
  to nmi.c to cope with die_nmi() returning.
 
  Index: linux/arch/i386/kernel/nmi.c
  ===
  --- linux.orig/arch/i386/kernel/nmi.c2005-07-28 17:22:06.735038510 
 +1000
  +++ linux/arch/i386/kernel/nmi.c 2005-07-29 15:19:00.371196596 +1000
  @@ -494,8 +494,10 @@ void nmi_watchdog_tick (struct pt_regs *
* wait a few IRQs (5 seconds) before doing the oops ...
*/
   alert_counter[cpu]++;
  -if (alert_counter[cpu] == 5*nmi_hz)
  +if (alert_counter[cpu] == 5*nmi_hz) {
   die_nmi(regs, NMI Watchdog detected LOCKUP);
  +alert_counter[cpu] = 0;
  +}
   } else {
   last_irq_sums[cpu] = sum;
   alert_counter[cpu] = 0;

That all makes sense - let's go that way?
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/