Re: [patch] Change softlockup trigger limit using a kernel parameter
On Wed, Jul 18, 2007 at 11:09:07PM -0700, Andrew Morton wrote: >On Wed, 18 Jul 2007 22:41:21 -0700 Ravikiran G Thirumalai <[EMAIL PROTECTED]> >wrote: > >> On Wed, Jul 18, 2007 at 04:08:58PM -0700, Andrew Morton wrote: >> > On Mon, 16 Jul 2007 15:26:50 -0700 >> > Ravikiran G Thirumalai <[EMAIL PROTECTED]> wrote: >> > >> > > Kernel warns of softlockups if the softlockup thread is not able to run >> > > on a CPU for 10s. It is useful to lower the softlockup warning >> > > threshold in testing environments to catch potential lockups early. >> > > Following patch adds a kernel parameter 'softlockup_lim' to control >> > > the softlockup threshold. >> > > >> > >> > Why not make it tunable at runtime? >> >> Sure! Like a sysctl? >> Hi Andrew, Here's another take, incorporating all of your comments. Thanks, Kiran Control the trigger limit for softlockup warnings. This is useful for debugging softlockups, by lowering the softlockup_thresh to identify possible softlockups earlier. This patch: 1. Adds a sysctl softlockup_thresh with valid values of 1-60s (Higher value to disable false positives) 2. Changes the softlockup printk to print the cpu softlockup time Signed-off-by: Ravikiran Thirumalai <[EMAIL PROTECTED]> Signed-off-by: Shai Fultheim <[EMAIL PROTECTED]> --- Patch applies on top of Ingo's softlockup patches Documentation/sysctl/kernel.txt |8 include/linux/sched.h |1 + kernel/softlockup.c |8 +--- kernel/sysctl.c | 25 +++-- 4 files changed, 33 insertions(+), 9 deletions(-) Index: linux-2.6.22/kernel/softlockup.c === --- linux-2.6.22.orig/kernel/softlockup.c 2007-07-18 11:15:18.506614500 -0700 +++ linux-2.6.22/kernel/softlockup.c2007-07-18 21:39:20.498592750 -0700 @@ -23,6 +23,7 @@ static DEFINE_PER_CPU(unsigned long, pri static DEFINE_PER_CPU(struct task_struct *, watchdog_task); static int did_panic; +int softlockup_thresh = 10; static int softlock_panic(struct notifier_block *this, unsigned long event, void *ptr) @@ -101,7 +102,7 @@ void softlockup_tick(void) wake_up_process(per_cpu(watchdog_task, this_cpu)); /* Warn about unreasonable 10+ seconds delays: */ - if (now <= (touch_timestamp + 10)) + if (now <= (touch_timestamp + softlockup_thresh)) return; regs = get_irq_regs(); @@ -109,8 +110,9 @@ void softlockup_tick(void) per_cpu(print_timestamp, this_cpu) = touch_timestamp; spin_lock(_lock); - printk(KERN_ERR "BUG: soft lockup detected on CPU#%d! [%s:%d]\n", - this_cpu, current->comm, current->pid); + printk(KERN_ERR "BUG: soft lockup - CPU#%d stuck for %lus! [%s:%d]\n", + this_cpu, now - touch_timestamp, + current->comm, current->pid); if (regs) show_regs(regs); else Index: linux-2.6.22/kernel/sysctl.c === --- linux-2.6.22.orig/kernel/sysctl.c 2007-07-08 16:32:17.0 -0700 +++ linux-2.6.22/kernel/sysctl.c2007-07-19 13:42:50.275478000 -0700 @@ -79,6 +79,12 @@ extern int compat_log; extern int maps_protect; extern int sysctl_stat_interval; +/* Constants used for minimum and maximum */ +static int one = 1; +static int zero; +static int sixty = 60; +static int one_hundred = 100; + /* this is needed for the proc_dointvec_minmax for [fs_]overflow UID and GID */ static int maxolduid = 65535; static int minolduid; @@ -615,16 +621,23 @@ static ctl_table kern_table[] = { .proc_handler = _dointvec, }, #endif +#ifdef CONFIG_DETECT_SOFTLOCKUP + { + .ctl_name = CTL_UNNUMBERED, + .procname = "softlockup_thresh", + .data = _thresh, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = _dointvec_minmax, + .strategy = _intvec, + .extra1 = , + .extra2 = , + }, +#endif { .ctl_name = 0 } }; -/* Constants for minimum and maximum testing in vm_table. - We use these as one-element integer vectors. */ -static int zero; -static int one_hundred = 100; - - static ctl_table vm_table[] = { { .ctl_name = VM_OVERCOMMIT_MEMORY, Index: linux-2.6.22/Documentation/sysctl/kernel.txt === --- linux-2.6.22.orig/Documentation/sysctl/kernel.txt 2007-07-08 16:32:17.0 -0700 +++ linux-2.6.22/Documentation/sysctl/kernel.txt2007-07-19 13:42:09.748945250 -0700 @@ -320,6 +320,14 @@ kernel. This value defaults to SHMMAX. == +softlockup_thresh: + +This value can be
Re: [patch] Change softlockup trigger limit using a kernel parameter
On Thu, Jul 19, 2007 at 11:51:14AM -0700, Jeremy Fitzhardinge wrote: >Ingo Molnar wrote: >> just in case someone sees false positives and wants to turn it off. > >Why not make 0=off? A patch to disable softlockup during boot already went in. http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=97842216b8400fe9d1a20468959e2989180f8f79 It uses kernel boot parameter to disable softlockup, not exactly disabling softlockup at run time though. Do we still need 0=off? Thanks, Kiran - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] Change softlockup trigger limit using a kernel parameter
Ingo Molnar wrote: > just in case someone sees false positives and wants to turn it off. Why not make 0=off? J - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] Change softlockup trigger limit using a kernel parameter
* Ravikiran G Thirumalai <[EMAIL PROTECTED]> wrote: > > also, i think the valid threshold should be between 1 and 60 seconds > > i think. > > 60 seconds! Is that not a pretty high threshold? The reason for > lowering the tolerance threshold from 10s is to catch bugs early in > lab environments, but why do we need to raise the tolerance thresh > beyond 10s? just in case someone sees false positives and wants to turn it off. Ingo - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] Change softlockup trigger limit using a kernel parameter
On Thu, Jul 19, 2007 at 11:11:42AM +0200, Ingo Molnar wrote: > >* Andrew Morton <[EMAIL PROTECTED]> wrote: > >> > +softlockup_thresh: >> > + >> > +This value can be used to lower the softlockup tolerance >> > +threshold. The default threshold is 10s. If a cpu is locked up >> > +for 10s, the kernel complains. Valid values are 1-10s. >> > + >> >> neato. > >please make sure this is applied after the softlockup watchdog patches i >already did. (in particular the cleanup patch, which this could interact >with) > >also, i think the valid threshold should be between 1 and 60 seconds i >think. 60 seconds! Is that not a pretty high threshold? The reason for lowering the tolerance threshold from 10s is to catch bugs early in lab environments, but why do we need to raise the tolerance thresh beyond 10s? Thanks, Kiran - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] Change softlockup trigger limit using a kernel parameter
* Andrew Morton <[EMAIL PROTECTED]> wrote: > > +softlockup_thresh: > > + > > +This value can be used to lower the softlockup tolerance > > +threshold. The default threshold is 10s. If a cpu is locked up > > +for 10s, the kernel complains. Valid values are 1-10s. > > + > > neato. please make sure this is applied after the softlockup watchdog patches i already did. (in particular the cleanup patch, which this could interact with) also, i think the valid threshold should be between 1 and 60 seconds i think. Ingo - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] Change softlockup trigger limit using a kernel parameter
On Wed, 18 Jul 2007 22:41:21 -0700 Ravikiran G Thirumalai <[EMAIL PROTECTED]> wrote: > On Wed, Jul 18, 2007 at 04:08:58PM -0700, Andrew Morton wrote: > > On Mon, 16 Jul 2007 15:26:50 -0700 > > Ravikiran G Thirumalai <[EMAIL PROTECTED]> wrote: > > > > > Kernel warns of softlockups if the softlockup thread is not able to run > > > on a CPU for 10s. It is useful to lower the softlockup warning > > > threshold in testing environments to catch potential lockups early. > > > Following patch adds a kernel parameter 'softlockup_lim' to control > > > the softlockup threshold. > > > > > > > Why not make it tunable at runtime? > > Sure! Like a sysctl? > > Here's a patch that does that (On top of Ingo's > softlockup-improve-debug-output.patch) > > ... > > --- linux-2.6.22.orig/kernel/sysctl.c 2007-07-08 16:32:17.0 -0700 > +++ linux-2.6.22/kernel/sysctl.c 2007-07-18 21:05:57.877436750 -0700 > @@ -78,6 +78,7 @@ extern int percpu_pagelist_fraction; > extern int compat_log; > extern int maps_protect; > extern int sysctl_stat_interval; > +extern int softlockup_thresh; Just because sysctl.c does this all over the place doesn't make it right ;) Please, if poss, find a header file for it. > /* this is needed for the proc_dointvec_minmax for [fs_]overflow UID and GID > */ > static int maxolduid = 65535; > @@ -206,6 +207,10 @@ static ctl_table root_table[] = { > { .ctl_name = 0 } > }; > > +/* Constants for kernel table minimum and maximum */ > +static int one = 1; > +static int ten = 10; I'd suggest that these go next to "zero", "two" and "one_hundred". Move 'em all to top-of-file where they should always have been. > static ctl_table kern_table[] = { > { > .ctl_name = KERN_PANIC, > @@ -615,6 +620,19 @@ static ctl_table kern_table[] = { > .proc_handler = _dointvec, > }, > #endif > +#ifdef CONFIG_DETECT_SOFTLOCKUP > + { > + .ctl_name = KERN_SOFTLOCKUP_THRESHOLD, > + .procname = "softlockup_thresh", > + .data = _thresh, > + .maxlen = sizeof(int), > + .mode = 0644, > + .proc_handler = _dointvec_minmax, > + .strategy = _intvec, > + .extra1 = , > + .extra2 = , > + }, > +#endif argh. There's supposed to be a big comment right here: /* * NOTE: do not add new entries to this table unless you have read * Documentation/sysctl/ctl_unnumbered.txt */ I'll fix that up. Please use CTL_UNNUMBERED. > }; > Index: linux-2.6.22/include/linux/sysctl.h > === > --- linux-2.6.22.orig/include/linux/sysctl.h 2007-07-08 16:32:17.0 > -0700 > +++ linux-2.6.22/include/linux/sysctl.h 2007-07-18 21:41:56.584347500 > -0700 > @@ -165,6 +165,7 @@ enum > KERN_MAX_LOCK_DEPTH=74, > KERN_NMI_WATCHDOG=75, /* int: enable/disable nmi watchdog */ > KERN_PANIC_ON_NMI=76, /* int: whether we will panic on an unrecovered */ > + KERN_SOFTLOCKUP_THRESHOLD=77, /* int: softlockup tolerance threshold */ > }; and zap this > Index: linux-2.6.22/Documentation/sysctl/kernel.txt > === > --- linux-2.6.22.orig/Documentation/sysctl/kernel.txt 2007-07-08 > 16:32:17.0 -0700 > +++ linux-2.6.22/Documentation/sysctl/kernel.txt 2007-07-18 > 22:07:29.460146250 -0700 > @@ -320,6 +320,14 @@ kernel. This value defaults to SHMMAX. > > == > > +softlockup_thresh: > + > +This value can be used to lower the softlockup tolerance > +threshold. The default threshold is 10s. If a cpu is locked up > +for 10s, the kernel complains. Valid values are 1-10s. > + neato. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] Change softlockup trigger limit using a kernel parameter
On Wed, Jul 18, 2007 at 04:08:58PM -0700, Andrew Morton wrote: > On Mon, 16 Jul 2007 15:26:50 -0700 > Ravikiran G Thirumalai <[EMAIL PROTECTED]> wrote: > > > Kernel warns of softlockups if the softlockup thread is not able to run > > on a CPU for 10s. It is useful to lower the softlockup warning > > threshold in testing environments to catch potential lockups early. > > Following patch adds a kernel parameter 'softlockup_lim' to control > > the softlockup threshold. > > > > Why not make it tunable at runtime? Sure! Like a sysctl? Here's a patch that does that (On top of Ingo's softlockup-improve-debug-output.patch) > > > > > Control the trigger limit for softlockup warnings. This is useful for > > debugging softlockups, by lowering the softlockup_lim to identify > > possible softlockups earlier. > > Please check your patches with scripts/checkpatch.pl. Yep will-do. (checkpatch emitted one warning for the patch below, but that was because of a 'stylo' that already exists in include/linux/sysctl.h -- which probably needs a style change patch by itself) --- Control the trigger limit for softlockup warnings. This is useful for debugging softlockups, by lowering the softlockup_thresh sysctl, to identify possible softlockups earlier. Patch also changes the softlockup printk to print the cpu softlockup time. Signed-off-by: Ravikiran Thirumalai <[EMAIL PROTECTED]> Signed-off-by: Shai Fultheim <[EMAIL PROTECTED]> Index: linux-2.6.22/kernel/softlockup.c === --- linux-2.6.22.orig/kernel/softlockup.c 2007-07-18 11:15:18.506614500 -0700 +++ linux-2.6.22/kernel/softlockup.c2007-07-18 21:39:20.498592750 -0700 @@ -23,6 +23,7 @@ static DEFINE_PER_CPU(unsigned long, pri static DEFINE_PER_CPU(struct task_struct *, watchdog_task); static int did_panic; +int softlockup_thresh = 10; static int softlock_panic(struct notifier_block *this, unsigned long event, void *ptr) @@ -101,7 +102,7 @@ void softlockup_tick(void) wake_up_process(per_cpu(watchdog_task, this_cpu)); /* Warn about unreasonable 10+ seconds delays: */ - if (now <= (touch_timestamp + 10)) + if (now <= (touch_timestamp + softlockup_thresh)) return; regs = get_irq_regs(); @@ -109,8 +110,9 @@ void softlockup_tick(void) per_cpu(print_timestamp, this_cpu) = touch_timestamp; spin_lock(_lock); - printk(KERN_ERR "BUG: soft lockup detected on CPU#%d! [%s:%d]\n", - this_cpu, current->comm, current->pid); + printk(KERN_ERR "BUG: soft lockup - CPU#%d stuck for %lus! [%s:%d]\n", + this_cpu, now - touch_timestamp, + current->comm, current->pid); if (regs) show_regs(regs); else Index: linux-2.6.22/kernel/sysctl.c === --- linux-2.6.22.orig/kernel/sysctl.c 2007-07-08 16:32:17.0 -0700 +++ linux-2.6.22/kernel/sysctl.c2007-07-18 21:05:57.877436750 -0700 @@ -78,6 +78,7 @@ extern int percpu_pagelist_fraction; extern int compat_log; extern int maps_protect; extern int sysctl_stat_interval; +extern int softlockup_thresh; /* this is needed for the proc_dointvec_minmax for [fs_]overflow UID and GID */ static int maxolduid = 65535; @@ -206,6 +207,10 @@ static ctl_table root_table[] = { { .ctl_name = 0 } }; +/* Constants for kernel table minimum and maximum */ +static int one = 1; +static int ten = 10; + static ctl_table kern_table[] = { { .ctl_name = KERN_PANIC, @@ -615,6 +620,19 @@ static ctl_table kern_table[] = { .proc_handler = _dointvec, }, #endif +#ifdef CONFIG_DETECT_SOFTLOCKUP + { + .ctl_name = KERN_SOFTLOCKUP_THRESHOLD, + .procname = "softlockup_thresh", + .data = _thresh, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = _dointvec_minmax, + .strategy = _intvec, + .extra1 = , + .extra2 = , + }, +#endif { .ctl_name = 0 } }; Index: linux-2.6.22/include/linux/sysctl.h === --- linux-2.6.22.orig/include/linux/sysctl.h2007-07-08 16:32:17.0 -0700 +++ linux-2.6.22/include/linux/sysctl.h 2007-07-18 21:41:56.584347500 -0700 @@ -165,6 +165,7 @@ enum KERN_MAX_LOCK_DEPTH=74, KERN_NMI_WATCHDOG=75, /* int: enable/disable nmi watchdog */ KERN_PANIC_ON_NMI=76, /* int: whether we will panic on an unrecovered */ + KERN_SOFTLOCKUP_THRESHOLD=77, /* int: softlockup tolerance threshold */ }; Index: linux-2.6.22/Documentation/sysctl/kernel.txt === ---
Re: [patch] Change softlockup trigger limit using a kernel parameter
On Wed, Jul 18, 2007 at 04:08:58PM -0700, Andrew Morton wrote: On Mon, 16 Jul 2007 15:26:50 -0700 Ravikiran G Thirumalai [EMAIL PROTECTED] wrote: Kernel warns of softlockups if the softlockup thread is not able to run on a CPU for 10s. It is useful to lower the softlockup warning threshold in testing environments to catch potential lockups early. Following patch adds a kernel parameter 'softlockup_lim' to control the softlockup threshold. Why not make it tunable at runtime? Sure! Like a sysctl? Here's a patch that does that (On top of Ingo's softlockup-improve-debug-output.patch) Control the trigger limit for softlockup warnings. This is useful for debugging softlockups, by lowering the softlockup_lim to identify possible softlockups earlier. Please check your patches with scripts/checkpatch.pl. Yep will-do. (checkpatch emitted one warning for the patch below, but that was because of a 'stylo' that already exists in include/linux/sysctl.h -- which probably needs a style change patch by itself) --- Control the trigger limit for softlockup warnings. This is useful for debugging softlockups, by lowering the softlockup_thresh sysctl, to identify possible softlockups earlier. Patch also changes the softlockup printk to print the cpu softlockup time. Signed-off-by: Ravikiran Thirumalai [EMAIL PROTECTED] Signed-off-by: Shai Fultheim [EMAIL PROTECTED] Index: linux-2.6.22/kernel/softlockup.c === --- linux-2.6.22.orig/kernel/softlockup.c 2007-07-18 11:15:18.506614500 -0700 +++ linux-2.6.22/kernel/softlockup.c2007-07-18 21:39:20.498592750 -0700 @@ -23,6 +23,7 @@ static DEFINE_PER_CPU(unsigned long, pri static DEFINE_PER_CPU(struct task_struct *, watchdog_task); static int did_panic; +int softlockup_thresh = 10; static int softlock_panic(struct notifier_block *this, unsigned long event, void *ptr) @@ -101,7 +102,7 @@ void softlockup_tick(void) wake_up_process(per_cpu(watchdog_task, this_cpu)); /* Warn about unreasonable 10+ seconds delays: */ - if (now = (touch_timestamp + 10)) + if (now = (touch_timestamp + softlockup_thresh)) return; regs = get_irq_regs(); @@ -109,8 +110,9 @@ void softlockup_tick(void) per_cpu(print_timestamp, this_cpu) = touch_timestamp; spin_lock(print_lock); - printk(KERN_ERR BUG: soft lockup detected on CPU#%d! [%s:%d]\n, - this_cpu, current-comm, current-pid); + printk(KERN_ERR BUG: soft lockup - CPU#%d stuck for %lus! [%s:%d]\n, + this_cpu, now - touch_timestamp, + current-comm, current-pid); if (regs) show_regs(regs); else Index: linux-2.6.22/kernel/sysctl.c === --- linux-2.6.22.orig/kernel/sysctl.c 2007-07-08 16:32:17.0 -0700 +++ linux-2.6.22/kernel/sysctl.c2007-07-18 21:05:57.877436750 -0700 @@ -78,6 +78,7 @@ extern int percpu_pagelist_fraction; extern int compat_log; extern int maps_protect; extern int sysctl_stat_interval; +extern int softlockup_thresh; /* this is needed for the proc_dointvec_minmax for [fs_]overflow UID and GID */ static int maxolduid = 65535; @@ -206,6 +207,10 @@ static ctl_table root_table[] = { { .ctl_name = 0 } }; +/* Constants for kernel table minimum and maximum */ +static int one = 1; +static int ten = 10; + static ctl_table kern_table[] = { { .ctl_name = KERN_PANIC, @@ -615,6 +620,19 @@ static ctl_table kern_table[] = { .proc_handler = proc_dointvec, }, #endif +#ifdef CONFIG_DETECT_SOFTLOCKUP + { + .ctl_name = KERN_SOFTLOCKUP_THRESHOLD, + .procname = softlockup_thresh, + .data = softlockup_thresh, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_dointvec_minmax, + .strategy = sysctl_intvec, + .extra1 = one, + .extra2 = ten, + }, +#endif { .ctl_name = 0 } }; Index: linux-2.6.22/include/linux/sysctl.h === --- linux-2.6.22.orig/include/linux/sysctl.h2007-07-08 16:32:17.0 -0700 +++ linux-2.6.22/include/linux/sysctl.h 2007-07-18 21:41:56.584347500 -0700 @@ -165,6 +165,7 @@ enum KERN_MAX_LOCK_DEPTH=74, KERN_NMI_WATCHDOG=75, /* int: enable/disable nmi watchdog */ KERN_PANIC_ON_NMI=76, /* int: whether we will panic on an unrecovered */ + KERN_SOFTLOCKUP_THRESHOLD=77, /* int: softlockup tolerance threshold */ }; Index: linux-2.6.22/Documentation/sysctl/kernel.txt === ---
Re: [patch] Change softlockup trigger limit using a kernel parameter
On Wed, 18 Jul 2007 22:41:21 -0700 Ravikiran G Thirumalai [EMAIL PROTECTED] wrote: On Wed, Jul 18, 2007 at 04:08:58PM -0700, Andrew Morton wrote: On Mon, 16 Jul 2007 15:26:50 -0700 Ravikiran G Thirumalai [EMAIL PROTECTED] wrote: Kernel warns of softlockups if the softlockup thread is not able to run on a CPU for 10s. It is useful to lower the softlockup warning threshold in testing environments to catch potential lockups early. Following patch adds a kernel parameter 'softlockup_lim' to control the softlockup threshold. Why not make it tunable at runtime? Sure! Like a sysctl? Here's a patch that does that (On top of Ingo's softlockup-improve-debug-output.patch) ... --- linux-2.6.22.orig/kernel/sysctl.c 2007-07-08 16:32:17.0 -0700 +++ linux-2.6.22/kernel/sysctl.c 2007-07-18 21:05:57.877436750 -0700 @@ -78,6 +78,7 @@ extern int percpu_pagelist_fraction; extern int compat_log; extern int maps_protect; extern int sysctl_stat_interval; +extern int softlockup_thresh; Just because sysctl.c does this all over the place doesn't make it right ;) Please, if poss, find a header file for it. /* this is needed for the proc_dointvec_minmax for [fs_]overflow UID and GID */ static int maxolduid = 65535; @@ -206,6 +207,10 @@ static ctl_table root_table[] = { { .ctl_name = 0 } }; +/* Constants for kernel table minimum and maximum */ +static int one = 1; +static int ten = 10; I'd suggest that these go next to zero, two and one_hundred. Move 'em all to top-of-file where they should always have been. static ctl_table kern_table[] = { { .ctl_name = KERN_PANIC, @@ -615,6 +620,19 @@ static ctl_table kern_table[] = { .proc_handler = proc_dointvec, }, #endif +#ifdef CONFIG_DETECT_SOFTLOCKUP + { + .ctl_name = KERN_SOFTLOCKUP_THRESHOLD, + .procname = softlockup_thresh, + .data = softlockup_thresh, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_dointvec_minmax, + .strategy = sysctl_intvec, + .extra1 = one, + .extra2 = ten, + }, +#endif argh. There's supposed to be a big comment right here: /* * NOTE: do not add new entries to this table unless you have read * Documentation/sysctl/ctl_unnumbered.txt */ I'll fix that up. Please use CTL_UNNUMBERED. }; Index: linux-2.6.22/include/linux/sysctl.h === --- linux-2.6.22.orig/include/linux/sysctl.h 2007-07-08 16:32:17.0 -0700 +++ linux-2.6.22/include/linux/sysctl.h 2007-07-18 21:41:56.584347500 -0700 @@ -165,6 +165,7 @@ enum KERN_MAX_LOCK_DEPTH=74, KERN_NMI_WATCHDOG=75, /* int: enable/disable nmi watchdog */ KERN_PANIC_ON_NMI=76, /* int: whether we will panic on an unrecovered */ + KERN_SOFTLOCKUP_THRESHOLD=77, /* int: softlockup tolerance threshold */ }; and zap this Index: linux-2.6.22/Documentation/sysctl/kernel.txt === --- linux-2.6.22.orig/Documentation/sysctl/kernel.txt 2007-07-08 16:32:17.0 -0700 +++ linux-2.6.22/Documentation/sysctl/kernel.txt 2007-07-18 22:07:29.460146250 -0700 @@ -320,6 +320,14 @@ kernel. This value defaults to SHMMAX. == +softlockup_thresh: + +This value can be used to lower the softlockup tolerance +threshold. The default threshold is 10s. If a cpu is locked up +for 10s, the kernel complains. Valid values are 1-10s. + neato. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] Change softlockup trigger limit using a kernel parameter
* Andrew Morton [EMAIL PROTECTED] wrote: +softlockup_thresh: + +This value can be used to lower the softlockup tolerance +threshold. The default threshold is 10s. If a cpu is locked up +for 10s, the kernel complains. Valid values are 1-10s. + neato. please make sure this is applied after the softlockup watchdog patches i already did. (in particular the cleanup patch, which this could interact with) also, i think the valid threshold should be between 1 and 60 seconds i think. Ingo - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] Change softlockup trigger limit using a kernel parameter
On Thu, Jul 19, 2007 at 11:11:42AM +0200, Ingo Molnar wrote: * Andrew Morton [EMAIL PROTECTED] wrote: +softlockup_thresh: + +This value can be used to lower the softlockup tolerance +threshold. The default threshold is 10s. If a cpu is locked up +for 10s, the kernel complains. Valid values are 1-10s. + neato. please make sure this is applied after the softlockup watchdog patches i already did. (in particular the cleanup patch, which this could interact with) also, i think the valid threshold should be between 1 and 60 seconds i think. 60 seconds! Is that not a pretty high threshold? The reason for lowering the tolerance threshold from 10s is to catch bugs early in lab environments, but why do we need to raise the tolerance thresh beyond 10s? Thanks, Kiran - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] Change softlockup trigger limit using a kernel parameter
* Ravikiran G Thirumalai [EMAIL PROTECTED] wrote: also, i think the valid threshold should be between 1 and 60 seconds i think. 60 seconds! Is that not a pretty high threshold? The reason for lowering the tolerance threshold from 10s is to catch bugs early in lab environments, but why do we need to raise the tolerance thresh beyond 10s? just in case someone sees false positives and wants to turn it off. Ingo - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] Change softlockup trigger limit using a kernel parameter
Ingo Molnar wrote: just in case someone sees false positives and wants to turn it off. Why not make 0=off? J - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] Change softlockup trigger limit using a kernel parameter
On Thu, Jul 19, 2007 at 11:51:14AM -0700, Jeremy Fitzhardinge wrote: Ingo Molnar wrote: just in case someone sees false positives and wants to turn it off. Why not make 0=off? A patch to disable softlockup during boot already went in. http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=97842216b8400fe9d1a20468959e2989180f8f79 It uses kernel boot parameter to disable softlockup, not exactly disabling softlockup at run time though. Do we still need 0=off? Thanks, Kiran - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] Change softlockup trigger limit using a kernel parameter
On Wed, Jul 18, 2007 at 11:09:07PM -0700, Andrew Morton wrote: On Wed, 18 Jul 2007 22:41:21 -0700 Ravikiran G Thirumalai [EMAIL PROTECTED] wrote: On Wed, Jul 18, 2007 at 04:08:58PM -0700, Andrew Morton wrote: On Mon, 16 Jul 2007 15:26:50 -0700 Ravikiran G Thirumalai [EMAIL PROTECTED] wrote: Kernel warns of softlockups if the softlockup thread is not able to run on a CPU for 10s. It is useful to lower the softlockup warning threshold in testing environments to catch potential lockups early. Following patch adds a kernel parameter 'softlockup_lim' to control the softlockup threshold. Why not make it tunable at runtime? Sure! Like a sysctl? Hi Andrew, Here's another take, incorporating all of your comments. Thanks, Kiran Control the trigger limit for softlockup warnings. This is useful for debugging softlockups, by lowering the softlockup_thresh to identify possible softlockups earlier. This patch: 1. Adds a sysctl softlockup_thresh with valid values of 1-60s (Higher value to disable false positives) 2. Changes the softlockup printk to print the cpu softlockup time Signed-off-by: Ravikiran Thirumalai [EMAIL PROTECTED] Signed-off-by: Shai Fultheim [EMAIL PROTECTED] --- Patch applies on top of Ingo's softlockup patches Documentation/sysctl/kernel.txt |8 include/linux/sched.h |1 + kernel/softlockup.c |8 +--- kernel/sysctl.c | 25 +++-- 4 files changed, 33 insertions(+), 9 deletions(-) Index: linux-2.6.22/kernel/softlockup.c === --- linux-2.6.22.orig/kernel/softlockup.c 2007-07-18 11:15:18.506614500 -0700 +++ linux-2.6.22/kernel/softlockup.c2007-07-18 21:39:20.498592750 -0700 @@ -23,6 +23,7 @@ static DEFINE_PER_CPU(unsigned long, pri static DEFINE_PER_CPU(struct task_struct *, watchdog_task); static int did_panic; +int softlockup_thresh = 10; static int softlock_panic(struct notifier_block *this, unsigned long event, void *ptr) @@ -101,7 +102,7 @@ void softlockup_tick(void) wake_up_process(per_cpu(watchdog_task, this_cpu)); /* Warn about unreasonable 10+ seconds delays: */ - if (now = (touch_timestamp + 10)) + if (now = (touch_timestamp + softlockup_thresh)) return; regs = get_irq_regs(); @@ -109,8 +110,9 @@ void softlockup_tick(void) per_cpu(print_timestamp, this_cpu) = touch_timestamp; spin_lock(print_lock); - printk(KERN_ERR BUG: soft lockup detected on CPU#%d! [%s:%d]\n, - this_cpu, current-comm, current-pid); + printk(KERN_ERR BUG: soft lockup - CPU#%d stuck for %lus! [%s:%d]\n, + this_cpu, now - touch_timestamp, + current-comm, current-pid); if (regs) show_regs(regs); else Index: linux-2.6.22/kernel/sysctl.c === --- linux-2.6.22.orig/kernel/sysctl.c 2007-07-08 16:32:17.0 -0700 +++ linux-2.6.22/kernel/sysctl.c2007-07-19 13:42:50.275478000 -0700 @@ -79,6 +79,12 @@ extern int compat_log; extern int maps_protect; extern int sysctl_stat_interval; +/* Constants used for minimum and maximum */ +static int one = 1; +static int zero; +static int sixty = 60; +static int one_hundred = 100; + /* this is needed for the proc_dointvec_minmax for [fs_]overflow UID and GID */ static int maxolduid = 65535; static int minolduid; @@ -615,16 +621,23 @@ static ctl_table kern_table[] = { .proc_handler = proc_dointvec, }, #endif +#ifdef CONFIG_DETECT_SOFTLOCKUP + { + .ctl_name = CTL_UNNUMBERED, + .procname = softlockup_thresh, + .data = softlockup_thresh, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_dointvec_minmax, + .strategy = sysctl_intvec, + .extra1 = one, + .extra2 = sixty, + }, +#endif { .ctl_name = 0 } }; -/* Constants for minimum and maximum testing in vm_table. - We use these as one-element integer vectors. */ -static int zero; -static int one_hundred = 100; - - static ctl_table vm_table[] = { { .ctl_name = VM_OVERCOMMIT_MEMORY, Index: linux-2.6.22/Documentation/sysctl/kernel.txt === --- linux-2.6.22.orig/Documentation/sysctl/kernel.txt 2007-07-08 16:32:17.0 -0700 +++ linux-2.6.22/Documentation/sysctl/kernel.txt2007-07-19 13:42:09.748945250 -0700 @@ -320,6 +320,14 @@ kernel. This value defaults to SHMMAX. == +softlockup_thresh: + +This value can be used to lower the softlockup
Re: [patch] Change softlockup trigger limit using a kernel parameter
On Mon, 16 Jul 2007 15:26:50 -0700 Ravikiran G Thirumalai <[EMAIL PROTECTED]> wrote: > Kernel warns of softlockups if the softlockup thread is not able to run > on a CPU for 10s. It is useful to lower the softlockup warning > threshold in testing environments to catch potential lockups early. > Following patch adds a kernel parameter 'softlockup_lim' to control > the softlockup threshold. > Why not make it tunable at runtime? > > Control the trigger limit for softlockup warnings. This is useful for > debugging softlockups, by lowering the softlockup_lim to identify > possible softlockups earlier. Please check your patches with scripts/checkpatch.pl. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] Change softlockup trigger limit using a kernel parameter
On Mon, 16 Jul 2007 15:26:50 -0700 Ravikiran G Thirumalai [EMAIL PROTECTED] wrote: Kernel warns of softlockups if the softlockup thread is not able to run on a CPU for 10s. It is useful to lower the softlockup warning threshold in testing environments to catch potential lockups early. Following patch adds a kernel parameter 'softlockup_lim' to control the softlockup threshold. Why not make it tunable at runtime? Control the trigger limit for softlockup warnings. This is useful for debugging softlockups, by lowering the softlockup_lim to identify possible softlockups earlier. Please check your patches with scripts/checkpatch.pl. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/