Re: [PATCH v5] panic: Avoid the extra noise dmesg
On Fri, 1 Mar 2019 09:54:01 +0100 Petr Mladek wrote: I'll put my $0.02 in, and say I like the idea of the patch too. > > > Thoughts: > > > > > > - why do the suppression in vprintk_emit()? Doing it right at entry > > > to printk() seems cleaner, more explicit? > > > > Yes, I put it in printk() in one earlier post, and Petr suggested to > > put it into vprintk_emit so that it works for all printk() interfaces, > > like the devkmsg_write -> printk_emit -> vprintk_emit path. > > Yes, there are more printk interfaces. The check in vprintk_emit() > allows to calm down also prink_deferred() and dev_printk(). Agreed. > > > > - Other code sites may wish to suppress all printks. Perhaps > > > `panic_suppress_printk' should just be called `suppress_printk'? > > Ok, then I'll move the definition from panic.c to printk code. > > This change looks fine to me. > > Best Regards, > Petr > > PS: I am sorry for the late review. I have spent many days with > reviewing a proposal of rewrite of printk() internals. Thanks for doing that, BTW. -- Steve
Re: [PATCH v5] panic: Avoid the extra noise dmesg
On Fri 2019-03-01 12:11:31, Feng Tang wrote: > Hi Andrew, > > Thanks for the review! > > On Thu, Feb 28, 2019 at 12:00:14PM -0800, Andrew Morton wrote: > > On Fri, 22 Feb 2019 14:09:59 +0800 Feng Tang wrote: > > > > > When kernel panic happens, it will first print the panic call stack, > > > then the ending msg like: > > > > > > [ 35.743249] ---[ end Kernel panic - not syncing: Fatal exception > > > [ 35.749975] [ cut here ] > > > > > > The above message are very useful for debugging. > > > > > > But if system is configured to not reboot on panic, say the > > > "panic_timeout" > > > parameter equals 0, it will likely print out many noisy message like > > > WARN() call stack for each and every CPU except the panic one, messages > > > like below: > > > > > > WARNING: CPU: 1 PID: 280 at kernel/sched/core.c:1198 > > > set_task_cpu+0x183/0x190 > > > Call Trace: > > > > > > try_to_wake_up > > > default_wake_function > > > autoremove_wake_function > > > __wake_up_common > > > __wake_up_common_lock > > > __wake_up > > > wake_up_klogd_work_func > > > irq_work_run_list > > > irq_work_tick > > > update_process_times > > > tick_sched_timer > > > __hrtimer_run_queues > > > hrtimer_interrupt > > > smp_apic_timer_interrupt > > > apic_timer_interrupt > > > > It's a fairly ugly-looking patch but I am inclined to agree. > > Yes, it's ugly :) we've changed 3 methods to tackle this. > > > The panicing CPU is spinning and blinking a LED and all CPUs have > > interrupts enabled and the system is known to be in a messed up state. > > All sorts of kernel code could emit all sorts of output in such > > circumstances. So a global printk-killing knob seems appropriate. > > > > Thoughts: > > > > - why do the suppression in vprintk_emit()? Doing it right at entry > > to printk() seems cleaner, more explicit? > > Yes, I put it in printk() in one earlier post, and Petr suggested to > put it into vprintk_emit so that it works for all printk() interfaces, > like the devkmsg_write -> printk_emit -> vprintk_emit path. Yes, there are more printk interfaces. The check in vprintk_emit() allows to calm down also prink_deferred() and dev_printk(). > > - Other code sites may wish to suppress all printks. Perhaps > > `panic_suppress_printk' should just be called `suppress_printk'? > Ok, then I'll move the definition from panic.c to printk code. This change looks fine to me. Best Regards, Petr PS: I am sorry for the late review. I have spent many days with reviewing a proposal of rewrite of printk() internals.
Re: [PATCH v5] panic: Avoid the extra noise dmesg
Hi Andrew, Thanks for the review! On Thu, Feb 28, 2019 at 12:00:14PM -0800, Andrew Morton wrote: > On Fri, 22 Feb 2019 14:09:59 +0800 Feng Tang wrote: > > > When kernel panic happens, it will first print the panic call stack, > > then the ending msg like: > > > > [ 35.743249] ---[ end Kernel panic - not syncing: Fatal exception > > [ 35.749975] [ cut here ] > > > > The above message are very useful for debugging. > > > > But if system is configured to not reboot on panic, say the "panic_timeout" > > parameter equals 0, it will likely print out many noisy message like > > WARN() call stack for each and every CPU except the panic one, messages > > like below: > > > > WARNING: CPU: 1 PID: 280 at kernel/sched/core.c:1198 > > set_task_cpu+0x183/0x190 > > Call Trace: > > > > try_to_wake_up > > default_wake_function > > autoremove_wake_function > > __wake_up_common > > __wake_up_common_lock > > __wake_up > > wake_up_klogd_work_func > > irq_work_run_list > > irq_work_tick > > update_process_times > > tick_sched_timer > > __hrtimer_run_queues > > hrtimer_interrupt > > smp_apic_timer_interrupt > > apic_timer_interrupt > > It's a fairly ugly-looking patch but I am inclined to agree. Yes, it's ugly :) we've changed 3 methods to tackle this. > The panicing CPU is spinning and blinking a LED and all CPUs have > interrupts enabled and the system is known to be in a messed up state. > All sorts of kernel code could emit all sorts of output in such > circumstances. So a global printk-killing knob seems appropriate. > > Thoughts: > > - why do the suppression in vprintk_emit()? Doing it right at entry > to printk() seems cleaner, more explicit? Yes, I put it in printk() in one earlier post, and Petr suggested to put it into vprintk_emit so that it works for all printk() interfaces, like the devkmsg_write -> printk_emit -> vprintk_emit path. > > - Other code sites may wish to suppress all printks. Perhaps > `panic_suppress_printk' should just be called `suppress_printk'? Ok, then I'll move the definition from panic.c to printk code. Thanks, Feng
Re: [PATCH v5] panic: Avoid the extra noise dmesg
On Fri, 22 Feb 2019 14:09:59 +0800 Feng Tang wrote: > When kernel panic happens, it will first print the panic call stack, > then the ending msg like: > > [ 35.743249] ---[ end Kernel panic - not syncing: Fatal exception > [ 35.749975] [ cut here ] > > The above message are very useful for debugging. > > But if system is configured to not reboot on panic, say the "panic_timeout" > parameter equals 0, it will likely print out many noisy message like > WARN() call stack for each and every CPU except the panic one, messages > like below: > > WARNING: CPU: 1 PID: 280 at kernel/sched/core.c:1198 > set_task_cpu+0x183/0x190 > Call Trace: > > try_to_wake_up > default_wake_function > autoremove_wake_function > __wake_up_common > __wake_up_common_lock > __wake_up > wake_up_klogd_work_func > irq_work_run_list > irq_work_tick > update_process_times > tick_sched_timer > __hrtimer_run_queues > hrtimer_interrupt > smp_apic_timer_interrupt > apic_timer_interrupt It's a fairly ugly-looking patch but I am inclined to agree. The panicing CPU is spinning and blinking a LED and all CPUs have interrupts enabled and the system is known to be in a messed up state. All sorts of kernel code could emit all sorts of output in such circumstances. So a global printk-killing knob seems appropriate. Thoughts: - why do the suppression in vprintk_emit()? Doing it right at entry to printk() seems cleaner, more explicit? - Other code sites may wish to suppress all printks. Perhaps `panic_suppress_printk' should just be called `suppress_printk'?
Re: [PATCH v5] panic: Avoid the extra noise dmesg
One small cleanup of unneeded change in last post: >From 4ecbdfe1b398aa2285a696392eddeffb109d5463 Mon Sep 17 00:00:00 2001 From: Feng Tang Date: Thu, 21 Feb 2019 16:23:32 +0800 Subject: [PATCH v5] panic: Avoid the extra noise dmesg When kernel panic happens, it will first print the panic call stack, then the ending msg like: [ 35.743249] ---[ end Kernel panic - not syncing: Fatal exception [ 35.749975] [ cut here ] The above message are very useful for debugging. But if system is configured to not reboot on panic, say the "panic_timeout" parameter equals 0, it will likely print out many noisy message like WARN() call stack for each and every CPU except the panic one, messages like below: WARNING: CPU: 1 PID: 280 at kernel/sched/core.c:1198 set_task_cpu+0x183/0x190 Call Trace: try_to_wake_up default_wake_function autoremove_wake_function __wake_up_common __wake_up_common_lock __wake_up wake_up_klogd_work_func irq_work_run_list irq_work_tick update_process_times tick_sched_timer __hrtimer_run_queues hrtimer_interrupt smp_apic_timer_interrupt apic_timer_interrupt For people working in console mode, the screen will first show the panic call stack, but immediately overridden by these noisy extra messages, which makes debugging much more difficult, as the original context gets lost on screen. Also these noisy messages will confuse some users, as I have seen many bug reporters posted the noisy message into bugzilla, instead of the real panic call stack and context. Adding a flag panic_suppress_printk to avoid those noisy message, without changing the current kernel behavior that both panic blinking and sysrq magic key can work as is, suggested by Petr Mladek. To verify this, make sure kernel is not configured to reboot on panic and in console # echo c > /proc/sysrq-trigger to see if console only prints out the panic call stack. Suggested-by: Petr Mladek Signed-off-by: Feng Tang Cc: Thomas Gleixner Cc: Kees Cook Cc: Borislav Petkov Cc: Sergey Senozhatsky Cc: Petr Mladek Cc: Steven Rostedt Cc: Andi Kleen Cc: Peter Zijlstra Cc: Greg Kroah-Hartman Cc: Jiri Slaby Cc: Sasha Levin --- Change log: v5: - use a flag to notify printk not to print unimportant messages, while keep panic blinking and sysrq working, as suggested by Petr Mladek v4: - make the local_irq_enable conditional and default off to cover possible use of interrupt/scheduling, as mentioned by Sergey and Petr v3: - Make the change log clearer as suggested by Andrew Morton v2: - Move the solution from hacking arch/scheduler code back to panic.c drivers/tty/sysrq.c| 6 ++ include/linux/kernel.h | 1 + kernel/panic.c | 5 + kernel/printk/printk.c | 4 4 files changed, 16 insertions(+) diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c index 1f03078..763e24c 100644 --- a/drivers/tty/sysrq.c +++ b/drivers/tty/sysrq.c @@ -527,8 +527,12 @@ void __handle_sysrq(int key, bool check_mask) { struct sysrq_key_op *op_p; int orig_log_level; + int orig_panic_suppress_printk; int i; + orig_panic_suppress_printk = panic_suppress_printk; + panic_suppress_printk = 0; + rcu_sysrq_start(); rcu_read_lock(); /* @@ -574,6 +578,8 @@ void __handle_sysrq(int key, bool check_mask) } rcu_read_unlock(); rcu_sysrq_end(); + + panic_suppress_printk = orig_panic_suppress_printk; } void handle_sysrq(int key) diff --git a/include/linux/kernel.h b/include/linux/kernel.h index 8f0e68e..4120f3a 100644 --- a/include/linux/kernel.h +++ b/include/linux/kernel.h @@ -534,6 +534,7 @@ extern int panic_on_io_nmi; extern int panic_on_warn; extern int sysctl_panic_on_rcu_stall; extern int sysctl_panic_on_stackoverflow; +extern int panic_suppress_printk; extern bool crash_kexec_post_notifiers; diff --git a/kernel/panic.c b/kernel/panic.c index f121e6b..b9f004e 100644 --- a/kernel/panic.c +++ b/kernel/panic.c @@ -46,6 +46,8 @@ int panic_on_warn __read_mostly; int panic_timeout = CONFIG_PANIC_TIMEOUT; EXPORT_SYMBOL_GPL(panic_timeout); +int panic_suppress_printk; + #define PANIC_PRINT_TASK_INFO 0x0001 #define PANIC_PRINT_MEM_INFO 0x0002 #define PANIC_PRINT_TIMER_INFO 0x0004 @@ -326,6 +328,9 @@ void panic(const char *fmt, ...) } #endif pr_emerg("---[ end Kernel panic - not syncing: %s ]---\n", buf); + + /* Do not scroll important messages printed above */ + panic_suppress_printk = 1; local_irq_enable(); for (i = 0; ; i += PANIC_TIMER_STEP) { touch_softlockup_watchdog(); diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index d3d1703..d460144 100644 --- a/kernel/printk/pr
[PATCH v5] panic: Avoid the extra noise dmesg
When kernel panic happens, it will first print the panic call stack, then the ending msg like: [ 35.743249] ---[ end Kernel panic - not syncing: Fatal exception [ 35.749975] [ cut here ] The above message are very useful for debugging. But if system is configured to not reboot on panic, say the "panic_timeout" parameter equals 0, it will likely print out many noisy message like WARN() call stack for each and every CPU except the panic one, messages like below: WARNING: CPU: 1 PID: 280 at kernel/sched/core.c:1198 set_task_cpu+0x183/0x190 Call Trace: try_to_wake_up default_wake_function autoremove_wake_function __wake_up_common __wake_up_common_lock __wake_up wake_up_klogd_work_func irq_work_run_list irq_work_tick update_process_times tick_sched_timer __hrtimer_run_queues hrtimer_interrupt smp_apic_timer_interrupt apic_timer_interrupt For people working in console mode, the screen will first show the panic call stack, but immediately overridden by these noisy extra messages, which makes debugging much more difficult, as the original context gets lost on screen. Also these noisy messages will confuse some users, as I have seen many bug reporters posted the noisy message into bugzilla, instead of the real panic call stack and context. Adding a flag panic_suppress_printk to avoid those noisy message, without changing the current kernel behavior that both panic blinking and sysrq magic key can work as is, suggested by Petr Mladek. To verify this, make sure kernel is not configured to reboot on panic and in console # echo c > /proc/sysrq-trigger to see if console only prints out the panic call stack. Suggested-by: Petr Mladek Signed-off-by: Feng Tang Cc: Thomas Gleixner Cc: Kees Cook Cc: Borislav Petkov Cc: Sergey Senozhatsky Cc: Petr Mladek Cc: Steven Rostedt Cc: Andi Kleen Cc: Peter Zijlstra Cc: Greg Kroah-Hartman Cc: Jiri Slaby Cc: Sasha Levin --- Change log: v5: - use a flag to notify printk not to print unimportant messages, while keep panic blinking and sysrq working, as suggested by Petr Mladek v4: - make the local_irq_enable conditional and default off to cover possible use of interrupt/scheduling, as mentioned by Sergey and Petr v3: - Make the change log clearer as suggested by Andrew Morton v2: - Move the solution from hacking arch/scheduler code back to panic.c drivers/tty/sysrq.c| 6 ++ include/linux/kernel.h | 1 + kernel/panic.c | 5 + kernel/printk/printk.c | 5 + 4 files changed, 17 insertions(+) diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c index 1f03078..763e24c 100644 --- a/drivers/tty/sysrq.c +++ b/drivers/tty/sysrq.c @@ -527,8 +527,12 @@ void __handle_sysrq(int key, bool check_mask) { struct sysrq_key_op *op_p; int orig_log_level; + int orig_panic_suppress_printk; int i; + orig_panic_suppress_printk = panic_suppress_printk; + panic_suppress_printk = 0; + rcu_sysrq_start(); rcu_read_lock(); /* @@ -574,6 +578,8 @@ void __handle_sysrq(int key, bool check_mask) } rcu_read_unlock(); rcu_sysrq_end(); + + panic_suppress_printk = orig_panic_suppress_printk; } void handle_sysrq(int key) diff --git a/include/linux/kernel.h b/include/linux/kernel.h index 8f0e68e..4120f3a 100644 --- a/include/linux/kernel.h +++ b/include/linux/kernel.h @@ -534,6 +534,7 @@ extern int panic_on_io_nmi; extern int panic_on_warn; extern int sysctl_panic_on_rcu_stall; extern int sysctl_panic_on_stackoverflow; +extern int panic_suppress_printk; extern bool crash_kexec_post_notifiers; diff --git a/kernel/panic.c b/kernel/panic.c index f121e6b..b9f004e 100644 --- a/kernel/panic.c +++ b/kernel/panic.c @@ -46,6 +46,8 @@ int panic_on_warn __read_mostly; int panic_timeout = CONFIG_PANIC_TIMEOUT; EXPORT_SYMBOL_GPL(panic_timeout); +int panic_suppress_printk; + #define PANIC_PRINT_TASK_INFO 0x0001 #define PANIC_PRINT_MEM_INFO 0x0002 #define PANIC_PRINT_TIMER_INFO 0x0004 @@ -326,6 +328,9 @@ void panic(const char *fmt, ...) } #endif pr_emerg("---[ end Kernel panic - not syncing: %s ]---\n", buf); + + /* Do not scroll important messages printed above */ + panic_suppress_printk = 1; local_irq_enable(); for (i = 0; ; i += PANIC_TIMER_STEP) { touch_softlockup_watchdog(); diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index d3d1703..4f44d64 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -1899,6 +1899,10 @@ asmlinkage int vprintk_emit(int facility, int level, unsigned long flags; u64 curr_log_seq; + /* Suppress unimportant messages after panic happens */ + if (unlikely(panic_suppress_pri