Re: [PATCH v5] panic: Avoid the extra noise dmesg

2019-03-01 Thread Steven Rostedt
On Fri, 1 Mar 2019 09:54:01 +0100
Petr Mladek  wrote:

I'll put my $0.02 in, and say I like the idea of the patch too.


> > > Thoughts:
> > > 
> > > - why do the suppression in vprintk_emit()?  Doing it right at entry
> > >   to printk() seems cleaner, more explicit?  
> > 
> > Yes, I put it in printk() in one earlier post, and Petr suggested to
> > put it into vprintk_emit so that it works for all printk() interfaces,
> > like the devkmsg_write -> printk_emit -> vprintk_emit path.  
> 
> Yes, there are more printk interfaces. The check in vprintk_emit()
> allows to calm down also prink_deferred() and dev_printk().

Agreed.

> 
> > > - Other code sites may wish to suppress all printks.  Perhaps
> > >   `panic_suppress_printk' should just be called `suppress_printk'?  
> > Ok, then I'll move the definition from panic.c to printk code.  
> 
> This change looks fine to me.
> 
> Best Regards,
> Petr
> 
> PS: I am sorry for the late review. I have spent many days with
> reviewing a proposal of rewrite of printk() internals.

Thanks for doing that, BTW.

-- Steve


Re: [PATCH v5] panic: Avoid the extra noise dmesg

2019-03-01 Thread Petr Mladek
On Fri 2019-03-01 12:11:31, Feng Tang wrote:
> Hi Andrew,
> 
> Thanks for the review!
> 
> On Thu, Feb 28, 2019 at 12:00:14PM -0800, Andrew Morton wrote:
> > On Fri, 22 Feb 2019 14:09:59 +0800 Feng Tang  wrote:
> > 
> > > When kernel panic happens, it will first print the panic call stack,
> > > then the ending msg like:
> > > 
> > > [   35.743249] ---[ end Kernel panic - not syncing: Fatal exception
> > > [   35.749975] [ cut here ]
> > > 
> > > The above message are very useful for debugging.
> > > 
> > > But if system is configured to not reboot on panic, say the 
> > > "panic_timeout"
> > > parameter equals 0, it will likely print out many noisy message like
> > > WARN() call stack for each and every CPU except the panic one, messages
> > > like below:
> > > 
> > >   WARNING: CPU: 1 PID: 280 at kernel/sched/core.c:1198 
> > > set_task_cpu+0x183/0x190
> > >   Call Trace:
> > >   
> > >   try_to_wake_up
> > >   default_wake_function
> > >   autoremove_wake_function
> > >   __wake_up_common
> > >   __wake_up_common_lock
> > >   __wake_up
> > >   wake_up_klogd_work_func
> > >   irq_work_run_list
> > >   irq_work_tick
> > >   update_process_times
> > >   tick_sched_timer
> > >   __hrtimer_run_queues
> > >   hrtimer_interrupt
> > >   smp_apic_timer_interrupt
> > >   apic_timer_interrupt
> > 
> > It's a fairly ugly-looking patch but I am inclined to agree.
> 
> Yes, it's ugly :) we've changed 3 methods to tackle this.
> 
> > The panicing CPU is spinning and blinking a LED and all CPUs have
> > interrupts enabled and the system is known to be in a messed up state. 
> > All sorts of kernel code could emit all sorts of output in such
> > circumstances.  So a global printk-killing knob seems appropriate.
> > 
> > Thoughts:
> > 
> > - why do the suppression in vprintk_emit()?  Doing it right at entry
> >   to printk() seems cleaner, more explicit?
> 
> Yes, I put it in printk() in one earlier post, and Petr suggested to
> put it into vprintk_emit so that it works for all printk() interfaces,
> like the devkmsg_write -> printk_emit -> vprintk_emit path.

Yes, there are more printk interfaces. The check in vprintk_emit()
allows to calm down also prink_deferred() and dev_printk().

> > - Other code sites may wish to suppress all printks.  Perhaps
> >   `panic_suppress_printk' should just be called `suppress_printk'?
> Ok, then I'll move the definition from panic.c to printk code.

This change looks fine to me.

Best Regards,
Petr

PS: I am sorry for the late review. I have spent many days with
reviewing a proposal of rewrite of printk() internals.


Re: [PATCH v5] panic: Avoid the extra noise dmesg

2019-02-28 Thread Feng Tang
Hi Andrew,

Thanks for the review!

On Thu, Feb 28, 2019 at 12:00:14PM -0800, Andrew Morton wrote:
> On Fri, 22 Feb 2019 14:09:59 +0800 Feng Tang  wrote:
> 
> > When kernel panic happens, it will first print the panic call stack,
> > then the ending msg like:
> > 
> > [   35.743249] ---[ end Kernel panic - not syncing: Fatal exception
> > [   35.749975] [ cut here ]
> > 
> > The above message are very useful for debugging.
> > 
> > But if system is configured to not reboot on panic, say the "panic_timeout"
> > parameter equals 0, it will likely print out many noisy message like
> > WARN() call stack for each and every CPU except the panic one, messages
> > like below:
> > 
> > WARNING: CPU: 1 PID: 280 at kernel/sched/core.c:1198 
> > set_task_cpu+0x183/0x190
> > Call Trace:
> > 
> > try_to_wake_up
> > default_wake_function
> > autoremove_wake_function
> > __wake_up_common
> > __wake_up_common_lock
> > __wake_up
> > wake_up_klogd_work_func
> > irq_work_run_list
> > irq_work_tick
> > update_process_times
> > tick_sched_timer
> > __hrtimer_run_queues
> > hrtimer_interrupt
> > smp_apic_timer_interrupt
> > apic_timer_interrupt
> 
> It's a fairly ugly-looking patch but I am inclined to agree.

Yes, it's ugly :) we've changed 3 methods to tackle this.

> The panicing CPU is spinning and blinking a LED and all CPUs have
> interrupts enabled and the system is known to be in a messed up state. 
> All sorts of kernel code could emit all sorts of output in such
> circumstances.  So a global printk-killing knob seems appropriate.
> 
> Thoughts:
> 
> - why do the suppression in vprintk_emit()?  Doing it right at entry
>   to printk() seems cleaner, more explicit?

Yes, I put it in printk() in one earlier post, and Petr suggested to
put it into vprintk_emit so that it works for all printk() interfaces,
like the devkmsg_write -> printk_emit -> vprintk_emit path.

> 
> - Other code sites may wish to suppress all printks.  Perhaps
>   `panic_suppress_printk' should just be called `suppress_printk'?
Ok, then I'll move the definition from panic.c to printk code.

Thanks,
Feng



Re: [PATCH v5] panic: Avoid the extra noise dmesg

2019-02-28 Thread Andrew Morton
On Fri, 22 Feb 2019 14:09:59 +0800 Feng Tang  wrote:

> When kernel panic happens, it will first print the panic call stack,
> then the ending msg like:
> 
> [   35.743249] ---[ end Kernel panic - not syncing: Fatal exception
> [   35.749975] [ cut here ]
> 
> The above message are very useful for debugging.
> 
> But if system is configured to not reboot on panic, say the "panic_timeout"
> parameter equals 0, it will likely print out many noisy message like
> WARN() call stack for each and every CPU except the panic one, messages
> like below:
> 
>   WARNING: CPU: 1 PID: 280 at kernel/sched/core.c:1198 
> set_task_cpu+0x183/0x190
>   Call Trace:
>   
>   try_to_wake_up
>   default_wake_function
>   autoremove_wake_function
>   __wake_up_common
>   __wake_up_common_lock
>   __wake_up
>   wake_up_klogd_work_func
>   irq_work_run_list
>   irq_work_tick
>   update_process_times
>   tick_sched_timer
>   __hrtimer_run_queues
>   hrtimer_interrupt
>   smp_apic_timer_interrupt
>   apic_timer_interrupt

It's a fairly ugly-looking patch but I am inclined to agree.

The panicing CPU is spinning and blinking a LED and all CPUs have
interrupts enabled and the system is known to be in a messed up state. 
All sorts of kernel code could emit all sorts of output in such
circumstances.  So a global printk-killing knob seems appropriate.

Thoughts:

- why do the suppression in vprintk_emit()?  Doing it right at entry
  to printk() seems cleaner, more explicit?

- Other code sites may wish to suppress all printks.  Perhaps
  `panic_suppress_printk' should just be called `suppress_printk'?




Re: [PATCH v5] panic: Avoid the extra noise dmesg

2019-02-21 Thread Feng Tang
One small cleanup of unneeded change in last post:


>From 4ecbdfe1b398aa2285a696392eddeffb109d5463 Mon Sep 17 00:00:00 2001
From: Feng Tang 
Date: Thu, 21 Feb 2019 16:23:32 +0800
Subject: [PATCH v5] panic: Avoid the extra noise dmesg

When kernel panic happens, it will first print the panic call stack,
then the ending msg like:

[   35.743249] ---[ end Kernel panic - not syncing: Fatal exception
[   35.749975] [ cut here ]

The above message are very useful for debugging.

But if system is configured to not reboot on panic, say the "panic_timeout"
parameter equals 0, it will likely print out many noisy message like
WARN() call stack for each and every CPU except the panic one, messages
like below:

WARNING: CPU: 1 PID: 280 at kernel/sched/core.c:1198 
set_task_cpu+0x183/0x190
Call Trace:

try_to_wake_up
default_wake_function
autoremove_wake_function
__wake_up_common
__wake_up_common_lock
__wake_up
wake_up_klogd_work_func
irq_work_run_list
irq_work_tick
update_process_times
tick_sched_timer
__hrtimer_run_queues
hrtimer_interrupt
smp_apic_timer_interrupt
apic_timer_interrupt

For people working in console mode, the screen will first show the panic
call stack, but immediately overridden by these noisy extra messages, which
makes debugging much more difficult, as the original context gets lost on
screen.

Also these noisy messages will confuse some users, as I have seen many bug
reporters posted the noisy message into bugzilla, instead of the real panic
call stack and context.

Adding a flag panic_suppress_printk to avoid those noisy message, without
changing the current kernel behavior that both panic blinking and sysrq
magic key can work as is, suggested by Petr Mladek.

To verify this, make sure kernel is not configured to reboot on panic and
in console
 # echo c > /proc/sysrq-trigger
to see if console only prints out the panic call stack.

Suggested-by: Petr Mladek 
Signed-off-by: Feng Tang 
Cc: Thomas Gleixner 
Cc: Kees Cook 
Cc: Borislav Petkov 
Cc: Sergey Senozhatsky 
Cc: Petr Mladek 
Cc: Steven Rostedt 
Cc: Andi Kleen 
Cc: Peter Zijlstra 
Cc: Greg Kroah-Hartman 
Cc: Jiri Slaby 
Cc: Sasha Levin 
---
Change log:

  v5:
- use a flag to notify printk not to print unimportant
  messages, while keep panic blinking and sysrq working,
  as suggested by Petr Mladek

  v4:
- make the local_irq_enable conditional and default off
to cover possible use of interrupt/scheduling, as
mentioned by Sergey and Petr

  v3:
- Make the change log clearer as suggested by Andrew Morton

  v2:
- Move the solution from hacking arch/scheduler code back
to panic.c



 drivers/tty/sysrq.c| 6 ++
 include/linux/kernel.h | 1 +
 kernel/panic.c | 5 +
 kernel/printk/printk.c | 4 
 4 files changed, 16 insertions(+)

diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
index 1f03078..763e24c 100644
--- a/drivers/tty/sysrq.c
+++ b/drivers/tty/sysrq.c
@@ -527,8 +527,12 @@ void __handle_sysrq(int key, bool check_mask)
 {
struct sysrq_key_op *op_p;
int orig_log_level;
+   int orig_panic_suppress_printk;
int i;
 
+   orig_panic_suppress_printk = panic_suppress_printk;
+   panic_suppress_printk = 0;
+
rcu_sysrq_start();
rcu_read_lock();
/*
@@ -574,6 +578,8 @@ void __handle_sysrq(int key, bool check_mask)
}
rcu_read_unlock();
rcu_sysrq_end();
+
+   panic_suppress_printk = orig_panic_suppress_printk;
 }
 
 void handle_sysrq(int key)
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 8f0e68e..4120f3a 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -534,6 +534,7 @@ extern int panic_on_io_nmi;
 extern int panic_on_warn;
 extern int sysctl_panic_on_rcu_stall;
 extern int sysctl_panic_on_stackoverflow;
+extern int panic_suppress_printk;
 
 extern bool crash_kexec_post_notifiers;
 
diff --git a/kernel/panic.c b/kernel/panic.c
index f121e6b..b9f004e 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -46,6 +46,8 @@ int panic_on_warn __read_mostly;
 int panic_timeout = CONFIG_PANIC_TIMEOUT;
 EXPORT_SYMBOL_GPL(panic_timeout);
 
+int panic_suppress_printk;
+
 #define PANIC_PRINT_TASK_INFO  0x0001
 #define PANIC_PRINT_MEM_INFO   0x0002
 #define PANIC_PRINT_TIMER_INFO 0x0004
@@ -326,6 +328,9 @@ void panic(const char *fmt, ...)
}
 #endif
pr_emerg("---[ end Kernel panic - not syncing: %s ]---\n", buf);
+
+   /* Do not scroll important messages printed above */
+   panic_suppress_printk = 1;
local_irq_enable();
for (i = 0; ; i += PANIC_TIMER_STEP) {
touch_softlockup_watchdog();
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index d3d1703..d460144 100644
--- a/kernel/printk/pr

[PATCH v5] panic: Avoid the extra noise dmesg

2019-02-21 Thread Feng Tang
When kernel panic happens, it will first print the panic call stack,
then the ending msg like:

[   35.743249] ---[ end Kernel panic - not syncing: Fatal exception
[   35.749975] [ cut here ]

The above message are very useful for debugging.

But if system is configured to not reboot on panic, say the "panic_timeout"
parameter equals 0, it will likely print out many noisy message like
WARN() call stack for each and every CPU except the panic one, messages
like below:

WARNING: CPU: 1 PID: 280 at kernel/sched/core.c:1198 
set_task_cpu+0x183/0x190
Call Trace:

try_to_wake_up
default_wake_function
autoremove_wake_function
__wake_up_common
__wake_up_common_lock
__wake_up
wake_up_klogd_work_func
irq_work_run_list
irq_work_tick
update_process_times
tick_sched_timer
__hrtimer_run_queues
hrtimer_interrupt
smp_apic_timer_interrupt
apic_timer_interrupt

For people working in console mode, the screen will first show the panic
call stack, but immediately overridden by these noisy extra messages, which
makes debugging much more difficult, as the original context gets lost on
screen.

Also these noisy messages will confuse some users, as I have seen many bug
reporters posted the noisy message into bugzilla, instead of the real panic
call stack and context.

Adding a flag panic_suppress_printk to avoid those noisy message, without
changing the current kernel behavior that both panic blinking and sysrq
magic key can work as is, suggested by Petr Mladek.

To verify this, make sure kernel is not configured to reboot on panic and
in console
 # echo c > /proc/sysrq-trigger
to see if console only prints out the panic call stack.

Suggested-by: Petr Mladek 
Signed-off-by: Feng Tang 
Cc: Thomas Gleixner 
Cc: Kees Cook 
Cc: Borislav Petkov 
Cc: Sergey Senozhatsky 
Cc: Petr Mladek 
Cc: Steven Rostedt 
Cc: Andi Kleen 
Cc: Peter Zijlstra 
Cc: Greg Kroah-Hartman 
Cc: Jiri Slaby 
Cc: Sasha Levin 
---
Change log:

  v5:
- use a flag to notify printk not to print unimportant
  messages, while keep panic blinking and sysrq working,
  as suggested by Petr Mladek

  v4:
- make the local_irq_enable conditional and default off
to cover possible use of interrupt/scheduling, as
mentioned by Sergey and Petr

  v3:
- Make the change log clearer as suggested by Andrew Morton

  v2:
- Move the solution from hacking arch/scheduler code back
to panic.c

 drivers/tty/sysrq.c| 6 ++
 include/linux/kernel.h | 1 +
 kernel/panic.c | 5 +
 kernel/printk/printk.c | 5 +
 4 files changed, 17 insertions(+)

diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
index 1f03078..763e24c 100644
--- a/drivers/tty/sysrq.c
+++ b/drivers/tty/sysrq.c
@@ -527,8 +527,12 @@ void __handle_sysrq(int key, bool check_mask)
 {
struct sysrq_key_op *op_p;
int orig_log_level;
+   int orig_panic_suppress_printk;
int i;
 
+   orig_panic_suppress_printk = panic_suppress_printk;
+   panic_suppress_printk = 0;
+
rcu_sysrq_start();
rcu_read_lock();
/*
@@ -574,6 +578,8 @@ void __handle_sysrq(int key, bool check_mask)
}
rcu_read_unlock();
rcu_sysrq_end();
+
+   panic_suppress_printk = orig_panic_suppress_printk;
 }
 
 void handle_sysrq(int key)
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 8f0e68e..4120f3a 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -534,6 +534,7 @@ extern int panic_on_io_nmi;
 extern int panic_on_warn;
 extern int sysctl_panic_on_rcu_stall;
 extern int sysctl_panic_on_stackoverflow;
+extern int panic_suppress_printk;
 
 extern bool crash_kexec_post_notifiers;
 
diff --git a/kernel/panic.c b/kernel/panic.c
index f121e6b..b9f004e 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -46,6 +46,8 @@ int panic_on_warn __read_mostly;
 int panic_timeout = CONFIG_PANIC_TIMEOUT;
 EXPORT_SYMBOL_GPL(panic_timeout);
 
+int panic_suppress_printk;
+
 #define PANIC_PRINT_TASK_INFO  0x0001
 #define PANIC_PRINT_MEM_INFO   0x0002
 #define PANIC_PRINT_TIMER_INFO 0x0004
@@ -326,6 +328,9 @@ void panic(const char *fmt, ...)
}
 #endif
pr_emerg("---[ end Kernel panic - not syncing: %s ]---\n", buf);
+
+   /* Do not scroll important messages printed above */
+   panic_suppress_printk = 1;
local_irq_enable();
for (i = 0; ; i += PANIC_TIMER_STEP) {
touch_softlockup_watchdog();
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index d3d1703..4f44d64 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1899,6 +1899,10 @@ asmlinkage int vprintk_emit(int facility, int level,
unsigned long flags;
u64 curr_log_seq;
 
+   /* Suppress unimportant messages after panic happens */
+   if (unlikely(panic_suppress_pri