On Fri 2018-04-20 10:17:51, Steven Rostedt wrote: > On Fri, 20 Apr 2018 16:01:57 +0200 > Petr Mladek <pmla...@suse.com> wrote: > > On Fri 2018-04-20 08:04:28, Steven Rostedt wrote: > > > The problem is the way rate limit works. If you print 100 lines (or > > > 1000) in 5 seconds, then you just stopped printing from that context > > > for 59 minutes and 55 seconds. That's a long time to block printing. > > > > Are we talking about the same context? > > > > I am talking about console drivers called from console_unlock(). It is > > very special context because it is more or less recursive: > > > > + could cause infinite loop > > + the errors are usually the same again and again > > The check is only when console_owner == current, which can easily > happen with an interrupt let alone an NMI.
Yeah. Sergey pointed this out and I suggested to update it to if (console_owner == current && !in_nmi() && !__ratelimit(&ratelimit_console)) return 0; Only messages from console drivers called from console_unlock() should be ratelimited. Ratelimiting any other messages was not intended (is a bug). The above does not handle recursion in NMI. But console drivers are called from NMI only when we flush consoles in panic(). I wonder if it is worth the effort. > > > What happens if you had a couple of NMIs go off that takes up that > > > time, and then you hit a bug 10 minutes later from that context. You > > > just lost it. > > > > I do not understand how this is related to the NMI context. > > The messages in NMI context are not throttled! > > > > OK, the original patch throttled also NMI messages when NMI > > interrupted console drivers. But it is easy to fix. > > My mistake in just mentioning NMIs, because the check is on > console_owner which can be set with interrupts enabled. That means an > interrupt that does a print could hide printks from other interrupts or > NMIs when console_owner is set. No, call_console_drivers() is done with interrupts disabled: console_lock_spinning_enable(); stop_critical_timings(); /* don't trace print latency */ ----> call_console_drivers(ext_text, ext_len, text, len); start_critical_timings(); if (console_lock_spinning_disable_and_check()) { ----> printk_safe_exit_irqrestore(flags); goto out; } ----> printk_safe_exit_irqrestore(flags); They were called with interrupts disabled for ages, long before printk_safe. In fact, it was all the time in the git kernel history. Therefore only NMIs are in the game. And they should be solved by the above change. > > I proposed that long delay because I want to be on the safe side. > > Also I do not see a huge benefit in repeating the same messages > > too often. > > Actually, I think we are fine with the one hour and 1000 prints if we > add to the condition. great > It can't just check console_owner. We need a way > to know that this is indeed a recursion. Perhaps we should set the > context we are in when setting console owner. Something like I have in > the ring buffer code. > enum { > CONTEXT_NONE, > CONTEXT_NMI, > CONTEXT_IRQ, > CONTEXT_SOFTIRQ, > CONTEXT_NORMAL > }; > > int get_context(void) > { > unsigned long pc = preempt_count(); > > if (!(pc & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET))) > return CONTEXT_NORMAL; > else > return pc & NMI_MASK ? CONTEXT_NMI : > pc & HARDIRQ_MASK ? CONTEXT_IRQ : CONTEXT_SOFTIRQ; > } We actually would need this only when flushing consoles in NMI in panic(). I am not sure of it is worth the effort. Best Regards, Petr