On Wed 14-08-13 21:38:18, Steven Rostedt wrote: > On Wed, 14 Aug 2013 15:28:27 +0200 > Jan Kara <j...@suse.cz> wrote: > > > A CPU can be caught in console_unlock() for a long time (tens of seconds > > are reported by our customers) when other CPUs are using printk heavily > > and serial console makes printing slow. Despite serial console drivers > > are calling touch_nmi_watchdog() this triggers softlockup warnings > > because interrupts are disabled for the whole time console_unlock() runs > > (e.g. vprintk() calls console_unlock() with interrupts disabled). Thus > > IPIs cannot be processed and other CPUs get stuck spinning in calls like > > smp_call_function_many(). Also RCU eventually starts reporting lockups. > > > > In my artifical testing I can also easily trigger a situation when disk > > disappears from the system apparently because commands to / from it > > could not be delivered for long enough. This is why just silencing > > watchdogs isn't a reliable solution to the problem and we simply have to > > avoid spending too long in console_unlock() with interrupts disabled. > > > > The solution this patch works toward is to postpone printing to a later > > moment / different CPU when we already printed over 1000 characters in > > current console_unlock() invocation. This is a crude heuristic but > > measuring time we spent printing doesn't seem to be really viable - we > > cannot rely on high resolution time being available and with interrupts > > disabled jiffies are not updated. The value 1000 was chosen so that > > things are still bearable with 9600 baud serial console and OTOH it > > shouldn't cause offloading of printing in common cases. > > > > Signed-off-by: Jan Kara <j...@suse.cz> > > --- > > kernel/printk/printk.c | 62 > > ++++++++++++++++++++++++++++++++++++++++++-------- > > 1 file changed, 52 insertions(+), 10 deletions(-) > > > > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c > > index 9208e17..28c220a 100644 > > --- a/kernel/printk/printk.c > > +++ b/kernel/printk/printk.c > > @@ -2026,31 +2026,49 @@ out: > > raw_spin_unlock_irqrestore(&logbuf_lock, flags); > > } > > > > +/* > > + * How much characters can we print in one call of printk before offloading > > + * printing work > > + */ > > +#define MAX_PRINTK_CHARS 1000 > > This really needs to be configurable. 1000 may be good for you, but may > not be good for others. Heck, we should be able to disable this > offloading too. > > This should probably be added to the printk /proc/sys/kernel/... files. > > If it is set to zero, then disable any offloading. OK, I can easily do this. I'm somewhat skeptical people would want to tune this but maybe they would.
Honza > > + > > +/* Should we stop printing on this cpu? */ > > +static bool cpu_stop_printing(int printed_chars) > > +{ > > + /* Oops? Print everything now to maximize chances user will see it */ > > + if (oops_in_progress) > > + return false; > > + return printed_chars > MAX_PRINTK_CHARS; > > +} > > + > > /** -- Jan Kara <j...@suse.cz> SUSE Labs, CR -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/