> -----Original Message-----
> From: Jan Kara [mailto:j...@suse.com]
> Sent: Wednesday, August 19, 2015 8:38 AM
> To: Andrew Morton <a...@linux-foundation.org>
> Cc: LKML <linux-kernel@vger.kernel.org>; pmla...@suse.com;
> rost...@goodmis.org; Gavin Hu <gavin.hu.2...@gmail.com>; KY Srinivasan
> <k...@microsoft.com>; Jan Kara <j...@suse.cz>
> Subject: [PATCH 0/4] printk: Softlockup avoidance
>
> From: Jan Kara <j...@suse.cz>
>
> Hello,
>
> since lately there were several attempts at dealing with softlockups due
> to heavy printk traffic [1] [2] and I've been also privately pinged by
> couple of people about the state of the patch set, I've decided to respin
> the patch set.
>
> To remind the original problem:
>
> Currently, console_unlock() prints messages from kernel printk buffer to
> console while the buffer is non-empty. When serial console is attached,
> printing is slow and thus other CPUs in the system have plenty of time
> to append new messages to the buffer while one CPU is printing. Thus the
> CPU can spend unbounded amount of time doing printing in
> console_unlock().
> This is especially serious when printk() gets called under some critical
> spinlock or with interrupts disabled.
>
> In practice users have observed a CPU can spend tens of seconds printing
> in console_unlock() (usually during boot when hundreds of SCSI devices
> are discovered) resulting in RCU stalls (CPU doing printing doesn't
> reach quiescent state for a long time), softlockup reports (IPIs for the
> printing CPU don't get served and thus other CPUs are spinning waiting
> for the printing CPU to process IPIs), and eventually a machine death
> (as messages from stalls and lockups append to printk buffer faster than
> we are able to print). So these machines are unable to boot with serial
> console attached. Also during artificial stress testing SATA disk
> disappears from the system because its interrupts aren't served for too
> long.
>
> This series addresses the problem in the following way: If CPU has printed
> more than printk_offload (defaults to 1000) characters, it wakes up one
> of dedicated printk kthreads (we don't use workqueue because that has
> deadlock potential if printk was called from workqueue code). Once we find
> out kthread is spinning on a lock, we stop printing, drop console_sem, and
> let kthread continue printing. Since there are two printing kthreads, they
> will pass printing between them and thus no CPU gets hogged by printing.
>
> Changes since the last posting [3]:
> * I have replaced the state machine to pass printing and spinning on
>   console_sem with a simple spinlock which makes the code
>   somewhat easier to read and verify.
> * Some of the patches were merged so I dropped them.
>
> 								Honza
Thanks Jan. I would like to add that the problem described here is further
aggravated in virtual machines and the solution proposed here effectively
solves the problem.

Regards,

K. Y

>
> [1] https://lkml.org/lkml/2015/7/8/215
> [2] http://marc.info/?l=linux-kernel&m=143929238407816&w=2
> [3] https://lkml.org/lkml/2014/3/17/68
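
For readers who want a feel for the hand-off scheme described in the cover
letter, here is a minimal user-space sketch in C. It is not code from the
patch set. The struct console_model type and the print_one_turn() and drain()
helpers are hypothetical names invented for this illustration, and the model
is single-threaded: it only shows how capping each printing turn at a
printk_offload-style threshold bounds the time any one context spends
printing. The real series additionally waits until a kthread is spinning on
the lock before dropping console_sem, which this sketch does not model.

```c
#include <assert.h>
#include <stddef.h>

/* Default threshold named in the cover letter (printk_offload). */
#define PRINTK_OFFLOAD 1000

/* Hypothetical model of the console state; not a kernel structure. */
struct console_model {
    size_t pending;   /* characters still waiting in the log buffer */
    int    owner;     /* 0 = original caller, 1 or 2 = printing helper */
    size_t handoffs;  /* number of times printing changed hands */
};

/*
 * One printing turn: emit characters until the buffer is empty or the
 * threshold is reached. Returns how many characters this turn printed.
 */
static size_t print_one_turn(struct console_model *c, size_t threshold)
{
    size_t printed = 0;

    while (c->pending > 0 && printed < threshold) {
        c->pending--;   /* stands in for writing one character out */
        printed++;
    }
    return printed;
}

/*
 * Drain the buffer, passing the console to the other helper whenever the
 * current owner hits the threshold with work still pending. Returns the
 * longest uninterrupted turn, which is what the scheme tries to bound.
 */
static size_t drain(struct console_model *c, size_t threshold)
{
    size_t longest = 0;

    assert(threshold > 0);   /* a zero threshold would never make progress */

    while (c->pending > 0) {
        size_t n = print_one_turn(c, threshold);

        if (n > longest)
            longest = n;
        if (c->pending > 0) {
            /* Hand over: helpers 1 and 2 alternate, as two kthreads would. */
            c->owner = (c->owner == 1) ? 2 : 1;
            c->handoffs++;
        }
    }
    return longest;
}
```

With 3500 pending characters and the default threshold, drain() performs
three hand-offs and no single turn prints more than 1000 characters. Without
the cap, the original caller would have printed all 3500 in one stretch.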