On Fri 2017-04-07 16:46:34, Sergey Senozhatsky wrote:
> On (04/07/17 09:15), Pavel Machek wrote:
> > On Fri 2017-04-07 13:44:40, Sergey Senozhatsky wrote:
> > > Hello,
> > >
> > > On (04/06/17 19:33), Pavel Machek wrote:
> > > > > This patch set gives up part of the printk() reliability for bounded
> > > > > latency (at least unless we detect we are really in trouble) which is
> > > > > IMHO a good trade-off for lots of users (and others can just turn this
> > > > > feature off).
> > > >
> > > > If they can ever realize they were bitten by this feature.
> > > >
> > > > Can we go for different tradeoff?
> > > >
> > > > In console_unlock(), if you detect too much work, print "Too many
> > > > messages to print, %d bytes delayed" and wake up kernel thread.
> > >
> > > "too many messages" is undefined. console_unlock() can be called from
> > > IRQ handler or with preemption disabled, or under spin_lock, or under
> > > RCU read lock, etc. etc. By the time we decide to wake up printk_kthread
> > > from console_unlock() it may be already too late.
> >
> > So let's define "too many messages" as 240 characters. We know printk
> > worked rather well for us for more than 20 years. Kernel code is used
> > to printk taking few milliseconds.
>
> serial console can be quite slow. and port->lock, that is acquired by
> console_unlock()->call_console_drivers()->write(), is also accessible
> by serial driver's IRQ handler, and this lock may be busy long
> enough -- as long as that IRQ handler transmits/receives chars. but
> that's not the point.

Well. This is what we had for 20 years.

> [..]
> > Yeah? So you know modified printk() does not work, that's why
> > "emergency mode" exists. Unfortunately, you can't rely on fact that
> > you can detect half-crashed machines by printk levels. You usually
> > can't.
>
> I'm not happy with those printk_emergency_begin()/end(), sure. but that's
> the reality -- every single solution that would offload printing duty implies
> that there will be cases when offloading would not be possible. either
> PENDING_PRINTK_IPI to other CPUs, or irq_work(PENDING_OUTPUT) on a local CPU,
> or anything else (um... what it is?... softirq? tasklet? print one logbuf
> entry from every IRQ handler? dunno, anything else?). There will be cases
> when we won't be able to expect that something will take over and finish
> printing for us. Well, maybe I'm missing some other solution that would
> offload printing, eliminating lockup conditions, and at the same time work
> in 100% of the cases.

I don't have a magic solution up my sleeve. You made a good case that
spending 30 seconds in printk() is a bad idea. I agree with that. Your
solution is to introduce printk_emergency_begin()/end(). I don't agree
with that part.

I believe "spend at most 2 seconds in printk(), then print a warning
and offload" is a solution closer to what we had before.

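
To make the "print for a bounded time, then warn and offload" idea
concrete, here is a rough userspace sketch. It is not kernel code and not
taken from any patch: struct pending_msg, struct flush_result,
flush_with_budget() and FLUSH_BUDGET_MS are all hypothetical names, the
clock is simulated through a per-record cost, and the "wake up the kernel
thread" step is only a flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical budget, taken from the "at most 2 seconds" figure. */
#define FLUSH_BUDGET_MS 2000

/* Hypothetical stand-in for one pending log record. */
struct pending_msg {
	const char *text;	/* NUL-terminated record text */
	size_t len;		/* bytes in text */
	unsigned cost_ms;	/* simulated time the console needs to emit it */
};

struct flush_result {
	size_t printed;		/* records emitted synchronously */
	size_t delayed_bytes;	/* bytes left for the offload thread */
	int offloaded;		/* nonzero if printing was handed over */
};

/*
 * Emit records in order until the time budget is used up. Whatever is
 * left is counted, a single warning is printed, and the rest is marked
 * as handed over to an (imaginary) printing thread.
 */
static struct flush_result flush_with_budget(const struct pending_msg *msgs,
					     size_t n, unsigned budget_ms,
					     FILE *console)
{
	struct flush_result r = { 0, 0, 0 };
	unsigned elapsed_ms = 0;
	size_t i;

	assert(msgs != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (elapsed_ms >= budget_ms)
			break;			/* budget exhausted, stop here */
		fputs(msgs[i].text, console);
		elapsed_ms += msgs[i].cost_ms;	/* simulated console latency */
		r.printed++;
	}

	if (i < n) {
		for (; i < n; i++)
			r.delayed_bytes += msgs[i].len;
		fprintf(console,
			"Too many messages to print, %zu bytes delayed\n",
			r.delayed_bytes);
		r.offloaded = 1;	/* a real kernel would wake a thread here */
	}
	return r;
}
```

The budget is checked before each record, so a record that has already
been started is always finished. The caller can overshoot by the cost of
one record, but never by an unbounded amount, and the warning makes the
delay visible on the console.
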
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures)
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

