Re: [RFC][PATCH 0/6] Use printk_safe context for TTY and UART port locks
On (06/20/18 12:38), Linus Torvalds wrote: > On Wed, Jun 20, 2018 at 11:50 AM Sergey Senozhatsky > wrote: > > > > It's not UART on its own that immediately calls into printk(), that would > > be trivial to fix, it's all those subsystems that serial console driver > > can call into. > > We already have the whole PRINTK_SAFE_CONTEXT_MASK model that only > adds it to a secondary buffer if you get recursion. Why isn't that > triggering? That's the whole point of it. Linus, Alan, Steven, are you on board with the patch set? What shall I do to improve it? PRINTK_SAFE_CONTEXT_MASK is what we answer nowadays when someone says "printk causes deadlocks". We really can't remove all printk-s that can cause uart->...->printk->uart recursion, and the only other option is to use spin_trylock on uart_port->lock in every driver and discard con->write() if we see that we have re-entered uart. I'd rather use per-CPU printk_safe buffer in this case. -ss
Re: [RFC][PATCH 0/6] Use printk_safe context for TTY and UART port locks
On (06/22/18 17:21), Alan Cox wrote: > On Wed, 20 Jun 2018 11:44:13 +0900 > Linus Torvalds wrote: > > > On Wed, Jun 20, 2018 at 11:34 AM Steven Rostedt wrote: > > > > > > Perhaps we should do an audit of the console drivers and remove all > > > printk, pr_* , WARN*, BUG* from them. > > > > Only the actual _printing_ parts. > > No because they are normally rather useful because that port isn't the > console. If you trylock trylock is boring, me wants printk_safe_mask everywhere :) > Really that's all that you need - log the message to whichever console > targets you can currently safely do so. If it's none well there was > always the proposed morse code keyboard light driver 8) Hm, just discard messages? With printk_safe_mask we keep everything in a lockless per-CPU buffer, which we flush [per-CPU buffer -> printk logbuf] from irq_work, so we can print it later. -ss
Re: [RFC][PATCH 0/6] Use printk_safe context for TTY and UART port locks
On Fri, Jun 22, 2018 at 05:21:00PM +0100, Alan Cox wrote: > Really that's all that you need - log the message to whichever console > targets you can currently safely do so. If it's none well there was > always the proposed morse code keyboard light driver 8) Only if your keyboard still has blinky lights on.. (mine doesn't)
Re: [RFC][PATCH 0/6] Use printk_safe context for TTY and UART port locks
On Wed, 20 Jun 2018 11:44:13 +0900 Linus Torvalds wrote: > On Wed, Jun 20, 2018 at 11:34 AM Steven Rostedt wrote: > > > > Perhaps we should do an audit of the console drivers and remove all > > printk, pr_* , WARN*, BUG* from them. > > Only the actual _printing_ parts. No because they are normally rather useful because that port isn't the console. If you trylock it as I suggested then at least you'd just drop the recursive cases and get messages otherwise. Really that's all that you need - log the message to whichever console targets you can currently safely do so. If it's none well there was always the proposed morse code keyboard light driver 8) Alan
Re: [RFC][PATCH 0/6] Use printk_safe context for TTY and UART port locks
On (06/20/18 12:38), Linus Torvalds wrote: > On Wed, Jun 20, 2018 at 11:50 AM Sergey Senozhatsky > wrote: > > > > It's not UART on its own that immediately calls into printk(), that would > > be trivial to fix, it's all those subsystems that serial console driver > > can call into. > > We already have the whole PRINTK_SAFE_CONTEXT_MASK model that only > adds it to a secondary buffer if you get recursion. Why isn't that > triggering? That's the whole point of it. This is exactly what I'm doing in my patch set. PRINTK_SAFE_CONTEXT_MASK so far worked *one* way only: when we start from printk.c IOW: printk -> printk_safe_mask -> vsprinf -> printk But we also can have printk-related deadlocks the *other* way around. For instance: uart -> printk -> uart printk_safe_mask is not triggering there because we don't use printk_safe in uart / tty yet. And this is what I do in my patch set - extend printk_safe usage. The patch set does not add any _new_ locks or locking rules. It just replaces the existing spin_lock(a) with prinkt_safe_enter(); spin_lock(a) and spin_unlock(a) with spin_unlock(a) printk_safe_exit(); and that's it. So now we use printk_safe mechanism to avoid another bunch of deadlock scenarious: which don't start from printk, but from parts of the kernel which printk eventually calls. -ss
Re: [RFC][PATCH 0/6] Use printk_safe context for TTY and UART port locks
On Wed, Jun 20, 2018 at 11:50 AM Sergey Senozhatsky wrote: > > It's not UART on its own that immediately calls into printk(), that would > be trivial to fix, it's all those subsystems that serial console driver > can call into. We already have the whole PRINTK_SAFE_CONTEXT_MASK model that only adds it to a secondary buffer if you get recursion. Why isn't that triggering? That's the whole point of it. I absolutely do *not* want to see any crazy changes to tty drivers. No no no. Linus
Re: [RFC][PATCH 0/6] Use printk_safe context for TTY and UART port locks
On (06/19/18 22:34), Steven Rostedt wrote: > > > There is no valid reason why an UART driver should do a printk() of > > any sort inside the critical region where the console is locked. > > > > Just remove those printk's, don't add new crazy locking. > > Perhaps we should do an audit of the console drivers and remove all > printk, pr_* , WARN*, BUG* from them. I think I did a terrible job explaining my motivation. Sorry for that! What I tried to address with my patch set was not a direct uart->printk, but mostly all those uart-> tty / core kernel / who knows what else -> printk cases. When are in that special context "called from uart driver" which can backfire on us. -ss
Re: [RFC][PATCH 0/6] Use printk_safe context for TTY and UART port locks
On (06/20/18 11:01), Linus Torvalds wrote: > On Tue, Jun 19, 2018 at 5:30 PM Petr Mladek wrote: > > > > To make it clear. This patch set adds yet another spin_lock API. > > It behaves exactly as spin_lock_irqsafe()/spin_unlock_irqrestore() > > but in addition it sets printk_context. > > > > This patchset forces safe context around TTY and UART locks. > > In fact, the deferred context would be enough to prevent > > all the mentioned deadlocks. > > Please stop this garbage. I'm guessing the message was addressed to me, not to Petr. Let me explain myself. > The rule is simple: DO NOT DO THAT THEN. > > Don't make recursive locks. Don't make random complexity. Just stop > doing the thing that hurts. We didn't add any of those recursive printk()-s. Those are the things like kmalloc()/scheduler/etc which tty/etc invoke that hurt. > There is no valid reason why an UART driver should do a printk() of > any sort inside the critical region where the console is locked. It's not UART on its own that immediately calls into printk(), that would be trivial to fix, it's all those subsystems that serial console driver can call into. For instance, kernel/workqueue.c - it may WARN_ON/printk in various cases. And those WARNs/printks are OK. Except for one thing: workqueue can be called from a serial console driver, which suddenly will turn those WARNs/printks into illegal ones, due to possible deadlocks. And serial consoles can call into WQ. Not directly, but via tty code. A random example: s3c24xx_serial_rx_chars_dma() spin_lock_irqsave(&port->lock, flags); tty_flip_buffer_push() __queue_work() spin_unlock_irqrestore(&port->lock, flags); Should for some reason WQ warn/printk, we are done, because we will end up taking the same port->lock: __queue_work() printk() call_console_drivers() spin_lock_irqsave(&port->lock, flags); IOW, there is this tricky "we were called from a serial driver" context, which is hard to track, but printk_safe can help us in those cases. There are other examples. WQ is not the only thing serial drivers can call into. So that's why I sent the patch set. -ss
Re: [RFC][PATCH 0/6] Use printk_safe context for TTY and UART port locks
On Wed, Jun 20, 2018 at 11:34 AM Steven Rostedt wrote: > > Perhaps we should do an audit of the console drivers and remove all > printk, pr_* , WARN*, BUG* from them. Only the actual _printing_ parts. Just randomly, look at drivers/tty/vt/vt.c that does a lot of printing, and there's a lot of valid printing. Things like pr_warn("Unable to create device for %s; errno = %ld\n", .. is fine - it's done at console registration time if things go sideways. But there are a few commented-out calls to "printk()" that are obviously bogus, because they are in the printing path. And they damn well should be commented out. The existence of something like that SHOULD ABSOLUTELY NOT be seen as a "hey, let's make up some crazy garbage locking ruls that would allow printing there". Just don't do it. It's that simple. Linus
Re: [RFC][PATCH 0/6] Use printk_safe context for TTY and UART port locks
On Wed, 20 Jun 2018 11:01:49 +0900 Linus Torvalds wrote: > There is no valid reason why an UART driver should do a printk() of > any sort inside the critical region where the console is locked. > > Just remove those printk's, don't add new crazy locking. Perhaps we should do an audit of the console drivers and remove all printk, pr_* , WARN*, BUG* from them. -- Steve
Re: [RFC][PATCH 0/6] Use printk_safe context for TTY and UART port locks
On Tue, Jun 19, 2018 at 5:30 PM Petr Mladek wrote: > > To make it clear. This patch set adds yet another spin_lock API. > It behaves exactly as spin_lock_irqsafe()/spin_unlock_irqrestore() > but in addition it sets printk_context. > > This patchset forces safe context around TTY and UART locks. > In fact, the deferred context would be enough to prevent > all the mentioned deadlocks. Please stop this garbage. The rule is simple: DO NOT DO THAT THEN. Don't make recursive locks. Don't make random complexity. Just stop doing the thing that hurts. There is no valid reason why an UART driver should do a printk() of any sort inside the critical region where the console is locked. Just remove those printk's, don't add new crazy locking. If you had a spinlock that deadlocked because it was inside an already spinlocked region, you'd say "that's buggy". This is the exact same issue. We don't work around buggy garbage. We fix the bug - by removing the problematic printk. Linus
Re: [RFC][PATCH 0/6] Use printk_safe context for TTY and UART port locks
On (06/19/18 10:30), Petr Mladek wrote: > It is re-entrant via printk(). I mean that any printk() called inside > the locked sections might cause recursion if the same lock is needed > also by con->write() callbacks. Perhaps Alan meant that we cannot return back to tty once we passed the control from tty to printk -> uart serial console. So tty is probably (but I didn't check) not re-entrant, but uart definitely is re-entrant: IRQ -> uart console -> tty -> printk -> uart console. The patch set attempts to address several issues, and re-entrant uart is just one of them. [..] > This patchset forces safe context around TTY and UART locks. Right. > In fact, the deferred context would be enough to prevent > all the mentioned deadlocks. Agree. But we can leave it as a printk_safe implementation detail and change it later, outside of this patch, to avoid further confusion. > IMHO, the only question is if people might get familiar with > yet another spin_lock API. Right. That's why I thought about renaming uart_port and tty_port ->lock to ->lock. So the naming will suggest [hopefully] that those locks are not meant to be used directly [anymore] and instead people should use uart_port_lock/tty_port_lock API. -ss
Re: [RFC][PATCH 0/6] Use printk_safe context for TTY and UART port locks
On Tue 2018-06-19 09:53:08, Sergey Senozhatsky wrote: > Thanks for taking a look! > > On (06/18/18 14:38), Alan Cox wrote: > > > It doesn't come as a surprise that recursive printk() calls are not the > > > only way for us to deadlock in printk() and we still have a whole bunch > > > of other printk() deadlock scenarios. For instance, those that involve > > > TTY port->lock spin_lock and UART port->lock spin_lock. > > > > The tty layer code there is not re-entrant. Nor is it supposed to be It is re-entrant via printk(). I mean that any printk() called inside the locked sections might cause recursion if the same lock is needed also by con->write() callbacks. > Could be. > > But at least we have circular locking dependency in tty, > see [1] for more details: > > tty_port->lock => uart_port->lock > >CPU0 >tty > spin_lock(&tty_port->lock) > printk() > call_console_drivers() >foo_console_write() > spin_lock(&uart_port->lock) > > Whereas we normally have > > uart_port->lock => tty_port->lock > >CPU1 >IRQ > foo_console_handle_IRQ() > spin_lock(&uart_port->lock) > tty >spin_lock(&tty_port->lock) This is even more complicated situation because printk() allowed an ABBA lock scenario. > If we switch to printk_safe when we take tty_port->lock then we > remove the printk->uart_port chain from the picture. > > > > So the idea of this patch set is to take tty_port->lock and > > > uart_port->lock from printk_safe context and to eliminate some > > > of non-recursive printk() deadlocks - the ones that don't start > > > in printk(), but involve console related locks and thus eventually > > > deadlock us in printk(). For this purpose the patch set introduces > > > several helper macros: > > > > I don't see how this helps - if you recurse into the uart code you are > > still hitting the paths that are unsafe when re-entered. All you've done > > is messed up a pile of locking code on critical performance paths. > > > > As it stands I think it's a bad idea. > > The only new thing is that we inc/dec per-CPU printk context > variable when we lock/unlock tty/uart port lock: > > printk_safe_enter() -> this_cpu_inc(printk_context); > printk_safe_exit() -> this_cpu_dec(printk_context); To make it clear. This patch set adds yet another spin_lock API. It behaves exactly as spin_lock_irqsafe()/spin_unlock_irqrestore() but in addition it sets printk_context. Where printk_context defines what printk implementation is safe. We basically have four possibilities: 1. normal (store in logbuf, try to handle consoles) 2. deferred (store in logbuf, deffer consoles) 3. safe (store in per-CPU buffer, deffer everything) 4. safe_nmi (store in another per-CPU buffer, deffer everything) This patchset forces safe context around TTY and UART locks. In fact, the deferred context would be enough to prevent all the mentioned deadlocks. IMHO, the only question is if people might get familiar with yet another spin_lock API. Best Regards, Petr
Re: [RFC][PATCH 0/6] Use printk_safe context for TTY and UART port locks
On Mon, Jun 18, 2018 at 02:38:18PM +0100, Alan Cox wrote: > Given printk nowdays is already somewhat unreliable with all the perf > related changes, printk is a steaming pile of @#$#@, unreliable doesn't even begin to cover it. > and we have other good debug tools What other good debug tools do you have? The only thing I have that actually works is earlyser/8250_early -- and I route everything printk into that.
Re: [RFC][PATCH 0/6] Use printk_safe context for TTY and UART port locks
Thanks for taking a look! On (06/18/18 14:38), Alan Cox wrote: > > It doesn't come as a surprise that recursive printk() calls are not the > > only way for us to deadlock in printk() and we still have a whole bunch > > of other printk() deadlock scenarios. For instance, those that involve > > TTY port->lock spin_lock and UART port->lock spin_lock. > > The tty layer code there is not re-entrant. Nor is it supposed to be Could be. But at least we have circular locking dependency in tty, see [1] for more details: tty_port->lock => uart_port->lock CPU0 tty spin_lock(&tty_port->lock) printk() call_console_drivers() foo_console_write() spin_lock(&uart_port->lock) Whereas we normally have uart_port->lock => tty_port->lock CPU1 IRQ foo_console_handle_IRQ() spin_lock(&uart_port->lock) tty spin_lock(&tty_port->lock) If we switch to printk_safe when we take tty_port->lock then we remove the printk->uart_port chain from the picture. > > So the idea of this patch set is to take tty_port->lock and > > uart_port->lock from printk_safe context and to eliminate some > > of non-recursive printk() deadlocks - the ones that don't start > > in printk(), but involve console related locks and thus eventually > > deadlock us in printk(). For this purpose the patch set introduces > > several helper macros: > > I don't see how this helps - if you recurse into the uart code you are > still hitting the paths that are unsafe when re-entered. All you've done > is messed up a pile of locking code on critical performance paths. > > As it stands I think it's a bad idea. The only new thing is that we inc/dec per-CPU printk context variable when we lock/unlock tty/uart port lock: printk_safe_enter() -> this_cpu_inc(printk_context); printk_safe_exit() -> this_cpu_dec(printk_context); How does this help? Suppose we have the following IRQ foo_console_handle_IRQ() spin_lock(&uart_port->lock) uart_write_wakeup() tty_port_tty_wakeup() tty_port_default_wakeup() printk() call_console_drivers() foo_console_write() spin_lock(&uart_port->lock) << deadlock If we take uart_port lock from printk_safe context, we remove the printk->call_console_drivers->foo_console_write->spin_lock chain. Because printk() output will endup in a per-CPU buffer, which will be flushed later from irq_work. So the whole thing becomes: IRQ foo_console_handle_IRQ() printk_safe_enter() spin_lock(&uart_port->lock) uart_write_wakeup() tty_port_tty_wakeup() tty_port_default_wakeup() printk() << we don't re-enter foo_console_driver << from printk() anymore printk_safe_log_store() irq_work_queue spin_unlock(&uart_port->lock) printk_safe_exit() iret #flush per-CPU buffer IRQ printk_safe_flush_buffer() vprintk_deferred() > > Of course, TTY and UART port spin_locks are not the only locks that > > we can deadlock on. So this patch set does not address all deadlock > > scenarios, it just makes a small step forward. > > > > Any opinions? > > The cure is worse than the disease. Because of this_cpu_inc(printk_context) / this_cpu_dec(printk_context)? May be. That's why I put RFC :) > The only case that's worth looking at is the direct polled console code > paths. The moment you touch the other layers you add essentially never > needed code to hot paths. > > Given printk nowdays is already somewhat unreliable with all the perf > related changes, and we have other good debug tools I think it would be > far cleaner to have some kind of > > > if (spin_trylock(...)) { > console_defer(buffer); > return; > } > > helper layer in the printk/console logic, at least for the non panic/oops > cases. spin_trylock() in every ->foo_console_write() callback? This still will not address the reported deadlock [1]. [1] lkml.kernel.org/r/d557e7056e1c7...@google.com -ss
Re: [RFC][PATCH 0/6] Use printk_safe context for TTY and UART port locks
> It doesn't come as a surprise that recursive printk() calls are not the > only way for us to deadlock in printk() and we still have a whole bunch > of other printk() deadlock scenarios. For instance, those that involve > TTY port->lock spin_lock and UART port->lock spin_lock. The tty layer code there is not re-entrant. Nor is it supposed to be > So the idea of this patch set is to take tty_port->lock and > uart_port->lock from printk_safe context and to eliminate some > of non-recursive printk() deadlocks - the ones that don't start > in printk(), but involve console related locks and thus eventually > deadlock us in printk(). For this purpose the patch set introduces > several helper macros: I don't see how this helps - if you recurse into the uart code you are still hitting the paths that are unsafe when re-entered. All you've done is messed up a pile of locking code on critical performance paths. As it stands I think it's a bad idea. > Of course, TTY and UART port spin_locks are not the only locks that > we can deadlock on. So this patch set does not address all deadlock > scenarios, it just makes a small step forward. > > Any opinions? The cure is worse than the disease. The only case that's worth looking at is the direct polled console code paths. The moment you touch the other layers you add essentially never needed code to hot paths. Given printk nowdays is already somewhat unreliable with all the perf related changes, and we have other good debug tools I think it would be far cleaner to have some kind of if (spin_trylock(...)) { console_defer(buffer); return; } helper layer in the printk/console logic, at least for the non panic/oops cases. Alan
[RFC][PATCH 0/6] Use printk_safe context for TTY and UART port locks
Hello, RFC printk_safe buffers help us to avoid printk() deadlocks that are caused by recursive printk() calls, e.g. printk() vprintk_emit() spin_lock(&logbuf_lock) vsnprintf() format_decode() warn_slowpath_fmt() vprintk_emit() spin_lock(&logbuf_lock) << deadlock It doesn't come as a surprise that recursive printk() calls are not the only way for us to deadlock in printk() and we still have a whole bunch of other printk() deadlock scenarios. For instance, those that involve TTY port->lock spin_lock and UART port->lock spin_lock. syzbot recently hit one of such scenarios [1]: CPU0 CPU1 tty_ioctl() spin_lock(&tty_port->lock) IRQ kmalloc() foo_console_handle_IRQ() printk() spin_lock(&uart_port->lock) call_console_drivers() tty_wakeup() foo_console_driver() spin_lock(&uart_port->lock) spin_lock(&tty_port->lock) So the idea of this patch set is to take tty_port->lock and uart_port->lock from printk_safe context and to eliminate some of non-recursive printk() deadlocks - the ones that don't start in printk(), but involve console related locks and thus eventually deadlock us in printk(). For this purpose the patch set introduces several helper macros: - uart_port_lock_irq() / uart_port_unlock_irq() uart_port_lock_irqsave() / uart_port_unlock_irqrestore() To lock/unlock uart_port->lock spin_lock from printk_safe context. And - tty_port_lock_irq() / tty_port_unlock_irq() tty_port_lock_irqsave() / tty_port_unlock_irqrestore() To lock/unlock tty_port->lock spin_lock from printk_safe context. I converted tty/pty/serial_core to use those helper macros. Now, the boring part is that all serial console drivers must be converted to use uart_port_lock macros. In this patch set I only modified the 8250 serial console [since this is RFC patch set], but if the patch set will not see a lot of opposition I can do so for the rest of serial consoles [or ask the maintainers to help me out]. The conversion is pretty simple. It would be great if we could "forbid" direct tty_port->lock and uart_port->lock manipulation [I don't know, rename them to ->__lock] and enforce uart_port_lock/tty_port_lock macros usage. There are some other cases [2] that we can handle with this patch set. For instance: IRQ spin_lock(&uart_port->lock) tty_wakeup() tty_port_default_wakeup() tty_port_tty_get() refcount_inc() WARN_ONCE() printk() call_console_drivers() foo_console_driver() spin_lock(&uart_port->lock) << deadlock Of course, TTY and UART port spin_locks are not the only locks that we can deadlock on. So this patch set does not address all deadlock scenarios, it just makes a small step forward. Any opinions? [1]: lkml.kernel.org/r/87008b056df8f...@google.com [2]: lkml.kernel.org/r/20180607051019.GA10406@jagdpanzerIV Sergey Senozhatsky (6): printk: move printk_safe macros to printk header tty: add tty_port locking helpers tty/pty: switch to tty_port helpers serial: add uart port locking helpers serial: switch to uart_port locking helpers tty: 8250: switch to uart_port locking helpers drivers/tty/pty.c | 4 +- drivers/tty/serial/8250/8250_core.c | 8 +- drivers/tty/serial/8250/8250_dma.c | 4 +- drivers/tty/serial/8250/8250_port.c | 74 +-- drivers/tty/serial/serial_core.c| 109 ++-- drivers/tty/tty_port.c | 38 +- include/linux/printk.h | 40 ++ include/linux/serial_core.h | 35 + include/linux/tty.h | 24 ++ kernel/printk/internal.h| 37 -- 10 files changed, 218 insertions(+), 155 deletions(-) -- 2.17.1