Re: Regression in dbdda842fe96 ("printk: Add console owner and waiter logic to load balance console writes") [Was: Regression in fd5f7cde1b85 ("...")]
On (09/26/19 10:58), Petr Mladek wrote: [..] > > - spin_lock(>port.lock); > > - > > + uart_port_lock_irqsave(>port, flags); > > uart_port_lock_irqsave() does not exist. ... Oh. Good catch! Apparently I still carry around my patch set which added printk_safe to TTY/UART locking API. > Instead the current users do: > > spin_lock_irqsave(>lock, flags); Right. [..] > I like this approach. It allows to remove hacks with locks. [..] > Or I would keep the locking as is and add some API > just for the sysrq handling: > > >int uart_store_sysrq_char(struct uart_port *port, unsigned int ch); >unsigned int uart_get_sysrq_char(struct uart_port *port); Looks good. We also probably can remove struct uart_port's ->sysrq member and clean up locking in drivers' ->write() callbacks: if (sport->sysrq) locked = 0; else if (oops_in_progress) locked = spin_trylock_irqsave(>lock, flags); else spin_lock_irqsave(>lock, flags); Because this ->sysrq branch makes driver completely lockless globally, for all CPUs, not only for sysrq-CPU. > And use it the following way: > > int handle_irq() > { > unsined int sysrq, sysrq_ch; > > spin_lock(>lock); > [...] > sysrq = uart_store_sysrq_char(port, ch); > if (!sysrq) > [...] > [...] > > out: > sysrq_ch = uart_get_sysrq_char(port); > spin_unlock(>lock); > > if (sysrq_ch) > handle_sysrq(sysrq_ch); > } Looks good. -ss
Re: Regression in dbdda842fe96 ("printk: Add console owner and waiter logic to load balance console writes") [Was: Regression in fd5f7cde1b85 ("...")]
On Wed 2019-09-18 16:52:52, Sergey Senozhatsky wrote: > On (09/18/19 09:11), Uwe Kleine-König wrote: > > I rechecked and indeed fd5f7cde1b85's parent has the problem, too, so I > > did a mistake during my bisection :-| > > > > Redoing the bisection (a bit quicker this time) points to > > > > dbdda842fe96 ("printk: Add console owner and waiter logic to load balance > > console writes") > > > > Sorry for the confusion. > > No worries! > > [..] > > > So I'd say that lockdep is correct, but there are several hacks which > > > prevent actual deadlock. > > The basic idea is to handle sysrq out of port->lock. Great idea! > I didn't test it all (not even sure if it compiles). > > --- > drivers/tty/serial/imx.c | 10 +- > 1 file changed, 5 insertions(+), 5 deletions(-) > > diff --git a/drivers/tty/serial/imx.c b/drivers/tty/serial/imx.c > index 87c58f9f6390..f0dd807b52df 100644 > --- a/drivers/tty/serial/imx.c > +++ b/drivers/tty/serial/imx.c > @@ -731,9 +731,9 @@ static irqreturn_t imx_uart_rxint(int irq, void *dev_id) > struct imx_port *sport = dev_id; > unsigned int rx, flg, ignored = 0; > struct tty_port *port = >port.state->port; > + unsigned long flags; > > - spin_lock(>port.lock); > - > + uart_port_lock_irqsave(>port, flags); uart_port_lock_irqsave() does not exist. Instead the current users do: spin_lock_irqsave(>lock, flags); > while (imx_uart_readl(sport, USR2) & USR2_RDR) { > u32 usr2; > > @@ -749,8 +749,8 @@ static irqreturn_t imx_uart_rxint(int irq, void *dev_id) > continue; > } > > - if (uart_handle_sysrq_char(>port, (unsigned char)rx)) > - continue; > + if (uart_prepare_sysrq_char(>port, (unsigned char)rx)) > + break; > > if (unlikely(rx & URXD_ERR)) { > if (rx & URXD_BRK) > @@ -792,7 +792,7 @@ static irqreturn_t imx_uart_rxint(int irq, void *dev_id) > } > > out: > - spin_unlock(>port.lock); > + uart_unlock_and_check_sysrq(>port, flags); This API has been introduced for exactly this reason. See the commit 6e1935819db0c91ce4a5af ("serial: core: Allow processing sysrq at port unlock time"). I like this approach. It allows to remove hacks with locks. Well, Sergey's patch is nice example that the API is a bit confusing. I would either make it symmetric and make a variant without saving irq flags: uart_lock(port); uart_unlock_and_handle_sysrq(port); uart_lock_irqsave(port, flags); uart_unlock_irqrestore_and_handle_sysrq(port); Or I would keep the locking as is and add some API just for the sysrq handling: int uart_store_sysrq_char(struct uart_port *port, unsigned int ch); unsigned int uart_get_sysrq_char(struct uart_port *port); And use it the following way: int handle_irq() { unsined int sysrq, sysrq_ch; spin_lock(>lock); [...] sysrq = uart_store_sysrq_char(port, ch); if (!sysrq) [...] [...] out: sysrq_ch = uart_get_sysrq_char(port); spin_unlock(>lock); if (sysrq_ch) handle_sysrq(sysrq_ch); } I prefer the 2nd option. It is more code. But it is more self explanatory. What do you think? Best Regards, Petr
Re: Regression in dbdda842fe96 ("printk: Add console owner and waiter logic to load balance console writes") [Was: Regression in fd5f7cde1b85 ("...")]
On (09/18/19 09:11), Uwe Kleine-König wrote: > I rechecked and indeed fd5f7cde1b85's parent has the problem, too, so I > did a mistake during my bisection :-| > > Redoing the bisection (a bit quicker this time) points to > > dbdda842fe96 ("printk: Add console owner and waiter logic to load balance > console writes") > > Sorry for the confusion. No worries! [..] > > So I'd say that lockdep is correct, but there are several hacks which > > prevent actual deadlock. > > Just to make sure, I got you right: With the way lockdep works it is > right to assume there is a problem, but in fact there isn't? I'd probably say so... Unless I'm missing something. sysrq-over-serial is handled from the serial driver's IRQ handler, under serial driver's port->lock. sysrq handling calls printk(), which takes console_sem/owner and re-enters the serial driver via ->write() callback. So lockdep sees a reverse locking pattern: port->lock goes before console_sem/owner, which is not the usual order. > This is IMHO unfortunate because such false positives reduces the > usefulness of lockdep considerably. :-| I agree. port->sysrq state is global to uart port. IOW, if CPUA sets port->sysrq then all printk->write() paths (from any other CPU) become lockless. This makes me wonder is we really need to hold port->lock for uart_handle_sysrq_char(). I sort of doubt it... Can you try the following patch? It's against linux-next, I guess you can backport to your kernel. The basic idea is to handle sysrq out of port->lock. I didn't test it all (not even sure if it compiles). --- drivers/tty/serial/imx.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/tty/serial/imx.c b/drivers/tty/serial/imx.c index 87c58f9f6390..f0dd807b52df 100644 --- a/drivers/tty/serial/imx.c +++ b/drivers/tty/serial/imx.c @@ -731,9 +731,9 @@ static irqreturn_t imx_uart_rxint(int irq, void *dev_id) struct imx_port *sport = dev_id; unsigned int rx, flg, ignored = 0; struct tty_port *port = >port.state->port; + unsigned long flags; - spin_lock(>port.lock); - + uart_port_lock_irqsave(>port, flags); while (imx_uart_readl(sport, USR2) & USR2_RDR) { u32 usr2; @@ -749,8 +749,8 @@ static irqreturn_t imx_uart_rxint(int irq, void *dev_id) continue; } - if (uart_handle_sysrq_char(>port, (unsigned char)rx)) - continue; + if (uart_prepare_sysrq_char(>port, (unsigned char)rx)) + break; if (unlikely(rx & URXD_ERR)) { if (rx & URXD_BRK) @@ -792,7 +792,7 @@ static irqreturn_t imx_uart_rxint(int irq, void *dev_id) } out: - spin_unlock(>port.lock); + uart_unlock_and_check_sysrq(>port, flags); tty_flip_buffer_push(port); return IRQ_HANDLED; }
Regression in dbdda842fe96 ("printk: Add console owner and waiter logic to load balance console writes") [Was: Regression in fd5f7cde1b85 ("...")]
Hello Sergey, On Wed, Sep 18, 2019 at 10:30:32AM +0900, Sergey Senozhatsky wrote: > On (09/17/19 16:10), Uwe Kleine-König wrote: > > Hello, > > > > Today it saw sysrq on an UART driven by drivers/tty/serial/imx.c report > > a lockdep issue. Bisecting pointed to > > > > fd5f7cde1b85 ("printk: Never set console_may_schedule in > > console_trylock()") > > Hmmm... > > I don't see how this patch can affect anything. It simply > disables preemption in printk(). I rechecked and indeed fd5f7cde1b85's parent has the problem, too, so I did a mistake during my bisection :-| Redoing the bisection (a bit quicker this time) points to dbdda842fe96 ("printk: Add console owner and waiter logic to load balance console writes") Sorry for the confusion. > > When I type t I get: > > > > [ 87.940104] sysrq: SysRq : This sysrq operation is disabled. > > [ 87.948752] > > [ 87.948772] == > > [ 87.948787] WARNING: possible circular locking dependency detected > > [ 87.948798] 4.14.0-12954-gfd5f7cde1b85 #26 Not tainted > > [ 87.948813] -- > > [ 87.948822] swapper/0 is trying to acquire lock: > > [ 87.948829] (console_owner){-...}, at: [] > > console_unlock+0x110/0x598 > > [ 87.948861] > > [ 87.948869] but task is already holding lock: > > [ 87.948874] (_lock_key){-.-.}, at: [] > > imx_rxint+0x2c/0x290 > > [ 87.948902] > > [ 87.948911] which lock already depends on the new lock. > > [ 87.948917] > > [ 87.948923] > > [ 87.948932] the existing dependency chain (in reverse order) is: > > [ 87.948938] > > [ 87.948943] -> #1 (_lock_key){-.-.}: > > [ 87.948975]_raw_spin_lock_irqsave+0x5c/0x70 > > [ 87.948983]imx_console_write+0x138/0x15c > > [ 87.948991]console_unlock+0x204/0x598 > > [ 87.949000]register_console+0x21c/0x3e8 > > [ 87.949008]uart_add_one_port+0x3e4/0x4dc > > [ 87.949019]platform_drv_probe+0x3c/0x78 > > [ 87.949027]driver_probe_device+0x25c/0x47c > > [ 87.949035]__driver_attach+0xec/0x114 > > [ 87.949044]bus_for_each_dev+0x80/0xb0 > > [ 87.949054]bus_add_driver+0x1d4/0x264 > > [ 87.949062]driver_register+0x80/0xfc > > [ 87.949069]imx_serial_init+0x28/0x48 > > [ 87.949078]do_one_initcall+0x44/0x18c > > [ 87.949087]kernel_init_freeable+0x11c/0x1cc > > [ 87.949095]kernel_init+0x10/0x114 > > [ 87.949103]ret_from_fork+0x14/0x30 > > This is "normal" locking path > > console_sem -> port->lock > > printk() >lock console_sem > imx_console_write() > lock port->lock > > > [ 87.949113] -> #0 (console_owner){-...}: > > [ 87.949145]lock_acquire+0x100/0x23c > > [ 87.949154]console_unlock+0x1a4/0x598 > > [ 87.949162]vprintk_emit+0x1a4/0x45c > > [ 87.949171]vprintk_default+0x28/0x30 > > [ 87.949180]printk+0x28/0x38 > > [ 87.949189]__handle_sysrq+0x1c4/0x244 > > [ 87.949196]imx_rxint+0x258/0x290 > > [ 87.949206]imx_int+0x170/0x178 > > [ 87.949216]__handle_irq_event_percpu+0x78/0x418 > > [ 87.949225]handle_irq_event_percpu+0x24/0x6c > > [ 87.949233]handle_irq_event+0x40/0x64 > > [ 87.949242]handle_level_irq+0xb4/0x138 > > [ 87.949252]generic_handle_irq+0x28/0x3c > > [ 87.949261]__handle_domain_irq+0x50/0xb0 > > [ 87.949269]avic_handle_irq+0x3c/0x5c > > [ 87.949277]__irq_svc+0x6c/0xa4 > > [ 87.949287]arch_cpu_idle+0x30/0x40 > > [ 87.949297]arch_cpu_idle+0x30/0x40 > > [ 87.949305]do_idle+0xa0/0x104 > > [ 87.949313]cpu_startup_entry+0x14/0x18 > > [ 87.949323]start_kernel+0x30c/0x368 > > This one is a "reverse" locking path... > > port->lock -> console_sem > > There is more to it: > > imxint() > lock port->lock >uart_handle_sysrq_char() > handle_sysrq() > printk() > lock conosole_sem >imx_console_write() > lock port->lock [boom] > > This path re-enters serial driver. But it doesn't deadlock, because > uart_handle_sysrq_char() sets a special flag port->sysrq, and serial > consoles are expected to make sure that they don't lock port->lock > in this case. Otherwise we will kill the system: > > void serial_console_write(...) > { > ... > if (sport->port.sysrq) > locked = 0; > else if (oops_in_progress) > locked = spin_trylock_irqsave(>port.lock, flags); > else > spin_lock_irqsave(>port.lock, flags); > ... > } > > So I'd say that lockdep is correct, but there are several hacks which > prevent actual deadlock. Just to make sure, I got you right: With the way lockdep works it is right to assume there is