On (03/31/17 15:33), Peter Zijlstra wrote: > On Fri, Mar 31, 2017 at 03:09:50PM +0200, Petr Mladek wrote: > > On Wed 2017-03-29 18:25:04, Sergey Senozhatsky wrote: > > > > if (waitqueue_active(&log_wait)) { > > > - this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP); > > > + set_bit(PRINTK_PENDING_WAKEUP, &printk_pending); > > > > We should add here a write barrier: > > > > /* > > * irq_work_queue() uses cmpxchg() and implies the memory > > * barrier only when the work is queued. An explicit barrier > > * is needed here to make sure that wake_up_klogd_work_func() > > * sees printk_pending set even when the work was already queued > > * because of an other pending event. > > */ > > smp_wmb(); > > > > > irq_work_queue(this_cpu_ptr(&wake_up_klogd_work)); > > > } > > > preempt_enable(); > > smp_mb__after_atomic() is probably better, because if you're not > ordering with the cmpxchg, you're ordering against a load done by > cmpxchg to see it doesn't need to do anything.
Petr and Peter, thanks for the review. can you educate me, what exactly is broken there? when called from console_unlock(), we have something as follows console_unlock() { for (;;) { spin_lock_irqsave(); ... spin_unlock_irqrestore(); ... } spin_unlock_irqrestore(); <<IRQs enabled>> if (wake_klogd) wake_up_klogd() { set_bit(PRINTK_PENDING_WAKEUP, &printk_pending); irq_work_queue(this_cpu_ptr(&wake_up_klogd_work)); } } we queue a per-CPU irq_work. given that by the time we execute wake_up_klogd() we have local IRQs enabled on that CPU. is it possible that we will have that CPU's irq_work still being queued? when called from printk_deferred(). I'm still trying to understand what scenario can cause the problem. so basically on that CPU we have a call into the scheduler/timer which ends up in printk_deferred()... and then we have console_unlock()->wake_up_klogd() //* local IRQs enabled but the irq_work is still queued *// and atop of it we have IRQ that executes that CPU's run_list and fails to see updated PRINTK_PENDING_WAKEUP bit, because wake_up_klogd() was called on already queued wake_up_klogd_work. is this the case? if so, can this race happen on the CPU? I don't object the barrier, I'm just trying to have a better understanding what's broken. sorry if I'm missing something very obvious. -ss