On 01/28/2016 07:42 AM, Sergey Senozhatsky wrote: > On (01/28/16 19:53), Sergey Senozhatsky wrote: >>> ah... silly me... you mean the first CPU that triggers the spin_dump() will >> ^^^ this, of course, is true for >> console_sem->lock and logbuf_lock >> only. >> >>> deadlock itself, so the rest of CPUs will see endless recursive >>> spin_lock()->spin_dump()->spin_lock()->spin_dump() calls? > [..] >>> Can you please update your bug description in the commit message? >>> It's the deadlock that is causing the recursion on other CPUs in the >>> first place. > > no, don't update anything. I was completely wrong. it's not a deadlock > that is the root cause here. > > even if at some level of recursion (nested printk calls) > spin_dump()->__spin_lock_debug()->arch_spin_trylock() acquires the > lock, it returns back with the spin lock unlocked anyway. > > vprintk_emit() > console_trylock() > spin_lock() > spin_dump() > vprintk_emit() > console_trylock() > spin_lock() > spin_dump() > vprintk_emit() > console_trylock() > spin_lock() << OK, got the lock finally
The problem is you have postulated a very shallow recursion. This looks much worse if this happens 1000 times, and probably won't recover to output anything. Additionally, what if the console_sem is simply corrupted? A livelock with no output ever is not very helpful. As I wrote earlier, I don't think this is the way to fix recursion problems with printk() [by eliding output]. Rather, a way to effectively determine a recursion is in progress, and _at a minimum_ guaranteeing that the recursive output will eventually be output should be the goal. Including dumb recursion like a console driver printing an error :/ Then, lockdep could remain enabled while calling console drivers. Regards, Peter Hurley > sem->count-- > spin_unlock() << unlock, return > arch_spin_lock() << got the lock, return > sem->count-- > spin_unlock() << unlock, return > arch_spin_lock() << got the lock, return > sem->count-- > spin_unlock() << unlock, return > > > ...um > > >> But I found there's a possiblity in the debug code *itself* to cause a >> lockup. > > please explain. > > -ss >

