On 12-07-16, 16:19, Viresh Kumar wrote:
> Okay, we have tracked this BUG and its really interesting.
>
> I hacked the platform's serial driver to implement a putchar() routine
> that simply writes to the FIFO in polling mode, that helped us in
> tracing on where we are going wrong.
>
> The problem is that we are running asynchronous printks and we call
> wake_up_process() from the last running CPU which has disabled
> interrupts. That takes us to: try_to_wake_up().
>
> In our case the CPU gets deadlocked on this line in try_to_wake_up().
>
>         raw_spin_lock_irqsave(&p->pi_lock, flags);
>
> I will explain how:
>
> The try_to_wake_up() function takes us through the scheduler code (RT
> sched), to the hrtimer code, where we eventually call ktime_get() (for
> the MONOTONIC clock used for hrtimer). And this function has this:
>
>         WARN_ON(timekeeping_suspended);
>
> This starts another printk while we are in the middle of
> wake_up_process() and the CPU tries to take the above lock again and
> gets stuck there :)
>
> This doesn't happen everytime because we don't always call ktime_get()
> and it is called only if hrtimer_active() returns false.
>
> This happened because of a WARN_ON() but it can happen anyway. Think
> about this case:
>
> - offline all CPUs, except 0
> - call any routine that prints messages after disabling interrupts,
>   etc.
> - If any of the function within wake_up_process() does a print, we are
>   screwed.
>
> So the thing is that we can't really call wake_up_process() in cases
> where the last CPU disables interrupts. And that's why my fixup patch
> (which moved to synchronous prints after suspend) really works.

Actually, any printk done from wake_up_process() will hit this, even if
all the other CPUs are up as well :)

It's only BUG_ON() which has special handling in printk, and so we print
that safely.

--
viresh