Re: frequent lockups in 3.18rc4

Thomas Gleixner Fri, 19 Dec 2014 17:07:15 -0800

On Fri, 19 Dec 2014, Chris Mason wrote:
> On Fri, Dec 19, 2014 at 6:22 PM, Thomas Gleixner <[email protected]> wrote:
> > But at the very end this would be detected by the runtime check of the
> > hrtimer interrupt, which does not trigger. And it would trigger at
> > some point as ALL cpus including CPU0 in that trace dump make
> > progress.
> 
> I'll admit that at some point we should be hitting one of the WARN or BUG_ON,
> but it's possible to thread that needle and corrupt the timer list, without
> hitting a warning (CPU 1 in my example has to enqueue last).  Once the rbtree
> is hosed, it can go forever.  Probably not the bug we're looking for, but
> still suspect in general.


I will certainly take a close look at that, but in this case we get out of
that state later on, which would require both

     A) a corruption of the rbtree
     B) a self healing of the rbtree afterwards

I doubt it, but who knows.

Even if A and B both happened, we would still get the 'hrtimer
interrupt took a gazillion seconds' warning, because CPU0 definitely
leaves the timer interrupt at some point. Otherwise we would not see
backtraces from usb, userspace and idle later on.
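
To make the argument concrete, here is a rough userspace model of such a
runtime check. It is only a sketch under stated assumptions: the names
irq_model, check_handler_runtime and limit_ns are made up for
illustration, and the real check in the hrtimer interrupt path has more
conditions than this. The point it shows is that the check is evaluated
when the handler exits, so any CPU that leaves the handler after an
overlong run would report it.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical userspace model, NOT the kernel's data structure. */
struct irq_model {
    uint64_t max_hang_ns;  /* longest handler run observed so far */
    int      warned;       /* warn only once */
};

/*
 * Called on handler exit with the entry and exit timestamps.
 * Returns 1 if this call emitted the warning, 0 otherwise.
 * limit_ns is an assumed threshold, not a kernel constant.
 */
static int check_handler_runtime(struct irq_model *m,
                                 uint64_t entry_ns, uint64_t exit_ns,
                                 uint64_t limit_ns)
{
    uint64_t delta = exit_ns - entry_ns;

    /* Track the worst case seen, whether or not we warn. */
    if (delta > m->max_hang_ns)
        m->max_hang_ns = delta;

    if (delta <= limit_ns || m->warned)
        return 0;

    m->warned = 1;
    fprintf(stderr, "model: interrupt took %llu ns\n",
            (unsigned long long)delta);
    return 1;
}
```

In this model a handler that spins on a corrupted list but eventually
returns still passes through the check on its way out, which is why the
absence of the warning matters.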

Thanks,

        tglx



