On Sun, 21 Apr 2013, Borislav Petkov wrote:
> + tglx.
>
> On Sun, Apr 21, 2013 at 01:38:33AM +0400, vita...@yourcmc.ru wrote:
> > >>Stack trace picture is here:
> > >>http://vmx.yourcmc.ru/var/pics/IMG_20130306_141045.jpg
> > >
> > >Vitaliy reported that his system crashes when suspending to disk. This
> > >was a regression from 3.2 to 3.7, and remains in 3.8. Some details of
> > >this system are in the bug log at <http://bugs.debian.org/700333>.
> > >
> > >The photo shows a BUG in hrtimer_interrupt() after making the
> > >hibernation image and while resuming the non-boot CPUs. The HPET
> > >interrupt handler was called immediately after it was registered for CPU
> > >2 (?), before the corresponding clock_event_device was registered.
> > >
> > >Seems like an obvious race condition, but then shouldn't the HPET have
> > >been stopped while the CPU was previously offlined? And it's strange
> > >that this system apparently hits the race quite reliably.
> >
> > Anyone?

What happens is that the HPET seems to have an interrupt pending, which
fires immediately when the handler is installed. The core code does not
remove the hpet->event_handler, so the interrupt calls into
hrtimer_interrupt(), where it hits the BUG and dies.

With the patch below, the box should survive and we should see a
"Spurious HPET timer interrupt on HPET timer..." entry in dmesg.

That's a first workaround to confirm my theory. I'll look into the HPET
code to see how we can avoid that altogether.

Thanks,

	tglx

diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
index b1600a6..0f0ce6e 100644
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -323,6 +323,7 @@ static void tick_shutdown(unsigned int *cpup)
 		 */
 		dev->mode = CLOCK_EVT_MODE_UNUSED;
 		clockevents_exchange_device(dev, NULL);
+		dev->event_handler = NULL;
 		td->evtdev = NULL;
 	}
 	raw_spin_unlock_irqrestore(&tick_device_lock, flags);
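
For readers following the reasoning, here is a rough user-space model of the
theory above. It is only a sketch, not kernel code: every toy_* name is
hypothetical, and the struct keeps just the one field the patch touches. It
assumes the interrupt path checks for a NULL event_handler and reports a
spurious interrupt in that case, which is what the expected dmesg line
suggests.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical outcomes of a timer interrupt in this toy model. */
enum toy_irq_result { TOY_IRQ_HANDLED, TOY_IRQ_SPURIOUS, TOY_IRQ_BUG };

struct toy_clock_event_device;
typedef enum toy_irq_result (*toy_handler_t)(struct toy_clock_event_device *);

/* Stand-in for a clock event device; only what the model needs. */
struct toy_clock_event_device {
	toy_handler_t event_handler;
	int registered;	/* 1 while the per-CPU device is fully set up */
};

/* Models hrtimer_interrupt(): running on an unregistered device is the BUG. */
static enum toy_irq_result toy_hrtimer_interrupt(struct toy_clock_event_device *dev)
{
	return dev->registered ? TOY_IRQ_HANDLED : TOY_IRQ_BUG;
}

/* Models the shutdown path; clear_handler selects the patched behaviour. */
static void toy_shutdown(struct toy_clock_event_device *dev, int clear_handler)
{
	dev->registered = 0;
	if (clear_handler)
		dev->event_handler = NULL;
}

/* Models the interrupt entry: a NULL handler means "spurious", not a crash. */
static enum toy_irq_result toy_timer_irq(struct toy_clock_event_device *dev)
{
	if (!dev->event_handler)
		return TOY_IRQ_SPURIOUS;
	return dev->event_handler(dev);
}

/* Runs both scenarios: stale handler (unpatched) and cleared handler (patched). */
static void toy_demo(void)
{
	struct toy_clock_event_device unpatched = { toy_hrtimer_interrupt, 1 };
	struct toy_clock_event_device patched = { toy_hrtimer_interrupt, 1 };

	toy_shutdown(&unpatched, 0);
	assert(toy_timer_irq(&unpatched) == TOY_IRQ_BUG);

	toy_shutdown(&patched, 1);
	assert(toy_timer_irq(&patched) == TOY_IRQ_SPURIOUS);
}
```

Calling toy_demo() from a main() exercises both cases: with the stale
handler, the pending interrupt reaches the handler of a device that is no
longer registered; with the handler cleared, it is reported as spurious
instead.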