Konstantin Belousov <kostik...@gmail.com> wrote in <20130102174044.gb82...@kib.kiev.ua>:
ko> > I might take a closer look this evening and see if I can spot anything
ko> > in the log, rick
ko> > ps: I hope Alan and Kostik don't mind being added to the cc list.
ko>
ko> What I see in the log is that the lock cascade rooted in the thread
ko> 100838, which owns system map mutex. I believe this prevents malloc(9)
ko> from making a progress in other threads, which e.g. own the ZFS vnode
ko> locks. As the result, the whole system wedged.
ko>
ko> Looking back at the thread 100838, we can see that it executes
ko> smp_tlb_shootdown(). It is impossible to tell from the static dump,
ko> is the appearance of the smp_tlb_shootdown() in the backtrace is
ko> transient, or the thread is spinning there, waiting for other CPUs to
ko> acknowledge the request. But, since the system wedged, most likely,
ko> smp_tlb_shootdown spins.
ko>
ko> Taking this hypothesis, the situation can occur, most likely, due to
ko> some other core running with the interrupts disabled. Inspection of the
ko> backtraces of the processes running on all cores does not show any which
ko> could legitimately own a spinlock or otherwise run with the interrupts
ko> disabled.
ko>
ko> One thing you could try to do is to enable WITNESS for the spinlocks,
ko> to try to catch the leaked spinlock. I very much doubt that this is
ko> the case.
ko>
ko> Another thing to try is to switch the CPU idle method to something
ko> else. Look at the machdep.idle* sysctls. It could be some CPU errata
ko> which blocks wakeup due the interrupt in some conditions in C1 ?

Thank you. It can take 1-2 weeks to reproduce this, so I have set
debug.witness.skipspin=0 to enable WITNESS for the spinlocks, and I am
keeping machdep.idle=acpi for now. I will see how it goes for a while
and report again if I get another freeze.

-- Hiroki