On Mon, Jan 14, 2013 at 4:13 PM, Frederic Weisbecker <fweis...@gmail.com> wrote:
> 2013/1/11 Colin Cross <ccr...@android.com>:
>> Emulate NMIs on systems where they are not available by using timer
>> interrupts on other cpus.  Each cpu will use its softlockup hrtimer
>> to check that the next cpu is processing hrtimer interrupts by
>> verifying that a counter is increasing.
>>
>> This patch is useful on systems where the hardlockup detector is not
>> available due to a lack of NMIs, for example most ARM SoCs.
>> Without this patch any cpu stuck with interrupts disabled can
>> cause a hardware watchdog reset with no debugging information,
>> but with this patch the kernel can detect the lockup and panic,
>> which can result in useful debugging info.
>>
>> Signed-off-by: Colin Cross <ccr...@android.com>
>
> I believe this is pretty much what the RCU stall detector does
> already: checks for other CPUs being responsive. The only difference
> is on how it checks that. For RCU it's about checking for CPUs
> reporting quiescent states when requested to do so. In your case it's
> about ensuring the hrtimer interrupt is well handled.
>
> One thing you can do is to enqueue an RCU callback (call_rcu()) every
> minute so you can force other CPUs to report quiescent states
> periodically and thus check for lockups.

That's a good point, I'll take a look at using that.  A minute is too
long, since some SoCs have maximum HW watchdog periods of under 30
seconds, but a call_rcu() every 10-20 seconds might be sufficient.

> Now you'll face the same problem in the end: if you don't have NMIs,
> you won't have a very useful report.

Yes, but it's still better than a silent reset.
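
For reference, the counter check from the patch description (each cpu
verifying that the next cpu's hrtimer counter keeps increasing) can be
sketched in plain userspace C roughly as below.  This is only an
illustration of the idea, not the code from the patch: NR_CPUS is fixed
at 4, the per-cpu data is modeled as ordinary arrays, and the names
hrtimer_interrupts, hrtimer_interrupts_saved, timer_tick() and
next_cpu_is_stuck() are made up for the example.

```c
#include <stdbool.h>

#define NR_CPUS 4	/* assumption: fixed cpu count for the sketch */

/* Hypothetical per-cpu state, modeled as plain arrays. */
static unsigned long hrtimer_interrupts[NR_CPUS];	/* bumped on each cpu's own tick */
static unsigned long hrtimer_interrupts_saved[NR_CPUS];	/* last value the watcher saw */

/* Runs in a cpu's timer tick: record that this cpu is still taking interrupts. */
static void timer_tick(int cpu)
{
	hrtimer_interrupts[cpu]++;
}

/*
 * Runs in a cpu's timer tick after timer_tick(): look at the next cpu
 * (wrapping around) and report whether its counter has failed to move
 * since the previous check.
 */
static bool next_cpu_is_stuck(int cpu)
{
	int next = (cpu + 1) % NR_CPUS;
	unsigned long now = hrtimer_interrupts[next];

	if (now == hrtimer_interrupts_saved[next])
		return true;	/* no progress since the last check */

	hrtimer_interrupts_saved[next] = now;
	return false;
}
```

In this model a cpu that stops calling timer_tick() (for example
because it is spinning with interrupts disabled) is flagged by its
neighbor on that neighbor's next check.  A real implementation would
presumably want the check period to be several timer periods long so a
single delayed tick is not reported as a lockup.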