On Tue, Jan 20, 2015 at 11:09:19AM +0800, Zhang Zhen wrote:
> 
> > Of course back then, touch_nmi_watchdog touched all cpus. So a problem
> > like this was masked. I believe this upstream commit 62572e29bc53, solved
> > the problem.
> 
> Thanks for your suggestion.
> 
> Commit 62572e29bc53 changed the semantics of touch_nmi_watchdog and made it
> only touch the local cpu, not every one.
> But watchdog_nmi_touch = true only guarantees no hard lockup check on this cpu.
> 
> Commit 62572e29bc53 didn't change the semantics of touch_softlockup_watchdog.
Ah, yes. I reviewed the commit too quickly yesterday. I thought
touch_softlockup_watchdog was called on every cpu and that commit changed it
to the local cpu. But that was incorrect.

> > 
> > You can apply that commit and see if you get both RCU stall
> > messages _and_ softlockup messages. I believe that is what you were
> > expecting, correct?
> > 
> Correct, I expect I can get both RCU stall messages _and_ softlockup
> messages.
> I applied that commit, and I only got RCU stall messages.

Hmm, I believe the act of printing to the console calls touch_nmi_watchdog,
which calls touch_softlockup_watchdog. I think that is the problem there.

This may not cause other problems, but what happens if you comment out the
'touch_softlockup_watchdog' call in the touch_nmi_watchdog function like
below (based on latest upstream cb59670870)?

The idea is that console printing for that cpu won't reset the softlockup
detector. Again, other bad things might happen and this patch may not be a
good final solution, but it can help give me a clue about what is going on.

Cheers,
Don

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 70bf118..833c015 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -209,7 +209,7 @@ void touch_nmi_watchdog(void)
 	 * going off.
 	 */
 	raw_cpu_write(watchdog_nmi_touch, true);
-	touch_softlockup_watchdog();
+	//touch_softlockup_watchdog();
 }
 EXPORT_SYMBOL(touch_nmi_watchdog);
 