On Thu, 2012-09-06 at 14:03 -0700, Paul E. McKenney wrote:

> Here are a few other ways that stalls can happen:
>
> o  A CPU looping in an RCU read-side critical section.

For a minute? That's a bug.

>
> o  A CPU looping with interrupts disabled. This condition can
>    result in RCU-sched and RCU-bh stalls.

Also a bug.

>
> o  A CPU looping with preemption disabled. This condition can
>    result in RCU-sched stalls and, if ksoftirqd is in use, RCU-bh
>    stalls.

Bug as well.

>
> o  A CPU looping with bottom halves disabled. This condition can
>    result in RCU-sched and RCU-bh stalls.

Bug too.

>
> o  For !CONFIG_PREEMPT kernels, a CPU looping anywhere in the kernel
>    without invoking schedule().

Another bug.

>
> o  A CPU-bound real-time task in a CONFIG_PREEMPT kernel, which might
>    happen to preempt a low-priority task in the middle of an RCU
>    read-side critical section. This is especially damaging if
>    that low-priority task is not permitted to run on any other CPU,
>    in which case the next RCU grace period can never complete, which
>    will eventually cause the system to run out of memory and hang.
>    While the system is in the process of running itself out of
>    memory, you might see stall-warning messages.

Buggy system.

>
> o  A CPU-bound real-time task in a CONFIG_PREEMPT_RT kernel that
>    is running at a higher priority than the RCU softirq threads.
>    This will prevent RCU callbacks from ever being invoked,
>    and in a CONFIG_TREE_PREEMPT_RCU kernel will further prevent
>    RCU grace periods from ever completing. Either way, the
>    system will eventually run out of memory and hang. In the
>    CONFIG_TREE_PREEMPT_RCU case, you might see stall-warning
>    messages.

Not really a bug, but the developers need a spanking.

>
> o  A hardware or software issue shuts off the scheduler-clock
>    interrupt on a CPU that is not in dyntick-idle mode. This
>    problem really has happened, and seems to be most likely to
>    result in RCU CPU stall warnings for CONFIG_NO_HZ=n kernels.

Driving the bug.

>
> o  A bug in the RCU implementation.

Bug in the name.

>
> o  A hardware failure. This is quite unlikely, but has occurred
>    at least once in real life.
>    A CPU failed in a running system,
>    becoming unresponsive, but not causing an immediate crash.
>    This resulted in a series of RCU CPU stall warnings, eventually
>    leading to the realization that the CPU had failed.

Hardware bug.

So, where's the "spurious RCU CPU stall warnings"? All these cases
deserve a warning.

-- Steve

