On Thu, 2012-09-06 at 14:03 -0700, Paul E. McKenney wrote:

> Here are a few other ways that stalls can happen:
> 
> o     A CPU looping in an RCU read-side critical section.

For a minute? That's a bug.

>       
> o     A CPU looping with interrupts disabled.  This condition can
>       result in RCU-sched and RCU-bh stalls.

Also a bug.

> 
> o     A CPU looping with preemption disabled.  This condition can
>       result in RCU-sched stalls and, if ksoftirqd is in use, RCU-bh
>       stalls.

Bug as well.

> 
> o     A CPU looping with bottom halves disabled.  This condition can
>       result in RCU-sched and RCU-bh stalls.

Bug too.

> 
> o     For !CONFIG_PREEMPT kernels, a CPU looping anywhere in the kernel
>       without invoking schedule().

Another bug.

> 
> o     A CPU-bound real-time task in a CONFIG_PREEMPT kernel, which might
>       happen to preempt a low-priority task in the middle of an RCU
>       read-side critical section.  This is especially damaging if
>       that low-priority task is not permitted to run on any other CPU,
>       in which case the next RCU grace period can never complete, which
>       will eventually cause the system to run out of memory and hang.
>       While the system is in the process of running itself out of
>       memory, you might see stall-warning messages.

Buggy system.

> 
> o     A CPU-bound real-time task in a CONFIG_PREEMPT_RT kernel that
>       is running at a higher priority than the RCU softirq threads.
>       This will prevent RCU callbacks from ever being invoked,
>       and in a CONFIG_TREE_PREEMPT_RCU kernel will further prevent
>       RCU grace periods from ever completing.  Either way, the
>       system will eventually run out of memory and hang.  In the
>       CONFIG_TREE_PREEMPT_RCU case, you might see stall-warning
>       messages.

Not really a bug, but the developers need a spanking.

> 
> o     A hardware or software issue shuts off the scheduler-clock
>       interrupt on a CPU that is not in dyntick-idle mode.  This
>       problem really has happened, and seems to be most likely to
>       result in RCU CPU stall warnings for CONFIG_NO_HZ=n kernels.

Driving the bug.

> 
> o     A bug in the RCU implementation.

Bug in the name.

> 
> o     A hardware failure.  This is quite unlikely, but has occurred
>       at least once in real life.  A CPU failed in a running system,
>       becoming unresponsive, but not causing an immediate crash.
>       This resulted in a series of RCU CPU stall warnings, eventually
>       leading to the realization that the CPU had failed.

Hardware bug.

So, where's the "spurious RCU CPU stall warnings"?

All these cases deserve a warning.

-- Steve

