Re: cond_resched() and RCU CPU stall warnings

Paul E. McKenney Mon, 17 Mar 2014 09:59:20 -0700

On Mon, Mar 17, 2014 at 11:13:00AM +0100, Peter Zijlstra wrote:
> On Sat, Mar 15, 2014 at 06:59:14PM -0700, Paul E. McKenney wrote:
> > So I have been tightening up rcutorture a bit over the past year.
> > The other day, I came across what looked like a great opportunity for
> > further tightening, namely the schedule() in rcu_torture_reader().
> > Why not turn this into a cond_resched(), speeding up the readers a bit
> > and placing more stress on RCU?
> > 
> > And boy does it increase stress!
> > 
> > Unfortunately, this increased stress sometimes shows up in the form of
> > lots of RCU CPU stall warnings.  These can appear when an instance of
> > rcu_torture_reader() gets a CPU to itself, in which case it won't ever
> > enter the scheduler, and RCU will never see a quiescent state from that
> > CPU, which means the grace period never ends.
> > 
> > So I am taking a more measured approach to cond_resched() in
> > rcu_torture_reader() for the moment.
> > 
> > But longer term, should cond_resched() imply a set of RCU
> > quiescent states?  One way to do this would be to add calls to
> > rcu_note_context_switch() in each of the various cond_resched() functions.
> > Easy change, but of course adds some overhead.  On the other hand,
> > there might be more than a few of the 500+ calls to cond_resched() that
> > expect that RCU CPU stalls will be prevented (to say nothing of
> > might_sleep() and cond_resched_lock()).
> > 
> > Thoughts?
> 
> I share Mike's concern. Some of those functions might be too expensive
> to do in the loops where we have the cond_resched()s. And while its only
> strictly required when nr_running==1, keying off off that seems
> unfortunate in that it makes things behave differently with a single
> running task.
> 
> I suppose your proposed per-cpu counter is the best option; even though
> its still an extra cacheline hit in cond_resched().
> 
> As to the other cond_resched() variants; they might be a little more
> tricky, eg. cond_resched_lock() would have you drop the lock in order to
> note the QS, etc.
> 
> So one thing that might make sense is to have something like
> rcu_should_qs() which will indicate RCUs need for a grace period end.
> Then we can augment the various should_resched()/spin_needbreak() etc.
> with that condition.
> 
> That also gets rid of the counter (or at least hides it in the
> implementation if RCU really can't do anything better).


I did code up a version having a per-CPU bitmask indicating
which flavors of RCU needed quiescent states, and only invoking
rcu_note_context_switch() if at least one of the flavors needed
a quiescent state.  This implementation ended up being more
complex, but worse, slowed down the fast path.  Hard to beat
__this_cpu_inc_return()...  Might be able to break even with a bit more
work, but on most real systems there is pretty much always a grace period
in flight anyway, so it does not appear to be worth it.

So how about the following?  Passes moderate rcutorture testing, no RCU
CPU stall warnings despite cond_resched() in rcu_torture_reader().

                                                        Thanx, Paul

------------------------------------------------------------------------

sched,rcu: Make cond_resched() report RCU quiescent states

Given a CPU running a loop containing cond_resched(), with no
other tasks runnable on that CPU, RCU will eventually report RCU
CPU stall warnings due to lack of quiescent states.  Fortunately,
every call to cond_resched() is a perfectly good quiescent state.
Unfortunately, invoking rcu_note_context_switch() is a bit heavyweight
for cond_resched(), especially given the need to disable preemption,
and, for RCU-preempt, interrupts as well.

This commit therefore maintains a per-CPU counter that causes
cond_resched(), cond_resched_lock(), and cond_resched_softirq() to call
rcu_note_context_switch(), but only about once per 256 invocations.
This ratio was chosen in keeping with the relative time constants of
RCU grace periods.

Signed-off-by: Paul E. McKenney <[email protected]>

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 3cea28c64ebe..8d64878111ea 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -44,6 +44,7 @@
 #include <linux/debugobjects.h>
 #include <linux/bug.h>
 #include <linux/compiler.h>
+#include <linux/percpu.h>
 #include <asm/barrier.h>
 
 extern int rcu_expedited; /* for sysctl */
@@ -287,6 +288,41 @@ bool __rcu_is_watching(void);
 #endif /* #if defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_RCU_TRACE) || 
defined(CONFIG_SMP) */
 
 /*
+ * Hooks for cond_resched() and friends to avoid RCU CPU stall warnings.
+ */
+
+#define RCU_COND_RESCHED_LIM 256       /* ms vs. 100s of ms. */
+DECLARE_PER_CPU(int, rcu_cond_resched_count);
+void rcu_resched(void);
+
+/*
+ * Is it time to report RCU quiescent states?
+ *
+ * Note unsynchronized access to rcu_cond_resched_count.  Yes, we might
+ * increment some random CPU's count, and possibly also load the result from
+ * yet another CPU's count.  We might even clobber some other CPU's attempt
+ * to zero its counter.  This is all OK because the goal is not precision,
+ * but rather reasonable amortization of rcu_note_context_switch() overhead
+ * and extremely high probability of avoiding RCU CPU stall warnings.
+ * Note that this function has to be preempted in just the wrong place,
+ * many thousands of times in a row, for anything bad to happen.
+ */
+static inline bool rcu_should_resched(void)
+{
+       return __this_cpu_inc_return(rcu_cond_resched_count) >=
+              RCU_COND_RESCHED_LIM;
+}
+
+/*
+ * Report quiscent states to RCU if it is time to do so.
+ */
+static inline void rcu_cond_resched(void)
+{
+       if (unlikely(rcu_should_resched()))
+               rcu_resched();
+}
+
+/*
  * Infrastructure to implement the synchronize_() primitives in
  * TREE_RCU and rcu_barrier_() primitives in TINY_RCU.
  */
diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
index 4c0a9b0af469..30eb6bb52be6 100644
--- a/kernel/rcu/update.c
+++ b/kernel/rcu/update.c
@@ -337,4 +337,22 @@ static int __init check_cpu_stall_init(void)
 }
 early_initcall(check_cpu_stall_init);
 
+/*
+ * Hooks for cond_resched() and friends to avoid RCU CPU stall warnings.
+ */
+
+DEFINE_PER_CPU(int, rcu_cond_resched_count);
+
+/*
+ * Report a set of RCU quiescent states, for use by cond_resched()
+ * and friends.  Out of line due to being called infrequently.
+ */
+void rcu_resched(void)
+{
+       preempt_disable();
+       __this_cpu_write(rcu_cond_resched_count, 0);
+       rcu_note_context_switch(smp_processor_id());
+       preempt_enable();
+}
+
 #endif /* #ifdef CONFIG_RCU_STALL_COMMON */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b46131ef6aab..b5c942a5f7ae 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4071,6 +4071,7 @@ static void __cond_resched(void)
 
 int __sched _cond_resched(void)
 {
+       rcu_cond_resched();
        if (should_resched()) {
                __cond_resched();
                return 1;
@@ -4089,6 +4090,7 @@ EXPORT_SYMBOL(_cond_resched);
  */
 int __cond_resched_lock(spinlock_t *lock)
 {
+       bool need_rcu_resched = rcu_should_resched();
        int resched = should_resched();
        int ret = 0;
 
@@ -4098,6 +4100,8 @@ int __cond_resched_lock(spinlock_t *lock)
                spin_unlock(lock);
                if (resched)
                        __cond_resched();
+               else if (unlikely(need_rcu_resched))
+                       rcu_resched();
                else
                        cpu_relax();
                ret = 1;
@@ -4111,6 +4115,7 @@ int __sched __cond_resched_softirq(void)
 {
        BUG_ON(!in_softirq());
 
+       rcu_cond_resched();
        if (should_resched()) {
                local_bh_enable();
                __cond_resched();

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: cond_resched() and RCU CPU stall warnings

Reply via email to