Re: cond_resched() and RCU CPU stall warnings
On Tue, Mar 18, 2014 at 02:45:34PM +0100, Peter Zijlstra wrote: > > sched,rcu: Make cond_resched() report RCU quiescent states > > > > Given a CPU running a loop containing cond_resched(), with no > > other tasks runnable on that CPU, RCU will eventually report RCU > > CPU stall warnings due to lack of quiescent states. Fortunately, > > every call to cond_resched() is a perfectly good quiescent state. > > Unfortunately, invoking rcu_note_context_switch() is a bit heavyweight > > for cond_resched(), especially given the need to disable preemption, > > and, for RCU-preempt, interrupts as well. > > > > This commit therefore maintains a per-CPU counter that causes > > cond_resched(), cond_resched_lock(), and cond_resched_softirq() to call > > rcu_note_context_switch(), but only about once per 256 invocations. > > This ratio was chosen in keeping with the relative time constants of > > RCU grace periods. > > > > Signed-off-by: Paul E. McKenney > > > > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h > > index 3cea28c64ebe..8d64878111ea 100644 > > --- a/include/linux/rcupdate.h > > +++ b/include/linux/rcupdate.h > > @@ -44,6 +44,7 @@ > > #include > > #include > > #include > > +#include > > #include > > > > extern int rcu_expedited; /* for sysctl */ > > @@ -287,6 +288,41 @@ bool __rcu_is_watching(void); > > #endif /* #if defined(CONFIG_DEBUG_LOCK_ALLOC) || > > defined(CONFIG_RCU_TRACE) || defined(CONFIG_SMP) */ > > > > /* > > + * Hooks for cond_resched() and friends to avoid RCU CPU stall warnings. > > + */ > > + > > +#define RCU_COND_RESCHED_LIM 256 /* ms vs. 100s of ms. */ > > +DECLARE_PER_CPU(int, rcu_cond_resched_count); > > +void rcu_resched(void); > > + > > +/* > > + * Is it time to report RCU quiescent states? > > + * > > + * Note unsynchronized access to rcu_cond_resched_count. Yes, we might > > + * increment some random CPU's count, and possibly also load the result > > from > > + * yet another CPU's count. 
We might even clobber some other CPU's attempt > > + * to zero its counter. This is all OK because the goal is not precision, > > + * but rather reasonable amortization of rcu_note_context_switch() overhead > > + * and extremely high probability of avoiding RCU CPU stall warnings. > > + * Note that this function has to be preempted in just the wrong place, > > + * many thousands of times in a row, for anything bad to happen. > > + */ > > +static inline bool rcu_should_resched(void) > > +{ > > + return __this_cpu_inc_return(rcu_cond_resched_count) >= > > + RCU_COND_RESCHED_LIM; > > +} > > + > > +/* > > + * Report quiscent states to RCU if it is time to do so. > > + */ > > +static inline void rcu_cond_resched(void) > > +{ > > + if (unlikely(rcu_should_resched())) > > + rcu_resched(); > > +} > > + > > +/* > > * Infrastructure to implement the synchronize_() primitives in > > * TREE_RCU and rcu_barrier_() primitives in TINY_RCU. > > */ > > diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c > > index 4c0a9b0af469..ed7a0d72562c 100644 > > --- a/kernel/rcu/update.c > > +++ b/kernel/rcu/update.c > > @@ -338,3 +338,21 @@ static int __init check_cpu_stall_init(void) > > early_initcall(check_cpu_stall_init); > > > > #endif /* #ifdef CONFIG_RCU_STALL_COMMON */ > > + > > +/* > > + * Hooks for cond_resched() and friends to avoid RCU CPU stall warnings. > > + */ > > + > > +DEFINE_PER_CPU(int, rcu_cond_resched_count); > > + > > +/* > > + * Report a set of RCU quiescent states, for use by cond_resched() > > + * and friends. Out of line due to being called infrequently. > > + */ > > +void rcu_resched(void) > > +{ > > + preempt_disable(); > > + __this_cpu_write(rcu_cond_resched_count, 0); > > Somehow I've got the urge to write: > > __this_cpu_add(rcu_cond_resched_count, -RCU_COND_RESCHED_LIM); > > But the immediate write is saver, and I suppose it doesn't matter at > all. 
My concern is that an unfortunate series of preemptions might cause a victim CPU's counter to be driven to a large negative number, which could then result in RCU CPU stall warnings given a subsequent time with no schedules. In contrast, the series of preemptions that results in large positive numbers produces context switches during that time, which keeps the stall warnings at bay. > > + rcu_note_context_switch(smp_processor_id()); > > + preempt_enable(); > > +} > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > > index b46131ef6aab..1f53e8871dd2 100644 > > --- a/kernel/sched/core.c > > +++ b/kernel/sched/core.c > > @@ -4071,6 +4071,7 @@ static void __cond_resched(void) > > > > int __sched _cond_resched(void) > > { > > + rcu_cond_resched(); > > if (should_resched()) { > > __cond_resched(); > > return 1; > > @@ -4089,6 +4090,7 @@ EXPORT_SYMBOL(_cond_resched); > > */ > > int __cond_resched_lock(spinlock_t *lock) > > { > > + bool need_rcu_resched = rcu_should_resched(); > > int resched = should_resched()
Re: cond_resched() and RCU CPU stall warnings
> sched,rcu: Make cond_resched() report RCU quiescent states > > Given a CPU running a loop containing cond_resched(), with no > other tasks runnable on that CPU, RCU will eventually report RCU > CPU stall warnings due to lack of quiescent states. Fortunately, > every call to cond_resched() is a perfectly good quiescent state. > Unfortunately, invoking rcu_note_context_switch() is a bit heavyweight > for cond_resched(), especially given the need to disable preemption, > and, for RCU-preempt, interrupts as well. > > This commit therefore maintains a per-CPU counter that causes > cond_resched(), cond_resched_lock(), and cond_resched_softirq() to call > rcu_note_context_switch(), but only about once per 256 invocations. > This ratio was chosen in keeping with the relative time constants of > RCU grace periods. > > Signed-off-by: Paul E. McKenney > > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h > index 3cea28c64ebe..8d64878111ea 100644 > --- a/include/linux/rcupdate.h > +++ b/include/linux/rcupdate.h > @@ -44,6 +44,7 @@ > #include > #include > #include > +#include > #include > > extern int rcu_expedited; /* for sysctl */ > @@ -287,6 +288,41 @@ bool __rcu_is_watching(void); > #endif /* #if defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_RCU_TRACE) > || defined(CONFIG_SMP) */ > > /* > + * Hooks for cond_resched() and friends to avoid RCU CPU stall warnings. > + */ > + > +#define RCU_COND_RESCHED_LIM 256 /* ms vs. 100s of ms. */ > +DECLARE_PER_CPU(int, rcu_cond_resched_count); > +void rcu_resched(void); > + > +/* > + * Is it time to report RCU quiescent states? > + * > + * Note unsynchronized access to rcu_cond_resched_count. Yes, we might > + * increment some random CPU's count, and possibly also load the result from > + * yet another CPU's count. We might even clobber some other CPU's attempt > + * to zero its counter. 
This is all OK because the goal is not precision, > + * but rather reasonable amortization of rcu_note_context_switch() overhead > + * and extremely high probability of avoiding RCU CPU stall warnings. > + * Note that this function has to be preempted in just the wrong place, > + * many thousands of times in a row, for anything bad to happen. > + */ > +static inline bool rcu_should_resched(void) > +{ > + return __this_cpu_inc_return(rcu_cond_resched_count) >= > +RCU_COND_RESCHED_LIM; > +} > + > +/* > + * Report quiscent states to RCU if it is time to do so. > + */ > +static inline void rcu_cond_resched(void) > +{ > + if (unlikely(rcu_should_resched())) > + rcu_resched(); > +} > + > +/* > * Infrastructure to implement the synchronize_() primitives in > * TREE_RCU and rcu_barrier_() primitives in TINY_RCU. > */ > diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c > index 4c0a9b0af469..ed7a0d72562c 100644 > --- a/kernel/rcu/update.c > +++ b/kernel/rcu/update.c > @@ -338,3 +338,21 @@ static int __init check_cpu_stall_init(void) > early_initcall(check_cpu_stall_init); > > #endif /* #ifdef CONFIG_RCU_STALL_COMMON */ > + > +/* > + * Hooks for cond_resched() and friends to avoid RCU CPU stall warnings. > + */ > + > +DEFINE_PER_CPU(int, rcu_cond_resched_count); > + > +/* > + * Report a set of RCU quiescent states, for use by cond_resched() > + * and friends. Out of line due to being called infrequently. > + */ > +void rcu_resched(void) > +{ > + preempt_disable(); > + __this_cpu_write(rcu_cond_resched_count, 0); Somehow I've got the urge to write: __this_cpu_add(rcu_cond_resched_count, -RCU_COND_RESCHED_LIM); But the immediate write is safer, and I suppose it doesn't matter at all.
> + rcu_note_context_switch(smp_processor_id()); > + preempt_enable(); > +} > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > index b46131ef6aab..1f53e8871dd2 100644 > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -4071,6 +4071,7 @@ static void __cond_resched(void) > > int __sched _cond_resched(void) > { > + rcu_cond_resched(); > if (should_resched()) { > __cond_resched(); > return 1; > @@ -4089,6 +4090,7 @@ EXPORT_SYMBOL(_cond_resched); > */ > int __cond_resched_lock(spinlock_t *lock) > { > + bool need_rcu_resched = rcu_should_resched(); > int resched = should_resched(); > int ret = 0; > I suppose you also need: - if (spin_needbreak(lock) || resched) + if (spin_needbreak(lock) || resched || need_rcu_resched) > @@ -4098,6 +4100,8 @@ int __cond_resched_lock(spinlock_t *lock) > spin_unlock(lock); > if (resched) > __cond_resched(); > + else if (unlikely(need_rcu_resched)) > + rcu_resched(); > else > cpu_relax(); > ret = 1; > @@ -4111,6 +4115,7 @@ int __sched __cond_resched_softirq(void) > { > BUG_ON(!in_softirq()); > > + rcu_cond_resched(); /* BH disabled OK, just recor
Re: cond_resched() and RCU CPU stall warnings
On Tue, Mar 18, 2014 at 09:51:18AM +0100, Peter Zijlstra wrote: > On Mon, Mar 17, 2014 at 07:17:06PM -0700, Paul E. McKenney wrote: > > On Mon, Mar 17, 2014 at 06:14:34PM +0100, Peter Zijlstra wrote: > > > On Mon, Mar 17, 2014 at 09:58:07AM -0700, Paul E. McKenney wrote: > > > > @@ -4111,6 +4115,7 @@ int __sched __cond_resched_softirq(void) > > > > { > > > > BUG_ON(!in_softirq()); > > > > > > > > + rcu_cond_resched(); > > > > > > Don't you have to enable BHs before doing that? And if not; that needs a > > > comment for sure! :-) > > > > No need to enable BHs, just RCU marking quiescent states. But yes, > > the name does look a bit scary in that context, doesn't it? Added > > a comment, please see below for updated patch. > > The below seemed like an unrelated patch entirely :-) Well, it certainly avoids issues with BH... > > > > > > torture: Better summary diagnostics for build failures > > > > The reaction of kvm-recheck.sh is obscure at best, and easy to miss > > completely. This commit therefore prints "BUG: Build failed" in the > > summary at the end of a run. > > > > Signed-off-by: Paul E. McKenney How about this one instead? ;-) Thanx, Paul sched,rcu: Make cond_resched() report RCU quiescent states Given a CPU running a loop containing cond_resched(), with no other tasks runnable on that CPU, RCU will eventually report RCU CPU stall warnings due to lack of quiescent states. Fortunately, every call to cond_resched() is a perfectly good quiescent state. Unfortunately, invoking rcu_note_context_switch() is a bit heavyweight for cond_resched(), especially given the need to disable preemption, and, for RCU-preempt, interrupts as well. This commit therefore maintains a per-CPU counter that causes cond_resched(), cond_resched_lock(), and cond_resched_softirq() to call rcu_note_context_switch(), but only about once per 256 invocations. This ratio was chosen in keeping with the relative time constants of RCU grace periods. Signed-off-by: Paul E. 
McKenney diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h index 3cea28c64ebe..8d64878111ea 100644 --- a/include/linux/rcupdate.h +++ b/include/linux/rcupdate.h @@ -44,6 +44,7 @@ #include #include #include +#include #include extern int rcu_expedited; /* for sysctl */ @@ -287,6 +288,41 @@ bool __rcu_is_watching(void); #endif /* #if defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_RCU_TRACE) || defined(CONFIG_SMP) */ /* + * Hooks for cond_resched() and friends to avoid RCU CPU stall warnings. + */ + +#define RCU_COND_RESCHED_LIM 256 /* ms vs. 100s of ms. */ +DECLARE_PER_CPU(int, rcu_cond_resched_count); +void rcu_resched(void); + +/* + * Is it time to report RCU quiescent states? + * + * Note unsynchronized access to rcu_cond_resched_count. Yes, we might + * increment some random CPU's count, and possibly also load the result from + * yet another CPU's count. We might even clobber some other CPU's attempt + * to zero its counter. This is all OK because the goal is not precision, + * but rather reasonable amortization of rcu_note_context_switch() overhead + * and extremely high probability of avoiding RCU CPU stall warnings. + * Note that this function has to be preempted in just the wrong place, + * many thousands of times in a row, for anything bad to happen. + */ +static inline bool rcu_should_resched(void) +{ + return __this_cpu_inc_return(rcu_cond_resched_count) >= + RCU_COND_RESCHED_LIM; +} + +/* + * Report quiescent states to RCU if it is time to do so. + */ +static inline void rcu_cond_resched(void) +{ + if (unlikely(rcu_should_resched())) + rcu_resched(); +} + +/* * Infrastructure to implement the synchronize_() primitives in * TREE_RCU and rcu_barrier_() primitives in TINY_RCU.
*/ diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c index 4c0a9b0af469..ed7a0d72562c 100644 --- a/kernel/rcu/update.c +++ b/kernel/rcu/update.c @@ -338,3 +338,21 @@ static int __init check_cpu_stall_init(void) early_initcall(check_cpu_stall_init); #endif /* #ifdef CONFIG_RCU_STALL_COMMON */ + +/* + * Hooks for cond_resched() and friends to avoid RCU CPU stall warnings. + */ + +DEFINE_PER_CPU(int, rcu_cond_resched_count); + +/* + * Report a set of RCU quiescent states, for use by cond_resched() + * and friends. Out of line due to being called infrequently. + */ +void rcu_resched(void) +{ + preempt_disable(); + __this_cpu_write(rcu_cond_resched_count, 0); + rcu_note_context_switch(smp_processor_id()); + preempt_enable(); +} diff --git a/kernel/sched/core.c b/kernel/sched/core.c index b46131ef6aab..1f53e8871dd2 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -4071,6 +4071,7 @@ static void __cond_resched(void) int __s
Re: cond_resched() and RCU CPU stall warnings
On Mon, Mar 17, 2014 at 07:17:06PM -0700, Paul E. McKenney wrote: > On Mon, Mar 17, 2014 at 06:14:34PM +0100, Peter Zijlstra wrote: > > On Mon, Mar 17, 2014 at 09:58:07AM -0700, Paul E. McKenney wrote: > > > @@ -4111,6 +4115,7 @@ int __sched __cond_resched_softirq(void) > > > { > > > BUG_ON(!in_softirq()); > > > > > > + rcu_cond_resched(); > > > > Don't you have to enable BHs before doing that? And if not; that needs a > > comment for sure! :-) > > No need to enable BHs, just RCU marking quiescent states. But yes, > the name does look a bit scary in that context, doesn't it? Added > a comment, please see below for updated patch. The below seemed like an unrelated patch entirely :-) > > > torture: Better summary diagnostics for build failures > > The reaction of kvm-recheck.sh is obscure at best, and easy to miss > completely. This commit therefore prints "BUG: Build failed" in the > summary at the end of a run. > > Signed-off-by: Paul E. McKenney -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: cond_resched() and RCU CPU stall warnings
On Mon, Mar 17, 2014 at 06:14:34PM +0100, Peter Zijlstra wrote: > On Mon, Mar 17, 2014 at 09:58:07AM -0700, Paul E. McKenney wrote: > > @@ -4111,6 +4115,7 @@ int __sched __cond_resched_softirq(void) > > { > > BUG_ON(!in_softirq()); > > > > + rcu_cond_resched(); > > Don't you have to enable BHs before doing that? And if not; that needs a > comment for sure! :-) No need to enable BHs, just RCU marking quiescent states. But yes, the name does look a bit scary in that context, doesn't it? Added a comment, please see below for updated patch. > > if (should_resched()) { > > local_bh_enable(); > > __cond_resched(); > > torture: Better summary diagnostics for build failures The reaction of kvm-recheck.sh is obscure at best, and easy to miss completely. This commit therefore prints "BUG: Build failed" in the summary at the end of a run. Signed-off-by: Paul E. McKenney diff --git a/tools/testing/selftests/rcutorture/bin/kvm-recheck-lock.sh b/tools/testing/selftests/rcutorture/bin/kvm-recheck-lock.sh index 829186e19eb1..7f1ff1a8fc4b 100755 --- a/tools/testing/selftests/rcutorture/bin/kvm-recheck-lock.sh +++ b/tools/testing/selftests/rcutorture/bin/kvm-recheck-lock.sh @@ -35,7 +35,7 @@ configfile=`echo $i | sed -e 's/^.*\///'` ncs=`grep "Writes: Total:" $i/console.log 2> /dev/null | tail -1 | sed -e 's/^.* Total: //' -e 's/ .*$//'` if test -z "$ncs" then - echo $configfile + echo "$configfile ---" else title="$configfile --- $ncs acquisitions/releases" dur=`sed -e 's/^.* locktorture.shutdown_secs=//' -e 's/ .*$//' < $i/qemu-cmd 2> /dev/null` diff --git a/tools/testing/selftests/rcutorture/bin/kvm-recheck-rcu.sh b/tools/testing/selftests/rcutorture/bin/kvm-recheck-rcu.sh index d75b1dc5ae53..307c4b95f325 100755 --- a/tools/testing/selftests/rcutorture/bin/kvm-recheck-rcu.sh +++ b/tools/testing/selftests/rcutorture/bin/kvm-recheck-rcu.sh @@ -35,7 +35,7 @@ configfile=`echo $i | sed -e 's/^.*\///'` ngps=`grep ver: $i/console.log 2> /dev/null | tail -1 | sed -e 's/^.* ver: //' -e 
's/ .*$//'` if test -z "$ngps" then - echo $configfile + echo "$configfile ---" else title="$configfile --- $ngps grace periods" dur=`sed -e 's/^.* rcutorture.shutdown_secs=//' -e 's/ .*$//' < $i/qemu-cmd 2> /dev/null` diff --git a/tools/testing/selftests/rcutorture/bin/kvm-recheck.sh b/tools/testing/selftests/rcutorture/bin/kvm-recheck.sh index 26d78b7eaccf..ee1f6cae3d70 100755 --- a/tools/testing/selftests/rcutorture/bin/kvm-recheck.sh +++ b/tools/testing/selftests/rcutorture/bin/kvm-recheck.sh @@ -25,6 +25,7 @@ # Authors: Paul E. McKenney PATH=`pwd`/tools/testing/selftests/rcutorture/bin:$PATH; export PATH +. tools/testing/selftests/rcutorture/bin/functions.sh for rd in "$@" do firsttime=1 @@ -39,13 +40,24 @@ do fi TORTURE_SUITE="`cat $i/../TORTURE_SUITE`" kvm-recheck-${TORTURE_SUITE}.sh $i - configcheck.sh $i/.config $i/ConfigFragment - parse-build.sh $i/Make.out $configfile - parse-torture.sh $i/console.log $configfile - parse-console.sh $i/console.log $configfile - if test -r $i/Warnings + if test -f "$i/console.log" then - cat $i/Warnings + configcheck.sh $i/.config $i/ConfigFragment + parse-build.sh $i/Make.out $configfile + parse-torture.sh $i/console.log $configfile + parse-console.sh $i/console.log $configfile + if test -r $i/Warnings + then + cat $i/Warnings + fi + else + if test -f "$i/qemu-cmd" + then + print_bug qemu failed + else + print_bug Build failed + fi + echo " $i" fi done done diff --git a/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh b/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh index 7a95f86cc85a..51c34a91a375 100755 --- a/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh +++ b/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh @@ -106,6 +106,7 @@ then fi else cp $builddir/Make*.out $resdir + cp $builddir/.config $resdir || : echo Build failed, not running KVM, see $resdir. 
if test -f $builddir.wait then
Re: cond_resched() and RCU CPU stall warnings
On Mon, Mar 17, 2014 at 09:58:07AM -0700, Paul E. McKenney wrote: > @@ -4111,6 +4115,7 @@ int __sched __cond_resched_softirq(void) > { > BUG_ON(!in_softirq()); > > + rcu_cond_resched(); Don't you have to enable BHs before doing that? And if not; that needs a comment for sure! :-) > if (should_resched()) { > local_bh_enable(); > __cond_resched();
Re: cond_resched() and RCU CPU stall warnings
On Mon, Mar 17, 2014 at 11:13:00AM +0100, Peter Zijlstra wrote: > On Sat, Mar 15, 2014 at 06:59:14PM -0700, Paul E. McKenney wrote: > > So I have been tightening up rcutorture a bit over the past year. > > The other day, I came across what looked like a great opportunity for > > further tightening, namely the schedule() in rcu_torture_reader(). > > Why not turn this into a cond_resched(), speeding up the readers a bit > > and placing more stress on RCU? > > > > And boy does it increase stress! > > > > Unfortunately, this increased stress sometimes shows up in the form of > > lots of RCU CPU stall warnings. These can appear when an instance of > > rcu_torture_reader() gets a CPU to itself, in which case it won't ever > > enter the scheduler, and RCU will never see a quiescent state from that > > CPU, which means the grace period never ends. > > > > So I am taking a more measured approach to cond_resched() in > > rcu_torture_reader() for the moment. > > > > But longer term, should cond_resched() imply a set of RCU > > quiescent states? One way to do this would be to add calls to > > rcu_note_context_switch() in each of the various cond_resched() functions. > > Easy change, but of course adds some overhead. On the other hand, > > there might be more than a few of the 500+ calls to cond_resched() that > > expect that RCU CPU stalls will be prevented (to say nothing of > > might_sleep() and cond_resched_lock()). > > > > Thoughts? > > I share Mike's concern. Some of those functions might be too expensive > to do in the loops where we have the cond_resched()s. And while its only > strictly required when nr_running==1, keying off off that seems > unfortunate in that it makes things behave differently with a single > running task. > > I suppose your proposed per-cpu counter is the best option; even though > its still an extra cacheline hit in cond_resched(). > > As to the other cond_resched() variants; they might be a little more > tricky, eg. 
cond_resched_lock() would have you drop the lock in order to > note the QS, etc. > > So one thing that might make sense is to have something like > rcu_should_qs() which will indicate RCUs need for a grace period end. > Then we can augment the various should_resched()/spin_needbreak() etc. > with that condition. > > That also gets rid of the counter (or at least hides it in the > implementation if RCU really can't do anything better). I did code up a version having a per-CPU bitmask indicating which flavors of RCU needed quiescent states, and only invoking rcu_note_context_switch() if at least one of the flavors needed a quiescent state. This implementation ended up being more complex, but worse, slowed down the fast path. Hard to beat __this_cpu_inc_return()... Might be able to break even with a bit more work, but on most real systems there is pretty much always a grace period in flight anyway, so it does not appear to be worth it. So how about the following? Passes moderate rcutorture testing, no RCU CPU stall warnings despite cond_resched() in rcu_torture_reader(). Thanx, Paul sched,rcu: Make cond_resched() report RCU quiescent states Given a CPU running a loop containing cond_resched(), with no other tasks runnable on that CPU, RCU will eventually report RCU CPU stall warnings due to lack of quiescent states. Fortunately, every call to cond_resched() is a perfectly good quiescent state. Unfortunately, invoking rcu_note_context_switch() is a bit heavyweight for cond_resched(), especially given the need to disable preemption, and, for RCU-preempt, interrupts as well. This commit therefore maintains a per-CPU counter that causes cond_resched(), cond_resched_lock(), and cond_resched_softirq() to call rcu_note_context_switch(), but only about once per 256 invocations. This ratio was chosen in keeping with the relative time constants of RCU grace periods. Signed-off-by: Paul E. 
McKenney diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h index 3cea28c64ebe..8d64878111ea 100644 --- a/include/linux/rcupdate.h +++ b/include/linux/rcupdate.h @@ -44,6 +44,7 @@ #include #include #include +#include #include extern int rcu_expedited; /* for sysctl */ @@ -287,6 +288,41 @@ bool __rcu_is_watching(void); #endif /* #if defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_RCU_TRACE) || defined(CONFIG_SMP) */ /* + * Hooks for cond_resched() and friends to avoid RCU CPU stall warnings. + */ + +#define RCU_COND_RESCHED_LIM 256 /* ms vs. 100s of ms. */ +DECLARE_PER_CPU(int, rcu_cond_resched_count); +void rcu_resched(void); + +/* + * Is it time to report RCU quiescent states? + * + * Note unsynchronized access to rcu_cond_resched_count. Yes, we might + * increment some random CPU's count, and possibly also load the result from + * yet another CPU's count. We might even clobber some other CPU's attempt + * t
Re: cond_resched() and RCU CPU stall warnings
On Sat, Mar 15, 2014 at 06:59:14PM -0700, Paul E. McKenney wrote: > So I have been tightening up rcutorture a bit over the past year. > The other day, I came across what looked like a great opportunity for > further tightening, namely the schedule() in rcu_torture_reader(). > Why not turn this into a cond_resched(), speeding up the readers a bit > and placing more stress on RCU? > > And boy does it increase stress! > > Unfortunately, this increased stress sometimes shows up in the form of > lots of RCU CPU stall warnings. These can appear when an instance of > rcu_torture_reader() gets a CPU to itself, in which case it won't ever > enter the scheduler, and RCU will never see a quiescent state from that > CPU, which means the grace period never ends. > > So I am taking a more measured approach to cond_resched() in > rcu_torture_reader() for the moment. > > But longer term, should cond_resched() imply a set of RCU > quiescent states? One way to do this would be to add calls to > rcu_note_context_switch() in each of the various cond_resched() functions. > Easy change, but of course adds some overhead. On the other hand, > there might be more than a few of the 500+ calls to cond_resched() that > expect that RCU CPU stalls will be prevented (to say nothing of > might_sleep() and cond_resched_lock()). > > Thoughts? I share Mike's concern. Some of those functions might be too expensive to do in the loops where we have the cond_resched()s. And while it's only strictly required when nr_running==1, keying off of that seems unfortunate in that it makes things behave differently with a single running task. I suppose your proposed per-cpu counter is the best option; even though it's still an extra cacheline hit in cond_resched(). As to the other cond_resched() variants; they might be a little more tricky, eg. cond_resched_lock() would have you drop the lock in order to note the QS, etc.
So one thing that might make sense is to have something like rcu_should_qs() which will indicate RCU's need for a grace period end. Then we can augment the various should_resched()/spin_needbreak() etc. with that condition. That also gets rid of the counter (or at least hides it in the implementation if RCU really can't do anything better).
Re: cond_resched() and RCU CPU stall warnings
On Sat, 2014-03-15 at 23:25 -0700, Paul E. McKenney wrote: > On Sun, Mar 16, 2014 at 07:09:42AM +0100, Mike Galbraith wrote: > > Hm. Since you only care about the case where your task is solo, how > > about do racy checks, 100% accuracy isn't required is it? Seems you > > wouldn't want to unconditionally do that in tight loops. > > And indeed, my current workaround unconditionally does schedule() one > out of 256 loops. I would do something similar here, perhaps based > on per-CPU counters, perhaps even with racy accesses to avoid always > doing preempt_disable()/preempt_enable(). > > Or did you have something else in mind? Exactly what I meant, take a racy peek or two first. -Mike
Re: cond_resched() and RCU CPU stall warnings
On Sun, Mar 16, 2014 at 07:14:15AM +0100, Mike Galbraith wrote: > On Sun, 2014-03-16 at 07:09 +0100, Mike Galbraith wrote: > > On Sat, 2014-03-15 at 18:59 -0700, Paul E. McKenney wrote: > > > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > > > index b46131ef6aab..994d2b0fd0b2 100644 > > > --- a/kernel/sched/core.c > > > +++ b/kernel/sched/core.c > > > @@ -4075,6 +4075,9 @@ int __sched _cond_resched(void) > > > __cond_resched(); > > > return 1; > > > } > > > + preempt_disable(); > > > + rcu_note_context_switch(smp_processor_id()); > > > + preempt_enable(); > > > return 0; > > > } > > > EXPORT_SYMBOL(_cond_resched); > > > > Hm. Since you only care about the case where your task is solo, how > > about do racy checks, 100% accuracy isn't required is it? Seems you > > wouldn't want to unconditionally do that in tight loops. > > (or do that in scheduler_tick()?) There are checks in scheduler_tick(), but they have to assume that if they interrupt kernel execution, they might have also interrupted an RCU read-side critical section. And in fact, the point of having cond_resched() (sometimes) do a quiescent state is to communicate with the existing checks invoked from scheduler_tick(). Thanx, Paul
Re: cond_resched() and RCU CPU stall warnings
On Sun, Mar 16, 2014 at 07:09:42AM +0100, Mike Galbraith wrote:
> On Sat, 2014-03-15 at 18:59 -0700, Paul E. McKenney wrote:
> > So I have been tightening up rcutorture a bit over the past year.
> > The other day, I came across what looked like a great opportunity for
> > further tightening, namely the schedule() in rcu_torture_reader().
> > Why not turn this into a cond_resched(), speeding up the readers a bit
> > and placing more stress on RCU?
> >
> > And boy does it increase stress!
> >
> > Unfortunately, this increased stress sometimes shows up in the form of
> > lots of RCU CPU stall warnings.  These can appear when an instance of
> > rcu_torture_reader() gets a CPU to itself, in which case it won't ever
> > enter the scheduler, and RCU will never see a quiescent state from that
> > CPU, which means the grace period never ends.
> >
> > So I am taking a more measured approach to cond_resched() in
> > rcu_torture_reader() for the moment.
> >
> > But longer term, should cond_resched() imply a set of RCU
> > quiescent states?  One way to do this would be to add calls to
> > rcu_note_context_switch() in each of the various cond_resched() functions.
> > Easy change, but of course adds some overhead.  On the other hand,
> > there might be more than a few of the 500+ calls to cond_resched() that
> > expect that RCU CPU stalls will be prevented (to say nothing of
> > might_sleep() and cond_resched_lock()).
> >
> > Thoughts?
> >
> > (Untested patch below, FWIW.)
> >
> > 							Thanx, Paul
> >
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index b46131ef6aab..994d2b0fd0b2 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -4075,6 +4075,9 @@ int __sched _cond_resched(void)
> >  		__cond_resched();
> >  		return 1;
> >  	}
> > +	preempt_disable();
> > +	rcu_note_context_switch(smp_processor_id());
> > +	preempt_enable();
> >  	return 0;
> >  }
> >  EXPORT_SYMBOL(_cond_resched);
>
> Hm.  Since you only care about the case where your task is solo, how
> about do racy checks, 100% accuracy isn't required is it?  Seems you
> wouldn't want to unconditionally do that in tight loops.

And indeed, my current workaround unconditionally does schedule() one
out of 256 loops.  I would do something similar here, perhaps based on
per-CPU counters, perhaps even with racy accesses to avoid always doing
preempt_disable()/preempt_enable().

Or did you have something else in mind?

							Thanx, Paul
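The racy per-CPU counter idea can be sketched in user space.  This is an illustrative model, not the kernel implementation: cond_resched_rcu_check() and quiescent_reports are hypothetical names, the counter stands in for a per-CPU variable, and the constant follows the RCU_COND_RESCHED_LIM value of 256 from the eventual patch.  The point of the sketch is that the counter needs no locking: a lost increment, or an increment landing on another CPU's counter, merely delays the quiescent-state report slightly, which is harmless relative to grace-period time scales.

```c
#include <assert.h>
#include <stdbool.h>

#define RCU_COND_RESCHED_LIM 256	/* report roughly once per 256 calls */

static int rcu_cond_resched_count;	/* stands in for a per-CPU counter */
static int quiescent_reports;		/* counts the heavyweight reports */

/*
 * Model of the racy fastpath: bump the counter with no synchronization,
 * and take the heavyweight path only about once per 256 invocations.
 */
static bool cond_resched_rcu_check(void)
{
	if (++rcu_cond_resched_count < RCU_COND_RESCHED_LIM)
		return false;		/* common case: do nothing extra */
	rcu_cond_resched_count = 0;
	quiescent_reports++;		/* stands in for rcu_note_context_switch() */
	return true;
}
```

Under this model, 1024 back-to-back calls produce exactly four reports, so a tight loop containing the check pays the heavyweight cost only about 0.4% of the time.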
Re: cond_resched() and RCU CPU stall warnings
On Sun, 2014-03-16 at 07:09 +0100, Mike Galbraith wrote:
> On Sat, 2014-03-15 at 18:59 -0700, Paul E. McKenney wrote:
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index b46131ef6aab..994d2b0fd0b2 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -4075,6 +4075,9 @@ int __sched _cond_resched(void)
> >  		__cond_resched();
> >  		return 1;
> >  	}
> > +	preempt_disable();
> > +	rcu_note_context_switch(smp_processor_id());
> > +	preempt_enable();
> >  	return 0;
> >  }
> >  EXPORT_SYMBOL(_cond_resched);
>
> Hm.  Since you only care about the case where your task is solo, how
> about do racy checks, 100% accuracy isn't required is it?  Seems you
> wouldn't want to unconditionally do that in tight loops.

(or do that in scheduler_tick()?)
Re: cond_resched() and RCU CPU stall warnings
On Sat, 2014-03-15 at 18:59 -0700, Paul E. McKenney wrote:
> So I have been tightening up rcutorture a bit over the past year.
> The other day, I came across what looked like a great opportunity for
> further tightening, namely the schedule() in rcu_torture_reader().
> Why not turn this into a cond_resched(), speeding up the readers a bit
> and placing more stress on RCU?
>
> And boy does it increase stress!
>
> Unfortunately, this increased stress sometimes shows up in the form of
> lots of RCU CPU stall warnings.  These can appear when an instance of
> rcu_torture_reader() gets a CPU to itself, in which case it won't ever
> enter the scheduler, and RCU will never see a quiescent state from that
> CPU, which means the grace period never ends.
>
> So I am taking a more measured approach to cond_resched() in
> rcu_torture_reader() for the moment.
>
> But longer term, should cond_resched() imply a set of RCU
> quiescent states?  One way to do this would be to add calls to
> rcu_note_context_switch() in each of the various cond_resched() functions.
> Easy change, but of course adds some overhead.  On the other hand,
> there might be more than a few of the 500+ calls to cond_resched() that
> expect that RCU CPU stalls will be prevented (to say nothing of
> might_sleep() and cond_resched_lock()).
>
> Thoughts?
>
> (Untested patch below, FWIW.)
>
> 							Thanx, Paul
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index b46131ef6aab..994d2b0fd0b2 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4075,6 +4075,9 @@ int __sched _cond_resched(void)
>  		__cond_resched();
>  		return 1;
>  	}
> +	preempt_disable();
> +	rcu_note_context_switch(smp_processor_id());
> +	preempt_enable();
>  	return 0;
>  }
>  EXPORT_SYMBOL(_cond_resched);

Hm.  Since you only care about the case where your task is solo, how
about do racy checks, 100% accuracy isn't required is it?  Seems you
wouldn't want to unconditionally do that in tight loops.
	-Mike
cond_resched() and RCU CPU stall warnings
So I have been tightening up rcutorture a bit over the past year.
The other day, I came across what looked like a great opportunity for
further tightening, namely the schedule() in rcu_torture_reader().
Why not turn this into a cond_resched(), speeding up the readers a bit
and placing more stress on RCU?

And boy does it increase stress!

Unfortunately, this increased stress sometimes shows up in the form of
lots of RCU CPU stall warnings.  These can appear when an instance of
rcu_torture_reader() gets a CPU to itself, in which case it won't ever
enter the scheduler, and RCU will never see a quiescent state from that
CPU, which means the grace period never ends.

So I am taking a more measured approach to cond_resched() in
rcu_torture_reader() for the moment.

But longer term, should cond_resched() imply a set of RCU
quiescent states?  One way to do this would be to add calls to
rcu_note_context_switch() in each of the various cond_resched() functions.
Easy change, but of course adds some overhead.  On the other hand,
there might be more than a few of the 500+ calls to cond_resched() that
expect that RCU CPU stalls will be prevented (to say nothing of
might_sleep() and cond_resched_lock()).

Thoughts?

(Untested patch below, FWIW.)

							Thanx, Paul

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b46131ef6aab..994d2b0fd0b2 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4075,6 +4075,9 @@ int __sched _cond_resched(void)
 		__cond_resched();
 		return 1;
 	}
+	preempt_disable();
+	rcu_note_context_switch(smp_processor_id());
+	preempt_enable();
 	return 0;
 }
 EXPORT_SYMBOL(_cond_resched);
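The "more measured approach" for the reader loop can be modeled in user space.  This sketch is illustrative only (torture_reader_pass() and the two counters are hypothetical names, not the rcutorture source): the reader mostly stays on its CPU, but forces a real schedule() once every 256 passes so that RCU is guaranteed to see a quiescent state even when the reader has the CPU entirely to itself.

```c
#include <assert.h>

static unsigned long cond_resched_calls;
static unsigned long schedule_calls;

/*
 * One pass of a hypothetical reader loop: cond_resched() keeps the
 * reader hot on an otherwise-idle CPU, while the unconditional
 * schedule() on every 256th pass provides the quiescent state that
 * prevents the grace period from stalling.
 */
static void torture_reader_pass(unsigned long loops)
{
	if ((loops & 0xff) == 0)
		schedule_calls++;	/* stands in for schedule() */
	else
		cond_resched_calls++;	/* stands in for cond_resched() */
}
```

Masking the loop counter with 0xff is the usual cheap way to express "once per 256": no division, and the wrap is automatic.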