On Mon, Jul 09, 2018 at 07:50:54PM +0100, David Woodhouse wrote:
> 
> 
> On Mon, 2018-07-09 at 09:34 -0700, Paul E. McKenney wrote:
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 51919985f6cf..33b0a1ec0536 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -2496,6 +2496,10 @@ void rcu_check_callbacks(int user)
> >  {
> >  	trace_rcu_utilization(TPS("Start scheduler-tick"));
> >  	raw_cpu_inc(rcu_data.ticks_this_gp);
> > +	if (smp_load_acquire(this_cpu_ptr(&rcu_dynticks.rcu_urgent_qs)) &&
> > +	    !is_idle_task(current))
> > +		set_tsk_need_resched(current);
> 
> OK, so this will make KVM (and various other code) see that
> need_resched() is true, and they'll call cond_resched() or something
> else that might not actually schedule another task, but will at least
> end up in rcu_all_qs()...
> 
> > +	__this_cpu_write(rcu_dynticks.rcu_urgent_qs, false);
> 
> ... which bails out immediately and does nothing, because that's set to
> false?
> 
> Am I missing something?
If this is the idle task, RCU will detect that as a quiescent state via
its dyntick-idle mechanism, in which case there is no point in leaving
.rcu_urgent_qs set to true.  If this is not the idle task, the scheduler
will invoke rcu_note_context_switch(), which will in turn invoke
rcu_sched_qs(), rcu_preempt_qs(), or rcu_qs(), depending on kernel
version and configuration.  This happens independently of
.rcu_urgent_qs, so it is OK to set .rcu_urgent_qs to false, and doing so
reduces the overhead of the next cond_resched().

I may end up using rcu_is_cpu_rrupt_from_idle() instead of
is_idle_task() at some point, but the latter eases backporting.  The
only case where they differ is a long loop within an _rcuidle
tracepoint used in the idle loop, where that loop checks
need_resched().  Which currently seems to be the empty set.

And I should treat interruption of a usermode task the same as that of
an idle task.  In the PREEMPT case, ->rcu_read_lock_nesting better be
zero (lockdep would have complained), so the quiescent state will be
reported.  In the !PREEMPT case, user=1 directly causes reporting of a
quiescent state.

So here are the possible code paths when .rcu_urgent_qs is set to true:

1.	A context switch will record the quiescent state and clear
	.rcu_urgent_qs.  (The failure to do the clearing in current
	-rcu for PREEMPT builds is a performance bug that I need to
	fix.)

2.	A cond_resched() will cause rcu_all_qs() to be invoked, which
	will record a quiescent state and clear .rcu_urgent_qs.

3.	With the patch below, a scheduling-clock interrupt of a
	non-idle non-userspace task will force a reschedule, which
	will result in #1 above happening.

However, I should avoid setting .rcu_urgent_qs to false when it is
already false, shouldn't I?  So how about the following instead?  I am
doing some light testing and will let you know how that goes.
							Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 51919985f6cf..c3b688c7127a 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2496,6 +2496,15 @@ void rcu_check_callbacks(int user)
 {
 	trace_rcu_utilization(TPS("Start scheduler-tick"));
 	raw_cpu_inc(rcu_data.ticks_this_gp);
+	/* The load-acquire pairs with the store-release setting to true. */
+	if (smp_load_acquire(this_cpu_ptr(&rcu_dynticks.rcu_urgent_qs))) {
+		/* Idle already is a quiescent state. */
+		if (!is_idle_task(current) && !user) {
+			set_tsk_need_resched(current);
+			set_preempt_need_resched();
+		}
+		__this_cpu_write(rcu_dynticks.rcu_urgent_qs, false);
+	}
 	rcu_flavor_check_callbacks(user);
 	if (rcu_pending())
 		invoke_rcu_core();