From: "Paul E. McKenney" <[email protected]>

The rcu_tasks_need_gpcb() function samples ->percpu_dequeue_lim as part
of the condition clause of a "for" loop, which is a bit confusing.  This
commit therefore hoists this sampling above the loop into a local
variable, which is then used in the loop's condition clause.

So why does this work in the face of a concurrent switch from single-CPU
queueing to per-CPU queueing?

o       The call_rcu_tasks_generic() that makes the change has already
        enqueued its callback, which means that all of the other CPUs'
        callback queues are empty.

o       For the call_rcu_tasks_generic() that first notices
        the switch to per-CPU queues, the smp_store_release()
        used to update ->percpu_enqueue_lim pairs with the
        raw_spin_trylock_rcu_node()'s full barrier that is
        between the READ_ONCE(rtp->percpu_enqueue_shift) and the
        rcu_segcblist_enqueue() that enqueues the callback.

o       Because this CPU's queue is empty (unless it happens to
        be the original single queue, in which case there is no
        need for synchronization), this call_rcu_tasks_generic()
        will do an irq_work_queue() to schedule a handler for the
        needed rcuwait_wake_up() call.  This call will be ordered
        after the first call_rcu_tasks_generic() function's change to
        ->percpu_dequeue_lim.

o       This rcuwait_wake_up() will either happen before or after the
        set_current_state() in rcuwait_wait_event().  If it happens
        before, the "condition" argument's call to rcu_tasks_need_gpcb()
        will be ordered after the original change, and all callbacks on
        all CPUs will be visible.  Otherwise, if it happens after, then
        the grace-period kthread's state will be set back to running,
        which will result in a later call to rcuwait_wait_event() and
        thus to rcu_tasks_need_gpcb(), which will again see the change.

So it all works out.
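
As a purely illustrative, hypothetical userspace C11 sketch (none of
these names exist in the kernel, and the real code spreads this ordering
across smp_store_release(), the lock's full barrier, and the
irq_work/rcuwait handshake described above), the pattern relied on is
the usual publish sequence: enqueue first, then widen the limit with a
release store, so that a single acquire load of the limit, hoisted above
the scan loop, still observes the callbacks on every queue below the
limit it read:

    /* Hypothetical analogue, not kernel code. */
    #include <stdatomic.h>
    #include <stdbool.h>

    #define NQUEUES 4

    static _Atomic int lim = 1;            /* analogue of ->percpu_dequeue_lim */
    static _Atomic bool pending[NQUEUES];  /* analogue of the per-CPU queues   */

    /* Updater: enqueue the callback, then publish the wider limit. */
    static void enqueue_and_widen(int q)
    {
            atomic_store_explicit(&pending[q], true, memory_order_relaxed);
            atomic_store_explicit(&lim, NQUEUES, memory_order_release);
    }

    /*
     * Scanner: the single acquire load before the loop pairs with the
     * release store above, so if it sees the widened limit, it also
     * sees the enqueued callback on every queue below that limit.
     */
    static bool need_work(void)
    {
            int snap = atomic_load_explicit(&lim, memory_order_acquire);
            bool need = false;
            int q;

            for (q = 0; q < snap; q++)
                    need |= atomic_load_explicit(&pending[q], memory_order_relaxed);
            return need;
    }

If the scanner instead reads the old, narrower limit, it simply skips
the new queues, which is exactly the case the irq_work/rcuwait argument
above covers: the subsequent wakeup forces another pass through
rcu_tasks_need_gpcb(), and that pass sees the widened limit.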

Suggested-by: Linus Torvalds <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
 kernel/rcu/tasks.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index 83049a893de5..94bb5abdbb37 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -432,6 +432,7 @@ static void rcu_barrier_tasks_generic(struct rcu_tasks *rtp)
 static int rcu_tasks_need_gpcb(struct rcu_tasks *rtp)
 {
        int cpu;
+       int dequeue_limit;
        unsigned long flags;
        bool gpdone = poll_state_synchronize_rcu(rtp->percpu_dequeue_gpseq);
        long n;
@@ -439,7 +440,8 @@ static int rcu_tasks_need_gpcb(struct rcu_tasks *rtp)
        long ncbsnz = 0;
        int needgpcb = 0;
 
-       for (cpu = 0; cpu < smp_load_acquire(&rtp->percpu_dequeue_lim); cpu++) {
+       dequeue_limit = smp_load_acquire(&rtp->percpu_dequeue_lim);
+       for (cpu = 0; cpu < dequeue_limit; cpu++) {
                struct rcu_tasks_percpu *rtpcp = per_cpu_ptr(rtp->rtpcpu, cpu);
 
                /* Advance and accelerate any new callbacks. */
-- 
2.34.1
