Consider the following admittedly improbable sequence of events:

o       RCU is initially idle.

o       Task A on CPU 0 executes rcu_read_lock().

o       Task B on CPU 1 executes synchronize_rcu(), which must
        wait on Task A:

        o       Task B registers the callback, which requires a new
                grace period, awakening the grace-period kthread
                on CPU 3, which immediately starts that grace period.

        o       Task B migrates to CPU 2, which provides a quiescent
                state for both CPUs 1 and 2.

        o       Both CPUs 1 and 2 take scheduling-clock interrupts,
                and both invoke the RCU_SOFTIRQ handler, thus learning
                of the new grace period.

        o       Task B is delayed, perhaps by vCPU preemption on CPU 2.

o       CPUs 2 and 3 pass through quiescent states, which are reported
        to core RCU.

o       Task B is resumed just long enough to be migrated to CPU 3,
        and then is once again delayed.

o       Task A executes rcu_read_unlock(), exiting its RCU read-side
        critical section.

o       CPU 0 passes through a quiescent state, which is reported to
        core RCU.  Only CPU 1 continues to block the grace period.

o       CPU 1 passes through a quiescent state, which is reported to
        core RCU.  This ends the grace period, and CPU 1 therefore
        invokes its callbacks, one of which awakens Task B via
        complete().

o       Task B resumes (still on CPU 3) and starts executing
        wait_for_completion(), which sees that the completion has
        already completed, and thus does not block.  It returns from
        the synchronize_rcu() without any ordering against the
        end of Task A's RCU read-side critical section.

        It can therefore mess up Task A's RCU read-side critical
        section, in theory anyway, for example as sketched below.
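
For illustration, here is a minimal sketch, not part of this patch, of
the kind of updater/reader pair whose correctness depends on that
ordering.  The names struct foo, gp, reader(), and remove_foo() are
invented for this example:

	struct foo {
		int a;
	};

	static struct foo __rcu *gp;

	/* Reader, playing the role of Task A. */
	static int reader(void)
	{
		struct foo *p;
		int ret = -1;

		rcu_read_lock();
		p = rcu_dereference(gp);
		if (p)
			ret = p->a;	/* Must not see the kfree() below. */
		rcu_read_unlock();
		return ret;
	}

	/* Updater, playing the role of Task B; assumes no concurrent updaters. */
	static void remove_foo(void)
	{
		struct foo *p;

		p = rcu_dereference_protected(gp, 1);
		rcu_assign_pointer(gp, NULL);
		/* Must be fully ordered against the reader's rcu_read_unlock()... */
		synchronize_rcu();
		/* ...otherwise this free could overlap the reader's critical section. */
		kfree(p);
	}

If the synchronize_rcu() in remove_foo() could return without being
ordered against the end of Task A's read-side critical section, the
kfree() could in effect take place while the reader was still
dereferencing the structure.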

However, if CPU hotplug ever gets rid of stop_machine(), there will be
more straightforward ways for this sort of thing to happen, so this
commit adds a memory barrier in order to enforce the needed ordering.
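
To see why a full barrier after observing the completion suffices, the
following userspace sketch may help.  It is only an analogy, again not
part of this patch, using C11 atomics and pthreads rather than the
kernel's completion machinery; gp_ended and after_reader are invented
names standing in for the completion's state and for memory accessed
before the grace period ended:

	#include <pthread.h>
	#include <stdatomic.h>
	#include <stdio.h>

	/* Stands in for the completion's state. */
	static atomic_int gp_ended;

	/* Stands in for memory touched before the grace period ended. */
	static int after_reader;

	/* Grace-period side: full ordering, then flag the completion. */
	static void *gp_side(void *arg)
	{
		after_reader = 1;
		/* Analogue of the grace-period machinery's ordering. */
		atomic_thread_fence(memory_order_seq_cst);
		atomic_store_explicit(&gp_ended, 1, memory_order_relaxed);
		return NULL;
	}

	/* synchronize_rcu() caller: observe the flag, then smp_mb(). */
	static void *waiter_side(void *arg)
	{
		/* Analogue of wait_for_completion() finding it already complete. */
		while (!atomic_load_explicit(&gp_ended, memory_order_relaxed))
			;
		/* Analogue of the smp_mb() added by this commit. */
		atomic_thread_fence(memory_order_seq_cst);
		printf("after_reader = %d\n", after_reader);	/* Always 1. */
		return NULL;
	}

	int main(void)
	{
		pthread_t gp, waiter;

		pthread_create(&gp, NULL, gp_side, NULL);
		pthread_create(&waiter, NULL, waiter_side, NULL);
		pthread_join(gp, NULL);
		pthread_join(waiter, NULL);
		return 0;
	}

Without the waiter-side fence, the final read of after_reader would not
be ordered against the read of gp_ended, which is the userspace analogue
of the problem described above.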

Signed-off-by: Paul E. McKenney <paul...@linux.vnet.ibm.com>
---
 kernel/rcu/update.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
index 5033b66d2753..9e599fcdd7bf 100644
--- a/kernel/rcu/update.c
+++ b/kernel/rcu/update.c
@@ -413,6 +413,16 @@ void __wait_rcu_gp(bool checktiny, int n, call_rcu_func_t *crcu_array,
                        wait_for_completion(&rs_array[i].completion);
                destroy_rcu_head_on_stack(&rs_array[i].head);
        }
+
+       /*
+        * If we migrated after we registered a callback, but before the
+        * corresponding wait_for_completion(), we might now be running
+        * on a CPU that has not yet noticed that the corresponding grace
+        * period has ended.  That CPU might not yet be fully ordered
+        * against the completion of the grace period, so the full memory
+        * barrier below enforces that ordering via the completion's state.
+        */
+       smp_mb(); /* ^^^ */
 }
 EXPORT_SYMBOL_GPL(__wait_rcu_gp);
 
-- 
2.5.2
