From: Frederic Weisbecker <frede...@kernel.org>

When a NOCB CPU fails to create a nocb kthread on bringup, the CPU is
then deoffloaded. The barrier mutex is locked at this stage. It is
typically used to protect against concurrent (de-)offloading and/or
concurrent rcu_barrier() that would otherwise risk a nocb locking
imbalance. However:

* rcu_barrier() can't run concurrently if the CPU being brought up is
  the boot CPU during early boot.

* rcu_barrier() can run concurrently if the CPU being brought up is a
  secondary CPU, but it is then expected to see 0 callbacks on this
  target because that CPU is booting for the first time.

* (de-)offloading can't happen concurrently with smp_init(), because
  rcutorture is initialized later (no earlier than device_initcall())
  and userspace isn't available yet.

* (de-)offloading can't happen concurrently with cpu_up(), courtesy of
  cpu_hotplug_lock.

But:

* The lazy shrinker might run concurrently with cpu_up(). Since lazy_len
  is supposed to be 0 on this CPU, the shrinker shouldn't even try to
  grab its nocb_lock and risk an imbalance, but be extra cautious anyway
  (see the sketch below).

* Also be cautious about potential subtleties around resume from
  hibernation.

So keep the locking and add some assertions and comments.
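
For illustration, here is a minimal sketch of the shrinker's scan side,
assuming it behaves roughly like the in-tree lazy shrinker; the function
name lazy_shrink_scan_sketch is made up for this example, and this is
not the upstream code. It shows why a freshly failed CPU (lazy_len == 0)
should be skipped before nocb_lock is taken, and why holding
barrier_mutex in the error path still serializes against the shrinker
via its trylock:

	/* Sketch only: simplified view of the lazy shrinker's scan side. */
	static unsigned long lazy_shrink_scan_sketch(void)
	{
		int cpu;
		unsigned long count = 0;

		/* Bail out if (de-)offloading is in progress. */
		if (!mutex_trylock(&rcu_state.barrier_mutex))
			return 0;

		for_each_cpu(cpu, rcu_nocb_mask) {
			struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
			unsigned long flags;

			/* A freshly failed CPU is expected to have lazy_len == 0. */
			if (!READ_ONCE(rdp->lazy_len))
				continue;

			/* Only reached for CPUs with lazy callbacks queued. */
			rcu_nocb_lock_irqsave(rdp, flags);
			count += READ_ONCE(rdp->lazy_len);
			rcu_nocb_unlock_irqrestore(rdp, flags);
		}

		mutex_unlock(&rcu_state.barrier_mutex);
		return count;
	}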

Signed-off-by: Frederic Weisbecker <frede...@kernel.org>
Signed-off-by: Paul E. McKenney <paul...@kernel.org>
Reviewed-by: Paul E. McKenney <paul...@kernel.org>
Signed-off-by: Neeraj Upadhyay <neeraj.upadh...@kernel.org>
---
 kernel/rcu/tree_nocb.h | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index f4112fc663a7..fdd0616f2fd1 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -1442,7 +1442,7 @@ static void rcu_spawn_cpu_nocb_kthread(int cpu)
                                "rcuog/%d", rdp_gp->cpu);
                if (WARN_ONCE(IS_ERR(t), "%s: Could not start rcuo GP kthread, 
OOM is now expected behavior\n", __func__)) {
                        mutex_unlock(&rdp_gp->nocb_gp_kthread_mutex);
-                       goto end;
+                       goto err;
                }
                WRITE_ONCE(rdp_gp->nocb_gp_kthread, t);
                if (kthread_prio)
@@ -1454,7 +1454,7 @@ static void rcu_spawn_cpu_nocb_kthread(int cpu)
        t = kthread_create(rcu_nocb_cb_kthread, rdp,
                           "rcuo%c/%d", rcu_state.abbr, cpu);
        if (WARN_ONCE(IS_ERR(t), "%s: Could not start rcuo CB kthread, OOM is 
now expected behavior\n", __func__))
-               goto end;
+               goto err;
 
        if (rcu_rdp_is_offloaded(rdp))
                wake_up_process(t);
@@ -1467,7 +1467,15 @@ static void rcu_spawn_cpu_nocb_kthread(int cpu)
        WRITE_ONCE(rdp->nocb_cb_kthread, t);
        WRITE_ONCE(rdp->nocb_gp_kthread, rdp_gp->nocb_gp_kthread);
        return;
-end:
+
+err:
+       /*
+        * No need to protect against concurrent rcu_barrier()
+        * because the number of callbacks should be 0 for a non-boot CPU,
+        * therefore rcu_barrier() shouldn't even try to grab the nocb_lock.
+        * But hold barrier_mutex to avoid nocb_lock imbalance from shrinker.
+        */
+       WARN_ON_ONCE(system_state > SYSTEM_BOOTING && rcu_segcblist_n_cbs(&rdp->cblist));
        mutex_lock(&rcu_state.barrier_mutex);
        if (rcu_rdp_is_offloaded(rdp)) {
                rcu_nocb_rdp_deoffload(rdp);
-- 
2.40.1

