Hi Frederic,

On Thu, Dec 25, 2025 at 11:20:23PM +0100, Frederic Weisbecker wrote:
> On Thu, Dec 25, 2025 at 02:44:50AM -0500, Joel Fernandes wrote:
> > The WakeOvfIsDeferred code path in __call_rcu_nocb_wake() attempts to
> > wake rcuog when the callback count exceeds qhimark and callbacks aren't
> > done with their GP (newly queued or awaiting GP). However, a lot of
> > testing proves this wake is always redundant or useless.
> >
> > In the flooding case, rcuog is always waiting for a GP to finish. So
> > waking up the rcuog thread is pointless. The timer wakeup adds overhead,
> > rcuog simply wakes up and goes back to sleep achieving nothing.
> >
> > This path also adds a full memory barrier, and additional timer expiry
> > modifications unnecessarily.
> >
> > The root cause is that WakeOvfIsDeferred fires when
> > !rcu_segcblist_ready_cbs() (GP not complete), but waking rcuog cannot
> > accelerate GP completion.
> >
> > This commit therefore removes this path, which also adding some rdp
> > counters to ensure we don't have lost wake ups.
>
> There should be two patches: one that removes the useless path and the
> other that adds the debugging.

Sure, will split.

> > Tested with rcutorture scenarios: TREE01, TREE05, TREE08 (all NOCB
> > configurations) - all pass. Also stress tested using a kernel module
> > that floods call_rcu() to trigger the overload conditions and made the
> > observations confirming the findings.
> >
> > Signed-off-by: Joel Fernandes <[email protected]>
>
> Cool! Just a few comments:
>
> > @@ -549,24 +546,26 @@ static void __call_rcu_nocb_wake(struct rcu_data
> > *rdp, bool was_alldone,
> >  	lazy_len = READ_ONCE(rdp->lazy_len);
> >  	if (was_alldone) {
> >  		rdp->qlen_last_fqs_check = len;
> > +		rdp->nocb_gp_wake_attempt = true;
> > +		rcu_nocb_unlock(rdp);
> >  		// Only lazy CBs in bypass list
> >  		if (lazy_len && bypass_len == lazy_len) {
> > -			rcu_nocb_unlock(rdp);
> >  			wake_nocb_gp_defer(rdp, RCU_NOCB_WAKE_LAZY,
> >  					   TPS("WakeLazy"));
> >  		} else if (!irqs_disabled_flags(flags)) {
> >  			/* ... if queue was empty ... */
> > -			rcu_nocb_unlock(rdp);
> >  			wake_nocb_gp(rdp, false);
> >  			trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
> >  					    TPS("WakeEmpty"));
> >  		} else {
> > -			rcu_nocb_unlock(rdp);
> >  			wake_nocb_gp_defer(rdp, RCU_NOCB_WAKE,
> >  					   TPS("WakeEmptyIsDeferred"));
> >  		}
> > +
> > +		return;
> >  	} else if (len > rdp->qlen_last_fqs_check + qhimark) {
> > -		/* ... or if many callbacks queued. */
> > +		/* Callback overload condition. */
> > +		WARN_ON_ONCE(!rdp->nocb_gp_wake_attempt &&
> > !rdp->nocb_gp_serving);
>
> With this test, the point of ->nocb_gp_serving is unclear given that both
> states are cleared in the same place but ->nocb_gp_serving is set later by
> the gp kthread. ->nocb_gp_serving implies ->nocb_gp_wake_attempt so the above
> test is the same as WARN_ON_ONCE(!rdp->nocb_gp_wake_attempt).
>
> In fact ->nocb_gp_wake_attempt alone probably makes sense?

Ah true, I got a bit paranoid about false-positive warnings, hence the
extra variable. On further analysis, though, I realized that the nocb lock
already prevents the potential false-positive warnings.
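As a sanity check on the equivalence you point out, here is a minimal
userspace sketch (not kernel code). The helper names warn_two_flags(),
warn_one_flag() and count_mismatches() are made up for illustration, and
the sketch assumes the invariant you describe: ->nocb_gp_serving is never
set unless ->nocb_gp_wake_attempt is set. Under that assumption it walks
the remaining flag states and counts where the two WARN conditions differ:

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two-flag condition in the posted patch. */
static bool warn_two_flags(bool wake_attempt, bool serving)
{
	return !wake_attempt && !serving;
}

/* Hypothetical stand-in for the simplified single-flag condition. */
static bool warn_one_flag(bool wake_attempt)
{
	return !wake_attempt;
}

/* Count the reachable flag states where the two conditions disagree. */
static int count_mismatches(void)
{
	int mismatches = 0;

	for (int w = 0; w <= 1; w++) {
		for (int s = 0; s <= 1; s++) {
			/* Assumed invariant: serving implies wake_attempt. */
			if (s && !w)
				continue;
			if (warn_two_flags(w, s) != warn_one_flag(w))
				mismatches++;
		}
	}
	return mismatches;
}
```

With the assumed invariant, count_mismatches() finds no state where the
two conditions disagree, which matches your reading that the second flag
adds nothing to the check.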
So yes, I will just use the single variable. Thanks.

>
> >  		rdp->qlen_last_fqs_check = len;
> >  		j = jiffies;
> >  		if (j != rdp->nocb_gp_adv_time &&
> > @@ -575,21 +574,10 @@ static void __call_rcu_nocb_wake(struct rcu_data
> > *rdp, bool was_alldone,
> >  			rcu_advance_cbs_nowake(rdp->mynode, rdp);
> >  			rdp->nocb_gp_adv_time = j;
> >  		}
> > -		smp_mb(); /* Enqueue before timer_pending(). */
>
> You need to remove the pairing smp_mb__after_spin_lock() in
> do_nocb_deferred_wakeup_timer().

Ah, will do. Thanks!

 - Joel

