Hi Walter Chang,

On Wed, Mar 26, 2025 at 05:46:38AM +0000, Walter Chang (張維哲) wrote:
> On Tue, 2025-01-21 at 09:08 -0800, Paul E. McKenney wrote:
> > On Sat, Jan 18, 2025 at 12:24:33AM +0100, Frederic Weisbecker wrote:
> > > hrtimers are migrated away from the dying CPU to any online target
> > > at the CPUHP_AP_HRTIMERS_DYING stage, so as not to delay bandwidth
> > > timers that handle tasks involved in CPU hotplug forward progress.
> > > 
> > > However, wake-ups can still be performed by the outgoing CPU after
> > > CPUHP_AP_HRTIMERS_DYING, and those can again result in bandwidth
> > > timers being armed. Depending on several considerations (crystal
> > > ball power-management-based election, whether an earlier timer is
> > > already enqueued, whether timer migration is enabled), the target
> > > may end up being the current CPU even though it is offline. If
> > > that happens, the timer is eventually ignored.
> > > 
> > > The most notable example is RCU, which had to deal with each and
> > > every one of those wake-ups by deferring them to an online CPU,
> > > along with related workarounds:
> > > 
> > > _ e787644caf76 (rcu: Defer RCU kthreads wakeup when CPU is dying)
> > > _ 9139f93209d1 (rcu/nocb: Fix RT throttling hrtimer armed from offline CPU)
> > > _ f7345ccc62a4 (rcu/nocb: Fix rcuog wake-up from offline softirq)
> > > 
> > > The problem isn't confined to RCU, though: the stop machine
> > > kthread (which runs CPUHP_AP_HRTIMERS_DYING) reports its completion
> > > at the end of its work through cpu_stop_signal_done() and performs
> > > a wake-up that eventually arms the deadline server timer:
> > > 
> > >            WARNING: CPU: 94 PID: 588 at kernel/time/hrtimer.c:1086 hrtimer_start_range_ns+0x289/0x2d0
> > >            CPU: 94 UID: 0 PID: 588 Comm: migration/94 Not tainted
> > >            Stopper: multi_cpu_stop+0x0/0x120 <- stop_machine_cpuslocked+0x66/0xc0
> > >            RIP: 0010:hrtimer_start_range_ns+0x289/0x2d0
> > >            Call Trace:
> > >             <TASK>
> > >             ? hrtimer_start_range_ns
> > >             start_dl_timer
> > >             enqueue_dl_entity
> > >             dl_server_start
> > >             enqueue_task_fair
> > >             enqueue_task
> > >             ttwu_do_activate
> > >             try_to_wake_up
> > >             complete
> > >             cpu_stopper_thread
> > >             smpboot_thread_fn
> > >             kthread
> > >             ret_from_fork
> > >             ret_from_fork_asm
> > >             </TASK>
> > > 
> > > Instead of providing yet another band-aid to work around the
> > > situation, fix it in the hrtimer infrastructure: always migrate a
> > > timer away to an online target whenever it is enqueued from an
> > > offline CPU.
> > > 
> > > This will also allow reverting all of the above disgraceful RCU
> > > hacks.
> > > 
> > > Reported-by: Vlad Poenaru <vlad.w...@gmail.com>
> > > Reported-by: Usama Arif <usamaarif...@gmail.com>
> > > Fixes: 5c0930ccaad5 ("hrtimers: Push pending hrtimers away from outgoing CPU earlier")
> > > Closes: 20241213203739.1519801-1-usamaarif...@gmail.com
> > > Signed-off-by: Frederic Weisbecker <frede...@kernel.org>
> > > Signed-off-by: Paul E. McKenney <paul...@kernel.org>
> > 
> > This passes over-holiday rcutorture testing, so, perhaps redundantly:
> > 
> > Tested-by: Paul E. McKenney <paul...@kernel.org>
> 
> Hi,
> 
> I encountered the same issue even after applying this patch.
> Below are the details of the warning and call trace.
> 
> 
> migration/3: ------------[ cut here ]------------
> migration/3: WARNING: CPU: 3 PID: 42 at kernel/time/hrtimer.c:1125 enqueue_hrtimer+0x7c/0xec
> migration/3: CPU: 3 UID: 0 PID: 42 Comm: migration/3 Tainted: G        OE      6.12.18-android16-0-g59cb5a849beb-4k #1 0b440e43fa7b24aaa3b7e6e5d2b938948e0cacdb
> migration/3: Stopper: multi_cpu_stop+0x0/0x184 <- stop_machine_cpuslocked+0xc0/0x15c

It's not the first time I've received such a report on an out-of-tree
kernel. The problem is that I don't know whether the tainted modules are
involved. But something is probably making an offline CPU visible within
the sched domain hierarchy walked by get_nohz_timer_target(), and the
new warning in enqueue_hrtimer() made that visible.
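
For context, here is a simplified sketch of the target election, loosely
based on get_nohz_timer_target() in kernel/sched/core.c. The
housekeeping-mask filtering and the default-CPU fallback are elided, and
pick_timer_target() is a hypothetical name, not the real function. The
domain walk implicitly trusts sched_domain_span() to contain only online
CPUs, so a stale domain still spanning an offline but apparently busy
CPU would get that CPU elected as the timer target:

	/* Hypothetical, simplified sketch of the election logic. */
	int pick_timer_target(void)
	{
		int i, cpu = smp_processor_id();
		struct sched_domain *sd;

		/* A busy local CPU keeps its own timers. */
		if (!idle_cpu(cpu))
			return cpu;

		guard(rcu)();

		for_each_domain(cpu, sd) {
			for_each_cpu(i, sched_domain_span(sd)) {
				if (cpu == i)
					continue;
				/*
				 * Assumes the span contains only online
				 * CPUs; a stale domain breaks that
				 * assumption and can hand back an
				 * offline CPU here.
				 */
				if (!idle_cpu(i))
					return i;
			}
		}
		return cpu;
	}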

Can you try the debug patch below and tell us if the warning fires?

Thanks.

diff --git a/include/linux/sched/nohz.h b/include/linux/sched/nohz.h
index 6d67e9a5af6b..f49512628269 100644
--- a/include/linux/sched/nohz.h
+++ b/include/linux/sched/nohz.h
@@ -9,6 +9,8 @@
 #if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ_COMMON)
 extern void nohz_balance_enter_idle(int cpu);
 extern int get_nohz_timer_target(void);
+extern void assert_domain_online(void);
 #else
 static inline void nohz_balance_enter_idle(int cpu) { }
+static inline void assert_domain_online(void) { }
 #endif
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 07455d25329c..98c8f8408403 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -13,6 +13,7 @@
 #include <linux/sched/isolation.h>
 #include <linux/sched/task.h>
 #include <linux/sched/smt.h>
+#include <linux/sched/nohz.h>
 #include <linux/unistd.h>
 #include <linux/cpu.h>
 #include <linux/oom.h>
@@ -1277,6 +1278,7 @@ static int take_cpu_down(void *_param)
        if (err < 0)
                return err;
 
+       assert_domain_online();
        /*
         * Must be called from CPUHP_TEARDOWN_CPU, which means, as we are going
         * down, that the current state is CPUHP_TEARDOWN_CPU - 1.
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 175a5a7ac107..88157b1645cc 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1163,6 +1163,24 @@ void resched_cpu(int cpu)
 
 #ifdef CONFIG_SMP
 #ifdef CONFIG_NO_HZ_COMMON
+/*
+ * The dying CPU is detached from the sched domains before take_cpu_down()
+ * runs, so the hierarchy visible from it should span only online CPUs.
+ */
+void assert_domain_online(void)
+{
+       int cpu = smp_processor_id();
+       int i;
+       struct sched_domain *sd;
+
+       guard(rcu)();
+
+       for_each_domain(cpu, sd) {
+               for_each_cpu(i, sched_domain_span(sd)) {
+                       WARN_ON_ONCE(cpu_is_offline(i));
+               }
+       }
+}
 /*
  * In the semi idle case, use the nearest busy CPU for migrating timers
  * from an idle CPU.  This is good for power-savings.
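
(For what it's worth: if the WARN_ON_ONCE() in assert_domain_online()
fires, that would confirm a stale sched domain still spanning an offline
CPU at CPUHP_TEARDOWN_CPU time, which is what would let
get_nohz_timer_target() elect one as a timer target. If it stays silent
while the enqueue_hrtimer() warning keeps firing, the offline target
must be coming from somewhere else.)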
