On Tue, Jun 03, 2025 at 03:22:42PM -0400, Joel Fernandes wrote:
>
>
> On 6/3/2025 3:03 PM, Joel Fernandes wrote:
> >
> >
> > On 6/3/2025 2:59 PM, Joel Fernandes wrote:
> >> On Fri, May 30, 2025 at 09:55:45AM +0800, Xiongfeng Wang wrote:
> >>> Hi Joel,
> >>>
> >>> On 2025/5/29 0:30, Joel Fernandes wrote:
> >>>> On Wed, May 21, 2025 at 5:43 AM Xiongfeng Wang
> >>>> <wangxiongfe...@huawei.com> wrote:
> >>>>>
> >>>>> Hi RCU experts,
> >>>>>
> >>>>> When I ran syskaller in Linux 6.6 with CONFIG_PREEMPT_RCU enabled, I got
> >>>>> the following soft lockup. The Calltrace is too long. I put it in the
> >>>>> end.
> >>>>> The issue can also be reproduced in the latest kernel.
> >>>>>
> >>>>> The issue is as follows. CPU3 is waiting for a spin_lock, which is got
> >>>>> by CPU1.
> >>>>> But CPU1 stuck in the following dead loop.
> >>>>>
> >>>>> irq_exit()
> >>>>>   __irq_exit_rcu()
> >>>>>     /* in_hardirq() returns false after this */
> >>>>>     preempt_count_sub(HARDIRQ_OFFSET)
> >>>>>     tick_irq_exit()
> >>>>>       tick_nohz_irq_exit()
> >>>>>         tick_nohz_stop_sched_tick()
> >>>>>           trace_tick_stop() /* a bpf prog is hooked on this trace point */
> >>>>>             __bpf_trace_tick_stop()
> >>>>>               bpf_trace_run2()
> >>>>>                 rcu_read_unlock_special()
> >>>>>                   /* will send a IPI to itself */
> >>>>>                   irq_work_queue_on(&rdp->defer_qs_iw, rdp->cpu);
> >>>>>
> >>>>> /* after interrupt is enabled again, the irq_work is called */
> >>>>> asm_sysvec_irq_work()
> >>>>>   sysvec_irq_work()
> >>>>>     irq_exit() /* after handled the irq_work, we again enter into irq_exit() */
> >>>>>       __irq_exit_rcu()
> >>>>>         ...skip...
> >>>>>           /* we queue a irq_work again, and enter a dead loop */
> >>>>>           irq_work_queue_on(&rdp->defer_qs_iw, rdp->cpu);
> >>>>

The following is a candidate fix (among other fixes being
considered/discussed). The change is to check if context tracking thinks
we're in IRQ and if so, avoid the irq_work. IMO, this should be rare enough
that it shouldn't be an issue and it is dangerous to self-IPI consistently
while we're exiting an IRQ anyway.

Thoughts? Xiongfeng, do you want to try it?

Btw, I could easily reproduce it as a boot hang by doing:

--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -638,6 +638,10 @@ void irq_enter(void)
 
 static inline void tick_irq_exit(void)
 {
+	rcu_read_lock();
+	WRITE_ONCE(current->rcu_read_unlock_special.b.need_qs, true);
+	rcu_read_unlock();
+
 #ifdef CONFIG_NO_HZ_COMMON
 	int cpu = smp_processor_id();
 

---8<-----------------------

From: Joel Fernandes <joelagn...@nvidia.com>
Subject: [PATCH] Do not schedule irq_work when IRQ is exiting

Signed-off-by: Joel Fernandes <joelagn...@nvidia.com>
---
 include/linux/context_tracking_irq.h |  2 ++
 kernel/context_tracking.c            | 12 ++++++++++++
 kernel/rcu/tree_plugin.h             |  3 ++-
 3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/include/linux/context_tracking_irq.h b/include/linux/context_tracking_irq.h
index 197916ee91a4..35a5ad971514 100644
--- a/include/linux/context_tracking_irq.h
+++ b/include/linux/context_tracking_irq.h
@@ -9,6 +9,7 @@ void ct_irq_enter_irqson(void);
 void ct_irq_exit_irqson(void);
 void ct_nmi_enter(void);
 void ct_nmi_exit(void);
+bool ct_in_irq(void);
 #else
 static __always_inline void ct_irq_enter(void) { }
 static __always_inline void ct_irq_exit(void) { }
@@ -16,6 +17,7 @@ static inline void ct_irq_enter_irqson(void) { }
 static inline void ct_irq_exit_irqson(void) { }
 static __always_inline void ct_nmi_enter(void) { }
 static __always_inline void ct_nmi_exit(void) { }
+static inline bool ct_in_irq(void) { return false; }
 #endif
 
 #endif
diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
index fb5be6e9b423..8e8055cf04af 100644
--- a/kernel/context_tracking.c
+++ b/kernel/context_tracking.c
@@ -392,6 +392,18 @@ noinstr void ct_irq_exit(void)
 	ct_nmi_exit();
 }
 
+/**
+ * ct_in_irq - check if CPU is currently in a tracked IRQ context.
+ *
+ * Returns true if ct_irq_enter() has been called and ct_irq_exit()
+ * has not yet been called. This indicates the CPU is currently
+ * processing an interrupt.
+ */
+bool ct_in_irq(void)
+{
+	return ct_nmi_nesting() != 0;
+}
+
 /*
  * Wrapper for ct_irq_enter() where interrupts are enabled.
  *
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 3c0bbbbb686f..a3eebd4c841e 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -673,7 +673,8 @@ static void rcu_read_unlock_special(struct task_struct *t)
 			set_tsk_need_resched(current);
 			set_preempt_need_resched();
 			if (IS_ENABLED(CONFIG_IRQ_WORK) && irqs_were_disabled &&
-			    expboost && !rdp->defer_qs_iw_pending && cpu_online(rdp->cpu)) {
+			    expboost && !rdp->defer_qs_iw_pending && cpu_online(rdp->cpu) &&
+			    !ct_in_irq()) {
 				// Get scheduler to re-evaluate and call hooks.
 				// If !IRQ_WORK, FQS scan will eventually IPI.
 				if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD) &&
-- 
2.34.1
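
For anyone who wants to reason about the loop without booting a kernel, here is a rough userspace sketch of it. This is a toy model, not kernel code: struct cpu_model, unlock_special(), handle_irq() and simulate() are all made-up names, irq_nesting merely stands in for what ct_nmi_nesting() reports, and need_qs is assumed to be set on every IRQ exit, as in the reproducer hunk for tick_irq_exit() above. The use_guard flag models the proposed !ct_in_irq() check.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy per-CPU state; every field is a hypothetical stand-in. */
struct cpu_model {
	int  irq_nesting;     /* models the context-tracking IRQ nesting count */
	bool need_qs;         /* a deferred quiescent state is pending */
	bool iw_pending;      /* models rdp->defer_qs_iw_pending */
	bool irq_work_queued; /* a self-IPI is waiting to be delivered */
	int  self_ipis;       /* how many self-IPIs were queued in total */
};

/* Models the irq_work decision in rcu_read_unlock_special(). */
static void unlock_special(struct cpu_model *c, bool use_guard)
{
	if (!c->need_qs || c->iw_pending)
		return;
	if (use_guard && c->irq_nesting != 0)
		return; /* the proposed !ct_in_irq() condition */
	c->iw_pending = true;
	c->irq_work_queued = true;
	c->self_ipis++;
}

/* Models one interrupt from entry to exit. */
static void handle_irq(struct cpu_model *c, bool use_guard, bool is_irq_work)
{
	c->irq_nesting++;              /* entry: context tracking sees an IRQ */
	if (is_irq_work)
		c->iw_pending = false; /* the irq_work handler has run */

	/* Exit path: the tracepoint's BPF prog ends an RCU read-side section. */
	c->need_qs = true;             /* assumption taken from the reproducer */
	unlock_special(c, use_guard);

	c->irq_nesting--;              /* exit: context tracking leaves the IRQ */
	assert(c->irq_nesting >= 0);
}

/*
 * Deliver one initial interrupt, then keep delivering queued self-IPIs
 * until none is queued or max_irqs interrupts have been handled.
 * Returns the number of self-IPIs that were queued.
 */
int simulate(bool use_guard, int max_irqs)
{
	struct cpu_model c = { 0 };
	int handled = 0;

	handle_irq(&c, use_guard, false);
	handled++;
	while (c.irq_work_queued && handled < max_irqs) {
		c.irq_work_queued = false;
		handle_irq(&c, use_guard, true);
		handled++;
	}
	return c.self_ipis;
}
```

In this model simulate(false, 1000) only stops because of the max_irqs cap and returns 1000, one queued self-IPI per handled interrupt, while simulate(true, 1000) returns 0 because the exit path never queues irq_work while the nesting count is nonzero. The model says nothing about how the deferred quiescent state is eventually reported in the guarded case; that part still depends on the need_resched path in the real code.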