On 6/3/2025 11:37 PM, Qi Xi wrote:
> Hi Joel,
>
> The patch works as expected. Previously, the issue triggered a soft lockup
> within ~10 minutes, but after applying the fix, the system ran stably for 30+
> minutes without any issues.
Great to hear! Thanks for testing. We will roll this into a proper patch and
provide it once we have something ready.

 - Joel

>
> Thanks,
> Qi
>
> On 2025/6/4 11:25, Xiongfeng Wang wrote:
>> On 2025/6/4 9:35, Joel Fernandes wrote:
>>> On Tue, Jun 03, 2025 at 03:22:42PM -0400, Joel Fernandes wrote:
>>>> On 6/3/2025 3:03 PM, Joel Fernandes wrote:
>>>>> On 6/3/2025 2:59 PM, Joel Fernandes wrote:
>>>>>> On Fri, May 30, 2025 at 09:55:45AM +0800, Xiongfeng Wang wrote:
>>>>>>> Hi Joel,
>>>>>>>
>>>>>>> On 2025/5/29 0:30, Joel Fernandes wrote:
>>>>>>>> On Wed, May 21, 2025 at 5:43 AM Xiongfeng Wang
>>>>>>>> <wangxiongfe...@huawei.com> wrote:
>>>>>>>>> Hi RCU experts,
>>>>>>>>>
>>>>>>>>> When I ran syzkaller on Linux 6.6 with CONFIG_PREEMPT_RCU enabled,
>>>>>>>>> I got the following soft lockup. The call trace is too long, so I
>>>>>>>>> put it at the end. The issue can also be reproduced on the latest
>>>>>>>>> kernel.
>>>>>>>>>
>>>>>>>>> The issue is as follows: CPU3 is waiting for a spin_lock, which is
>>>>>>>>> held by CPU1, but CPU1 is stuck in the following dead loop.
>>>>>>>>>
>>>>>>>>> irq_exit()
>>>>>>>>>   __irq_exit_rcu()
>>>>>>>>>     /* in_hardirq() returns false after this */
>>>>>>>>>     preempt_count_sub(HARDIRQ_OFFSET)
>>>>>>>>>     tick_irq_exit()
>>>>>>>>>       tick_nohz_irq_exit()
>>>>>>>>>         tick_nohz_stop_sched_tick()
>>>>>>>>>           trace_tick_stop() /* a bpf prog is hooked on this trace point */
>>>>>>>>>             __bpf_trace_tick_stop()
>>>>>>>>>               bpf_trace_run2()
>>>>>>>>>                 rcu_read_unlock_special()
>>>>>>>>>                   /* will send an IPI to itself */
>>>>>>>>>                   irq_work_queue_on(&rdp->defer_qs_iw, rdp->cpu);
>>>>>>>>>
>>>>>>>>> /* after interrupts are enabled again, the irq_work is run */
>>>>>>>>> asm_sysvec_irq_work()
>>>>>>>>>   sysvec_irq_work()
>>>>>>>>>     irq_exit() /* after handling the irq_work, we enter irq_exit() again */
>>>>>>>>>       __irq_exit_rcu()
>>>>>>>>>         ...skip...
>>>>>>>>>         /* we queue an irq_work again, and enter a dead loop */
>>>>>>>>>         irq_work_queue_on(&rdp->defer_qs_iw, rdp->cpu);
>>>
>>> The following is a candidate fix (among other fixes being
>>> considered/discussed). The change is to check whether context tracking
>>> thinks we're in an IRQ and, if so, avoid the irq_work. IMO, this should
>>> be rare enough that it shouldn't be an issue, and it is dangerous to
>>> self-IPI repeatedly while we're exiting an IRQ anyway.
>>>
>>> Thoughts? Xiongfeng, do you want to try it?
>>
>> Thanks a lot for the fast response. My colleague is testing the
>> modification. She will feed back the result.
>>
>> Thanks,
>> Xiongfeng
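[The shape of the loop quoted above can be modeled in a few lines of
stand-alone C. This is only a sketch: the mock_* names are hypothetical
stand-ins, not kernel code, and a small cap is added purely so the demo
terminates; the kernel has no such cap, hence the lockup.]

#include <stdbool.h>
#include <stdio.h>

static bool defer_qs_iw_pending;
static unsigned long self_ipis;

static void mock_irq_exit(void);

/* Stand-in for rcu_read_unlock_special() queueing the deferred-QS IPI. */
static void mock_rcu_read_unlock_special(void)
{
	if (!defer_qs_iw_pending) {
		defer_qs_iw_pending = true;
		self_ipis++;
	}
}

/* Stand-in for the irq_work handler: clears the flag, then exits the IRQ. */
static void mock_irq_work_run(void)
{
	defer_qs_iw_pending = false;
	mock_irq_exit();	/* handling the IPI re-enters the same exit path */
}

/* Stand-in for irq_exit() -> tick_irq_exit() -> bpf -> unlock_special(). */
static void mock_irq_exit(void)
{
	mock_rcu_read_unlock_special();
	if (defer_qs_iw_pending && self_ipis < 10)	/* cap: demo only */
		mock_irq_work_run();
}

int main(void)
{
	mock_irq_exit();	/* the first IRQ exit kicks off the cycle */
	printf("self-IPIs sent: %lu (unbounded without the cap)\n", self_ipis);
	return 0;
}

[This prints "self-IPIs sent: 10"; removing the cap recurses without bound,
the user-space analogue of the endless self-IPI on IRQ exit.]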
>>
>>> Btw, I could easily reproduce it as a boot hang by doing:
>>>
>>> --- a/kernel/softirq.c
>>> +++ b/kernel/softirq.c
>>> @@ -638,6 +638,10 @@ void irq_enter(void)
>>>
>>>  static inline void tick_irq_exit(void)
>>>  {
>>> +	rcu_read_lock();
>>> +	WRITE_ONCE(current->rcu_read_unlock_special.b.need_qs, true);
>>> +	rcu_read_unlock();
>>> +
>>>  #ifdef CONFIG_NO_HZ_COMMON
>>>  	int cpu = smp_processor_id();
>>>
>>> ---8<-----------------------
>>>
>>> From: Joel Fernandes <joelagn...@nvidia.com>
>>> Subject: [PATCH] Do not schedule irq_work when IRQ is exiting
>>>
>>> Signed-off-by: Joel Fernandes <joelagn...@nvidia.com>
>>> ---
>>>  include/linux/context_tracking_irq.h |  2 ++
>>>  kernel/context_tracking.c            | 12 ++++++++++++
>>>  kernel/rcu/tree_plugin.h             |  3 ++-
>>>  3 files changed, 16 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/include/linux/context_tracking_irq.h b/include/linux/context_tracking_irq.h
>>> index 197916ee91a4..35a5ad971514 100644
>>> --- a/include/linux/context_tracking_irq.h
>>> +++ b/include/linux/context_tracking_irq.h
>>> @@ -9,6 +9,7 @@ void ct_irq_enter_irqson(void);
>>>  void ct_irq_exit_irqson(void);
>>>  void ct_nmi_enter(void);
>>>  void ct_nmi_exit(void);
>>> +bool ct_in_irq(void);
>>>  #else
>>>  static __always_inline void ct_irq_enter(void) { }
>>>  static __always_inline void ct_irq_exit(void) { }
>>> @@ -16,6 +17,7 @@ static inline void ct_irq_enter_irqson(void) { }
>>>  static inline void ct_irq_exit_irqson(void) { }
>>>  static __always_inline void ct_nmi_enter(void) { }
>>>  static __always_inline void ct_nmi_exit(void) { }
>>> +static inline bool ct_in_irq(void) { return false; }
>>>  #endif
>>>
>>>  #endif
>>> diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
>>> index fb5be6e9b423..8e8055cf04af 100644
>>> --- a/kernel/context_tracking.c
>>> +++ b/kernel/context_tracking.c
>>> @@ -392,6 +392,18 @@ noinstr void ct_irq_exit(void)
>>>  	ct_nmi_exit();
>>>  }
>>>
>>> +/**
>>> + * ct_in_irq - check if CPU is currently in a tracked IRQ context.
>>> + *
>>> + * Returns true if ct_irq_enter() has been called and ct_irq_exit()
>>> + * has not yet been called. This indicates the CPU is currently
>>> + * processing an interrupt.
>>> + */
>>> +bool ct_in_irq(void)
>>> +{
>>> +	return ct_nmi_nesting() != 0;
>>> +}
>>> +
>>>  /*
>>>   * Wrapper for ct_irq_enter() where interrupts are enabled.
>>>   *
>>> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
>>> index 3c0bbbbb686f..a3eebd4c841e 100644
>>> --- a/kernel/rcu/tree_plugin.h
>>> +++ b/kernel/rcu/tree_plugin.h
>>> @@ -673,7 +673,8 @@ static void rcu_read_unlock_special(struct task_struct *t)
>>>  		set_tsk_need_resched(current);
>>>  		set_preempt_need_resched();
>>>  		if (IS_ENABLED(CONFIG_IRQ_WORK) && irqs_were_disabled &&
>>> -		    expboost && !rdp->defer_qs_iw_pending && cpu_online(rdp->cpu)) {
>>> +		    expboost && !rdp->defer_qs_iw_pending && cpu_online(rdp->cpu) &&
>>> +		    !ct_in_irq()) {
>>>  			// Get scheduler to re-evaluate and call hooks.
>>>  			// If !IRQ_WORK, FQS scan will eventually IPI.
>>>  			if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD) &&
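[Why the proposed guard breaks the cycle can likewise be sketched in
stand-alone C, assuming, as the patch relies on, that ct_in_irq() reports a
nonzero context-tracking nesting count. The key ordering is that irq_exit()
reaches tick_irq_exit() via __irq_exit_rcu() before it calls ct_irq_exit(),
so the count is still nonzero at the point where the self-IPI would be
queued. The mock_* names are hypothetical stand-ins, not kernel code.]

#include <stdbool.h>
#include <stdio.h>

static long ct_nesting;			/* stand-in for ct_nmi_nesting() */
static bool defer_qs_iw_pending;
static unsigned long self_ipis;

/* Stand-in for the proposed ct_in_irq(). */
static bool mock_ct_in_irq(void)
{
	return ct_nesting != 0;
}

/* The self-IPI decision with the patch's added guard. */
static void mock_rcu_read_unlock_special(void)
{
	if (!defer_qs_iw_pending && !mock_ct_in_irq()) {
		defer_qs_iw_pending = true;
		self_ipis++;
	}
}

static void mock_irq_exit(void)
{
	/*
	 * tick_irq_exit() is reached from __irq_exit_rcu(), before
	 * ct_irq_exit() runs, so the nesting count is still nonzero
	 * here and the guard suppresses the self-IPI.
	 */
	mock_rcu_read_unlock_special();
	ct_nesting--;			/* ct_irq_exit() */
}

int main(void)
{
	ct_nesting++;			/* ct_irq_enter() at IRQ entry */
	mock_irq_exit();
	printf("self-IPIs sent: %lu\n", self_ipis);	/* 0: cycle broken */
	return 0;
}

[The queueing path is reached exactly where it would be in tick_irq_exit(),
but the guard sees a nonzero nesting count and declines to queue, so the
program reports zero self-IPIs. The irq_work is simply deferred until a
later rcu_read_unlock_special() that runs outside IRQ exit, or until the
FQS scan IPIs the CPU.]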