On 12.03.26 18:07, Sebastian Andrzej Siewior wrote:
> On 2026-02-16 17:24:56 [+0100], Jan Kiszka wrote:
>> --- a/drivers/hv/vmbus_drv.c
>> +++ b/drivers/hv/vmbus_drv.c
>> @@ -25,6 +25,7 @@
>>  #include <linux/cpu.h>
>>  #include <linux/sched/isolation.h>
>>  #include <linux/sched/task_stack.h>
>> +#include <linux/smpboot.h>
>>  
>>  #include <linux/delay.h>
>>  #include <linux/panic_notifier.h>
>> @@ -1350,7 +1351,7 @@ static void vmbus_message_sched(struct hv_per_cpu_context *hv_cpu, void *message
>>      }
>>  }
>>  
>> -void vmbus_isr(void)
>> +static void __vmbus_isr(void)
>>  {
>>      struct hv_per_cpu_context *hv_cpu
>>              = this_cpu_ptr(hv_context.cpu_context);
>> @@ -1363,6 +1364,53 @@ void vmbus_isr(void)
>>  
>>      add_interrupt_randomness(vmbus_interrupt);
> 
> This is feeding entropy and would like to see interrupt registers. But
> since this is invoked from a thread it won't.
> 

Good point, will move this to vmbus_isr.
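
Roughly like this (just a sketch against this patch, not the final code -
and add_interrupt_randomness() would then be dropped from __vmbus_isr()
correspondingly):

```c
/* Sketch: keep the entropy feed in hard-irq context, before the
 * handler work is deferred to the per-CPU thread on PREEMPT_RT. */
void vmbus_isr(void)
{
	/* Still running in hard-irq context here, so the entropy pool
	 * can sample the interrupt registers as intended. */
	add_interrupt_randomness(vmbus_interrupt);

	if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
		vmbus_irqd_wake();
	} else {
		lockdep_hardirq_threaded();
		__vmbus_isr();
	}
}
```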

>>  }
>> +
> …
>> +void vmbus_isr(void)
>> +{
>> +    if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
>> +            vmbus_irqd_wake();
>> +    } else {
>> +            lockdep_hardirq_threaded();
> 
> What clears this? This is wrongly placed. This should go to
> sysvec_hyperv_callback() instead with its matching canceling part. The
> add_interrupt_randomness() should also be there and not here.
> sysvec_hyperv_stimer0() managed to do so.

First of all, we need to keep all this in generic code so that arm64 is
not left out.

But the question about lockdep_hardirq_threaded() is valid - and not
only for this new code: I tried hard to understand from the code how
hardirq_threaded is managed, but I simply couldn't find the spot where
it is reset after lockdep_hardirq_threaded() and before the interrupt
returns to the task that now has hardirq_threaded=1. So I started a
debugger. That confirmed for the existing code path
(__handle_irq_event_percpu) that we indeed return to the interrupted
task with hardirq_threaded set. I'm not sure it is intended that only
the irq_enter_rcu->lockdep_hardirq_enter of the next IRQ hitting the
same task resets the flag again.

With that in mind, the new logic here is no different from what the
kernel did before. If neither is doing what it should, we likely want
to add a generic reset of hardirq_threaded to the IRQ exit path(s).
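
I.e., something like extending lockdep_hardirq_exit() to mirror what
lockdep_hardirq_enter() already does on entry (untested sketch, just to
illustrate the idea):

```c
/* Illustration only: clear the flag on the way out of the outermost
 * hard irq, mirroring how lockdep_hardirq_enter() clears it when the
 * outermost hard irq is entered. */
#define lockdep_hardirq_exit()				\
do {							\
	if (__this_cpu_dec_return(hardirq_context) == 0)\
		current->hardirq_threaded = 0;		\
} while (0)
```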

> 
> Different question: What guarantees that there won't be another
> interrupt before this one is done? The handshake appears to be
> deprecated. The interrupt itself returns ACKing (or not) but the actual
> handler is delayed to this thread. Depending on the userland it could
> take some time and I don't know how impatient the host is.
> 

Good question. I guess people familiar with the hv interface need to
comment on that.

>> +            __vmbus_isr();
> Moving on. This (trying very hard here) even schedules tasklets. Why?
> You need to disable BH before doing so. Otherwise it ends in ksoftirqd.
> You don't want that.
> 

You are referring to the pre-existing logic now, aren't you?
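
If the tasklet scheduling is to stay, I suppose the threaded path would
need the same BH bracketing that the generic irq thread applies around
forced-threaded handlers, e.g. (sketch, assuming vmbus_irqd_thread_fn()
is the smpboot thread function of this patch):

```c
static void vmbus_irqd_thread_fn(unsigned int cpu)
{
	/* Mirror irq_forced_thread_fn(): run the handler with BHs
	 * disabled so that tasklets scheduled by __vmbus_isr() are
	 * executed on BH re-enable in this thread instead of being
	 * punted to ksoftirqd. */
	local_bh_disable();
	__vmbus_isr();
	local_bh_enable();
}
```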

> Couldn't the whole logic be integrated into the IRQ code? Then we could
> have mask/ unmask if supported/ provided and threaded interrupts. Then
> sysvec_hyperv_reenlightenment() could use a proper threaded interrupt
> instead apic_eoi() + schedule_delayed_work(). 
> 

Again, you are thinking x86-only. We need a portable solution.

>> +    }
>> +}
>>  EXPORT_SYMBOL_FOR_MODULES(vmbus_isr, "mshv_vtl");
>>  
>>  static irqreturn_t vmbus_percpu_isr(int irq, void *dev_id)
> 
> Sebastian

Jan

-- 
Siemens AG, Foundational Technologies
Linux Expert Center
