Re: [PATCH v3] drivers: hv: vmbus: Use kthread for vmbus interrupts on PREEMPT_RT

Jan Kiszka Tue, 17 Mar 2026 22:52:32 -0700

On 17.03.26 18:25, Michael Kelley wrote:
> From: Sebastian Andrzej Siewior <[email protected]> Sent: Thursday, March 12, 2026 10:07 AM
>>
> 
> Let me try to address the range of questions here and in the follow-up
> discussion. As background, an overview of VMBus interrupt handling is in:
> 
> Documentation/virt/hyperv/vmbus.rst
> 
> in the section entitled "Synthetic Interrupt Controller (synic)". The
> relevant text is:
> 
>    The SINT is mapped to a single per-CPU architectural interrupt (i.e.,
>    an 8-bit x86/x64 interrupt vector, or an arm64 PPI INTID). Because
>    each CPU in the guest has a synic and may receive VMBus interrupts,
>    they are best modeled in Linux as per-CPU interrupts. This model works
>    well on arm64 where a single per-CPU Linux IRQ is allocated for
>    VMBUS_MESSAGE_SINT. This IRQ appears in /proc/interrupts as an IRQ labelled
>    "Hyper-V VMbus". Since x86/x64 lacks support for per-CPU IRQs, an x86
>    interrupt vector is statically allocated (HYPERVISOR_CALLBACK_VECTOR)
>    across all CPUs and explicitly coded to call vmbus_isr(). In this case,
>    there's no Linux IRQ, and the interrupts are visible in aggregate in
>    /proc/interrupts on the "HYP" line.
> 
> The use of a statically allocated sysvec pre-dates my involvement in this
> code starting in 2017, but I believe it was modelled after what Xen does,
> and for the same reason -- to effectively create a per-CPU interrupt on
> x86/x64. ACRN is also using HYPERVISOR_CALLBACK_VECTOR, but I
> don't know if that is also to create a per-CPU interrupt.


Long ago, we demonstrated via Jailhouse that you do not necessarily add
complexity on the hypervisor side by providing a minimal PCI host and
attaching all your virtual devices to that instead. Even longer ago, in
the absence of proper IRQ controller virtualization on the various
archs, there was a bit of performance to gain by doing "special"
interrupts. All these design decisions made sense at a certain time, but
you would likely not repeat them today.

> 
> More below ....
> 
>> On 2026-02-16 17:24:56 [+0100], Jan Kiszka wrote:
>>> --- a/drivers/hv/vmbus_drv.c
>>> +++ b/drivers/hv/vmbus_drv.c
>>> @@ -25,6 +25,7 @@
>>>  #include <linux/cpu.h>
>>>  #include <linux/sched/isolation.h>
>>>  #include <linux/sched/task_stack.h>
>>> +#include <linux/smpboot.h>
>>>
>>>  #include <linux/delay.h>
>>>  #include <linux/panic_notifier.h>
>>> @@ -1350,7 +1351,7 @@ static void vmbus_message_sched(struct hv_per_cpu_context *hv_cpu, void *message
>>>     }
>>>  }
>>>
>>> -void vmbus_isr(void)
>>> +static void __vmbus_isr(void)
>>>  {
>>>     struct hv_per_cpu_context *hv_cpu
>>>             = this_cpu_ptr(hv_context.cpu_context);
>>> @@ -1363,6 +1364,53 @@ void vmbus_isr(void)
>>>
>>>     add_interrupt_randomness(vmbus_interrupt);
>>
>> This is feeding entropy and would like to see interrupt registers. But
>> since this is invoked from a thread it won't.
> 
> I'll respond to this topic on the new thread for the new patch
> where Jan has moved the call to add_interrupt_randomness().
> 
>>
>>>  }
>>> +
>> …
>>> +void vmbus_isr(void)
>>> +{
>>> +   if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
>>> +           vmbus_irqd_wake();
>>> +   } else {
>>> +           lockdep_hardirq_threaded();
>>
>> What clears this? This is wrongly placed. This should go to
>> sysvec_hyperv_callback() instead with its matching canceling part. The
>> add_interrupt_randomness() should also be there and not here.
>> sysvec_hyperv_stimer0() managed to do so.
> 
> I don't have any knowledge to bring regarding the use of
> lockdep_hardirq_threaded().
> 
>>
>> Different question: What guarantees that there won't be another
>> interrupt before this one is done? The handshake appears to be
>> deprecated. The interrupt itself returns ACKing (or not) but the actual
>> handler is delayed to this thread. Depending on the userland it could
>> take some time and I don't know how impatient the host is.
> 
> In more recent versions of Hyper-V, what's deprecated is Hyper-V implicitly
> and automatically doing the EOI. So in sysvec_hyperv_callback(), apic_eoi()
> is usually explicitly called to ack the interrupt.
> 
> There's no guarantee, in either the existing case or the new PREEMPT_RT
> case, that another VMBus interrupt won't come in on the same CPU
> before the tasklets scheduled by vmbus_message_sched() or
> vmbus_chan_sched() have run. From a functional standpoint, the Linux
> code and interaction with Hyper-V handles another interrupt correctly.
> 
> From a delay standpoint, there's not a problem for the normal (i.e., not
> PREEMPT_RT) case because the tasklets run as the interrupt exits -- they
> don't end up in ksoftirqd. For the PREEMPT_RT case, I can see your point
> about delays since the tasklets are scheduled from the new per-CPU thread.
> But my understanding is that Jan's motivation for these changes is not to
> achieve true RT behavior, since Hyper-V doesn't provide that anyway.
> The goal is simply to make PREEMPT_RT builds functional, though Jan may
> have further comments on the goal.
> 

That is exactly the goal: a Linux guest that happens to use a PREEMPT_RT
kernel should run correctly on Hyper-V, and do so without losing relevant
performance. However, we do not expect any deterministic timing behavior
from such a setup.
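
To make the deferral under discussion concrete, here is a minimal
userspace model, not kernel code. Every name in it (model_cpu,
model_isr(), model_thread_run(), MODEL_PREEMPT_RT) is made up for
illustration; MODEL_PREEMPT_RT merely stands in for
IS_ENABLED(CONFIG_PREEMPT_RT). It assumes that the handler drains all
pending work on each pass, which is what makes a second interrupt
arriving before the thread has run harmless from a functional
standpoint:

```c
/* Userspace model only; none of these names exist in the kernel. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PREEMPT_RT 1	/* stand-in for IS_ENABLED(CONFIG_PREEMPT_RT) */

struct model_cpu {
	int  pending_msgs;	/* work posted by the "host" */
	int  handled_msgs;	/* work completed by the handler */
	bool thread_woken;	/* wake-up request for the per-CPU thread */
	int  wakeups;		/* how often the thread actually ran */
};

/* Plays the role of __vmbus_isr(): drain everything that is pending. */
static void model_handle(struct model_cpu *cpu)
{
	assert(cpu != NULL);
	while (cpu->pending_msgs > 0) {
		cpu->pending_msgs--;
		cpu->handled_msgs++;
	}
}

/* Plays the role of vmbus_isr(): defer on RT, handle inline otherwise. */
static void model_isr(struct model_cpu *cpu)
{
	cpu->pending_msgs++;		/* one message per interrupt */
	if (MODEL_PREEMPT_RT)
		cpu->thread_woken = true;	/* repeated wake-ups coalesce */
	else
		model_handle(cpu);
}

/* One scheduling pass of the per-CPU thread. */
static void model_thread_run(struct model_cpu *cpu)
{
	if (!cpu->thread_woken)
		return;
	cpu->thread_woken = false;
	cpu->wakeups++;
	model_handle(cpu);
}
```

Calling model_isr() three times and then model_thread_run() once handles
all three messages in a single wake-up. The model says nothing about
latency, which matches the goal above: functional, not deterministic.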

>>
>>> +           __vmbus_isr();
>> Moving on. This (trying very hard here) even schedules tasklets. Why?
>> You need to disable BH before doing so. Otherwise it ends in ksoftirqd.
>> You don't want that.
> 
> Again, Jan can comment on the impact of delays due to ending up
> in ksoftirqd.
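
For readers following along, the rule Sebastian refers to can be written
down as a small userspace model. This is a simplification of how I
understand the softirq raise path, not the kernel's implementation, and
all names (model_ctx, model_run_site, model_tasklet_schedule()) are
hypothetical:

```c
/* Userspace model only; none of these names exist in the kernel. */
#include <assert.h>
#include <stdbool.h>

enum model_run_site {
	RUN_AT_IRQ_EXIT,	/* scheduled from hard interrupt context */
	RUN_AT_BH_ENABLE,	/* scheduled with bottom halves disabled */
	RUN_IN_KSOFTIRQD,	/* scheduled from plain task context */
};

struct model_ctx {
	bool in_hardirq;	/* running in a hard interrupt handler */
	int  bh_disable_depth;	/* nesting level of "local_bh_disable()" */
};

/* Where does a tasklet scheduled in this context end up running? */
static enum model_run_site model_tasklet_schedule(const struct model_ctx *ctx)
{
	assert(ctx->bh_disable_depth >= 0);
	if (ctx->in_hardirq)
		return RUN_AT_IRQ_EXIT;
	if (ctx->bh_disable_depth > 0)
		return RUN_AT_BH_ENABLE;
	return RUN_IN_KSOFTIRQD;
}
```

In this model, a thread that schedules tasklets without first disabling
bottom halves lands in the RUN_IN_KSOFTIRQD case, which is the delay
being questioned here.
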
> 
>>
>> Couldn't the whole logic be integrated into the IRQ code? Then we could
>> have mask/unmask if supported/provided and threaded interrupts. Then
>> sysvec_hyperv_reenlightenment() could use a proper threaded interrupt
>> instead of apic_eoi() + schedule_delayed_work().
> 
> As I described above, Hyper-V needs a per-CPU interrupt. It's faked up
> on x86/x64 with the hardcoded HYPERVISOR_CALLBACK_VECTOR sysvec
> entry, but on arm64 a normal Linux per-CPU IRQ is used. Once the execution
> path gets to vmbus_isr(), the two architectures share the same code. Same
> thing is done with the Hyper-V STIMER0 interrupt as a per-CPU interrupt.
> If there's a better way to fake up a per-CPU interrupt on x86/x64, I'm open
> to looking at it.
> 
> As I recently discovered in discussion with Jan, standard Linux IRQ handling
> will *not* thread per-CPU interrupts. So even on arm64, where a standard
> Linux per-CPU IRQ is used for VMBus and STIMER0 interrupts, we can't
> request threading.
> 
> I need to refresh my memory on sysvec_hyperv_reenlightenment(). If
> I recall correctly, it's not a per-CPU interrupt, so it probably doesn't
> need to have a hardcoded vector. Overall, the Hyper-V reenlightenment
> functionality is a bit of a fossil that isn't needed on modern x86/x64
> processors that support TSC scaling. And it doesn't exist for arm64.
> It might be worth seeing if it could be dropped entirely ...
> 

I suppose that all depends on how long Linux needs to support the
underlying hypervisor versions and interfaces, no? It's a bit like
supporting old hardware...

Jan

-- 
Siemens AG, Foundational Technologies
Linux Expert Center
