From: Sebastian Andrzej Siewior <[email protected]> Sent: Thursday, March
12, 2026 10:07 AM
>
Let me try to address the range of questions here and in the follow-up
discussion. As background, an overview of VMBus interrupt handling is in:
Documentation/virt/hyperv/vmbus.rst
in the section entitled "Synthetic Interrupt Controller (synic)". The
relevant text is:
The SINT is mapped to a single per-CPU architectural interrupt (i.e.,
an 8-bit x86/x64 interrupt vector, or an arm64 PPI INTID). Because
each CPU in the guest has a synic and may receive VMBus interrupts,
they are best modeled in Linux as per-CPU interrupts. This model works
well on arm64 where a single per-CPU Linux IRQ is allocated for
VMBUS_MESSAGE_SINT. This IRQ appears in /proc/interrupts as an IRQ labelled
"Hyper-V VMbus". Since x86/x64 lacks support for per-CPU IRQs, an x86
interrupt vector is statically allocated (HYPERVISOR_CALLBACK_VECTOR)
across all CPUs and explicitly coded to call vmbus_isr(). In this case,
there's no Linux IRQ, and the interrupts are visible in aggregate in
/proc/interrupts on the "HYP" line.
The use of a statically allocated sysvec pre-dates my involvement in this
code starting in 2017, but I believe it was modelled after what Xen does,
and for the same reason -- to effectively create a per-CPU interrupt on
x86/x64. ACRN is also using HYPERVISOR_CALLBACK_VECTOR, but I
don't know if that is also to create a per-CPU interrupt.
More below ....
> On 2026-02-16 17:24:56 [+0100], Jan Kiszka wrote:
> > --- a/drivers/hv/vmbus_drv.c
> > +++ b/drivers/hv/vmbus_drv.c
> > @@ -25,6 +25,7 @@
> > #include <linux/cpu.h>
> > #include <linux/sched/isolation.h>
> > #include <linux/sched/task_stack.h>
> > +#include <linux/smpboot.h>
> >
> > #include <linux/delay.h>
> > #include <linux/panic_notifier.h>
> > @@ -1350,7 +1351,7 @@ static void vmbus_message_sched(struct hv_per_cpu_context *hv_cpu, void *message
> > }
> > }
> >
> > -void vmbus_isr(void)
> > +static void __vmbus_isr(void)
> > {
> > struct hv_per_cpu_context *hv_cpu
> > = this_cpu_ptr(hv_context.cpu_context);
> > @@ -1363,6 +1364,53 @@ void vmbus_isr(void)
> >
> > add_interrupt_randomness(vmbus_interrupt);
>
> This is feeding entropy and would like to see interrupt registers. But
> since this is invoked from a thread it won't.
I'll respond to this topic on the new thread for the new patch
where Jan has moved the call to add_interrupt_randomness().
>
> > }
> > +
> …
> > +void vmbus_isr(void)
> > +{
> > + if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
> > + vmbus_irqd_wake();
> > + } else {
> > + lockdep_hardirq_threaded();
>
> What clears this? This is wrongly placed. This should go to
> sysvec_hyperv_callback() instead with its matching canceling part. The
> add_interrupt_randomness() should also be there and not here.
> sysvec_hyperv_stimer0() managed to do so.
I don't have any knowledge to bring regarding the use of
lockdep_hardirq_threaded().
>
> Different question: What guarantees that there won't be another
> interrupt before this one is done? The handshake appears to be
> deprecated. The interrupt itself returns ACKing (or not) but the actual
> handler is delayed to this thread. Depending on the userland it could
> take some time and I don't know how impatient the host is.
In more recent versions of Hyper-V, what's deprecated is Hyper-V implicitly
and automatically doing the EOI. So in sysvec_hyperv_callback(), apic_eoi()
is usually explicitly called to ack the interrupt.
There's no guarantee, in either the existing case or the new PREEMPT_RT
case, that another VMBus interrupt won't come in on the same CPU
before the tasklets scheduled by vmbus_message_sched() or
vmbus_chan_sched() have run. From a functional standpoint, the Linux
code and its interaction with Hyper-V handle another interrupt correctly.
From a delay standpoint, there's not a problem for the normal (i.e., not
PREEMPT_RT) case because the tasklets run as the interrupt exits -- they
don't end up in ksoftirqd. For the PREEMPT_RT case, I can see your point
about delays since the tasklets are scheduled from the new per-CPU thread.
But my understanding is that Jan's motivation for these changes is not to
achieve true RT behavior, since Hyper-V doesn't provide that anyway.
The goal is simply to make PREEMPT_RT builds functional, though Jan may
have further comments on the goal.
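
On the functional point above (a second interrupt arriving before the
tasklets have run), one piece of why this is safe is that scheduling a
tasklet that is already pending is a no-op, and the pending flag is cleared
before the handler runs. The sketch below is a userspace model of that
behavior only, as I understand it. It is not kernel code, all of the
model_* names are hypothetical, and it says nothing about the Hyper-V side
of the handshake.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace model only; every name here is hypothetical, not kernel API. */
struct model_tasklet {
	atomic_bool scheduled;	/* stands in for the "already pending" state bit */
	int queued;		/* how many times it was actually queued */
	int runs;		/* how many times the handler ran */
};

/* Queue the tasklet unless it is already pending. Returns true if queued. */
static bool model_tasklet_schedule(struct model_tasklet *t)
{
	if (atomic_exchange(&t->scheduled, true))
		return false;	/* already pending: this request is coalesced */
	t->queued++;
	return true;
}

/*
 * Run the tasklet once if it is pending. The flag is cleared before the
 * work is done, so an interrupt that arrives while the handler is running
 * queues it again instead of being lost.
 */
static void model_tasklet_run(struct model_tasklet *t)
{
	if (!atomic_exchange(&t->scheduled, false))
		return;
	t->runs++;
}
```

In this model, two back-to-back "interrupts" before the tasklet runs result
in one queued instance and one run, which is fine as long as the handler
drains everything that is pending when it does run.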
>
> > + __vmbus_isr();
> Moving on. This (trying very hard here) even schedules tasklets. Why?
> You need to disable BH before doing so. Otherwise it ends in ksoftirqd.
> You don't want that.
Again, Jan can comment on the impact of delays due to ending up
in ksoftirqd.
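
For reference, the mechanism Sebastian is describing can be modeled as
follows: a softirq raised from task context with BH enabled is handed to
ksoftirqd, while one raised inside a BH-disabled section runs when the
outermost section ends. This is a minimal userspace sketch of that rule as
I understand it, not kernel code; the model_* names and the counters are
hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only; every name here is hypothetical, not kernel API. */
struct model_cpu {
	int bh_disable_depth;	/* nesting count of BH-disabled sections */
	bool softirq_pending;	/* a tasklet/softirq has been raised */
	int ran_inline;		/* softirqs run by the raising context itself */
	int woke_ksoftirqd;	/* softirqs handed off to the ksoftirqd thread */
};

static void model_local_bh_disable(struct model_cpu *c)
{
	c->bh_disable_depth++;
}

/* Raising from task context with BH enabled hands the work to ksoftirqd. */
static void model_raise_softirq(struct model_cpu *c)
{
	c->softirq_pending = true;
	if (c->bh_disable_depth == 0) {
		c->woke_ksoftirqd++;
		c->softirq_pending = false;
	}
}

/* Leaving the outermost BH-disabled section runs pending softirqs inline. */
static void model_local_bh_enable(struct model_cpu *c)
{
	if (--c->bh_disable_depth == 0 && c->softirq_pending) {
		c->ran_inline++;
		c->softirq_pending = false;
	}
}
```

Applied to the patch, this would correspond to the per-CPU thread wrapping
its call to __vmbus_isr() in a BH-disabled section, so that the tasklets
run in the thread's own context when the section ends.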
>
> Couldn't the whole logic be integrated into the IRQ code? Then we could
> have mask/ unmask if supported/ provided and threaded interrupts. Then
> sysvec_hyperv_reenlightenment() could use a proper threaded interrupt
> instead of apic_eoi() + schedule_delayed_work().
As I described above, Hyper-V needs a per-CPU interrupt. It's faked up
on x86/x64 with the hardcoded HYPERVISOR_CALLBACK_VECTOR sysvec
entry, but on arm64 a normal Linux per-CPU IRQ is used. Once the execution
path gets to vmbus_isr(), the two architectures share the same code. The
same approach is used for the Hyper-V STIMER0 interrupt, which is also per-CPU.
If there's a better way to fake up a per-CPU interrupt on x86/x64, I'm open
to looking at it.
As I recently discovered in discussion with Jan, standard Linux IRQ handling
will *not* thread per-CPU interrupts. So even on arm64, where a standard
Linux per-CPU IRQ is used for VMBus and STIMER0 interrupts, we can't
request threading.
I need to refresh my memory on sysvec_hyperv_reenlightenment(). If
I recall correctly, it's not a per-CPU interrupt, so it probably doesn't
need to have a hardcoded vector. Overall, the Hyper-V reenlightenment
functionality is a bit of a fossil that isn't needed on modern x86/x64
processors that support TSC scaling. And it doesn't exist for arm64.
It might be worth seeing if it could be dropped entirely ...
Michael
>
> > + }
> > +}
> > EXPORT_SYMBOL_FOR_MODULES(vmbus_isr, "mshv_vtl");
> >
> > static irqreturn_t vmbus_percpu_isr(int irq, void *dev_id)
>
> Sebastian