This is an RFC posted for design feedback rather than merge.
Problem
-------
In split-irqchip mode, KVM unconditionally advertises x2APIC Suppress EOI
Broadcast (SEOIB) support to the guest. This is wrong in two ways:
- IOAPIC v0x11 has no EOI register, so a guest that suppresses the broadcast
cannot issue a directed EOI; advertising SEOIB there is incorrect.
- Even with IOAPIC v0x20, KVM ignores the guest's suppression request
and continues to broadcast LAPIC EOIs to the userspace IOAPIC.
This can cause interrupt storms in guests that rely on Directed EOI
semantics (e.g. Windows with Credential Guard, which hangs during boot).
KVM fix
-------
KVM now exposes two new x2APIC API flags to give userspace control:
- KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST
- KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST
This patch
----------
This patch wires those flags into QEMU via a machine-level field
(kvm_lapic_seoib_state) with three policy states:
- SEOIB_STATE_QUIRKED (default): legacy behavior, no flags set
- SEOIB_STATE_RESPECTED: advertise SEOIB and honor guest suppression
- SEOIB_STATE_NOT_ADVERTISED: hide SEOIB from guest (for IOAPIC v0x11)
The current implementation automatically selects a policy based on IOAPIC
version at VM power-on time, and migrates the state as a VMState subsection.
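As a rough illustration of the selection described above, the sketch below
maps an IOAPIC version to a policy and a policy to an x2APIC API flag. The
SEOIB_STATE_* names come from this patch; the helper names, the SKETCH_*
bit values, and the choice of RESPECTED for v0x20 and later are assumptions
made for illustration and may not match the patch or the kernel headers.

```c
#include <stdint.h>

/*
 * Placeholder bits standing in for KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST
 * and KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST. The real values are
 * defined by the kernel UAPI headers, not here.
 */
#define SKETCH_SEOIB_ENABLE_FLAG  (1ULL << 0)
#define SKETCH_SEOIB_DISABLE_FLAG (1ULL << 1)

typedef enum {
    SEOIB_STATE_QUIRKED,        /* legacy behavior, no flags set */
    SEOIB_STATE_RESPECTED,      /* advertise SEOIB, honor suppression */
    SEOIB_STATE_NOT_ADVERTISED, /* hide SEOIB from the guest */
} SeoibState;

/* Hypothetical helper: pick a policy from the IOAPIC version. */
static SeoibState seoib_policy_for_ioapic(uint8_t ioapic_version)
{
    /* Assumption: versions from 0x20 on have the EOI register. */
    return ioapic_version >= 0x20 ? SEOIB_STATE_RESPECTED
                                  : SEOIB_STATE_NOT_ADVERTISED;
}

/* Hypothetical helper: translate a policy into x2APIC API flag bits. */
static uint64_t seoib_flags_for_policy(SeoibState state)
{
    switch (state) {
    case SEOIB_STATE_RESPECTED:
        return SKETCH_SEOIB_ENABLE_FLAG;
    case SEOIB_STATE_NOT_ADVERTISED:
        return SKETCH_SEOIB_DISABLE_FLAG;
    case SEOIB_STATE_QUIRKED:
    default:
        return 0; /* leave KVM in its legacy behavior */
    }
}
```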
Design challenges
-----------------
The KVM x2APIC API is one-way: once a flag is set, it cannot be cleared to
return to the quirked state (consistent with other x2APIC API flags). This
has several implications:
- During incoming migration, we must defer setting the flags until after
the SEOIB state is loaded from the migration stream, since we cannot
know the source VM's policy in advance.
- Snapshot restore (loadvm) is problematic: if the running VM has already
set the ENABLE or DISABLE flag, restoring a QUIRKED snapshot cannot revert
the KVM state. loadvm issued at runtime via QMP/HMP makes this worse, since
such a restore cannot be anticipated at init time.
- The flags only take effect when kvm_apic_set_version() is called inside
KVM, which happens from limited paths such as IOAPIC initialization or
APIC state setting. This requires setting the flags before IOAPIC is
initialized, necessitating fetching the IOAPIC version from global
properties rather than from the initialized device.
- Automatic policy selection (current approach) changes behavior for
existing machine types without user opt-in, breaking migratability
from a new QEMU to an older QEMU for the same machine type.
- Machine-type gating (restrict to 10.2+) helps with older machine types,
but still breaks migration to a destination with an older kernel that
lacks the SEOIB API, again without user opt-in.
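To make the one-way constraint concrete, here is a minimal sketch of the
kind of check a loadvm or incoming-migration path might perform before
applying a loaded state. The helper name is hypothetical and not part of
this patch. It also assumes that switching between the ENABLE and DISABLE
flags after one has been set is not possible either; the text above only
establishes that returning to QUIRKED is not.

```c
#include <stdbool.h>

typedef enum {
    SEOIB_STATE_QUIRKED,        /* no x2APIC SEOIB flag set in KVM yet */
    SEOIB_STATE_RESPECTED,      /* ENABLE flag already set */
    SEOIB_STATE_NOT_ADVERTISED, /* DISABLE flag already set */
} SeoibState;

/*
 * Hypothetical helper: can the state found in a snapshot or migration
 * stream ("wanted") be applied on top of what KVM already has ("current")?
 */
static bool seoib_can_apply(SeoibState current, SeoibState wanted)
{
    if (current == SEOIB_STATE_QUIRKED) {
        /* Nothing committed yet, so any policy can still be chosen. */
        return true;
    }
    /* A flag is already set: only the identical policy is reachable. */
    return current == wanted;
}
```

With such a check, restoring a QUIRKED snapshot into a VM that already
runs as RESPECTED would be rejected instead of silently keeping the
wrong KVM state.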
Preferred direction
-------------------
Given the above, I am leaning toward a KVM accelerator property (for
example, `seoib-policy=respected|not-advertised|quirked`) with the default
being quirked.
This approach:
- Requires explicit user opt-in, at the cost of the user having to
understand and configure the property.
- Works with any machine type, so no machine-type gating is needed.
- Removes the need to fetch the IOAPIC version from globals.
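For illustration only, the property values could map onto the policy
states roughly as follows. The property does not exist yet, so the parser
name and the treatment of an unset property as quirked are assumptions.
An in-tree version would presumably go through QEMU's property
infrastructure rather than raw strcmp().

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum {
    SEOIB_STATE_QUIRKED,
    SEOIB_STATE_RESPECTED,
    SEOIB_STATE_NOT_ADVERTISED,
} SeoibState;

/*
 * Hypothetical parser for the proposed seoib-policy property.
 * Returns false for an unrecognized value and leaves *out untouched.
 */
static bool seoib_policy_parse(const char *value, SeoibState *out)
{
    if (value == NULL || strcmp(value, "quirked") == 0) {
        /* An unset property falls back to the quirked default. */
        *out = SEOIB_STATE_QUIRKED;
        return true;
    }
    if (strcmp(value, "respected") == 0) {
        *out = SEOIB_STATE_RESPECTED;
        return true;
    }
    if (strcmp(value, "not-advertised") == 0) {
        *out = SEOIB_STATE_NOT_ADVERTISED;
        return true;
    }
    return false;
}
```

A user would then opt in with something like
`-accel kvm,seoib-policy=respected` (hypothetical syntax, following the
property name proposed above).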
Note: The current implementation also does not handle the QMP/HMP
loadvm corner cases.
I would appreciate feedback on the preferred approach.
Changes in v2:
- Update flags in patch description to match kernel naming.
Khushit Shah (1):
target/i386/kvm: Configure proper KVM SEOIB behavior
hw/i386/x86-common.c | 99 ++++++++++++++++++++++++++++++++++++
hw/i386/x86.c | 1 +
hw/intc/ioapic.c | 2 -
include/hw/i386/x86.h | 12 +++++
include/hw/intc/ioapic.h | 2 +
include/system/system.h | 1 +
system/vl.c | 5 ++
target/i386/kvm/kvm.c | 43 ++++++++++++++++
target/i386/kvm/kvm_i386.h | 12 +++++
target/i386/kvm/trace-events | 4 ++
10 files changed, 179 insertions(+), 2 deletions(-)
--
2.39.3