On Sat, 2024-09-28 at 15:59 +0100, David Woodhouse wrote:
> On Tue, 2024-07-02 at 05:17 +0000, Sandesh Patel wrote:
> > 
> > The error is due to invalid MSIX routing entry passed to KVM.
> > 
> > The VM boots fine if we attach a vIOMMU but adding a vIOMMU can
> > potentially result in IO performance loss in guest.
> > I was interested to know if someone could boot a large Windows VM by
> > some other means like kvm-msi-ext-dest-id.
> 
> I think I may (with Alex Graf's suggestion) have found the Windows bug
> with Intel IOMMU.
> 
> It looks like when interrupt remapping is enabled with an AMD CPU,
> Windows *assumes* it can generate AMD-style MSI messages even if the
> IOMMU is an Intel one. If we put a little hack into the IOMMU interrupt
> remapping to make it interpret an AMD-style message, Windows seems to
> boot at least a little bit further than it did before...

Sadly, Windows has *more* bugs than that.

The previous hack extracted the Interrupt Remapping Table Entry (IRTE)
index from an AMD-style MSI message, and looked it up in the Intel
IOMMU's IR Table.

That works... for the MSIs generated by the I/O APIC.

However... in the Intel IOMMU model, there is a single global IRT, and
each entry specifies which devices are permitted to invoke it. The AMD
model is slightly nicer, in that it allows a per-device IRT.

So for a PCI device, Windows just seems to configure each MSI vector in
order, with IRTE#0, 1, onwards. Because it's a per-device number space,
right? Which means that the first MSI vector on a PCI device gets
aliased to IRQ#0 on the I/O APIC.
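
To make the aliasing concrete, here is a toy model of a single global
table with a per-entry source-id check. It is only a sketch: the
struct, the table size and the function name are hypothetical, and a
real Intel IRTE carries many more fields than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IRT_SIZE 256  /* arbitrary size for the toy model */

/* Hypothetical, heavily simplified IRTE. */
struct toy_irte {
    bool     present;
    uint16_t source_id;  /* the one requester allowed to use this entry */
    uint8_t  vector;
};

/* Intel-style: one global table shared by every device. */
static struct toy_irte global_irt[IRT_SIZE];

/*
 * Look up 'index' on behalf of 'requester'. Succeeds only if the entry
 * is present and belongs to that requester.
 */
static bool toy_intel_lookup(uint16_t requester, unsigned index,
                             uint8_t *vector)
{
    assert(index < IRT_SIZE);

    const struct toy_irte *e = &global_irt[index];

    if (!e->present || e->source_id != requester) {
        return false;
    }
    *vector = e->vector;
    return true;
}
```

If the I/O APIC owns entry 0 and a PCI device then numbers its own
vectors from 0 as though it had a private table, both land on the same
slot. The device's lookup is either rejected by the source-id check or,
if that check is skipped, delivers the I/O APIC's interrupt instead.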

I dumped the whole IRT, and it isn't just that Windows is using the
wrong index; it hasn't even set up the correct destination in *any* of
the entries. So we can't even do a nasty trick like scanning and
finding the Nth entry which is valid for a particular source-id.

Happily, Windows has *more* bugs than that... if I run with
`-cpu host,+hv-avic' then it puts the high bits of the target APIC ID
into the high bits of the MSI address. This *ought* to mean that MSIs
from the device miss the APIC (at 0x00000000FEExxxxx) and scribble over
guest memory at addresses like 0x1FEE00004. But we can add yet
*another* hack to catch that. For now I just hacked it to move the low
7 extra bits into the "right" place for the 15-bit extension.

--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -361,6 +361,14 @@ static void pci_msi_trigger(PCIDevice *dev, MSIMessage msg)
         return;
     }
     attrs.requester_id = pci_requester_id(dev);
+    printf("Send MSI 0x%lx/0x%x from 0x%x\n", msg.address, msg.data, attrs.requester_id);
+    if (msg.address >> 32) {
+        uint64_t ext_id = msg.address >> 32;
+        msg.address &= 0xffffffff;
+        msg.address |= ext_id << 5;
+        printf("Now 0x%lx/0x%x with ext_id %lx\n", msg.address, msg.data, ext_id);
+    }
+        
     address_space_stl_le(&dev->bus_master_as, msg.address, msg.data,
                          attrs, NULL);
 }
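
The address rewrite in that hunk can be pulled out into a standalone
function for experimenting outside QEMU. This is only a sketch; the
function name is made up, and unlike the hack above it masks the
extension to 7 bits, which is an assumption about what is wanted.

```c
#include <stdint.h>

/*
 * Hypothetical helper mirroring the hack above: if the guest put
 * destination ID bits above bit 31 of the MSI address, drop them and
 * place the low 7 of them in address bits 11:5 instead.
 */
static uint64_t fold_ext_dest_id(uint64_t address)
{
    uint64_t ext_id = address >> 32;

    if (ext_id) {
        address &= 0xffffffffULL;
        /* Masking to 7 bits is an addition; the hack does not do it. */
        address |= (ext_id & 0x7f) << 5;
    }
    return address;
}
```

Feeding it 0x1fee01000 gives 0xfee01020, and an address with nothing
above bit 31 comes back unchanged.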

We also need to stop forcing Windows to use logical mode, and force it
to use physical mode instead:

--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -158,7 +158,7 @@ static void init_common_fadt_data(MachineState *ms, Object *o,
              * used
              */
             ((ms->smp.max_cpus > 8) ?
-                        (1 << ACPI_FADT_F_FORCE_APIC_CLUSTER_MODEL) : 0),
+                        (1 << ACPI_FADT_F_FORCE_APIC_PHYSICAL_DESTINATION_MODE) : 0),
         .int_model = 1 /* Multiple APIC */,
         .rtc_century = RTC_CENTURY,
         .plvl2_lat = 0xfff /* C2 state not supported */,


So now, with *no* IOMMU configured, Windows Server 2022 is booting and
using CPUs > 255:
  Send MSI 0x1fee01000/0x41b0 from 0xfa
  Now 0xfee01020/0x41b0 with ext_id 1
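
To read the target out of the rewritten address, the destination ID
can be reassembled from its two fields. A minimal sketch, assuming the
15-bit extended destination ID layout (low 8 bits in address bits
19:12, high 7 bits in bits 11:5); the function name is hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical decoder for an MSI address that uses the 15-bit
 * extended destination ID layout.
 */
static uint32_t msi_ext_dest_id(uint64_t address)
{
    uint32_t low  = (address >> 12) & 0xff;  /* address bits 19:12 */
    uint32_t high = (address >> 5) & 0x7f;   /* address bits 11:5 */

    return (high << 8) | low;
}
```

For 0xfee01020 that yields 0x101, i.e. APIC ID 257.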

That trick obviously can't work for the I/O APIC, but I haven't managed
to persuade Windows to target I/O APIC interrupts at any CPU other than
#0 yet. I'm trying to make QEMU run with *only* higher APIC IDs, to
test.

It may be that we need to advertise an Intel IOMMU that *only* has the
I/O APIC behind it, and all the actual PCI devices are direct, so we
can abuse that last Windows bug.
