Thanks David for the response.

On 2 Jul 2024, at 2:34 PM, David Woodhouse <dw...@infradead.org> wrote:

On Tue, 2024-07-02 at 05:17 +0000, Sandesh Patel wrote:
Hi All,
Is it possible to set up a large Windows VM (say 512 vCPUs) without
adding a vIOMMU (EIM=on, IR=on)?
When I try to power on such a VM, the qemu process crashes with this error:
```
qemu-kvm: ../accel/kvm/kvm-all.c:1837: kvm_irqchip_commit_routes: Assertion `ret == 0' failed
```

Interesting. What exactly has Windows *done* in those MSI entries? That
might give a clue about how to support it.

The KVM_SET_GSI_ROUTING ioctl ends up in the kvm_set_routing_entry() function in KVM:

int kvm_set_routing_entry(struct kvm *kvm,
                          struct kvm_kernel_irq_routing_entry *e,
                          const struct kvm_irq_routing_entry *ue)
{
    switch (ue->type) {
    /* ... other routing types elided ... */
    case KVM_IRQ_ROUTING_MSI:
        e->set = kvm_set_msi;
        e->msi.address_lo = ue->u.msi.address_lo;
        e->msi.address_hi = ue->u.msi.address_hi;
        e->msi.data = ue->u.msi.data;

        if (kvm_msi_route_invalid(kvm, e))
            return -EINVAL;
        break;
    }

    return 0;
}

static inline bool kvm_msi_route_invalid(struct kvm *kvm,
                                         struct kvm_kernel_irq_routing_entry *e)
{
    return kvm->arch.x2apic_format && (e->msi.address_hi & 0xff);
}

That means the low byte of msi.address_hi must be 0 whenever kvm->arch.x2apic_format is set (i.e. the VMM has enabled 32-bit APIC IDs via KVM_X2APIC_API_USE_32BIT_IDS).
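
For context, here is my rough sketch (based on my reading of the kernel's MSI parsing, not a verbatim copy) of how KVM recovers the destination APIC ID from a routing entry once 32-bit APIC IDs are enabled. The upper ID bits are expected in bits 31:8 of address_hi, and the low byte is reserved as zero, which is exactly what kvm_msi_route_invalid() enforces:

```
#include <stdbool.h>
#include <stdint.h>

/*
 * Sketch, not the exact kernel code: how the destination APIC ID is
 * assembled from an MSI route when the VMM has enabled 32-bit APIC IDs
 * (kvm->arch.x2apic_format).
 */
static uint32_t msi_route_dest_id(uint32_t address_lo, uint32_t address_hi,
                                  bool x2apic_format)
{
    /* Legacy MSI: 8-bit destination ID in address bits 19:12. */
    uint32_t dest = (address_lo >> 12) & 0xff;

    if (x2apic_format) {
        /*
         * 32-bit IDs: destination bits 8-31 are carried in bits 31:8 of
         * address_hi; bits 7:0 of address_hi are reserved and must be 0,
         * which is the check in kvm_msi_route_invalid().
         */
        dest |= address_hi & 0xffffff00;
    }

    return dest;
}
```

So with 512 vCPUs, any APIC ID >= 256 only fits if the entry follows that layout; an entry whose address_hi has a non-zero low byte is rejected, KVM_SET_GSI_ROUTING returns -EINVAL, and the assertion above fires.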

The QEMU function kvm_arch_fixup_msi_route() is responsible for fixing up the msi.address_hi value in the MSI routing entry before it is passed to KVM. For one of the entries, this function received msi.address_hi = 0x0 as input when the vIOMMU was enabled and msi.address_hi = 0x1 when the vIOMMU was not enabled; in both cases it returned the same value unchanged, and that value was saved as the routing entry.


The VM boots fine if we attach a vIOMMU but adding a vIOMMU can
potentially result in IO performance loss in guest.
I was interested to know if someone could boot a large Windows VM by
some other means like kvm-msi-ext-dest-id.

I worked with Microsoft folks when I was defining the msi-ext-dest-id
support, and Hyper-V does it exactly the same way. But that's on the
*hypervisor* side. At the time, I don't believe Windows as a guest was
planning to use it.

But I actually thought Windows worked OK without being able to direct
external interrupts to all vCPUs, so it didn't matter?

I think not. It looks like there is a difference between how Hyper-V limits interrupt delivery and how QEMU/KVM do it.
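
For reference, here is my understanding of the msi-ext-dest-id encoding David mentions above (a sketch, not authoritative): the extra destination-ID bits 8-14 are carried in bits 11:5 of the low MSI address dword, so a guest that knows about the extension can address APIC IDs above 255 without interrupt remapping and without touching address_hi:

```
#include <stdint.h>

/*
 * Sketch of the 15-bit "extended destination ID" MSI address layout as I
 * understand it: destination ID bits 0-7 go in address bits 19:12 as
 * usual, and bits 8-14 go in address bits 11:5.  address_hi stays 0, so
 * kvm_msi_route_invalid() is happy even for APIC IDs >= 256.
 */
static uint32_t msi_addr_lo_ext_dest_id(uint32_t dest_id)
{
    uint32_t addr = 0xfee00000u;              /* MSI base address         */

    addr |= (dest_id & 0xff) << 12;           /* destination ID bits 0-7  */
    addr |= ((dest_id >> 8) & 0x7f) << 5;     /* destination ID bits 8-14 */

    return addr;
}
```

Of course that only helps if the Windows guest actually consumes the feature, which, per your earlier comment, it apparently does not.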

Overheads of the vIOMMU have been shown, for example, in
https://static.sched.com/hosted_files/kvmforum2021/da/vIOMMU%20KVM%20Forum%202021%20-%20v4.pdf

Isn't that for DMA translation though? If you give the guest an
intel_iommu with dma_translation=off then it should *only* do interrupt
remapping.

Thanks for the suggestion. It avoids DMA translation, so there is no major performance loss.
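
For anyone else who hits this, the configuration I understand David to be suggesting looks roughly like the following (untested sketch; exact property spellings such as dma-translation may differ between QEMU versions, and interrupt remapping needs the split kernel irqchip):

```
qemu-system-x86_64 \
    -machine q35,accel=kvm,kernel-irqchip=split \
    -smp 512 \
    -device intel-iommu,intremap=on,eim=on,dma-translation=off \
    ...
```

With dma-translation=off the vIOMMU only provides interrupt remapping, so the DMA-translation overheads measured in the KVM Forum slides above should not apply.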
