Hi Eric,

>-----Original Message-----
>From: Eric Auger <eric.au...@redhat.com>
>Sent: Tuesday, July 4, 2023 7:15 PM
>Subject: [PATCH 1/2] virtio-iommu: Fix 64kB host page size VFIO device
>assignment
>
>When running on a 64kB page size host and protecting a VFIO device
>with the virtio-iommu, qemu crashes with this kind of message:
>
>qemu-kvm: virtio-iommu page mask 0xfffffffffffff000 is incompatible
>with mask 0x20010000
Does 0x20010000 mean that the host IOMMU hardware supports only 512MB and
64KB superpage mappings, with 4KB not supported? There is a check on the
guest kernel side (below) which fails when the granule is larger than the
guest page size, so with this patch forcing viommu->pgsize_bitmap to
0x20010000, a guest using 4KB pages would fail that check. Does this device
still work in the guest? Please correct me if I've misunderstood.

static int viommu_domain_finalise(struct viommu_endpoint *vdev,
                                  struct iommu_domain *domain)
{
...
        viommu_page_size = 1UL << __ffs(viommu->pgsize_bitmap);
        if (viommu_page_size > PAGE_SIZE) {
                dev_err(vdev->dev,
                        "granule 0x%lx larger than system page size 0x%lx\n",
                        viommu_page_size, PAGE_SIZE);
                return -ENODEV;
        }

Another question: presuming 0x20010000 does mean that only 512MB and 64KB
are supported, is the host hva->hpa mapping guaranteed to be compatible
with at least 64KB mappings? If the host MMU supports only 4KB mappings,
or some other granule incompatible with 0x20010000, will vfio_dma_map()
fail?
>
>Fixes: 90519b9053 ("virtio-iommu: Add bypass mode support to assigned
>device")
>Signed-off-by: Eric Auger <eric.au...@redhat.com>
>---
> include/hw/virtio/virtio-iommu.h |  2 ++
> hw/virtio/virtio-iommu.c         | 30 ++++++++++++++++++++++++++++--
> hw/virtio/trace-events           |  1 +
> 3 files changed, 31 insertions(+), 2 deletions(-)
>
>diff --git a/include/hw/virtio/virtio-iommu.h b/include/hw/virtio/virtio-
>iommu.h
>index 2ad5ee320b..a93fc5383e 100644
>--- a/include/hw/virtio/virtio-iommu.h
>+++ b/include/hw/virtio/virtio-iommu.h
>@@ -61,6 +61,8 @@ struct VirtIOIOMMU {
>     QemuRecMutex mutex;
>     GTree *endpoints;
>     bool boot_bypass;
>+    Notifier machine_done;
>+    bool granule_frozen;
> };
>
> #endif
>diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
>index 1cd258135d..1eaf81bab5 100644
>--- a/hw/virtio/virtio-iommu.c
>+++ b/hw/virtio/virtio-iommu.c
>@@ -24,6 +24,7 @@
> #include "hw/virtio/virtio.h"
> #include "sysemu/kvm.h"
> #include "sysemu/reset.h"
>+#include "sysemu/sysemu.h"
> #include "qapi/error.h"
> #include "qemu/error-report.h"
> #include "trace.h"
>@@ -1106,12 +1107,12 @@ static int
>virtio_iommu_set_page_size_mask(IOMMUMemoryRegion *mr,
>     }
>
>     /*
>-     * After the machine is finalized, we can't change the mask anymore. If by
>+     * Once the granule is frozen we can't change the mask anymore. If by
>      * chance the hotplugged device supports the same granule, we can still
>      * accept it. Having a different masks is possible but the guest will use
>      * sub-optimal block sizes, so warn about it.
>      */
>-    if (phase_check(PHASE_MACHINE_READY)) {
>+    if (s->granule_frozen) {
>         int new_granule = ctz64(new_mask);
>         int cur_granule = ctz64(cur_mask);
>
>@@ -1146,6 +1147,27 @@ static void virtio_iommu_system_reset(void
>*opaque)
>
> }
>
>+static void virtio_iommu_freeze_granule(Notifier *notifier, void *data)
>+{
>+    VirtIOIOMMU *s = container_of(notifier, VirtIOIOMMU, machine_done);
>+    bool boot_bypass = s->config.bypass;
>+    int granule;
>+
>+    /*
>+     * Transient IOMMU MR enable to collect page_size_mask requirement
>+     * through memory_region_iommu_set_page_size_mask() called by
>+     * VFIO region_add() callback
>+     */
>+    s->config.bypass = 0;
>+    virtio_iommu_switch_address_space_all(s);
>+    /* restore default */
>+    s->config.bypass = boot_bypass;
>+    virtio_iommu_switch_address_space_all(s);
>+    s->granule_frozen = true;
>+    granule = ctz64(s->config.page_size_mask);
>+    trace_virtio_iommu_freeze_granule(BIT(granule));
>+}

This looks a bit heavy just to get the page_size_mask from the host side,
but maybe it is the only way with the current interface.
Thanks,
Zhenzhong

>+
> static void virtio_iommu_device_realize(DeviceState *dev, Error **errp)
> {
>     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
>@@ -1189,6 +1211,9 @@ static void virtio_iommu_device_realize(DeviceState
>*dev, Error **errp)
>         error_setg(errp, "VIRTIO-IOMMU is not attached to any PCI bus!");
>     }
>
>+    s->machine_done.notify = virtio_iommu_freeze_granule;
>+    qemu_add_machine_init_done_notifier(&s->machine_done);
>+
>     qemu_register_reset(virtio_iommu_system_reset, s);
> }
>
>@@ -1198,6 +1223,7 @@ static void
>virtio_iommu_device_unrealize(DeviceState *dev)
> {
>     VirtIOIOMMU *s = VIRTIO_IOMMU(dev);
>
>     qemu_unregister_reset(virtio_iommu_system_reset, s);
>+    qemu_remove_machine_init_done_notifier(&s->machine_done);
>
>     g_hash_table_destroy(s->as_by_busptr);
>     if (s->domains) {
>diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
>index 8f8d05cf9b..68b752e304 100644
>--- a/hw/virtio/trace-events
>+++ b/hw/virtio/trace-events
>@@ -131,6 +131,7 @@ virtio_iommu_set_page_size_mask(const char *name,
>uint64_t old, uint64_t new) "m
> virtio_iommu_notify_flag_add(const char *name) "add notifier to mr %s"
> virtio_iommu_notify_flag_del(const char *name) "del notifier from mr %s"
> virtio_iommu_switch_address_space(uint8_t bus, uint8_t slot, uint8_t fn,
>bool on) "Device %02x:%02x.%x switching address space (iommu
>enabled=%d)"
>+virtio_iommu_freeze_granule(uint64_t page_size_mask) "granule set to
>0x%"PRIx64
>
> # virtio-mem.c
> virtio_mem_send_response(uint16_t type) "type=%" PRIu16
>--
>2.38.1