On 1/24/24 13:58, Philippe Mathieu-Daudé wrote: > On 24/1/24 12:53, Cédric Le Goater wrote: >> On 1/18/24 20:20, Vivek Kasireddy wrote: >>> Recent updates in OVMF and Seabios have resulted in MMIO regions >>> being placed at the upper end of the physical address space. As a >>> result, when a Host device is assigned to the Guest via VFIO, the >>> following mapping failures occur when VFIO tries to map the MMIO >>> regions of the device: >>> VFIO_MAP_DMA failed: Invalid argument >>> vfio_dma_map(0x557b2f2736d0, 0x380000000000, 0x1000000, >>> 0x7f98ac400000) = -22 (Invalid argument) >>> >>> The above failures are mainly seen on some Intel platforms where >>> the physical address width is larger than the Host's IOMMU >>> address width. In these cases, VFIO fails to map the MMIO regions >>> because the IOVAs would be larger than the IOMMU aperture regions. >>> >>> Therefore, one way to solve this problem would be to ensure that >>> cpu->phys_bits = <IOMMU phys_bits> >>> This can be done by parsing the IOMMU caps value from sysfs and >>> extracting the address width and using it to override the >>> phys_bits value as shown in this patch. >>> >>> Previous attempt at solving this issue in OVMF: >>> https://edk2.groups.io/g/devel/topic/102359124 >>> >>> Cc: Gerd Hoffmann <kra...@redhat.com> >>> Cc: Philippe Mathieu-Daudé <phi...@linaro.org> >>> Cc: Alex Williamson <alex.william...@redhat.com> >>> Cc: Cédric Le Goater <c...@redhat.com> >>> Cc: Laszlo Ersek <ler...@redhat.com> >>> Cc: Dongwon Kim <dongwon....@intel.com> >>> Acked-by: Gerd Hoffmann <kra...@redhat.com> >>> Tested-by: Yanghang Liu <yangh...@redhat.com> >>> Signed-off-by: Vivek Kasireddy <vivek.kasire...@intel.com> >>> >>> --- >>> v2: >>> - Replace the term passthrough with assigned (Laszlo) >>> - Update the commit message to note that both OVMF and Seabios >>> guests are affected (Cédric) >>> - Update the subject to indicate what is done in the patch >>> --- >>> target/i386/host-cpu.c | 61 +++++++++++++++++++++++++++++++++++++++++- >>> 1 file changed, 60 insertions(+), 1 deletion(-) > > >>> +static int intel_iommu_check(void *opaque, QemuOpts *opts, Error >>> **errp) >>> +{ >>> + g_autofree char *dev_path = NULL, *iommu_path = NULL, *caps = NULL; >>> + const char *driver = qemu_opt_get(opts, "driver"); >>> + const char *device = qemu_opt_get(opts, "host"); >>> + uint32_t *iommu_phys_bits = opaque; >>> + struct stat st; >>> + uint64_t iommu_caps; >>> + >>> + /* >>> + * Check if the user requested VFIO device assignment. We don't >>> have >>> + * to limit phys_bits if there are no valid assigned devices. >>> + */ >>> + if (g_strcmp0(driver, "vfio-pci") || !device) { >>> + return 0; >>> + } >>> + >>> + dev_path = g_strdup_printf("/sys/bus/pci/devices/%s", device); >>> + if (stat(dev_path, &st) < 0) { >>> + return 0; >>> + } >>> + >>> + iommu_path = g_strdup_printf("%s/iommu/intel-iommu/cap", dev_path); >>> + if (stat(iommu_path, &st) < 0) { >>> + return 0; >>> + } >>> + >>> + if (g_file_get_contents(iommu_path, &caps, NULL, NULL)) { >>> + if (sscanf(caps, "%lx", &iommu_caps) != 1) { >> >> nit. This should use a PRIx64 define. >> >>> + return 0; >>> + } >>> + *iommu_phys_bits = ((iommu_caps >> 16) & 0x3f) + 1; >> >> Please use 0x3fULL > > or: > > *iommu_phys_bits = 1 + extract32(iommu_caps, 16, 6);
Huh, interesting; I've never seen this recommended before, even though it comes from a very old commit -- 84988cf910a6 ("bitops.h: Add functions to extract and deposit bitfields", 2012-07-07). I thought only edk2 had BitFieldRead32() :) Laszlo