Hi Jean,

On 1/30/24 19:22, Jean-Philippe Brucker wrote:
> On Mon, Jan 29, 2024 at 05:38:55PM +0100, Eric Auger wrote:
>>> There may be a separate argument for clearing bypass. With a coldplugged
>>> VFIO device the flow is:
>>>
>>> 1. Map the whole guest address space in VFIO to implement boot-bypass.
>>>    This allocates all guest pages, which takes a while and is wasteful.
>>>    I've actually crashed a host that way, when spawning a guest with too
>>>    much RAM.
>> interesting
>>> 2. Start the VM
>>> 3. When the virtio-iommu driver attaches a (non-identity) domain to the
>>>    assigned endpoint, then unmap the whole address space in VFIO, and most
>>>    pages are given back to the host.
>>>
>>> We can't disable boot-bypass because the BIOS needs it. But instead the
>>> flow could be:
>>>
>>> 1. Start the VM, with only the virtual endpoints. Nothing to pin.
>>> 2. The virtio-iommu driver disables bypass during boot
>> We needed this boot-bypass mode for booting with virtio-blk-scsi
>> protected with virtio-iommu for instance.
>> That was needed because we don't have any virtio-iommu driver in edk2 as
>> opposed to intel iommu driver, right?
> Yes. What I had in mind is the x86 SeaBIOS which doesn't have any IOMMU
> driver and accesses the default SATA device:
>
> $ qemu-system-x86_64 -M q35 -device virtio-iommu,boot-bypass=off
> qemu: virtio_iommu_translate sid=250 is not known!!
> qemu: no buffer available in event queue to report event
> qemu: AHCI: Failed to start FIS receive engine: bad FIS receive buffer
> address
>
> But it's the same problem with edk2. Also a guest OS without a
> virtio-iommu driver needs boot-bypass. Once firmware boot is complete, the
> OS with a virtio-iommu driver normally can turn bypass off in the config
> space, it's not useful anymore. If it needs to put some endpoints in
> bypass, then it can attach them to a bypass domain.

yup
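
Just for the record, the driver-side knobs for that look roughly like the
sketch below. It is only written against the definitions in
include/uapi/linux/virtio_iommu.h and assumes VIRTIO_IOMMU_F_BYPASS_CONFIG
has been negotiated; it is not the actual virtio-iommu driver code.

#include <linux/string.h>
#include <linux/virtio.h>
#include <linux/virtio_config.h>
#include <uapi/linux/virtio_iommu.h>

/*
 * Sketch only: turn global bypass off by clearing the 'bypass' byte in the
 * device config space (needs VIRTIO_IOMMU_F_BYPASS_CONFIG).
 */
static void sketch_clear_global_bypass(struct virtio_device *vdev)
{
        if (virtio_has_feature(vdev, VIRTIO_IOMMU_F_BYPASS_CONFIG))
                virtio_cwrite8(vdev, struct virtio_iommu_config, bypass, 0);
}

/*
 * Sketch only: keep one endpoint in bypass by attaching it to a bypass
 * domain, i.e. an ATTACH request with the BYPASS flag set. The request is
 * then pushed on the request virtqueue as usual.
 */
static void sketch_fill_bypass_attach(struct virtio_iommu_req_attach *req,
                                      u32 domain_id, u32 endpoint_id)
{
        memset(req, 0, sizeof(*req));
        req->head.type = VIRTIO_IOMMU_T_ATTACH;
        req->domain    = cpu_to_le32(domain_id);
        req->endpoint  = cpu_to_le32(endpoint_id);
        req->flags     = cpu_to_le32(VIRTIO_IOMMU_ATTACH_F_BYPASS);
}
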
>
>>> 3. Hotplug the VFIO device. With bypass disabled there is no need to pin
>>>    the whole guest address space, unless the guest explicitly asks for an
>>>    identity domain.
>>>
>>> However, I don't know if this is a realistic scenario that will actually
>>> be used.
>>>
>>> By the way, do you have an easy way to reproduce the issue described here?
>>> I've had to enable iommu.forcedac=1 on the command-line, otherwise Linux
>>> just allocates 32-bit IOVAs.
>> I don't have a simple generic reproducer. It happens when assigning this
>> device:
>> Ethernet Controller E810-C for QSFP (Ethernet Network Adapter E810-C-Q2)
>>
>> I have not encountered that issue with another device yet.
>> I see on guest side in dmesg:
>> [ 6.849292] ice 0000:00:05.0: Using 64-bit DMA addresses
>>
>> That's emitted in dma-iommu.c iommu_dma_alloc_iova().
>> Looks like the guest first tries to allocate an iova in the 32-bit AS
>> and if this fails use the whole dma_limit.
>> Seems the 32b IOVA alloc failed here ;-)
> Interesting, are you running some demanding workload and a lot of CPUs?
> That's a lot of IOVAs used up, I'm curious about what kind of DMA pattern
> does that.

Well, nothing smart, just booting the guest with the assigned NIC, with 8 vCPUs.

Thanks

Eric
>
> Thanks,
> Jean
>
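
P.S.: for the archive, the 32-bit-first allocation mentioned above looks
roughly like the below. It is paraphrased from iommu_dma_alloc_iova() in
drivers/iommu/dma-iommu.c, not the verbatim code, and details differ
between kernel versions.

#include <linux/dma-mapping.h>
#include <linux/iova.h>
#include <linux/pci.h>

/* In the real code this is the module-level flag set by iommu.forcedac=1. */
static bool iommu_dma_forcedac;

static dma_addr_t sketch_alloc_iova(struct iova_domain *iovad, size_t size,
                                    u64 dma_limit, struct device *dev)
{
        unsigned long shift = iova_shift(iovad);
        unsigned long iova_len = size >> shift;
        unsigned long iova = 0;

        /* First try to give PCI devices an IOVA below 4G ... */
        if (dma_limit > DMA_BIT_MASK(32) && !iommu_dma_forcedac &&
            dev_is_pci(dev))
                iova = alloc_iova_fast(iovad, iova_len,
                                       DMA_BIT_MASK(32) >> shift, false);

        /*
         * ... and only fall back to the whole dma_limit when that fails,
         * which is when "Using 64-bit DMA addresses" shows up in dmesg.
         */
        if (!iova)
                iova = alloc_iova_fast(iovad, iova_len,
                                       dma_limit >> shift, true);

        return (dma_addr_t)iova << shift;
}

So once the 32-bit attempt fails, everything goes above 4G, which matches
the "Using 64-bit DMA addresses" message in the guest dmesg above.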