On Tue, Oct 30, 2018 at 11:00:51AM +0800, Peter Xu wrote:
> On Mon, Oct 29, 2018 at 12:29:22PM -0600, Alex Williamson wrote:
> > On Mon, 29 Oct 2018 17:14:46 +0800
> > Jason Wang <jasow...@redhat.com> wrote:
> >
> > > On 2018/10/29 10:42 AM, Simon Guo wrote:
> > > > Hi,
> > > >
> > > > I am using network device pass-through mode with qemu x86 (-device vfio-pci,host=0000:xx:yy.z) and "intel_iommu=on" in the host kernel command line, and it shows that the whole guest memory is pinned (vfio_pin_pages()), as seen in the "top" RES memory output. I understand this is because the device can DMA to any guest memory address, so that memory cannot be swapped.
> > > >
> > > > However, can we just pin a range of the address space allowed by the IOMMU group of that device, instead of pinning the whole address space? I do notice some code like vtd_host_dma_iommu(). Maybe there is already some way to enable that?
> > > >
> > > > Sorry if I missed some basics. I googled but had no luck finding the answer yet. Please let me know if any discussion has already been raised on that.
> > > >
> > > > Any other suggestion will also be appreciated. For example, can we modify the guest network card driver to allocate only from a specific memory region (zone), and have qemu advise the guest kernel to pin only that memory region (zone) accordingly?
> > > >
> > > > Thanks,
> > > > - Simon
> > >
> > > One possible method is to enable the IOMMU of the VM.
> >
> > Right, making use of a virtual IOMMU in the VM is really the only way to bound the DMA to some subset of guest memory, but vIOMMU usage by the guest is optional on x86, and even if the guest does use it, it might enable passthrough mode, which puts you back at the problem that all guest memory is pinned, with the additional problem that it might also be accounted for once per assigned device and may hit locked memory limits. Also, the DMA mapping and unmapping path with a vIOMMU is very slow, so performance of the device in the guest will be abysmal unless the use case is limited to very static mappings, such as userspace use within the guest for nested assignment or perhaps DPDK use cases.
> >
> > Modifying the guest to only use a portion of memory for DMA sounds like a quite intrusive option. There are certainly IOMMU models where the IOMMU provides a fixed IOVA range, but creating dynamic mappings within that range doesn't really solve anything given that it simply returns us to a vIOMMU with slow mapping. A window with a fixed identity mapping used as a DMA zone seems plausible, but again, also pretty intrusive to the guest, possibly also to the drivers. Host IOMMU page faulting can also help the pinned memory footprint, but of course requires hardware support and lots of new code paths, many of which are already being discussed for things like Scalable IOV and SVA. Thanks,
>
> Agree with Jason's and Alex's comments. One trivial addition: the whole guest RAM will possibly still be pinned for a very short period during guest system boot (e.g., when running the guest BIOS) and before the guest kernel enables the vIOMMU for the assigned device, since the bootup code like the BIOS would still need to be able to access the whole guest memory.

Peter, Alex, Jason,

Thanks for your nice and detailed explanations.
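If I understand the vIOMMU suggestion correctly, on my setup that would mean something roughly like the following (untested on my side, so the exact machine/option spellings are my assumption and worth double-checking against the QEMU version in use):

    qemu-system-x86_64 -machine q35,kernel-irqchip=split \
        -device intel-iommu,intremap=on,caching-mode=on \
        -device vfio-pci,host=0000:xx:yy.z ...

    guest kernel command line: intel_iommu=on (and not iommu=pt, which,
    as Alex noted, would put me back to having all guest memory pinned)

That is, intel_iommu=on moves from the host kernel command line (where I have it now) to the guest's, and the intel-iommu device is specified before the vfio-pci device so the assigned device actually sits behind the vIOMMU.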
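Also, just to check my understanding of where the pinning happens: without a vIOMMU (or with passthrough mode), QEMU maps all of guest RAM through the VFIO type1 interface up front, and it is that map path which pins the pages; with the vIOMMU enabled, only the IOVA ranges the guest driver actually maps go through it. A rough userspace sketch of that map call (container/group setup and error handling omitted, addresses hypothetical):

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/*
 * 'container' is an open /dev/vfio/vfio fd that already has the device's
 * group attached and VFIO_SET_IOMMU(VFIO_TYPE1_IOMMU) done on it.
 * Only the pages backing [iova, iova + size) get pinned by this call;
 * memory outside the mapped ranges stays swappable.
 */
static int dma_map_range(int container, void *vaddr, uint64_t iova, uint64_t size)
{
        struct vfio_iommu_type1_dma_map map = {
                .argsz = sizeof(map),
                .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
                .vaddr = (uintptr_t)vaddr,  /* process virtual address */
                .iova  = iova,              /* address the device will DMA to */
                .size  = size,
        };

        return ioctl(container, VFIO_IOMMU_MAP_DMA, &map);
}

So the per-range granularity already exists at the VFIO level; the question is only who drives it: without a vIOMMU it is QEMU mapping everything once at startup, and with a vIOMMU it is the guest's map/unmap traffic, which is where the slowness Alex described comes from.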
BR,
- Simon