On 23/02/2023 20:55, Jason Gunthorpe wrote:
> On Thu, Feb 23, 2023 at 01:06:33PM -0700, Alex Williamson wrote:
>>> #2 is the presumption that the guest is using an identity map.
>> Isn't it reasonable to require that a device support dirty tracking for
>> the entire extent if its DMA address width in order to support this
>> feature?
>
> No, 2**64 is too big a number to be reasonable.
>

+1

> Ideally we'd work it the other way and tell the vIOMMU that the vHW
> only supports a limited number of address bits for the translation, eg
> through the ACPI tables. Then the dirty tracking could safely cover
> the larger of all system memory or the limited IOVA address space.
>
> Or even better figure out how to get interrupt remapping without IOMMU
> support :\

FWIW that's generally my use of `iommu=pt`, because all I want is
interrupt remapping, not the DMA remapping part. And this is going to be
especially relevant with these new boxes that easily surpass the 255
dedicated physical CPUs mark with just two sockets.

The only other alternative I could see is to rely on an IOMMU attribute
for DMA translation. Today you can actually toggle that 'off' in VT-d
(and I can imagine the same thing working for AMD-vIOMMU). In Intel it
just omits the 39 Address-width cap. And it means it doesn't have virtual
addressing. Similar to what Avihai already does for MAX_IOVA, we would do
for DMA_TRANSLATION, and let each vIOMMU implementation support that.

But to be honest I am not sure how robust relying on that is, as that
doesn't really represent a hardware implementation.

Without vIOMMU you have a (KVM) PV op in new *guest* kernels that
(ab)uses some unused bits in IOAPIC for a 24-bit DestID. But this is only
on new guests and hypervisors; old *guests* running < 5.15 kernels won't
work.

... So iommu=pt really is the most convenient right now :/

Joao