On Thu, 2015-09-03 at 07:26 +0200, Knut Omang wrote:
> On Thu, 2015-09-03 at 08:33 +1000, Benjamin Herrenschmidt wrote:
> > On Wed, 2015-09-02 at 10:12 -0600, Alex Williamson wrote:
> > >
> > > There are very specific rules for translating requester IDs across
> > > bridges. Bus numbers can change during enumeration, devfn cannot.
>
> Thanks for clarifying that point, Alex, I realize I was a bit imprecise
> in my last mail,
>
> > > devfn can however be masked by topology changes from PCIe to PCI. If
> > > we pretend that the IOMMU can distinguish requester IDs where it
> > > can't on real hardware, we're going to break the guest. Thanks,
> >
> > Note that whether a PCI / PCI-X bridge will mask devfn, bus# or both or
> > even mask it partially (number of bits) or replace some transfers with
> > its own RID ... depends on a given bridge implementation.
> >
> > Another thing is while I agree that the bus number is problematic,
> > since it changes, it is still what the HW actually uses to match the
> > requester in practice, at least on PHB and I would think on Intel.
> >
> > The problem is more fundamental. qemu is trying to bind devices to
> > address spaces in a fixed way at device creation time, while this is
> > lazily resolved in HW at the point of the DMA occurring.
>
> So let me try to sum up my understanding in context of the patch in
> terms of these two approaches,
>
> > One way to fix it is to effectively have an address space per device,
> > and have the iommu translate function figure out the binding
> > dynamically and flush things if it detects a change. But that is tricky
> > for vfio and it means invalidations will have to iterate all address
> > spaces.
>
> So my patch is along these lines by actually moving the address space
> pointer into the device struct.
>
> The benefit is that:
> * The data structure for the DMA address space can be reused across
>   IOMMUs, and the address spaces can be set up before bus numbers are
>   assigned, and the implementation is fairly simple.
> * The IOMMU does not have to be notified of bus changes, except for
>   invalidation purposes (but wouldn't a new enumeration cause a full
>   IOMMU invalidate anyway?)
>
> The drawbacks are:
> * The IOMMUs get to know explicitly about devices behind a bridge,
>   which logically deviates from how hardware works and
>   complicates future attempts to implement bridges that
>   translate RIDs.
> * Each device can have only one DMA address space mapping associated
>   with it (I suppose it might be possible to have a topology that
>   would allow multiple paths to a device, but do we care at this
>   stage?)
>
> > The other option is to create Address Spaces on the fly as we lookup
> > domains, and bind them to devices lazily, but again, we need to deal
> > with changes/invalidations and that can be nasty with VFIO.
>
> We could get here without changing the interfaces, by refining the
> current implementation to just cache bus pointers at setup, then lazily
> add address spaces for each device. This approach would yield IOMMU
> device specific implementations, but would still in practice associate
> devices with address spaces.

As the thread went silent after our conclusions, I have made a second
implementation for the Intel IOMMU according to this alternate scheme.
It keeps the current API and handles the bus number resolution lazily
within the IOMMU implementation. I will post the (single) patch as v3
of this.

Hopefully this is acceptable and can be leveraged to do a similar
rework, or be abstracted as generic functionality (?), for the other
architectures.

Thanks,
Knut
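
For illustration only, here is a minimal, self-contained sketch of the
lazy scheme described above. It is not the v3 patch and does not use
QEMU's actual types or API: every name in it (SketchBus, SketchIommu,
sketch_find_add_as, sketch_requester_id) is hypothetical. It assumes
the IOMMU caches a bus pointer per device, creates the per-device
address space on first lookup, and reads the bus number only when the
requester ID is actually needed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_DEVFN_MAX 256

/* Hypothetical stand-in for a PCI bus; the guest may renumber it. */
typedef struct SketchBus {
    uint8_t bus_num;
} SketchBus;

/* Hypothetical per-device DMA address space. */
typedef struct SketchAddressSpace {
    SketchBus *bus;  /* cached pointer, stable across re-enumeration */
    uint8_t devfn;   /* does not change during enumeration */
} SketchAddressSpace;

/* One entry per bus seen so far, holding lazily created address spaces. */
typedef struct SketchBusEntry {
    SketchBus *bus;
    SketchAddressSpace *as[SKETCH_DEVFN_MAX];
    struct SketchBusEntry *next;
} SketchBusEntry;

typedef struct SketchIommu {
    SketchBusEntry *buses;
} SketchIommu;

/* Look up the entry for a bus by pointer, adding it on first use. */
static SketchBusEntry *sketch_find_add_bus(SketchIommu *s, SketchBus *bus)
{
    SketchBusEntry *e;

    for (e = s->buses; e != NULL; e = e->next) {
        if (e->bus == bus) {
            return e;
        }
    }
    e = calloc(1, sizeof(*e));
    if (e == NULL) {
        return NULL;
    }
    e->bus = bus;
    e->next = s->buses;
    s->buses = e;
    return e;
}

/* Return the address space for (bus, devfn), creating it lazily. */
SketchAddressSpace *sketch_find_add_as(SketchIommu *s, SketchBus *bus,
                                       uint8_t devfn)
{
    SketchBusEntry *entry;

    assert(s != NULL && bus != NULL);
    entry = sketch_find_add_bus(s, bus);
    if (entry == NULL) {
        return NULL;
    }
    if (entry->as[devfn] == NULL) {
        SketchAddressSpace *as = calloc(1, sizeof(*as));
        if (as == NULL) {
            return NULL;
        }
        as->bus = bus;
        as->devfn = devfn;
        entry->as[devfn] = as;
    }
    return entry->as[devfn];
}

/* Requester ID: bus number in bits 15:8, devfn in bits 7:0.
 * The bus number is read here, at use time, not at setup. */
uint16_t sketch_requester_id(const SketchAddressSpace *as)
{
    return (uint16_t)((as->bus->bus_num << 8) | as->devfn);
}
```

Because the lookup key is the bus pointer rather than the bus number,
a re-enumeration that renumbers the bus changes what
sketch_requester_id() returns without re-binding the device to a new
address space. The sketch leaves out invalidation and RID translation
by bridges entirely.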