On 2016-11-30 17:23, Peter Xu wrote:
> On Mon, Nov 28, 2016 at 05:51:50PM +0200, Aviv B.D wrote:
>> * intel_iommu's replay op is not implemented yet (May come in different
>> patch set).
>> The replay function is required for hotplug vfio device and to move
>> devices between existing domains.
>
> I am thinking about this replay thing recently and now I start to
> doubt whether the whole vt-d vIOMMU framework suits this...
>
> Generally speaking, current work is throwing away the IOMMU "domain"
> layer here. We maintain the mapping only per device, and we don't care
> too much about which domain it belongs to. This seems problematic.
>
> A simplest wrong case for this is (let's assume cache-mode is
> enabled): if we have two assigned devices A and B, both belong to the
> same domain 1. Meanwhile, in domain 1 assume we have one mapping which
> is the first page (iova range 0-0xfff). Then, if guest wants to
> invalidate the page, it'll notify VT-d vIOMMU with an invalidation
> message. If we do this invalidation per-device, we'll need to UNMAP
> the region twice - once for A, once for B (if we have more devices, we
> will unmap more times), and we can never know we have done duplicated
> work since we don't keep domain info, so we don't know they are using
> the same address space. The first unmap will work, and then we'll
> possibly get some errors on the rest of dma unmap failures.

Hi Peter:

According to the VT-d spec, section 6.2.2.1, "Software must ensure that,
if multiple context-entries (or extended-context-entries) are programmed
with the same Domain-id (DID), such entries must be programmed with same
value for the second-level page-table pointer (SLPTPTR) field, and same
value for the PASID Table Pointer (PASIDTPTR) field."

So if two assigned devices may have different I/O page tables, they
should be put into different domains.

>
> Looks like we just cannot live without knowing this domain layer.
> Because the address space is bound to the domain. If we want to sync
> the address space (here to setup a correct shadow page table), we need
> to do it per-domain.
>
> What I can think of as a solution is that we introduce this "domain"
> layer - like a memory region per domain. When invalidation happens,
> it's per-domain, not per-device any more (actually I guess that's what
> current vt-d iommu driver in kernel is doing, we just ignored it - we
> fetch the devices that matches the domain ID). We can/need to maintain
> something different, like sid <-> domain mappings (we can do this as
> long as we are notified when context entries changed), per-domain
> mappings (just like per-device mappings that we are trying to build in
> this series, but what we really need is IMHO per domain one), etc.
> When device switches domain, we switch the IOMMU memory region
> accordingly.
>
> Does this make any sense? Comments are greatly welcomed (especially
> from AlexW and DavidG).
>
> Thanks,
>
> -- peterx
>

--
Best regards
Tianyu Lan
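
To make the duplicated-unmap case from the quoted text concrete, here is
a small toy model in Python. It is only a sketch and not QEMU code: the
class ToyIOMMU and all of its method and field names are hypothetical,
and it assumes (per the spec text quoted above) that devices sharing a
DID share one address space. It compares invalidating once per device
with invalidating once per domain.

```python
from collections import defaultdict


class ToyIOMMU:
    """Hypothetical toy model of shadow mappings; not the QEMU implementation."""

    def __init__(self):
        self.sid_to_domain = {}                # source-id -> domain-id
        self.domain_pages = defaultdict(set)   # domain-id -> mapped IOVA pages
        self.failed_unmaps = 0                 # unmaps that found nothing to remove

    def attach(self, sid, domain_id):
        # Record (or update) which domain a device belongs to.
        self.sid_to_domain[sid] = domain_id

    def map_page(self, domain_id, iova):
        self.domain_pages[domain_id].add(iova)

    def _unmap(self, domain_id, iova):
        pages = self.domain_pages[domain_id]
        if iova in pages:
            pages.remove(iova)
            return True
        self.failed_unmaps += 1
        return False

    def invalidate_per_device(self, iova):
        # Per-device handling: one unmap for every device. The domain
        # lookup here only stands in for the fact that the devices are
        # backed by the same address space; the per-device scheme itself
        # does not know that, so it repeats the unmap.
        for domain_id in self.sid_to_domain.values():
            self._unmap(domain_id, iova)

    def invalidate_per_domain(self, domain_id, iova):
        # Per-domain handling: the shared address space is unmapped once.
        self._unmap(domain_id, iova)


def build():
    # Devices A and B both sit in domain 1, which maps the first page.
    iommu = ToyIOMMU()
    iommu.attach("A", 1)
    iommu.attach("B", 1)
    iommu.map_page(1, 0x0)
    return iommu


if __name__ == "__main__":
    per_device = build()
    per_device.invalidate_per_device(0x0)
    print("per-device failed unmaps:", per_device.failed_unmaps)

    per_domain = build()
    per_domain.invalidate_per_domain(1, 0x0)
    print("per-domain failed unmaps:", per_domain.failed_unmaps)
```

In this model the per-device path records one failed unmap for the
second device, while the per-domain path records none. Moving a device
between domains is just a second attach() call with a new domain id,
which mirrors the sid <-> domain bookkeeping suggested in the quoted
text.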