> -----Original Message-----
> From: Qemu-devel [mailto:qemu-devel-bounces+yi.l.liu=intel....@nongnu.org]
> On Behalf Of Tian, Kevin
> Sent: Wednesday, January 18, 2017 5:39 PM
> To: Peter Xu <pet...@redhat.com>; Jason Wang <jasow...@redhat.com>
> Cc: Lan, Tianyu <tianyu....@intel.com>; Raj, Ashok <ashok....@intel.com>;
> m...@redhat.com; jan.kis...@siemens.com; bd.a...@gmail.com;
> qemu-de...@nongnu.org; alex.william...@redhat.com
> Subject: Re: [Qemu-devel] [PATCH RFC v3 14/14] intel_iommu: enable vfio
> devices
>
> > From: Peter Xu [mailto:pet...@redhat.com]
> > Sent: Wednesday, January 18, 2017 4:46 PM
> >
> > On Wed, Jan 18, 2017 at 04:36:05PM +0800, Jason Wang wrote:
> > >
> > > On 2017年01月18日 16:11, Peter Xu wrote:
> > > >On Wed, Jan 18, 2017 at 11:10:53AM +0800, Jason Wang wrote:
> > > >>
> > > >>On 2017年01月17日 22:45, Peter Xu wrote:
> > > >>>On Mon, Jan 16, 2017 at 05:54:55PM +0800, Jason Wang wrote:
> > > >>>>On 2017年01月16日 17:18, Peter Xu wrote:
> > > >>>>>>> static void vtd_iotlb_page_invalidate(IntelIOMMUState *s,
> > > >>>>>>>                                       uint16_t domain_id,
> > > >>>>>>>                                       hwaddr addr, uint8_t am)
> > > >>>>>>> {
> > > >>>>>>>@@ -1222,6 +1251,7 @@ static void vtd_iotlb_page_invalidate(IntelIOMMUState *s, uint16_t domain_id,
> > > >>>>>>>     info.addr = addr;
> > > >>>>>>>     info.mask = ~((1 << am) - 1);
> > > >>>>>>>     g_hash_table_foreach_remove(s->iotlb, vtd_hash_remove_by_page, &info);
> > > >>>>>>>+    vtd_iotlb_page_invalidate_notify(s, domain_id, addr, am);
> > > >>>>>>Is the case of GLOBAL or DSI flush missed, or do we not care
> > > >>>>>>about it at all?
> > > >>>>>IMHO we don't. For device assignment, since we are having CM=1
> > > >>>>>here, we should have explicit page invalidations even if the
> > > >>>>>guest sends global/domain invalidations.
> > > >>>>>
> > > >>>>>Thanks,
> > > >>>>>
> > > >>>>>-- peterx
> > > >>>>Is this spec required?
> > > >>>I think not. IMO the spec is very coarse grained on describing
> > > >>>cache mode...
> > > >>>
> > > >>>>Btw, it looks to me that both DSI and GLOBAL are indeed explicit
> > > >>>>flushes.
> > > >>>Actually when cache mode is on, it is unclear to me how we
> > > >>>should treat domain/global invalidations, at least from the spec
> > > >>>(as mentioned earlier). My understanding is that they are not
> > > >>>"explicit flushes", which IMHO should only mean page selective
> > > >>>IOTLB invalidations.
> > > >>Probably not, at least from the view of performance. DSI and
> > > >>global should be more efficient in some cases.
> > > >I agree with you that DSI/GLOBAL flushes are more efficient in some
> > > >ways. But IMHO that does not mean these invalidations are "explicit
> > > >invalidations", and I doubt whether cache mode has to cope with them.
> > >
> > > Well, the spec does not forbid DSI/GLOBAL with CM and the driver
> > > code has used them for almost ten years. I can hardly believe it's
> > > wrong.
> >
> > I think we have a misunderstanding here. :)
> >
> > I never thought we should not send DSI/GLOBAL invalidations with cache
> > mode. I just think we should not do anything special even if we have
> > cache mode on when we receive these signals.
> >
> > In the spec, "explicit invalidation" is mentioned in the cache mode
> > chapter:
> >
> >     The Caching Mode (CM) field in Capability Register indicates if
> >     the hardware implementation caches not-present or erroneous
> >     translation-structure entries.
> >     When the CM field is reported as Set, any software updates to
> >     any remapping structures (including updates to not-present
> >     entries or present entries whose programming resulted in
> >     translation faults) requires explicit invalidation of the caches.
> >
> > And I thought we were discussing "what is an explicit invalidation"
> > as mentioned above.
>
> Check 6.5.3.1 Implicit Invalidation on Page Requests:
>
>     In addition to the explicit invalidation through invalidation commands
>     (see Section 6.5.1 and Section 6.5.2) or through deferred invalidation
>     messages (see Section 6.5.4), identified above, Page Requests from
>     endpoint devices invalidate entries in the IOTLBs and paging-structure
>     caches.
>
> My impression is that the above indirectly defines the invalidation
> commands (PSI/DSI/GLOBAL) as explicit invalidations, because they are
> explicitly issued by the driver. Section 6.5.3.1 then further describes
> implicit invalidations caused by other VT-d operations.
>
> I will check with the VT-d spec owner to clarify.
>
> > >
> > > >
> > > >But here I should add one more thing besides PSI - context entry
> > > >invalidation should be one of "the explicit invalidations" as well,
> > > >which we need to handle just like PSI when cache mode is on.
> > > >
> > > >>>>Just had a quick look through the driver code and found
> > > >>>>something interesting in intel_iommu_flush_iotlb_psi():
> > > >>>>
> > > >>>>...
> > > >>>>    /*
> > > >>>>     * Fallback to domain selective flush if no PSI support or the size is
> > > >>>>     * too big.
> > > >>>>     * PSI requires page size to be 2 ^ x, and the base address is naturally
> > > >>>>     * aligned to the size
> > > >>>>     */
> > > >>>>    if (!cap_pgsel_inv(iommu->cap) || mask > cap_max_amask_val(iommu->cap))
> > > >>>>        iommu->flush.flush_iotlb(iommu, did, 0, 0, DMA_TLB_DSI_FLUSH);
> > > >>>>    else
> > > >>>>        iommu->flush.flush_iotlb(iommu, did, addr | ih, mask, DMA_TLB_PSI_FLUSH);
> > > >>>>...
> > > >>>I think this is interesting... and I doubt its correctness
> > > >>>with cache mode enabled.
> > > >>>
> > > >>>If so (sending a domain invalidation instead of a big range of
> > > >>>page invalidations), how should we capture which pages are
> > > >>>unmapped in the emulated IOMMU?
> > > >>We don't need to track individual pages here, since all pages for
> > > >>a specific domain were unmapped, I believe?
> > > >IMHO this might not be the correct behavior.
> > > >
> > > >If we receive one domain-selective invalidation, I agree that we
> > > >should invalidate the IOTLB cache for all the devices inside the
> > > >domain. However, when cache mode is on, we should be depending on
> > > >the PSIs to unmap each page (unless we want to unmap the whole
> > > >address space; in this case it's very possible that the guest is
> > > >just unmapping a range, not the entire space). If we convert
> > > >several PSIs into one big DSI, IMHO we will leave those pages
> > > >mapped/unmapped while we should unmap/map them.
> > >
> > > Confused, do you have an example of this? (I fail to understand why
> > > DSI can't work; at least the implementation can convert a DSI to
> > > several PSIs internally.)
> >
> > That's how I understand it. It might be wrong. Btw, could you
> > elaborate a bit on how we can convert a DSI into several PSIs?
> >
> > Thanks,
>
> If my understanding above is correct, there is nothing wrong with the
> above IOMMU driver code - actually it makes sense on bare metal when CM
> is disabled.
>
> But yes, DSI/GLOBAL is far less efficient than PSI when CM is enabled.
> We rely on cache invalidations to indirectly capture remapping structure
> changes. PSI provides accurate info, while DSI/GLOBAL doesn't. To
> emulate the correct behavior of DSI/GLOBAL, we have to pretend that the
> whole address space (iova=0, size=agaw) needs to be unmapped (for GLOBAL
> it further means multiple address spaces).
>
> Though not efficient, that doesn't mean it's wrong, since the guest
> driver follows the spec. We can ask for a Linux IOMMU driver change
> (CC Ashok) to not use the above optimization when cache mode is enabled,
> but we anyway need to emulate the correct DSI/GLOBAL behavior to follow
> the spec, because:
>
> - even when the driver fix is in place, old versions still have this
> logic;
>
> - there are still scenarios where the guest IOMMU driver does want to
> invalidate the whole address space, e.g. when changing a context entry.
> Asking the guest driver to use PSI for such a purpose is another bad
> thing.

Hi Kevin/Peter/Jason,

I agree we should think about DSI/GLOBAL. That said, I guess there may be
a chance to ignore DSI/GLOBAL flushes entirely, if the following
assumption is correct.

The assumption is that a DSI/GLOBAL flush always comes after a context
entry invalidation. If so: I remember Peter added a memory replay to the
context invalidation path. This replay walks the guest second-level page
table and maps every present entry, so the host's shadow of the
second-level page table should already hold the latest mapping info by
the time the guest IOMMU driver issues the DSI/GLOBAL flush that follows
the context change. Since the mappings are already up to date in the
host, there is nothing left for that DSI/GLOBAL flush to tell us, and we
could simply ignore it. A rough sketch of the replay flow I have in mind
is below.
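(This is a self-contained sketch only, assuming a 4-level second-level
table. vtd_read_pte() and notify_map() are made-up stand-ins for reading
guest memory and firing a MAP notification towards VFIO - they are not
the real intel_iommu interfaces - and superpages and error handling are
omitted.)

#include <stdint.h>
#include <stdio.h>
#include <inttypes.h>

#define VTD_PAGE_SHIFT  12
#define VTD_SL_RW       0x3ULL             /* read/write permission bits */
#define VTD_SL_ADDR(x)  ((x) & ~0xfffULL)  /* page-frame part of a PTE */

/* Stand-in: a real implementation would DMA-read the PTE from guest RAM. */
static uint64_t vtd_read_pte(uint64_t table_gpa, unsigned index)
{
    (void)table_gpa;
    return index == 0 ? (0x40000000ULL | VTD_SL_RW) : 0;
}

/* Stand-in: a real implementation would fire an IOMMU MAP notifier here,
 * which VFIO would turn into a host second-level mapping. */
static void notify_map(uint64_t iova, uint64_t gpa, uint64_t size)
{
    printf("MAP iova=0x%" PRIx64 " -> gpa=0x%" PRIx64 " size=0x%" PRIx64 "\n",
           iova, gpa, size);
}

/* Walk one 512-entry table; "level" counts down to 1 at the leaves. */
static void vtd_replay_level(uint64_t table_gpa, uint64_t iova, unsigned level)
{
    uint64_t entry_span = 1ULL << (VTD_PAGE_SHIFT + 9 * (level - 1));
    unsigned i;

    for (i = 0; i < 512; i++) {
        uint64_t pte = vtd_read_pte(table_gpa, i);
        uint64_t cur = iova + (uint64_t)i * entry_span;

        if (!(pte & VTD_SL_RW)) {
            continue;  /* not present: with CM=1 nothing is cached for it */
        }
        if (level == 1) {
            notify_map(cur, VTD_SL_ADDR(pte), entry_span);
        } else {
            vtd_replay_level(VTD_SL_ADDR(pte), cur, level - 1);
        }
    }
}

int main(void)
{
    /* Context-entry invalidation path: replay from the new SLPT root.
     * Four levels corresponds to a 48-bit guest address width. */
    vtd_replay_level(0x1000 /* fake SLPT root */, 0, 4);
    return 0;
}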
So gentlemen, please help judge whether the assumption is correct. If it
is, then Peter's patch may just work without any special processing of
DSI/GLOBAL flushes.

Regards,
Yi L
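P.S. In case the assumption does not hold, below is an equally rough
sketch of the conservative emulation Kevin describes above: treat a DSI
flush (and, for each address space, a GLOBAL flush) as if the whole iova
range covered by the domain's agaw were torn down, then rebuild it from
the current guest page table. Again, notify_unmap() and the replay helper
are illustrative names, not the real QEMU notifier interfaces.

#include <stdint.h>
#include <stdio.h>
#include <inttypes.h>

/* Stand-in: a real implementation would fire an IOMMU UNMAP notifier,
 * which VFIO would turn into a host unmap of the range. */
static void notify_unmap(uint64_t iova, uint64_t size)
{
    printf("UNMAP iova=0x%" PRIx64 " size=0x%" PRIx64 "\n", iova, size);
}

/* Stand-in: re-walk the guest second-level table and notify_map() every
 * present leaf, exactly as in the replay sketch earlier in this mail. */
static void replay_guest_mappings(void)
{
}

/* A DSI carries no per-page info, so pretend the whole address space
 * covered by the domain's agaw (iova=0, size=2^agaw) was invalidated,
 * then rebuild from the current guest structures. Correct per spec,
 * just slower than handling per-page PSIs; a GLOBAL flush repeats this
 * for every domain/address space. */
static void vtd_emulate_dsi_flush(unsigned agaw_bits)
{
    notify_unmap(0, 1ULL << agaw_bits);
    replay_guest_mappings();
}

int main(void)
{
    vtd_emulate_dsi_flush(48);  /* 48-bit guest address width */
    return 0;
}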