Hi Will,

On Thu, Aug 17, 2017 at 05:32:35PM +0100, Will Deacon wrote:
> I really like the idea of this, but I have a couple of questions and
> comments below.
Great, together with the common iova-flush this should make it possible
to solve the performance problems of the dma-iommu code.

> > +int iommu_map_sync(struct iommu_domain *domain, unsigned long iova,
> > +		     phys_addr_t paddr, size_t size, int prot)
> > +{
> > +	int ret = iommu_map(domain, iova, paddr, size, prot);
> > +
> > +	iommu_tlb_range_add(domain, iova, size);
> > +	iommu_tlb_sync(domain);
>
> Many IOMMUs don't need these callbacks on ->map operations, but they won't
> be able to distinguish them easily with this API. Could you add a flags
> parameter or something to the iommu_tlb_* functions, please?

Yeah, this is only needed for virtualized IOMMUs that have a non-present
cache. My idea was to let the iommu drivers tell the common code whether
the IOMMU needs it, so that the code above just checks a flag and omits
the calls to the flush functions. The problem currently is how to get
this information from 'struct iommu_device' to 'struct iommu_domain'.

As a workaround I am considering a per-domain flag in the iommu drivers
which tracks whether any unmap has happened, and doing nothing in the
flush callback if there was none.

> I think we will struggle to implement this efficiently on ARM SMMUv3. The
> way invalidation works there is that there is a single in-memory command
> queue into which we can put TLB invalidation commands (they are inserted
> under a lock). These are then processed asynchronously by the hardware, and
> you can complete them by inserting a SYNC command and waiting for that to
> be consumed by the SMMU. Sounds like a perfect fit, right?

Yes, it's basically the same way it works on AMD-Vi and Intel VT-d.

> The problem is that we want to add those invalidation commands as early
> as possible, so that they can be processed by the hardware concurrently
> with us unmapping other pages.
I think that's a bad idea, because it would re-introduce the performance
problems: everyone would spin on the cmd-queue lock in the unmap path of
the DMA-API.

> That means adding the invalidation commands in the ->unmap callback
> and not bothering to implement ->iotlb_range_add callback at all.
> Then, we will do the sync in ->iotlb_sync. This falls apart if
> somebody decides to use iommu_flush_tlb_all(), where we would prefer
> not to insert all of the invalidation commands in unmap and instead
> insert a single invalidate-all command, followed up with a SYNC.

This problem can be solved with the deferred iova flushing code I posted
to the list: when a queue fills up, iommu_flush_tlb_all() is called and
every entry that was unmapped before can be released. This works well on
x86; are there reasons it wouldn't on ARM?

Regards,

	Joerg

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
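For readers following along, the two ideas discussed above (a per-domain
"dirty" flag that lets ->iotlb_sync become a no-op when no unmap happened,
and a deferred flush queue that falls back to a single invalidate-all when
it fills up) can be modeled in a few lines of user-space C. This is only an
illustrative sketch; the struct and function names (demo_domain, demo_unmap,
demo_iotlb_sync, demo_flush_tlb_all) are hypothetical stand-ins, not the
actual kernel code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a deferred-flush domain; not real kernel code. */
#define FLUSH_QUEUE_SIZE 4

struct flush_entry {
	unsigned long iova;
	size_t size;
};

struct demo_domain {
	bool dirty;                  /* set on unmap, cleared on flush */
	struct flush_entry queue[FLUSH_QUEUE_SIZE];
	unsigned int fill;           /* number of queued, unflushed ranges */
	unsigned int flush_all_count; /* counts full-TLB flushes (for demo) */
};

/* Stand-in for iommu_flush_tlb_all(): one invalidate-all + SYNC, after
 * which every previously unmapped IOVA in the queue can be released. */
static void demo_flush_tlb_all(struct demo_domain *d)
{
	d->flush_all_count++;
	d->fill = 0;
	d->dirty = false;
}

/* Unmap path: queue the range instead of issuing per-range invalidations
 * under the cmd-queue lock; flush everything once when the queue fills. */
static void demo_unmap(struct demo_domain *d, unsigned long iova, size_t size)
{
	if (d->fill == FLUSH_QUEUE_SIZE)
		demo_flush_tlb_all(d);

	d->queue[d->fill].iova = iova;
	d->queue[d->fill].size = size;
	d->fill++;
	d->dirty = true;
}

/* ->iotlb_sync workaround: skip the expensive sync entirely for map-only
 * sequences, where no unmap has happened since the last flush. */
static void demo_iotlb_sync(struct demo_domain *d)
{
	if (!d->dirty)
		return;
	demo_flush_tlb_all(d);
}
```

With this model, a map-only sequence never touches the flush path, while a
burst of unmaps larger than the queue collapses into a single
invalidate-all rather than one invalidation command per range.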