On Wed, 2013-09-25 at 21:11 +0100, David Woodhouse wrote:
> On Wed, 2013-09-25 at 13:44 -0600, Alex Williamson wrote:
> > On Wed, 2013-09-25 at 19:52 +0100, David Woodhouse wrote:
> > > On Wed, 2013-09-25 at 11:36 -0600, Alex Williamson wrote:
> > > > On Wed, 2013-09-25 at 17:05 +0100, David Woodhouse wrote:
> > > > > Why would it ever care? If it *happens* to map something that
> > > > > can use large pages, yay! If it subsequently breaks apart those
> > > > > large pages by unmapping 4KiB in the middle, let the IOMMU
> > > > > driver break that apart.
> > > >
> > > > Can this be done atomically? I thought part of the reason for
> > > > this interface was that iommu drivers typically couldn't replace
> > > > a huge page with multiple smaller pages in the presence of DMA.
> > >
> > > For the Intel IOMMU it can. You can atomically change from a large
> > > page entry, to a pointer to a full set of smaller page tables. Do
> > > the IOTLB flush, and at no time is there an interruption in
> > > service.
> >
> > Cool
> >
> > > Not sure if this is true for *all* IOMMU hardware; I'd be
> > > perfectly happy to accept a variant of Jörg's proposal that we
> > > should only ever unmap exactly the same range that we mapped.
> > > Except we should allow the unmapping of adjacent regions together;
> > > just not a partial unmap of something that was mapped in one go.
> >
> > Well, except if we've just trusted the IOMMU driver to add a device
> > behind a non-SP capable IOMMU to our domain and convert the page
> > tables, that partial unmap is no longer partial and now we get
> > different behavior than before, so we can't depend on that adjacent
> > unmapping.
>
> Que?
>
> Jörg's proposal was that if you add a mapping at a given address+size,
> you should always remove *exactly* that address+size. Which will
> always work exactly the same, regardless of superpages.
>
> My slight change to that was that if you also added an *adjacent*
> mapping at address2+size2, you should be able to unmap both at the
> same time. Which will *also* always work the same regardless of
> superpages.
>
> Even if your two mappings were also *physically* contiguous, and
> *could* have used superpages, they probably won't anyway because you
> mapped them in two parts.

Ok, sounds reasonable. I somehow read it to still include some aspect
of the "fill in the size" API we have now.
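Just to check I'm now reading it correctly, the rules would look
roughly like this from the caller's side. Sketch only, using the
iommu_map()/iommu_unmap() prototypes as they stand today; the function
names are made up for illustration and return values/error handling
are ignored:

#include <linux/iommu.h>
#include <linux/sizes.h>

/* Two adjacent 2MiB mappings, created separately. */
static void example_map(struct iommu_domain *dom, unsigned long iova,
			phys_addr_t phys, int prot)
{
	iommu_map(dom, iova, phys, SZ_2M, prot);
	iommu_map(dom, iova + SZ_2M, phys + SZ_2M, SZ_2M, prot);
}

/* Fine: unmap exactly what was mapped, or both adjacent mappings at once. */
static void example_unmap_ok(struct iommu_domain *dom, unsigned long iova)
{
	iommu_unmap(dom, iova, 2 * SZ_2M);
}

/* Not fine: punch a 4KiB hole out of the middle of a single mapping. */
static void example_unmap_bad(struct iommu_domain *dom, unsigned long iova)
{
	iommu_unmap(dom, iova + SZ_1M, SZ_4K);
}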
> > > > What happens if my IOMMU domain makes use of super pages and
> > > > then I add a new device behind a new IOMMU without hardware
> > > > super page support?
> > >
> > > Currently, you end up with the domain happily including
> > > superpages, and the less capable IOMMU that you added later won't
> > > cope.
> >
> > This is the trouble with trusting the iommu driver. ;)
>
> Sorry, I should have made it clearer that this is a *bug*. It's not by
> design. The IOMMU driver ought to get this right, and will do.
>
> > > What we probably *ought* to do is walk the page tables and convert
> > > any pre-existing superpages to small pages, at the time we add the
> > > non-SP-capable IOMMU.
> >
> > And then we need to figure out how to handle that in the proposed
> > interface changes above since it changes the unmap behavior to the
> > naive user.
>
> Isn't that what you'd *expect*? Surely you don't *expect* the breakage
> you currently get?

For vfio I currently make the same statement that you're pushing for
the IOMMU API; I only guarantee unmapping at the same granularity as
the original mapping. So that seems fine to me.

> > There's also the question of whether the IOMMU driver should
> > re-evaluate super pages when the less capable IOMMU is removed from
> > the domain.
>
> I wouldn't bother to go looking for opportunities to use super pages
> if we remove the last non-SP-capable IOMMU from the domain.

I predict bugs getting filed if a guest sees a performance hit after
adding a device and the performance isn't restored when the device is
removed. If only we could assume a similar feature set among IOMMUs in
a system.

> > > FWIW we currently screw up the handling of cache-coherent vs.
> > > non-coherent page tables too. That one wants a wbinvd somewhere
> > > when we add a non-coherent IOMMU to the domain.
> >
> > You're not selling the "trust the IOMMU driver" story very well
> > here. Can we assume that the IOMMU_CACHE flag (SNP) is ignored
> > appropriately by non-coherent IOMMUs? Is there any downside to
> > ignoring it and always setting SNP in the IOMMU page tables? AMD
> > IOMMU ignores it, but it's also always cache coherent. Thanks,
>
> SNP is a separate issue. I'm speaking of cache coherency of the
> hardware page table walk — the feature bit that all the horrid clflush
> calls are predicated on.
>
> Again, this is just a bug. We *should* be getting this right, but
> don't yet.

And for DMA_PTE_SNP? intel_iommu_map() won't let us set this bit if the
domain contains a hardware unit that doesn't support ecap.SC, but it
also doesn't update existing mappings. Barring hardware bugs, it seems
much easier to unconditionally set DMA_PTE_SNP but still advertise
IOMMU_CAP_CACHE_COHERENCY based on the composition of the domain.
Otherwise we either have to reject adding devices that would change the
domain's coherency once DMA mappings are in play, or never set
IOMMU_CACHE and advertise to KVM that the domain is always
non-coherent. Thanks,

Alex
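P.S. To be concrete about the DMA_PTE_SNP half of that, the shape I
have in mind is roughly the below. Completely untested; the helper is
invented here purely for illustration, and it leans on the assumption
I'm asking about above, i.e. that hardware without ecap.SC ignores the
SNP bit rather than treating it as reserved:

/*
 * Hypothetical helper inside intel-iommu.c, name made up for this
 * example; not actual driver code.
 */
static u64 dmar_domain_pte_prot(int iommu_prot)
{
	u64 prot = 0;

	if (iommu_prot & IOMMU_READ)
		prot |= DMA_PTE_READ;
	if (iommu_prot & IOMMU_WRITE)
		prot |= DMA_PTE_WRITE;
	/*
	 * Request snooping in every PTE rather than only when the
	 * caller passes IOMMU_CACHE.  Assumption: hardware without
	 * ecap.SC ignores the bit instead of faulting on it.
	 */
	prot |= DMA_PTE_SNP;

	return prot;
}

/*
 * Coherency would still be reported based on the composition of the
 * domain, i.e. only when every unit backing the domain has ecap.SC,
 * which IIRC is what iommu_snooping already tracks (other caps
 * omitted here for brevity).
 */
static int intel_iommu_domain_has_cap(struct iommu_domain *domain,
				      unsigned long cap)
{
	struct dmar_domain *dmar_domain = domain->priv;

	if (cap == IOMMU_CAP_CACHE_COHERENCY)
		return dmar_domain->iommu_snooping;

	return 0;
}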