On Wed, 2013-09-25 at 21:11 +0100, David Woodhouse wrote:
> On Wed, 2013-09-25 at 13:44 -0600, Alex Williamson wrote:
> > On Wed, 2013-09-25 at 19:52 +0100, David Woodhouse wrote:
> > > On Wed, 2013-09-25 at 11:36 -0600, Alex Williamson wrote:
> > > > On Wed, 2013-09-25 at 17:05 +0100, David Woodhouse wrote:
> > > > > Why would it ever care? If it *happens* to map something that
> > > > > can use large pages, yay! If it subsequently breaks apart those
> > > > > large pages by unmapping 4KiB in the middle, let the IOMMU
> > > > > driver break that apart.
> > > >
> > > > Can this be done atomically? I thought part of the reason for
> > > > this interface was that iommu drivers typically couldn't replace
> > > > a huge page with multiple smaller pages in the presence of DMA.
> > >
> > > For the Intel IOMMU it can. You can atomically change from a large
> > > page entry, to a pointer to a full set of smaller page tables. Do
> > > the IOTLB flush, and at no time is there an interruption in
> > > service.
> >
> > Cool
> >
> > > Not sure if this is true for *all* IOMMU hardware; I'd be
> > > perfectly happy to accept a variant of Jörg's proposal that we
> > > should only ever unmap exactly the same range that we mapped.
> > > Except we should allow the unmapping of adjacent regions together;
> > > just not a partial unmap of something that was mapped in one go.
> >
> > Well, except if we've just trusted the IOMMU driver to add a device
> > behind a non-SP capable IOMMU to our domain and convert the page
> > tables, that partial unmap is no longer partial and now we get
> > different behavior than before, so we can't depend on that adjacent
> > unmapping.
>
> Que?
>
> Jörg's proposal was that if you add a mapping at a given address+size,
> you should always remove *exactly* that address+size. Which will
> always work exactly the same, regardless of superpages.
>
> My slight change to that was that if you also added an *adjacent*
> mapping at address2+size2, you should be able to unmap both at the
> same time. Which will *also* always work the same regardless of
> superpages.
>
> Even if your two mappings were also *physically* contiguous, and
> *could* have used superpages, they probably won't anyway because you
> mapped them in two parts.

Ok, sounds reasonable. I somehow read it to still include some aspect
of the "fill in the size" API we have now.
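Just to check I'm now reading it correctly, the rules would look
roughly like this from the caller's side. Sketch only, using the
iommu_map()/iommu_unmap() prototypes as they stand today; the function
names are made up for illustration and return values/error handling
are ignored:

#include <linux/iommu.h>
#include <linux/sizes.h>

/* Two adjacent 2MiB mappings, created separately. */
static void example_map(struct iommu_domain *dom, unsigned long iova,
			phys_addr_t phys, int prot)
{
	iommu_map(dom, iova, phys, SZ_2M, prot);
	iommu_map(dom, iova + SZ_2M, phys + SZ_2M, SZ_2M, prot);
}

/* Fine: unmap exactly what was mapped, or both adjacent mappings at once. */
static void example_unmap_ok(struct iommu_domain *dom, unsigned long iova)
{
	iommu_unmap(dom, iova, 2 * SZ_2M);
}

/* Not fine: punch a 4KiB hole out of the middle of a single mapping. */
static void example_unmap_bad(struct iommu_domain *dom, unsigned long iova)
{
	iommu_unmap(dom, iova + SZ_1M, SZ_4K);
}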
> > > > What happens if my IOMMU domain makes use of super pages and
> > > > then I add a new device behind a new IOMMU without hardware
> > > > super page support?
> > >
> > > Currently, you end up with the domain happily including
> > > superpages, and the less capable IOMMU that you added later won't
> > > cope.
> >
> > This is the trouble with trusting the iommu driver. ;)
>
> Sorry, I should have made it clearer that this is a *bug*. It's not by
> design. The IOMMU driver ought to get this right, and will do.
>
> > > What we probably *ought* to do is walk the page tables and convert
> > > any pre-existing superpages to small pages, at the time we add the
> > > non-SP-capable IOMMU.
> >
> > And then we need to figure out how to handle that in the proposed
> > interface changes above since it changes the unmap behavior to the
> > naive user.
>
> Isn't that what you'd *expect*? Surely you don't *expect* the breakage
> you currently get?

For vfio I currently make the same statement that you're pushing for
the IOMMU API; I only guarantee unmapping at the same granularity as
the original mapping. So that seems fine to me.

> > There's also the question of whether the IOMMU driver should
> > re-evaluate super pages when the less capable IOMMU is removed from
> > the domain.
>
> I wouldn't bother to go looking for opportunities to use super pages
> if we remove the last non-SP-capable IOMMU from the domain.

I predict bugs getting filed if a guest sees a performance hit after
adding a device and the performance isn't restored when the device is
removed. If only we could assume a similar feature set among IOMMUs in
a system.

> > > FWIW we currently screw up the handling of cache-coherent vs.
> > > non-coherent page tables too. That one wants a wbinvd somewhere
> > > when we add a non-coherent IOMMU to the domain.
> >
> > You're not selling the "trust the IOMMU driver" story very well
> > here. Can we assume that the IOMMU_CACHE flag (SNP) is ignored
> > appropriately by non-coherent IOMMUs? Is there any downside to
> > ignoring it and always setting SNP in the IOMMU page tables? AMD
> > IOMMU ignores it, but it's also always cache coherent. Thanks,
>
> SNP is a separate issue. I'm speaking of cache coherency of the
> hardware page table walk — the feature bit that all the horrid clflush
> calls are predicated on.
>
> Again, this is just a bug. We *should* be getting this right, but
> don't yet.

And for DMA_PTE_SNP? intel_iommu_map() won't let us set this bit if the
domain contains a hardware unit that doesn't support ecap.SC, but it
also doesn't update existing mappings. Barring hardware bugs, it seems
much easier to unconditionally set DMA_PTE_SNP but still advertise
IOMMU_CAP_CACHE_COHERENCY based on the composition of the domain.
Otherwise we either have to reject adding devices that would change the
domain's coherency once DMA mappings are in play, or never set
IOMMU_CACHE and advertise to KVM that the domain is always
non-coherent. Thanks,

Alex
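P.S. To be concrete about the DMA_PTE_SNP half of that, the shape I
have in mind is roughly the below. Completely untested; the helper is
invented here purely for illustration, and it leans on the assumption
I'm asking about above, i.e. that hardware without ecap.SC ignores the
SNP bit rather than treating it as reserved:

/*
 * Hypothetical helper inside intel-iommu.c, name made up for this
 * example; not actual driver code.
 */
static u64 dmar_domain_pte_prot(int iommu_prot)
{
	u64 prot = 0;

	if (iommu_prot & IOMMU_READ)
		prot |= DMA_PTE_READ;
	if (iommu_prot & IOMMU_WRITE)
		prot |= DMA_PTE_WRITE;
	/*
	 * Request snooping in every PTE rather than only when the
	 * caller passes IOMMU_CACHE.  Assumption: hardware without
	 * ecap.SC ignores the bit instead of faulting on it.
	 */
	prot |= DMA_PTE_SNP;

	return prot;
}

/*
 * Coherency would still be reported based on the composition of the
 * domain, i.e. only when every unit backing the domain has ecap.SC,
 * which IIRC is what iommu_snooping already tracks (other caps
 * omitted here for brevity).
 */
static int intel_iommu_domain_has_cap(struct iommu_domain *domain,
				      unsigned long cap)
{
	struct dmar_domain *dmar_domain = domain->priv;

	if (cap == IOMMU_CAP_CACHE_COHERENCY)
		return dmar_domain->iommu_snooping;

	return 0;
}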