On Mon, 2015-12-07 at 11:20 +0000, Peter Maydell wrote:
> On 7 December 2015 at 10:53, Pavel Fedin <p.fe...@samsung.com> wrote:
> >> TAGET_PAGE_ALIGN tells us that it *could* be a valid DMA target though.
> >> The VM model is capable of using that as a page size, which means we
> >> assume it is and want to generate a fault.
> >
> > We seem to have looped back. So...
> > It is possible to fix this according to this assumption. In this
> > case we would need to make TARGET_PAGE_BITS a variable. If we are
> > emulating ancient armv5te, it will be set to 10. For modern targets,
> > ARMv6 and newer, it will be 12.
>
> You can't just make TARGET_PAGE_BITS a variable, it is used as a compile
> time constant in a bunch of TCG internal stuff. It would be nice
> if we didn't require it to be compile time, but it would be a lot of
> work to fix (especially if you want to avoid it being a performance
> hit).
>
> In any case, that still doesn't fix the problem. On an AArch64
> target CPU, TARGET_PAGE_BITS still has to be 12 (for a 4K
> minimum page size), but the guest and host could still be using
> 64K pages. So your VFIO code *must* be able to deal with the
> situation where TARGET_PAGE_BITS is smaller than any alignment
> that the guest, host or IOMMU need to care about.
>
> I still think the VFIO code needs to figure out what alignment
> it actually cares about and find some way to determine what
> that is, or alternatively if the relevant alignment is not
> possible to determine, write the code so that it doesn't
> need to care. Either way, TARGET_PAGE_ALIGN is not the answer.
Ok, let's work our way down through the relevant page sizes: host, IOMMU, and target.

The host page size is relevant because this is the granularity with which the kernel can pin pages. Every IOMMU mapping must be backed by a pinned page in the current model, since we don't really have hardware to support IOMMU page faults.

The IOMMU page size defines the granularity with which we can map IOVA to physical memory. The IOMMU may support multiple page sizes, but what we're really talking about here is the minimum page size.

The target page size is relevant because this defines the minimum possible page size used within the VM. We presume that anything smaller than a target page (i.e. any range that disappears under TARGET_PAGE_ALIGN) cannot be referenced as a page by the VM CPU and therefore is probably not allocated as a DMA buffer for a driver running within the guest.

An implementation detail here is that the vfio type1 IOMMU model currently exposes the host page size as the minimum IOMMU page size. The reason for this is to simplify page accounting: if we don't allow sub-host-page mappings, we don't need per-page reference counting. This can be fixed within the current API, but kernel changes are required, or else locked page requirements become a problem due to over-counting. The benefit, though, is that this abstracts the host page size from QEMU.

So let's take the easy scenario first: if the target page size is greater than or equal to the minimum IOMMU page size, we're golden. We can map anything that could be a target DMA buffer. This leads to the current situation, where we simply ignore any ranges that disappear when we align to the target page size. It can't be a DMA buffer, so ignore it. Note that the 64k host, 4k target problem goes away if type1 accounting is fixed to allow IOMMU-granularity mappings, since I think in the cases we care about the IOMMU still supports 4k pages, otherwise...

Then we come to the scenario here, where the target page size is less than the minimum IOMMU page size.
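The two scenarios above can be sketched in C as follows. This is a minimal illustration, not QEMU's vfio listener code. The enum, the helper names (min_iommu_pgsize, trim_to_target_pages, classify_range) and the example bitmaps are hypothetical. The sketch assumes the supported IOMMU page sizes arrive as a bitmap of power-of-two sizes, which is the form type1 uses for iova_pgsizes, so the minimum IOMMU page size is the bitmap's lowest set bit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical outcome of trying to map one guest RAM range. */
enum map_result {
    RANGE_SKIPPED,   /* nothing left after target-page alignment: not a DMA buffer */
    RANGE_MAPPABLE,  /* target-page aligned and also IOMMU-page aligned */
    RANGE_SUBPAGE    /* target-page aligned but not aligned to the IOMMU page */
};

/* Minimum IOMMU page size: the lowest set bit of a bitmap of supported
 * power-of-two page sizes. */
uint64_t min_iommu_pgsize(uint64_t pgsizes)
{
    assert(pgsizes != 0);
    return pgsizes & (~pgsizes + 1);
}

/* Round start up and the (exclusive) end down to target pages, in the
 * spirit of TARGET_PAGE_ALIGN / TARGET_PAGE_MASK. Returns 0 if nothing is
 * left. Assumes start + size does not wrap around. */
int trim_to_target_pages(uint64_t start, uint64_t size,
                         unsigned target_page_bits,
                         uint64_t *out_start, uint64_t *out_end)
{
    uint64_t page = UINT64_C(1) << target_page_bits;
    uint64_t mask = ~(page - 1);
    uint64_t s = (start + page - 1) & mask;
    uint64_t e = (start + size) & mask;

    if (s >= e) {
        return 0;
    }
    *out_start = s;
    *out_end = e;
    return 1;
}

enum map_result classify_range(uint64_t start, uint64_t size,
                               unsigned target_page_bits,
                               uint64_t iommu_pgsizes)
{
    uint64_t s, e;
    uint64_t iommu_page = min_iommu_pgsize(iommu_pgsizes);

    if (!trim_to_target_pages(start, size, target_page_bits, &s, &e)) {
        return RANGE_SKIPPED;
    }
    /* Always true when the target page is >= the minimum IOMMU page,
     * because both sizes are powers of two. */
    if (s % iommu_page == 0 && e % iommu_page == 0) {
        return RANGE_MAPPABLE;
    }
    return RANGE_SUBPAGE;
}
```

Here is an example with a 4K target page (target_page_bits of 12). The call classify_range(0x1000400, 0x800, 12, 0x40201000) returns RANGE_SKIPPED, because the range disappears under alignment. Against an IOMMU whose smallest page is 64K (bitmap 0x20010000), classify_range(0x40001000, 0x1000, 12, 0x20010000) returns RANGE_SUBPAGE. That is the case where the target page is smaller than the minimum IOMMU page.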
The current code is intentionally trying to trigger the vfio type1 error indicating that this cannot be mapped. To resolve this, QEMU needs to decide whether it's ok to give the device DMA access to everything on that IOMMU-granularity page. It also needs to ensure that aliases mapping the same IOMMU page are consistent, and to handle the reference counting for those sub-mappings to avoid duplicate mappings and premature unmaps.

So I think in the end, the one page size we care about is the minimum IOMMU granularity. We don't really care about the target page size at all, and maybe we only care about the host page size for determining what might share a page with a sub-page mapping. However, there's work to get there (QEMU, kernel, or both, depending on the specific config), and the target page size trick has so far been a useful simplification.

Thanks,
Alex
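The sub-page bookkeeping described in the reply above (consistent aliases, plus reference counting to avoid duplicate maps and premature unmaps) can be sketched in C as follows. This is a minimal sketch under stated assumptions. Each sub-mapping is assumed to fit within a single minimum-size IOMMU page, and a small fixed table is assumed to be enough. The caller is assumed to issue the actual IOMMU map/unmap itself. All names here are hypothetical, and this is not QEMU or kernel code. The policy question of whether exposing the rest of the IOMMU page to the device is acceptable is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TRACKED 64  /* hypothetical fixed capacity */

struct iommu_page_ref {
    uint64_t iova_page;  /* IOVA of the IOMMU page base */
    uint64_t host_page;  /* host address backing that IOMMU page base */
    unsigned refs;       /* sub-mappings using the page; 0 marks a free slot */
};

struct subpage_tracker {
    uint64_t iommu_pgsize;  /* minimum IOMMU page size, a power of two */
    struct iommu_page_ref pages[MAX_TRACKED];
};

enum ref_action {
    REF_NONE,       /* nothing to do in the IOMMU */
    REF_DO_MAP,     /* first user: caller maps the whole IOMMU page */
    REF_DO_UNMAP,   /* last user gone: caller unmaps the IOMMU page */
    REF_CONFLICT,   /* alias disagrees about what backs this IOMMU page */
    REF_FULL,       /* no free slot in the table */
    REF_NOT_FOUND   /* put without a matching get */
};

static struct iommu_page_ref *find_page(struct subpage_tracker *t,
                                        uint64_t iova_page)
{
    for (size_t i = 0; i < MAX_TRACKED; i++) {
        if (t->pages[i].refs && t->pages[i].iova_page == iova_page) {
            return &t->pages[i];
        }
    }
    return NULL;
}

/* Register a sub-page mapping of iova -> host. */
enum ref_action subpage_get(struct subpage_tracker *t, uint64_t iova,
                            uint64_t host)
{
    assert(t->iommu_pgsize && !(t->iommu_pgsize & (t->iommu_pgsize - 1)));
    uint64_t offset = iova & (t->iommu_pgsize - 1);
    uint64_t iova_page = iova - offset;
    uint64_t host_base = host - offset;  /* host address of the page base */
    struct iommu_page_ref *p = find_page(t, iova_page);

    if (p) {
        if (p->host_page != host_base) {
            return REF_CONFLICT;
        }
        p->refs++;  /* already mapped: avoid a duplicate mapping */
        return REF_NONE;
    }
    for (size_t i = 0; i < MAX_TRACKED; i++) {
        if (t->pages[i].refs == 0) {
            t->pages[i] = (struct iommu_page_ref){ iova_page, host_base, 1 };
            return REF_DO_MAP;
        }
    }
    return REF_FULL;
}

/* Drop a sub-page mapping previously registered with subpage_get(). */
enum ref_action subpage_put(struct subpage_tracker *t, uint64_t iova)
{
    uint64_t iova_page = iova & ~(t->iommu_pgsize - 1);
    struct iommu_page_ref *p = find_page(t, iova_page);

    if (!p) {
        return REF_NOT_FOUND;
    }
    if (--p->refs == 0) {
        return REF_DO_UNMAP;
    }
    return REF_NONE;  /* still in use: avoid a premature unmap */
}
```

In this design, a caller issues the real IOMMU map only on REF_DO_MAP and the unmap only on REF_DO_UNMAP. REF_CONFLICT corresponds to two aliases that disagree about what backs the same IOMMU page.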