Hi Robin,

On Mon, Jul 27, 2015 at 07:18:08PM +0100, Robin Murphy wrote:
> Currently, users of the LPAE page table code are (ab)using dma_map_page()
> as a means to flush page table updates for non-coherent IOMMUs. Since
> from the CPU's point of view, creating IOMMU page tables *is* passing
> DMA buffers to a device (the IOMMU's page table walker), there's little
> reason not to use the DMA API correctly.
>
> Allow drivers to opt into appropriate DMA operations for page table
> allocation and updates by providing the relevant device, and make the
> flush_pgtable() callback optional in case those DMA API operations are
> sufficient. The expectation is that an LPAE IOMMU should have a full view
> of system memory, so use streaming mappings to avoid unnecessary pressure
> on ZONE_DMA, and treat any DMA translation as a warning sign.
>
> Signed-off-by: Robin Murphy <robin.mur...@arm.com>
> ---
>
> Hi all,
>
> Since Russell fixing Tegra[1] reminded me, I dug this out from, er,
> rather a long time ago[2] and tidied it up. I've tested the SMMUv2
> version with the MMU-401s on Juno (both coherent and non-coherent)
> with no visible regressions; I have the same hope for the SMMUv3 and
> IPMMU changes since they should be semantically identical. At worst
> the Renesas driver might need a larger DMA mask setting as per
> f1d84548694f, but given that there shouldn't be any highmem involved
> I'd think it should be OK as-is.
>
> Robin.
>
> [1]:http://article.gmane.org/gmane.linux.ports.tegra/23150
> [2]:http://article.gmane.org/gmane.linux.kernel.iommu/8972
>
>  drivers/iommu/io-pgtable-arm.c | 107 +++++++++++++++++++++++++++++++----------
>  drivers/iommu/io-pgtable.h     |   2 +
>  2 files changed, 84 insertions(+), 25 deletions(-)
>
> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> index 4e46021..b93a60e 100644
> --- a/drivers/iommu/io-pgtable-arm.c
> +++ b/drivers/iommu/io-pgtable-arm.c
> @@ -200,12 +200,76 @@ typedef u64 arm_lpae_iopte;
>
>  static bool selftest_running = false;
>
> +static dma_addr_t __arm_lpae_dma(struct device *dev, void *pages)
> +{
> +	return phys_to_dma(dev, virt_to_phys(pages));
> +}
> +
> +static void *__arm_lpae_alloc_pages(size_t size, gfp_t gfp,
> +				    struct io_pgtable_cfg *cfg, void *cookie)
> +{
> +	void *pages = alloc_pages_exact(size, gfp | __GFP_ZERO);
> +	struct device *dev = cfg->iommu_dev;
> +	dma_addr_t dma;
> +
> +	if (!pages)
> +		return NULL;
Missing newline here.

> +	if (dev) {
> +		dma = dma_map_single(dev, pages, size, DMA_TO_DEVICE);
> +		if (dma_mapping_error(dev, dma))
> +			goto out_free;
> +		/*
> +		 * We depend on the IOMMU being able to work with any physical
> +		 * address directly, so if the DMA layer suggests it can't by
> +		 * giving us back some translation, that bodes very badly...
> +		 */
> +		if (WARN(dma != __arm_lpae_dma(dev, pages),
> +			 "Cannot accommodate DMA translation for IOMMU page tables\n"))

Now that we have a struct device for the IOMMU, we could use dev_err to
make this diagnostic more useful.

> +			goto out_unmap;
> +	}

Missing newline again...

> +	if (cfg->tlb->flush_pgtable)

Why would you have both a dev and a flush callback? In which cases is
the DMA API insufficient?

> +		cfg->tlb->flush_pgtable(pages, size, cookie);

... and here (yeah, pedantry, but consistency keeps this file easier to
read).

> +	return pages;
> +
> +out_unmap:
> +	dma_unmap_single(dev, dma, size, DMA_TO_DEVICE);
> +out_free:
> +	free_pages_exact(pages, size);
> +	return NULL;
> +}
> +
> +static void __arm_lpae_free_pages(void *pages, size_t size,
> +				  struct io_pgtable_cfg *cfg)
> +{
> +	struct device *dev = cfg->iommu_dev;
> +
> +	if (dev)
> +		dma_unmap_single(dev, __arm_lpae_dma(dev, pages),
> +				 size, DMA_TO_DEVICE);
> +	free_pages_exact(pages, size);
> +}
> +
> +static void __arm_lpae_set_pte(arm_lpae_iopte *ptep, arm_lpae_iopte pte,
> +			       struct io_pgtable_cfg *cfg, void *cookie)
> +{
> +	struct device *dev = cfg->iommu_dev;
> +
> +	*ptep = pte;
> +
> +	if (dev)
> +		dma_sync_single_for_device(dev, __arm_lpae_dma(dev, ptep),
> +					   sizeof(pte), DMA_TO_DEVICE);
> +	if (cfg->tlb->flush_pgtable)
> +		cfg->tlb->flush_pgtable(ptep, sizeof(pte), cookie);

Could we kill the flush_pgtable callback completely and just stick in a
dma_wmb() here?
Ideally, we'd have something like dma_store_release, which we could use
to set the parent page table entry, but that's left as a future
exercise ;)

> diff --git a/drivers/iommu/io-pgtable.h b/drivers/iommu/io-pgtable.h
> index 10e32f6..39fe864 100644
> --- a/drivers/iommu/io-pgtable.h
> +++ b/drivers/iommu/io-pgtable.h
> @@ -41,6 +41,7 @@ struct iommu_gather_ops {
>   * @ias:            Input address (iova) size, in bits.
>   * @oas:            Output address (paddr) size, in bits.
>   * @tlb:            TLB management callbacks for this set of tables.
> + * @iommu_dev:      The owner of the page table memory (for DMA purposes).
>   */
>  struct io_pgtable_cfg {
>  	#define IO_PGTABLE_QUIRK_ARM_NS	(1 << 0)	/* Set NS bit in PTEs */
> @@ -49,6 +50,7 @@ struct io_pgtable_cfg {
>  	unsigned int		ias;
>  	unsigned int		oas;
>  	const struct iommu_gather_ops	*tlb;
> +	struct device		*iommu_dev;

I think we should also update the comments for iommu_gather_ops once we
decide on the fate of flush_pgtable.

Will

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu