Hi Robin,

On Mon, Jul 27, 2015 at 07:18:08PM +0100, Robin Murphy wrote:
> Currently, users of the LPAE page table code are (ab)using dma_map_page()
> as a means to flush page table updates for non-coherent IOMMUs. Since
> from the CPU's point of view, creating IOMMU page tables *is* passing
> DMA buffers to a device (the IOMMU's page table walker), there's little
> reason not to use the DMA API correctly.
> 
> Allow drivers to opt into appropriate DMA operations for page table
> allocation and updates by providing the relevant device, and make the
> flush_pgtable() callback optional in case those DMA API operations are
> sufficient. The expectation is that an LPAE IOMMU should have a full view
> of system memory, so use streaming mappings to avoid unnecessary pressure
> on ZONE_DMA, and treat any DMA translation as a warning sign.
> 
> Signed-off-by: Robin Murphy <robin.mur...@arm.com>
> ---
> 
> Hi all,
> 
> Since Russell fixing Tegra[1] reminded me, I dug this out from, er,
> rather a long time ago[2] and tidied it up. I've tested the SMMUv2
> version with the MMU-401s on Juno (both coherent and non-coherent)
> with no visible regressions; I have the same hope for the SMMUv3 and
> IPMMU changes since they should be semantically identical. At worst
> the Renesas driver might need a larger DMA mask setting as per
> f1d84548694f, but given that there shouldn't be any highmem involved
> I'd think it should be OK as-is.
> 
> Robin.
> 
> [1]:http://article.gmane.org/gmane.linux.ports.tegra/23150
> [2]:http://article.gmane.org/gmane.linux.kernel.iommu/8972
> 
>  drivers/iommu/io-pgtable-arm.c | 107 +++++++++++++++++++++++++++++++----------
>  drivers/iommu/io-pgtable.h     |   2 +
>  2 files changed, 84 insertions(+), 25 deletions(-)
> 
> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> index 4e46021..b93a60e 100644
> --- a/drivers/iommu/io-pgtable-arm.c
> +++ b/drivers/iommu/io-pgtable-arm.c
> @@ -200,12 +200,76 @@ typedef u64 arm_lpae_iopte;
>  
>  static bool selftest_running = false;
>  
> +static dma_addr_t __arm_lpae_dma(struct device *dev, void *pages)
> +{
> +     return phys_to_dma(dev, virt_to_phys(pages));
> +}
> +
> +static void *__arm_lpae_alloc_pages(size_t size, gfp_t gfp,
> +                                 struct io_pgtable_cfg *cfg, void *cookie)
> +{
> +     void *pages = alloc_pages_exact(size, gfp | __GFP_ZERO);
> +     struct device *dev = cfg->iommu_dev;
> +     dma_addr_t dma;
> +
> +     if (!pages)
> +             return NULL;

Missing newline here.

> +     if (dev) {
> +             dma = dma_map_single(dev, pages, size, DMA_TO_DEVICE);
> +             if (dma_mapping_error(dev, dma))
> +                     goto out_free;
> +             /*
> +              * We depend on the IOMMU being able to work with any physical
> +              * address directly, so if the DMA layer suggests it can't by
> +              * giving us back some translation, that bodes very badly...
> +              */
> +             if (WARN(dma != __arm_lpae_dma(dev, pages),
> +                      "Cannot accommodate DMA translation for IOMMU page tables\n"))

Now that we have a struct device for the iommu, we could use dev_err to make
this diagnostic more useful.
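
Something like this, maybe (untested sketch, reusing your out_unmap
label):

	if (dma != __arm_lpae_dma(dev, pages)) {
		dev_err(dev,
			"Cannot accommodate DMA translation for IOMMU page tables\n");
		goto out_unmap;
	}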

> +                     goto out_unmap;
> +     }

Missing newline again...

> +     if (cfg->tlb->flush_pgtable)

Why would you have both a dev and a flush callback? In which cases is the
DMA API insufficient?

> +             cfg->tlb->flush_pgtable(pages, size, cookie);

... and here (yeah, pedantry, but consistency keeps this file easier to
read).

> +     return pages;
> +
> +out_unmap:
> +     dma_unmap_single(dev, dma, size, DMA_TO_DEVICE);
> +out_free:
> +     free_pages_exact(pages, size);
> +     return NULL;
> +}
> +
> +static void __arm_lpae_free_pages(void *pages, size_t size,
> +                               struct io_pgtable_cfg *cfg)
> +{
> +     struct device *dev = cfg->iommu_dev;
> +
> +     if (dev)
> +             dma_unmap_single(dev, __arm_lpae_dma(dev, pages),
> +                              size, DMA_TO_DEVICE);
> +     free_pages_exact(pages, size);
> +}
> +
> +static void __arm_lpae_set_pte(arm_lpae_iopte *ptep, arm_lpae_iopte pte,
> +                            struct io_pgtable_cfg *cfg, void *cookie)
> +{
> +     struct device *dev = cfg->iommu_dev;
> +
> +     *ptep = pte;
> +
> +     if (dev)
> +             dma_sync_single_for_device(dev, __arm_lpae_dma(dev, ptep),
> +                                        sizeof(pte), DMA_TO_DEVICE);
> +     if (cfg->tlb->flush_pgtable)
> +             cfg->tlb->flush_pgtable(ptep, sizeof(pte), cookie);

Could we kill the flush_pgtable callback completely and just stick in a
dma_wmb() here? Ideally, we'd have something like dma_store_release,
which we could use to set the parent page table entry, but that's left
as a future exercise ;)
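
i.e. something along these lines (completely untested, and assuming
flush_pgtable really can go away, which means every non-coherent user
would have to supply an iommu_dev):

	static void __arm_lpae_set_pte(arm_lpae_iopte *ptep, arm_lpae_iopte pte,
				       struct io_pgtable_cfg *cfg)
	{
		struct device *dev = cfg->iommu_dev;

		*ptep = pte;

		if (dev)
			dma_sync_single_for_device(dev, __arm_lpae_dma(dev, ptep),
						   sizeof(pte), DMA_TO_DEVICE);
		else
			dma_wmb();	/* order the PTE update before the IOMMU walks it */
	}

(and then the cookie parameter can presumably disappear as well).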

> diff --git a/drivers/iommu/io-pgtable.h b/drivers/iommu/io-pgtable.h
> index 10e32f6..39fe864 100644
> --- a/drivers/iommu/io-pgtable.h
> +++ b/drivers/iommu/io-pgtable.h
> @@ -41,6 +41,7 @@ struct iommu_gather_ops {
>   * @ias:           Input address (iova) size, in bits.
>   * @oas:           Output address (paddr) size, in bits.
>   * @tlb:           TLB management callbacks for this set of tables.
> + * @iommu_dev:     The owner of the page table memory (for DMA purposes).
>   */
>  struct io_pgtable_cfg {
>       #define IO_PGTABLE_QUIRK_ARM_NS (1 << 0)        /* Set NS bit in PTEs */
> @@ -49,6 +50,7 @@ struct io_pgtable_cfg {
>       unsigned int                    ias;
>       unsigned int                    oas;
>       const struct iommu_gather_ops   *tlb;
> +     struct device                   *iommu_dev;

I think we should also update the comments for iommu_gather_ops once
we decide on the fate of flush_pgtable.
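
e.g. if flush_pgtable does survive as an optional callback, something
along these lines (sketch only, other members elided):

 /**
  * struct iommu_gather_ops - IOMMU callbacks for TLB and page table
  *                           management.
  * ...
  * @flush_pgtable: (Optional) Make page table updates visible to the IOMMU.
  *                 Not needed when io_pgtable_cfg provides an @iommu_dev.
  */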

Will