On Fri, Aug 21, 2020 at 02:26:15PM +1200, Barry Song wrote:
> Right now, the SMMU uses dma_alloc_coherent() to allocate memory for its
> queues and tables. Typically, on an ARM64 server, there is a default CMA
> area located on node0, which could be far away from node2, node3, etc.
> With this patch, the SMMU will get memory from the local NUMA node for its
> command queues and page tables, which means dma_unmap latency will shrink
> considerably. Meanwhile, when iommu.passthrough is on, device drivers which
> call dma_alloc_coherent() will also get local memory and avoid traffic
> between NUMA nodes.
> 
> Cc: Christoph Hellwig <[email protected]>
> Cc: Marek Szyprowski <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: Robin Murphy <[email protected]>
> Cc: Ganapatrao Kulkarni <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Nicolas Saenz Julienne <[email protected]>
> Cc: Steve Capper <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: Mike Rapoport <[email protected]>
> Signed-off-by: Barry Song <[email protected]>
> ---
>  -v6: rebase on top of 5.9-rc1
> 
>  arch/arm64/mm/init.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 481d22c32a2e..f1c75957ff3c 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -429,6 +429,8 @@ void __init bootmem_init(void)
>       arm64_hugetlb_cma_reserve();
>  #endif
>  
> +     dma_pernuma_cma_reserve();

I think this will have to do for now, but I still wish that more of this was
driven from the core code so that we don't have to worry about
initialisation order and whether things are early/late enough on a per-arch
basis.

Acked-by: Will Deacon <[email protected]>

Will
