On Fri, Aug 21, 2020 at 02:26:15PM +1200, Barry Song wrote:
> Right now, smmu is using dma_alloc_coherent() to get memory to save queues
> and tables. Typically, on ARM64 server, there is a default CMA located at
> node0, which could be far away from node2, node3 etc.
> with this patch, smmu will get memory from local numa node to save command
> queues and page tables. that means dma_unmap latency will be shrunk much.
> Meanwhile, when iommu.passthrough is on, device drivers which call dma_
> alloc_coherent() will also get local memory and avoid the travel between
> numa nodes.
>
> Cc: Christoph Hellwig <[email protected]>
> Cc: Marek Szyprowski <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: Robin Murphy <[email protected]>
> Cc: Ganapatrao Kulkarni <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Nicolas Saenz Julienne <[email protected]>
> Cc: Steve Capper <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: Mike Rapoport <[email protected]>
> Signed-off-by: Barry Song <[email protected]>
> ---
> -v6: rebase on top of 5.9-rc1
>
>  arch/arm64/mm/init.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 481d22c32a2e..f1c75957ff3c 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -429,6 +429,8 @@ void __init bootmem_init(void)
>  	arm64_hugetlb_cma_reserve();
>  #endif
>  
> +	dma_pernuma_cma_reserve();
I think this will have to do for now, but I still wish that more of this was
driven from the core code so that we don't have to worry about initialisation
order and whether things are early/late enough on a per-arch basis.

Acked-by: Will Deacon <[email protected]>

Will

