This RFC improves the performance of indirect mapping for all tested DMA usages on an mlx5 device, ranging from 64k packets to 1-byte packets and from 1 thread to 64 threads.
In all tested workloads, the performance of indirect mapping gets very close to the direct mapping case. The design aims for as much performance as possible, so that the impact of the pagecache stays small.

As I am not very experienced with XArrays or lockless algorithms, I would especially appreciate feedback on possible failures in their usage, missing barriers, and so on.

Also, the current FIFO size is for testing purposes only. It is also quite possible that this approach will not be a good idea on platforms other than pseries (I have not tested them); I can implement a bypass for those cases without much work.

Thank you!

Leonardo Bras (2):
  dma-direction: Add DMA_DIR_COMPAT() macro to test direction compatibility
  powerpc/kernel/iommu: Introduce IOMMU DMA pagecache

 arch/powerpc/include/asm/iommu-cache.h |  31 ++++
 arch/powerpc/include/asm/iommu.h       |   4 +
 arch/powerpc/kernel/Makefile           |   2 +-
 arch/powerpc/kernel/iommu-cache.c      | 247 +++++++++++++++++++++++++
 arch/powerpc/kernel/iommu.c            |  15 +-
 include/linux/dma-direction.h          |   3 +
 6 files changed, 296 insertions(+), 6 deletions(-)
 create mode 100644 arch/powerpc/include/asm/iommu-cache.h
 create mode 100644 arch/powerpc/kernel/iommu-cache.c

-- 
2.25.4