On Fri, Jan 23, 2026 at 01:59:04PM -0800, Matthew Brost wrote:
> The dma-map IOVA alloc, link, and sync APIs perform significantly better
> than dma-map / dma-unmap, as they avoid costly IOMMU synchronizations.
> This difference is especially noticeable when mapping a 2MB region in
> 4KB pages.
>
> Use dma-map IOVA alloc, link, and sync APIs for GPU SVM and DRM page,
> which mappings between the CPU and GPU.
>
> Initial results are promising.
>
> Baseline CPU time during 2M / 64K fault with a migration:
> Average migrate 2M cpu time (us, percentage): 552.36049107142857142857,
> .71943789893868318799
> Average migrate 64K cpu time (us, percentage): 24.97767857142857142857,
> .34789908128526791960
>
> After this series CPU time during 2M / 64K fault with a migration:
> Average migrate 2M cpu time (us, percentage): 224.81808035714285714286,
> .51412827364772602557
> Average migrate 64K cpu time (us, percentage): 14.65625000000000000000,
> .25659463050529524405
That's a 2x improvement in the overall full operation? Wow!

Did you look at how the non-IOMMU cases perform too?

I think we can do better still for the non-cached platforms, as I have a way in mind to batch up lines and flush each line once instead of flushing for every 8-byte IOPTE written. Some ARM folks have been talking about this problem too.

Jason
