On Fri, Jan 23, 2026 at 01:59:04PM -0800, Matthew Brost wrote:
> The dma-map IOVA alloc, link, and sync APIs perform significantly better
> than dma-map / dma-unmap, as they avoid costly IOMMU synchronizations.
> This difference is especially noticeable when mapping a 2MB region in
> 4KB pages.
> 
> Use dma-map IOVA alloc, link, and sync APIs for GPU SVM and DRM page,
> which mappings between the CPU and GPU.
> 
> Initial results are promising.
> 
> Baseline CPU time during 2M / 64K fault with a migration:
> Average migrate 2M cpu time (us, percentage): 552.36049107142857142857, .71943789893868318799
> Average migrate 64K cpu time (us, percentage): 24.97767857142857142857, .34789908128526791960
> 
> After this series CPU time during 2M / 64K fault with a migration:
> Average migrate 2M cpu time (us, percentage): 224.81808035714285714286, .51412827364772602557
> Average migrate 64K cpu time (us, percentage): 14.65625000000000000000, .25659463050529524405

That's a 2x improvement in the overall operation? Wow!
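
For reference, the ratios can be worked out directly from the quoted
averages. A minimal sketch follows; the helper name speedup() is made up
for illustration and is not part of the series. With the numbers above it
gives roughly 2.46x for the 2M case and roughly 1.70x for the 64K case.

```c
#include <assert.h>

/*
 * Hypothetical helper, for illustration only: how many times faster
 * the "after" measurement is compared to the "before" measurement.
 */
double speedup(double before_us, double after_us)
{
	assert(after_us > 0.0);
	return before_us / after_us;
}
```

Calling speedup(552.36049107142857, 224.81808035714286) covers the 2M
case, and speedup(24.97767857142857, 14.65625) covers the 64K case.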

Did you look at how the non-IOMMU cases perform too?

I think we can do better still for the non-cached platforms, as I have
a way in mind to batch up the IOPTE writes and flush each cache line
once instead of flushing for every 8-byte IOPTE written. Some ARM folks
have been talking about this problem too.
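
To put a rough number on the batching idea, here is a small counting
model. It is only a sketch under stated assumptions, not the actual
implementation: it assumes a 64-byte cache line (CACHE_LINE), a
cache-line-aligned table of contiguous IOPTEs, and hypothetical helper
names. The 8-byte IOPTE size is the one mentioned above. For a 2MB region
mapped as 4KB pages (512 IOPTEs starting at index 0), per-entry flushing
would issue 512 flushes while per-line flushing would issue 64.

```c
#include <assert.h>
#include <stddef.h>

#define IOPTE_SIZE 8u	/* 8-byte IOPTE, as in the message */
#define CACHE_LINE 64u	/* assumption: 64-byte cache line */

/* Flush count when every IOPTE write is followed by its own flush. */
size_t flushes_per_entry(size_t nr_ptes)
{
	return nr_ptes;
}

/*
 * Flush count when writes are batched and each touched cache line is
 * flushed once. first_index is the index of the first IOPTE written;
 * the table itself is assumed to start on a cache-line boundary.
 */
size_t flushes_batched(size_t first_index, size_t nr_ptes)
{
	size_t first_line, last_line;

	assert(CACHE_LINE % IOPTE_SIZE == 0);
	if (nr_ptes == 0)
		return 0;

	first_line = (first_index * IOPTE_SIZE) / CACHE_LINE;
	last_line = ((first_index + nr_ptes - 1) * IOPTE_SIZE) / CACHE_LINE;
	return last_line - first_line + 1;
}
```

The model only counts flush operations; it says nothing about how long
each flush takes on a given platform.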

Jason
