Hi Will,
On Thu, Jan 02, 2020 at 05:44:39PM +0000, John Garry wrote:
And for the overall system, we have:
PerfTop: 85864 irqs/sec kernel:89.6% exact: 0.0% lost: 0/34434 drop: 0/40116 [4000Hz cycles], (all, 96 CPUs)
--------------------------------------------------------------------------------------------------------------------------
27.43% [kernel] [k] arm_smmu_cmdq_issue_cmdlist
11.71% [kernel] [k] _raw_spin_unlock_irqrestore
6.35% [kernel] [k] _raw_spin_unlock_irq
2.65% [kernel] [k] get_user_pages_fast
2.03% [kernel] [k] __slab_free
1.55% [kernel] [k] tick_nohz_idle_exit
1.47% [kernel] [k] arm_lpae_map
1.39% [kernel] [k] __fget
1.14% [kernel] [k] __lock_text_start
1.09% [kernel] [k] _raw_spin_lock
1.08% [kernel] [k] bio_release_pages.part.42
1.03% [kernel] [k] __sbitmap_get_word
0.97% [kernel] [k] arm_smmu_atc_inv_domain.constprop.42
0.91% [kernel] [k] fput_many
0.88% [kernel] [k] __arm_lpae_map
One thing to note is that we still spend an appreciable amount of time in
arm_smmu_atc_inv_domain(), which is disappointing given that it should
effectively be a no-op.
As for arm_smmu_cmdq_issue_cmdlist(), I do note that during the testing our
batch size is 1, so we're not seeing the real benefit of the batching. I
can't help but think that we could improve this code to try to combine CMD
SYNCs for small batches.
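To illustrate the batching point, here is a toy model in C. It is not the
driver code: it assumes that every batch handed to the command queue is
followed by exactly one CMD_SYNC, and the names cmdq_syncs_used() and
cmdq_entries_used() are made up for this sketch.

```c
#include <stddef.h>

/*
 * Toy model only (assumption: one CMD_SYNC is appended per batch).
 * Returns the number of CMD_SYNCs needed to issue `ncmds` commands
 * in batches of at most `batch_size` commands.
 */
size_t cmdq_syncs_used(size_t ncmds, size_t batch_size)
{
	if (ncmds == 0 || batch_size == 0)
		return 0;

	/* Round up: a partially filled final batch still needs a sync. */
	return (ncmds + batch_size - 1) / batch_size;
}

/*
 * Total queue entries consumed in the same model: the commands
 * themselves plus one CMD_SYNC per batch.
 */
size_t cmdq_entries_used(size_t ncmds, size_t batch_size)
{
	if (ncmds == 0 || batch_size == 0)
		return 0;

	return ncmds + cmdq_syncs_used(ncmds, batch_size);
}
```

Under that assumption, 64 commands issued with a batch size of 1 cost 64
CMD_SYNCs (128 queue entries), whereas the same commands in batches of 16
cost 4 CMD_SYNCs (68 entries).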
Anyway, let me know your thoughts or any questions. I'll have a look for
other possible bottlenecks if I get a chance.
Did you ever get any more information on this? I don't have any SMMUv3
hardware any more, so I can't really dig into this myself.
I'm only getting back to look at this now, as SMMU performance is a bit
of a hot topic again for us.
So one thing we are doing which looks to help performance is this series
from Marc:
https://lore.kernel.org/lkml/9171c554-50d2-142b-96ae-1357952fc...@huawei.com/T/#mee5562d1efd6aaeb8d2682bdb6807fe7b5d7f56d
That just spreads the per-CPU load for NVMe interrupt handling (where the
DMA unmapping happens), so I'd say it is just side-stepping any SMMU
issue really.
Going back to the SMMU, I wanted to run eBPF and perf annotate to help
profile this, but was having no luck getting them to work properly. I'll
look at this again now.
Cheers,
John