FWIW I believe it is still on the plan for someone here to dust off
the PMU pNMI patches at some point.
Cool. Well I can try to experiment with what Julien had at v4 for now.
JFYI, I have done some more perf record captures and updated the
"annotate" and "report" output here:
https://raw.githubusercontent.com/hisilicon/kernel-dev/679eca1008b1d11b42e1b5fa8a205266c240d1e1/ann.txt
and .../report
This capture is just for cpu0, since NVMe irq handling + DMA unmapping
occurs on specific CPUs, cpu0 being one of them.
The reports look somewhat sane. We no longer have ~99% of the time
attributed to re-enabling interrupts; now it looks like this:
3.14 : ffff80001071eae0: ldr w0, [x29, #108]
: int ret = 0;
0.00 : ffff80001071eae4: mov w24, #0x0 // #0
: if (sync) {
0.00 : ffff80001071eae8: cbnz w0, ffff80001071eb44 <arm_smmu_cmdq_issue_cmdlist+0x44c>
: arch_local_irq_restore():
: asm volatile(ALTERNATIVE(
0.00 : ffff80001071eaec: msr daif, x21
: arch_static_branch():
0.25 : ffff80001071eaf0: nop
: arm_smmu_cmdq_issue_cmdlist():
: }
: }
:
: local_irq_restore(flags);
: return ret;
: }
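(For anyone wondering about the old ~99% figure: with a maskable PMU
interrupt, an overflow that fires inside the irq-disabled section can
only be taken once interrupts are unmasked again, so nearly every
sample got charged to the restore. A rough sketch of the shape of the
code - made-up function name, not the actual driver body:

	/*
	 * Sketch only; the real code is arm_smmu_cmdq_issue_cmdlist()
	 * in arm-smmu-v3.c.
	 */
	static int issue_cmdlist_sketch(void)
	{
		unsigned long flags;
		int ret = 0;

		local_irq_save(flags);

		/* CAS loop, command copy, wait for completion ... */

		local_irq_restore(flags);	/* deferred PMU IRQ is taken
						 * here, so samples pile up
						 * on the msr daif above */
		return ret;
	}

which is why profiling with a pseudo-NMI PMU interrupt should give a
much more believable breakdown.)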
One observation (if these reports are to be believed) is that we may
spend a lot of time in the CAS loop, initially trying to get a place in
the queue:
: __CMPXCHG_CASE(w, , , 32, )
: __CMPXCHG_CASE(x, , , 64, )
0.00 : ffff80001071e828: mov x0, x27
0.00 : ffff80001071e82c: mov x4, x1
0.00 : ffff80001071e830: cas x4, x2, [x27]
28.61 : ffff80001071e834: mov x0, x4
: arm_smmu_cmdq_issue_cmdlist():
: if (old == llq.val)
0.00 : ffff80001071e838: ldr x1, [x23]
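For reference, the loop in question is the space-allocation step at
the top of arm_smmu_cmdq_issue_cmdlist(). Roughly - paraphrased, with
the helper/field names only approximate, so don't treat this as a
literal copy of the driver (llq/head are the 64-bit prod+cons
snapshots, cmdq is the SMMU command queue):

	llq.val = READ_ONCE(cmdq->q.llq.val);
	do {
		u64 old;

		/* back off and poll if the queue is full (not shown) */

		head.cons = llq.cons;
		head.prod = queue_inc_prod_n(&llq, n + sync);

		/* every CPU issuing commands races on this one location */
		old = cmpxchg_relaxed(&cmdq->q.llq.val, llq.val, head.val);
		if (old == llq.val)
			break;		/* we own the entries just claimed */

		llq.val = old;		/* lost the race: retry with new state */
	} while (1);

So the 28.61% shown against the mov after the cas is presumably the
cas itself stalling under contention; the cost of a stalling
instruction often gets attributed to the one that follows it.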
John