On Fri, Aug 21, 2020 at 09:54:20PM +0800, John Garry wrote:
> As mentioned in [0], the CPU may consume many cycles processing
> arm_smmu_cmdq_issue_cmdlist(). One issue we find is the cmpxchg() loop to
> get space on the queue takes a lot of time once we start getting many
> CPUs contending - from experiment, for 64 CPUs contending the cmdq,
> success rate is ~ 1 in 12, which is poor, but not totally awful.
>
> This series removes that cmpxchg() and replaces with an atomic_add,
> same as how the actual cmdq deals with maintaining the prod pointer.
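To make the two reservation styles in the quoted text concrete, here is a minimal userspace sketch using C11 atomics. It is not the arm-smmu-v3 code. The struct toy_cmdq type, its free-running 32-bit prod/cons indices and both helper functions are hypothetical simplifications, and the sketch ignores how the series handles a reservation that overshoots the free space. The cmpxchg() variant has to retry whenever another CPU moved prod between the load and the compare-and-swap, which is where the quoted ~1 in 12 success rate comes from. The atomic_add variant always completes in one operation.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified queue: free-running indices, no wrap bit. */
struct toy_cmdq {
    _Atomic uint32_t prod;  /* next slot a producer will write */
    _Atomic uint32_t cons;  /* next slot the consumer will read */
    uint32_t size;          /* number of slots in the queue */
};

/*
 * cmpxchg()-style reservation: check for space, then try to move prod
 * forward by n. If another CPU moved prod first, the CAS fails, refreshes
 * 'old' with the current value, and we go round again.
 */
static bool reserve_cmpxchg(struct toy_cmdq *q, uint32_t n,
                            uint32_t *start, unsigned *retries)
{
    uint32_t old = atomic_load(&q->prod);

    for (;;) {
        uint32_t cons = atomic_load(&q->cons);

        if (q->size - (old - cons) < n)
            return false;  /* not enough free slots */
        if (atomic_compare_exchange_strong(&q->prod, &old, old + n)) {
            *start = old;  /* first slot we now own */
            return true;
        }
        (*retries)++;      /* lost the race to another producer */
    }
}

/*
 * atomic_add-style reservation: a single fetch-and-add that never retries.
 * The price is that the caller must cope with prod running past the free
 * space, which this sketch does not attempt.
 */
static uint32_t reserve_fetch_add(struct toy_cmdq *q, uint32_t n)
{
    return atomic_fetch_add(&q->prod, n);  /* returns the old prod */
}
```

Under contention, every failed compare-and-swap in reserve_cmpxchg() is wasted work on the shared prod cache line, whereas reserve_fetch_add() trades that retry loop for extra bookkeeping elsewhere.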
I'm still not a fan of this. Could you try to adapt the hacks I sent before,
please? I know they weren't quite right (I have no hardware to test on), but
the basic idea is to fall back to a spinlock if the cmpxchg() fails. The
queueing built into the spinlock implementation should then avoid the
cmpxchg() contention.

Thanks,

Will
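The fallback idea in the reply can be sketched as follows: try the cmpxchg() once, and only if it loses the race, take a lock and retry under it, so that losing CPUs wait their turn in the lock's queue instead of all hammering prod. This is an illustration under stated assumptions, not the patches referred to above. A FIFO ticket lock stands in for the kernel's queued spinlock, and struct toy_cmdq, try_reserve(), reserve() and the reserve_result values are all hypothetical names.

```c
#include <stdatomic.h>
#include <stdint.h>

enum reserve_result { RESERVE_FULL = -1, RESERVE_RACE = 0, RESERVE_OK = 1 };

/* FIFO ticket lock: a stand-in for the kernel's queued spinlock. */
struct ticket_lock {
    _Atomic uint32_t next;   /* next ticket to hand out */
    _Atomic uint32_t owner;  /* ticket currently allowed to hold the lock */
};

static void ticket_acquire(struct ticket_lock *l)
{
    uint32_t me = atomic_fetch_add(&l->next, 1);

    /* Waiters are admitted strictly in arrival order. */
    while (atomic_load_explicit(&l->owner, memory_order_acquire) != me)
        ;
}

static void ticket_release(struct ticket_lock *l)
{
    uint32_t cur = atomic_load_explicit(&l->owner, memory_order_relaxed);

    atomic_store_explicit(&l->owner, cur + 1, memory_order_release);
}

/* Hypothetical, simplified queue state (free-running indices). */
struct toy_cmdq {
    _Atomic uint32_t prod;
    _Atomic uint32_t cons;
    uint32_t size;
    struct ticket_lock slow_lock;  /* serialises CPUs that lost the race */
};

/* One attempt: check for space, then a single compare-and-swap on prod. */
static enum reserve_result try_reserve(struct toy_cmdq *q, uint32_t n,
                                       uint32_t *old)
{
    uint32_t cons = atomic_load(&q->cons);

    if (q->size - (*old - cons) < n)
        return RESERVE_FULL;
    /* On failure, the CAS writes the current prod back into *old. */
    return atomic_compare_exchange_strong(&q->prod, old, *old + n) ?
           RESERVE_OK : RESERVE_RACE;
}

static enum reserve_result reserve(struct toy_cmdq *q, uint32_t n,
                                   uint32_t *start)
{
    uint32_t old = atomic_load(&q->prod);
    enum reserve_result ret = try_reserve(q, n, &old);

    if (ret == RESERVE_RACE) {
        /* Slow path: queue behind other losers, then retry. */
        ticket_acquire(&q->slow_lock);
        do {
            ret = try_reserve(q, n, &old);
        } while (ret == RESERVE_RACE);
        ticket_release(&q->slow_lock);
    }
    if (ret == RESERVE_OK)
        *start = old;  /* a successful CAS leaves the pre-update prod here */
    return ret;
}
```

The uncontended case costs exactly one cmpxchg(), as before. Under contention, at most one lock holder at a time competes with fast-path CPUs for prod, and the rest spin on the lock word in FIFO order instead.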

