On Tue, Aug 09, 2016 at 04:47:38PM -0700, John Stultz wrote:
> On Tue, Aug 9, 2016 at 2:51 AM, Peter Zijlstra <pet...@infradead.org> wrote:
> >
> > Currently the percpu-rwsem switches to (global) atomic ops while a
> > writer is waiting; which could be quite a while and slows down
> > releasing the readers.
> >
> > This patch cures this problem by ordering the reader-state vs
> > reader-count (see the comments in __percpu_down_read() and
> > percpu_down_write()). This changes a global atomic op into a full
> > memory barrier, which doesn't have the global cacheline contention.
> >
> > This also enables using the percpu-rwsem with rcu_sync disabled in order
> > to bias the implementation differently, reducing the writer latency by
> > adding some cost to readers.
>
> So this by itself doesn't help us much, but including the following
> from Oleg does help quite a bit:

Correct, Oleg was going to send his rcu_sync rework on top of this. But
since it's holiday season, things might be a tad delayed.
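
For anyone following along, the ordering described in the quoted changelog
can be sketched in user space with C11 atomics. This is only a rough
illustration under stated assumptions, not the kernel code: the sketch_*
names, the fixed NSLOTS array standing in for per-CPU counters, and the
trylock-style reader are all made up here. What it tries to show is that the
reader bumps its own counter, issues a full barrier, and only then checks
the writer flag, while the writer sets the flag, issues a full barrier, and
only then sums the counters, so at least one side should observe the other.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define NSLOTS 4 /* hypothetical stand-in for the number of CPUs */

struct sketch_rwsem {
	atomic_int  read_count[NSLOTS]; /* one reader counter per slot */
	atomic_bool readers_block;      /* set while a writer is pending */
};

static void sketch_init(struct sketch_rwsem *s)
{
	for (int i = 0; i < NSLOTS; i++)
		atomic_init(&s->read_count[i], 0);
	atomic_init(&s->readers_block, false);
}

/* Reader fast path: returns true if the read side was taken. */
static bool sketch_down_read_trylock(struct sketch_rwsem *s, int slot)
{
	/* Announce the reader on its own slot only. */
	atomic_fetch_add_explicit(&s->read_count[slot], 1,
				  memory_order_relaxed);

	/* Full barrier: the increment is ordered before the flag load. */
	atomic_thread_fence(memory_order_seq_cst);

	if (!atomic_load_explicit(&s->readers_block, memory_order_relaxed))
		return true;

	/* A writer is pending: undo the increment and report failure. */
	atomic_fetch_sub_explicit(&s->read_count[slot], 1,
				  memory_order_relaxed);
	return false;
}

static void sketch_up_read(struct sketch_rwsem *s, int slot)
{
	/* Release: the critical section completes before the decrement. */
	atomic_fetch_sub_explicit(&s->read_count[slot], 1,
				  memory_order_release);
}

/* Writer side, step 1: block new readers. */
static void sketch_write_begin(struct sketch_rwsem *s)
{
	atomic_store_explicit(&s->readers_block, true, memory_order_relaxed);

	/* Full barrier: the flag store is ordered before the counter loads. */
	atomic_thread_fence(memory_order_seq_cst);
}

/* Writer side, step 2: true while any reader is still inside. */
static bool sketch_readers_active(struct sketch_rwsem *s)
{
	int sum = 0;

	for (int i = 0; i < NSLOTS; i++)
		sum += atomic_load_explicit(&s->read_count[i],
					    memory_order_acquire);
	return sum != 0;
}
```

A writer in this sketch would call sketch_write_begin() and then wait until
sketch_readers_active() returns false. The sketch leaves out the sleeping
slow path, the wakeups, and everything rcu_sync related.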