On Wed, Jul 13, 2016 at 2:01 PM, Dmitry Shmidt <dimitr...@google.com> wrote:
> On Wed, Jul 13, 2016 at 1:52 PM, Paul E. McKenney
> <paul...@linux.vnet.ibm.com> wrote:
>> On Wed, Jul 13, 2016 at 10:26:57PM +0200, Peter Zijlstra wrote:
>>> On Wed, Jul 13, 2016 at 04:18:23PM -0400, Tejun Heo wrote:
>>> > Hello, John.
>>> >
>>> > On Wed, Jul 13, 2016 at 01:13:11PM -0700, John Stultz wrote:
>>> > > On Wed, Jul 13, 2016 at 11:33 AM, Tejun Heo <t...@kernel.org> wrote:
>>> > > > On Wed, Jul 13, 2016 at 02:21:02PM -0400, Tejun Heo wrote:
>>> > > >> One interesting thing to try would be replacing it with a regular
>>> > > >> non-percpu rwsem and see how it behaves. That should easily tell us
>>> > > >> whether this is from actual contention or artifacts from percpu_rwsem
>>> > > >> implementation.
>>> > > >
>>> > > > So, something like the following. Can you please see whether this
>>> > > > makes any difference?
>>> > >
>>> > > Yea. So this brings it down for me closer to what we're seeing with
>>> > > Dmitry's patch reverting the two problematic commits, usually
>>> > > 10-50us with one early spike at 18ms.
>>> >
>>> > So, it's a percpu rwsem issue then. I haven't really followed the
>>> > percpu rwsem changes closely. Oleg, are multi-millisecond delays
>>> > on down write expected with the current implementation of
>>> > percpu_rwsem?
>>>
>>> There is a synchronize_sched() in there, so sorta. That thing is heavily
>>> geared towards readers, as is the only 'sane' choice for global locks.
>>
>> Then one diagnostic step to take would be to replace that
>> synchronize_sched() with synchronize_sched_expedited(), and see if that
>> gets rid of the delays.
>>
>> Not a particularly real-time-friendly fix, but certainly a good check
>> on our various assumptions.
>
> All delays <200 us, but one that is 3 ms.

Yep. I'm seeing the same here too with Paul's change.

thanks
-john