On Wed, Jul 15, 2015 at 09:36:01PM +0200, Oleg Nesterov wrote: > On 07/15, Paul E. McKenney wrote: > > > > On Sun, Jul 12, 2015 at 01:35:35AM +0200, Oleg Nesterov wrote: > > > > > Let's start with the simple test-case, > > > > > > #!/bin/bash > > > > > > perf probe -x /lib/libc.so.6 syscall > > > > > > for i in {1..1000}; do > > > echo 1 >| > > > /sys/kernel/debug/tracing/events/probe_libc/syscall/enable > > > echo 0 >| > > > /sys/kernel/debug/tracing/events/probe_libc/syscall/enable > > > done > > > > > > It needs ~ 13.5 seconds (2 CPUs, KVM). If we simply replace > > > synchronize_sched_expedited() with synchronize_sched() it takes > > > ~ 67.5 seconds. This is not good. > > > > Yep, even if you avoided the write-release grace period, you would > > still be looking at something like 40 seconds, which is 3x. Some > > might consider that to be a performance regression. ;-) > > Yes ;) > > > > And just in case, I also measured > > > > > > for (i = 0; i < 1000000; ++i) { > > > percpu_down_write(&dup_mmap_sem); > > > percpu_up_write(&dup_mmap_sem); > > > } > > > > > > and it runs more than 1.5 times faster (to remind, only 2 CPUs), > > > but this is not that interesting, I agree. > > > > Your trick avoiding the grace periods during a writer-to-writer handoff > > are cute, and they are helping a lot here. > > Yes. And even the fact that a single writer doesn't need to sleep in > percpu_up_write()->synchronize_sched() looks good imo. > > Yes, yes, we can remove it if we penalize the readers, but I'd like to > avoid this. > > > Concurrent readers would > > have a tough time of it with this workload, though. They would all > > be serialized. > > Sure. in this case it is not better than the normal rw_semaphore. Worse > actually. > > > > And note that the actual change in percpu-rwsem is really simple, > > > and imo it even makes the code simpler. (the last patch is off- > > > topic cleanup). > > > > > > So the only complication is rcu_sync itself. But, rightly or not (I > > > am obviously biased), I believe this new rcu infrastructure is natural > > > and useful, and I think it can have more users too. > > > > I don't have an objection to it, even in its current form (I did > > review it long ago), but it does need to have a user! > > Do you mean you need another user except percpu_rw_semaphore? I do > not see any right now...
Not asking for more than one use, but it does need a use. I believe that percpu_rw_semaphore suffices. > Let me remind about sb_writers again. It actually has 3 rw_sem's > and I am trying to turn then into percpu_rw_semaphore's. > > In this case freeze_super() will need 6 synchronize_sched_expedited(). > This just looks ugly. But if we have rcu_sync primitives, all 3 sem's > in struct super_block can share the same "struct rcu_sync", and > freeze_super() will need only once synchronize_sched(). Makes sense. > > > And. We can do more improvements in rcu_sync and percpu-rwsem, and > > > I don't only mean other optimizations from Peter. In particular, we > > > can extract the "wait for gp pass" from rcu_sync_enter() into another > > > helper, we can teach percpu_down_write() to allow multiple writers, > > > and more. > > > > As in a percpu_down_write() that allows up to (say) five concurrent > > write-holders? > > Yes. Because this is what uprobes (and probably cgroups) actually needs. > It does not need the global lock. Just it needs to exclude the "readers" > (dup_mmap) globally. > > And in fact the very first version I sent worked this way. Then I removed > this because a) this was a bit "unusual" for reviewers ;) and b) because > I raced with another commit which has already added the initial (and > sub-optimal) version of percpu_rw_semaphore. ;-) Thanx, Paul -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/