On Tue, 2017-04-18 at 06:31 -0700, Paul E. McKenney wrote:
> On Tue, Apr 18, 2017 at 11:39:27AM +0200, Johannes Berg wrote:
> > On Mon, 2017-04-17 at 09:01 -0700, Paul E. McKenney wrote:
> >
> > > If you have not already done so, please run this with debug
> > > enabled, especially CONFIG_PROVE_LOCKING=y (which implies
> > > CONFIG_PROVE_RCU=y). This is important because there are
> > > configurations for which the deadlocks you saw with SRCU turn
> > > into silent failure, including memory corruption.
> > > CONFIG_PROVE_RCU=y will catch many of those situations.
> >
> > Can you elaborate on that? I think we may have had CONFIG_PROVE_RCU
> > enabled in the builds where we saw the problem, but I'm not sure.
>
> CONFIG_PROVE_RCU=y will reliably catch things like this:
>
> 1.    rcu_read_lock();
>       synchronize_rcu();
>       rcu_read_unlock();

Ok, that's not something that happens here either.

> 2.    rcu_read_lock();
>       schedule_timeout_interruptible(HZ);
>       rcu_read_unlock();

Neither is this happening.

> There are more, but this should get you the flavor of the types
> of bugs CONFIG_PROVE_RCU=y can locate for you.

Makes sense.

However, the issue at hand is what we (you and I) discussed earlier
wrt. lockdep -- from SRCU's point of view everything is actually OK,
except that the one thread is waiting for something, so we can never
finish the grace period, and thus synchronize_srcu() will never return.
But there's no real SRCU bug here.

> > Nicolai probably never even ran into this problem, though it should
> > be easy to reproduce.
>
> I am just worried that the situation resulting in the earlier SRCU
> deadlocks might be hiding behind CONFIG_PROVE_RCU=n,
> CONFIG_PREEMPT=n, and CONFIG_PREEMPT_COUNT=n. Or some other bug
> hiding behind some other set of Kconfig options.

There's no SRCU deadlock though. I know exactly why it happens in my
case, which is the following:

Thread 1:
        userspace: read(debugfs_file_1)
        srcu_read_lock(&debugfs_srcu);   // in debugfs bowels
        wait_event_interruptible(...);   // in my driver's debugfs read method

Thread 2:
        debugfs_remove(debugfs_file_2);
        synchronize_srcu(&debugfs_srcu); // in debugfs bowels

This is the live-lock. The deadlock is something I posited but never
ran into:

CPU 1                                   CPU 2
srcu_read_lock(&debugfs_srcu);          rtnl_lock();
rtnl_lock();                            synchronize_srcu(&debugfs_srcu);

Again, no (S)RCU abuse here, just an ABBA deadlock.

johannes
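
To make the live-lock concrete, here is a minimal single-threaded sketch in
userspace C. It is only a model under stated assumptions: the toy_* names
are hypothetical and are not kernel APIs, the "domain" is nothing more than
a reader count, and the blocking synchronize_srcu() is replaced by a
non-blocking check of whether a grace period could end. It shows that the
remover is stuck for exactly as long as the sleeping reader stays inside
its read-side section, even though the two threads touch different files.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an SRCU domain: it only counts active readers. */
struct toy_srcu {
    int readers;
};

/* Models srcu_read_lock(): enter a read-side critical section. */
void toy_read_lock(struct toy_srcu *s)
{
    s->readers++;
}

/* Models srcu_read_unlock(): leave the read-side critical section. */
void toy_read_unlock(struct toy_srcu *s)
{
    assert(s->readers > 0);
    s->readers--;
}

/*
 * Non-blocking stand-in for synchronize_srcu(): reports whether a grace
 * period could end right now. The real call would sleep instead.
 */
bool toy_grace_period_can_end(const struct toy_srcu *s)
{
    return s->readers == 0;
}

/* Replays the two-thread scenario in order; true if it behaves as described. */
bool toy_scenario(void)
{
    struct toy_srcu debugfs = { 0 };
    bool blocked, released;

    /* Thread 1: read(debugfs_file_1) enters the read-side section ... */
    toy_read_lock(&debugfs);
    /* ... and now sleeps in the driver's read method, still inside it. */

    /* Thread 2: removing debugfs_file_2 needs a grace period on the same domain. */
    blocked = !toy_grace_period_can_end(&debugfs);

    /* Only when the awaited event arrives does thread 1 leave the section. */
    toy_read_unlock(&debugfs);
    released = toy_grace_period_can_end(&debugfs);

    return blocked && released;
}
```

Calling toy_scenario() from a main() of your own returns true: the grace
period cannot end while the reader is parked, and can end as soon as it
leaves. If the event never arrives, the remover waits forever, which is the
situation described above.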