Re: [RFC PATCH 9/9] debugfs: free debugfs_fsdata instances
On Tue, Apr 18, 2017 at 05:20:38PM +0200, Johannes Berg wrote:
> On Tue, 2017-04-18 at 08:17 -0700, Paul E. McKenney wrote:
> > > Again, no (S)RCU abuse here, just an ABBA deadlock.
> > 
> > OK, please accept my apologies for failing to follow the thread.
> 
> No worries - just wanted to clarify this in case I was missing
> something.
> 
> > I nevertheless reiterate my advice to run at least some tests with
> > CONFIG_PROVE_RCU=y. And yes, it would be good to upgrade lockdep
> > to find the above theoretical deadlock.
> 
> Right. It won't matter for debugfs after this patchset, but it would
> indeed be nice - I failed in my attempt though :)

Hey, I was hoping!  ;-)

							Thanx, Paul
Re: [RFC PATCH 9/9] debugfs: free debugfs_fsdata instances
On Tue, 2017-04-18 at 08:17 -0700, Paul E. McKenney wrote:
> > Again, no (S)RCU abuse here, just an ABBA deadlock.
> 
> OK, please accept my apologies for failing to follow the thread.

No worries - just wanted to clarify this in case I was missing
something.

> I nevertheless reiterate my advice to run at least some tests with
> CONFIG_PROVE_RCU=y. And yes, it would be good to upgrade lockdep
> to find the above theoretical deadlock.

Right. It won't matter for debugfs after this patchset, but it would
indeed be nice - I failed in my attempt though :)

johannes
Re: [RFC PATCH 9/9] debugfs: free debugfs_fsdata instances
On Tue, Apr 18, 2017 at 03:40:32PM +0200, Johannes Berg wrote:
> On Tue, 2017-04-18 at 06:31 -0700, Paul E. McKenney wrote:
> > On Tue, Apr 18, 2017 at 11:39:27AM +0200, Johannes Berg wrote:
> > > On Mon, 2017-04-17 at 09:01 -0700, Paul E. McKenney wrote:
> > > > 
> > > > If you have not already done so, please run this with debug
> > > > enabled,
> > > > especially CONFIG_PROVE_LOCKING=y (which implies
> > > > CONFIG_PROVE_RCU=y).
> > > > This is important because there are configurations for which the
> > > > deadlocks you saw with SRCU turn into silent failure, including
> > > > memory corruption.
> > > > CONFIG_PROVE_RCU=y will catch many of those situations.
> > > 
> > > Can you elaborate on that? I think we may have had CONFIG_PROVE_RCU
> > > enabled in the builds where we saw the problem, but I'm not sure.
> > 
> > CONFIG_PROVE_RCU=y will reliably catch things like this:
> > 
> > 1.	rcu_read_lock();
> > 	synchronize_rcu();
> > 	rcu_read_unlock();
> 
> Ok, that's not something that happens here either.
> 
> > 2.	rcu_read_lock();
> > 	schedule_timeout_interruptible(HZ);
> > 	rcu_read_unlock();
> 
> Neither is this happening.
> 
> > There are more, but this should get you the flavor of the types
> > of bugs CONFIG_PROVE_RCU=y can locate for you.
> 
> Makes sense. However, the issue at hand is what we (you and I)
> discussed earlier wrt. lockdep -- from SRCU's point of view everything
> is actually OK, except that the one thread is waiting for something and
> we can never finish the grace period, and thus synchronize_srcu() will
> never return. But there's no real SRCU bug here.
> 
> > > Nicolai probably never even ran into this problem, though it should
> > > be easy to reproduce.
> > 
> > I am just worried that the situation resulting in the earlier SRCU
> > deadlocks might be hiding behind CONFIG_PROVE_RCU=n,
> > CONFIG_PREEMPT=n, and CONFIG_PREEMPT_COUNT=n. Or some other bug
> > hiding behind some other set of Kconfig options.
> 
> There's no SRCU deadlock though. I know exactly why it happens, in my
> case, which is the following:
> 
> Thread 1
> 	userspace: read(debugfs_file_1)
> 	srcu_read_lock(&debugfs_srcu); // in debugfs bowels
> 	wait_event_interruptible(...); // in my driver's debugfs read method
> 
> Thread 2:
> 	debugfs_remove(debugfs_file_2);
> 	synchronize_srcu(&debugfs_srcu); // in debugfs bowels
> 
> This is the live-lock. The deadlock is something I posited but never
> ran into:
> 
> CPU 1					CPU 2
> srcu_read_lock(&debugfs_srcu);
> 					rtnl_lock();
> rtnl_lock();
> 					synchronize_srcu(&debugfs_srcu);
> 
> Again, no (S)RCU abuse here, just an ABBA deadlock.

OK, please accept my apologies for failing to follow the thread.

I nevertheless reiterate my advice to run at least some tests with
CONFIG_PROVE_RCU=y.  And yes, it would be good to upgrade lockdep
to find the above theoretical deadlock.

							Thanx, Paul
Re: [RFC PATCH 9/9] debugfs: free debugfs_fsdata instances
On Tue, 2017-04-18 at 06:31 -0700, Paul E. McKenney wrote:
> On Tue, Apr 18, 2017 at 11:39:27AM +0200, Johannes Berg wrote:
> > On Mon, 2017-04-17 at 09:01 -0700, Paul E. McKenney wrote:
> > > 
> > > If you have not already done so, please run this with debug
> > > enabled,
> > > especially CONFIG_PROVE_LOCKING=y (which implies
> > > CONFIG_PROVE_RCU=y).
> > > This is important because there are configurations for which the
> > > deadlocks you saw with SRCU turn into silent failure, including
> > > memory corruption.
> > > CONFIG_PROVE_RCU=y will catch many of those situations.
> > 
> > Can you elaborate on that? I think we may have had CONFIG_PROVE_RCU
> > enabled in the builds where we saw the problem, but I'm not sure.
> 
> CONFIG_PROVE_RCU=y will reliably catch things like this:
> 
> 1.	rcu_read_lock();
> 	synchronize_rcu();
> 	rcu_read_unlock();

Ok, that's not something that happens here either.

> 2.	rcu_read_lock();
> 	schedule_timeout_interruptible(HZ);
> 	rcu_read_unlock();

Neither is this happening.

> There are more, but this should get you the flavor of the types
> of bugs CONFIG_PROVE_RCU=y can locate for you.

Makes sense. However, the issue at hand is what we (you and I)
discussed earlier wrt. lockdep -- from SRCU's point of view everything
is actually OK, except that the one thread is waiting for something and
we can never finish the grace period, and thus synchronize_srcu() will
never return. But there's no real SRCU bug here.

> > Nicolai probably never even ran into this problem, though it should
> > be easy to reproduce.
> 
> I am just worried that the situation resulting in the earlier SRCU
> deadlocks might be hiding behind CONFIG_PROVE_RCU=n,
> CONFIG_PREEMPT=n, and CONFIG_PREEMPT_COUNT=n. Or some other bug
> hiding behind some other set of Kconfig options.

There's no SRCU deadlock though. I know exactly why it happens, in my
case, which is the following:

Thread 1
	userspace: read(debugfs_file_1)
	srcu_read_lock(&debugfs_srcu); // in debugfs bowels
	wait_event_interruptible(...); // in my driver's debugfs read method

Thread 2:
	debugfs_remove(debugfs_file_2);
	synchronize_srcu(&debugfs_srcu); // in debugfs bowels

This is the live-lock. The deadlock is something I posited but never
ran into:

CPU 1					CPU 2
srcu_read_lock(&debugfs_srcu);
					rtnl_lock();
rtnl_lock();
					synchronize_srcu(&debugfs_srcu);

Again, no (S)RCU abuse here, just an ABBA deadlock.

johannes
Re: [RFC PATCH 9/9] debugfs: free debugfs_fsdata instances
On Tue, Apr 18, 2017 at 11:39:27AM +0200, Johannes Berg wrote:
> On Mon, 2017-04-17 at 09:01 -0700, Paul E. McKenney wrote:
> > 
> > If you have not already done so, please run this with debug enabled,
> > especially CONFIG_PROVE_LOCKING=y (which implies CONFIG_PROVE_RCU=y).
> > This is important because there are configurations for which the
> > deadlocks you saw with SRCU turn into silent failure, including
> > memory corruption.
> > CONFIG_PROVE_RCU=y will catch many of those situations.
> 
> Can you elaborate on that? I think we may have had CONFIG_PROVE_RCU
> enabled in the builds where we saw the problem, but I'm not sure.

CONFIG_PROVE_RCU=y will reliably catch things like this:

1.	rcu_read_lock();
	synchronize_rcu();
	rcu_read_unlock();

	With CONFIG_PROVE_RCU=n and CONFIG_PREEMPT=n, this will result in
	too-short grace periods, which can free things out from under the
	read-side critical section, which in turn can result in arbitrary
	memory corruption.  You might not even get a "scheduling while
	atomic", though CONFIG_PREEMPT_COUNT=y will produce this message.
	With CONFIG_PREEMPT=y, on the other hand, this should deadlock in
	a manner similar to the earlier SRCU deadlocks seen in debugfs.

2.	rcu_read_lock();
	schedule_timeout_interruptible(HZ);
	rcu_read_unlock();

	With CONFIG_PROVE_RCU=y and CONFIG_PREEMPT=y, this will just work,
	more or less.  Until someone runs with CONFIG_PREEMPT=n, which will
	produce "scheduling while atomic".  (I have a fix for this queued
	for 4.13, FWIW, so that in the future CONFIG_PROVE_RCU=y and
	CONFIG_PREEMPT=y will complain about this.  But for now, silent
	bug.)

There are more, but this should get you the flavor of the types
of bugs CONFIG_PROVE_RCU=y can locate for you.

> Can you say which configurations you're thinking of? And perhaps what
> kind of corruption you're thinking of also? I'm having a hard time
> imagining any corruption that should happen?

#1 is the silent corruption case given CONFIG_PROVE_RCU=n,
CONFIG_PREEMPT=n, and CONFIG_PREEMPT_COUNT=n.

> Nicolai probably never even ran into this problem, though it should be
> easy to reproduce.

I am just worried that the situation resulting in the earlier SRCU
deadlocks might be hiding behind CONFIG_PROVE_RCU=n, CONFIG_PREEMPT=n,
and CONFIG_PREEMPT_COUNT=n.  Or some other bug hiding behind some other
set of Kconfig options.

							Thanx, Paul
Re: [RFC PATCH 9/9] debugfs: free debugfs_fsdata instances
On Mon, 2017-04-17 at 09:01 -0700, Paul E. McKenney wrote:
> If you have not already done so, please run this with debug enabled,
> especially CONFIG_PROVE_LOCKING=y (which implies CONFIG_PROVE_RCU=y).
> This is important because there are configurations for which the
> deadlocks you saw with SRCU turn into silent failure, including
> memory corruption.
> CONFIG_PROVE_RCU=y will catch many of those situations.

Can you elaborate on that? I think we may have had CONFIG_PROVE_RCU
enabled in the builds where we saw the problem, but I'm not sure.

Can you say which configurations you're thinking of? And perhaps what
kind of corruption you're thinking of also? I'm having a hard time
imagining any corruption that should happen?

Nicolai probably never even ran into this problem, though it should be
easy to reproduce.

johannes
Re: [RFC PATCH 9/9] debugfs: free debugfs_fsdata instances
On Sun, Apr 16, 2017 at 11:51:37AM +0200, Nicolai Stange wrote:
> Currently, a dentry's debugfs_fsdata instance is allocated from
> debugfs_file_get() at first usage, i.e. at first file opening.
> 
> It won't ever get freed though.
> 
> Ideally, these instances would get freed after the last open file handle
> gets closed again, that is from the fops' ->release().
> 
> Unfortunately, we can't hook easily into those ->release()ers of files
> created through debugfs_create_file_unsafe(), i.e. of those proxied through
> debugfs_open_proxy_file_operations rather than through the "full"
> debugfs_full_proxy_file_operations proxy.
> 
> Hence, free unreferenced debugfs_fsdata instances from debugfs_file_put(),
> with the drawback of a potential allocation + deallocation per
> debugfs_file_get() + debugfs_file_put() pair, that is per fops invocation.
> 
> In addition to its former role of tracking pending fops, use the
> ->active_users for reference counting on the debugfs_fsdata instance
> itself. In particular, don't keep a dummy reference to be dropped from
> __debugfs_remove_file(): a d_delete()ed dentry and thus, request for
> completion notification, is now signaled by the d_unlinked() dentry itself.
> 
> Once ->active_users drops to zero (and the dentry is still intact), free
> the debugfs_fsdata instance from debugfs_file_put(). RCU protects any
> concurrent debugfs_file_get() attempts to get a hold of the instance here.
> Likewise for full_proxy_release() which lacks a call to debugfs_file_get().
> 
> Note that due to non-atomic updates to the d_unlinked() + ->d_fsdata pair,
> care must be taken in order to avoid races between debugfs_file_put() and
> debugfs_file_get() as well as __debugfs_remove_file(). Rather than
> introducing a global lock, exploit the fact that there will ever be only a
> single !d_unlinked() -> d_unlinked() transition and add memory barriers
> where needed.
> 
> Given the lack of proper benchmarking, that debugfs fops
> aren't performance critical and that we've already got a potential
> allocation/deallocation pair anyway, the added code complexity might be
> highly questionable though.
> 
> Signed-off-by: Nicolai Stange

If you have not already done so, please run this with debug enabled,
especially CONFIG_PROVE_LOCKING=y (which implies CONFIG_PROVE_RCU=y).
This is important because there are configurations for which the
deadlocks you saw with SRCU turn into silent failure, including
memory corruption.  CONFIG_PROVE_RCU=y will catch many of those
situations.  (And yes, kfree_rcu() doesn't have that problem, but...)

Another issue called out inline.

							Thanx, Paul

> ---
>  fs/debugfs/file.c     | 102 --
>  fs/debugfs/inode.c    |   8 +++-
>  fs/debugfs/internal.h |   1 +
>  3 files changed, 90 insertions(+), 21 deletions(-)
> 
> diff --git a/fs/debugfs/file.c b/fs/debugfs/file.c
> index f4dfd7d0d625..b2cc25d44a39 100644
> --- a/fs/debugfs/file.c
> +++ b/fs/debugfs/file.c
> @@ -22,6 +22,7 @@
>  #include
>  #include
>  #include
> +#include
>  #include
> 
>  #include "internal.h"
> @@ -78,10 +79,39 @@ int debugfs_file_get(struct dentry *dentry)
>  	struct debugfs_fsdata *fsd;
>  	void *d_fsd;
> 
> -	d_fsd = READ_ONCE(dentry->d_fsdata);
> +	rcu_read_lock();
> +retry:
> +	d_fsd = rcu_dereference(dentry->d_fsdata);
>  	if (!((unsigned long)d_fsd & DEBUGFS_FSDATA_IS_REAL_FOPS_BIT)) {
> +		/*
> +		 * Paired with the control dependency in
> +		 * debugfs_file_put(): if we saw the debugfs_fsdata
> +		 * instance "restored" there but not the dead dentry,
> +		 * we'd erroneously instantiate a fresh debugfs_fsdata
> +		 * instance below.
> +		 */
> +		smp_rmb();
> +		if (d_unlinked(dentry)) {
> +			rcu_read_unlock();
> +			return -EIO;
> +		}
> +
>  		fsd = d_fsd;
> +		if (!refcount_inc_not_zero(&fsd->active_users)) {
> +			/*
> +			 * A concurrent debugfs_file_put() dropped the
> +			 * count to zero and is about to free the
> +			 * debugfs_fsdata.  Help out resetting the
> +			 * ->d_fsdata and retry.
> +			 */
> +			d_fsd = (void *)((unsigned long)fsd->real_fops |
> +					DEBUGFS_FSDATA_IS_REAL_FOPS_BIT);
> +			RCU_INIT_POINTER(dentry->d_fsdata, d_fsd);

This is an infrequent race, I hope?  If on the other hand there is a
possibility of this branch being taken a huge number of times in one
call, it would be good to exit the RCU read-side critical section
before retrying.

> +			goto retry;
> +		}
> +		rcu_read_unlock();
>  	} else {
> +
[RFC PATCH 9/9] debugfs: free debugfs_fsdata instances
Currently, a dentry's debugfs_fsdata instance is allocated from
debugfs_file_get() at first usage, i.e. at first file opening.

It won't ever get freed though.

Ideally, these instances would get freed after the last open file handle
gets closed again, that is from the fops' ->release().

Unfortunately, we can't hook easily into those ->release()ers of files
created through debugfs_create_file_unsafe(), i.e. of those proxied through
debugfs_open_proxy_file_operations rather than through the "full"
debugfs_full_proxy_file_operations proxy.

Hence, free unreferenced debugfs_fsdata instances from debugfs_file_put(),
with the drawback of a potential allocation + deallocation per
debugfs_file_get() + debugfs_file_put() pair, that is per fops invocation.

In addition to its former role of tracking pending fops, use the
->active_users for reference counting on the debugfs_fsdata instance
itself. In particular, don't keep a dummy reference to be dropped from
__debugfs_remove_file(): a d_delete()ed dentry and thus, request for
completion notification, is now signaled by the d_unlinked() dentry itself.

Once ->active_users drops to zero (and the dentry is still intact), free
the debugfs_fsdata instance from debugfs_file_put(). RCU protects any
concurrent debugfs_file_get() attempts to get a hold of the instance here.
Likewise for full_proxy_release() which lacks a call to debugfs_file_get().

Note that due to non-atomic updates to the d_unlinked() + ->d_fsdata pair,
care must be taken in order to avoid races between debugfs_file_put() and
debugfs_file_get() as well as __debugfs_remove_file(). Rather than
introducing a global lock, exploit the fact that there will ever be only a
single !d_unlinked() -> d_unlinked() transition and add memory barriers
where needed.

Given the lack of proper benchmarking, that debugfs fops
aren't performance critical and that we've already got a potential
allocation/deallocation pair anyway, the added code complexity might be
highly questionable though.
Signed-off-by: Nicolai Stange
---
 fs/debugfs/file.c     | 102 --
 fs/debugfs/inode.c    |   8 +++-
 fs/debugfs/internal.h |   1 +
 3 files changed, 90 insertions(+), 21 deletions(-)

diff --git a/fs/debugfs/file.c b/fs/debugfs/file.c
index f4dfd7d0d625..b2cc25d44a39 100644
--- a/fs/debugfs/file.c
+++ b/fs/debugfs/file.c
@@ -22,6 +22,7 @@
 #include
 #include
 #include
+#include
 #include

 #include "internal.h"
@@ -78,10 +79,39 @@ int debugfs_file_get(struct dentry *dentry)
 	struct debugfs_fsdata *fsd;
 	void *d_fsd;

-	d_fsd = READ_ONCE(dentry->d_fsdata);
+	rcu_read_lock();
+retry:
+	d_fsd = rcu_dereference(dentry->d_fsdata);
 	if (!((unsigned long)d_fsd & DEBUGFS_FSDATA_IS_REAL_FOPS_BIT)) {
+		/*
+		 * Paired with the control dependency in
+		 * debugfs_file_put(): if we saw the debugfs_fsdata
+		 * instance "restored" there but not the dead dentry,
+		 * we'd erroneously instantiate a fresh debugfs_fsdata
+		 * instance below.
+		 */
+		smp_rmb();
+		if (d_unlinked(dentry)) {
+			rcu_read_unlock();
+			return -EIO;
+		}
+
 		fsd = d_fsd;
+		if (!refcount_inc_not_zero(&fsd->active_users)) {
+			/*
+			 * A concurrent debugfs_file_put() dropped the
+			 * count to zero and is about to free the
+			 * debugfs_fsdata.  Help out resetting the
+			 * ->d_fsdata and retry.
+			 */
+			d_fsd = (void *)((unsigned long)fsd->real_fops |
+					DEBUGFS_FSDATA_IS_REAL_FOPS_BIT);
+			RCU_INIT_POINTER(dentry->d_fsdata, d_fsd);
+			goto retry;
+		}
+		rcu_read_unlock();
 	} else {
+		rcu_read_unlock();
 		fsd = kmalloc(sizeof(*fsd), GFP_KERNEL);
 		if (!fsd)
 			return -ENOMEM;
@@ -91,25 +121,28 @@ int debugfs_file_get(struct dentry *dentry)
 	refcount_set(&fsd->active_users, 1);
 	init_completion(&fsd->active_users_drained);
 	if (cmpxchg(&dentry->d_fsdata, d_fsd, fsd) != d_fsd) {
+		/*
+		 * Another debugfs_file_get() has installed a
+		 * debugfs_fsdata instance concurrently.
+		 * Free ours and retry to grab a reference on
+		 * the installed one.
+		 */
 		kfree(fsd);
-		fsd = READ_ONCE(dentry->d_fsdata);
+		rcu_read_lock();
+		goto retry;
+	}
+	/*
+	 * In case of a successful
[RFC PATCH 9/9] debugfs: free debugfs_fsdata instances
Currently, a dentry's debugfs_fsdata instance is allocated from
debugfs_file_get() at first usage, i.e. at first file opening. It won't
ever get freed though.

Ideally, these instances would get freed after the last open file handle
gets closed again, that is, from the fops' ->release(). Unfortunately, we
can't easily hook into the ->release() of files created through
debugfs_create_file_unsafe(), i.e. of those proxied through
debugfs_open_proxy_file_operations rather than through the "full"
debugfs_full_proxy_file_operations proxy.

Hence, free unreferenced debugfs_fsdata instances from
debugfs_file_put(), with the drawback of a potential allocation +
deallocation per debugfs_file_get() + debugfs_file_put() pair, that is,
per fops invocation.

In addition to its former role of tracking pending fops, use
->active_users for reference counting on the debugfs_fsdata instance
itself. In particular, don't keep a dummy reference to be dropped from
__debugfs_remove_file(): a d_delete()ed dentry, and thus a request for
completion notification, is now signaled by the d_unlinked() dentry
itself.

Once ->active_users drops to zero (and the dentry is still intact), free
the debugfs_fsdata instance from debugfs_file_put(). RCU protects any
concurrent debugfs_file_get() attempts to get hold of the instance here.
Likewise for full_proxy_release(), which lacks a call to
debugfs_file_get().

Note that due to non-atomic updates to the d_unlinked() + ->d_fsdata
pair, care must be taken to avoid races between debugfs_file_put() and
debugfs_file_get() as well as __debugfs_remove_file(). Rather than
introducing a global lock, exploit the fact that there will only ever be
a single !d_unlinked() -> d_unlinked() transition and add memory
barriers where needed.

Given the lack of proper benchmarking, that debugfs fops aren't
performance critical, and that we've already got a potential
allocation/deallocation pair anyway, the added code complexity might be
highly questionable though.
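The ->active_users lifetime rule described above can be modeled in
userspace C; this is only an illustrative sketch (fsd_model, fsd_get,
fsd_put are made-up names, not the kernel API), using C11 atomics in
place of refcount_t: a getter may only take a reference while the count
is still nonzero, and the putter that drops it to zero frees the
instance, so a later user has to allocate a fresh one.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Stand-in for struct debugfs_fsdata (illustrative only). */
struct fsd_model {
	atomic_int active_users;
};

struct fsd_model *fsd_alloc(void)
{
	struct fsd_model *fsd = malloc(sizeof(*fsd));

	if (fsd)
		atomic_store(&fsd->active_users, 1); /* creator holds one ref */
	return fsd;
}

/* Mirrors refcount_inc_not_zero(): refuse to resurrect a dying object. */
bool fsd_get(struct fsd_model *fsd)
{
	int old = atomic_load(&fsd->active_users);

	while (old != 0)
		if (atomic_compare_exchange_weak(&fsd->active_users,
						 &old, old + 1))
			return true;
	return false; /* a concurrent putter already hit zero */
}

/* Mirrors the put side: whoever drops the count to zero frees. */
bool fsd_put(struct fsd_model *fsd)
{
	if (atomic_fetch_sub(&fsd->active_users, 1) == 1) {
		free(fsd);
		return true; /* last user: instance freed */
	}
	return false; /* other users remain */
}
```

The inc-not-zero step is what lets debugfs_file_put() free without a
lock: once the count has reached zero, no new reference can appear, so
the freeing path cannot race with a successful get.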
Signed-off-by: Nicolai Stange
---
 fs/debugfs/file.c     | 102 +++++++++++++++++++++++++++++++++++++++++--------
 fs/debugfs/inode.c    |   8 +++-
 fs/debugfs/internal.h |   1 +
 3 files changed, 90 insertions(+), 21 deletions(-)

diff --git a/fs/debugfs/file.c b/fs/debugfs/file.c
index f4dfd7d0d625..b2cc25d44a39 100644
--- a/fs/debugfs/file.c
+++ b/fs/debugfs/file.c
@@ -22,6 +22,7 @@
 #include <...>
 #include <...>
 #include <...>
+#include <...>
 #include <...>

 #include "internal.h"
@@ -78,10 +79,39 @@ int debugfs_file_get(struct dentry *dentry)
 	struct debugfs_fsdata *fsd;
 	void *d_fsd;

-	d_fsd = READ_ONCE(dentry->d_fsdata);
+	rcu_read_lock();
+retry:
+	d_fsd = rcu_dereference(dentry->d_fsdata);
 	if (!((unsigned long)d_fsd & DEBUGFS_FSDATA_IS_REAL_FOPS_BIT)) {
+		/*
+		 * Paired with the control dependency in
+		 * debugfs_file_put(): if we saw the debugfs_fsdata
+		 * instance "restored" there but not the dead dentry,
+		 * we'd erroneously instantiate a fresh debugfs_fsdata
+		 * instance below.
+		 */
+		smp_rmb();
+		if (d_unlinked(dentry)) {
+			rcu_read_unlock();
+			return -EIO;
+		}
+		fsd = d_fsd;
+		if (!refcount_inc_not_zero(&fsd->active_users)) {
+			/*
+			 * A concurrent debugfs_file_put() dropped the
+			 * count to zero and is about to free the
+			 * debugfs_fsdata. Help out resetting the
+			 * ->d_fsdata and retry.
+			 */
+			d_fsd = (void *)((unsigned long)fsd->real_fops |
+					 DEBUGFS_FSDATA_IS_REAL_FOPS_BIT);
+			RCU_INIT_POINTER(dentry->d_fsdata, d_fsd);
+			goto retry;
+		}
+		rcu_read_unlock();
 	} else {
+		rcu_read_unlock();
 		fsd = kmalloc(sizeof(*fsd), GFP_KERNEL);
 		if (!fsd)
 			return -ENOMEM;
@@ -91,25 +121,28 @@ int debugfs_file_get(struct dentry *dentry)
 		refcount_set(&fsd->active_users, 1);
 		init_completion(&fsd->active_users_drained);
 		if (cmpxchg(&dentry->d_fsdata, d_fsd, fsd) != d_fsd) {
+			/*
+			 * Another debugfs_file_get() has installed a
+			 * debugfs_fsdata instance concurrently.
+			 * Free ours and retry to grab a reference on
+			 * the installed one.
+			 */
 			kfree(fsd);
-			fsd = READ_ONCE(dentry->d_fsdata);
+			rcu_read_lock();
+			goto retry;
+		}
+		/*
+		 * In case of a successful
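The cmpxchg() in the last hunk above is the classic lock-free
"allocate, try to install, free on loss" publication pattern. A
userspace model, with C11 atomics and invented names (install_or_reuse,
slot are illustrative, not the kernel code):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Stands in for dentry->d_fsdata (illustrative only). */
static _Atomic(void *) slot;

/*
 * If the slot still holds the initial sentinel, allocate a candidate
 * and try to swing the pointer to it; on a lost race, free our copy
 * and fall back to whatever the winning installer published.
 */
void *install_or_reuse(void *sentinel)
{
	void *cur = atomic_load(&slot);

	if (cur != sentinel)
		return cur; /* somebody already installed an instance */

	void *mine = malloc(16);
	void *expected = sentinel;

	if (mine && atomic_compare_exchange_strong(&slot, &expected, mine))
		return mine; /* we won: our instance is published */

	free(mine); /* lost the race (or OOM): discard ours... */
	return expected; /* ...and reuse what the winner installed */
}
```

Note that on a failed compare-exchange, `expected` is updated to the
current slot value, which is exactly how the patch picks up the
concurrently installed debugfs_fsdata before retrying.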