Re: [git pull] Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Sat, May 31, 2014 at 10:48 AM, Linus Torvalds wrote:
>
> On Sat, 31 May 2014, Josh Boyer wrote:
>>
>> One of my machines got the lockdep report below when booting a kernel
>> that contained these patches.
>
> I think this is just a lacking annotation.
>
> We do nest dentry d_lock locking, and in order to avoid ABBA deadlocks the
> rule is that we lock things in topological order (parent dentry first).
> lock_parent() is very careful about that, but doesn't actually tell
> lockdep about it.
>
> This trivial oneliner should fix it.
>
>                  Linus

Built the previous kernel with just this applied and it seems to behave
correctly on several reboots.  I think this looks good.  Thanks!

josh

> ---
>  fs/dcache.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/dcache.c b/fs/dcache.c
> index bce851dc03ef..be2bea834bf4 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -553,7 +553,7 @@ again:
>  	}
>  	rcu_read_unlock();
>  	if (parent != dentry)
> -		spin_lock(&dentry->d_lock);
> +		spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
>  	else
>  		parent = NULL;
>  	return parent;
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [git pull] Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Sat, May 31, 2014 at 10:48 AM, Linus Torvalds wrote:
>
> On Sat, 31 May 2014, Josh Boyer wrote:
>>
>> One of my machines got the lockdep report below when booting a kernel
>> that contained these patches.
>
> I think this is just a lacking annotation.
>
> We do nest dentry d_lock locking, and in order to avoid ABBA deadlocks the
> rule is that we lock things in topological order (parent dentry first).
> lock_parent() is very careful about that, but doesn't actually tell
> lockdep about it.
>
> This trivial oneliner should fix it.

Thanks, I'll give it a spin later today.

josh
Re: [git pull] Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Sat, 31 May 2014, Josh Boyer wrote:
>
> One of my machines got the lockdep report below when booting a kernel
> that contained these patches.

I think this is just a lacking annotation.

We do nest dentry d_lock locking, and in order to avoid ABBA deadlocks the
rule is that we lock things in topological order (parent dentry first).
lock_parent() is very careful about that, but doesn't actually tell
lockdep about it.

This trivial oneliner should fix it.

                 Linus

---
 fs/dcache.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index bce851dc03ef..be2bea834bf4 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -553,7 +553,7 @@ again:
 	}
 	rcu_read_unlock();
 	if (parent != dentry)
-		spin_lock(&dentry->d_lock);
+		spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
 	else
 		parent = NULL;
 	return parent;
Re: [git pull] Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Fri, May 30, 2014 at 1:14 PM, Al Viro wrote:
> On Fri, May 30, 2014 at 05:48:16PM +0100, Al Viro wrote:
>> On Fri, May 30, 2014 at 08:31:30AM -0700, Linus Torvalds wrote:
>> > On Fri, May 30, 2014 at 8:21 AM, Al Viro wrote:
>> > >
>> > > Linus, how would you prefer it to be handled?
>> >
>> > I'll just have to do an rc8. I really hoped to avoid it, because we're
>> > going on our family vacation when school is out in two weeks, and it
>> > causes problems for the merge window, but it's not like there is much
>> > choice - I can't do a 3.15 release with a known regression like that.
>>
>> Sorry about that... ;-/
>>
>> > So just send me the pull request, and I'll pull it. I'll probably do
>> > the "let's increase the x86-64 stack size to 16kB" too, to close
>> > _that_ issue as well.
>>
>> OK, here it is:
>>
>> Fixes for livelocks in shrink_dentry_list() introduced by fixes to shrink
>> list corruption; the root cause was that trylock of parent's ->d_lock could
>> be disrupted by d_walk() happening on other CPUs, resulting in
>> shrink_dentry_list() making no progress *and* the same d_walk() being called
>> again and again for as long as shrink_dentry_list() doesn't get past that
>> mess.  Solution is to have shrink_dentry_list() treat that trylock failure
>> not as "try to do the same thing again", but "lock them in the right order".
>>
>> Please, pull from
>> git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git for-linus-2
>
> *GYAH*
>
> Shortlog and diffstat are from the local branch that has two more cleanups
> on top of what's been pushed to vfs.git (and what had been tested).  Ones
> matching what's really in that branch are here:
>
> Shortlog:
> Al Viro (6):
>       lift the "already marked killed" case into shrink_dentry_list()
>       split dentry_kill()
>       expand dentry_kill(dentry, 0) in shrink_dentry_list()
>       shrink_dentry_list(): take parent's ->d_lock earlier
>       dealing with the rest of shrink_dentry_list() livelock
>       dentry_kill() doesn't need the second argument now
>
> Diffstat:
>  fs/dcache.c | 153 ++--
>  1 file changed, 107 insertions(+), 46 deletions(-)
>
> My apologies - the script I'm using to generate shortlogs takes branch
> name as an argument, defaulting to HEAD, which was two commits past
> vfs/for-linus-2.  And no, I'm _not_ planning to push that followup stuff
> until the merge window.  Just to make sure: the branch to pull should have
> head at 8cbf74da435d1bd13dbb790f94c7ff67b2fb6af4 and have the same tree
> as vfs.git#for-linus, which is what got testing.

One of my machines got the lockdep report below when booting a kernel
that contained these patches.  This corresponds to Linux
v3.15-rc7-102-g1487385edb55.

josh

[   11.205628] usbcore: registered new interface driver btusb
[   11.212994] systemd-journald[430]: Received request to flush runtime journal from PID 1
[   11.230780] audit: type=1305 audit(140157.755:4): audit_pid=661 old=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:auditd_t:s0 res=1
[   11.233853] usb 2-1.8.1.1: USB disconnect, device number 6
[   11.253251] =============================================
[   11.253254] [ INFO: possible recursive locking detected ]
[   11.253257] 3.15.0-0.rc7.git4.1.fc21.x86_64 #1 Not tainted
[   11.253259] ---------------------------------------------
[   11.253261] systemd-udevd/448 is trying to acquire lock:
[   11.253264]  (&(&dentry->d_lockref.lock)->rlock){+.+...}, at: [<ffffffff817d66c1>] lock_parent.part.21+0x59/0x69
[   11.253291] but task is already holding lock:
[   11.253295]  (&(&dentry->d_lockref.lock)->rlock){+.+...}, at: [<ffffffff817d669e>] lock_parent.part.21+0x36/0x69
[   11.253308] other info that might help us debug this:
[   11.253312]  Possible unsafe locking scenario:
[   11.253317]        CPU0
[   11.253319]        ----
[   11.253322]   lock(&(&dentry->d_lockref.lock)->rlock);
[   11.253327]   lock(&(&dentry->d_lockref.lock)->rlock);
[   11.253332]  *** DEADLOCK ***
[   11.253337]  May be due to missing lock nesting notation
[   11.253342] 1 lock held by systemd-udevd/448:
[   11.253345]  #0:  (&(&dentry->d_lockref.lock)->rlock){+.+...}, at: [<ffffffff817d669e>] lock_parent.part.21+0x36/0x69
[   11.253359] stack backtrace:
[   11.253366] CPU: 1 PID: 448 Comm: systemd-udevd Not tainted 3.15.0-0.rc7.git4.1.fc21.x86_64 #1
[   11.253371] Hardware name: Apple Inc. MacBookPro10,2/Mac-AFD8A9D944EA4843, BIOS MBP102.88Z.0106.B03.1211161133 11/16/2012
[   11.253375]  652a059c 88025f71b978 817d7dd3
[   11.253384]  825a8c60 88025f71ba50 810f9084 88003f83d8d8
[   11.253393]  825a8c60 8802
[   11.253402] Call Trace:
[   11.253411]  [<ffffffff817d7dd3>] dump_stack+0x4d/0x66
[   11.253422]  [<ffffffff810f9084>] __lock_acquire+0x16e4/0x1ca0
[   11.253434]  [<ffffffff81023595>] ? native_sched_clock+0x35/0xa0
[   11.253443]  [<ffffffff810f9e32>] lock_acquire+0xa2/0x1d0
[   11.253452]  [] ? lock_parent.part.21+0x59/0x69 [
Re: [git pull] Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Fri, May 30, 2014 at 6:48 PM, Al Viro wrote:
> On Fri, May 30, 2014 at 08:31:30AM -0700, Linus Torvalds wrote:
>> On Fri, May 30, 2014 at 8:21 AM, Al Viro wrote:
>> >
>> > Linus, how would you prefer it to be handled?
>>
>> I'll just have to do an rc8. I really hoped to avoid it, because we're
>> going on our family vacation when school is out in two weeks, and it
>> causes problems for the merge window, but it's not like there is much
>> choice - I can't do a 3.15 release with a known regression like that.
>
> Sorry about that... ;-/
>
>> So just send me the pull request, and I'll pull it. I'll probably do
>> the "let's increase the x86-64 stack size to 16kB" too, to close
>> _that_ issue as well.
>
> OK, here it is:
>
> Fixes for livelocks in shrink_dentry_list() introduced by fixes to shrink
> list corruption; the root cause was that trylock of parent's ->d_lock could
> be disrupted by d_walk() happening on other CPUs, resulting in
> shrink_dentry_list() making no progress *and* the same d_walk() being called
> again and again for as long as shrink_dentry_list() doesn't get past that
> mess.  Solution is to have shrink_dentry_list() treat that trylock failure
> not as "try to do the same thing again", but "lock them in the right order".
>
> Please, pull from
> git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git for-linus-2
>
> Shortlog:
> Al Viro (8):
>       lift the "already marked killed" case into shrink_dentry_list()
>       split dentry_kill()
>       expand dentry_kill(dentry, 0) in shrink_dentry_list()
>       shrink_dentry_list(): take parent's ->d_lock earlier
>       dealing with the rest of shrink_dentry_list() livelock
>       dentry_kill() doesn't need the second argument now
>       d_prune_alias(): just lock the parent and call __dentry_kill()
>       dcache.c: call ->d_prune() regardless of d_unhashed()
>
> Diffstat:
>  fs/dcache.c | 176 +---
>  1 file changed, 115 insertions(+), 61 deletions(-)

Did you push all?

$ git log --oneline v3.15-rc7..
8cbf74d dentry_kill() doesn't need the second argument now
b2b8019 dealing with the rest of shrink_dentry_list() livelock
046b961 shrink_dentry_list(): take parent's ->d_lock earlier
ff2fde9 expand dentry_kill(dentry, 0) in shrink_dentry_list()
e55fd01 split dentry_kill()
64fd72e lift the "already marked killed" case into shrink_dentry_list()
b6dd6f4 vfs: fix vmplice_to_user()

- Sedat -
Re: [git pull] Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Fri, May 30, 2014 at 05:48:16PM +0100, Al Viro wrote:
> On Fri, May 30, 2014 at 08:31:30AM -0700, Linus Torvalds wrote:
> > On Fri, May 30, 2014 at 8:21 AM, Al Viro wrote:
> > >
> > > Linus, how would you prefer it to be handled?
> >
> > I'll just have to do an rc8. I really hoped to avoid it, because we're
> > going on our family vacation when school is out in two weeks, and it
> > causes problems for the merge window, but it's not like there is much
> > choice - I can't do a 3.15 release with a known regression like that.
>
> Sorry about that... ;-/
>
> > So just send me the pull request, and I'll pull it. I'll probably do
> > the "let's increase the x86-64 stack size to 16kB" too, to close
> > _that_ issue as well.
>
> OK, here it is:
>
> Fixes for livelocks in shrink_dentry_list() introduced by fixes to shrink
> list corruption; the root cause was that trylock of parent's ->d_lock could
> be disrupted by d_walk() happening on other CPUs, resulting in
> shrink_dentry_list() making no progress *and* the same d_walk() being called
> again and again for as long as shrink_dentry_list() doesn't get past that
> mess.  Solution is to have shrink_dentry_list() treat that trylock failure
> not as "try to do the same thing again", but "lock them in the right order".
>
> Please, pull from
> git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git for-linus-2

*GYAH*

Shortlog and diffstat are from the local branch that has two more cleanups
on top of what's been pushed to vfs.git (and what had been tested).  Ones
matching what's really in that branch are here:

Shortlog:
Al Viro (6):
      lift the "already marked killed" case into shrink_dentry_list()
      split dentry_kill()
      expand dentry_kill(dentry, 0) in shrink_dentry_list()
      shrink_dentry_list(): take parent's ->d_lock earlier
      dealing with the rest of shrink_dentry_list() livelock
      dentry_kill() doesn't need the second argument now

Diffstat:
 fs/dcache.c | 153 ++--
 1 file changed, 107 insertions(+), 46 deletions(-)

My apologies - the script I'm using to generate shortlogs takes branch
name as an argument, defaulting to HEAD, which was two commits past
vfs/for-linus-2.  And no, I'm _not_ planning to push that followup stuff
until the merge window.  Just to make sure: the branch to pull should have
head at 8cbf74da435d1bd13dbb790f94c7ff67b2fb6af4 and have the same tree
as vfs.git#for-linus, which is what got testing.
[git pull] Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Fri, May 30, 2014 at 08:31:30AM -0700, Linus Torvalds wrote:
> On Fri, May 30, 2014 at 8:21 AM, Al Viro wrote:
> >
> > Linus, how would you prefer it to be handled?
>
> I'll just have to do an rc8. I really hoped to avoid it, because we're
> going on our family vacation when school is out in two weeks, and it
> causes problems for the merge window, but it's not like there is much
> choice - I can't do a 3.15 release with a known regression like that.

Sorry about that... ;-/

> So just send me the pull request, and I'll pull it. I'll probably do
> the "let's increase the x86-64 stack size to 16kB" too, to close
> _that_ issue as well.

OK, here it is:

Fixes for livelocks in shrink_dentry_list() introduced by fixes to shrink
list corruption; the root cause was that trylock of parent's ->d_lock could
be disrupted by d_walk() happening on other CPUs, resulting in
shrink_dentry_list() making no progress *and* the same d_walk() being called
again and again for as long as shrink_dentry_list() doesn't get past that
mess.  Solution is to have shrink_dentry_list() treat that trylock failure
not as "try to do the same thing again", but "lock them in the right order".

Please, pull from
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git for-linus-2

Shortlog:
Al Viro (8):
      lift the "already marked killed" case into shrink_dentry_list()
      split dentry_kill()
      expand dentry_kill(dentry, 0) in shrink_dentry_list()
      shrink_dentry_list(): take parent's ->d_lock earlier
      dealing with the rest of shrink_dentry_list() livelock
      dentry_kill() doesn't need the second argument now
      d_prune_alias(): just lock the parent and call __dentry_kill()
      dcache.c: call ->d_prune() regardless of d_unhashed()

Diffstat:
 fs/dcache.c | 176 +---
 1 file changed, 115 insertions(+), 61 deletions(-)
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Fri, May 30, 2014 at 8:21 AM, Al Viro wrote:
>
> Linus, how would you prefer it to be handled?

I'll just have to do an rc8. I really hoped to avoid it, because we're
going on our family vacation when school is out in two weeks, and it
causes problems for the merge window, but it's not like there is much
choice - I can't do a 3.15 release with a known regression like that.

So just send me the pull request, and I'll pull it. I'll probably do
the "let's increase the x86-64 stack size to 16kB" too, to close
_that_ issue as well.

                 Linus
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Fri, May 30, 2014 at 11:12:38AM +0300, Mika Westerberg wrote:
> Tested your latest #for-linus from here:
>
> https://git.kernel.org/cgit/linux/kernel/git/viro/vfs.git/log/?h=for-linus
>
> and the livelock is gone,
>
> Tested-by: Mika Westerberg
>
> Thanks again!

OK...  I've just pushed another branch (#for-linus-2) with the lock_parent()
change folded into the commit that used to introduce the rename_lock-based
variant.  Heads of the branches are byte-for-byte identical:

al@duke:~/linux/trees/vfs$ git log vfs/for-linus|head -1
commit ebc6cb92bfeddf25462842f64604bc9fd2aab5b0
al@duke:~/linux/trees/vfs$ git log vfs/for-linus-2|head -1
commit 8cbf74da435d1bd13dbb790f94c7ff67b2fb6af4
al@duke:~/linux/trees/vfs$ git diff ebc6cb92bfeddf25462842f64604bc9fd2aab5b0 8cbf74da435d1bd13dbb790f94c7ff67b2fb6af4
al@duke:~/linux/trees/vfs$

so testing done on one of them obviously applies to another.  FWIW, on
#for-linus-2 we have

Shortlog:
Al Viro (6):
      lift the "already marked killed" case into shrink_dentry_list()
      split dentry_kill()
      expand dentry_kill(dentry, 0) in shrink_dentry_list()
      shrink_dentry_list(): take parent's ->d_lock earlier
      dealing with the rest of shrink_dentry_list() livelock
      dentry_kill() doesn't need the second argument now

Diffstat:
 fs/dcache.c | 153 ++--
 1 file changed, 107 insertions(+), 46 deletions(-)

Linus, how would you prefer it to be handled?
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Thu, May 29, 2014 at 07:52:01PM +0100, Al Viro wrote:
> On Thu, May 29, 2014 at 05:53:51PM +0100, Al Viro wrote:
> > On Thu, May 29, 2014 at 09:29:42AM -0700, Linus Torvalds wrote:
> > > On Thu, May 29, 2014 at 9:23 AM, Al Viro wrote:
> > > >
> > > > BTW, lock_parent() might be better off if in contended case it would not
> > > > bother with rename_lock and did something like this:
> > > > again:
> > >
> > > Ack. I think that's much better.
> >
> > Pushed to #for-linus (with dumb braino fixed - it's if (parent != dentry),
> > not if (parent)).  I'll wait with folding it back into the commit that
> > introduces lock_parent() until we get testing results...
>
> Grrr...  Sadly, that's not good enough.  Leaking rcu_read_lock() on
> success is trivial, but there's a more serious problem: suppose dentries
> involved get moved before we get to locking what we thought was parent.
> We end up taking ->d_lock on two dentries that might be nowhere near each
> other in the tree, with obvious nasty implications.  Would be _very_ hard
> to reproduce ;-/
>
> AFAICS, the following would be safe, but I'd really appreciate any extra
> eyes on that sucker:
>
> static inline struct dentry *lock_parent(struct dentry *dentry)
> {
> 	struct dentry *parent = dentry->d_parent;
> 	if (IS_ROOT(dentry))
> 		return NULL;
> 	if (likely(spin_trylock(&parent->d_lock)))
> 		return parent;
> 	spin_unlock(&dentry->d_lock);
> 	rcu_read_lock();
> again:
> 	parent = ACCESS_ONCE(dentry->d_parent);
> 	spin_lock(&parent->d_lock);
> 	/*
> 	 * We can't blindly lock dentry until we are sure
> 	 * that we won't violate the locking order.
> 	 * While parent->d_lock is not enough to stabilize
> 	 * dentry->d_parent, it *is* enough to stabilize
> 	 * dentry->d_parent == parent.
> 	 */
> 	if (unlikely(parent != dentry->d_parent)) {
> 		spin_unlock(&parent->d_lock);
> 		goto again;
> 	}
> 	rcu_read_unlock();
> 	if (parent != dentry)
> 		spin_lock(&dentry->d_lock);
> 	else
> 		parent = NULL;
> 	return parent;
> }
>
> That variant got force-pushed in place of the previous one, again at the
> head of #for-linus.  And I'm definitely not folding it in until it gets
> more review and testing.

Tested your latest #for-linus from here:

https://git.kernel.org/cgit/linux/kernel/git/viro/vfs.git/log/?h=for-linus

and the livelock is gone,

Tested-by: Mika Westerberg

Thanks again!
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Thu, May 29, 2014 at 10:00:49PM -0700, Linus Torvalds wrote:
> Yeah, I think that would be good. Except I think you should create a
> "dentry_is_dead()" helper function that then has that "if you hold the
> dentry or parent lock, this is safe" comment, because for lockref in
> general you do need to have the lock in the lockref itself. The fact
> that dentries have more locking is very much dentry-specific.

With how many callers?  There literally are only two places where we look
at that bit; one of them is very tempting to convert to
->d_lockref.count != 0, since reactions to positive and negative are very
similar.  We also have that bogus BUG_ON() you've mentioned (that one
simply should die) and only one place where we check it for being
negative - autofs4_lookup_active().  And that one is better dealt with by
taking removal from their active list from ->d_release() to ->d_prune()
(if not turning their ->d_release() into ->d_prune() wholesale) and making
->d_prune() called for hashed and unhashed alike (the only instance
*already* checks for d_unhashed() and does nothing in that case; no need
to check that in fs/dcache.c).  With that done, the check will be gone -
all it does is filtering out the ones that are already on the way out,
but still hadn't reached ->d_release().

IOW, it's not a widely used functionality and it's really not something
that should be ever needed outside of fs/dcache.c.  And in fs/dcache.c we
have one call site, so I'm not sure if even mentioning __lockref_not_dead()
would make much sense - (int)child->d_lockref.count < 0 might be better,
along with a comment about ->d_parent->d_lock serializing it against
lockref_mark_dead() in __dentry_kill() just as well as ->d_lock would...

Note that the only reason why autofs is playing those games is that they
keep references to dentries that do not contribute to refcount, rip them
out when dentry is killed and do that in the wrong method, which opens the
window when ->d_lock is already dropped and ->d_release() is inevitable
but yet to be called.  Solution: rip those references out before dropping
->d_lock, which is what ->d_prune() gives us.  To be fair, that code
predates ->d_prune() by several years (Jul 2008 and Oct 2011, resp.)

And "vfs: call d_op->d_prune() before unhashing dentry" has added very odd
checks for !d_unhashed(), despite ceph ->d_prune() being an explicit no-op
in other cases...

While we are at it, what the devil is d_prune_aliases() doing?  OK, we
grab ->d_lock and see that refcount is 0.  Then we call ->d_prune(), bump
refcount to 1, forcibly unhash the sucker, drop ->d_lock and ->i_lock and
call dput().  Which seems to be far too convoluted...  AFAICS, what we
want to do is

	spin_lock(&dentry->d_lock);
	if (!dentry->d_lockref.count) {
		parent = lock_parent(dentry);
		if (likely(!dentry->d_lockref.count)) {
			__dentry_kill(dentry);
			goto restart;
		}
		if (parent)
			spin_unlock(&parent->d_lock);
	}
	spin_unlock(&dentry->d_lock);

(which means that pulling ->i_lock trylock into __dentry_kill() is
probably not a good plan, more's the pity...)  And there goes this second
call site of ->d_prune() - after that it would be called exactly in one
place, right after __dentry_kill() has done lockref_mark_dead().  The
whole reason for calling it there was that forcible unhash used to force
dput() to kill the sucker has a side effect of messing up ceph
->d_prune()...
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Thu, May 29, 2014 at 07:52:01PM +0100, Al Viro wrote:
> On Thu, May 29, 2014 at 05:53:51PM +0100, Al Viro wrote:
> > On Thu, May 29, 2014 at 09:29:42AM -0700, Linus Torvalds wrote:
> > > On Thu, May 29, 2014 at 9:23 AM, Al Viro <v...@zeniv.linux.org.uk> wrote:
> > > >
> > > > BTW, lock_parent() might be better off if in contended case it would not
> > > > bother with rename_lock and did something like this:
> > > > again:
> > >
> > > Ack. I think that's much better.
> >
> > Pushed to #for-linus (with dumb braino fixed - it's if (parent != dentry),
> > not if (parent)). I'll wait with folding it back into the commit that
> > introduces lock_parent() until we get testing results...
>
> Grrr... Sadly, that's not good enough. Leaking rcu_read_lock() on
> success is trivial, but there's more serious problem: suppose dentries
> involved get moved before we get to locking what we thought was parent.
> We end up taking ->d_lock on two dentries that might be nowhere near
> each other in the tree, with obvious nasty implications. Would be
> _very_ hard to reproduce ;-/
>
> AFAICS, the following would be safe, but I'd really appreciate any
> extra eyes on that sucker:
>
> static inline struct dentry *lock_parent(struct dentry *dentry)
> {
> 	struct dentry *parent = dentry->d_parent;
> 	if (IS_ROOT(dentry))
> 		return NULL;
> 	if (likely(spin_trylock(&parent->d_lock)))
> 		return parent;
> 	spin_unlock(&dentry->d_lock);
> 	rcu_read_lock();
> again:
> 	parent = ACCESS_ONCE(dentry->d_parent);
> 	spin_lock(&parent->d_lock);
> 	/*
> 	 * We can't blindly lock dentry until we are sure
> 	 * that we won't violate the locking order.
> 	 * While parent->d_lock is not enough to stabilize
> 	 * dentry->d_parent, it *is* enough to stabilize
> 	 * dentry->d_parent == parent.
> 	 */
> 	if (unlikely(parent != dentry->d_parent)) {
> 		spin_unlock(&parent->d_lock);
> 		goto again;
> 	}
> 	rcu_read_unlock();
> 	if (parent != dentry)
> 		spin_lock(&dentry->d_lock);
> 	else
> 		parent = NULL;
> 	return parent;
> }
>
> That variant got force-pushed in place of the previous one, again at the
> head of #for-linus. And I'm definitely not folding it in until it gets
> more review and testing.
Tested your latest #for-linus from here:

https://git.kernel.org/cgit/linux/kernel/git/viro/vfs.git/log/?h=for-linus

and the livelock is gone,

Tested-by: Mika Westerberg <mika.westerb...@linux.intel.com>

Thanks again!
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Fri, May 30, 2014 at 11:12:38AM +0300, Mika Westerberg wrote:
> Tested your latest #for-linus from here:
>
> https://git.kernel.org/cgit/linux/kernel/git/viro/vfs.git/log/?h=for-linus
>
> and the livelock is gone,
>
> Tested-by: Mika Westerberg <mika.westerb...@linux.intel.com>
>
> Thanks again!

OK... I've just pushed another branch (#for-linus-2) with the
lock_parent() change folded into the commit that used to introduce the
rename_lock-based variant. Heads of the branches are byte-for-byte
identical:

al@duke:~/linux/trees/vfs$ git log vfs/for-linus | head -1
commit ebc6cb92bfeddf25462842f64604bc9fd2aab5b0
al@duke:~/linux/trees/vfs$ git log vfs/for-linus-2 | head -1
commit 8cbf74da435d1bd13dbb790f94c7ff67b2fb6af4
al@duke:~/linux/trees/vfs$ git diff ebc6cb92bfeddf25462842f64604bc9fd2aab5b0 8cbf74da435d1bd13dbb790f94c7ff67b2fb6af4
al@duke:~/linux/trees/vfs$

so testing done on one of them obviously applies to the other. FWIW, on
#for-linus-2 we have

Shortlog:
Al Viro (6):
      lift the "already marked killed" case into shrink_dentry_list()
      split dentry_kill()
      expand dentry_kill(dentry, 0) in shrink_dentry_list()
      shrink_dentry_list(): take parent's ->d_lock earlier
      dealing with the rest of shrink_dentry_list() livelock
      dentry_kill() doesn't need the second argument now

Diffstat:
 fs/dcache.c | 153 ++--
 1 file changed, 107 insertions(+), 46 deletions(-)

Linus, how would you prefer it to be handled?
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Fri, May 30, 2014 at 8:21 AM, Al Viro <v...@zeniv.linux.org.uk> wrote:
>
> Linus, how would you prefer it to be handled?

I'll just have to do an rc8. I really hoped to avoid it, because we're
going on our family vacation when school is out in two weeks, and it
causes problems for the merge window, but it's not like there is much
choice - I can't do a 3.15 release with a known regression like that.

So just send me the pull request, and I'll pull it. I'll probably do
the "let's increase the x86-64 stack size to 16kB" too, to close
_that_ issue as well.

           Linus
[git pull] Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Fri, May 30, 2014 at 08:31:30AM -0700, Linus Torvalds wrote:
> On Fri, May 30, 2014 at 8:21 AM, Al Viro <v...@zeniv.linux.org.uk> wrote:
> >
> > Linus, how would you prefer it to be handled?
>
> I'll just have to do an rc8. I really hoped to avoid it, because we're
> going on our family vacation when school is out in two weeks, and it
> causes problems for the merge window, but it's not like there is much
> choice - I can't do a 3.15 release with a known regression like that.

Sorry about that... ;-/

> So just send me the pull request, and I'll pull it. I'll probably do
> the "let's increase the x86-64 stack size to 16kB" too, to close
> _that_ issue as well.

OK, here it is:

Fixes for livelocks in shrink_dentry_list() introduced by the fixes to
shrink list corruption; the root cause was that a trylock of the parent's
->d_lock could be disrupted by d_walk() happening on other CPUs, resulting
in shrink_dentry_list() making no progress *and* the same d_walk() being
called again and again for as long as shrink_dentry_list() doesn't get
past that mess. The solution is to have shrink_dentry_list() treat that
trylock failure not as "try to do the same thing again", but "lock them
in the right order".

Please, pull from
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git for-linus-2

Shortlog:
Al Viro (8):
      lift the "already marked killed" case into shrink_dentry_list()
      split dentry_kill()
      expand dentry_kill(dentry, 0) in shrink_dentry_list()
      shrink_dentry_list(): take parent's ->d_lock earlier
      dealing with the rest of shrink_dentry_list() livelock
      dentry_kill() doesn't need the second argument now
      d_prune_alias(): just lock the parent and call __dentry_kill()
      dcache.c: call ->d_prune() regardless of d_unhashed()

Diffstat:
 fs/dcache.c | 176 +---
 1 file changed, 115 insertions(+), 61 deletions(-)
Re: [git pull] Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Fri, May 30, 2014 at 05:48:16PM +0100, Al Viro wrote:
> On Fri, May 30, 2014 at 08:31:30AM -0700, Linus Torvalds wrote:
> > On Fri, May 30, 2014 at 8:21 AM, Al Viro <v...@zeniv.linux.org.uk> wrote:
> > >
> > > Linus, how would you prefer it to be handled?
> >
> > I'll just have to do an rc8. I really hoped to avoid it, because we're
> > going on our family vacation when school is out in two weeks, and it
> > causes problems for the merge window, but it's not like there is much
> > choice - I can't do a 3.15 release with a known regression like that.
>
> Sorry about that... ;-/
>
> > So just send me the pull request, and I'll pull it. I'll probably do
> > the "let's increase the x86-64 stack size to 16kB" too, to close
> > _that_ issue as well.
>
> OK, here it is:
>
> Fixes for livelocks in shrink_dentry_list() introduced by fixes to
> shrink list corruption; the root cause was that trylock of parent's
> ->d_lock could be disrupted by d_walk() happening on other CPUs,
> resulting in shrink_dentry_list() making no progress *and* the same
> d_walk() being called again and again for as long as
> shrink_dentry_list() doesn't get past that mess. Solution is to have
> shrink_dentry_list() treat that trylock failure not as "try to do the
> same thing again", but "lock them in the right order".
>
> Please, pull from
> git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git for-linus-2

*GYAH* Shortlog and diffstat are from the local branch that has two more
cleanups on top of what's been pushed to vfs.git (and what had been
tested).

Ones matching what's really in that branch are here:

Shortlog:
Al Viro (6):
      lift the "already marked killed" case into shrink_dentry_list()
      split dentry_kill()
      expand dentry_kill(dentry, 0) in shrink_dentry_list()
      shrink_dentry_list(): take parent's ->d_lock earlier
      dealing with the rest of shrink_dentry_list() livelock
      dentry_kill() doesn't need the second argument now

Diffstat:
 fs/dcache.c | 153 ++--
 1 file changed, 107 insertions(+), 46 deletions(-)

My apologies - the script I'm using to generate shortlogs takes a branch
name as an argument, defaulting to HEAD, which was two commits past
vfs/for-linus-2. And no, I'm _not_ planning to push that followup stuff
until the merge window.

Just to make sure: the branch to pull should have its head at
8cbf74da435d1bd13dbb790f94c7ff67b2fb6af4 and have the same tree as
vfs.git#for-linus, which is what got testing.
Re: [git pull] Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Fri, May 30, 2014 at 6:48 PM, Al Viro <v...@zeniv.linux.org.uk> wrote:
> On Fri, May 30, 2014 at 08:31:30AM -0700, Linus Torvalds wrote:
> > On Fri, May 30, 2014 at 8:21 AM, Al Viro <v...@zeniv.linux.org.uk> wrote:
> > >
> > > Linus, how would you prefer it to be handled?
> >
> > I'll just have to do an rc8. I really hoped to avoid it, because we're
> > going on our family vacation when school is out in two weeks, and it
> > causes problems for the merge window, but it's not like there is much
> > choice - I can't do a 3.15 release with a known regression like that.
>
> Sorry about that... ;-/
>
> > So just send me the pull request, and I'll pull it. I'll probably do
> > the "let's increase the x86-64 stack size to 16kB" too, to close
> > _that_ issue as well.
>
> OK, here it is:
>
> Fixes for livelocks in shrink_dentry_list() introduced by fixes to
> shrink list corruption; the root cause was that trylock of parent's
> ->d_lock could be disrupted by d_walk() happening on other CPUs,
> resulting in shrink_dentry_list() making no progress *and* the same
> d_walk() being called again and again for as long as
> shrink_dentry_list() doesn't get past that mess. Solution is to have
> shrink_dentry_list() treat that trylock failure not as "try to do the
> same thing again", but "lock them in the right order".
>
> Please, pull from
> git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git for-linus-2
>
> Shortlog:
> Al Viro (8):
>       lift the "already marked killed" case into shrink_dentry_list()
>       split dentry_kill()
>       expand dentry_kill(dentry, 0) in shrink_dentry_list()
>       shrink_dentry_list(): take parent's ->d_lock earlier
>       dealing with the rest of shrink_dentry_list() livelock
>       dentry_kill() doesn't need the second argument now
>       d_prune_alias(): just lock the parent and call __dentry_kill()
>       dcache.c: call ->d_prune() regardless of d_unhashed()
>
> Diffstat:
>  fs/dcache.c | 176 +---
>  1 file changed, 115 insertions(+), 61 deletions(-)

Did you push all?

$ git log --oneline v3.15-rc7..
8cbf74d dentry_kill() doesn't need the second argument now
b2b8019 dealing with the rest of shrink_dentry_list() livelock
046b961 shrink_dentry_list(): take parent's ->d_lock earlier
ff2fde9 expand dentry_kill(dentry, 0) in shrink_dentry_list()
e55fd01 split dentry_kill()
64fd72e lift the "already marked killed" case into shrink_dentry_list()
b6dd6f4 vfs: fix vmplice_to_user()

- Sedat -
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Thu, May 29, 2014 at 9:50 PM, Al Viro wrote:
>
> BTW, how serious is the problem with __lockref_is_dead(&dentry->d_lockref)
> with only ->d_parent->d_lock held? From my reading of lib/lockref.c it
> should be safe - we only do lockref_mark_dead() with ->d_parent->d_lock
> held, and it'll provide all the serialization and barriers we need.
>
> If I'm right, we could get rid of DCACHE_DENTRY_KILLED completely

Yeah, I think that would be good. Except I think you should create a
"dentry_is_dead()" helper function that then has that "if you hold the
dentry or parent lock, this is safe" comment, because for lockref in
general you do need to have the lock in the lockref itself. The fact
that dentries have more locking is very much dentry-specific.

But with that, go wild. I'd love to get rid of some of the redundant
stuff. For another example, the

	BUG_ON((int)dentry->d_lockref.count > 0);

test makes very little sense any more with lockrefs and the whole dead
marker (that should make sure that it never gets incremented), but
exists due to the direct conversion.

             Linus
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Thu, May 29, 2014 at 12:14:51PM -0700, Linus Torvalds wrote:
> Yeah, I don't think you can reproduce that, but I guess renaming
> directories into each other (two renames needed) could trigger an ABBA
> deadlock by changing the topological order of dentry/parent.
>
> I suspect there's no way in hell that tiny race will ever happen in
> practice, but let's not risk it.
>
> And your solution (to re-check after just taking the parent lock)
> seems sufficient and sane, since dentry_lock_for_move() will always
> take the parent lock(s) before we move a dentry.
>
> So that looks good to me.

BTW, how serious is the problem with __lockref_is_dead(&dentry->d_lockref)
with only ->d_parent->d_lock held?  From my reading of lib/lockref.c it
should be safe - we only do lockref_mark_dead() with ->d_parent->d_lock
held, and it'll provide all the serialization and barriers we need.

If I'm right, we could get rid of DCACHE_DENTRY_KILLED completely and
replace checking for it with checking for negative ->d_lockref.count.
There are two places where we check for it; in shrink_dentry_list() we
definitely can go that way (we are holding ->d_lock there) and it
simplifies the code nicely.  In d_walk(), though (in the bit that used to
be try_to_ascend()), we only hold ->d_parent->d_lock.  It looks like it
ought to be safe to replace

	if (this_parent != child->d_parent ||
	    (child->d_flags & DCACHE_DENTRY_KILLED) ||
	    need_seqretry(&rename_lock, seq)) {

with

	if (this_parent != child->d_parent ||
	    __lockref_is_dead(&child->d_lockref) ||
	    need_seqretry(&rename_lock, seq)) {

and remove DCACHE_DENTRY_KILLED completely...  The other user (in
shrink_dentry_list()) simplifies to

	if (dentry->d_lockref.count != 0) {
		bool can_free = dentry->d_flags & DCACHE_MAY_FREE;
		spin_unlock(&dentry->d_lock);
		if (parent)
			spin_unlock(&parent->d_lock);
		if (can_free)
			dentry_free(dentry);
		continue;
	}

taking care of both the DCACHE_DENTRY_KILLED case and the simple "lazy
dget" one, and that one's definitely safe and worth doing.

Would be nice if we could switch the d_walk() one as well and kill that
flag off, though...  Basically, we have

	spin_lock(&A);
	spin_lock(&B);
	V = 1;
	lockref_mark_dead(&R);
	...

as the only place where R goes dead, and we want to replace

	spin_lock(&B);
	if (V)
		...

with

	spin_lock(&B);
	if (__lockref_is_dead(&R))
		...

Unless I'm missing something subtle in lockref.c, that should be safe...
Comments?
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Thu, May 29, 2014 at 11:52 AM, Al Viro wrote:
>
> Grrr... Sadly, that's not good enough. Leaking rcu_read_lock() on
> success is trivial, but there's more serious problem: suppose dentries
> involved get moved before we get to locking what we thought was parent.
> We end up taking ->d_lock on two dentries that might be nowhere near each
> other in the tree, with obvious nasty implications. Would be _very_ hard
> to reproduce ;-/

Yeah, I don't think you can reproduce that, but I guess renaming
directories into each other (two renames needed) could trigger an ABBA
deadlock by changing the topological order of dentry/parent.

I suspect there's no way in hell that tiny race will ever happen in
practice, but let's not risk it.

And your solution (to re-check after just taking the parent lock)
seems sufficient and sane, since dentry_lock_for_move() will always
take the parent lock(s) before we move a dentry.

So that looks good to me.

            Linus
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Thu, May 29, 2014 at 05:53:51PM +0100, Al Viro wrote:
> On Thu, May 29, 2014 at 09:29:42AM -0700, Linus Torvalds wrote:
> > On Thu, May 29, 2014 at 9:23 AM, Al Viro wrote:
> > >
> > > BTW, lock_parent() might be better off if in contended case it would not
> > > bother with rename_lock and did something like this:
> > > again:
> >
> > Ack. I think that's much better.
>
> Pushed to #for-linus (with dumb braino fixed - it's if (parent != dentry),
> not if (parent)). I'll wait with folding it back into the commit that
> introduces lock_parent() until we get testing results...

Grrr...  Sadly, that's not good enough.  Leaking rcu_read_lock() on
success is trivial, but there's a more serious problem: suppose the
dentries involved get moved before we get to locking what we thought was
the parent.  We end up taking ->d_lock on two dentries that might be
nowhere near each other in the tree, with obvious nasty implications.
Would be _very_ hard to reproduce ;-/

AFAICS, the following would be safe, but I'd really appreciate any extra
eyes on that sucker:

static inline struct dentry *lock_parent(struct dentry *dentry)
{
	struct dentry *parent = dentry->d_parent;
	if (IS_ROOT(dentry))
		return NULL;
	if (likely(spin_trylock(&parent->d_lock)))
		return parent;
	spin_unlock(&dentry->d_lock);
	rcu_read_lock();
again:
	parent = ACCESS_ONCE(dentry->d_parent);
	spin_lock(&parent->d_lock);
	/*
	 * We can't blindly lock dentry until we are sure
	 * that we won't violate the locking order.
	 * While parent->d_lock is not enough to stabilize
	 * dentry->d_parent, it *is* enough to stabilize
	 * dentry->d_parent == parent.
	 */
	if (unlikely(parent != dentry->d_parent)) {
		spin_unlock(&parent->d_lock);
		goto again;
	}
	rcu_read_unlock();
	if (parent != dentry)
		spin_lock(&dentry->d_lock);
	else
		parent = NULL;
	return parent;
}

That variant got force-pushed in place of the previous one, again at the
head of #for-linus.  And I'm definitely not folding it in until it gets
more review and testing.
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Thu, May 29, 2014 at 09:29:42AM -0700, Linus Torvalds wrote:
> On Thu, May 29, 2014 at 9:23 AM, Al Viro wrote:
> >
> > BTW, lock_parent() might be better off if in contended case it would not
> > bother with rename_lock and did something like this:
> > again:
>
> Ack. I think that's much better.

Pushed to #for-linus (with dumb braino fixed - it's if (parent != dentry),
not if (parent)). I'll wait with folding it back into the commit that
introduces lock_parent() until we get testing results...
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Thu, May 29, 2014 at 9:23 AM, Al Viro wrote:
>
> BTW, lock_parent() might be better off if in contended case it would not
> bother with rename_lock and did something like this:
> again:

Ack. I think that's much better.

          Linus
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Thu, May 29, 2014 at 04:44:54PM +0100, Al Viro wrote: > On Thu, May 29, 2014 at 08:10:57AM -0700, Linus Torvalds wrote: > > If so, though, that brings up two questions: > > > > (a) do we really want to be that aggressive? Can we ever traverse > > _past_ the point we're actually trying to shrink in > > shrink_dcache_parent()? > > Caller of shrink_dcache_parent() would better hold a reference to the > argument, or it might get freed right under us ;-) So no, we can't > go past that point - the subtree root will stay busy. > > The reason we want to be aggressive there is to avoid excessive iterations - > think what happens e.g. if we have a chain of N dentries, with nothing pinning > them (i.e. the last one has refcount 0, the first - 2, everything else - 1). > Simply doing dput() would result in O(N^2) vs. O(N)... > > > (b) why does the "dput()" (or rather, the dentry_kill()) locking > > logic have to retain the old trylock case rather than share the parent > > locking logic? > > > > I'm assuming the answer to (b) is that we can't afford to drop the > > dentry lock in dentry_kill(), but I'd like that answer to the "Why" to > > be documented somewhere. > > We actually might be able to do it that way (rechecking ->d_count after > lock_parent()), but I would really prefer to leave that until after -final. > I want to get profiling data from that first - dput() is a much hotter path > than shrink_dcache_parent() and friends... FWIW, I've just done more or less edible splitup of stuff past #for-linus - see #experimental-dentry_kill for that. Again, I really want to get profiling data to see if that hurts dput() - it takes ->d_lock on parent before the trylock on ->i_lock and in case of ->d_lock on parent being held by somebody else it bangs on rename_lock.lock cacheline. I'd expect that to be non-issue on any loads, but we need something stronger than my gut feelings... 
BTW, lock_parent() might be better off if in the contended case it would
not bother with rename_lock and did something like this:

again:
	spin_unlock(&dentry->d_lock);
	rcu_read_lock();
	parent = ACCESS_ONCE(dentry->d_parent);
	if (parent != dentry)
		spin_lock(&parent->d_lock);
	spin_lock(&dentry->d_lock);
	if (likely(dentry->d_parent == parent)) {
		rcu_read_unlock();
		return parent;
	}
	if (parent)
		spin_unlock(&parent->d_lock);
	rcu_read_unlock();
	goto again;

It's almost certainly not worth bothering with right now, but if dput()
starts using lock_parent(), it might be worth investigating...
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Thu, May 29, 2014 at 08:10:57AM -0700, Linus Torvalds wrote:
> If so, though, that brings up two questions:
>
>  (a) do we really want to be that aggressive? Can we ever traverse
> _past_ the point we're actually trying to shrink in
> shrink_dcache_parent()?

Caller of shrink_dcache_parent() would better hold a reference to the
argument, or it might get freed right under us ;-)  So no, we can't go
past that point - the subtree root will stay busy.

The reason we want to be aggressive there is to avoid excessive
iterations - think what happens e.g. if we have a chain of N dentries,
with nothing pinning them (i.e. the last one has refcount 0, the first -
2, everything else - 1).  Simply doing dput() would result in O(N^2) vs.
O(N)...

>  (b) why does the "dput()" (or rather, the dentry_kill()) locking
> logic have to retain the old trylock case rather than share the parent
> locking logic?
>
> I'm assuming the answer to (b) is that we can't afford to drop the
> dentry lock in dentry_kill(), but I'd like that answer to the "Why" to
> be documented somewhere.

We actually might be able to do it that way (rechecking ->d_count after
lock_parent()), but I would really prefer to leave that until after
-final.  I want to get profiling data from that first - dput() is a much
hotter path than shrink_dcache_parent() and friends...
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Thu, May 29, 2014 at 6:30 AM, Al Viro wrote:
>
> Great... OK, saner splitup of that sucker (equivalent to combination of
> these two patches) is in vfs.git#for-linus.
>
> Review and testing would be very welcome.

So looking at the "dealing with the rest of shrink_dentry_list()
livelock" patch, I think the "release the parents" case is now big and
complicated enough to be split into a function of its own.

However, I have a bigger question too: the "release the parent" case
_should_ be possible to do with just a "dput()" on the parent, and the
only reason we're doing the special case is that since we are shrinking
things, we try to be more aggressive and shrink all the parents if
possible. Right?

If so, though, that brings up two questions:

 (a) do we really want to be that aggressive? Can we ever traverse
_past_ the point we're actually trying to shrink in
shrink_dcache_parent()?

 (b) why does the "dput()" (or rather, the dentry_kill()) locking
logic have to retain the old trylock case rather than share the parent
locking logic?

I'm assuming the answer to (b) is that we can't afford to drop the
dentry lock in dentry_kill(), but I'd like that answer to the "Why" to
be documented somewhere.

I don't much care what the answer to (a) is, but again, it would be good
to have that mentioned somewhere.

               Linus
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Thu, May 29, 2014 at 02:30:36PM +0100, Al Viro wrote:
> On Thu, May 29, 2014 at 02:04:39PM +0300, Mika Westerberg wrote:
>
> > With your both patches applied the problem is gone :-)
> >
> > I did 20 plug/unplugs, rebooted the machine and another 20 plug/unplugs
> > and didn't see the livelock at once.
>
> Great... OK, saner splitup of that sucker (equivalent to combination of
> these two patches) is in vfs.git#for-linus.
>
> Review and testing would be very welcome.

I'll re-test that branch first thing tomorrow morning (currently at home
and the machine where this reproduces is my work desktop).
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Thu, May 29, 2014 at 02:04:39PM +0300, Mika Westerberg wrote: > With your both patches applied the problem is gone :-) > > I did 20 plug/unplugs, rebooted the machine and another 20 plug/unplugs > and didn't see the livelock at once. Great... OK, saner splitup of that sucker (equivalent to combination of these two patches) is in vfs.git#for-linus. Review and testing would be very welcome. PS: The diff between that and aforementioned "both patches applied" is noise - dentry and parent declarations had been moved into the loop body there, while in #for-linus they are left alone. So Mika's testing results should apply to that one as well. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Thu, May 29, 2014 at 01:51:07PM +0300, Mika Westerberg wrote: > On Thu, May 29, 2014 at 06:34:44AM +0100, Al Viro wrote: > > On Thu, May 29, 2014 at 04:52:33AM +0100, Al Viro wrote: > > > On Thu, May 29, 2014 at 04:11:49AM +0100, Al Viro wrote: > > > > On Wed, May 28, 2014 at 07:39:54PM +0100, Al Viro wrote: > > > > > > > > > OK, the warnings about averting your eyes very much apply; the thing > > > > > below > > > > > definitely needs more massage before it becomes acceptable (and no, > > > > > it's > > > > > not a single commit; I'm not that insane), but it changes behaviour > > > > > in the > > > > > way described above. Could you check if the livelock persists with > > > > > it? > > > > > No trace-generating code in there, so the logs should be compact > > > > > enough... > > > > > > > > Here's an updated patch, hopefully slightly less vomit-inducing. Should > > > > give the same behaviour as the previous one... Again, it's a cumulative > > > > diff - I'm still massaging the splitup here. > > > > > > BTW, it still leaves the "proceed to parent" case in shrink_dentry_list(); > > > in theory, it's also vulnerable to the same livelock. Can be dealt pretty > > > much the same way; I'd rather leave that one for right after -final, > > > though, > > > if the already posted variant turns out to be sufficient... > > > > ... which is (presumably) dealt with the incremental I'd just sent to Linus; > > seeing what kind of dumb mistakes I'm making, I'd better call it quits for > > tonight - it's 1:30am here and I didn't have anywhere near enough sleep > > yesterday. I'd appeciate if you could test the patch immediately > > upthread (from Message-ID: <20140529031149.ge18...@zeniv.linux.org.uk>) > > and see if it helps. There's an incremental on top of it (from > > Message-ID: <20140529052621.gh18...@zeniv.linux.org.uk>) that might or > > might not be a good idea. > > Thanks for the patch. 
> > I tested patch <20140529031149.ge18...@zeniv.linux.org.uk> and it seems > to improve things. After first plug/unplug I can see similar behaviour > but after a while it recovered. I did several iterations of plug/unplug > afterwards and didn't see the livelock to trigger. > > dmesg is attached. > > I'm going to try your incremental patch now. With your both patches applied the problem is gone :-) I did 20 plug/unplugs, rebooted the machine and another 20 plug/unplugs and didn't see the livelock at once. Thanks a lot! -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Thu, May 29, 2014 at 08:10:57AM -0700, Linus Torvalds wrote:
> If so, though, that brings up two questions:
>
>  (a) do we really want to be that aggressive? Can we ever traverse
> _past_ the point we're actually trying to shrink in shrink_dcache_parent()?

Caller of shrink_dcache_parent() would better hold a reference to the argument, or it might get freed right under us ;-) So no, we can't go past that point - the subtree root will stay busy.

The reason we want to be aggressive there is to avoid excessive iterations - think what happens e.g. if we have a chain of N dentries, with nothing pinning them (i.e. the last one has refcount 0, the first - 2, everything else - 1). Simply doing dput() would result in O(N^2) vs. O(N)...

>  (b) why does the "dput()" (or rather, the dentry_kill()) locking logic
> have to retain the old trylock case rather than share the parent locking
> logic? I'm assuming the answer to (b) is that we can't afford to drop the
> dentry lock in dentry_kill(), but I'd like that answer to the "Why" to be
> documented somewhere.

We actually might be able to do it that way (rechecking ->d_count after lock_parent()), but I would really prefer to leave that until after -final. I want to get profiling data from that first - dput() is a much hotter path than shrink_dcache_parent() and friends...
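Al's O(N^2)-vs-O(N) point can be checked with a small counting model. The sketch below is hypothetical userspace code, not dcache code: the refcount layout matches the chain Al describes (pinned root with count 2, unreferenced leaf with count 0, count 1 in between), and it only counts how many "dentries" each strategy has to visit.

```c
#include <assert.h>

/* Toy model (made-up names, no real locking or freeing): a chain of N
 * dentries where index 0 is the pinned subtree root and index N-1 is
 * the unreferenced leaf. */
#define N 64

/* Plain dput() of the parent: each collection pass rescans the chain,
 * frees the one dentry whose count reached 0 and drops its parent's
 * count, so every pass yields at most one new victim -> O(N^2). */
long shrink_with_plain_dput(void)
{
	int ref[N], alive[N];
	long visited = 0;

	for (int i = 0; i < N; i++) {
		alive[i] = 1;
		ref[i] = (i == 0) ? 2 : (i == N - 1) ? 0 : 1;
	}
	for (;;) {
		int killed = 0;
		for (int i = 0; i < N; i++) {	/* one full pass */
			if (!alive[i])
				continue;
			visited++;
			if (i > 0 && ref[i] == 0) {
				alive[i] = 0;	/* free the victim... */
				ref[i - 1]--;	/* ...then dput() the parent */
				killed = 1;
			}
		}
		if (!killed)
			break;			/* nothing left with count 0 */
	}
	return visited;
}

/* Aggressive variant: after freeing a dentry, keep walking up and
 * freeing every parent whose count dropped to 0 -- one walk, O(N). */
long shrink_with_parent_walk(void)
{
	int ref[N];
	long visited = 0;

	for (int i = 0; i < N; i++)
		ref[i] = (i == 0) ? 2 : (i == N - 1) ? 0 : 1;
	for (int i = N - 1; i > 0 && ref[i] == 0; i--) {
		visited++;
		ref[i - 1]--;			/* release the parent */
	}
	return visited;
}
```

With N = 64 the aggressive walk touches N-1 dentries while the plain-dput loop touches on the order of N^2/2, which is the excessive iteration Al wants to avoid.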
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Thu, May 29, 2014 at 04:44:54PM +0100, Al Viro wrote:
> On Thu, May 29, 2014 at 08:10:57AM -0700, Linus Torvalds wrote:
> > If so, though, that brings up two questions:
> >
> >  (a) do we really want to be that aggressive? Can we ever traverse
> > _past_ the point we're actually trying to shrink in shrink_dcache_parent()?
>
> Caller of shrink_dcache_parent() would better hold a reference to the
> argument, or it might get freed right under us ;-) So no, we can't go
> past that point - the subtree root will stay busy.
>
> The reason we want to be aggressive there is to avoid excessive
> iterations - think what happens e.g. if we have a chain of N dentries,
> with nothing pinning them (i.e. the last one has refcount 0, the
> first - 2, everything else - 1). Simply doing dput() would result in
> O(N^2) vs. O(N)...
>
> >  (b) why does the "dput()" (or rather, the dentry_kill()) locking logic
> > have to retain the old trylock case rather than share the parent
> > locking logic? I'm assuming the answer to (b) is that we can't afford
> > to drop the dentry lock in dentry_kill(), but I'd like that answer to
> > the "Why" to be documented somewhere.
>
> We actually might be able to do it that way (rechecking ->d_count after
> lock_parent()), but I would really prefer to leave that until after
> -final. I want to get profiling data from that first - dput() is a much
> hotter path than shrink_dcache_parent() and friends...

FWIW, I've just done more or less edible splitup of stuff past #for-linus - see #experimental-dentry_kill for that. Again, I really want to get profiling data to see if that hurts dput() - it takes ->d_lock on the parent before the trylock on ->i_lock, and in case of ->d_lock on the parent being held by somebody else it bangs on the rename_lock.lock cacheline. I'd expect that to be a non-issue on any loads, but we need something stronger than my gut feelings...

BTW, lock_parent() might be better off if in the contended case it would not bother with rename_lock and did something like this:

again:
	spin_unlock(&dentry->d_lock);
	rcu_read_lock();
	parent = ACCESS_ONCE(dentry->d_parent);
	if (parent != dentry)
		spin_lock(&parent->d_lock);
	spin_lock(&dentry->d_lock);
	if (likely(dentry->d_parent == parent)) {
		rcu_read_unlock();
		return parent;
	}
	if (parent)
		spin_unlock(&parent->d_lock);
	rcu_read_unlock();
	goto again;

It's almost certainly not worth bothering with right now, but if dput() starts using lock_parent(), it might be worth investigating...
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Thu, May 29, 2014 at 9:23 AM, Al Viro <v...@zeniv.linux.org.uk> wrote:
>
> BTW, lock_parent() might be better off if in contended case it would
> not bother with rename_lock and did something like this:

Ack. I think that's much better.

               Linus
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Thu, May 29, 2014 at 09:29:42AM -0700, Linus Torvalds wrote:
> On Thu, May 29, 2014 at 9:23 AM, Al Viro <v...@zeniv.linux.org.uk> wrote:
> >
> > BTW, lock_parent() might be better off if in contended case it would
> > not bother with rename_lock and did something like this:
>
> Ack. I think that's much better.

Pushed to #for-linus (with dumb braino fixed - it's if (parent != dentry), not if (parent)). I'll wait with folding it back into the commit that introduces lock_parent() until we get testing results...
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Thu, May 29, 2014 at 05:53:51PM +0100, Al Viro wrote:
> Pushed to #for-linus (with dumb braino fixed - it's if (parent != dentry),
> not if (parent)). I'll wait with folding it back into the commit that
> introduces lock_parent() until we get testing results...

Grrr... Sadly, that's not good enough. Leaking rcu_read_lock() on success is trivial, but there's a more serious problem: suppose the dentries involved get moved before we get to locking what we thought was the parent. We end up taking ->d_lock on two dentries that might be nowhere near each other in the tree, with obvious nasty implications. Would be _very_ hard to reproduce ;-/

AFAICS, the following would be safe, but I'd really appreciate any extra eyes on that sucker:

static inline struct dentry *lock_parent(struct dentry *dentry)
{
	struct dentry *parent = dentry->d_parent;
	if (IS_ROOT(dentry))
		return NULL;
	if (likely(spin_trylock(&parent->d_lock)))
		return parent;
	spin_unlock(&dentry->d_lock);
	rcu_read_lock();
again:
	parent = ACCESS_ONCE(dentry->d_parent);
	spin_lock(&parent->d_lock);
	/*
	 * We can't blindly lock dentry until we are sure
	 * that we won't violate the locking order.
	 * While parent->d_lock is not enough to stabilize
	 * dentry->d_parent, it *is* enough to stabilize
	 * dentry->d_parent == parent.
	 */
	if (unlikely(parent != dentry->d_parent)) {
		spin_unlock(&parent->d_lock);
		goto again;
	}
	rcu_read_unlock();
	if (parent != dentry)
		spin_lock(&dentry->d_lock);
	else
		parent = NULL;
	return parent;
}

That variant got force-pushed in place of the previous one, again at the head of #for-linus. And I'm definitely not folding it in until it gets more review and testing.
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Thu, May 29, 2014 at 11:52 AM, Al Viro <v...@zeniv.linux.org.uk> wrote:
>
> Grrr... Sadly, that's not good enough. Leaking rcu_read_lock() on success
> is trivial, but there's more serious problem: suppose dentries involved
> get moved before we get to locking what we thought was parent. We end up
> taking ->d_lock on two dentries that might be nowhere near each other in
> the tree, with obvious nasty implications. Would be _very_ hard to
> reproduce ;-/

Yeah, I don't think you can reproduce that, but I guess renaming directories into each other (two renames needed) could trigger an ABBA deadlock by changing the topological order of dentry/parent.

I suspect there's no way in hell that tiny race will ever happen in practice, but let's not risk it. And your solution (to re-check after just taking the parent lock) seems sufficient and sane, since dentry_lock_for_move() will always take the parent lock(s) before we move a dentry.

So that looks good to me.

               Linus
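The recheck scheme being approved here can be modeled in a single-threaded sketch: read the parent pointer without locks, take that node's lock, then verify the pointer did not change in the window. Everything below is hypothetical userspace code, not the kernel's implementation; a fake "rename" hook stands in for a concurrent d_move() firing in the race window.

```c
#include <assert.h>
#include <stddef.h>

/* Toy dentry: just a parent pointer and a fake "locked" flag. */
struct dent {
	struct dent *d_parent;
	int locked;
};

struct dent root, old_parent, new_parent, child;
int rename_pending;	/* fire exactly one simulated rename */
int retries;		/* how often the recheck caught a move */

/* Models a concurrent d_move() landing between the unlocked read of
 * d_parent and the moment we take the would-be parent's lock. */
void maybe_rename(void)
{
	if (rename_pending) {
		rename_pending = 0;
		child.d_parent = &new_parent;
	}
}

/* The lock_parent()-style retry loop: lock what we read, then verify
 * the equality d_parent == parent, which the parent's lock is enough
 * to stabilize; on mismatch, drop the lock and retry. */
struct dent *lock_parent_sim(struct dent *dentry)
{
	for (;;) {
		struct dent *parent = dentry->d_parent;	/* unlocked read */

		maybe_rename();		/* the race window */
		parent->locked = 1;	/* "spin_lock(&parent->d_lock)" */
		if (parent == dentry->d_parent)
			return parent;	/* still the parent: safe */
		parent->locked = 0;	/* lost the race: unlock, retry */
		retries++;
	}
}

/* Set up the race: child starts under old_parent, one rename pending. */
struct dent *run_sim(void)
{
	old_parent.d_parent = &root;
	new_parent.d_parent = &root;
	child.d_parent = &old_parent;
	rename_pending = 1;
	retries = 0;
	return lock_parent_sim(&child);
}
```

The first iteration locks the stale parent, the recheck fails, and the second iteration ends up holding the lock on the true parent, which is exactly why locking the wrong (moved-away) dentry never escapes this loop.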
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Thu, May 29, 2014 at 12:14:51PM -0700, Linus Torvalds wrote:
> Yeah, I don't think you can reproduce that, but I guess renaming
> directories into each other (two renames needed) could trigger an ABBA
> deadlock by changing the topological order of dentry/parent.
>
> I suspect there's no way in hell that tiny race will ever happen in
> practice, but let's not risk it. And your solution (to re-check after
> just taking the parent lock) seems sufficient and sane, since
> dentry_lock_for_move() will always take the parent lock(s) before we
> move a dentry.
>
> So that looks good to me.

BTW, how serious is the problem with __lockref_is_dead(&dentry->d_lockref) with only ->d_parent->d_lock held? From my reading of lib/lockref.c it should be safe - we only do lockref_mark_dead() with ->d_parent->d_lock held, and it'll provide all the serialization and barriers we need.

If I'm right, we could get rid of DCACHE_DENTRY_KILLED completely and replace checking for it with checking for negative ->d_lockref.count. There are two places where we check for it; in shrink_dentry_list() we definitely can go that way (we are holding ->d_lock there) and it simplifies the code nicely. In d_walk(), though (in the bit that used to be try_to_ascend()) we only hold ->d_parent->d_lock. It looks like that ought to be safe to replace

	if (this_parent != child->d_parent ||
	    (child->d_flags & DCACHE_DENTRY_KILLED) ||
	    need_seqretry(&rename_lock, seq)) {

with

	if (this_parent != child->d_parent ||
	    __lockref_is_dead(&child->d_lockref) ||
	    need_seqretry(&rename_lock, seq)) {

and remove DCACHE_DENTRY_KILLED completely... The other user (in shrink_dentry_list()) simplifies to

	if (dentry->d_lockref.count != 0) {
		bool can_free = dentry->d_flags & DCACHE_MAY_FREE;
		spin_unlock(&dentry->d_lock);
		if (parent)
			spin_unlock(&parent->d_lock);
		if (can_free)
			dentry_free(dentry);
		continue;
	}

taking care of both the DCACHE_DENTRY_KILLED case and the simple lazy dget one, and that one's definitely safe and worth doing.

Would be nice if we could switch the d_walk() one as well and kill that flag off, though... Basically, we have

	spin_lock(&A);
	spin_lock(&R.lock);
	V = 1;
	lockref_mark_dead(&R);

as the only place where R goes dead and we want to replace

	spin_lock(&A);
	if (V)
		...

with

	spin_lock(&A);
	if (__lockref_is_dead(&R))
		...

Unless I'm missing something subtle in lockref.c, that should be safe... Comments?
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Thu, May 29, 2014 at 9:50 PM, Al Viro <v...@zeniv.linux.org.uk> wrote:
>
> BTW, how serious is the problem with __lockref_is_dead(&dentry->d_lockref)
> with only ->d_parent->d_lock held? From my reading of lib/lockref.c it
> should be safe - we only do lockref_mark_dead() with ->d_parent->d_lock
> held, and it'll provide all the serialization and barriers we need.
>
> If I'm right, we could get rid of DCACHE_DENTRY_KILLED completely

Yeah, I think that would be good. Except I think you should create a dentry_is_dead() helper function that then has that "if you hold the dentry or parent lock, this is safe" comment, because for lockref in general you do need to have the lock in the lockref itself. The fact that dentries have more locking is very much dentry-specific.

But with that, go wild. I'd love to get rid of some of the redundant stuff. For another example, the

	BUG_ON((int)dentry->d_lockref.count < 0);

test makes very little sense any more with lockrefs and the whole dead marker (that should make sure that it never gets incremented), but exists due to the direct conversion.

               Linus
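A minimal sketch of the kind of helper Linus is asking for, modeled in userspace. All `toy_*` names are made up; only the dead-sentinel idea (lockref_mark_dead() in lib/lockref.c sets the count to -128) and the locking comment carry over from the discussion.

```c
#include <assert.h>

/* Toy stand-in for struct lockref: only the count is modeled, since
 * that is all the "is it dead?" test needs to demonstrate. */
struct toy_lockref {
	int count;
};

/* Mirrors lockref_mark_dead(): the count goes to a negative sentinel
 * (-128 in lib/lockref.c) and may never be incremented again. */
void toy_mark_dead(struct toy_lockref *ref)
{
	ref->count = -128;
}

int toy_lockref_is_dead(const struct toy_lockref *ref)
{
	return ref->count < 0;
}

struct toy_dentry {
	struct toy_lockref d_lockref;
};

/*
 * The suggested dentry_is_dead() helper: the place where the
 * dentry-specific locking rule gets documented.  Caller must hold
 * either the dentry's own d_lock or its parent's d_lock, because
 * (in the real code) lockref_mark_dead() only ever runs with the
 * parent's d_lock held, which serializes the transition.
 */
int toy_dentry_is_dead(const struct toy_dentry *dentry)
{
	return toy_lockref_is_dead(&dentry->d_lockref);
}
```

Once such a helper exists, both the d_walk() check and the shrink_dentry_list() check can call it instead of testing DCACHE_DENTRY_KILLED, which is what lets the flag go away.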
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Thu, May 29, 2014 at 04:52:33AM +0100, Al Viro wrote:
> On Thu, May 29, 2014 at 04:11:49AM +0100, Al Viro wrote:
> > On Wed, May 28, 2014 at 07:39:54PM +0100, Al Viro wrote:
> > >
> > > OK, the warnings about averting your eyes very much apply; the thing below
> > > definitely needs more massage before it becomes acceptable (and no, it's
> > > not a single commit; I'm not that insane), but it changes behaviour in the
> > > way described above. Could you check if the livelock persists with it?
> > > No trace-generating code in there, so the logs should be compact enough...
> >
> > Here's an updated patch, hopefully slightly less vomit-inducing. Should
> > give the same behaviour as the previous one... Again, it's a cumulative
> > diff - I'm still massaging the splitup here.
>
> BTW, it still leaves the "proceed to parent" case in shrink_dentry_list();
> in theory, it's also vulnerable to the same livelock. Can be dealt pretty
> much the same way; I'd rather leave that one for right after -final, though,
> if the already posted variant turns out to be sufficient...

... which is (presumably) dealt with the incremental I'd just sent to Linus; seeing what kind of dumb mistakes I'm making, I'd better call it quits for tonight - it's 1:30am here and I didn't have anywhere near enough sleep yesterday. I'd appreciate it if you could test the patch immediately upthread (from Message-ID: <20140529031149.ge18...@zeniv.linux.org.uk>) and see if it helps. There's an incremental on top of it (from Message-ID: <20140529052621.gh18...@zeniv.linux.org.uk>) that might or might not be a good idea.

I'm crawling to bed right now; back in ~7 hours...
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Thu, May 29, 2014 at 06:16:47AM +0100, Al Viro wrote:
> On Wed, May 28, 2014 at 09:21:00PM -0700, Linus Torvalds wrote:
> > On Wed, May 28, 2014 at 8:11 PM, Al Viro wrote:
> > >
> > > Here's an updated patch, hopefully slightly less vomit-inducing.
> >
> > Hmm. Less vomit-inducing, except for this part:
> > Ugh, that just *screams* for a helper function. Something like
> >
> > 	parent = get_parent_and_lock(dentry);
> >
> > or whatever, with that trylock/renamelock dance separated out. The
> > rule would be that it would lock the "dentry", and return the
> > (similarly locked) parent. Or NULL for a root dentry, of course.
> >
> > Please?
>
> Already done in my current tree; see below for the next commit in there...
> I can separate that helper from the rest (dealing with the same livelock
> for dentry_kill(dentry, 1) caller in shrink_dentry_list()). All of that
> is very much subject to reordering, resplitting, writing more decent
> commit messages, etc.

Except that it's from the wrong branch ;-/ Correct one follows:

commit fb860957449c07c6f1a1dd911e83a635b6f11f21
Author: Al Viro
Date: Thu May 29 00:37:49 2014 -0400

    deal with "put the parent" side of shrink_dentry_list()

diff --git a/fs/dcache.c b/fs/dcache.c
index bce851dc03ef..be2bea834bf4 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -793,30 +793,34 @@ restart:
 }
 EXPORT_SYMBOL(d_prune_aliases);
 
-static void shrink_dentry_list(struct list_head *list)
+static inline struct dentry *lock_parent(struct dentry *dentry)
 {
-	struct dentry *dentry, *parent;
+	struct dentry *parent = dentry->d_parent;
+	if (IS_ROOT(dentry))
+		return NULL;
+	if (unlikely(!spin_trylock(&parent->d_lock))) {
+		spin_unlock(&dentry->d_lock);
+		read_seqlock_excl(&rename_lock);
+		parent = NULL;
+		if (!IS_ROOT(dentry)) {
+			parent = dentry->d_parent;
+			spin_lock(&parent->d_lock);
+		}
+		read_sequnlock_excl(&rename_lock);
+		spin_lock(&dentry->d_lock);
+	}
+	return parent;
+}
 
+static void shrink_dentry_list(struct list_head *list)
+{
 	while (!list_empty(list)) {
 		struct inode *inode;
-		dentry = list_entry(list->prev, struct dentry, d_lru);
+		struct dentry *dentry = list_entry(list->prev, struct dentry, d_lru);
+		struct dentry *parent;
 
-		parent = NULL;
 		spin_lock(&dentry->d_lock);
-		if (!IS_ROOT(dentry)) {
-			parent = dentry->d_parent;
-			if (unlikely(!spin_trylock(&parent->d_lock))) {
-				spin_unlock(&dentry->d_lock);
-				parent = NULL;
-				read_seqlock_excl(&rename_lock);
-				if (!IS_ROOT(dentry)) {
-					parent = dentry->d_parent;
-					spin_lock(&parent->d_lock);
-				}
-				read_sequnlock_excl(&rename_lock);
-				spin_lock(&dentry->d_lock);
-			}
-		}
+		parent = lock_parent(dentry);
 
 		/*
 		 * The dispose list is isolated and dentries are not accounted
@@ -864,8 +868,26 @@ static void shrink_dentry_list(struct list_head *list)
 		 * fragmentation.
 		 */
 		dentry = parent;
-		while (dentry && !lockref_put_or_lock(&dentry->d_lockref))
-			dentry = dentry_kill(dentry);
+		while (dentry && !lockref_put_or_lock(&dentry->d_lockref)) {
+			parent = lock_parent(dentry);
+			if (dentry->d_lockref.count != 1) {
+				dentry->d_lockref.count--;
+				spin_unlock(&dentry->d_lock);
+				if (parent)
+					spin_unlock(&parent->d_lock);
+				break;
+			}
+			inode = dentry->d_inode;	/* can't be NULL */
+			if (unlikely(!spin_trylock(&inode->i_lock))) {
+				spin_unlock(&dentry->d_lock);
+				if (parent)
+					spin_unlock(&parent->d_lock);
+				cpu_relax();
+				continue;
+			}
+			__dentry_kill(dentry);
+			dentry = parent;
+		}
 	}
 }
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Wed, May 28, 2014 at 09:21:00PM -0700, Linus Torvalds wrote:
> On Wed, May 28, 2014 at 8:11 PM, Al Viro wrote:
> >
> > Here's an updated patch, hopefully slightly less vomit-inducing.
>
> Hmm. Less vomit-inducing, except for this part:
> Ugh, that just *screams* for a helper function. Something like
>
> 	parent = get_parent_and_lock(dentry);
>
> or whatever, with that trylock/renamelock dance separated out. The
> rule would be that it would lock the "dentry", and return the
> (similarly locked) parent. Or NULL for a root dentry, of course.
>
> Please?

Already done in my current tree; see below for the next commit in there... I can separate that helper from the rest (dealing with the same livelock for dentry_kill(dentry, 1) caller in shrink_dentry_list()). All of that is very much subject to reordering, resplitting, writing more decent commit messages, etc.

commit 67e2554f6ca62e20363139f6f2968063410f0f3b
Author: Al Viro
Date: Wed May 28 23:53:36 2014 -0400

    deal with "put the parent" side of shrink_dentry_list()

    Signed-off-by: Al Viro

diff --git a/fs/dcache.c b/fs/dcache.c
index bce851dc03ef..be2bea834bf4 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -793,30 +793,31 @@ restart:
 }
 EXPORT_SYMBOL(d_prune_aliases);
 
-static void shrink_dentry_list(struct list_head *list)
+static inline struct dentry *lock_parent(struct dentry *dentry)
 {
-	struct dentry *dentry, *parent;
-
-	while (!list_empty(list)) {
-		struct inode *inode;
-		dentry = list_entry(list->prev, struct dentry, d_lru);
-
+	struct dentry *parent = dentry->d_parent;
+	if (IS_ROOT(dentry))
+		return NULL;
+	if (unlikely(!spin_trylock(&parent->d_lock))) {
+		spin_unlock(&dentry->d_lock);
+		read_seqlock_excl(&rename_lock);
 		parent = NULL;
-		spin_lock(&dentry->d_lock);
 		if (!IS_ROOT(dentry)) {
 			parent = dentry->d_parent;
-			if (unlikely(!spin_trylock(&parent->d_lock))) {
-				spin_unlock(&dentry->d_lock);
-				parent = NULL;
-				read_seqlock_excl(&rename_lock);
-				if (!IS_ROOT(dentry)) {
-					parent = dentry->d_parent;
-					spin_lock(&parent->d_lock);
-				}
+			spin_lock(&parent->d_lock);
 		}
+		read_sequnlock_excl(&rename_lock);
+		spin_lock(&dentry->d_lock);
+	}
+	return parent;
+}
+
+static void shrink_dentry_list(struct list_head *list)
+{
+	while (!list_empty(list)) {
+		struct inode *inode;
+		struct dentry *dentry = list_entry(list->prev, struct dentry, d_lru);
+		struct dentry *parent = lock_parent(dentry);
 
 		/*
 		 * The dispose list is isolated and dentries are not accounted
@@ -864,8 +865,26 @@ static void shrink_dentry_list(struct list_head *list)
 		 * fragmentation.
 		 */
 		dentry = parent;
-		while (dentry && !lockref_put_or_lock(&dentry->d_lockref))
-			dentry = dentry_kill(dentry);
+		while (dentry && !lockref_put_or_lock(&dentry->d_lockref)) {
+			parent = lock_parent(dentry);
+			if (dentry->d_lockref.count != 1) {
+				dentry->d_lockref.count--;
+				spin_unlock(&dentry->d_lock);
+				if (parent)
+					spin_unlock(&parent->d_lock);
+				break;
+			}
+			inode = dentry->d_inode;	/* can't be NULL */
+			if (unlikely(!spin_trylock(&inode->i_lock))) {
+				spin_unlock(&dentry->d_lock);
+				if (parent)
+					spin_unlock(&parent->d_lock);
+				cpu_relax();
+				continue;
+			}
+			__dentry_kill(dentry);
+			dentry = parent;
+		}
 	}
 }
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Wed, May 28, 2014 at 8:11 PM, Al Viro wrote:
>
> Here's an updated patch, hopefully slightly less vomit-inducing.

Hmm. Less vomit-inducing, except for this part:

>                 dentry = list_entry(list->prev, struct dentry, d_lru);
> +
> +                parent = NULL;
>                 spin_lock(&dentry->d_lock);
> +                if (!IS_ROOT(dentry)) {
> +                        parent = dentry->d_parent;
> +                        if (unlikely(!spin_trylock(&parent->d_lock))) {
> +                                spin_unlock(&dentry->d_lock);
> +                                parent = NULL;
> +                                read_seqlock_excl(&rename_lock);
> +                                if (!IS_ROOT(dentry)) {
> +                                        parent = dentry->d_parent;
> +                                        spin_lock(&parent->d_lock);
> +                                }
> +                                read_sequnlock_excl(&rename_lock);
> +                                spin_lock(&dentry->d_lock);
> +                        }
> +                }

Ugh, that just *screams* for a helper function. Something like

        parent = get_parent_and_lock(dentry);

or whatever, with that trylock/renamelock dance separated out. The rule
would be that it would lock the "dentry", and return the (similarly
locked) parent. Or NULL for a root dentry, of course.

Please?

                Linus
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Thu, May 29, 2014 at 04:11:49AM +0100, Al Viro wrote: > On Wed, May 28, 2014 at 07:39:54PM +0100, Al Viro wrote: > > > OK, the warnings about averting your eyes very much apply; the thing below > > definitely needs more massage before it becomes acceptable (and no, it's > > not a single commit; I'm not that insane), but it changes behaviour in the > > way described above. Could you check if the livelock persists with it? > > No trace-generating code in there, so the logs should be compact enough... > > Here's an updated patch, hopefully slightly less vomit-inducing. Should > give the same behaviour as the previous one... Again, it's a cumulative > diff - I'm still massaging the splitup here. BTW, it still leaves the "proceed to parent" case in shrink_dentry_list(); in theory, it's also vulnerable to the same livelock. Can be dealt pretty much the same way; I'd rather leave that one for right after -final, though, if the already posted variant turns out to be sufficient... -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Wed, May 28, 2014 at 07:39:54PM +0100, Al Viro wrote:
> OK, the warnings about averting your eyes very much apply; the thing below
> definitely needs more massage before it becomes acceptable (and no, it's
> not a single commit; I'm not that insane), but it changes behaviour in the
> way described above. Could you check if the livelock persists with it?
> No trace-generating code in there, so the logs should be compact enough...

Here's an updated patch, hopefully slightly less vomit-inducing. Should
give the same behaviour as the previous one... Again, it's a cumulative
diff - I'm still massaging the splitup here.

BTW, there's an interesting potential gotcha for ->d_delete() instances -
it *is* possible to have it called several times (returning true every
time it's called) for the same dentry. Suppose dput() is called and races
with d_find_alias(). We get the following sequence:

CPU1: dput sees d_count about to reach 0
CPU1: dput grabs ->d_lock
CPU1: dput calls ->d_delete() and gets told that it shouldn't survive
CPU2: d_find_alias() grabs ->i_lock
CPU1: dput calls dentry_kill()
CPU1: dentry_kill() tries to grab ->i_lock and fails
CPU2: d_find_alias() calls __d_find_alias()
CPU1: dentry_kill() drops ->d_lock and returns the same dentry
CPU2: __d_find_alias() finds our dentry, grabs ->d_lock and bumps ->d_count
CPU1: dput() finds ->d_count greater than 1, decrements it and returns
CPU2: eventually calls dput(), leading to another call of ->d_delete().

Moral: don't assume that ->d_delete() returning 1 means that the dentry
won't survive; it's not a good place to destroy any data structures, etc.
Fortunately, we have only 11 instances in the tree and only one has code
making that assumption, the code in question being ifdefed out (lustre ;-/)
Out-of-tree folks might easily step into that trap, so it's probably worth
documenting - the comment in ll_ddelete() seems to indicate that its
authors had only noticed the problem with the locking environment, not
that it is the wrong place for such code in the first place...

diff --git a/fs/dcache.c b/fs/dcache.c
index 42ae01e..6c2a92e 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -441,42 +441,12 @@ void d_drop(struct dentry *dentry)
 }
 EXPORT_SYMBOL(d_drop);
 
-/*
- * Finish off a dentry we've decided to kill.
- * dentry->d_lock must be held, returns with it unlocked.
- * If ref is non-zero, then decrement the refcount too.
- * Returns dentry requiring refcount drop, or NULL if we're done.
- */
-static struct dentry *
-dentry_kill(struct dentry *dentry, int unlock_on_failure)
-        __releases(dentry->d_lock)
+static void __dentry_kill(struct dentry *dentry)
 {
-        struct inode *inode;
         struct dentry *parent = NULL;
         bool can_free = true;
-
-        if (unlikely(dentry->d_flags & DCACHE_DENTRY_KILLED)) {
-                can_free = dentry->d_flags & DCACHE_MAY_FREE;
-                spin_unlock(&dentry->d_lock);
-                goto out;
-        }
-
-        inode = dentry->d_inode;
-        if (inode && !spin_trylock(&inode->i_lock)) {
-relock:
-                if (unlock_on_failure) {
-                        spin_unlock(&dentry->d_lock);
-                        cpu_relax();
-                }
-                return dentry; /* try again with same dentry */
-        }
         if (!IS_ROOT(dentry))
                 parent = dentry->d_parent;
-        if (parent && !spin_trylock(&parent->d_lock)) {
-                if (inode)
-                        spin_unlock(&inode->i_lock);
-                goto relock;
-        }
 
         /*
          * The dentry is now unrecoverably dead to the world.
@@ -520,10 +490,41 @@ relock:
                 can_free = false;
         }
         spin_unlock(&dentry->d_lock);
-out:
         if (likely(can_free))
                 dentry_free(dentry);
+}
+
+/*
+ * Finish off a dentry we've decided to kill.
+ * dentry->d_lock must be held, returns with it unlocked.
+ * If ref is non-zero, then decrement the refcount too.
+ * Returns dentry requiring refcount drop, or NULL if we're done.
+ */
+static struct dentry *dentry_kill(struct dentry *dentry)
+        __releases(dentry->d_lock)
+{
+        struct inode *inode = dentry->d_inode;
+        struct dentry *parent = NULL;
+
+        if (inode && unlikely(!spin_trylock(&inode->i_lock)))
+                goto failed;
+
+        if (!IS_ROOT(dentry)) {
+                parent = dentry->d_parent;
+                if (unlikely(!spin_trylock(&parent->d_lock))) {
+                        if (inode)
+                                spin_unlock(&inode->i_lock);
+                        goto failed;
+                }
+        }
+
+        __dentry_kill(dentry);
+        return parent;
+
+failed:
+        spin_unlock(&dentry->d_lock);
+        cpu_relax();
+        return dentry; /* try again with same dentry */
 }
 
 /*
@@ -579,7 +580,7 @@ repeat:
         return;
 
 kill_it:
-        dentry = dentry_kill(dentry, 1);
+        dentry = dentry_kill(dentry);
         if (dentry)
                 goto repeat;
 }
@@ -797,8 +798,26 @@ static void shrink_dentry_list(struct
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Wed, May 28, 2014 at 02:11:51PM -0700, Linus Torvalds wrote: > > d_walk() covers its arse with ->d_lock (and it wants the starting point > > to be pinned, obviously). So AFAICS RCU is not a problem. > > It's not RCU itself, it's that DCACHE_MAY_FREE bit. Yes, d_walk() gets > ->d_lock, but dentry_kill() drops d_lock in the middle, and the whole > "dentry_kill() can get called multiple times for the same dentry" just > makes me go "Hmm". It is *not* obvious what happens the second vs > third time that gets called. If it can get called two times, why not > three times? dentry_kill(dentry, 1) can only be called when ->d_count had been positive up to the point of caller grabbing ->d_lock. And it either does lockref_mark_dead() before dropping ->d_lock or it fails trylock and buggers off having done nothing, including changes of ->d_count. In case of dentry_kill(dentry, 0), ->d_lock has been held since the moment dentry had been found on shrink list. It either does nothing and dentry gets returned to the same shrink list without ever dropping ->d_lock (or changing refcount) or it does lockref_mark_dead() before dropping ->d_lock. In the latter case dentry will *not* be put on any shrink lists again. So we have at most one call of the first kind getting past the trylocks and at most one call of the second kind doing the same. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Wed, May 28, 2014 at 1:14 PM, Al Viro wrote: > > Unless I'm badly misreading your patch, you are calling dentry_kill() > with rcu_read_lock() held. And that can trigger all sorts of interesting > things, starting with iput() and tons of disk IO... Yes, right you are. As to my worry: > d_walk() covers its arse with ->d_lock (and it wants the starting point > to be pinned, obviously). So AFAICS RCU is not a problem. It's not RCU itself, it's that DCACHE_MAY_FREE bit. Yes, d_walk() gets ->d_lock, but dentry_kill() drops d_lock in the middle, and the whole "dentry_kill() can get called multiple times for the same dentry" just makes me go "Hmm". It is *not* obvious what happens the second vs third time that gets called. If it can get called two times, why not three times? And if it can get called three times, why can't _two_ of them set "can_free" to true? That is absolutely *not* obvious to me. The aim of my patch (and I agree that sadly holding the rcu lock is not viable) was to make that "if it gets called twice" case _much_ more obvious. Because any but the first time will be very obviously a no-op (apart from taking and releasing the lock, which is ok for a rcu-delayed data structure). That's very much not the case with DCACHE_MAY_FREE. The behavior of that things os very much non-obvious, because different callers into dentry_kill() all do different things. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Wed, May 28, 2014 at 01:02:23PM -0700, Linus Torvalds wrote: > Anyway, one reason I looked at this is that most of the threads in > Mika's NMI watchdog traces were stuck on _raw_spin_lock() in the whole > d_walk() thing, and I had a *really* hard time convincing myself that > this was all safe without the RCU lock. I'm wondering if Mika perhaps > has CONFIG_PREEMPT_RCU set, which means that spinlocks (or the > rename_lock sequence lock) do not end up being RCU-safe points. d_walk() covers its arse with ->d_lock (and it wants the starting point to be pinned, obviously). So AFAICS RCU is not a problem. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Wed, May 28, 2014 at 12:43:24PM -0700, Linus Torvalds wrote: > It does require that the dentry shrinking code always hold the RCU > lock for reading, because others may actually be doing the final > dput() while the thing is on the shrinking list (and holding the RCU > lock is what protects the entry from actually being free'd). > > NOTE! I don't claim that this fixes anything, but I do think that it > makes that whole cross-thread complexity of that DCACHE_MAY_FREE go > away. I think it's easier to understand, and it removes code in the > process. Comments? Unless I'm badly misreading your patch, you are calling dentry_kill() with rcu_read_lock() held. And that can trigger all sorts of interesting things, starting with iput() and tons of disk IO... What am I missing here? -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Wed, May 28, 2014 at 12:43 PM, Linus Torvalds wrote:
> - the shrinker list logic depends on the actual freeing of the dentry
>   to be delayed until the RCU grace period (already true for RCU-lookup
>   dentries)

Side note: I just unconditionally removed the immediate __d_free() case,
but it could as well have become

-        if (!(dentry->d_flags & DCACHE_RCUACCESS))
+        if (!(dentry->d_flags & (DCACHE_RCUACCESS|DCACHE_SHRINK_LIST)))
                 __d_free(&dentry->d_u.d_rcu);
         else
                 call_rcu(&dentry->d_u.d_rcu, __d_free);

instead.

Anyway, one reason I looked at this is that most of the threads in Mika's
NMI watchdog traces were stuck on _raw_spin_lock() in the whole d_walk()
thing, and I had a *really* hard time convincing myself that this was all
safe without the RCU lock. I'm wondering if Mika perhaps has
CONFIG_PREEMPT_RCU set, which means that spinlocks (or the rename_lock
sequence lock) do not end up being RCU-safe points.

And once I started worrying about that, the whole "ok, who calls
dentry_kill() when shrinking races with dput()" question started just
making me worry about that whole DCACHE_MAY_FREE thing. Because one of
the things that Miklos/Al's series of patches did was to drop some of the
RCU locking, since it wasn't "necessary" any more as far as the lru list
itself was concerned..

                Linus
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Wed, May 28, 2014 at 11:39 AM, Al Viro wrote:
>
> OK, the warnings about averting your eyes very much apply; the thing below
> definitely needs more massage before it becomes acceptable

I've been looking at this too, and I have to say, I absolutely hate your
DCACHE_MAY_FREE logic. It makes it really hard to follow what the heck is
happening across threads.

So just to understand the code, how about this (UNTESTED!) patch? It gets
rid of the DCACHE_MAY_FREE logic entirely, and makes the rules imho much
more straightforward:

 - whoever calls "dentry_kill()" first is the one that frees things.

 - dentry_kill() does *not* touch the shrink list at all. In fact,
   *nothing* touches the shrink list, except for the shrinker.

 - the shrinkers remove entries from their own lists

 - the shrinker list logic depends on the actual freeing of the dentry
   to be delayed until the RCU grace period (already true for RCU-lookup
   dentries)

In other words, that whole "may-free" thing goes away, the whole
shrink-list locking issues go away, there are no subtle rules. Nobody else
ever touches the shrink-list entries than the entity walking the
shrink-lists. Once DCACHE_SHRINK_LIST is set, nobody else is supposed to
touch the dentry.

It does require that the dentry shrinking code always hold the RCU lock
for reading, because others may actually be doing the final dput() while
the thing is on the shrinking list (and holding the RCU lock is what
protects the entry from actually being free'd).

NOTE! I don't claim that this fixes anything, but I do think that it makes
that whole cross-thread complexity of that DCACHE_MAY_FREE go away. I
think it's easier to understand, and it removes code in the process.

Comments?

                Linus

 fs/dcache.c            | 32 ++++++++++++--------------------
 include/linux/dcache.h |  2 --
 2 files changed, 14 insertions(+), 20 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 42ae01eefc07..6821479a378e 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -248,11 +248,7 @@ static void __d_free(struct rcu_head *head)
 
 static void dentry_free(struct dentry *dentry)
 {
-        /* if dentry was never visible to RCU, immediate free is OK */
-        if (!(dentry->d_flags & DCACHE_RCUACCESS))
-                __d_free(&dentry->d_u.d_rcu);
-        else
-                call_rcu(&dentry->d_u.d_rcu, __d_free);
+        call_rcu(&dentry->d_u.d_rcu, __d_free);
 }
 
 /**
@@ -453,12 +449,11 @@ dentry_kill(struct dentry *dentry, int unlock_on_failure)
 {
         struct inode *inode;
         struct dentry *parent = NULL;
-        bool can_free = true;
 
-        if (unlikely(dentry->d_flags & DCACHE_DENTRY_KILLED)) {
-                can_free = dentry->d_flags & DCACHE_MAY_FREE;
+        /* Did somebody else already free it? */
+        if (__lockref_is_dead(&dentry->d_lockref)) {
                 spin_unlock(&dentry->d_lock);
-                goto out;
+                return NULL;
         }
 
         inode = dentry->d_inode;
@@ -514,15 +509,7 @@ relock:
         if (dentry->d_op && dentry->d_op->d_release)
                 dentry->d_op->d_release(dentry);
 
-        spin_lock(&dentry->d_lock);
-        if (dentry->d_flags & DCACHE_SHRINK_LIST) {
-                dentry->d_flags |= DCACHE_MAY_FREE;
-                can_free = false;
-        }
-        spin_unlock(&dentry->d_lock);
-out:
-        if (likely(can_free))
-                dentry_free(dentry);
+        dentry_free(dentry);
         return parent;
 }
 
@@ -921,9 +908,11 @@ long prune_dcache_sb(struct super_block *sb, unsigned long nr_to_scan,
         LIST_HEAD(dispose);
         long freed;
 
+        rcu_read_lock();
         freed = list_lru_walk_node(&sb->s_dentry_lru, nid, dentry_lru_isolate,
                                    &dispose, &nr_to_scan);
         shrink_dentry_list(&dispose);
+        rcu_read_unlock();
         return freed;
 }
 
@@ -962,11 +951,13 @@ void shrink_dcache_sb(struct super_block *sb)
         do {
                 LIST_HEAD(dispose);
 
+                rcu_read_lock();
                 freed = list_lru_walk(&sb->s_dentry_lru,
                         dentry_lru_isolate_shrink, &dispose, UINT_MAX);
 
                 this_cpu_sub(nr_dentry_unused, freed);
                 shrink_dentry_list(&dispose);
+                rcu_read_unlock();
         } while (freed > 0);
 }
 EXPORT_SYMBOL(shrink_dcache_sb);
@@ -1230,13 +1221,16 @@ void shrink_dcache_parent(struct dentry *parent)
                 data.start = parent;
                 data.found = 0;
 
+                rcu_read_lock();
                 d_walk(parent, &data, select_collect, NULL);
                 if (!data.found)
                         break;
 
                 shrink_dentry_list(&data.dispose);
+                rcu_read_unlock();
                 cond_resched();
         }
+        rcu_read_unlock();
 }
 EXPORT_SYMBOL(shrink_dcache_parent);
 
@@ -1338,11 +1332,13 @@ int check_submounts_and_drop(struct dentry *dentry)
                 data.start = dentry;
                 data.found = 0;
 
+                rcu_read_lock();
                 d_walk(dentry, &data, check_and_collect, check_and_drop);
                 ret = data.found;
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Wed, May 28, 2014 at 03:19:37PM +0100, Al Viro wrote:
> OK, it's not ->i_lock, it's ->d_lock on parent being grabbed after that on
> child, while d_walk() keeps taking them in opposite order. Hmm...
>
> In principle we could do the following:
>         * split dentry_kill() into the part that is taking locks and
> the rest of it.
>         * in case of trylock failure have shrink_dentry_list() do
> read_seqlock_excl(&rename_lock) (which will stabilize ->d_parent) and
> take ->d_lock in the right order, drop rename_lock and call __dentry_kill().
>
> AFAICS, that would kill the livelock for good. We still have ->i_lock
> trylock failures to deal with, but those are less of a problem - d_walk()
> won't step on ->i_lock at all. I'm going to grab a couple of hours of sleep
> and try to put together something along those lines...

OK, the warnings about averting your eyes very much apply; the thing below
definitely needs more massage before it becomes acceptable (and no, it's
not a single commit; I'm not that insane), but it changes behaviour in the
way described above. Could you check if the livelock persists with it?
No trace-generating code in there, so the logs should be compact enough...

diff --git a/fs/dcache.c b/fs/dcache.c
index 42ae01e..ed0cc62 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -441,42 +441,12 @@ void d_drop(struct dentry *dentry)
 }
 EXPORT_SYMBOL(d_drop);
 
-/*
- * Finish off a dentry we've decided to kill.
- * dentry->d_lock must be held, returns with it unlocked.
- * If ref is non-zero, then decrement the refcount too.
- * Returns dentry requiring refcount drop, or NULL if we're done.
- */
-static struct dentry *
-dentry_kill(struct dentry *dentry, int unlock_on_failure)
-        __releases(dentry->d_lock)
+static void __dentry_kill(struct dentry *dentry)
 {
-        struct inode *inode;
         struct dentry *parent = NULL;
         bool can_free = true;
-
-        if (unlikely(dentry->d_flags & DCACHE_DENTRY_KILLED)) {
-                can_free = dentry->d_flags & DCACHE_MAY_FREE;
-                spin_unlock(&dentry->d_lock);
-                goto out;
-        }
-
-        inode = dentry->d_inode;
-        if (inode && !spin_trylock(&inode->i_lock)) {
-relock:
-                if (unlock_on_failure) {
-                        spin_unlock(&dentry->d_lock);
-                        cpu_relax();
-                }
-                return dentry; /* try again with same dentry */
-        }
         if (!IS_ROOT(dentry))
                 parent = dentry->d_parent;
-        if (parent && !spin_trylock(&parent->d_lock)) {
-                if (inode)
-                        spin_unlock(&inode->i_lock);
-                goto relock;
-        }
 
         /*
          * The dentry is now unrecoverably dead to the world.
@@ -520,10 +490,44 @@ relock:
                 can_free = false;
         }
         spin_unlock(&dentry->d_lock);
-out:
         if (likely(can_free))
                 dentry_free(dentry);
+}
+
+/*
+ * Finish off a dentry we've decided to kill.
+ * dentry->d_lock must be held, returns with it unlocked.
+ * If ref is non-zero, then decrement the refcount too.
+ * Returns dentry requiring refcount drop, or NULL if we're done.
+ */
+static struct dentry *
+dentry_kill(struct dentry *dentry, int unlock_on_failure)
+        __releases(dentry->d_lock)
+{
+        struct inode *inode = dentry->d_inode;
+        struct dentry *parent = NULL;
+
+        if (inode && unlikely(!spin_trylock(&inode->i_lock)))
+                goto failed;
+
+        if (!IS_ROOT(dentry)) {
+                parent = dentry->d_parent;
+                if (unlikely(!spin_trylock(&parent->d_lock))) {
+                        if (inode)
+                                spin_unlock(&inode->i_lock);
+                        goto failed;
+                }
+        }
+
+        __dentry_kill(dentry);
+        return parent;
+
+failed:
+        if (unlock_on_failure) {
+                spin_unlock(&dentry->d_lock);
+                cpu_relax();
+        }
+        return dentry; /* try again with same dentry */
 }
 
 /*
@@ -797,6 +801,7 @@ static void shrink_dentry_list(struct list_head *list)
         struct dentry *dentry, *parent;
 
         while (!list_empty(list)) {
+                struct inode *inode;
                 dentry = list_entry(list->prev, struct dentry, d_lru);
                 spin_lock(&dentry->d_lock);
                 /*
@@ -815,23 +820,52 @@ static void shrink_dentry_list(struct list_head *list)
                         continue;
                 }
 
-                parent = dentry_kill(dentry, 0);
-                /*
-                 * If dentry_kill returns NULL, we have nothing more to do.
-                 */
-                if (!parent)
+                if (unlikely(dentry->d_flags & DCACHE_DENTRY_KILLED)) {
+                        bool can_free = dentry->d_flags & DCACHE_MAY_FREE;
+                        spin_unlock(&dentry->d_lock);
+                        if (can_free)
+                                dentry_free(dentry);
                         continue;
+                }
 
-                if (unlikely(parent == dentry)) {
-
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Wed, May 28, 2014 at 04:11:36PM +0300, Mika Westerberg wrote:
> I added this on top of your previous patch and unfortunately the livelock
> is still there :-(
>
> From the previous log, it looks like udev is doing exactly the same
> operation (check_submounts_and_drop) on 4 of the CPUs in parallel:

4 threads trying to resolve some pathnames in that subtree for some
reason. All of them hitting ->d_revalidate() on the spot - basically, the
first invalidated directory...

OK, it's not ->i_lock, it's ->d_lock on parent being grabbed after that on
child, while d_walk() keeps taking them in opposite order. Hmm...

In principle we could do the following:
        * split dentry_kill() into the part that is taking locks and
the rest of it.
        * in case of trylock failure have shrink_dentry_list() do
read_seqlock_excl(&rename_lock) (which will stabilize ->d_parent) and
take ->d_lock in the right order, drop rename_lock and call __dentry_kill().

AFAICS, that would kill the livelock for good. We still have ->i_lock
trylock failures to deal with, but those are less of a problem - d_walk()
won't step on ->i_lock at all. I'm going to grab a couple of hours of
sleep and try to put together something along those lines...
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Wed, May 28, 2014 at 12:57:01PM +0100, Al Viro wrote: > On Wed, May 28, 2014 at 10:37:51AM +0300, Mika Westerberg wrote: > > > I sent you the whole log privately so that I don't spam everyone. > > > > Summary is here: > > > > May 28 10:24:23 lahna kernel: scsi 14:0:0:0: Direct-Access JetFlash > > Transcend 4GB8.07 PQ: 0 ANSI: 4 > > May 28 10:24:23 lahna kernel: sd 14:0:0:0: Attached scsi generic sg4 type 0 > > May 28 10:24:23 lahna kernel: sd 14:0:0:0: [sdc] 7864320 512-byte logical > > blocks: (4.02 GB/3.75 GiB) > > May 28 10:24:23 lahna kernel: sd 14:0:0:0: [sdc] Write Protect is off > > May 28 10:24:23 lahna kernel: sd 14:0:0:0: [sdc] Mode Sense: 23 00 00 00 > > May 28 10:24:23 lahna kernel: sd 14:0:0:0: [sdc] Write cache: disabled, > > read cache: enabled, doesn't support DPO or FUA > > May 28 10:24:23 lahna kernel: sdc: sdc1 > > May 28 10:24:23 lahna kernel: sd 14:0:0:0: [sdc] Attached SCSI removable > > disk > > > > Here I detached the USB stick: > > > > May 28 10:24:32 lahna kernel: usb 3-10.4: USB disconnect, device number 6 > > May 28 10:24:32 lahna kernel: check_submounts_and_drop[/dev/block/8:33]; > > CPU 4 PID 576 [systemd-udevd] > > May 28 10:24:32 lahna kernel: dput[/dev/block/8:33]; CPU 4 PID 576 > > [systemd-udevd] > > May 28 10:24:32 lahna kernel: > > check_submounts_and_drop[3-10/3-10.4/3-10.4:1.0/host14]; CPU 1 PID 1683 > > [systemd-udevd] > > May 28 10:24:32 lahna kernel: > > shrink[3-10.4:1.0/host14/target14:0:0/subsystem]; CPU 1 PID 1683 > > [systemd-udevd] > > May 28 10:24:32 lahna kernel: shrink[host14/target14:0:0/14:0:0:0/rev]; CPU > > 1 PID 1683 [systemd-udevd] > > May 28 10:24:32 lahna kernel: shrink[host14/target14:0:0/14:0:0:0/vendor]; > > CPU 1 PID 1683 [systemd-udevd] > > May 28 10:24:32 lahna kernel: shrink[host14/target14:0:0/14:0:0:0/model]; > > CPU 1 PID 1683 [systemd-udevd] > > May 28 10:24:32 lahna kernel: check_submounts_and_drop[/dev/block/8:32]; > > CPU 4 PID 576 [systemd-udevd] > > ... 
> > May 28 10:24:32 lahna kernel: check_submounts_and_drop[0000:00:14.0/usb3/3-10/3-10.4]; CPU 6 PID 1684 [systemd-udevd]
> > May 28 10:24:32 lahna kernel: check_submounts_and_drop[0000:00:14.0/usb3/3-10/3-10.4]; CPU 7 PID 1685 [systemd-udevd]
> > May 28 10:24:32 lahna kernel: shrink[usb3/3-10/3-10.4/ltm_capable]; CPU 7 PID 1685 [systemd-udevd]
> > May 28 10:24:32 lahna kernel: A
> ...
>
> Hmm... As it is, we have d_walk() trying _very_ hard in those situations.
> Could you add that on top of the previous and see if livelock changes?
>
> diff --git a/fs/dcache.c b/fs/dcache.c
> index 42ae01e..4ce58d3 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -1209,7 +1209,7 @@ static enum d_walk_ret select_collect(void *_data, struct dentry *dentry)
>           * ensures forward progress). We'll be coming back to find
>           * the rest.
>           */
> -        if (!list_empty(&data->dispose))
> +        if (data->found)
>                  ret = need_resched() ? D_WALK_QUIT : D_WALK_NORETRY;
>  out:
>          return ret;

I added this on top of your previous patch and unfortunately the livelock
is still there :-(

From the previous log, it looks like udev is doing exactly the same
operation (check_submounts_and_drop) on 4 of the CPUs in parallel:

May 28 10:24:32 lahna kernel: check_submounts_and_drop[0000:00:14.0/usb3/3-10/3-10.4]; CPU 6 PID 1684 [systemd-udevd]
May 28 10:24:32 lahna kernel: check_submounts_and_drop[0000:00:14.0/usb3/3-10/3-10.4]; CPU 7 PID 1685 [systemd-udevd]
May 28 10:24:32 lahna kernel: check_submounts_and_drop[0000:00:14.0/usb3/3-10/3-10.4]; CPU 1 PID 1683 [systemd-udevd]
May 28 10:24:32 lahna kernel: check_submounts_and_drop[0000:00:14.0/usb3/3-10/3-10.4]; CPU 0 PID 1686 [systemd-udevd]

I wonder why is that.
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Wed, May 28, 2014 at 10:37:51AM +0300, Mika Westerberg wrote: > I sent you the whole log privately so that I don't spam everyone. > > Summary is here: > > May 28 10:24:23 lahna kernel: scsi 14:0:0:0: Direct-Access JetFlash > Transcend 4GB8.07 PQ: 0 ANSI: 4 > May 28 10:24:23 lahna kernel: sd 14:0:0:0: Attached scsi generic sg4 type 0 > May 28 10:24:23 lahna kernel: sd 14:0:0:0: [sdc] 7864320 512-byte logical > blocks: (4.02 GB/3.75 GiB) > May 28 10:24:23 lahna kernel: sd 14:0:0:0: [sdc] Write Protect is off > May 28 10:24:23 lahna kernel: sd 14:0:0:0: [sdc] Mode Sense: 23 00 00 00 > May 28 10:24:23 lahna kernel: sd 14:0:0:0: [sdc] Write cache: disabled, read > cache: enabled, doesn't support DPO or FUA > May 28 10:24:23 lahna kernel: sdc: sdc1 > May 28 10:24:23 lahna kernel: sd 14:0:0:0: [sdc] Attached SCSI removable disk > > Here I detached the USB stick: > > May 28 10:24:32 lahna kernel: usb 3-10.4: USB disconnect, device number 6 > May 28 10:24:32 lahna kernel: check_submounts_and_drop[/dev/block/8:33]; CPU > 4 PID 576 [systemd-udevd] > May 28 10:24:32 lahna kernel: dput[/dev/block/8:33]; CPU 4 PID 576 > [systemd-udevd] > May 28 10:24:32 lahna kernel: > check_submounts_and_drop[3-10/3-10.4/3-10.4:1.0/host14]; CPU 1 PID 1683 > [systemd-udevd] > May 28 10:24:32 lahna kernel: > shrink[3-10.4:1.0/host14/target14:0:0/subsystem]; CPU 1 PID 1683 > [systemd-udevd] > May 28 10:24:32 lahna kernel: shrink[host14/target14:0:0/14:0:0:0/rev]; CPU 1 > PID 1683 [systemd-udevd] > May 28 10:24:32 lahna kernel: shrink[host14/target14:0:0/14:0:0:0/vendor]; > CPU 1 PID 1683 [systemd-udevd] > May 28 10:24:32 lahna kernel: shrink[host14/target14:0:0/14:0:0:0/model]; CPU > 1 PID 1683 [systemd-udevd] > May 28 10:24:32 lahna kernel: check_submounts_and_drop[/dev/block/8:32]; CPU > 4 PID 576 [systemd-udevd] > ... 
> May 28 10:24:32 lahna kernel: check_submounts_and_drop[0000:00:14.0/usb3/3-10/3-10.4]; CPU 6 PID 1684 [systemd-udevd]
> May 28 10:24:32 lahna kernel: check_submounts_and_drop[0000:00:14.0/usb3/3-10/3-10.4]; CPU 7 PID 1685 [systemd-udevd]
> May 28 10:24:32 lahna kernel: shrink[usb3/3-10/3-10.4/ltm_capable]; CPU 7 PID 1685 [systemd-udevd]
> May 28 10:24:32 lahna kernel: A
...

Hmm... As it is, we have d_walk() trying _very_ hard in those situations.
Could you add that on top of the previous and see if livelock changes?

diff --git a/fs/dcache.c b/fs/dcache.c
index 42ae01e..4ce58d3 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1209,7 +1209,7 @@ static enum d_walk_ret select_collect(void *_data, struct dentry *dentry)
          * ensures forward progress). We'll be coming back to find
          * the rest.
          */
-        if (!list_empty(&data->dispose))
+        if (data->found)
                 ret = need_resched() ? D_WALK_QUIT : D_WALK_NORETRY;
 out:
         return ret;
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Wed, May 28, 2014 at 04:19:55AM +0100, Al Viro wrote: > On Tue, May 27, 2014 at 10:04:09AM +0300, Mika Westerberg wrote: > > On Tue, May 27, 2014 at 05:00:26AM +0100, Al Viro wrote: > > > On Tue, May 27, 2014 at 04:14:15AM +0100, Al Viro wrote: > > > > > > > As the matter of fact, let's try this instead - retry the same sucker > > > > immediately in case if trylocks fail. Comments? > > > > > > Better yet, let's take "move back to shrink list" into dentry_kill() > > > itself. Then we get consistent locking rules for dentry_kill() and > > > instead of unlock_on_failure we simply pass it NULL or the shrink > > > list to put the sucker back. Mika, could you test this one and see > > > if it fixes that livelock? The difference in behaviour is that in > > > case of trylock failure we hit that sucker again without letting > > > it ride all the way around the list, same as we do for other dentry_kill() > > > callers. > > > > I tried this patch and unfortunately it still results the same sort of > > livelock. I've attached the dmesg. > > > > I also tried the serialization patch from Linus and it seemed to fix the > > problem. After several rounds of USB memory stick plug/unplug I haven't > > seen a single "soft lockup" warning in dmesg. > > > > I'm able to reproduce the problem pretty easily, so if you have > > something else to try I'm more than happy to give it a try. > > Could you try this and post the resulting log? I'd really like to understand > what's going on there - are we really hitting trylock failures there and what > dentries are involved. I sent you the whole log privately so that I don't spam everyone. 
Summary is here:

May 28 10:24:23 lahna kernel: scsi 14:0:0:0: Direct-Access JetFlash Transcend 4GB8.07 PQ: 0 ANSI: 4
May 28 10:24:23 lahna kernel: sd 14:0:0:0: Attached scsi generic sg4 type 0
May 28 10:24:23 lahna kernel: sd 14:0:0:0: [sdc] 7864320 512-byte logical blocks: (4.02 GB/3.75 GiB)
May 28 10:24:23 lahna kernel: sd 14:0:0:0: [sdc] Write Protect is off
May 28 10:24:23 lahna kernel: sd 14:0:0:0: [sdc] Mode Sense: 23 00 00 00
May 28 10:24:23 lahna kernel: sd 14:0:0:0: [sdc] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
May 28 10:24:23 lahna kernel: sdc: sdc1
May 28 10:24:23 lahna kernel: sd 14:0:0:0: [sdc] Attached SCSI removable disk

Here I detached the USB stick:

May 28 10:24:32 lahna kernel: usb 3-10.4: USB disconnect, device number 6
May 28 10:24:32 lahna kernel: check_submounts_and_drop[/dev/block/8:33]; CPU 4 PID 576 [systemd-udevd]
May 28 10:24:32 lahna kernel: dput[/dev/block/8:33]; CPU 4 PID 576 [systemd-udevd]
May 28 10:24:32 lahna kernel: check_submounts_and_drop[3-10/3-10.4/3-10.4:1.0/host14]; CPU 1 PID 1683 [systemd-udevd]
May 28 10:24:32 lahna kernel: shrink[3-10.4:1.0/host14/target14:0:0/subsystem]; CPU 1 PID 1683 [systemd-udevd]
May 28 10:24:32 lahna kernel: shrink[host14/target14:0:0/14:0:0:0/rev]; CPU 1 PID 1683 [systemd-udevd]
May 28 10:24:32 lahna kernel: shrink[host14/target14:0:0/14:0:0:0/vendor]; CPU 1 PID 1683 [systemd-udevd]
May 28 10:24:32 lahna kernel: shrink[host14/target14:0:0/14:0:0:0/model]; CPU 1 PID 1683 [systemd-udevd]
May 28 10:24:32 lahna kernel: check_submounts_and_drop[/dev/block/8:32]; CPU 4 PID 576 [systemd-udevd]
...
May 28 10:24:32 lahna kernel: check_submounts_and_drop[:00:14.0/usb3/3-10/3-10.4]; CPU 6 PID 1684 [systemd-udevd]
May 28 10:24:32 lahna kernel: check_submounts_and_drop[:00:14.0/usb3/3-10/3-10.4]; CPU 7 PID 1685 [systemd-udevd]
May 28 10:24:32 lahna kernel: shrink[usb3/3-10/3-10.4/ltm_capable]; CPU 7 PID 1685 [systemd-udevd]
May 28 10:24:32 lahna kernel: A
May 28 10:24:32 lahna kernel: shrink[usb3/3-10/3-10.4/ltm_capable]; CPU 7 PID 1685 [systemd-udevd]
May 28 10:24:32 lahna kernel: A
May 28 10:24:32 lahna kernel: shrink[usb3/3-10/3-10.4/ltm_capable]; CPU 7 PID 1685 [systemd-udevd]
May 28 10:24:32 lahna kernel: A
May 28 10:24:32 lahna kernel: shrink[usb3/3-10/3-10.4/ltm_capable]; CPU 7 PID 1685 [systemd-udevd]
May 28 10:24:32 lahna kernel: A
May 28 10:24:32 lahna kernel: shrink[usb3/3-10/3-10.4/ltm_capable]; CPU 7 PID 1685 [systemd-udevd]
May 28 10:24:32 lahna kernel: A
May 28 10:24:32 lahna kernel: shrink[usb3/3-10/3-10.4/ltm_capable]; CPU 7 PID 1685 [systemd-udevd]
May 28 10:24:32 lahna kernel: A
May 28 10:24:32 lahna kernel: shrink[usb3/3-10/3-10.4/ltm_capable]; CPU 7 PID 1685 [systemd-udevd]
May 28 10:24:32 lahna kernel: A
May 28 10:24:32 lahna kernel: shrink[usb3/3-10/3-10.4/ltm_capable]; CPU 7 PID 1685 [systemd-udevd]
May 28 10:24:32 lahna kernel: A
May 28 10:24:32 lahna kernel: shrink[usb3/3-10/3-10.4/ltm_capable]; CPU 7 PID 1685 [systemd-udevd]
May 28 10:24:32 lahna kernel: A

This goes on and on with a variable amount of 'A's printed.
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Wed, May 28, 2014 at 12:43:24PM -0700, Linus Torvalds wrote:

> It does require that the dentry shrinking code always hold the RCU lock
> for reading, because others may actually be doing the final dput() while
> the thing is on the shrinking list (and holding the RCU lock is what
> protects the entry from actually being free'd).
>
> NOTE! I don't claim that this fixes anything, but I do think that it
> makes that whole cross-thread complexity of that DCACHE_MAY_FREE go
> away. I think it's easier to understand, and it removes code in the
> process. Comments?

Unless I'm badly misreading your patch, you are calling dentry_kill() with rcu_read_lock() held. And that can trigger all sorts of interesting things, starting with iput() and tons of disk IO... What am I missing here?
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Wed, May 28, 2014 at 01:02:23PM -0700, Linus Torvalds wrote:

> Anyway, one reason I looked at this is that most of the threads in
> Mika's NMI watchdog traces were stuck on _raw_spin_lock() in the whole
> d_walk() thing, and I had a *really* hard time convincing myself that
> this was all safe without the RCU lock. I'm wondering if Mika perhaps
> has CONFIG_PREEMPT_RCU set, which means that spinlocks (or the
> rename_lock sequence lock) do not end up being RCU-safe points.

d_walk() covers its arse with ->d_lock (and it wants the starting point to be pinned, obviously). So AFAICS RCU is not a problem.
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Wed, May 28, 2014 at 1:14 PM, Al Viro <v...@zeniv.linux.org.uk> wrote:

> Unless I'm badly misreading your patch, you are calling dentry_kill()
> with rcu_read_lock() held. And that can trigger all sorts of interesting
> things, starting with iput() and tons of disk IO...

Yes, right you are.

As to my worry:

> d_walk() covers its arse with ->d_lock (and it wants the starting point
> to be pinned, obviously). So AFAICS RCU is not a problem.

It's not RCU itself, it's that DCACHE_MAY_FREE bit. Yes, d_walk() gets ->d_lock, but dentry_kill() drops d_lock in the middle, and the whole "dentry_kill() can get called multiple times for the same dentry" just makes me go "Hmm". It is *not* obvious what happens the second vs third time that gets called. If it can get called two times, why not three times? And if it can get called three times, why can't _two_ of them set can_free to true? That is absolutely *not* obvious to me.

The aim of my patch (and I agree that sadly holding the rcu lock is not viable) was to make that "if it gets called twice" case _much_ more obvious. Because any but the first time will be very obviously a no-op (apart from taking and releasing the lock, which is ok for a rcu-delayed data structure). That's very much not the case with DCACHE_MAY_FREE. The behavior of that thing is very much non-obvious, because different callers into dentry_kill() all do different things.

Linus
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Wed, May 28, 2014 at 02:11:51PM -0700, Linus Torvalds wrote:

> > d_walk() covers its arse with ->d_lock (and it wants the starting
> > point to be pinned, obviously). So AFAICS RCU is not a problem.
>
> It's not RCU itself, it's that DCACHE_MAY_FREE bit. Yes, d_walk() gets
> ->d_lock, but dentry_kill() drops d_lock in the middle, and the whole
> "dentry_kill() can get called multiple times for the same dentry" just
> makes me go "Hmm". It is *not* obvious what happens the second vs third
> time that gets called. If it can get called two times, why not three
> times?

dentry_kill(dentry, 1) can only be called when ->d_count had been positive up to the point of the caller grabbing ->d_lock. And it either does lockref_mark_dead() before dropping ->d_lock or it fails trylock and buggers off having done nothing, including changes of ->d_count.

In case of dentry_kill(dentry, 0), ->d_lock has been held since the moment dentry had been found on the shrink list. It either does nothing and dentry gets returned to the same shrink list without ever dropping ->d_lock (or changing refcount) or it does lockref_mark_dead() before dropping ->d_lock. In the latter case dentry will *not* be put on any shrink lists again.

So we have at most one call of the first kind getting past the trylocks and at most one call of the second kind doing the same.
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Wed, May 28, 2014 at 07:39:54PM +0100, Al Viro wrote:

> OK, the warnings about averting your eyes very much apply; the thing
> below definitely needs more massage before it becomes acceptable (and
> no, it's not a single commit; I'm not that insane), but it changes
> behaviour in the way described above. Could you check if the livelock
> persists with it? No trace-generating code in there, so the logs should
> be compact enough...

Here's an updated patch, hopefully slightly less vomit-inducing. Should give the same behaviour as the previous one... Again, it's a cumulative diff - I'm still massaging the splitup here.

BTW, there's an interesting potential gotcha for ->d_delete() instances - it *is* possible to have it called several times (returning true every time it's called) for the same dentry. Suppose dput() is called and races with d_find_alias(). We get the following sequence:

CPU1: dput sees d_count about to reach 0
CPU1: dput grabs ->d_lock
CPU1: dput calls ->d_delete() and gets told that it shouldn't survive
CPU2: d_find_alias() grabs ->i_lock
CPU1: dput calls dentry_kill()
CPU1: dentry_kill() tries to grab ->i_lock and fails
CPU2: d_find_alias() calls __d_find_alias()
CPU1: dentry_kill() drops ->d_lock and returns the same dentry
CPU2: __d_find_alias() finds our dentry, grabs ->d_lock and bumps ->d_count
CPU1: dput() finds ->d_count greater than 1, decrements it and returns
CPU2: eventually calls dput(), leading to another call of ->d_delete().

Moral: don't assume that ->d_delete() returning 1 means that dentry won't survive; it's not a good place to destroy any data structures, etc. Fortunately, we have only 11 instances in the tree and only one has code making that assumption, the code in question being ifdefed out (lustre ;-/). Out-of-tree folks might easily step into that trap, so it's probably worth documenting - the comment in ll_ddelete() seems to indicate that its authors had only noticed the problem with the locking environment, not that it is the wrong place for such code in the first place...

diff --git a/fs/dcache.c b/fs/dcache.c
index 42ae01e..6c2a92e 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -441,42 +441,12 @@ void d_drop(struct dentry *dentry)
 }
 EXPORT_SYMBOL(d_drop);
 
-/*
- * Finish off a dentry we've decided to kill.
- * dentry->d_lock must be held, returns with it unlocked.
- * If ref is non-zero, then decrement the refcount too.
- * Returns dentry requiring refcount drop, or NULL if we're done.
- */
-static struct dentry *
-dentry_kill(struct dentry *dentry, int unlock_on_failure)
-	__releases(dentry->d_lock)
+static void __dentry_kill(struct dentry *dentry)
 {
-	struct inode *inode;
 	struct dentry *parent = NULL;
 	bool can_free = true;
-
-	if (unlikely(dentry->d_flags & DCACHE_DENTRY_KILLED)) {
-		can_free = dentry->d_flags & DCACHE_MAY_FREE;
-		spin_unlock(&dentry->d_lock);
-		goto out;
-	}
-
-	inode = dentry->d_inode;
-	if (inode && !spin_trylock(&inode->i_lock)) {
-relock:
-		if (unlock_on_failure) {
-			spin_unlock(&dentry->d_lock);
-			cpu_relax();
-		}
-		return dentry; /* try again with same dentry */
-	}
 	if (!IS_ROOT(dentry))
 		parent = dentry->d_parent;
-	if (parent && !spin_trylock(&parent->d_lock)) {
-		if (inode)
-			spin_unlock(&inode->i_lock);
-		goto relock;
-	}
 
 	/*
 	 * The dentry is now unrecoverably dead to the world.
@@ -520,10 +490,41 @@ relock:
 		can_free = false;
 	}
 	spin_unlock(&dentry->d_lock);
-out:
 	if (likely(can_free))
 		dentry_free(dentry);
+}
+
+/*
+ * Finish off a dentry we've decided to kill.
+ * dentry->d_lock must be held, returns with it unlocked.
+ * If ref is non-zero, then decrement the refcount too.
+ * Returns dentry requiring refcount drop, or NULL if we're done.
+ */
+static struct dentry *dentry_kill(struct dentry *dentry)
+	__releases(dentry->d_lock)
+{
+	struct inode *inode = dentry->d_inode;
+	struct dentry *parent = NULL;
+
+	if (inode && unlikely(!spin_trylock(&inode->i_lock)))
+		goto failed;
+
+	if (!IS_ROOT(dentry)) {
+		parent = dentry->d_parent;
+		if (unlikely(!spin_trylock(&parent->d_lock))) {
+			if (inode)
+				spin_unlock(&inode->i_lock);
+			goto failed;
+		}
+	}
+
+	__dentry_kill(dentry);
 	return parent;
+
+failed:
+	spin_unlock(&dentry->d_lock);
+	cpu_relax();
+	return dentry; /* try again with same dentry */
 }
 
 /*
@@ -579,7 +580,7 @@ repeat:
 	return;
 
 kill_it:
-	dentry = dentry_kill(dentry, 1);
+	dentry = dentry_kill(dentry);
 	if (dentry)
 		goto repeat;
 }
@@ -797,8 +798,26 @@ static void
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Thu, May 29, 2014 at 04:11:49AM +0100, Al Viro wrote:

> On Wed, May 28, 2014 at 07:39:54PM +0100, Al Viro wrote:
> > OK, the warnings about averting your eyes very much apply; the thing
> > below definitely needs more massage before it becomes acceptable (and
> > no, it's not a single commit; I'm not that insane), but it changes
> > behaviour in the way described above. Could you check if the livelock
> > persists with it? No trace-generating code in there, so the logs
> > should be compact enough...
>
> Here's an updated patch, hopefully slightly less vomit-inducing. Should
> give the same behaviour as the previous one... Again, it's a cumulative
> diff - I'm still massaging the splitup here.

BTW, it still leaves the "proceed to parent" case in shrink_dentry_list(); in theory, it's also vulnerable to the same livelock. Can be dealt with pretty much the same way; I'd rather leave that one for right after -final, though, if the already posted variant turns out to be sufficient...
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Wed, May 28, 2014 at 8:11 PM, Al Viro <v...@zeniv.linux.org.uk> wrote:

> Here's an updated patch, hopefully slightly less vomit-inducing.

Hmm. Less vomit-inducing, except for this part:

 		dentry = list_entry(list->prev, struct dentry, d_lru);
+
+		parent = NULL;
 		spin_lock(&dentry->d_lock);
+		if (!IS_ROOT(dentry)) {
+			parent = dentry->d_parent;
+			if (unlikely(!spin_trylock(&parent->d_lock))) {
+				spin_unlock(&dentry->d_lock);
+				parent = NULL;
+				read_seqlock_excl(&rename_lock);
+				if (!IS_ROOT(dentry)) {
+					parent = dentry->d_parent;
+					spin_lock(&parent->d_lock);
+				}
+				read_sequnlock_excl(&rename_lock);
+				spin_lock(&dentry->d_lock);
+			}
+		}

Ugh, that just *screams* for a helper function. Something like

	parent = get_parent_and_lock(dentry);

or whatever, with that trylock/renamelock dance separated out. The rule would be that it would lock the dentry, and return the (similarly locked) parent. Or NULL for a root dentry, of course.

Please?

Linus
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Wed, May 28, 2014 at 09:21:00PM -0700, Linus Torvalds wrote:

> On Wed, May 28, 2014 at 8:11 PM, Al Viro <v...@zeniv.linux.org.uk> wrote:
> > Here's an updated patch, hopefully slightly less vomit-inducing.
>
> Hmm. Less vomit-inducing, except for this part:
>
> Ugh, that just *screams* for a helper function. Something like
>
> 	parent = get_parent_and_lock(dentry);
>
> or whatever, with that trylock/renamelock dance separated out. The rule
> would be that it would lock the dentry, and return the (similarly
> locked) parent. Or NULL for a root dentry, of course.
>
> Please?

Already done in my current tree; see below for the next commit in there... I can separate that helper from the rest (dealing with the same livelock for the dentry_kill(dentry, 1) caller in shrink_dentry_list()). All of that is very much subject to reordering, resplitting, writing more decent commit messages, etc.

commit 67e2554f6ca62e20363139f6f2968063410f0f3b
Author: Al Viro <v...@zeniv.linux.org.uk>
Date:   Wed May 28 23:53:36 2014 -0400

    deal with "put the parent" side of shrink_dentry_list()

    Signed-off-by: Al Viro <v...@zeniv.linux.org.uk>

diff --git a/fs/dcache.c b/fs/dcache.c
index 6c2a92e..7434169 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -793,30 +793,31 @@ restart:
 }
 EXPORT_SYMBOL(d_prune_aliases);
 
-static void shrink_dentry_list(struct list_head *list)
+static inline struct dentry *lock_parent(struct dentry *dentry)
 {
-	struct dentry *dentry, *parent;
-
-	while (!list_empty(list)) {
-		struct inode *inode;
-		dentry = list_entry(list->prev, struct dentry, d_lru);
-
+	struct dentry *parent = dentry->d_parent;
+	if (IS_ROOT(dentry))
+		return NULL;
+	if (unlikely(!spin_trylock(&parent->d_lock))) {
+		spin_unlock(&dentry->d_lock);
+		read_seqlock_excl(&rename_lock);
 		parent = NULL;
-		spin_lock(&dentry->d_lock);
 		if (!IS_ROOT(dentry)) {
 			parent = dentry->d_parent;
-			if (unlikely(!spin_trylock(&parent->d_lock))) {
-				spin_unlock(&dentry->d_lock);
-				parent = NULL;
-				read_seqlock_excl(&rename_lock);
-				if (!IS_ROOT(dentry)) {
-					parent = dentry->d_parent;
-					spin_lock(&parent->d_lock);
-				}
-				read_sequnlock_excl(&rename_lock);
-				spin_lock(&dentry->d_lock);
-			}
+			spin_lock(&parent->d_lock);
 		}
+		read_sequnlock_excl(&rename_lock);
+		spin_lock(&dentry->d_lock);
+	}
+	return parent;
+}
+
+static void shrink_dentry_list(struct list_head *list)
+{
+	while (!list_empty(list)) {
+		struct inode *inode;
+		struct dentry *dentry = list_entry(list->prev, struct dentry, d_lru);
+		struct dentry *parent = lock_parent(dentry);
 
 		/*
 		 * The dispose list is isolated and dentries are not accounted
@@ -864,8 +865,26 @@ static void shrink_dentry_list(struct list_head *list)
 		 * fragmentation.
 		 */
 		dentry = parent;
-		while (dentry && !lockref_put_or_lock(&dentry->d_lockref))
-			dentry = dentry_kill(dentry);
+		while (dentry && !lockref_put_or_lock(&dentry->d_lockref)) {
+			parent = lock_parent(dentry);
+			if (dentry->d_lockref.count != 1) {
+				dentry->d_lockref.count--;
+				spin_unlock(&dentry->d_lock);
+				if (parent)
+					spin_unlock(&parent->d_lock);
+				break;
+			}
+			inode = dentry->d_inode;	/* can't be NULL */
+			if (unlikely(!spin_trylock(&inode->i_lock))) {
+				spin_unlock(&dentry->d_lock);
+				if (parent)
+					spin_unlock(&parent->d_lock);
+				cpu_relax();
+				continue;
+			}
+			__dentry_kill(dentry);
+			dentry = parent;
+		}
 	}
 }
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Thu, May 29, 2014 at 06:16:47AM +0100, Al Viro wrote:

> On Wed, May 28, 2014 at 09:21:00PM -0700, Linus Torvalds wrote:
> > On Wed, May 28, 2014 at 8:11 PM, Al Viro <v...@zeniv.linux.org.uk> wrote:
> > > Here's an updated patch, hopefully slightly less vomit-inducing.
> >
> > Hmm. Less vomit-inducing, except for this part:
> >
> > Ugh, that just *screams* for a helper function. Something like
> >
> > 	parent = get_parent_and_lock(dentry);
> >
> > or whatever, with that trylock/renamelock dance separated out. The
> > rule would be that it would lock the dentry, and return the (similarly
> > locked) parent. Or NULL for a root dentry, of course.
> >
> > Please?
>
> Already done in my current tree; see below for the next commit in
> there... I can separate that helper from the rest (dealing with the same
> livelock for the dentry_kill(dentry, 1) caller in shrink_dentry_list()).
> All of that is very much subject to reordering, resplitting, writing
> more decent commit messages, etc.

Except that it's from the wrong branch ;-/ Correct one follows:

commit fb860957449c07c6f1a1dd911e83a635b6f11f21
Author: Al Viro <v...@zeniv.linux.org.uk>
Date:   Thu May 29 00:37:49 2014 -0400

    deal with "put the parent" side of shrink_dentry_list()

diff --git a/fs/dcache.c b/fs/dcache.c
index 6c2a92e..628a791 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -793,30 +793,34 @@ restart:
 }
 EXPORT_SYMBOL(d_prune_aliases);
 
-static void shrink_dentry_list(struct list_head *list)
+static inline struct dentry *lock_parent(struct dentry *dentry)
 {
-	struct dentry *dentry, *parent;
+	struct dentry *parent = dentry->d_parent;
+	if (IS_ROOT(dentry))
+		return NULL;
+	if (unlikely(!spin_trylock(&parent->d_lock))) {
+		spin_unlock(&dentry->d_lock);
+		read_seqlock_excl(&rename_lock);
+		parent = NULL;
+		if (!IS_ROOT(dentry)) {
+			parent = dentry->d_parent;
+			spin_lock(&parent->d_lock);
+		}
+		read_sequnlock_excl(&rename_lock);
+		spin_lock(&dentry->d_lock);
+	}
+	return parent;
+}
 
+static void shrink_dentry_list(struct list_head *list)
+{
 	while (!list_empty(list)) {
 		struct inode *inode;
-		dentry = list_entry(list->prev, struct dentry, d_lru);
+		struct dentry *dentry = list_entry(list->prev, struct dentry, d_lru);
+		struct dentry *parent;
 
-		parent = NULL;
 		spin_lock(&dentry->d_lock);
-		if (!IS_ROOT(dentry)) {
-			parent = dentry->d_parent;
-			if (unlikely(!spin_trylock(&parent->d_lock))) {
-				spin_unlock(&dentry->d_lock);
-				parent = NULL;
-				read_seqlock_excl(&rename_lock);
-				if (!IS_ROOT(dentry)) {
-					parent = dentry->d_parent;
-					spin_lock(&parent->d_lock);
-				}
-				read_sequnlock_excl(&rename_lock);
-				spin_lock(&dentry->d_lock);
-			}
-		}
+		parent = lock_parent(dentry);
 
 		/*
 		 * The dispose list is isolated and dentries are not accounted
@@ -864,8 +868,26 @@ static void shrink_dentry_list(struct list_head *list)
 		 * fragmentation.
 		 */
 		dentry = parent;
-		while (dentry && !lockref_put_or_lock(&dentry->d_lockref))
-			dentry = dentry_kill(dentry);
+		while (dentry && !lockref_put_or_lock(&dentry->d_lockref)) {
+			parent = lock_parent(dentry);
+			if (dentry->d_lockref.count != 1) {
+				dentry->d_lockref.count--;
+				spin_unlock(&dentry->d_lock);
+				if (parent)
+					spin_unlock(&parent->d_lock);
+				break;
+			}
+			inode = dentry->d_inode;	/* can't be NULL */
+			if (unlikely(!spin_trylock(&inode->i_lock))) {
+				spin_unlock(&dentry->d_lock);
+				if (parent)
+					spin_unlock(&parent->d_lock);
+				cpu_relax();
+				continue;
+			}
+			__dentry_kill(dentry);
+			dentry = parent;
+		}
 	}
 }
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Thu, May 29, 2014 at 04:52:33AM +0100, Al Viro wrote:

> On Thu, May 29, 2014 at 04:11:49AM +0100, Al Viro wrote:
> > On Wed, May 28, 2014 at 07:39:54PM +0100, Al Viro wrote:
> > > OK, the warnings about averting your eyes very much apply; the
> > > thing below definitely needs more massage before it becomes
> > > acceptable (and no, it's not a single commit; I'm not that insane),
> > > but it changes behaviour in the way described above. Could you
> > > check if the livelock persists with it? No trace-generating code in
> > > there, so the logs should be compact enough...
> >
> > Here's an updated patch, hopefully slightly less vomit-inducing.
> > Should give the same behaviour as the previous one... Again, it's a
> > cumulative diff - I'm still massaging the splitup here.
>
> BTW, it still leaves the "proceed to parent" case in
> shrink_dentry_list(); in theory, it's also vulnerable to the same
> livelock. Can be dealt with pretty much the same way; I'd rather leave
> that one for right after -final, though, if the already posted variant
> turns out to be sufficient...

... which is (presumably) dealt with by the incremental I'd just sent to Linus; seeing what kind of dumb mistakes I'm making, I'd better call it quits for tonight - it's 1:30am here and I didn't have anywhere near enough sleep yesterday.

I'd appreciate it if you could test the patch immediately upthread (from Message-ID: 20140529031149.ge18...@zeniv.linux.org.uk) and see if it helps. There's an incremental on top of it (from Message-ID: 20140529052621.gh18...@zeniv.linux.org.uk) that might or might not be a good idea. I'm crawling to bed right now; back in ~7 hours...
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Wed, May 28, 2014 at 10:37:51AM +0300, Mika Westerberg wrote:

> I sent you the whole log privately so that I don't spam everyone.
>
> Summary is here:
>
> May 28 10:24:23 lahna kernel: scsi 14:0:0:0: Direct-Access JetFlash Transcend 4GB8.07 PQ: 0 ANSI: 4
> May 28 10:24:23 lahna kernel: sd 14:0:0:0: Attached scsi generic sg4 type 0
> May 28 10:24:23 lahna kernel: sd 14:0:0:0: [sdc] 7864320 512-byte logical blocks: (4.02 GB/3.75 GiB)
> May 28 10:24:23 lahna kernel: sd 14:0:0:0: [sdc] Write Protect is off
> May 28 10:24:23 lahna kernel: sd 14:0:0:0: [sdc] Mode Sense: 23 00 00 00
> May 28 10:24:23 lahna kernel: sd 14:0:0:0: [sdc] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
> May 28 10:24:23 lahna kernel: sdc: sdc1
> May 28 10:24:23 lahna kernel: sd 14:0:0:0: [sdc] Attached SCSI removable disk
>
> Here I detached the USB stick:
>
> May 28 10:24:32 lahna kernel: usb 3-10.4: USB disconnect, device number 6
> May 28 10:24:32 lahna kernel: check_submounts_and_drop[/dev/block/8:33]; CPU 4 PID 576 [systemd-udevd]
> May 28 10:24:32 lahna kernel: dput[/dev/block/8:33]; CPU 4 PID 576 [systemd-udevd]
> May 28 10:24:32 lahna kernel: check_submounts_and_drop[3-10/3-10.4/3-10.4:1.0/host14]; CPU 1 PID 1683 [systemd-udevd]
> May 28 10:24:32 lahna kernel: shrink[3-10.4:1.0/host14/target14:0:0/subsystem]; CPU 1 PID 1683 [systemd-udevd]
> May 28 10:24:32 lahna kernel: shrink[host14/target14:0:0/14:0:0:0/rev]; CPU 1 PID 1683 [systemd-udevd]
> May 28 10:24:32 lahna kernel: shrink[host14/target14:0:0/14:0:0:0/vendor]; CPU 1 PID 1683 [systemd-udevd]
> May 28 10:24:32 lahna kernel: shrink[host14/target14:0:0/14:0:0:0/model]; CPU 1 PID 1683 [systemd-udevd]
> May 28 10:24:32 lahna kernel: check_submounts_and_drop[/dev/block/8:32]; CPU 4 PID 576 [systemd-udevd]
> ...
> May 28 10:24:32 lahna kernel: check_submounts_and_drop[:00:14.0/usb3/3-10/3-10.4]; CPU 6 PID 1684 [systemd-udevd]
> May 28 10:24:32 lahna kernel: check_submounts_and_drop[:00:14.0/usb3/3-10/3-10.4]; CPU 7 PID 1685 [systemd-udevd]
> May 28 10:24:32 lahna kernel: shrink[usb3/3-10/3-10.4/ltm_capable]; CPU 7 PID 1685 [systemd-udevd]
> May 28 10:24:32 lahna kernel: A
> ...

Hmm... As it is, we have d_walk() trying _very_ hard in those situations. Could you add that on top of the previous and see if livelock changes?

diff --git a/fs/dcache.c b/fs/dcache.c
index 42ae01e..4ce58d3 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1209,7 +1209,7 @@ static enum d_walk_ret select_collect(void *_data, struct dentry *dentry)
 	 * ensures forward progress). We'll be coming back to find
 	 * the rest.
 	 */
-	if (!list_empty(&data->dispose))
+	if (data->found)
 		ret = need_resched() ? D_WALK_QUIT : D_WALK_NORETRY;
 out:
 	return ret;
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Wed, May 28, 2014 at 12:57:01PM +0100, Al Viro wrote:
> On Wed, May 28, 2014 at 10:37:51AM +0300, Mika Westerberg wrote:
> > I sent you the whole log privately so that I don't spam everyone.  Summary
> > is here:
> >
> > May 28 10:24:23 lahna kernel: scsi 14:0:0:0: Direct-Access     JetFlash Transcend 4GB    8.07 PQ: 0 ANSI: 4
> > May 28 10:24:23 lahna kernel: sd 14:0:0:0: Attached scsi generic sg4 type 0
> > May 28 10:24:23 lahna kernel: sd 14:0:0:0: [sdc] 7864320 512-byte logical blocks: (4.02 GB/3.75 GiB)
> > May 28 10:24:23 lahna kernel: sd 14:0:0:0: [sdc] Write Protect is off
> > May 28 10:24:23 lahna kernel: sd 14:0:0:0: [sdc] Mode Sense: 23 00 00 00
> > May 28 10:24:23 lahna kernel: sd 14:0:0:0: [sdc] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
> > May 28 10:24:23 lahna kernel: sdc: sdc1
> > May 28 10:24:23 lahna kernel: sd 14:0:0:0: [sdc] Attached SCSI removable disk
> >
> > Here I detached the USB stick:
> >
> > May 28 10:24:32 lahna kernel: usb 3-10.4: USB disconnect, device number 6
> > May 28 10:24:32 lahna kernel: check_submounts_and_drop[/dev/block/8:33]; CPU 4 PID 576 [systemd-udevd]
> > May 28 10:24:32 lahna kernel: dput[/dev/block/8:33]; CPU 4 PID 576 [systemd-udevd]
> > May 28 10:24:32 lahna kernel: check_submounts_and_drop[3-10/3-10.4/3-10.4:1.0/host14]; CPU 1 PID 1683 [systemd-udevd]
> > May 28 10:24:32 lahna kernel: shrink[3-10.4:1.0/host14/target14:0:0/subsystem]; CPU 1 PID 1683 [systemd-udevd]
> > May 28 10:24:32 lahna kernel: shrink[host14/target14:0:0/14:0:0:0/rev]; CPU 1 PID 1683 [systemd-udevd]
> > May 28 10:24:32 lahna kernel: shrink[host14/target14:0:0/14:0:0:0/vendor]; CPU 1 PID 1683 [systemd-udevd]
> > May 28 10:24:32 lahna kernel: shrink[host14/target14:0:0/14:0:0:0/model]; CPU 1 PID 1683 [systemd-udevd]
> > May 28 10:24:32 lahna kernel: check_submounts_and_drop[/dev/block/8:32]; CPU 4 PID 576 [systemd-udevd]
> > ...
> > May 28 10:24:32 lahna kernel: check_submounts_and_drop[:00:14.0/usb3/3-10/3-10.4]; CPU 6 PID 1684 [systemd-udevd]
> > May 28 10:24:32 lahna kernel: check_submounts_and_drop[:00:14.0/usb3/3-10/3-10.4]; CPU 7 PID 1685 [systemd-udevd]
> > May 28 10:24:32 lahna kernel: shrink[usb3/3-10/3-10.4/ltm_capable]; CPU 7 PID 1685 [systemd-udevd]
> > May 28 10:24:32 lahna kernel: A
> > ...
>
> Hmm...  As it is, we have d_walk() trying _very_ hard in those situations.
> Could you add that on top of the previous and see if livelock changes?
>
> diff --git a/fs/dcache.c b/fs/dcache.c
> index 42ae01e..4ce58d3 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -1209,7 +1209,7 @@ static enum d_walk_ret select_collect(void *_data, struct dentry *dentry)
>  	 * ensures forward progress). We'll be coming back to find
>  	 * the rest.
>  	 */
> -	if (!list_empty(&data->dispose))
> +	if (data->found)
>  		ret = need_resched() ? D_WALK_QUIT : D_WALK_NORETRY;
>  out:
>  	return ret;

I added this on top of your previous patch and unfortunately the livelock
is still there :-(

From the previous log, it looks like udev is doing exactly the same
operation (check_submounts_and_drop) on 4 of the CPUs in parallel:

May 28 10:24:32 lahna kernel: check_submounts_and_drop[:00:14.0/usb3/3-10/3-10.4]; CPU 6 PID 1684 [systemd-udevd]
May 28 10:24:32 lahna kernel: check_submounts_and_drop[:00:14.0/usb3/3-10/3-10.4]; CPU 7 PID 1685 [systemd-udevd]
May 28 10:24:32 lahna kernel: check_submounts_and_drop[:00:14.0/usb3/3-10/3-10.4]; CPU 1 PID 1683 [systemd-udevd]
May 28 10:24:32 lahna kernel: check_submounts_and_drop[:00:14.0/usb3/3-10/3-10.4]; CPU 0 PID 1686 [systemd-udevd]

I wonder why that is.
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Wed, May 28, 2014 at 04:11:36PM +0300, Mika Westerberg wrote:
> I added this on top of your previous patch and unfortunately the livelock
> is still there :-(
>
> From the previous log, it looks like udev is doing exactly the same
> operation (check_submounts_and_drop) on 4 of the CPUs in parallel:

4 threads trying to resolve some pathnames in that subtree for some reason.
All of them hitting ->d_revalidate() on the spot - basically, the first
invalidated directory...  OK, it's not ->i_lock, it's ->d_lock on the parent
being grabbed after that on the child, while d_walk() keeps taking them in
the opposite order.  Hmm...

In principle we could do the following:
	* split dentry_kill() into the part that is taking locks and
the rest of it.
	* in case of trylock failure have shrink_dentry_list() do
read_seqlock_excl(&rename_lock) (which will stabilize ->d_parent) and
take ->d_lock in the right order, drop rename_lock and call __dentry_kill().

AFAICS, that would kill the livelock for good.  We still have ->i_lock
trylock failures to deal with, but those are less of a problem - d_walk()
won't step on ->i_lock at all.  I'm going to grab a couple of hours of
sleep and try to put together something along those lines...
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Wed, May 28, 2014 at 03:19:37PM +0100, Al Viro wrote:
> OK, it's not ->i_lock, it's ->d_lock on the parent being grabbed after
> that on the child, while d_walk() keeps taking them in the opposite
> order.  Hmm...
>
> In principle we could do the following:
> 	* split dentry_kill() into the part that is taking locks and
> the rest of it.
> 	* in case of trylock failure have shrink_dentry_list() do
> read_seqlock_excl(&rename_lock) (which will stabilize ->d_parent) and
> take ->d_lock in the right order, drop rename_lock and call
> __dentry_kill().
>
> AFAICS, that would kill the livelock for good.  We still have ->i_lock
> trylock failures to deal with, but those are less of a problem - d_walk()
> won't step on ->i_lock at all.  I'm going to grab a couple of hours of
> sleep and try to put together something along those lines...

OK, the warnings about averting your eyes very much apply; the thing below
definitely needs more massage before it becomes acceptable (and no, it's
not a single commit; I'm not that insane), but it changes behaviour in the
way described above.  Could you check if the livelock persists with it?
No trace-generating code in there, so the logs should be compact enough...

diff --git a/fs/dcache.c b/fs/dcache.c
index 42ae01e..ed0cc62 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -441,42 +441,12 @@ void d_drop(struct dentry *dentry)
 }
 EXPORT_SYMBOL(d_drop);
 
-/*
- * Finish off a dentry we've decided to kill.
- * dentry->d_lock must be held, returns with it unlocked.
- * If ref is non-zero, then decrement the refcount too.
- * Returns dentry requiring refcount drop, or NULL if we're done.
- */
-static struct dentry *
-dentry_kill(struct dentry *dentry, int unlock_on_failure)
-	__releases(dentry->d_lock)
+static void __dentry_kill(struct dentry *dentry)
 {
-	struct inode *inode;
 	struct dentry *parent = NULL;
 	bool can_free = true;
-
-	if (unlikely(dentry->d_flags & DCACHE_DENTRY_KILLED)) {
-		can_free = dentry->d_flags & DCACHE_MAY_FREE;
-		spin_unlock(&dentry->d_lock);
-		goto out;
-	}
-
-	inode = dentry->d_inode;
-	if (inode && !spin_trylock(&inode->i_lock)) {
-relock:
-		if (unlock_on_failure) {
-			spin_unlock(&dentry->d_lock);
-			cpu_relax();
-		}
-		return dentry; /* try again with same dentry */
-	}
 	if (!IS_ROOT(dentry))
 		parent = dentry->d_parent;
-	if (parent && !spin_trylock(&parent->d_lock)) {
-		if (inode)
-			spin_unlock(&inode->i_lock);
-		goto relock;
-	}
 
 	/*
 	 * The dentry is now unrecoverably dead to the world.
@@ -520,10 +490,44 @@ relock:
 		can_free = false;
 	}
 	spin_unlock(&dentry->d_lock);
-out:
 	if (likely(can_free))
 		dentry_free(dentry);
+}
+
+/*
+ * Finish off a dentry we've decided to kill.
+ * dentry->d_lock must be held, returns with it unlocked.
+ * If ref is non-zero, then decrement the refcount too.
+ * Returns dentry requiring refcount drop, or NULL if we're done.
+ */
+static struct dentry *
+dentry_kill(struct dentry *dentry, int unlock_on_failure)
+	__releases(dentry->d_lock)
+{
+	struct inode *inode = dentry->d_inode;
+	struct dentry *parent = NULL;
+
+	if (inode && unlikely(!spin_trylock(&inode->i_lock)))
+		goto failed;
+
+	if (!IS_ROOT(dentry)) {
+		parent = dentry->d_parent;
+		if (unlikely(!spin_trylock(&parent->d_lock))) {
+			if (inode)
+				spin_unlock(&inode->i_lock);
+			goto failed;
+		}
+	}
+
+	__dentry_kill(dentry);
+	return parent;
+
+failed:
+	if (unlock_on_failure) {
+		spin_unlock(&dentry->d_lock);
+		cpu_relax();
+	}
+	return dentry; /* try again with same dentry */
 }
 
 /*
@@ -797,6 +801,7 @@ static void shrink_dentry_list(struct list_head *list)
 	struct dentry *dentry, *parent;
 
 	while (!list_empty(list)) {
+		struct inode *inode;
 		dentry = list_entry(list->prev, struct dentry, d_lru);
 		spin_lock(&dentry->d_lock);
 		/*
@@ -815,23 +820,52 @@ static void shrink_dentry_list(struct list_head *list)
 			continue;
 		}
 
-		parent = dentry_kill(dentry, 0);
-		/*
-		 * If dentry_kill returns NULL, we have nothing more to do.
-		 */
-		if (!parent)
+		if (unlikely(dentry->d_flags & DCACHE_DENTRY_KILLED)) {
+			bool can_free = dentry->d_flags & DCACHE_MAY_FREE;
+			spin_unlock(&dentry->d_lock);
+			if (can_free)
+				dentry_free(dentry);
 			continue;
+		}
 
-		if
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Wed, May 28, 2014 at 11:39 AM, Al Viro <v...@zeniv.linux.org.uk> wrote:
> OK, the warnings about averting your eyes very much apply; the thing
> below definitely needs more massage before it becomes acceptable

I've been looking at this too, and I have to say, I absolutely hate your
DCACHE_MAY_FREE logic. It makes it really hard to follow what the heck is
happening across threads.

So just to understand the code, how about this (UNTESTED!) patch? It gets
rid of the DCACHE_MAY_FREE logic entirely, and makes the rules imho much
more straightforward:

 - whoever calls dentry_kill() first is the one that frees things.

 - dentry_kill() does *not* touch the shrink list at all. In fact,
   *nothing* touches the shrink list, except for the shrinker.

 - the shrinkers remove entries from their own lists

 - the shrink list logic depends on the actual freeing of the dentry to
   be delayed until the RCU grace period (already true for RCU-lookup
   dentries)

In other words, that whole may-free thing goes away, the whole shrink-list
locking issues go away, there are no subtle rules. Nobody else ever touches
the shrink-list entries than the entity walking the shrink-lists. Once
DCACHE_SHRINK_LIST is set, nobody else is supposed to touch the dentry.

It does require that the dentry shrinking code always hold the RCU lock
for reading, because others may actually be doing the final dput() while
the thing is on the shrinking list (and holding the RCU lock is what
protects the entry from actually being free'd).

NOTE! I don't claim that this fixes anything, but I do think that it makes
that whole cross-thread complexity of that DCACHE_MAY_FREE go away. I
think it's easier to understand, and it removes code in the process.

Comments?
Linus

 fs/dcache.c            | 32 ++++++++++++++------------------
 include/linux/dcache.h |  2 --
 2 files changed, 14 insertions(+), 20 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 42ae01eefc07..6821479a378e 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -248,11 +248,7 @@ static void __d_free(struct rcu_head *head)
 
 static void dentry_free(struct dentry *dentry)
 {
-	/* if dentry was never visible to RCU, immediate free is OK */
-	if (!(dentry->d_flags & DCACHE_RCUACCESS))
-		__d_free(&dentry->d_u.d_rcu);
-	else
-		call_rcu(&dentry->d_u.d_rcu, __d_free);
+	call_rcu(&dentry->d_u.d_rcu, __d_free);
 }
 
 /**
@@ -453,12 +449,11 @@ dentry_kill(struct dentry *dentry, int unlock_on_failure)
 {
 	struct inode *inode;
 	struct dentry *parent = NULL;
-	bool can_free = true;
 
-	if (unlikely(dentry->d_flags & DCACHE_DENTRY_KILLED)) {
-		can_free = dentry->d_flags & DCACHE_MAY_FREE;
+	/* Did somebody else already free it? */
+	if (__lockref_is_dead(&dentry->d_lockref)) {
 		spin_unlock(&dentry->d_lock);
-		goto out;
+		return NULL;
 	}
 
 	inode = dentry->d_inode;
@@ -514,15 +509,7 @@ relock:
 	if (dentry->d_op && dentry->d_op->d_release)
 		dentry->d_op->d_release(dentry);
 
-	spin_lock(&dentry->d_lock);
-	if (dentry->d_flags & DCACHE_SHRINK_LIST) {
-		dentry->d_flags |= DCACHE_MAY_FREE;
-		can_free = false;
-	}
-	spin_unlock(&dentry->d_lock);
-out:
-	if (likely(can_free))
-		dentry_free(dentry);
+	dentry_free(dentry);
 	return parent;
 }
 
@@ -921,9 +908,11 @@ long prune_dcache_sb(struct super_block *sb, unsigned long nr_to_scan,
 	LIST_HEAD(dispose);
 	long freed;
 
+	rcu_read_lock();
 	freed = list_lru_walk_node(&sb->s_dentry_lru, nid, dentry_lru_isolate,
				   &dispose, &nr_to_scan);
 	shrink_dentry_list(&dispose);
+	rcu_read_unlock();
 	return freed;
 }
 
@@ -962,11 +951,13 @@ void shrink_dcache_sb(struct super_block *sb)
 	do {
 		LIST_HEAD(dispose);
 
+		rcu_read_lock();
 		freed = list_lru_walk(&sb->s_dentry_lru,
			dentry_lru_isolate_shrink, &dispose, UINT_MAX);
 
 		this_cpu_sub(nr_dentry_unused, freed);
 		shrink_dentry_list(&dispose);
+		rcu_read_unlock();
 	} while (freed > 0);
 }
 EXPORT_SYMBOL(shrink_dcache_sb);
@@ -1230,13 +1221,16 @@ void shrink_dcache_parent(struct dentry *parent)
 		data.start = parent;
 		data.found = 0;
 
+		rcu_read_lock();
 		d_walk(parent, &data, select_collect, NULL);
 		if (!data.found)
 			break;
 
 		shrink_dentry_list(&data.dispose);
+		rcu_read_unlock();
 		cond_resched();
 	}
+	rcu_read_unlock();
 }
 EXPORT_SYMBOL(shrink_dcache_parent);
@@ -1338,11 +1332,13 @@ int check_submounts_and_drop(struct dentry *dentry)
 		data.start = dentry;
 		data.found = 0;
 
+		rcu_read_lock();
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Wed, May 28, 2014 at 12:43 PM, Linus Torvalds
<torva...@linux-foundation.org> wrote:
>
>  - the shrink list logic depends on the actual freeing of the dentry to
>    be delayed until the RCU grace period (already true for RCU-lookup
>    dentries)

Side note: I just unconditionally removed the immediate __d_free() case,
but it could as well have become

-	if (!(dentry->d_flags & DCACHE_RCUACCESS))
+	if (!(dentry->d_flags & (DCACHE_RCUACCESS | DCACHE_SHRINK_LIST)))
 		__d_free(&dentry->d_u.d_rcu);
 	else
 		call_rcu(&dentry->d_u.d_rcu, __d_free);

instead.

Anyway, one reason I looked at this is that most of the threads in Mika's
NMI watchdog traces were stuck on _raw_spin_lock() in the whole d_walk()
thing, and I had a *really* hard time convincing myself that this was all
safe without the RCU lock.

I'm wondering if Mika perhaps has CONFIG_PREEMPT_RCU set, which means that
spinlocks (or the rename_lock sequence lock) do not end up being RCU-safe
points.

And once I started worrying about that, the whole "ok, who calls
dentry_kill() when shrinking races with dput()" question started just
making me worry about that whole DCACHE_MAY_FREE thing. Because one of the
things that Miklos/Al's series of patches did was to drop some of the RCU
locking, since it wasn't necessary any more as far as the lru list itself
was concerned..

Linus
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Tue, May 27, 2014 at 10:04:09AM +0300, Mika Westerberg wrote:
> On Tue, May 27, 2014 at 05:00:26AM +0100, Al Viro wrote:
> > On Tue, May 27, 2014 at 04:14:15AM +0100, Al Viro wrote:
> >
> > > As the matter of fact, let's try this instead - retry the same sucker
> > > immediately in case if trylocks fail.  Comments?
> >
> > Better yet, let's take "move back to shrink list" into dentry_kill()
> > itself.  Then we get consistent locking rules for dentry_kill() and
> > instead of unlock_on_failure we simply pass it NULL or the shrink
> > list to put the sucker back.  Mika, could you test this one and see
> > if it fixes that livelock?  The difference in behaviour is that in
> > case of trylock failure we hit that sucker again without letting
> > it ride all the way around the list, same as we do for other
> > dentry_kill() callers.
>
> I tried this patch and unfortunately it still results in the same sort of
> livelock.  I've attached the dmesg.
>
> I also tried the serialization patch from Linus and it seemed to fix the
> problem.  After several rounds of USB memory stick plug/unplug I haven't
> seen a single "soft lockup" warning in dmesg.
>
> I'm able to reproduce the problem pretty easily, so if you have
> something else to try I'm more than happy to give it a try.

Could you try this and post the resulting log?  I'd really like to
understand what's going on there - are we really hitting trylock failures
there and what dentries are involved.

diff --git a/fs/dcache.c b/fs/dcache.c
index 42ae01e..75f56a6 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -38,6 +38,7 @@
 #include <linux/prefetch.h>
 #include <linux/ratelimit.h>
 #include <linux/list_lru.h>
+#include <linux/magic.h>
 #include "internal.h"
 #include "mount.h"
@@ -448,7 +449,7 @@ EXPORT_SYMBOL(d_drop);
  * Returns dentry requiring refcount drop, or NULL if we're done.
  */
 static struct dentry *
-dentry_kill(struct dentry *dentry, int unlock_on_failure)
+dentry_kill(struct dentry *dentry, struct list_head *shrink_list)
 	__releases(dentry->d_lock)
 {
 	struct inode *inode;
@@ -464,10 +465,10 @@ dentry_kill(struct dentry *dentry, int unlock_on_failure)
 	inode = dentry->d_inode;
 	if (inode && !spin_trylock(&inode->i_lock)) {
 relock:
-		if (unlock_on_failure) {
-			spin_unlock(&dentry->d_lock);
-			cpu_relax();
-		}
+		if (shrink_list)
+			d_shrink_add(dentry, shrink_list);
+		spin_unlock(&dentry->d_lock);
+		cpu_relax();
 		return dentry; /* try again with same dentry */
 	}
 	if (!IS_ROOT(dentry))
@@ -542,6 +543,14 @@ out:
  * on the compiler to always get this right (gcc generally doesn't).
  * Real recursion would eat up our stack space.
  */
+static inline void dump(const char *s, struct dentry *dentry)
+{
+	if (unlikely(dentry->d_sb->s_magic == SYSFS_MAGIC)) {
+		printk(KERN_ERR "%s[%pd4]; CPU %d PID %d [%s]\n",
+			s, dentry, smp_processor_id(),
+			task_pid_nr(current), current->comm);
+	}
+}
 
 /*
  * dput - release a dentry
@@ -579,7 +588,9 @@ repeat:
 	return;
 
 kill_it:
-	dentry = dentry_kill(dentry, 1);
+	if (dentry->d_inode)
+		dump("dput", dentry);
+	dentry = dentry_kill(dentry, NULL);
 	if (dentry)
 		goto repeat;
 }
@@ -798,6 +809,7 @@ static void shrink_dentry_list(struct list_head *list)
 
 	while (!list_empty(list)) {
 		dentry = list_entry(list->prev, struct dentry, d_lru);
+again:
 		spin_lock(&dentry->d_lock);
 		/*
 		 * The dispose list is isolated and dentries are not accounted
@@ -815,22 +827,19 @@ static void shrink_dentry_list(struct list_head *list)
 			continue;
 		}
 
-		parent = dentry_kill(dentry, 0);
+		dump("shrink", dentry);
+		parent = dentry_kill(dentry, list);
 		/*
 		 * If dentry_kill returns NULL, we have nothing more to do.
 		 */
 		if (!parent)
 			continue;
 
+		/* if trylocks have failed; just do it again */
 		if (unlikely(parent == dentry)) {
-			/*
-			 * trylocks have failed and d_lock has been held the
-			 * whole time, so it could not have been added to any
-			 * other lists. Just add it back to the shrink list.
-			 */
-			d_shrink_add(dentry, list);
-			spin_unlock(&dentry->d_lock);
-			continue;
+			if (dentry->d_sb->s_magic == SYSFS_MAGIC)
+				printk(KERN_ERR "A");
+			goto again;
 		}
 		/*
 		 * We need to prune ancestors too. This is necessary to prevent
@@ -839,8 +848,10 @@ static void
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Tue, May 27, 2014 at 04:14:15AM +0100, Al Viro wrote:
> As the matter of fact, let's try this instead - retry the same sucker
> immediately in case if trylocks fail.  Comments?

Better yet, let's take "move back to shrink list" into dentry_kill()
itself.  Then we get consistent locking rules for dentry_kill() and
instead of unlock_on_failure we simply pass it NULL or the shrink list
to put the sucker back.  Mika, could you test this one and see if it
fixes that livelock?  The difference in behaviour is that in case of
trylock failure we hit that sucker again without letting it ride all
the way around the list, same as we do for other dentry_kill() callers.

Signed-off-by: Al Viro
---
diff --git a/fs/dcache.c b/fs/dcache.c
index 42ae01e..77c95e5 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -448,7 +448,7 @@ EXPORT_SYMBOL(d_drop);
  * Returns dentry requiring refcount drop, or NULL if we're done.
  */
 static struct dentry *
-dentry_kill(struct dentry *dentry, int unlock_on_failure)
+dentry_kill(struct dentry *dentry, struct list_head *shrink_list)
 	__releases(dentry->d_lock)
 {
 	struct inode *inode;
@@ -464,10 +464,10 @@ dentry_kill(struct dentry *dentry, int unlock_on_failure)
 	inode = dentry->d_inode;
 	if (inode && !spin_trylock(&inode->i_lock)) {
 relock:
-		if (unlock_on_failure) {
-			spin_unlock(&dentry->d_lock);
-			cpu_relax();
-		}
+		if (shrink_list)
+			d_shrink_add(dentry, shrink_list);
+		spin_unlock(&dentry->d_lock);
+		cpu_relax();
 		return dentry; /* try again with same dentry */
 	}
 	if (!IS_ROOT(dentry))
@@ -579,7 +579,7 @@ repeat:
 	return;
 
 kill_it:
-	dentry = dentry_kill(dentry, 1);
+	dentry = dentry_kill(dentry, NULL);
 	if (dentry)
 		goto repeat;
 }
@@ -798,6 +798,7 @@ static void shrink_dentry_list(struct list_head *list)
 
 	while (!list_empty(list)) {
 		dentry = list_entry(list->prev, struct dentry, d_lru);
+again:
 		spin_lock(&dentry->d_lock);
 		/*
 		 * The dispose list is isolated and dentries are not accounted
@@ -815,23 +816,16 @@ static void shrink_dentry_list(struct list_head *list)
 			continue;
 		}
 
-		parent = dentry_kill(dentry, 0);
+		parent = dentry_kill(dentry, list);
 		/*
 		 * If dentry_kill returns NULL, we have nothing more to do.
 		 */
 		if (!parent)
 			continue;
 
-		if (unlikely(parent == dentry)) {
-			/*
-			 * trylocks have failed and d_lock has been held the
-			 * whole time, so it could not have been added to any
-			 * other lists. Just add it back to the shrink list.
-			 */
-			d_shrink_add(dentry, list);
-			spin_unlock(&dentry->d_lock);
-			continue;
-		}
+		/* if trylocks have failed; just do it again */
+		if (unlikely(parent == dentry))
+			goto again;
 		/*
 		 * We need to prune ancestors too. This is necessary to prevent
 		 * quadratic behavior of shrink_dcache_parent(), but is also
@@ -840,7 +834,7 @@ static void shrink_dentry_list(struct list_head *list)
 		 */
 		dentry = parent;
 		while (dentry && !lockref_put_or_lock(&dentry->d_lockref))
-			dentry = dentry_kill(dentry, 1);
+			dentry = dentry_kill(dentry, NULL);
 	}
 }
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Tue, May 27, 2014 at 02:40:54AM +0100, Al Viro wrote:
> It looks plausible, but I doubt that serializing check_submounts_and_drop()
> will suffice - shrink_dcache_parent() is just as unpleasant and it *is*
> triggered in the same situations.  Moreover, the lack of loop in memory
> shrinkers doesn't help - we might get shrink_dentry_list() from one of
> those and loops that keep calling d_walk() from check_submounts_and_drop()
> or shrink_dcache_parent()...
>
> > Anyway, I'd like Mika to test the stupid "let's serialize the dentry
> > shrinking in check_submounts_and_drop()" to see if his problem goes
> > away. I agree that it's not the _proper_ fix, but we're damn late in
> > the rc series..
>
> That we are...  FWIW, if the nastiness matches the description above,
> the right place to do something probably would be when those two
> suckers get a positive return value from d_walk() along with an empty
> shrink list.  I wonder if we should do down_read() in shrink_dentry_list()
> and down_write();up_write() in that case in shrink_dcache_parent() and
> check_submounts_and_drop().  How about the following?

As the matter of fact, let's try this instead - retry the same sucker
immediately in case if trylocks fail.  Comments?

diff --git a/fs/dcache.c b/fs/dcache.c
index 42ae01e..d58d4cc 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -798,6 +798,7 @@ static void shrink_dentry_list(struct list_head *list)
 
 	while (!list_empty(list)) {
 		dentry = list_entry(list->prev, struct dentry, d_lru);
+again:
 		spin_lock(&dentry->d_lock);
 		/*
 		 * The dispose list is isolated and dentries are not accounted
@@ -830,7 +831,8 @@ static void shrink_dentry_list(struct list_head *list)
 			 */
 			d_shrink_add(dentry, list);
 			spin_unlock(&dentry->d_lock);
-			continue;
+			cpu_relax();
+			goto again;
 		}
 		/*
 		 * We need to prune ancestors too.  This is necessary to prevent
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Mon, May 26, 2014 at 01:24:52PM -0700, Linus Torvalds wrote:
> Two things.
>
>  (1) The facts.
>
> Just check the callchains on every single CPU in Mika's original email.

Point.

>  (2) The code.
>
> Yes, the whole looping over the dentry tree happens in other places
> too, but shrink_dcache_parents() is already called under s_umount

But that one's not true.  shrink_dcache_parent() is called from all kinds
of places, and it isn't guaranteed to be serialized at all.  For example,
d_invalidate() will do it, and I wouldn't be surprised to see it called in
an environment where we see shitloads of ->d_revalidate() hitting dentries
that ought to be invalidated.  In fact, unless we have something mounted
under sysfs, those calls of check_submounts_and_drop() will be followed
by d_invalidate().

> > I really, really wonder WTF is causing that - we have spent 20-odd
> > seconds spinning while dentries in there were being evicted by
> > something.  That - on sysfs, where dentry_kill() should be non-blocking
> > and very fast.  Something very fishy is going on and I'd really like
> > to understand the use pattern we are seeing there.
>
> I think it literally is just a livelock. Just look at the NMI
> backtraces for each stuck CPU: most of them are waiting for the dentry
> lock in d_walk(). They have probably all a few dentries on their own
> list. One of the CPUs is actually _in_ shrink_dentry_list().
>
> Now, the way our ticket spinlocks work, they are actually fair: which
> means that I can easily imagine us getting into a pattern, where if
> you have the right insane starting conditions, each CPU will basically
> get their own dentry list.
>
> That said, the only way I can see that nobody ever makes any progress
> is if somebody has the inode locked, and then dentry_kill() turns into
> a no-op. Otherwise one of those threads should always kill one or more
> dentries, afaik.  We do have that "trylock on i_lock, then trylock on
> parent->d_lock", and if either of those fails, drop and re-try loop. I
> wonder if we can get into a situation where lots of people hold each
> others dentries locks sufficiently that dentry_kill() just ends up
> failing and looping..

Umm...  Let me see if I understood you correctly - you think that it's
shrink_dentry_list() cycling through a bunch of dentries, failing trylocks
on all of them due to d_walk() from other threads that keeps hitting
->d_lock on parents (->i_lock is less likely, AFAICS).  Then we move the
sucker to the end of the shrink list and try the next one, ad infinitum.
And those d_walk() callers keep looping since they keep finding those
dentries and nothing else...  Right?

It looks plausible, but I doubt that serializing check_submounts_and_drop()
will suffice - shrink_dcache_parent() is just as unpleasant and it *is*
triggered in the same situations.  Moreover, the lack of loop in memory
shrinkers doesn't help - we might get shrink_dentry_list() from one of
those and loops that keep calling d_walk() from check_submounts_and_drop()
or shrink_dcache_parent()...

> Anyway, I'd like Mika to test the stupid "let's serialize the dentry
> shrinking in check_submounts_and_drop()" to see if his problem goes
> away. I agree that it's not the _proper_ fix, but we're damn late in
> the rc series..

That we are...  FWIW, if the nastiness matches the description above, the
right place to do something probably would be when those two suckers get
a positive return value from d_walk() along with an empty shrink list.
I wonder if we should do down_read() in shrink_dentry_list() and
down_write();up_write() in that case in shrink_dcache_parent() and
check_submounts_and_drop().  How about the following?
diff --git a/fs/dcache.c b/fs/dcache.c
index 42ae01e..72f2c95 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -795,7 +795,14 @@ EXPORT_SYMBOL(d_prune_aliases);
 static void shrink_dentry_list(struct list_head *list)
 {
 	struct dentry *dentry, *parent;
+	static DECLARE_RWSEM(shrink_sem);
 
+	if (unlikely(list_empty(list))) {
+		down_write(&shrink_sem);
+		up_write(&shrink_sem);
+		return;
+	}
+	down_read(&shrink_sem);
 	while (!list_empty(list)) {
 		dentry = list_entry(list->prev, struct dentry, d_lru);
 		spin_lock(&dentry->d_lock);
@@ -842,6 +849,7 @@ static void shrink_dentry_list(struct list_head *list)
 		while (dentry && !lockref_put_or_lock(&dentry->d_lockref))
 			dentry = dentry_kill(dentry, 1);
 	}
+	up_read(&shrink_sem);
 }
 
 static enum lru_status
@@ -923,7 +931,8 @@ long prune_dcache_sb(struct super_block *sb, unsigned long nr_to_scan,
 	freed = list_lru_walk_node(&sb->s_dentry_lru, nid,
 				   dentry_lru_isolate, &dispose, &nr_to_scan);
 
-	shrink_dentry_list(&dispose);
+	if (!list_empty(&dispose))
+		shrink_dentry_list(&dispose);
 	return freed;
 }
 
@@ -966,7 +975,8 @@ void shrink_dcache_sb(struct super_block *sb)
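The retry pattern under discussion (trylock fails, the dentry goes to the tail of the shrink list, we come back to it later) can be modelled with a small user-space toy loop. Everything below is a hypothetical sketch with made-up names, not kernel code: each failed "trylock" costs one extra visit, and if contenders re-locked an entry faster than we could revisit it, the loop would simply never drain the list, which is the livelock.

```c
/* Toy model (hypothetical, not kernel code) of the shrink_dentry_list()
 * retry pattern: an entry whose trylock fails is pushed to the back of
 * the list and revisited on a later pass. */
#include <assert.h>

#define NENTRIES 4

/*
 * tries_needed[i] = how many times entry i's trylock fails before it
 * succeeds (simulating concurrent d_walk() callers holding ->d_lock).
 * Returns the total number of lock attempts made before the list
 * drains; if some entry were re-locked forever, this would never
 * return.
 */
int shrink_attempts(const int tries_needed[NENTRIES])
{
	int remaining[NENTRIES];
	int alive = NENTRIES, attempts = 0;

	for (int i = 0; i < NENTRIES; i++)
		remaining[i] = tries_needed[i];

	while (alive > 0) {
		for (int i = 0; i < NENTRIES; i++) {
			if (remaining[i] < 0)
				continue;	/* already "killed" */
			attempts++;
			if (remaining[i]-- == 0)
				alive--;	/* trylock won: entry freed */
			/* else: moved to the tail, revisit next pass */
		}
	}
	return attempts;
}
```

With no contention the list drains in one pass; every extra trylock failure per entry adds a whole extra pass over the survivors.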
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Mon, May 26, 2014 at 11:26 AM, Al Viro wrote: > On Mon, May 26, 2014 at 11:17:42AM -0700, Linus Torvalds wrote: >> On Mon, May 26, 2014 at 8:27 AM, Al Viro wrote: >> > >> > That's the livelock. OK. >> >> Hmm. Is there any reason we don't have some exclusion around >> "check_submounts_and_drop()"? >> >> That would seem to be the simplest way to avoid any livelock: just >> don't allow concurrent calls (we could make the lock per-filesystem or >> whatever). This whole case should all be for just exceptional cases >> anyway. >> >> We already sleep in that thing (well, "cond_resched()"), so taking a >> mutex should be fine. > > What makes you think that it's another check_submounts_and_drop()? Two things. (1) The facts. Just check the callchains on every single CPU in Mika's original email. It *is* another check_submounts_and_drop() (in Mika's case, always through kernfs_dop_revalidate()). It's not me "thinking" so. You can see it for yourself. This seems to be triggered by systemd-udevd doing (in every single thread): readlink(): SyS_readlink -> user_path_at_empty -> filename_lookup -> path_lookupat -> link_path_walk -> lookup_fast -> kernfs_dop_revalidate -> check_submounts_and_drop and I suspect it's the "kernfs_active()" check that basically causes that storm of revalidate failures that leads to lots of check_submounts_and_drop() cases. (2) The code. Yes, the whole looping over the dentry tree happens in other places too, but shrink_dcache_parents() is already called under s_umount, and the out-of-memory pruning isn't done in a for-loop. So if it's a livelock, check_submounts_and_drop() really is pretty special. I agree that it's not necessarily unique from a race standpoint, but it does seem to be in a class of its own when it comes to be able to *trigger* any potential livelock. In particular, the fact that kernfs can generate that sudden storm of check_submounts_and_drop() calls when something goes away. 
> I really, really wonder WTF is causing that - we have spent 20-odd > seconds spinning while dentries in there were being evicted by > something. That - on sysfs, where dentry_kill() should be non-blocking > and very fast. Something very fishy is going on and I'd really like > to understand the use pattern we are seeing there. I think it literally is just a livelock. Just look at the NMI backtraces for each stuck CPU: most of them are waiting for the dentry lock in d_walk(). They probably all have a few dentries on their own lists. One of the CPUs is actually _in_ shrink_dentry_list(). Now, the way our ticket spinlocks work, they are actually fair: which means that I can easily imagine us getting into a pattern, where if you have the right insane starting conditions, each CPU will basically get their own dentry list. That said, the only way I can see that nobody ever makes any progress is if somebody has the inode locked, and then dentry_kill() turns into a no-op. Otherwise one of those threads should always kill one or more dentries, afaik. We do have that "trylock on i_lock, then trylock on parent->d_lock", and if either of those fails, drop and re-try loop. I wonder if we can get into a situation where lots of people hold each others dentries locks sufficiently that dentry_kill() just ends up failing and looping.. Anyway, I'd like Mika to test the stupid "let's serialize the dentry shrinking in check_submounts_and_drop()" to see if his problem goes away. I agree that it's not the _proper_ fix, but we're damn late in the rc series.. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
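The fairness property of ticket spinlocks that Linus refers to can be illustrated with a minimal user-space sketch using C11 atomics. This is a toy model with hypothetical names, not the kernel's arch-specific implementation: each waiter takes the next ticket at fetch-and-add time and is served strictly in arrival order, so nobody can barge ahead, and a pathological arrival pattern can keep handing each CPU its turn in a repeating cycle.

```c
/* Toy user-space ticket lock (hypothetical sketch): service order is
 * decided by fetch-and-add, so waiters are served strictly FIFO. */
#include <assert.h>
#include <stdatomic.h>

struct ticket_lock {
	atomic_uint next;	/* next ticket to hand out */
	atomic_uint owner;	/* ticket currently allowed in */
};

/* Take the lock; returns the ticket we were served with. */
unsigned int ticket_lock_take(struct ticket_lock *lock)
{
	unsigned int me = atomic_fetch_add(&lock->next, 1);

	while (atomic_load(&lock->owner) != me)
		;	/* spin until our number comes up */
	return me;
}

void ticket_lock_release(struct ticket_lock *lock)
{
	atomic_fetch_add(&lock->owner, 1);	/* serve the next waiter */
}
```

Because the queue position is fixed the moment the ticket is taken, contending CPUs can settle into a stable round-robin, which is the kind of steady state the message above is imagining.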
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Mon, May 26, 2014 at 11:17:42AM -0700, Linus Torvalds wrote: > On Mon, May 26, 2014 at 8:27 AM, Al Viro wrote: > > > > That's the livelock. OK. > > Hmm. Is there any reason we don't have some exclusion around > "check_submounts_and_drop()"? > > That would seem to be the simplest way to avoid any livelock: just > don't allow concurrent calls (we could make the lock per-filesystem or > whatever). This whole case should all be for just exceptional cases > anyway. > > We already sleep in that thing (well, "cond_resched()"), so taking a > mutex should be fine. What makes you think that it's another check_submounts_and_drop()? And not, e.g., shrink_dcache_parent(). Or memory shrinkers. Or some twit sitting in a subdirectory and doing stat(2) in a loop, for that matter... I really, really wonder WTF is causing that - we have spent 20-odd seconds spinning while dentries in there were being evicted by something. That - on sysfs, where dentry_kill() should be non-blocking and very fast. Something very fishy is going on and I'd really like to understand the use pattern we are seeing there. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Mon, May 26, 2014 at 8:27 AM, Al Viro wrote: > > That's the livelock. OK. Hmm. Is there any reason we don't have some exclusion around "check_submounts_and_drop()"? That would seem to be the simplest way to avoid any livelock: just don't allow concurrent calls (we could make the lock per-filesystem or whatever). This whole case should all be for just exceptional cases anyway. We already sleep in that thing (well, "cond_resched()"), so taking a mutex should be fine. The attached (TOTALLY UNTESTED!) patch as an example. Avert your eyes. Mika, does this make any difference? Linus

fs/dcache.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/fs/dcache.c b/fs/dcache.c
index 42ae01eefc07..663fd04614cc 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1333,16 +1333,19 @@ int check_submounts_and_drop(struct dentry *dentry)
 	for (;;) {
 		struct select_data data;
+		static DEFINE_MUTEX(mutex);
 
 		INIT_LIST_HEAD(&data.dispose);
 		data.start = dentry;
 		data.found = 0;
 
+		mutex_lock(&mutex);
 		d_walk(dentry, &data, check_and_collect, check_and_drop);
 		ret = data.found;
 
 		if (!list_empty(&data.dispose))
 			shrink_dentry_list(&data.dispose);
+		mutex_unlock(&mutex);
 
 		if (ret <= 0)
 			break;
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Mon, May 26, 2014 at 04:27:03PM +0100, Al Viro wrote: > Do these livelocks keep happening indefinitely, once triggered? IOW, > is that a buggered state of dcache and/or kernfs, or is it a transient pileup > that happens when we invalidate a subtree there? Could you slap if (dentry->d_sb->s_magic == SYSFS_MAGIC) { printk(KERN_INFO "killing %p4d", dentry); WARN_ON(1); } in the very beginning of dentry_kill(), if (dentry->d_sb->s_magic == SYSFS_MAGIC) { printk(KERN_INFO "invalidate %p4d", dentry); WARN_ON(1); } right after the if (!dentry->d_inode) { d_drop(dentry); goto out; } in check_submounts_and_drop(), reproduce that shite and see what gets into the log between USB disconnect and soft lockup? Warning: it will produce an obscene amount of output. If it gets _really_ excessive (as in "can't even get through the boot without drowning in syslog traffic"), we could add a sysctl turning those on, but let's try and see if it's survivable in that form first... -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Mon, May 26, 2014 at 05:29:48PM +0300, Mika Westerberg wrote:
> I attached the dmesg with 'echo t > /proc/sysrq-trigger' included.
> [ 133.826957] usb 3-10.3: USB disconnect, device number 7
> [ 159.326769] BUG: soft lockup - CPU#6 stuck for 22s! [systemd-udevd:1824]
> [ 159.326809] CPU: 6 PID: 1824 Comm: systemd-udevd Tainted: G I 3.15.0-rc7 #55
> [ 159.326810] Hardware name: Gigabyte Technology Co., Ltd. Z87X-UD7 TH/Z87X-UD7 TH-CF, BIOS F4 03/18/2014
> [ 159.326812] task: 880472854a80 ti: 8804747ec000 task.ti: 8804747ec000
[snip]
> [ 159.326834] Call Trace:
> [ 159.326838]  [<ffffffff811e74e6>] dentry_kill+0x36/0x280
> [ 159.326840]  [<ffffffff811e793a>] shrink_dentry_list+0x8a/0x110
> [ 159.326842]  [<ffffffff811e81c4>] check_submounts_and_drop+0x74/0xa0
> [ 159.326845]  [<ffffffff81245c5d>] kernfs_dop_revalidate+0x5d/0xd0
> [ 159.326847]  [<ffffffff811dba4d>] lookup_fast+0x26d/0x2c0
> [ 159.326849]  [<ffffffff811dd2b5>] path_lookupat+0x155/0x780
> [ 159.326851]  [<ffffffff811dc152>] ? final_putname+0x22/0x50
> [ 159.326853]  [<ffffffff811e198f>] ? user_path_at_empty+0x5f/0x90
> [ 159.326856]  [<ffffffff811b6eb5>] ? kmem_cache_alloc+0x35/0x1f0
> [ 159.326858]  [<ffffffff811dc1cf>] ? getname_flags+0x4f/0x1a0
> [ 159.326860]  [<ffffffff811dd90b>] filename_lookup+0x2b/0xc0
> [ 159.326862]  [<ffffffff811e1984>] user_path_at_empty+0x54/0x90
> [ 159.326865]  [<ffffffff811d65ff>] ? SYSC_newstat+0x1f/0x40
> [ 159.326867]  [<ffffffff811d69ec>] SyS_readlink+0x4c/0x130
> [ 159.326870]  [<ffffffff81113c76>] ? __audit_syscall_exit+0x1f6/0x2a0
> [ 159.326872]  [<ffffffff816ade69>] system_call_fastpath+0x16/0x1b

That's the livelock. OK. But in the stack traces below a) that systemd-udevd instance is happily running in userland and b) the only traces of dentry_kill() in the stack are noise in stacks of gdbus and dbus-daemon. *grumble* I wonder if we should instrument d_shrink_add()/d_lru_shrink_move() so that they would tag dentry with pointer to current, allowing us to report something useful in select_collect()... What's really strange is that the same livelock seems to repeat at least once more; dentries involved the first time around should've been dead and buried by then. 
And AFAICS, in the log you've originally posted exactly that has happened - both times to the same process... Do these livelocks keep happening indefinitely, once triggered? IOW, is that a buggered state of dcache and/or kernfs, or is it a transient pileup that happens when we invalidate a subtree there? -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
[fsdevel and folks who'd been on d_lru corruption thread Cc'd - that's a continuation of the same mess] On Mon, May 26, 2014 at 12:37:41PM +0300, Mika Westerberg wrote: > Hi, > > After v3.15-rc4 my Fedora 20 system with mainline kernel has been suffering > from the above lockup. > > This is easy to reproduce: > > 1) Plug in USB memory stick (to xHCI port) > 2) Unplug it > > Typically only one iteration is needed and suddenly I can see > systemd-udev taking 100% CPU and eventually the whole system becomes > unusable. > > I've tried to investigate and it looks like we just spin in > shrink_dentry_list() forever. Reverting following fs/dcache.c commits > the issue goes away: > > 60942f2f235ce7b817166cdf355eed729094834d dcache: don't need rcu in > shrink_dentry_list() > 9c8c10e262e0f62cb2530f1b076de979123183dd more graceful recovery in > umount_collect() > fe91522a7ba82ca1a51b07e19954b3825e4aaa22 don't remove from shrink list in > select_collect() Which means that we very likely have a reproducer for d_lru-corrupting races in earlier kernels here. I wonder if it can be simulated under KVM... > (The first two commits themselves don't seem to be related but reverting > them is needed so that the last one can be cleanly reverted). What I really wonder is what else is going on there; it keeps finding a bunch of dentries _already_ on shrink list(s) of somebody else. And spins (with eviction of everything worthy not already on shrink lists and cond_resched() thrown in) to give whoever's trying to evict those suckers do their job. This means that we either have somebody stuck trying to evict a dentry, or that more and more dentries keep being added and evicted there. Is somebody sitting in a subdirectory of invalid one and trying to do lookups there, perhaps? But in that case we would have the same livelock in the older kernels, possibly harder to hit, but still there... 
FWIW, older kernels just went ahead, picked those already-on-shrink-list dentries and did dentry_kill(), hopefully not at the time when the owner of shrink list got around to removing the neighbor from that list. With list corruption in case it happened at just the wrong moment. I don't have Fedora anywhere outside of KVM test images, and it'll take a while to inflict it on actual hardware; in the meanwhile, could you hit alt-sysrq-t after it gets stuck and post the results? At least that would give some idea whether it's somebody stuck on trying to evict a dentry or a stream of new dentries being added and killed there. AFAICS, kernfs ->d_release() isn't blocking and final iput() there also doesn't look like it's likely to get stuck, but I'd rather have that possibility excluded... -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
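The list corruption scenario Al describes, where one party unlinks an entry from a shrink list that another party believes it owns, can be shown with a minimal doubly-linked list in user space. The helpers below are a hypothetical sketch in the style of the kernel's list.h, not the real implementation: a second, stale unlink of an already-removed node rewires a live node's pointer at an entry that has since been taken off the list (and, in the kernel case, freed).

```c
/* Minimal doubly-linked list (hypothetical user-space sketch in the
 * style of the kernel's list.h). */
#include <assert.h>

struct dnode {
	struct dnode *prev, *next;
};

void list_init(struct dnode *head)
{
	head->prev = head->next = head;
}

/* Insert n right after head. */
void list_add_front(struct dnode *head, struct dnode *n)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* Unlink n, leaving n's own pointers stale. */
void list_unlink(struct dnode *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

/* Count entries reachable forward from head. */
int list_len(const struct dnode *head)
{
	int len = 0;

	for (const struct dnode *p = head->next; p != head; p = p->next)
		len++;
	return len;
}
```

If owner A unlinks b, the list then changes underneath (say c is unlinked and freed), and owner B later replays its own unlink of b, then b's stale neighbour pointers resurrect the freed c as a live node's ->prev. That is the "just the wrong moment" failure described above.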
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Tue, May 27, 2014 at 02:40:54AM +0100, Al Viro wrote:
> It looks plausible, but I doubt that serializing check_submounts_and_drop()
> will suffice - shrink_dcache_parent() is just as unpleasant and it *is*
> triggered in the same situations.  Moreover, the lack of loop in memory
> shrinkers doesn't help - we might get shrink_dentry_list() from one of
> those and loops that keep calling d_walk() from check_submounts_and_drop()
> or shrink_dcache_parent()...
> 
> > Anyway, I'd like Mika to test the stupid "let's serialize the dentry
> > shrinking in check_submounts_and_drop()" to see if his problem goes
> > away.  I agree that it's not the _proper_ fix, but we're damn late in
> > the rc series..
> 
> That we are...  FWIW, if the nastiness matches the description above, the
> right place to do something probably would be when those two suckers get
> positive return value from d_walk() along with an empty shrink list.  I
> wonder if we should do down_read() in shrink_dentry_list() and
> down_write();up_write() in that case in shrink_dcache_parent() and
> check_submounts_and_drop().  How about the following?

As the matter of fact, let's try this instead - retry the same sucker
immediately in case if trylocks fail.  Comments?

diff --git a/fs/dcache.c b/fs/dcache.c
index 42ae01e..d58d4cc 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -798,6 +798,7 @@ static void shrink_dentry_list(struct list_head *list)
 
 	while (!list_empty(list)) {
 		dentry = list_entry(list->prev, struct dentry, d_lru);
+again:
 		spin_lock(&dentry->d_lock);
 		/*
 		 * The dispose list is isolated and dentries are not accounted
@@ -830,7 +831,8 @@ static void shrink_dentry_list(struct list_head *list)
 		 */
 		d_shrink_add(dentry, list);
 		spin_unlock(&dentry->d_lock);
-		continue;
+		cpu_relax();
+		goto again;
 	}
 	/*
 	 * We need to prune ancestors too.  This is necessary to prevent
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Tue, May 27, 2014 at 04:14:15AM +0100, Al Viro wrote:
> As the matter of fact, let's try this instead - retry the same sucker
> immediately in case if trylocks fail.  Comments?

Better yet, let's take "move back to shrink list" into dentry_kill()
itself.  Then we get consistent locking rules for dentry_kill() and
instead of unlock_on_failure we simply pass it NULL or the shrink list to
put the sucker back.  Mika, could you test this one and see if it fixes
that livelock?  The difference in behaviour is that in case of trylock
failure we hit that sucker again without letting it ride all the way
around the list, same as we do for other dentry_kill() callers.

Signed-off-by: Al Viro <v...@zeniv.linux.org.uk>
---
diff --git a/fs/dcache.c b/fs/dcache.c
index 42ae01e..77c95e5 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -448,7 +448,7 @@ EXPORT_SYMBOL(d_drop);
  * Returns dentry requiring refcount drop, or NULL if we're done.
  */
 static struct dentry *
-dentry_kill(struct dentry *dentry, int unlock_on_failure)
+dentry_kill(struct dentry *dentry, struct list_head *shrink_list)
 	__releases(dentry->d_lock)
 {
 	struct inode *inode;
@@ -464,10 +464,10 @@ dentry_kill(struct dentry *dentry, int unlock_on_failure)
 	inode = dentry->d_inode;
 	if (inode && !spin_trylock(&inode->i_lock)) {
 relock:
-		if (unlock_on_failure) {
-			spin_unlock(&dentry->d_lock);
-			cpu_relax();
-		}
+		if (shrink_list)
+			d_shrink_add(dentry, shrink_list);
+		spin_unlock(&dentry->d_lock);
+		cpu_relax();
 		return dentry; /* try again with same dentry */
 	}
 	if (!IS_ROOT(dentry))
@@ -579,7 +579,7 @@ repeat:
 		return;
 
 kill_it:
-	dentry = dentry_kill(dentry, 1);
+	dentry = dentry_kill(dentry, NULL);
 	if (dentry)
 		goto repeat;
 }
@@ -798,6 +798,7 @@ static void shrink_dentry_list(struct list_head *list)
 
 	while (!list_empty(list)) {
 		dentry = list_entry(list->prev, struct dentry, d_lru);
+again:
 		spin_lock(&dentry->d_lock);
 		/*
 		 * The dispose list is isolated and dentries are not accounted
@@ -815,23 +816,16 @@ static void shrink_dentry_list(struct list_head *list)
 			continue;
 		}
 
-		parent = dentry_kill(dentry, 0);
+		parent = dentry_kill(dentry, list);
 		/*
 		 * If dentry_kill returns NULL, we have nothing more to do.
 		 */
 		if (!parent)
 			continue;
 
-		if (unlikely(parent == dentry)) {
-			/*
-			 * trylocks have failed and d_lock has been held the
-			 * whole time, so it could not have been added to any
-			 * other lists.  Just add it back to the shrink list.
-			 */
-			d_shrink_add(dentry, list);
-			spin_unlock(&dentry->d_lock);
-			continue;
-		}
+		/* if trylocks have failed, just do it again */
+		if (unlikely(parent == dentry))
+			goto again;
 		/*
 		 * We need to prune ancestors too. This is necessary to prevent
 		 * quadratic behavior of shrink_dcache_parent(), but is also
@@ -840,7 +834,7 @@ static void shrink_dentry_list(struct list_head *list)
 		 */
 		dentry = parent;
 		while (dentry && !lockref_put_or_lock(&dentry->d_lockref))
-			dentry = dentry_kill(dentry, 1);
+			dentry = dentry_kill(dentry, NULL);
 	}
 }