Re: [git pull] Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Sat, May 31, 2014 at 10:48 AM, Linus Torvalds wrote:
>
> On Sat, 31 May 2014, Josh Boyer wrote:
>>
>> One of my machines got the lockdep report below when booting a kernel
>> that contained these patches.
>
> I think this is just a lacking annotation.
>
> We do nest dentry d_lock locking, and in order to avoid ABBA deadlocks the
> rule is that we lock things in topological order (parent dentry first).
> lock_parent() is very careful about that, but doesn't actually tell
> lockdep about it.
>
> This trivial oneliner should fix it.
>
>                  Linus

Built the previous kernel with just this applied and it seems to behave
correctly on several reboots.  I think this looks good.  Thanks!

josh

> ---
>  fs/dcache.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/dcache.c b/fs/dcache.c
> index bce851dc03ef..be2bea834bf4 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -553,7 +553,7 @@ again:
>  	}
>  	rcu_read_unlock();
>  	if (parent != dentry)
> -		spin_lock(&dentry->d_lock);
> +		spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
>  	else
>  		parent = NULL;
>  	return parent;
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [git pull] Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Sat, May 31, 2014 at 10:48 AM, Linus Torvalds wrote:
>
> On Sat, 31 May 2014, Josh Boyer wrote:
>>
>> One of my machines got the lockdep report below when booting a kernel
>> that contained these patches.
>
> I think this is just a lacking annotation.
>
> We do nest dentry d_lock locking, and in order to avoid ABBA deadlocks the
> rule is that we lock things in topological order (parent dentry first).
> lock_parent() is very careful about that, but doesn't actually tell
> lockdep about it.
>
> This trivial oneliner should fix it.

Thanks, I'll give it a spin later today.

josh
Re: [git pull] Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Sat, 31 May 2014, Josh Boyer wrote:
>
> One of my machines got the lockdep report below when booting a kernel
> that contained these patches.

I think this is just a lacking annotation.

We do nest dentry d_lock locking, and in order to avoid ABBA deadlocks the
rule is that we lock things in topological order (parent dentry first).
lock_parent() is very careful about that, but doesn't actually tell
lockdep about it.

This trivial oneliner should fix it.

                 Linus

---
 fs/dcache.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index bce851dc03ef..be2bea834bf4 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -553,7 +553,7 @@ again:
 	}
 	rcu_read_unlock();
 	if (parent != dentry)
-		spin_lock(&dentry->d_lock);
+		spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
 	else
 		parent = NULL;
 	return parent;
Re: [git pull] Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Fri, May 30, 2014 at 1:14 PM, Al Viro wrote:
> On Fri, May 30, 2014 at 05:48:16PM +0100, Al Viro wrote:
>> On Fri, May 30, 2014 at 08:31:30AM -0700, Linus Torvalds wrote:
>> > On Fri, May 30, 2014 at 8:21 AM, Al Viro wrote:
>> > >
>> > > Linus, how would you prefer it to be handled?
>> >
>> > I'll just have to do an rc8. I really hoped to avoid it, because we're
>> > going on our family vacation when school is out in two weeks, and it
>> > causes problems for the merge window, but it's not like there is much
>> > choice - I can't do a 3.15 release with a known regression like that.
>>
>> Sorry about that... ;-/
>>
>> > So just send me the pull request, and I'll pull it. I'll probably do
>> > the "let's increase the x86-64 stack size to 16kB" too, to close
>> > _that_ issue as well.
>>
>> OK, here it is:
>>
>> Fixes for livelocks in shrink_dentry_list() introduced by fixes to shrink
>> list corruption; the root cause was that trylock of parent's ->d_lock could
>> be disrupted by d_walk() happening on other CPUs, resulting in
>> shrink_dentry_list() making no progress *and* the same d_walk() being called
>> again and again for as long as shrink_dentry_list() doesn't get past that
>> mess.  Solution is to have shrink_dentry_list() treat that trylock failure
>> not as "try to do the same thing again", but "lock them in the right order".
>>
>> Please, pull from
>> git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git for-linus-2
>
> *GYAH*
>
> Shortlog and diffstat are from the local branch that has two more cleanups
> on top of what's been pushed to vfs.git (and what had been tested).  Ones
> matching what's really in that branch are here:
>
> Shortlog:
> Al Viro (6):
>       lift the "already marked killed" case into shrink_dentry_list()
>       split dentry_kill()
>       expand dentry_kill(dentry, 0) in shrink_dentry_list()
>       shrink_dentry_list(): take parent's ->d_lock earlier
>       dealing with the rest of shrink_dentry_list() livelock
>       dentry_kill() doesn't need the second argument now
>
> Diffstat:
>  fs/dcache.c | 153 ++--
>  1 file changed, 107 insertions(+), 46 deletions(-)
>
> My apologies - the script I'm using to generate shortlogs takes branch
> name as an argument, defaulting to HEAD, which was two commits past
> vfs/for-linus-2.  And no, I'm _not_ planning to push that followup stuff
> until the merge window.  Just to make sure: the branch to pull should have
> head at 8cbf74da435d1bd13dbb790f94c7ff67b2fb6af4 and have the same tree
> as vfs.git#for-linus, which is what got testing.

One of my machines got the lockdep report below when booting a kernel
that contained these patches.  This corresponds to Linux
v3.15-rc7-102-g1487385edb55.

josh

[   11.205628] usbcore: registered new interface driver btusb
[   11.212994] systemd-journald[430]: Received request to flush runtime journal from PID 1
[   11.230780] audit: type=1305 audit(140157.755:4): audit_pid=661 old=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:auditd_t:s0 res=1
[   11.233853] usb 2-1.8.1.1: USB disconnect, device number 6
[   11.253251] =============================================
[   11.253254] [ INFO: possible recursive locking detected ]
[   11.253257] 3.15.0-0.rc7.git4.1.fc21.x86_64 #1 Not tainted
[   11.253259] ---------------------------------------------
[   11.253261] systemd-udevd/448 is trying to acquire lock:
[   11.253264]  (&(&dentry->d_lockref.lock)->rlock){+.+...}, at: [<ffffffff817d66c1>] lock_parent.part.21+0x59/0x69
[   11.253291] but task is already holding lock:
[   11.253295]  (&(&dentry->d_lockref.lock)->rlock){+.+...}, at: [<ffffffff817d669e>] lock_parent.part.21+0x36/0x69
[   11.253308] other info that might help us debug this:
[   11.253312]  Possible unsafe locking scenario:
[   11.253317]        CPU0
[   11.253319]        ----
[   11.253322]   lock(&(&dentry->d_lockref.lock)->rlock);
[   11.253327]   lock(&(&dentry->d_lockref.lock)->rlock);
[   11.253332]  *** DEADLOCK ***
[   11.253337]  May be due to missing lock nesting notation
[   11.253342] 1 lock held by systemd-udevd/448:
[   11.253345]  #0:  (&(&dentry->d_lockref.lock)->rlock){+.+...}, at: [<ffffffff817d669e>] lock_parent.part.21+0x36/0x69
[   11.253359] stack backtrace:
[   11.253366] CPU: 1 PID: 448 Comm: systemd-udevd Not tainted 3.15.0-0.rc7.git4.1.fc21.x86_64 #1
[   11.253371] Hardware name: Apple Inc. MacBookPro10,2/Mac-AFD8A9D944EA4843, BIOS MBP102.88Z.0106.B03.1211161133 11/16/2012
[   11.253375]  652a059c 88025f71b978 817d7dd3
[   11.253384]  825a8c60 88025f71ba50 810f9084 88003f83d8d8
[   11.253393]  825a8c60 8802
[   11.253402] Call Trace:
[   11.253411]  [<ffffffff817d7dd3>] dump_stack+0x4d/0x66
[   11.253422]  [<ffffffff810f9084>] __lock_acquire+0x16e4/0x1ca0
[   11.253434]  [<ffffffff81023595>] ? native_sched_clock+0x35/0xa0
[   11.253443]  [<ffffffff810f9e32>] lock_acquire+0xa2/0x1d0
[   11.253452]  [] ? lock_parent.part.21+0x59/0x69 [
Re: [git pull] Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Fri, May 30, 2014 at 6:48 PM, Al Viro wrote:
> On Fri, May 30, 2014 at 08:31:30AM -0700, Linus Torvalds wrote:
>> On Fri, May 30, 2014 at 8:21 AM, Al Viro wrote:
>> >
>> > Linus, how would you prefer it to be handled?
>>
>> I'll just have to do an rc8. I really hoped to avoid it, because we're
>> going on our family vacation when school is out in two weeks, and it
>> causes problems for the merge window, but it's not like there is much
>> choice - I can't do a 3.15 release with a known regression like that.
>
> Sorry about that... ;-/
>
>> So just send me the pull request, and I'll pull it. I'll probably do
>> the "let's increase the x86-64 stack size to 16kB" too, to close
>> _that_ issue as well.
>
> OK, here it is:
>
> Fixes for livelocks in shrink_dentry_list() introduced by fixes to shrink
> list corruption; the root cause was that trylock of parent's ->d_lock could
> be disrupted by d_walk() happening on other CPUs, resulting in
> shrink_dentry_list() making no progress *and* the same d_walk() being called
> again and again for as long as shrink_dentry_list() doesn't get past that
> mess.  Solution is to have shrink_dentry_list() treat that trylock failure
> not as "try to do the same thing again", but "lock them in the right order".
>
> Please, pull from
> git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git for-linus-2
>
> Shortlog:
> Al Viro (8):
>       lift the "already marked killed" case into shrink_dentry_list()
>       split dentry_kill()
>       expand dentry_kill(dentry, 0) in shrink_dentry_list()
>       shrink_dentry_list(): take parent's ->d_lock earlier
>       dealing with the rest of shrink_dentry_list() livelock
>       dentry_kill() doesn't need the second argument now
>       d_prune_alias(): just lock the parent and call __dentry_kill()
>       dcache.c: call ->d_prune() regardless of d_unhashed()
>
> Diffstat:
>  fs/dcache.c | 176 +---
>  1 file changed, 115 insertions(+), 61 deletions(-)

Did you push all?

$ git log --oneline v3.15-rc7..
8cbf74d dentry_kill() doesn't need the second argument now
b2b8019 dealing with the rest of shrink_dentry_list() livelock
046b961 shrink_dentry_list(): take parent's ->d_lock earlier
ff2fde9 expand dentry_kill(dentry, 0) in shrink_dentry_list()
e55fd01 split dentry_kill()
64fd72e lift the "already marked killed" case into shrink_dentry_list()
b6dd6f4 vfs: fix vmplice_to_user()

- Sedat -
Re: [git pull] Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Fri, May 30, 2014 at 05:48:16PM +0100, Al Viro wrote:
> On Fri, May 30, 2014 at 08:31:30AM -0700, Linus Torvalds wrote:
> > On Fri, May 30, 2014 at 8:21 AM, Al Viro wrote:
> > >
> > > Linus, how would you prefer it to be handled?
> >
> > I'll just have to do an rc8. I really hoped to avoid it, because we're
> > going on our family vacation when school is out in two weeks, and it
> > causes problems for the merge window, but it's not like there is much
> > choice - I can't do a 3.15 release with a known regression like that.
>
> Sorry about that... ;-/
>
> > So just send me the pull request, and I'll pull it. I'll probably do
> > the "let's increase the x86-64 stack size to 16kB" too, to close
> > _that_ issue as well.
>
> OK, here it is:
>
> Fixes for livelocks in shrink_dentry_list() introduced by fixes to shrink
> list corruption; the root cause was that trylock of parent's ->d_lock could
> be disrupted by d_walk() happening on other CPUs, resulting in
> shrink_dentry_list() making no progress *and* the same d_walk() being called
> again and again for as long as shrink_dentry_list() doesn't get past that
> mess.  Solution is to have shrink_dentry_list() treat that trylock failure
> not as "try to do the same thing again", but "lock them in the right order".
>
> Please, pull from
> git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git for-linus-2

*GYAH*

Shortlog and diffstat are from the local branch that has two more cleanups
on top of what's been pushed to vfs.git (and what had been tested).  Ones
matching what's really in that branch are here:

Shortlog:
Al Viro (6):
      lift the "already marked killed" case into shrink_dentry_list()
      split dentry_kill()
      expand dentry_kill(dentry, 0) in shrink_dentry_list()
      shrink_dentry_list(): take parent's ->d_lock earlier
      dealing with the rest of shrink_dentry_list() livelock
      dentry_kill() doesn't need the second argument now

Diffstat:
 fs/dcache.c | 153 ++--
 1 file changed, 107 insertions(+), 46 deletions(-)

My apologies - the script I'm using to generate shortlogs takes branch
name as an argument, defaulting to HEAD, which was two commits past
vfs/for-linus-2.  And no, I'm _not_ planning to push that followup stuff
until the merge window.  Just to make sure: the branch to pull should have
head at 8cbf74da435d1bd13dbb790f94c7ff67b2fb6af4 and have the same tree
as vfs.git#for-linus, which is what got testing.
[git pull] Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Fri, May 30, 2014 at 08:31:30AM -0700, Linus Torvalds wrote:
> On Fri, May 30, 2014 at 8:21 AM, Al Viro wrote:
> >
> > Linus, how would you prefer it to be handled?
>
> I'll just have to do an rc8. I really hoped to avoid it, because we're
> going on our family vacation when school is out in two weeks, and it
> causes problems for the merge window, but it's not like there is much
> choice - I can't do a 3.15 release with a known regression like that.

Sorry about that... ;-/

> So just send me the pull request, and I'll pull it. I'll probably do
> the "let's increase the x86-64 stack size to 16kB" too, to close
> _that_ issue as well.

OK, here it is:

Fixes for livelocks in shrink_dentry_list() introduced by fixes to shrink
list corruption; the root cause was that trylock of parent's ->d_lock could
be disrupted by d_walk() happening on other CPUs, resulting in
shrink_dentry_list() making no progress *and* the same d_walk() being called
again and again for as long as shrink_dentry_list() doesn't get past that
mess.  Solution is to have shrink_dentry_list() treat that trylock failure
not as "try to do the same thing again", but "lock them in the right order".

Please, pull from
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git for-linus-2

Shortlog:
Al Viro (8):
      lift the "already marked killed" case into shrink_dentry_list()
      split dentry_kill()
      expand dentry_kill(dentry, 0) in shrink_dentry_list()
      shrink_dentry_list(): take parent's ->d_lock earlier
      dealing with the rest of shrink_dentry_list() livelock
      dentry_kill() doesn't need the second argument now
      d_prune_alias(): just lock the parent and call __dentry_kill()
      dcache.c: call ->d_prune() regardless of d_unhashed()

Diffstat:
 fs/dcache.c | 176 +---
 1 file changed, 115 insertions(+), 61 deletions(-)
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Fri, May 30, 2014 at 8:21 AM, Al Viro wrote:
>
> Linus, how would you prefer it to be handled?

I'll just have to do an rc8. I really hoped to avoid it, because we're
going on our family vacation when school is out in two weeks, and it
causes problems for the merge window, but it's not like there is much
choice - I can't do a 3.15 release with a known regression like that.

So just send me the pull request, and I'll pull it. I'll probably do
the "let's increase the x86-64 stack size to 16kB" too, to close
_that_ issue as well.

                 Linus
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Fri, May 30, 2014 at 11:12:38AM +0300, Mika Westerberg wrote:
> Tested your latest #for-linus from here:
>
> https://git.kernel.org/cgit/linux/kernel/git/viro/vfs.git/log/?h=for-linus
>
> and the livelock is gone,
>
> Tested-by: Mika Westerberg
>
> Thanks again!

OK...  I've just pushed another branch (#for-linus-2) with the lock_parent()
change folded into the commit that used to introduce the rename_lock-based
variant.  Heads of the branches are byte-for-byte identical:

al@duke:~/linux/trees/vfs$ git log vfs/for-linus|head -1
commit ebc6cb92bfeddf25462842f64604bc9fd2aab5b0
al@duke:~/linux/trees/vfs$ git log vfs/for-linus-2|head -1
commit 8cbf74da435d1bd13dbb790f94c7ff67b2fb6af4
al@duke:~/linux/trees/vfs$ git diff ebc6cb92bfeddf25462842f64604bc9fd2aab5b0 8cbf74da435d1bd13dbb790f94c7ff67b2fb6af4
al@duke:~/linux/trees/vfs$

so testing done on one of them obviously applies to another.  FWIW, on
#for-linus-2 we have

Shortlog:
Al Viro (6):
      lift the "already marked killed" case into shrink_dentry_list()
      split dentry_kill()
      expand dentry_kill(dentry, 0) in shrink_dentry_list()
      shrink_dentry_list(): take parent's ->d_lock earlier
      dealing with the rest of shrink_dentry_list() livelock
      dentry_kill() doesn't need the second argument now

Diffstat:
 fs/dcache.c | 153 ++--
 1 file changed, 107 insertions(+), 46 deletions(-)

Linus, how would you prefer it to be handled?
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Thu, May 29, 2014 at 07:52:01PM +0100, Al Viro wrote:
> On Thu, May 29, 2014 at 05:53:51PM +0100, Al Viro wrote:
> > On Thu, May 29, 2014 at 09:29:42AM -0700, Linus Torvalds wrote:
> > > On Thu, May 29, 2014 at 9:23 AM, Al Viro wrote:
> > > >
> > > > BTW, lock_parent() might be better off if in contended case it would not
> > > > bother with rename_lock and did something like this:
> > > > again:
> > >
> > > Ack. I think that's much better.
> >
> > Pushed to #for-linus (with dumb braino fixed - it's if (parent != dentry),
> > not if (parent)).  I'll wait with folding it back into the commit that
> > introduces lock_parent() until we get testing results...
>
> Grrr...  Sadly, that's not good enough.  Leaking rcu_read_lock() on
> success is trivial, but there's a more serious problem: suppose dentries
> involved get moved before we get to locking what we thought was parent.
> We end up taking ->d_lock on two dentries that might be nowhere near each
> other in the tree, with obvious nasty implications.  Would be _very_ hard
> to reproduce ;-/
>
> AFAICS, the following would be safe, but I'd really appreciate any extra
> eyes on that sucker:
>
> static inline struct dentry *lock_parent(struct dentry *dentry)
> {
> 	struct dentry *parent = dentry->d_parent;
> 	if (IS_ROOT(dentry))
> 		return NULL;
> 	if (likely(spin_trylock(&parent->d_lock)))
> 		return parent;
> 	spin_unlock(&dentry->d_lock);
> 	rcu_read_lock();
> again:
> 	parent = ACCESS_ONCE(dentry->d_parent);
> 	spin_lock(&parent->d_lock);
> 	/*
> 	 * We can't blindly lock dentry until we are sure
> 	 * that we won't violate the locking order.
> 	 * While parent->d_lock is not enough to stabilize
> 	 * dentry->d_parent, it *is* enough to stabilize
> 	 * dentry->d_parent == parent.
> 	 */
> 	if (unlikely(parent != dentry->d_parent)) {
> 		spin_unlock(&parent->d_lock);
> 		goto again;
> 	}
> 	rcu_read_unlock();
> 	if (parent != dentry)
> 		spin_lock(&dentry->d_lock);
> 	else
> 		parent = NULL;
> 	return parent;
> }
>
> That variant got force-pushed in place of the previous one, again at the
> head of #for-linus.  And I'm definitely not folding it in until it gets
> more review and testing.

Tested your latest #for-linus from here:

https://git.kernel.org/cgit/linux/kernel/git/viro/vfs.git/log/?h=for-linus

and the livelock is gone,

Tested-by: Mika Westerberg

Thanks again!
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Thu, May 29, 2014 at 10:00:49PM -0700, Linus Torvalds wrote:
> Yeah, I think that would be good. Except I think you should create a
> "dentry_is_dead()" helper function that then has that "if you hold the
> dentry or parent lock, this is safe" comment, because for lockref in
> general you do need to have the lock in the lockref itself. The fact
> that dentries have more locking is very much dentry-specific.

With how many callers?  There literally are only two places where we look
at that bit; one of them is very tempting to convert to
->d_lockref.count != 0, since reactions to positive and negative are very
similar.  We also have that bogus BUG_ON() you've mentioned (that one
simply should die) and only one place where we check it for being
negative - autofs4_lookup_active().  And that one is better dealt with by
taking removal from their active list from ->d_release() to ->d_prune()
(if not turning their ->d_release() into ->d_prune() wholesale) and making
->d_prune() called for hashed and unhashed alike (the only instance
*already* checks for d_unhashed() and does nothing in that case; no need
to check that in fs/dcache.c).  With that done, the check will be gone -
all it does is filtering out the ones that are already on the way out,
but still hadn't reached ->d_release().

IOW, it's not a widely used functionality and it's really not something
that should be ever needed outside of fs/dcache.c.  And in fs/dcache.c we
have one call site, so I'm not sure if even mentioning __lockref_not_dead()
would make much sense - (int)child->d_lockref.count < 0 might be better,
along with a comment about ->d_parent->d_lock serializing it against
lockref_mark_dead() in __dentry_kill() just as well as ->d_lock would...

Note that the only reason why autofs is playing those games is that they
keep references to dentries that do not contribute to refcount, rip them
out when dentry is killed and do that in the wrong method, which opens the
window when ->d_lock is already dropped and ->d_release() is inevitable
but yet to be called.  Solution: rip those references out before dropping
->d_lock, which is what ->d_prune() gives us.  To be fair, that code
predates ->d_prune() by several years (Jul 2008 and Oct 2011, resp.)

And "vfs: call d_op->d_prune() before unhashing dentry" has added very odd
checks for !d_unhashed(), despite ceph ->d_prune() being an explicit no-op
in other cases...

While we are at it, what the devil is d_prune_aliases() doing?  OK, we
grab ->d_lock and see that refcount is 0.  Then we call ->d_prune(), bump
refcount to 1, forcibly unhash the sucker, drop ->d_lock and ->i_lock and
call dput().  Which seems to be far too convoluted...  AFAICS, what we
want to do is

	spin_lock(&dentry->d_lock);
	if (!dentry->d_lockref.count) {
		parent = lock_parent(dentry);
		if (likely(!dentry->d_lockref.count)) {
			__dentry_kill(dentry);
			goto restart;
		}
		if (parent)
			spin_unlock(&parent->d_lock);
	}
	spin_unlock(&dentry->d_lock);

(which means that pulling ->i_lock trylock into __dentry_kill() is
probably not a good plan, more's the pity...)  And there goes this second
call site of ->d_prune() - after that it would be called exactly in one
place, right after __dentry_kill() has done lockref_mark_dead().  The
whole reason for calling it there was that forcible unhash used to force
dput() to kill the sucker has a side effect of messing up ceph
->d_prune()...
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Thu, May 29, 2014 at 07:52:01PM +0100, Al Viro wrote:
> On Thu, May 29, 2014 at 05:53:51PM +0100, Al Viro wrote:
> > On Thu, May 29, 2014 at 09:29:42AM -0700, Linus Torvalds wrote:
> > > On Thu, May 29, 2014 at 9:23 AM, Al Viro <v...@zeniv.linux.org.uk> wrote:
> > > >
> > > > BTW, lock_parent() might be better off if in contended case it would not
> > > > bother with rename_lock and did something like this:
> > > > again:
> > >
> > > Ack. I think that's much better.
> >
> > Pushed to #for-linus (with dumb braino fixed - it's if (parent != dentry),
> > not if (parent)). I'll wait with folding it back into the commit that
> > introduces lock_parent() until we get testing results...
>
> Grrr... Sadly, that's not good enough. Leaking rcu_read_lock() on
> success is trivial, but there's more serious problem: suppose dentries
> involved get moved before we get to locking what we thought was parent.
> We end up taking ->d_lock on two dentries that might be nowhere near
> each other in the tree, with obvious nasty implications. Would be
> _very_ hard to reproduce ;-/
>
> AFAICS, the following would be safe, but I'd really appreciate any
> extra eyes on that sucker:
>
> static inline struct dentry *lock_parent(struct dentry *dentry)
> {
> 	struct dentry *parent = dentry->d_parent;
> 	if (IS_ROOT(dentry))
> 		return NULL;
> 	if (likely(spin_trylock(&parent->d_lock)))
> 		return parent;
> 	spin_unlock(&dentry->d_lock);
> 	rcu_read_lock();
> again:
> 	parent = ACCESS_ONCE(dentry->d_parent);
> 	spin_lock(&parent->d_lock);
> 	/*
> 	 * We can't blindly lock dentry until we are sure
> 	 * that we won't violate the locking order.
> 	 * While parent->d_lock is not enough to stabilize
> 	 * dentry->d_parent, it *is* enough to stabilize
> 	 * dentry->d_parent == parent.
> 	 */
> 	if (unlikely(parent != dentry->d_parent)) {
> 		spin_unlock(&parent->d_lock);
> 		goto again;
> 	}
> 	rcu_read_unlock();
> 	if (parent != dentry)
> 		spin_lock(&dentry->d_lock);
> 	else
> 		parent = NULL;
> 	return parent;
> }
>
> That variant got force-pushed in place of the previous one, again at the
> head of #for-linus. And I'm definitely not folding it in until it gets
> more review and testing.
Tested your latest #for-linus from here:

https://git.kernel.org/cgit/linux/kernel/git/viro/vfs.git/log/?h=for-linus

and the livelock is gone,

Tested-by: Mika Westerberg <mika.westerb...@linux.intel.com>

Thanks again!
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Fri, May 30, 2014 at 11:12:38AM +0300, Mika Westerberg wrote:
> Tested your latest #for-linus from here:
>
> https://git.kernel.org/cgit/linux/kernel/git/viro/vfs.git/log/?h=for-linus
>
> and the livelock is gone,
>
> Tested-by: Mika Westerberg <mika.westerb...@linux.intel.com>
>
> Thanks again!

OK... I've just pushed another branch (#for-linus-2) with the
lock_parent() change folded into the commit that used to introduce the
rename_lock-based variant. Heads of the branches are byte-for-byte
identical:

al@duke:~/linux/trees/vfs$ git log vfs/for-linus | head -1
commit ebc6cb92bfeddf25462842f64604bc9fd2aab5b0
al@duke:~/linux/trees/vfs$ git log vfs/for-linus-2 | head -1
commit 8cbf74da435d1bd13dbb790f94c7ff67b2fb6af4
al@duke:~/linux/trees/vfs$ git diff ebc6cb92bfeddf25462842f64604bc9fd2aab5b0 8cbf74da435d1bd13dbb790f94c7ff67b2fb6af4
al@duke:~/linux/trees/vfs$

so testing done on one of them obviously applies to the other. FWIW, on
#for-linus-2 we have

Shortlog:
Al Viro (6):
      lift the "already marked killed" case into shrink_dentry_list()
      split dentry_kill()
      expand dentry_kill(dentry, 0) in shrink_dentry_list()
      shrink_dentry_list(): take parent's ->d_lock earlier
      dealing with the rest of shrink_dentry_list() livelock
      dentry_kill() doesn't need the second argument now

Diffstat:
 fs/dcache.c | 153 ++--
 1 file changed, 107 insertions(+), 46 deletions(-)

Linus, how would you prefer it to be handled?
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Fri, May 30, 2014 at 8:21 AM, Al Viro <v...@zeniv.linux.org.uk> wrote:
>
> Linus, how would you prefer it to be handled?

I'll just have to do an rc8. I really hoped to avoid it, because we're
going on our family vacation when school is out in two weeks, and it
causes problems for the merge window, but it's not like there is much
choice - I can't do a 3.15 release with a known regression like that.

So just send me the pull request, and I'll pull it. I'll probably do
the "let's increase the x86-64 stack size to 16kB" too, to close
_that_ issue as well.

           Linus
[git pull] Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Fri, May 30, 2014 at 08:31:30AM -0700, Linus Torvalds wrote:
> On Fri, May 30, 2014 at 8:21 AM, Al Viro <v...@zeniv.linux.org.uk> wrote:
> >
> > Linus, how would you prefer it to be handled?
>
> I'll just have to do an rc8. I really hoped to avoid it, because we're
> going on our family vacation when school is out in two weeks, and it
> causes problems for the merge window, but it's not like there is much
> choice - I can't do a 3.15 release with a known regression like that.

Sorry about that... ;-/

> So just send me the pull request, and I'll pull it. I'll probably do
> the "let's increase the x86-64 stack size to 16kB" too, to close
> _that_ issue as well.

OK, here it is:

Fixes for livelocks in shrink_dentry_list() introduced by the fixes to
shrink list corruption; the root cause was that a trylock of the parent's
->d_lock could be disrupted by d_walk() happening on other CPUs, resulting
in shrink_dentry_list() making no progress *and* the same d_walk() being
called again and again for as long as shrink_dentry_list() doesn't get
past that mess. The solution is to have shrink_dentry_list() treat that
trylock failure not as "try to do the same thing again", but "lock them
in the right order".

Please, pull from
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git for-linus-2

Shortlog:
Al Viro (8):
      lift the "already marked killed" case into shrink_dentry_list()
      split dentry_kill()
      expand dentry_kill(dentry, 0) in shrink_dentry_list()
      shrink_dentry_list(): take parent's ->d_lock earlier
      dealing with the rest of shrink_dentry_list() livelock
      dentry_kill() doesn't need the second argument now
      d_prune_alias(): just lock the parent and call __dentry_kill()
      dcache.c: call ->d_prune() regardless of d_unhashed()

Diffstat:
 fs/dcache.c | 176 +---
 1 file changed, 115 insertions(+), 61 deletions(-)
Re: [git pull] Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Fri, May 30, 2014 at 05:48:16PM +0100, Al Viro wrote:
> On Fri, May 30, 2014 at 08:31:30AM -0700, Linus Torvalds wrote:
> > On Fri, May 30, 2014 at 8:21 AM, Al Viro <v...@zeniv.linux.org.uk> wrote:
> > >
> > > Linus, how would you prefer it to be handled?
> >
> > I'll just have to do an rc8. I really hoped to avoid it, because we're
> > going on our family vacation when school is out in two weeks, and it
> > causes problems for the merge window, but it's not like there is much
> > choice - I can't do a 3.15 release with a known regression like that.
>
> Sorry about that... ;-/
>
> > So just send me the pull request, and I'll pull it. I'll probably do
> > the "let's increase the x86-64 stack size to 16kB" too, to close
> > _that_ issue as well.
>
> OK, here it is:
>
> Fixes for livelocks in shrink_dentry_list() introduced by fixes to
> shrink list corruption; the root cause was that trylock of parent's
> ->d_lock could be disrupted by d_walk() happening on other CPUs,
> resulting in shrink_dentry_list() making no progress *and* the same
> d_walk() being called again and again for as long as
> shrink_dentry_list() doesn't get past that mess. Solution is to have
> shrink_dentry_list() treat that trylock failure not as "try to do the
> same thing again", but "lock them in the right order".
>
> Please, pull from
> git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git for-linus-2

*GYAH* Shortlog and diffstat are from the local branch that has two more
cleanups on top of what's been pushed to vfs.git (and what had been
tested).

Ones matching what's really in that branch are here:

Shortlog:
Al Viro (6):
      lift the "already marked killed" case into shrink_dentry_list()
      split dentry_kill()
      expand dentry_kill(dentry, 0) in shrink_dentry_list()
      shrink_dentry_list(): take parent's ->d_lock earlier
      dealing with the rest of shrink_dentry_list() livelock
      dentry_kill() doesn't need the second argument now

Diffstat:
 fs/dcache.c | 153 ++--
 1 file changed, 107 insertions(+), 46 deletions(-)

My apologies - the script I'm using to generate shortlogs takes a branch
name as an argument, defaulting to HEAD, which was two commits past
vfs/for-linus-2. And no, I'm _not_ planning to push that followup stuff
until the merge window.

Just to make sure: the branch to pull should have its head at
8cbf74da435d1bd13dbb790f94c7ff67b2fb6af4 and have the same tree as
vfs.git#for-linus, which is what got testing.
Re: [git pull] Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Fri, May 30, 2014 at 6:48 PM, Al Viro <v...@zeniv.linux.org.uk> wrote:
> On Fri, May 30, 2014 at 08:31:30AM -0700, Linus Torvalds wrote:
> > On Fri, May 30, 2014 at 8:21 AM, Al Viro <v...@zeniv.linux.org.uk> wrote:
> > >
> > > Linus, how would you prefer it to be handled?
> >
> > I'll just have to do an rc8. I really hoped to avoid it, because we're
> > going on our family vacation when school is out in two weeks, and it
> > causes problems for the merge window, but it's not like there is much
> > choice - I can't do a 3.15 release with a known regression like that.
>
> Sorry about that... ;-/
>
> > So just send me the pull request, and I'll pull it. I'll probably do
> > the "let's increase the x86-64 stack size to 16kB" too, to close
> > _that_ issue as well.
>
> OK, here it is:
>
> Fixes for livelocks in shrink_dentry_list() introduced by fixes to
> shrink list corruption; the root cause was that trylock of parent's
> ->d_lock could be disrupted by d_walk() happening on other CPUs,
> resulting in shrink_dentry_list() making no progress *and* the same
> d_walk() being called again and again for as long as
> shrink_dentry_list() doesn't get past that mess. Solution is to have
> shrink_dentry_list() treat that trylock failure not as "try to do the
> same thing again", but "lock them in the right order".
>
> Please, pull from
> git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git for-linus-2
>
> Shortlog:
> Al Viro (8):
>       lift the "already marked killed" case into shrink_dentry_list()
>       split dentry_kill()
>       expand dentry_kill(dentry, 0) in shrink_dentry_list()
>       shrink_dentry_list(): take parent's ->d_lock earlier
>       dealing with the rest of shrink_dentry_list() livelock
>       dentry_kill() doesn't need the second argument now
>       d_prune_alias(): just lock the parent and call __dentry_kill()
>       dcache.c: call ->d_prune() regardless of d_unhashed()
>
> Diffstat:
>  fs/dcache.c | 176 +---
>  1 file changed, 115 insertions(+), 61 deletions(-)

Did you push all?

$ git log --oneline v3.15-rc7..
8cbf74d dentry_kill() doesn't need the second argument now
b2b8019 dealing with the rest of shrink_dentry_list() livelock
046b961 shrink_dentry_list(): take parent's ->d_lock earlier
ff2fde9 expand dentry_kill(dentry, 0) in shrink_dentry_list()
e55fd01 split dentry_kill()
64fd72e lift the "already marked killed" case into shrink_dentry_list()
b6dd6f4 vfs: fix vmplice_to_user()

- Sedat -
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Thu, May 29, 2014 at 9:50 PM, Al Viro wrote:
>
> BTW, how serious is the problem with __lockref_is_dead(&dentry->d_lockref)
> with only ->d_parent->d_lock held? From my reading of lib/lockref.c it
> should be safe - we only do lockref_mark_dead() with ->d_parent->d_lock
> held, and it'll provide all the serialization and barriers we need.
>
> If I'm right, we could get rid of DCACHE_DENTRY_KILLED completely

Yeah, I think that would be good. Except I think you should create a
"dentry_is_dead()" helper function that then has that "if you hold the
dentry or parent lock, this is safe" comment, because for lockref in
general you do need to have the lock in the lockref itself. The fact
that dentries have more locking is very much dentry-specific.

But with that, go wild. I'd love to get rid of some of the redundant
stuff. For another example, the

	BUG_ON((int)dentry->d_lockref.count > 0);

test makes very little sense any more with lockrefs and the whole dead
marker (that should make sure that it never gets incremented), but
exists due to the direct conversion.

             Linus
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Thu, May 29, 2014 at 12:14:51PM -0700, Linus Torvalds wrote:
> Yeah, I don't think you can reproduce that, but I guess renaming
> directories into each other (two renames needed) could trigger an ABBA
> deadlock by changing the topological order of dentry/parent.
>
> I suspect there's no way in hell that tiny race will ever happen in
> practice, but let's not risk it.
>
> And your solution (to re-check after just taking the parent lock)
> seems sufficient and sane, since dentry_lock_for_move() will always
> take the parent lock(s) before we move a dentry.
>
> So that looks good to me.

BTW, how serious is the problem with __lockref_is_dead(&dentry->d_lockref)
with only ->d_parent->d_lock held?  From my reading of lib/lockref.c it
should be safe - we only do lockref_mark_dead() with ->d_parent->d_lock
held, and it'll provide all the serialization and barriers we need.

If I'm right, we could get rid of DCACHE_DENTRY_KILLED completely and
replace checking for it with checking for negative ->d_lockref.count.
There are two places where we check for it; in shrink_dentry_list() we
definitely can go that way (we are holding ->d_lock there) and it
simplifies the code nicely.  In d_walk(), though (in the bit that used to
be try_to_ascend()), we only hold ->d_parent->d_lock.  It looks like it
ought to be safe to replace

	if (this_parent != child->d_parent ||
	    (child->d_flags & DCACHE_DENTRY_KILLED) ||
	    need_seqretry(&rename_lock, seq)) {

with

	if (this_parent != child->d_parent ||
	    __lockref_is_dead(&child->d_lockref) ||
	    need_seqretry(&rename_lock, seq)) {

and remove DCACHE_DENTRY_KILLED completely...  The other user (in
shrink_dentry_list()) simplifies to

	if (dentry->d_lockref.count != 0) {
		bool can_free = dentry->d_flags & DCACHE_MAY_FREE;
		spin_unlock(&dentry->d_lock);
		if (parent)
			spin_unlock(&parent->d_lock);
		if (can_free)
			dentry_free(dentry);
		continue;
	}

taking care of both the DCACHE_DENTRY_KILLED case and the simple "lazy
dget" one, and that one's definitely safe and worth doing.

Would be nice if we could switch the d_walk() one as well and kill that
flag off, though...  Basically, we have

	spin_lock(&A);
	spin_lock(&B);
	V = 1;
	lockref_mark_dead(&R);
	...

as the only place where R goes dead, and we want to replace

	spin_lock(&B);
	if (V)
		...

with

	spin_lock(&B);
	if (__lockref_is_dead(&R))
		...

Unless I'm missing something subtle in lockref.c, that should be safe...
Comments?
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Thu, May 29, 2014 at 11:52 AM, Al Viro wrote:
>
> Grrr... Sadly, that's not good enough. Leaking rcu_read_lock() on
> success is trivial, but there's more serious problem: suppose dentries
> involved get moved before we get to locking what we thought was parent.
> We end up taking ->d_lock on two dentries that might be nowhere near each
> other in the tree, with obvious nasty implications. Would be _very_ hard
> to reproduce ;-/

Yeah, I don't think you can reproduce that, but I guess renaming
directories into each other (two renames needed) could trigger an ABBA
deadlock by changing the topological order of dentry/parent.

I suspect there's no way in hell that tiny race will ever happen in
practice, but let's not risk it.

And your solution (to re-check after just taking the parent lock)
seems sufficient and sane, since dentry_lock_for_move() will always
take the parent lock(s) before we move a dentry.

So that looks good to me.

            Linus
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Thu, May 29, 2014 at 05:53:51PM +0100, Al Viro wrote:
> On Thu, May 29, 2014 at 09:29:42AM -0700, Linus Torvalds wrote:
> > On Thu, May 29, 2014 at 9:23 AM, Al Viro wrote:
> > >
> > > BTW, lock_parent() might be better off if in contended case it would not
> > > bother with rename_lock and did something like this:
> > > again:
> >
> > Ack. I think that's much better.
>
> Pushed to #for-linus (with dumb braino fixed - it's if (parent != dentry),
> not if (parent)). I'll wait with folding it back into the commit that
> introduces lock_parent() until we get testing results...

Grrr...  Sadly, that's not good enough.  Leaking rcu_read_lock() on
success is trivial, but there's a more serious problem: suppose the
dentries involved get moved before we get to locking what we thought was
the parent.  We end up taking ->d_lock on two dentries that might be
nowhere near each other in the tree, with obvious nasty implications.
Would be _very_ hard to reproduce ;-/

AFAICS, the following would be safe, but I'd really appreciate any extra
eyes on that sucker:

static inline struct dentry *lock_parent(struct dentry *dentry)
{
	struct dentry *parent = dentry->d_parent;
	if (IS_ROOT(dentry))
		return NULL;
	if (likely(spin_trylock(&parent->d_lock)))
		return parent;
	spin_unlock(&dentry->d_lock);
	rcu_read_lock();
again:
	parent = ACCESS_ONCE(dentry->d_parent);
	spin_lock(&parent->d_lock);
	/*
	 * We can't blindly lock dentry until we are sure
	 * that we won't violate the locking order.
	 * While parent->d_lock is not enough to stabilize
	 * dentry->d_parent, it *is* enough to stabilize
	 * dentry->d_parent == parent.
	 */
	if (unlikely(parent != dentry->d_parent)) {
		spin_unlock(&parent->d_lock);
		goto again;
	}
	rcu_read_unlock();
	if (parent != dentry)
		spin_lock(&dentry->d_lock);
	else
		parent = NULL;
	return parent;
}

That variant got force-pushed in place of the previous one, again at the
head of #for-linus.  And I'm definitely not folding it in until it gets
more review and testing.
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Thu, May 29, 2014 at 09:29:42AM -0700, Linus Torvalds wrote:
> On Thu, May 29, 2014 at 9:23 AM, Al Viro wrote:
> >
> > BTW, lock_parent() might be better off if in contended case it would not
> > bother with rename_lock and did something like this:
> > again:
>
> Ack. I think that's much better.

Pushed to #for-linus (with dumb braino fixed - it's if (parent != dentry),
not if (parent)). I'll wait with folding it back into the commit that
introduces lock_parent() until we get testing results...
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Thu, May 29, 2014 at 9:23 AM, Al Viro wrote:
>
> BTW, lock_parent() might be better off if in contended case it would not
> bother with rename_lock and did something like this:
> again:

Ack. I think that's much better.

          Linus
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Thu, May 29, 2014 at 04:44:54PM +0100, Al Viro wrote: > On Thu, May 29, 2014 at 08:10:57AM -0700, Linus Torvalds wrote: > > If so, though, that brings up two questions: > > > > (a) do we really want to be that aggressive? Can we ever traverse > > _past_ the point we're actually trying to shrink in > > shrink_dcache_parent()? > > Caller of shrink_dcache_parent() would better hold a reference to the > argument, or it might get freed right under us ;-) So no, we can't > go past that point - the subtree root will stay busy. > > The reason we want to be aggressive there is to avoid excessive iterations - > think what happens e.g. if we have a chain of N dentries, with nothing pinning > them (i.e. the last one has refcount 0, the first - 2, everything else - 1). > Simply doing dput() would result in O(N^2) vs. O(N)... > > > (b) why does the "dput()" (or rather, the dentry_kill()) locking > > logic have to retain the old trylock case rather than share the parent > > locking logic? > > > > I'm assuming the answer to (b) is that we can't afford to drop the > > dentry lock in dentry_kill(), but I'd like that answer to the "Why" to > > be documented somewhere. > > We actually might be able to do it that way (rechecking ->d_count after > lock_parent()), but I would really prefer to leave that until after -final. > I want to get profiling data from that first - dput() is a much hotter path > than shrink_dcache_parent() and friends... FWIW, I've just done more or less edible splitup of stuff past #for-linus - see #experimental-dentry_kill for that. Again, I really want to get profiling data to see if that hurts dput() - it takes ->d_lock on parent before the trylock on ->i_lock and in case of ->d_lock on parent being held by somebody else it bangs on rename_lock.lock cacheline. I'd expect that to be non-issue on any loads, but we need something stronger than my gut feelings... 
BTW, lock_parent() might be better off if in the contended case it would
not bother with rename_lock and did something like this:

again:
	spin_unlock(&dentry->d_lock);
	rcu_read_lock();
	parent = ACCESS_ONCE(dentry->d_parent);
	if (parent != dentry)
		spin_lock(&parent->d_lock);
	spin_lock(&dentry->d_lock);
	if (likely(dentry->d_parent == parent)) {
		rcu_read_unlock();
		return parent;
	}
	if (parent)
		spin_unlock(&parent->d_lock);
	rcu_read_unlock();
	goto again;

It's almost certainly not worth bothering with right now, but if dput()
starts using lock_parent(), it might be worth investigating...
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Thu, May 29, 2014 at 08:10:57AM -0700, Linus Torvalds wrote:
> If so, though, that brings up two questions:
>
>  (a) do we really want to be that aggressive? Can we ever traverse
> _past_ the point we're actually trying to shrink in
> shrink_dcache_parent()?

Caller of shrink_dcache_parent() would better hold a reference to the
argument, or it might get freed right under us ;-)  So no, we can't go
past that point - the subtree root will stay busy.

The reason we want to be aggressive there is to avoid excessive
iterations - think what happens e.g. if we have a chain of N dentries,
with nothing pinning them (i.e. the last one has refcount 0, the first -
2, everything else - 1).  Simply doing dput() would result in O(N^2) vs.
O(N)...

>  (b) why does the "dput()" (or rather, the dentry_kill()) locking
> logic have to retain the old trylock case rather than share the parent
> locking logic?
>
> I'm assuming the answer to (b) is that we can't afford to drop the
> dentry lock in dentry_kill(), but I'd like that answer to the "Why" to
> be documented somewhere.

We actually might be able to do it that way (rechecking ->d_count after
lock_parent()), but I would really prefer to leave that until after
-final.  I want to get profiling data from that first - dput() is a much
hotter path than shrink_dcache_parent() and friends...
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Thu, May 29, 2014 at 6:30 AM, Al Viro wrote:
>
> Great... OK, saner splitup of that sucker (equivalent to combination of
> these two patches) is in vfs.git#for-linus.
>
> Review and testing would be very welcome.

So looking at the "dealing with the rest of shrink_dentry_list()
livelock" patch, I think the "release the parents" case is now big and
complicated enough to be split into a function of its own.

However, I have a bigger question too: the "release the parent" case
_should_ be possible to do with just a "dput()" on the parent, and the
only reason we're doing the special case is that since we are shrinking
things, we try to be more aggressive and shrink all the parents if
possible. Right?

If so, though, that brings up two questions:

 (a) do we really want to be that aggressive? Can we ever traverse
_past_ the point we're actually trying to shrink in
shrink_dcache_parent()?

 (b) why does the "dput()" (or rather, the dentry_kill()) locking
logic have to retain the old trylock case rather than share the parent
locking logic?

I'm assuming the answer to (b) is that we can't afford to drop the
dentry lock in dentry_kill(), but I'd like that answer to the "Why" to
be documented somewhere.

I don't much care what the answer to (a) is, but again, it would be good
to have that mentioned somewhere.

               Linus
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Thu, May 29, 2014 at 02:30:36PM +0100, Al Viro wrote:
> On Thu, May 29, 2014 at 02:04:39PM +0300, Mika Westerberg wrote:
>
> > With your both patches applied the problem is gone :-)
> >
> > I did 20 plug/unplugs, rebooted the machine and another 20 plug/unplugs
> > and didn't see the livelock at once.
>
> Great... OK, saner splitup of that sucker (equivalent to combination of
> these two patches) is in vfs.git#for-linus.
>
> Review and testing would be very welcome.

I'll re-test that branch first thing tomorrow morning (currently at home
and the machine where this reproduces is my work desktop).
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Thu, May 29, 2014 at 02:04:39PM +0300, Mika Westerberg wrote: > With your both patches applied the problem is gone :-) > > I did 20 plug/unplugs, rebooted the machine and another 20 plug/unplugs > and didn't see the livelock at once. Great... OK, saner splitup of that sucker (equivalent to combination of these two patches) is in vfs.git#for-linus. Review and testing would be very welcome. PS: The diff between that and aforementioned "both patches applied" is noise - dentry and parent declarations had been moved into the loop body there, while in #for-linus they are left alone. So Mika's testing results should apply to that one as well. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Thu, May 29, 2014 at 01:51:07PM +0300, Mika Westerberg wrote: > On Thu, May 29, 2014 at 06:34:44AM +0100, Al Viro wrote: > > On Thu, May 29, 2014 at 04:52:33AM +0100, Al Viro wrote: > > > On Thu, May 29, 2014 at 04:11:49AM +0100, Al Viro wrote: > > > > On Wed, May 28, 2014 at 07:39:54PM +0100, Al Viro wrote: > > > > > > > > > OK, the warnings about averting your eyes very much apply; the thing > > > > > below > > > > > definitely needs more massage before it becomes acceptable (and no, > > > > > it's > > > > > not a single commit; I'm not that insane), but it changes behaviour > > > > > in the > > > > > way described above. Could you check if the livelock persists with > > > > > it? > > > > > No trace-generating code in there, so the logs should be compact > > > > > enough... > > > > > > > > Here's an updated patch, hopefully slightly less vomit-inducing. Should > > > > give the same behaviour as the previous one... Again, it's a cumulative > > > > diff - I'm still massaging the splitup here. > > > > > > BTW, it still leaves the "proceed to parent" case in shrink_dentry_list(); > > > in theory, it's also vulnerable to the same livelock. Can be dealt pretty > > > much the same way; I'd rather leave that one for right after -final, > > > though, > > > if the already posted variant turns out to be sufficient... > > > > ... which is (presumably) dealt with the incremental I'd just sent to Linus; > > seeing what kind of dumb mistakes I'm making, I'd better call it quits for > > tonight - it's 1:30am here and I didn't have anywhere near enough sleep > > yesterday. I'd appeciate if you could test the patch immediately > > upthread (from Message-ID: <20140529031149.ge18...@zeniv.linux.org.uk>) > > and see if it helps. There's an incremental on top of it (from > > Message-ID: <20140529052621.gh18...@zeniv.linux.org.uk>) that might or > > might not be a good idea. > > Thanks for the patch. 
> > I tested patch <20140529031149.ge18...@zeniv.linux.org.uk> and it seems > to improve things. After first plug/unplug I can see similar behaviour > but after a while it recovered. I did several iterations of plug/unplug > afterwards and didn't see the livelock to trigger. > > dmesg is attached. > > I'm going to try your incremental patch now. With your both patches applied the problem is gone :-) I did 20 plug/unplugs, rebooted the machine and another 20 plug/unplugs and didn't see the livelock at once. Thanks a lot! -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Thu, May 29, 2014 at 08:10:57AM -0700, Linus Torvalds wrote:
> If so, though, that brings up two questions:
>
>  (a) do we really want to be that aggressive? Can we ever traverse
> _past_ the point we're actually trying to shrink in shrink_dcache_parent()?

Caller of shrink_dcache_parent() would better hold a reference to the argument, or it might get freed right under us ;-) So no, we can't go past that point - the subtree root will stay busy.

The reason we want to be aggressive there is to avoid excessive iterations - think what happens e.g. if we have a chain of N dentries, with nothing pinning them (i.e. the last one has refcount 0, the first - 2, everything else - 1). Simply doing dput() would result in O(N^2) vs. O(N)...

>  (b) why does the "dput()" (or rather, the dentry_kill()) locking logic
> have to retain the old trylock case rather than share the parent locking
> logic? I'm assuming the answer to (b) is that we can't afford to drop the
> dentry lock in dentry_kill(), but I'd like that answer to the "Why" to be
> documented somewhere.

We actually might be able to do it that way (rechecking ->d_count after lock_parent()), but I would really prefer to leave that until after -final. I want to get profiling data from that first - dput() is a much hotter path than shrink_dcache_parent() and friends...
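Al's O(N^2)-vs-O(N) point can be checked with a small counting model. The sketch below is hypothetical userspace code, not dcache code: the refcount layout matches the chain Al describes (pinned root with count 2, unreferenced leaf with count 0, count 1 in between), and it only counts how many "dentries" each strategy has to visit.

```c
#include <assert.h>

/* Toy model (made-up names, no real locking or freeing): a chain of N
 * dentries where index 0 is the pinned subtree root and index N-1 is
 * the unreferenced leaf. */
#define N 64

/* Plain dput() of the parent: each collection pass rescans the chain,
 * frees the one dentry whose count reached 0 and drops its parent's
 * count, so every pass yields at most one new victim -> O(N^2). */
long shrink_with_plain_dput(void)
{
	int ref[N], alive[N];
	long visited = 0;

	for (int i = 0; i < N; i++) {
		alive[i] = 1;
		ref[i] = (i == 0) ? 2 : (i == N - 1) ? 0 : 1;
	}
	for (;;) {
		int killed = 0;
		for (int i = 0; i < N; i++) {	/* one full pass */
			if (!alive[i])
				continue;
			visited++;
			if (i > 0 && ref[i] == 0) {
				alive[i] = 0;	/* free the victim... */
				ref[i - 1]--;	/* ...then dput() the parent */
				killed = 1;
			}
		}
		if (!killed)
			break;			/* nothing left with count 0 */
	}
	return visited;
}

/* Aggressive variant: after freeing a dentry, keep walking up and
 * freeing every parent whose count dropped to 0 -- one walk, O(N). */
long shrink_with_parent_walk(void)
{
	int ref[N];
	long visited = 0;

	for (int i = 0; i < N; i++)
		ref[i] = (i == 0) ? 2 : (i == N - 1) ? 0 : 1;
	for (int i = N - 1; i > 0 && ref[i] == 0; i--) {
		visited++;
		ref[i - 1]--;			/* release the parent */
	}
	return visited;
}
```

With N = 64 the aggressive walk touches N-1 dentries while the plain-dput loop touches on the order of N^2/2, which is the excessive iteration Al wants to avoid.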
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Thu, May 29, 2014 at 04:44:54PM +0100, Al Viro wrote:
> On Thu, May 29, 2014 at 08:10:57AM -0700, Linus Torvalds wrote:
> > If so, though, that brings up two questions:
> >
> >  (a) do we really want to be that aggressive? Can we ever traverse
> > _past_ the point we're actually trying to shrink in shrink_dcache_parent()?
>
> Caller of shrink_dcache_parent() would better hold a reference to the
> argument, or it might get freed right under us ;-) So no, we can't go
> past that point - the subtree root will stay busy.
>
> The reason we want to be aggressive there is to avoid excessive
> iterations - think what happens e.g. if we have a chain of N dentries,
> with nothing pinning them (i.e. the last one has refcount 0, the
> first - 2, everything else - 1). Simply doing dput() would result in
> O(N^2) vs. O(N)...
>
> >  (b) why does the "dput()" (or rather, the dentry_kill()) locking logic
> > have to retain the old trylock case rather than share the parent
> > locking logic? I'm assuming the answer to (b) is that we can't afford
> > to drop the dentry lock in dentry_kill(), but I'd like that answer to
> > the "Why" to be documented somewhere.
>
> We actually might be able to do it that way (rechecking ->d_count after
> lock_parent()), but I would really prefer to leave that until after
> -final. I want to get profiling data from that first - dput() is a much
> hotter path than shrink_dcache_parent() and friends...

FWIW, I've just done more or less edible splitup of stuff past #for-linus - see #experimental-dentry_kill for that. Again, I really want to get profiling data to see if that hurts dput() - it takes ->d_lock on the parent before the trylock on ->i_lock, and in case of ->d_lock on the parent being held by somebody else it bangs on the rename_lock.lock cacheline. I'd expect that to be a non-issue on any loads, but we need something stronger than my gut feelings...

BTW, lock_parent() might be better off if in the contended case it would not bother with rename_lock and did something like this:

again:
	spin_unlock(&dentry->d_lock);
	rcu_read_lock();
	parent = ACCESS_ONCE(dentry->d_parent);
	if (parent != dentry)
		spin_lock(&parent->d_lock);
	spin_lock(&dentry->d_lock);
	if (likely(dentry->d_parent == parent)) {
		rcu_read_unlock();
		return parent;
	}
	if (parent)
		spin_unlock(&parent->d_lock);
	rcu_read_unlock();
	goto again;

It's almost certainly not worth bothering with right now, but if dput() starts using lock_parent(), it might be worth investigating...
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Thu, May 29, 2014 at 9:23 AM, Al Viro <v...@zeniv.linux.org.uk> wrote:
>
> BTW, lock_parent() might be better off if in contended case it would
> not bother with rename_lock and did something like this:

Ack. I think that's much better.

               Linus
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Thu, May 29, 2014 at 09:29:42AM -0700, Linus Torvalds wrote:
> On Thu, May 29, 2014 at 9:23 AM, Al Viro <v...@zeniv.linux.org.uk> wrote:
> >
> > BTW, lock_parent() might be better off if in contended case it would
> > not bother with rename_lock and did something like this:
>
> Ack. I think that's much better.

Pushed to #for-linus (with dumb braino fixed - it's if (parent != dentry), not if (parent)). I'll wait with folding it back into the commit that introduces lock_parent() until we get testing results...
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Thu, May 29, 2014 at 05:53:51PM +0100, Al Viro wrote:
> Pushed to #for-linus (with dumb braino fixed - it's if (parent != dentry),
> not if (parent)). I'll wait with folding it back into the commit that
> introduces lock_parent() until we get testing results...

Grrr... Sadly, that's not good enough. Leaking rcu_read_lock() on success is trivial, but there's a more serious problem: suppose the dentries involved get moved before we get to locking what we thought was the parent. We end up taking ->d_lock on two dentries that might be nowhere near each other in the tree, with obvious nasty implications. Would be _very_ hard to reproduce ;-/

AFAICS, the following would be safe, but I'd really appreciate any extra eyes on that sucker:

static inline struct dentry *lock_parent(struct dentry *dentry)
{
	struct dentry *parent = dentry->d_parent;
	if (IS_ROOT(dentry))
		return NULL;
	if (likely(spin_trylock(&parent->d_lock)))
		return parent;
	spin_unlock(&dentry->d_lock);
	rcu_read_lock();
again:
	parent = ACCESS_ONCE(dentry->d_parent);
	spin_lock(&parent->d_lock);
	/*
	 * We can't blindly lock dentry until we are sure
	 * that we won't violate the locking order.
	 * While parent->d_lock is not enough to stabilize
	 * dentry->d_parent, it *is* enough to stabilize
	 * dentry->d_parent == parent.
	 */
	if (unlikely(parent != dentry->d_parent)) {
		spin_unlock(&parent->d_lock);
		goto again;
	}
	rcu_read_unlock();
	if (parent != dentry)
		spin_lock(&dentry->d_lock);
	else
		parent = NULL;
	return parent;
}

That variant got force-pushed in place of the previous one, again at the head of #for-linus. And I'm definitely not folding it in until it gets more review and testing.
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Thu, May 29, 2014 at 11:52 AM, Al Viro <v...@zeniv.linux.org.uk> wrote:
>
> Grrr... Sadly, that's not good enough. Leaking rcu_read_lock() on success
> is trivial, but there's more serious problem: suppose dentries involved
> get moved before we get to locking what we thought was parent. We end up
> taking ->d_lock on two dentries that might be nowhere near each other in
> the tree, with obvious nasty implications. Would be _very_ hard to
> reproduce ;-/

Yeah, I don't think you can reproduce that, but I guess renaming directories into each other (two renames needed) could trigger an ABBA deadlock by changing the topological order of dentry/parent.

I suspect there's no way in hell that tiny race will ever happen in practice, but let's not risk it. And your solution (to re-check after just taking the parent lock) seems sufficient and sane, since dentry_lock_for_move() will always take the parent lock(s) before we move a dentry.

So that looks good to me.

               Linus
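The recheck scheme being approved here can be modeled in a single-threaded sketch: read the parent pointer without locks, take that node's lock, then verify the pointer did not change in the window. Everything below is hypothetical userspace code, not the kernel's implementation; a fake "rename" hook stands in for a concurrent d_move() firing in the race window.

```c
#include <assert.h>
#include <stddef.h>

/* Toy dentry: just a parent pointer and a fake "locked" flag. */
struct dent {
	struct dent *d_parent;
	int locked;
};

struct dent root, old_parent, new_parent, child;
int rename_pending;	/* fire exactly one simulated rename */
int retries;		/* how often the recheck caught a move */

/* Models a concurrent d_move() landing between the unlocked read of
 * d_parent and the moment we take the would-be parent's lock. */
void maybe_rename(void)
{
	if (rename_pending) {
		rename_pending = 0;
		child.d_parent = &new_parent;
	}
}

/* The lock_parent()-style retry loop: lock what we read, then verify
 * the equality d_parent == parent, which the parent's lock is enough
 * to stabilize; on mismatch, drop the lock and retry. */
struct dent *lock_parent_sim(struct dent *dentry)
{
	for (;;) {
		struct dent *parent = dentry->d_parent;	/* unlocked read */

		maybe_rename();		/* the race window */
		parent->locked = 1;	/* "spin_lock(&parent->d_lock)" */
		if (parent == dentry->d_parent)
			return parent;	/* still the parent: safe */
		parent->locked = 0;	/* lost the race: unlock, retry */
		retries++;
	}
}

/* Set up the race: child starts under old_parent, one rename pending. */
struct dent *run_sim(void)
{
	old_parent.d_parent = &root;
	new_parent.d_parent = &root;
	child.d_parent = &old_parent;
	rename_pending = 1;
	retries = 0;
	return lock_parent_sim(&child);
}
```

The first iteration locks the stale parent, the recheck fails, and the second iteration ends up holding the lock on the true parent, which is exactly why locking the wrong (moved-away) dentry never escapes this loop.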
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Thu, May 29, 2014 at 12:14:51PM -0700, Linus Torvalds wrote:
> Yeah, I don't think you can reproduce that, but I guess renaming
> directories into each other (two renames needed) could trigger an ABBA
> deadlock by changing the topological order of dentry/parent.
>
> I suspect there's no way in hell that tiny race will ever happen in
> practice, but let's not risk it. And your solution (to re-check after
> just taking the parent lock) seems sufficient and sane, since
> dentry_lock_for_move() will always take the parent lock(s) before we
> move a dentry.
>
> So that looks good to me.

BTW, how serious is the problem with __lockref_is_dead(&dentry->d_lockref) with only ->d_parent->d_lock held? From my reading of lib/lockref.c it should be safe - we only do lockref_mark_dead() with ->d_parent->d_lock held, and it'll provide all the serialization and barriers we need.

If I'm right, we could get rid of DCACHE_DENTRY_KILLED completely and replace checking for it with checking for negative ->d_lockref.count. There are two places where we check for it; in shrink_dentry_list() we definitely can go that way (we are holding ->d_lock there) and it simplifies the code nicely. In d_walk(), though (in the bit that used to be try_to_ascend()) we only hold ->d_parent->d_lock. It looks like that ought to be safe to replace

	if (this_parent != child->d_parent ||
	    (child->d_flags & DCACHE_DENTRY_KILLED) ||
	    need_seqretry(&rename_lock, seq)) {

with

	if (this_parent != child->d_parent ||
	    __lockref_is_dead(&child->d_lockref) ||
	    need_seqretry(&rename_lock, seq)) {

and remove DCACHE_DENTRY_KILLED completely... The other user (in shrink_dentry_list()) simplifies to

	if (dentry->d_lockref.count != 0) {
		bool can_free = dentry->d_flags & DCACHE_MAY_FREE;
		spin_unlock(&dentry->d_lock);
		if (parent)
			spin_unlock(&parent->d_lock);
		if (can_free)
			dentry_free(dentry);
		continue;
	}

taking care of both the DCACHE_DENTRY_KILLED case and the simple lazy dget one, and that one's definitely safe and worth doing.

Would be nice if we could switch the d_walk() one as well and kill that flag off, though... Basically, we have

	spin_lock(&A);
	spin_lock(&R.lock);
	V = 1;
	lockref_mark_dead(&R);

as the only place where R goes dead and we want to replace

	spin_lock(&A);
	if (V)
		...

with

	spin_lock(&A);
	if (__lockref_is_dead(&R))
		...

Unless I'm missing something subtle in lockref.c, that should be safe... Comments?
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Thu, May 29, 2014 at 9:50 PM, Al Viro <v...@zeniv.linux.org.uk> wrote:
>
> BTW, how serious is the problem with __lockref_is_dead(&dentry->d_lockref)
> with only ->d_parent->d_lock held? From my reading of lib/lockref.c it
> should be safe - we only do lockref_mark_dead() with ->d_parent->d_lock
> held, and it'll provide all the serialization and barriers we need.
>
> If I'm right, we could get rid of DCACHE_DENTRY_KILLED completely

Yeah, I think that would be good. Except I think you should create a dentry_is_dead() helper function that then has that "if you hold the dentry or parent lock, this is safe" comment, because for lockref in general you do need to have the lock in the lockref itself. The fact that dentries have more locking is very much dentry-specific.

But with that, go wild. I'd love to get rid of some of the redundant stuff. For another example, the

	BUG_ON((int)dentry->d_lockref.count < 0);

test makes very little sense any more with lockrefs and the whole dead marker (that should make sure that it never gets incremented), but exists due to the direct conversion.

               Linus
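A minimal sketch of the kind of helper Linus is asking for, modeled in userspace. All `toy_*` names are made up; only the dead-sentinel idea (lockref_mark_dead() in lib/lockref.c sets the count to -128) and the locking comment carry over from the discussion.

```c
#include <assert.h>

/* Toy stand-in for struct lockref: only the count is modeled, since
 * that is all the "is it dead?" test needs to demonstrate. */
struct toy_lockref {
	int count;
};

/* Mirrors lockref_mark_dead(): the count goes to a negative sentinel
 * (-128 in lib/lockref.c) and may never be incremented again. */
void toy_mark_dead(struct toy_lockref *ref)
{
	ref->count = -128;
}

int toy_lockref_is_dead(const struct toy_lockref *ref)
{
	return ref->count < 0;
}

struct toy_dentry {
	struct toy_lockref d_lockref;
};

/*
 * The suggested dentry_is_dead() helper: the place where the
 * dentry-specific locking rule gets documented.  Caller must hold
 * either the dentry's own d_lock or its parent's d_lock, because
 * (in the real code) lockref_mark_dead() only ever runs with the
 * parent's d_lock held, which serializes the transition.
 */
int toy_dentry_is_dead(const struct toy_dentry *dentry)
{
	return toy_lockref_is_dead(&dentry->d_lockref);
}
```

Once such a helper exists, both the d_walk() check and the shrink_dentry_list() check can call it instead of testing DCACHE_DENTRY_KILLED, which is what lets the flag go away.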
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Thu, May 29, 2014 at 04:52:33AM +0100, Al Viro wrote:
> On Thu, May 29, 2014 at 04:11:49AM +0100, Al Viro wrote:
> > On Wed, May 28, 2014 at 07:39:54PM +0100, Al Viro wrote:
> > >
> > > OK, the warnings about averting your eyes very much apply; the thing below
> > > definitely needs more massage before it becomes acceptable (and no, it's
> > > not a single commit; I'm not that insane), but it changes behaviour in the
> > > way described above. Could you check if the livelock persists with it?
> > > No trace-generating code in there, so the logs should be compact enough...
> >
> > Here's an updated patch, hopefully slightly less vomit-inducing. Should
> > give the same behaviour as the previous one... Again, it's a cumulative
> > diff - I'm still massaging the splitup here.
>
> BTW, it still leaves the "proceed to parent" case in shrink_dentry_list();
> in theory, it's also vulnerable to the same livelock. Can be dealt pretty
> much the same way; I'd rather leave that one for right after -final, though,
> if the already posted variant turns out to be sufficient...

... which is (presumably) dealt with the incremental I'd just sent to Linus; seeing what kind of dumb mistakes I'm making, I'd better call it quits for tonight - it's 1:30am here and I didn't have anywhere near enough sleep yesterday. I'd appreciate it if you could test the patch immediately upthread (from Message-ID: <20140529031149.ge18...@zeniv.linux.org.uk>) and see if it helps. There's an incremental on top of it (from Message-ID: <20140529052621.gh18...@zeniv.linux.org.uk>) that might or might not be a good idea.

I'm crawling to bed right now; back in ~7 hours...
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Thu, May 29, 2014 at 06:16:47AM +0100, Al Viro wrote:
> On Wed, May 28, 2014 at 09:21:00PM -0700, Linus Torvalds wrote:
> > On Wed, May 28, 2014 at 8:11 PM, Al Viro wrote:
> > >
> > > Here's an updated patch, hopefully slightly less vomit-inducing.
> >
> > Hmm. Less vomit-inducing, except for this part:
> > Ugh, that just *screams* for a helper function. Something like
> >
> > 	parent = get_parent_and_lock(dentry);
> >
> > or whatever, with that trylock/renamelock dance separated out. The
> > rule would be that it would lock the "dentry", and return the
> > (similarly locked) parent. Or NULL for a root dentry, of course.
> >
> > Please?
>
> Already done in my current tree; see below for the next commit in there...
> I can separate that helper from the rest (dealing with the same livelock
> for dentry_kill(dentry, 1) caller in shrink_dentry_list()). All of that
> is very much subject to reordering, resplitting, writing more decent
> commit messages, etc.

Except that it's from the wrong branch ;-/ Correct one follows:

commit fb860957449c07c6f1a1dd911e83a635b6f11f21
Author: Al Viro
Date: Thu May 29 00:37:49 2014 -0400

    deal with "put the parent" side of shrink_dentry_list()

diff --git a/fs/dcache.c b/fs/dcache.c
index bce851dc03ef..be2bea834bf4 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -793,30 +793,34 @@ restart:
 }
 EXPORT_SYMBOL(d_prune_aliases);
 
-static void shrink_dentry_list(struct list_head *list)
+static inline struct dentry *lock_parent(struct dentry *dentry)
 {
-	struct dentry *dentry, *parent;
+	struct dentry *parent = dentry->d_parent;
+	if (IS_ROOT(dentry))
+		return NULL;
+	if (unlikely(!spin_trylock(&parent->d_lock))) {
+		spin_unlock(&dentry->d_lock);
+		read_seqlock_excl(&rename_lock);
+		parent = NULL;
+		if (!IS_ROOT(dentry)) {
+			parent = dentry->d_parent;
+			spin_lock(&parent->d_lock);
+		}
+		read_sequnlock_excl(&rename_lock);
+		spin_lock(&dentry->d_lock);
+	}
+	return parent;
+}
 
+static void shrink_dentry_list(struct list_head *list)
+{
 	while (!list_empty(list)) {
 		struct inode *inode;
-		dentry = list_entry(list->prev, struct dentry, d_lru);
+		struct dentry *dentry = list_entry(list->prev, struct dentry, d_lru);
+		struct dentry *parent;
 
-		parent = NULL;
 		spin_lock(&dentry->d_lock);
-		if (!IS_ROOT(dentry)) {
-			parent = dentry->d_parent;
-			if (unlikely(!spin_trylock(&parent->d_lock))) {
-				spin_unlock(&dentry->d_lock);
-				parent = NULL;
-				read_seqlock_excl(&rename_lock);
-				if (!IS_ROOT(dentry)) {
-					parent = dentry->d_parent;
-					spin_lock(&parent->d_lock);
-				}
-				read_sequnlock_excl(&rename_lock);
-				spin_lock(&dentry->d_lock);
-			}
-		}
+		parent = lock_parent(dentry);
 
 		/*
 		 * The dispose list is isolated and dentries are not accounted
@@ -864,8 +868,26 @@ static void shrink_dentry_list(struct list_head *list)
 		 * fragmentation.
 		 */
 		dentry = parent;
-		while (dentry && !lockref_put_or_lock(&dentry->d_lockref))
-			dentry = dentry_kill(dentry);
+		while (dentry && !lockref_put_or_lock(&dentry->d_lockref)) {
+			parent = lock_parent(dentry);
+			if (dentry->d_lockref.count != 1) {
+				dentry->d_lockref.count--;
+				spin_unlock(&dentry->d_lock);
+				if (parent)
+					spin_unlock(&parent->d_lock);
+				break;
+			}
+			inode = dentry->d_inode;	/* can't be NULL */
+			if (unlikely(!spin_trylock(&inode->i_lock))) {
+				spin_unlock(&dentry->d_lock);
+				if (parent)
+					spin_unlock(&parent->d_lock);
+				cpu_relax();
+				continue;
+			}
+			__dentry_kill(dentry);
+			dentry = parent;
+		}
 	}
 }
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Wed, May 28, 2014 at 09:21:00PM -0700, Linus Torvalds wrote:
> On Wed, May 28, 2014 at 8:11 PM, Al Viro wrote:
> >
> > Here's an updated patch, hopefully slightly less vomit-inducing.
>
> Hmm. Less vomit-inducing, except for this part:
> Ugh, that just *screams* for a helper function. Something like
>
> 	parent = get_parent_and_lock(dentry);
>
> or whatever, with that trylock/renamelock dance separated out. The
> rule would be that it would lock the "dentry", and return the
> (similarly locked) parent. Or NULL for a root dentry, of course.
>
> Please?

Already done in my current tree; see below for the next commit in there... I can separate that helper from the rest (dealing with the same livelock for dentry_kill(dentry, 1) caller in shrink_dentry_list()). All of that is very much subject to reordering, resplitting, writing more decent commit messages, etc.

commit 67e2554f6ca62e20363139f6f2968063410f0f3b
Author: Al Viro
Date: Wed May 28 23:53:36 2014 -0400

    deal with "put the parent" side of shrink_dentry_list()

    Signed-off-by: Al Viro

diff --git a/fs/dcache.c b/fs/dcache.c
index bce851dc03ef..be2bea834bf4 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -793,30 +793,31 @@ restart:
 }
 EXPORT_SYMBOL(d_prune_aliases);
 
-static void shrink_dentry_list(struct list_head *list)
+static inline struct dentry *lock_parent(struct dentry *dentry)
 {
-	struct dentry *dentry, *parent;
-
-	while (!list_empty(list)) {
-		struct inode *inode;
-		dentry = list_entry(list->prev, struct dentry, d_lru);
-
+	struct dentry *parent = dentry->d_parent;
+	if (IS_ROOT(dentry))
+		return NULL;
+	if (unlikely(!spin_trylock(&parent->d_lock))) {
+		spin_unlock(&dentry->d_lock);
+		read_seqlock_excl(&rename_lock);
 		parent = NULL;
-		spin_lock(&dentry->d_lock);
 		if (!IS_ROOT(dentry)) {
 			parent = dentry->d_parent;
-			if (unlikely(!spin_trylock(&parent->d_lock))) {
-				spin_unlock(&dentry->d_lock);
-				parent = NULL;
-				read_seqlock_excl(&rename_lock);
-				if (!IS_ROOT(dentry)) {
-					parent = dentry->d_parent;
-					spin_lock(&parent->d_lock);
-				}
+			spin_lock(&parent->d_lock);
 		}
+		read_sequnlock_excl(&rename_lock);
+		spin_lock(&dentry->d_lock);
+	}
+	return parent;
+}
+
+static void shrink_dentry_list(struct list_head *list)
+{
+	while (!list_empty(list)) {
+		struct inode *inode;
+		struct dentry *dentry = list_entry(list->prev, struct dentry, d_lru);
+		struct dentry *parent = lock_parent(dentry);
 
 		/*
 		 * The dispose list is isolated and dentries are not accounted
@@ -864,8 +865,26 @@ static void shrink_dentry_list(struct list_head *list)
 		 * fragmentation.
 		 */
 		dentry = parent;
-		while (dentry && !lockref_put_or_lock(&dentry->d_lockref))
-			dentry = dentry_kill(dentry);
+		while (dentry && !lockref_put_or_lock(&dentry->d_lockref)) {
+			parent = lock_parent(dentry);
+			if (dentry->d_lockref.count != 1) {
+				dentry->d_lockref.count--;
+				spin_unlock(&dentry->d_lock);
+				if (parent)
+					spin_unlock(&parent->d_lock);
+				break;
+			}
+			inode = dentry->d_inode;	/* can't be NULL */
+			if (unlikely(!spin_trylock(&inode->i_lock))) {
+				spin_unlock(&dentry->d_lock);
+				if (parent)
+					spin_unlock(&parent->d_lock);
+				cpu_relax();
+				continue;
+			}
+			__dentry_kill(dentry);
+			dentry = parent;
+		}
 	}
 }
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Wed, May 28, 2014 at 8:11 PM, Al Viro wrote:
>
> Here's an updated patch, hopefully slightly less vomit-inducing.

Hmm. Less vomit-inducing, except for this part:

>                 dentry = list_entry(list->prev, struct dentry, d_lru);
> +
> +                parent = NULL;
>                 spin_lock(&dentry->d_lock);
> +                if (!IS_ROOT(dentry)) {
> +                        parent = dentry->d_parent;
> +                        if (unlikely(!spin_trylock(&parent->d_lock))) {
> +                                spin_unlock(&dentry->d_lock);
> +                                parent = NULL;
> +                                read_seqlock_excl(&rename_lock);
> +                                if (!IS_ROOT(dentry)) {
> +                                        parent = dentry->d_parent;
> +                                        spin_lock(&parent->d_lock);
> +                                }
> +                                read_sequnlock_excl(&rename_lock);
> +                                spin_lock(&dentry->d_lock);
> +                        }
> +                }

Ugh, that just *screams* for a helper function. Something like

        parent = get_parent_and_lock(dentry);

or whatever, with that trylock/renamelock dance separated out. The rule
would be that it would lock the "dentry", and return the (similarly
locked) parent. Or NULL for a root dentry, of course.

Please?

                Linus
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Thu, May 29, 2014 at 04:11:49AM +0100, Al Viro wrote: > On Wed, May 28, 2014 at 07:39:54PM +0100, Al Viro wrote: > > > OK, the warnings about averting your eyes very much apply; the thing below > > definitely needs more massage before it becomes acceptable (and no, it's > > not a single commit; I'm not that insane), but it changes behaviour in the > > way described above. Could you check if the livelock persists with it? > > No trace-generating code in there, so the logs should be compact enough... > > Here's an updated patch, hopefully slightly less vomit-inducing. Should > give the same behaviour as the previous one... Again, it's a cumulative > diff - I'm still massaging the splitup here. BTW, it still leaves the "proceed to parent" case in shrink_dentry_list(); in theory, it's also vulnerable to the same livelock. Can be dealt pretty much the same way; I'd rather leave that one for right after -final, though, if the already posted variant turns out to be sufficient... -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Wed, May 28, 2014 at 07:39:54PM +0100, Al Viro wrote:
> OK, the warnings about averting your eyes very much apply; the thing below
> definitely needs more massage before it becomes acceptable (and no, it's
> not a single commit; I'm not that insane), but it changes behaviour in the
> way described above. Could you check if the livelock persists with it?
> No trace-generating code in there, so the logs should be compact enough...

Here's an updated patch, hopefully slightly less vomit-inducing. Should
give the same behaviour as the previous one... Again, it's a cumulative
diff - I'm still massaging the splitup here.

BTW, there's an interesting potential gotcha for ->d_delete() instances -
it *is* possible to have it called several times (returning true every
time it's called) for the same dentry. Suppose dput() is called and races
with d_find_alias(). We get the following sequence:

CPU1: dput sees d_count about to reach 0
CPU1: dput grabs ->d_lock
CPU1: dput calls ->d_delete() and gets told that it shouldn't survive
CPU2: d_find_alias() grabs ->i_lock
CPU1: dput calls dentry_kill()
CPU1: dentry_kill() tries to grab ->i_lock and fails
CPU2: d_find_alias() calls __d_find_alias()
CPU1: dentry_kill() drops ->d_lock and returns the same dentry
CPU2: __d_find_alias() finds our dentry, grabs ->d_lock and bumps ->d_count
CPU1: dput() finds ->d_count greater than 1, decrements it and returns
CPU2: eventually calls dput(), leading to another call of ->d_delete().

Moral: don't assume that ->d_delete() returning 1 means that the dentry
won't survive; it's not a good place to destroy any data structures, etc.
Fortunately, we have only 11 instances in the tree and only one has code
making that assumption, the code in question being ifdefed out (lustre ;-/)
Out-of-tree folks might easily step into that trap, so it's probably worth
documenting - the comment in ll_ddelete() seems to indicate that its
authors had only noticed the problem with the locking environment, not
that it is the wrong place for such code in the first place...

diff --git a/fs/dcache.c b/fs/dcache.c
index 42ae01e..6c2a92e 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -441,42 +441,12 @@ void d_drop(struct dentry *dentry)
 }
 EXPORT_SYMBOL(d_drop);
 
-/*
- * Finish off a dentry we've decided to kill.
- * dentry->d_lock must be held, returns with it unlocked.
- * If ref is non-zero, then decrement the refcount too.
- * Returns dentry requiring refcount drop, or NULL if we're done.
- */
-static struct dentry *
-dentry_kill(struct dentry *dentry, int unlock_on_failure)
-        __releases(dentry->d_lock)
+static void __dentry_kill(struct dentry *dentry)
 {
-        struct inode *inode;
         struct dentry *parent = NULL;
         bool can_free = true;
-
-        if (unlikely(dentry->d_flags & DCACHE_DENTRY_KILLED)) {
-                can_free = dentry->d_flags & DCACHE_MAY_FREE;
-                spin_unlock(&dentry->d_lock);
-                goto out;
-        }
-
-        inode = dentry->d_inode;
-        if (inode && !spin_trylock(&inode->i_lock)) {
-relock:
-                if (unlock_on_failure) {
-                        spin_unlock(&dentry->d_lock);
-                        cpu_relax();
-                }
-                return dentry; /* try again with same dentry */
-        }
         if (!IS_ROOT(dentry))
                 parent = dentry->d_parent;
-        if (parent && !spin_trylock(&parent->d_lock)) {
-                if (inode)
-                        spin_unlock(&inode->i_lock);
-                goto relock;
-        }
 
         /*
          * The dentry is now unrecoverably dead to the world.
@@ -520,10 +490,41 @@ relock:
                 can_free = false;
         }
         spin_unlock(&dentry->d_lock);
-out:
         if (likely(can_free))
                 dentry_free(dentry);
+}
+
+/*
+ * Finish off a dentry we've decided to kill.
+ * dentry->d_lock must be held, returns with it unlocked.
+ * If ref is non-zero, then decrement the refcount too.
+ * Returns dentry requiring refcount drop, or NULL if we're done.
+ */
+static struct dentry *dentry_kill(struct dentry *dentry)
+        __releases(dentry->d_lock)
+{
+        struct inode *inode = dentry->d_inode;
+        struct dentry *parent = NULL;
+
+        if (inode && unlikely(!spin_trylock(&inode->i_lock)))
+                goto failed;
+
+        if (!IS_ROOT(dentry)) {
+                parent = dentry->d_parent;
+                if (unlikely(!spin_trylock(&parent->d_lock))) {
+                        if (inode)
+                                spin_unlock(&inode->i_lock);
+                        goto failed;
+                }
+        }
+
+        __dentry_kill(dentry);
+        return parent;
+
+failed:
+        spin_unlock(&dentry->d_lock);
+        cpu_relax();
+        return dentry; /* try again with same dentry */
 }
 
 /*
@@ -579,7 +580,7 @@ repeat:
         return;
 
 kill_it:
-        dentry = dentry_kill(dentry, 1);
+        dentry = dentry_kill(dentry);
         if (dentry)
                 goto repeat;
 }
@@ -797,8 +798,26 @@ static void shrink_dentry_list(struct
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Wed, May 28, 2014 at 02:11:51PM -0700, Linus Torvalds wrote: > > d_walk() covers its arse with ->d_lock (and it wants the starting point > > to be pinned, obviously). So AFAICS RCU is not a problem. > > It's not RCU itself, it's that DCACHE_MAY_FREE bit. Yes, d_walk() gets > ->d_lock, but dentry_kill() drops d_lock in the middle, and the whole > "dentry_kill() can get called multiple times for the same dentry" just > makes me go "Hmm". It is *not* obvious what happens the second vs > third time that gets called. If it can get called two times, why not > three times? dentry_kill(dentry, 1) can only be called when ->d_count had been positive up to the point of caller grabbing ->d_lock. And it either does lockref_mark_dead() before dropping ->d_lock or it fails trylock and buggers off having done nothing, including changes of ->d_count. In case of dentry_kill(dentry, 0), ->d_lock has been held since the moment dentry had been found on shrink list. It either does nothing and dentry gets returned to the same shrink list without ever dropping ->d_lock (or changing refcount) or it does lockref_mark_dead() before dropping ->d_lock. In the latter case dentry will *not* be put on any shrink lists again. So we have at most one call of the first kind getting past the trylocks and at most one call of the second kind doing the same. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Wed, May 28, 2014 at 1:14 PM, Al Viro wrote: > > Unless I'm badly misreading your patch, you are calling dentry_kill() > with rcu_read_lock() held. And that can trigger all sorts of interesting > things, starting with iput() and tons of disk IO... Yes, right you are. As to my worry: > d_walk() covers its arse with ->d_lock (and it wants the starting point > to be pinned, obviously). So AFAICS RCU is not a problem. It's not RCU itself, it's that DCACHE_MAY_FREE bit. Yes, d_walk() gets ->d_lock, but dentry_kill() drops d_lock in the middle, and the whole "dentry_kill() can get called multiple times for the same dentry" just makes me go "Hmm". It is *not* obvious what happens the second vs third time that gets called. If it can get called two times, why not three times? And if it can get called three times, why can't _two_ of them set "can_free" to true? That is absolutely *not* obvious to me. The aim of my patch (and I agree that sadly holding the rcu lock is not viable) was to make that "if it gets called twice" case _much_ more obvious. Because any but the first time will be very obviously a no-op (apart from taking and releasing the lock, which is ok for a rcu-delayed data structure). That's very much not the case with DCACHE_MAY_FREE. The behavior of that things os very much non-obvious, because different callers into dentry_kill() all do different things. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Wed, May 28, 2014 at 01:02:23PM -0700, Linus Torvalds wrote: > Anyway, one reason I looked at this is that most of the threads in > Mika's NMI watchdog traces were stuck on _raw_spin_lock() in the whole > d_walk() thing, and I had a *really* hard time convincing myself that > this was all safe without the RCU lock. I'm wondering if Mika perhaps > has CONFIG_PREEMPT_RCU set, which means that spinlocks (or the > rename_lock sequence lock) do not end up being RCU-safe points. d_walk() covers its arse with ->d_lock (and it wants the starting point to be pinned, obviously). So AFAICS RCU is not a problem. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Wed, May 28, 2014 at 12:43:24PM -0700, Linus Torvalds wrote: > It does require that the dentry shrinking code always hold the RCU > lock for reading, because others may actually be doing the final > dput() while the thing is on the shrinking list (and holding the RCU > lock is what protects the entry from actually being free'd). > > NOTE! I don't claim that this fixes anything, but I do think that it > makes that whole cross-thread complexity of that DCACHE_MAY_FREE go > away. I think it's easier to understand, and it removes code in the > process. Comments? Unless I'm badly misreading your patch, you are calling dentry_kill() with rcu_read_lock() held. And that can trigger all sorts of interesting things, starting with iput() and tons of disk IO... What am I missing here? -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Wed, May 28, 2014 at 12:43 PM, Linus Torvalds wrote:
> - the shrinker list logic depends on the actual freeing of the dentry
>   to be delayed until the RCU grace period (already true for RCU-lookup
>   dentries)

Side note: I just unconditionally removed the immediate __d_free() case,
but it could as well have become

-        if (!(dentry->d_flags & DCACHE_RCUACCESS))
+        if (!(dentry->d_flags & (DCACHE_RCUACCESS|DCACHE_SHRINK_LIST)))
                 __d_free(&dentry->d_u.d_rcu);
         else
                 call_rcu(&dentry->d_u.d_rcu, __d_free);

instead.

Anyway, one reason I looked at this is that most of the threads in Mika's
NMI watchdog traces were stuck on _raw_spin_lock() in the whole d_walk()
thing, and I had a *really* hard time convincing myself that this was all
safe without the RCU lock. I'm wondering if Mika perhaps has
CONFIG_PREEMPT_RCU set, which means that spinlocks (or the rename_lock
sequence lock) do not end up being RCU-safe points.

And once I started worrying about that, the whole "ok, who calls
dentry_kill() when shrinking races with dput()" question started just
making me worry about that whole DCACHE_MAY_FREE thing. Because one of
the things that Miklos/Al's series of patches did was to drop some of the
RCU locking, since it wasn't "necessary" any more as far as the lru list
itself was concerned..

                Linus
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Wed, May 28, 2014 at 11:39 AM, Al Viro wrote:
>
> OK, the warnings about averting your eyes very much apply; the thing below
> definitely needs more massage before it becomes acceptable

I've been looking at this too, and I have to say, I absolutely hate your
DCACHE_MAY_FREE logic. It makes it really hard to follow what the heck is
happening across threads.

So just to understand the code, how about this (UNTESTED!) patch? It gets
rid of the DCACHE_MAY_FREE logic entirely, and makes the rules imho much
more straightforward:

 - whoever calls "dentry_kill()" first is the one that frees things.

 - dentry_kill() does *not* touch the shrink list at all. In fact,
   *nothing* touches the shrink list, except for the shrinker.

 - the shrinkers remove entries from their own lists

 - the shrinker list logic depends on the actual freeing of the dentry
   to be delayed until the RCU grace period (already true for RCU-lookup
   dentries)

In other words, that whole "may-free" thing goes away, the whole
shrink-list locking issues go away, there are no subtle rules. Nobody else
ever touches the shrink-list entries than the entity walking the
shrink-lists. Once DCACHE_SHRINK_LIST is set, nobody else is supposed to
touch the dentry.

It does require that the dentry shrinking code always hold the RCU lock
for reading, because others may actually be doing the final dput() while
the thing is on the shrinking list (and holding the RCU lock is what
protects the entry from actually being free'd).

NOTE! I don't claim that this fixes anything, but I do think that it makes
that whole cross-thread complexity of that DCACHE_MAY_FREE go away. I
think it's easier to understand, and it removes code in the process.

Comments?

                Linus

 fs/dcache.c            | 32 ++++++++++++--------------------
 include/linux/dcache.h |  2 --
 2 files changed, 14 insertions(+), 20 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 42ae01eefc07..6821479a378e 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -248,11 +248,7 @@ static void __d_free(struct rcu_head *head)
 
 static void dentry_free(struct dentry *dentry)
 {
-        /* if dentry was never visible to RCU, immediate free is OK */
-        if (!(dentry->d_flags & DCACHE_RCUACCESS))
-                __d_free(&dentry->d_u.d_rcu);
-        else
-                call_rcu(&dentry->d_u.d_rcu, __d_free);
+        call_rcu(&dentry->d_u.d_rcu, __d_free);
 }
 
 /**
@@ -453,12 +449,11 @@ dentry_kill(struct dentry *dentry, int unlock_on_failure)
 {
         struct inode *inode;
         struct dentry *parent = NULL;
-        bool can_free = true;
 
-        if (unlikely(dentry->d_flags & DCACHE_DENTRY_KILLED)) {
-                can_free = dentry->d_flags & DCACHE_MAY_FREE;
+        /* Did somebody else already free it? */
+        if (__lockref_is_dead(&dentry->d_lockref)) {
                 spin_unlock(&dentry->d_lock);
-                goto out;
+                return NULL;
         }
 
         inode = dentry->d_inode;
@@ -514,15 +509,7 @@ relock:
         if (dentry->d_op && dentry->d_op->d_release)
                 dentry->d_op->d_release(dentry);
 
-        spin_lock(&dentry->d_lock);
-        if (dentry->d_flags & DCACHE_SHRINK_LIST) {
-                dentry->d_flags |= DCACHE_MAY_FREE;
-                can_free = false;
-        }
-        spin_unlock(&dentry->d_lock);
-out:
-        if (likely(can_free))
-                dentry_free(dentry);
+        dentry_free(dentry);
         return parent;
 }
 
@@ -921,9 +908,11 @@ long prune_dcache_sb(struct super_block *sb, unsigned long nr_to_scan,
         LIST_HEAD(dispose);
         long freed;
 
+        rcu_read_lock();
         freed = list_lru_walk_node(&sb->s_dentry_lru, nid, dentry_lru_isolate,
                                    &dispose, &nr_to_scan);
         shrink_dentry_list(&dispose);
+        rcu_read_unlock();
         return freed;
 }
 
@@ -962,11 +951,13 @@ void shrink_dcache_sb(struct super_block *sb)
         do {
                 LIST_HEAD(dispose);
 
+                rcu_read_lock();
                 freed = list_lru_walk(&sb->s_dentry_lru,
                         dentry_lru_isolate_shrink, &dispose, UINT_MAX);
 
                 this_cpu_sub(nr_dentry_unused, freed);
                 shrink_dentry_list(&dispose);
+                rcu_read_unlock();
         } while (freed > 0);
 }
 EXPORT_SYMBOL(shrink_dcache_sb);
@@ -1230,13 +1221,16 @@ void shrink_dcache_parent(struct dentry *parent)
                 data.start = parent;
                 data.found = 0;
 
+                rcu_read_lock();
                 d_walk(parent, &data, select_collect, NULL);
                 if (!data.found)
                         break;
 
                 shrink_dentry_list(&data.dispose);
+                rcu_read_unlock();
                 cond_resched();
         }
+        rcu_read_unlock();
 }
 EXPORT_SYMBOL(shrink_dcache_parent);
 
@@ -1338,11 +1332,13 @@ int check_submounts_and_drop(struct dentry *dentry)
                 data.start = dentry;
                 data.found = 0;
 
+                rcu_read_lock();
                 d_walk(dentry, &data, check_and_collect, check_and_drop);
                 ret = data.found;
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Wed, May 28, 2014 at 03:19:37PM +0100, Al Viro wrote:
> OK, it's not ->i_lock, it's ->d_lock on parent being grabbed after that on
> child, while d_walk() keeps taking them in opposite order. Hmm...
>
> In principle we could do the following:
>         * split dentry_kill() into the part that is taking locks and
> the rest of it.
>         * in case of trylock failure have shrink_dentry_list() do
> read_seqlock_excl(&rename_lock) (which will stabilize ->d_parent) and
> take ->d_lock in the right order, drop rename_lock and call __dentry_kill().
>
> AFAICS, that would kill the livelock for good. We still have ->i_lock
> trylock failures to deal with, but those are less of a problem - d_walk()
> won't step on ->i_lock at all. I'm going to grab a couple of hours of sleep
> and try to put together something along those lines...

OK, the warnings about averting your eyes very much apply; the thing below
definitely needs more massage before it becomes acceptable (and no, it's
not a single commit; I'm not that insane), but it changes behaviour in the
way described above. Could you check if the livelock persists with it?
No trace-generating code in there, so the logs should be compact enough...

diff --git a/fs/dcache.c b/fs/dcache.c
index 42ae01e..ed0cc62 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -441,42 +441,12 @@ void d_drop(struct dentry *dentry)
 }
 EXPORT_SYMBOL(d_drop);
 
-/*
- * Finish off a dentry we've decided to kill.
- * dentry->d_lock must be held, returns with it unlocked.
- * If ref is non-zero, then decrement the refcount too.
- * Returns dentry requiring refcount drop, or NULL if we're done.
- */
-static struct dentry *
-dentry_kill(struct dentry *dentry, int unlock_on_failure)
-        __releases(dentry->d_lock)
+static void __dentry_kill(struct dentry *dentry)
 {
-        struct inode *inode;
         struct dentry *parent = NULL;
         bool can_free = true;
-
-        if (unlikely(dentry->d_flags & DCACHE_DENTRY_KILLED)) {
-                can_free = dentry->d_flags & DCACHE_MAY_FREE;
-                spin_unlock(&dentry->d_lock);
-                goto out;
-        }
-
-        inode = dentry->d_inode;
-        if (inode && !spin_trylock(&inode->i_lock)) {
-relock:
-                if (unlock_on_failure) {
-                        spin_unlock(&dentry->d_lock);
-                        cpu_relax();
-                }
-                return dentry; /* try again with same dentry */
-        }
         if (!IS_ROOT(dentry))
                 parent = dentry->d_parent;
-        if (parent && !spin_trylock(&parent->d_lock)) {
-                if (inode)
-                        spin_unlock(&inode->i_lock);
-                goto relock;
-        }
 
         /*
          * The dentry is now unrecoverably dead to the world.
@@ -520,10 +490,44 @@ relock:
                 can_free = false;
         }
         spin_unlock(&dentry->d_lock);
-out:
         if (likely(can_free))
                 dentry_free(dentry);
+}
+
+/*
+ * Finish off a dentry we've decided to kill.
+ * dentry->d_lock must be held, returns with it unlocked.
+ * If ref is non-zero, then decrement the refcount too.
+ * Returns dentry requiring refcount drop, or NULL if we're done.
+ */
+static struct dentry *
+dentry_kill(struct dentry *dentry, int unlock_on_failure)
+        __releases(dentry->d_lock)
+{
+        struct inode *inode = dentry->d_inode;
+        struct dentry *parent = NULL;
+
+        if (inode && unlikely(!spin_trylock(&inode->i_lock)))
+                goto failed;
+
+        if (!IS_ROOT(dentry)) {
+                parent = dentry->d_parent;
+                if (unlikely(!spin_trylock(&parent->d_lock))) {
+                        if (inode)
+                                spin_unlock(&inode->i_lock);
+                        goto failed;
+                }
+        }
+
+        __dentry_kill(dentry);
+        return parent;
+
+failed:
+        if (unlock_on_failure) {
+                spin_unlock(&dentry->d_lock);
+                cpu_relax();
+        }
+        return dentry; /* try again with same dentry */
 }
 
 /*
@@ -797,6 +801,7 @@ static void shrink_dentry_list(struct list_head *list)
         struct dentry *dentry, *parent;
 
         while (!list_empty(list)) {
+                struct inode *inode;
                 dentry = list_entry(list->prev, struct dentry, d_lru);
                 spin_lock(&dentry->d_lock);
                 /*
@@ -815,23 +820,52 @@ static void shrink_dentry_list(struct list_head *list)
                         continue;
                 }
 
-                parent = dentry_kill(dentry, 0);
-                /*
-                 * If dentry_kill returns NULL, we have nothing more to do.
-                 */
-                if (!parent)
+                if (unlikely(dentry->d_flags & DCACHE_DENTRY_KILLED)) {
+                        bool can_free = dentry->d_flags & DCACHE_MAY_FREE;
+                        spin_unlock(&dentry->d_lock);
+                        if (can_free)
+                                dentry_free(dentry);
                         continue;
+                }
 
-                if (unlikely(parent == dentry)) {
-
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Wed, May 28, 2014 at 04:11:36PM +0300, Mika Westerberg wrote:
> I added this on top of your previous patch and unfortunately the livelock
> is still there :-(
>
> From the previous log, it looks like udev is doing exactly the same
> operation (check_submounts_and_drop) on 4 of the CPUs in parallel:

4 threads trying to resolve some pathnames in that subtree for some
reason. All of them hitting ->d_revalidate() on the spot - basically, the
first invalidated directory...

OK, it's not ->i_lock, it's ->d_lock on parent being grabbed after that on
child, while d_walk() keeps taking them in opposite order. Hmm...

In principle we could do the following:
        * split dentry_kill() into the part that is taking locks and
the rest of it.
        * in case of trylock failure have shrink_dentry_list() do
read_seqlock_excl(&rename_lock) (which will stabilize ->d_parent) and
take ->d_lock in the right order, drop rename_lock and call __dentry_kill().

AFAICS, that would kill the livelock for good. We still have ->i_lock
trylock failures to deal with, but those are less of a problem - d_walk()
won't step on ->i_lock at all. I'm going to grab a couple of hours of
sleep and try to put together something along those lines...
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Wed, May 28, 2014 at 12:57:01PM +0100, Al Viro wrote: > On Wed, May 28, 2014 at 10:37:51AM +0300, Mika Westerberg wrote: > > > I sent you the whole log privately so that I don't spam everyone. > > > > Summary is here: > > > > May 28 10:24:23 lahna kernel: scsi 14:0:0:0: Direct-Access JetFlash > > Transcend 4GB8.07 PQ: 0 ANSI: 4 > > May 28 10:24:23 lahna kernel: sd 14:0:0:0: Attached scsi generic sg4 type 0 > > May 28 10:24:23 lahna kernel: sd 14:0:0:0: [sdc] 7864320 512-byte logical > > blocks: (4.02 GB/3.75 GiB) > > May 28 10:24:23 lahna kernel: sd 14:0:0:0: [sdc] Write Protect is off > > May 28 10:24:23 lahna kernel: sd 14:0:0:0: [sdc] Mode Sense: 23 00 00 00 > > May 28 10:24:23 lahna kernel: sd 14:0:0:0: [sdc] Write cache: disabled, > > read cache: enabled, doesn't support DPO or FUA > > May 28 10:24:23 lahna kernel: sdc: sdc1 > > May 28 10:24:23 lahna kernel: sd 14:0:0:0: [sdc] Attached SCSI removable > > disk > > > > Here I detached the USB stick: > > > > May 28 10:24:32 lahna kernel: usb 3-10.4: USB disconnect, device number 6 > > May 28 10:24:32 lahna kernel: check_submounts_and_drop[/dev/block/8:33]; > > CPU 4 PID 576 [systemd-udevd] > > May 28 10:24:32 lahna kernel: dput[/dev/block/8:33]; CPU 4 PID 576 > > [systemd-udevd] > > May 28 10:24:32 lahna kernel: > > check_submounts_and_drop[3-10/3-10.4/3-10.4:1.0/host14]; CPU 1 PID 1683 > > [systemd-udevd] > > May 28 10:24:32 lahna kernel: > > shrink[3-10.4:1.0/host14/target14:0:0/subsystem]; CPU 1 PID 1683 > > [systemd-udevd] > > May 28 10:24:32 lahna kernel: shrink[host14/target14:0:0/14:0:0:0/rev]; CPU > > 1 PID 1683 [systemd-udevd] > > May 28 10:24:32 lahna kernel: shrink[host14/target14:0:0/14:0:0:0/vendor]; > > CPU 1 PID 1683 [systemd-udevd] > > May 28 10:24:32 lahna kernel: shrink[host14/target14:0:0/14:0:0:0/model]; > > CPU 1 PID 1683 [systemd-udevd] > > May 28 10:24:32 lahna kernel: check_submounts_and_drop[/dev/block/8:32]; > > CPU 4 PID 576 [systemd-udevd] > > ... 
> > May 28 10:24:32 lahna kernel: check_submounts_and_drop[0000:00:14.0/usb3/3-10/3-10.4]; CPU 6 PID 1684 [systemd-udevd]
> > May 28 10:24:32 lahna kernel: check_submounts_and_drop[0000:00:14.0/usb3/3-10/3-10.4]; CPU 7 PID 1685 [systemd-udevd]
> > May 28 10:24:32 lahna kernel: shrink[usb3/3-10/3-10.4/ltm_capable]; CPU 7 PID 1685 [systemd-udevd]
> > May 28 10:24:32 lahna kernel: A
> ...
>
> Hmm... As it is, we have d_walk() trying _very_ hard in those situations.
> Could you add that on top of the previous and see if livelock changes?
>
> diff --git a/fs/dcache.c b/fs/dcache.c
> index 42ae01e..4ce58d3 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -1209,7 +1209,7 @@ static enum d_walk_ret select_collect(void *_data, struct dentry *dentry)
>           * ensures forward progress). We'll be coming back to find
>           * the rest.
>           */
> -        if (!list_empty(&data->dispose))
> +        if (data->found)
>                  ret = need_resched() ? D_WALK_QUIT : D_WALK_NORETRY;
>  out:
>          return ret;

I added this on top of your previous patch and unfortunately the livelock
is still there :-(

From the previous log, it looks like udev is doing exactly the same
operation (check_submounts_and_drop) on 4 of the CPUs in parallel:

May 28 10:24:32 lahna kernel: check_submounts_and_drop[0000:00:14.0/usb3/3-10/3-10.4]; CPU 6 PID 1684 [systemd-udevd]
May 28 10:24:32 lahna kernel: check_submounts_and_drop[0000:00:14.0/usb3/3-10/3-10.4]; CPU 7 PID 1685 [systemd-udevd]
May 28 10:24:32 lahna kernel: check_submounts_and_drop[0000:00:14.0/usb3/3-10/3-10.4]; CPU 1 PID 1683 [systemd-udevd]
May 28 10:24:32 lahna kernel: check_submounts_and_drop[0000:00:14.0/usb3/3-10/3-10.4]; CPU 0 PID 1686 [systemd-udevd]

I wonder why is that.
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Wed, May 28, 2014 at 10:37:51AM +0300, Mika Westerberg wrote: > I sent you the whole log privately so that I don't spam everyone. > > Summary is here: > > May 28 10:24:23 lahna kernel: scsi 14:0:0:0: Direct-Access JetFlash > Transcend 4GB8.07 PQ: 0 ANSI: 4 > May 28 10:24:23 lahna kernel: sd 14:0:0:0: Attached scsi generic sg4 type 0 > May 28 10:24:23 lahna kernel: sd 14:0:0:0: [sdc] 7864320 512-byte logical > blocks: (4.02 GB/3.75 GiB) > May 28 10:24:23 lahna kernel: sd 14:0:0:0: [sdc] Write Protect is off > May 28 10:24:23 lahna kernel: sd 14:0:0:0: [sdc] Mode Sense: 23 00 00 00 > May 28 10:24:23 lahna kernel: sd 14:0:0:0: [sdc] Write cache: disabled, read > cache: enabled, doesn't support DPO or FUA > May 28 10:24:23 lahna kernel: sdc: sdc1 > May 28 10:24:23 lahna kernel: sd 14:0:0:0: [sdc] Attached SCSI removable disk > > Here I detached the USB stick: > > May 28 10:24:32 lahna kernel: usb 3-10.4: USB disconnect, device number 6 > May 28 10:24:32 lahna kernel: check_submounts_and_drop[/dev/block/8:33]; CPU > 4 PID 576 [systemd-udevd] > May 28 10:24:32 lahna kernel: dput[/dev/block/8:33]; CPU 4 PID 576 > [systemd-udevd] > May 28 10:24:32 lahna kernel: > check_submounts_and_drop[3-10/3-10.4/3-10.4:1.0/host14]; CPU 1 PID 1683 > [systemd-udevd] > May 28 10:24:32 lahna kernel: > shrink[3-10.4:1.0/host14/target14:0:0/subsystem]; CPU 1 PID 1683 > [systemd-udevd] > May 28 10:24:32 lahna kernel: shrink[host14/target14:0:0/14:0:0:0/rev]; CPU 1 > PID 1683 [systemd-udevd] > May 28 10:24:32 lahna kernel: shrink[host14/target14:0:0/14:0:0:0/vendor]; > CPU 1 PID 1683 [systemd-udevd] > May 28 10:24:32 lahna kernel: shrink[host14/target14:0:0/14:0:0:0/model]; CPU > 1 PID 1683 [systemd-udevd] > May 28 10:24:32 lahna kernel: check_submounts_and_drop[/dev/block/8:32]; CPU > 4 PID 576 [systemd-udevd] > ... 
> May 28 10:24:32 lahna kernel: check_submounts_and_drop[0000:00:14.0/usb3/3-10/3-10.4]; CPU 6 PID 1684 [systemd-udevd]
> May 28 10:24:32 lahna kernel: check_submounts_and_drop[0000:00:14.0/usb3/3-10/3-10.4]; CPU 7 PID 1685 [systemd-udevd]
> May 28 10:24:32 lahna kernel: shrink[usb3/3-10/3-10.4/ltm_capable]; CPU 7 PID 1685 [systemd-udevd]
> May 28 10:24:32 lahna kernel: A
...

Hmm... As it is, we have d_walk() trying _very_ hard in those situations.
Could you add that on top of the previous and see if livelock changes?

diff --git a/fs/dcache.c b/fs/dcache.c
index 42ae01e..4ce58d3 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1209,7 +1209,7 @@ static enum d_walk_ret select_collect(void *_data, struct dentry *dentry)
          * ensures forward progress). We'll be coming back to find
          * the rest.
          */
-        if (!list_empty(&data->dispose))
+        if (data->found)
                 ret = need_resched() ? D_WALK_QUIT : D_WALK_NORETRY;
 out:
         return ret;
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Wed, May 28, 2014 at 04:19:55AM +0100, Al Viro wrote: > On Tue, May 27, 2014 at 10:04:09AM +0300, Mika Westerberg wrote: > > On Tue, May 27, 2014 at 05:00:26AM +0100, Al Viro wrote: > > > On Tue, May 27, 2014 at 04:14:15AM +0100, Al Viro wrote: > > > > > > > As the matter of fact, let's try this instead - retry the same sucker > > > > immediately in case if trylocks fail. Comments? > > > > > > Better yet, let's take "move back to shrink list" into dentry_kill() > > > itself. Then we get consistent locking rules for dentry_kill() and > > > instead of unlock_on_failure we simply pass it NULL or the shrink > > > list to put the sucker back. Mika, could you test this one and see > > > if it fixes that livelock? The difference in behaviour is that in > > > case of trylock failure we hit that sucker again without letting > > > it ride all the way around the list, same as we do for other dentry_kill() > > > callers. > > > > I tried this patch and unfortunately it still results the same sort of > > livelock. I've attached the dmesg. > > > > I also tried the serialization patch from Linus and it seemed to fix the > > problem. After several rounds of USB memory stick plug/unplug I haven't > > seen a single "soft lockup" warning in dmesg. > > > > I'm able to reproduce the problem pretty easily, so if you have > > something else to try I'm more than happy to give it a try. > > Could you try this and post the resulting log? I'd really like to understand > what's going on there - are we really hitting trylock failures there and what > dentries are involved. I sent you the whole log privately so that I don't spam everyone. 
Summary is here:

May 28 10:24:23 lahna kernel: scsi 14:0:0:0: Direct-Access JetFlash Transcend 4GB8.07 PQ: 0 ANSI: 4
May 28 10:24:23 lahna kernel: sd 14:0:0:0: Attached scsi generic sg4 type 0
May 28 10:24:23 lahna kernel: sd 14:0:0:0: [sdc] 7864320 512-byte logical blocks: (4.02 GB/3.75 GiB)
May 28 10:24:23 lahna kernel: sd 14:0:0:0: [sdc] Write Protect is off
May 28 10:24:23 lahna kernel: sd 14:0:0:0: [sdc] Mode Sense: 23 00 00 00
May 28 10:24:23 lahna kernel: sd 14:0:0:0: [sdc] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
May 28 10:24:23 lahna kernel: sdc: sdc1
May 28 10:24:23 lahna kernel: sd 14:0:0:0: [sdc] Attached SCSI removable disk

Here I detached the USB stick:

May 28 10:24:32 lahna kernel: usb 3-10.4: USB disconnect, device number 6
May 28 10:24:32 lahna kernel: check_submounts_and_drop[/dev/block/8:33]; CPU 4 PID 576 [systemd-udevd]
May 28 10:24:32 lahna kernel: dput[/dev/block/8:33]; CPU 4 PID 576 [systemd-udevd]
May 28 10:24:32 lahna kernel: check_submounts_and_drop[3-10/3-10.4/3-10.4:1.0/host14]; CPU 1 PID 1683 [systemd-udevd]
May 28 10:24:32 lahna kernel: shrink[3-10.4:1.0/host14/target14:0:0/subsystem]; CPU 1 PID 1683 [systemd-udevd]
May 28 10:24:32 lahna kernel: shrink[host14/target14:0:0/14:0:0:0/rev]; CPU 1 PID 1683 [systemd-udevd]
May 28 10:24:32 lahna kernel: shrink[host14/target14:0:0/14:0:0:0/vendor]; CPU 1 PID 1683 [systemd-udevd]
May 28 10:24:32 lahna kernel: shrink[host14/target14:0:0/14:0:0:0/model]; CPU 1 PID 1683 [systemd-udevd]
May 28 10:24:32 lahna kernel: check_submounts_and_drop[/dev/block/8:32]; CPU 4 PID 576 [systemd-udevd]
...
May 28 10:24:32 lahna kernel: check_submounts_and_drop[:00:14.0/usb3/3-10/3-10.4]; CPU 6 PID 1684 [systemd-udevd]
May 28 10:24:32 lahna kernel: check_submounts_and_drop[:00:14.0/usb3/3-10/3-10.4]; CPU 7 PID 1685 [systemd-udevd]
May 28 10:24:32 lahna kernel: shrink[usb3/3-10/3-10.4/ltm_capable]; CPU 7 PID 1685 [systemd-udevd]
May 28 10:24:32 lahna kernel: A
May 28 10:24:32 lahna kernel: shrink[usb3/3-10/3-10.4/ltm_capable]; CPU 7 PID 1685 [systemd-udevd]
May 28 10:24:32 lahna kernel: A
May 28 10:24:32 lahna kernel: shrink[usb3/3-10/3-10.4/ltm_capable]; CPU 7 PID 1685 [systemd-udevd]
May 28 10:24:32 lahna kernel: A
May 28 10:24:32 lahna kernel: shrink[usb3/3-10/3-10.4/ltm_capable]; CPU 7 PID 1685 [systemd-udevd]
May 28 10:24:32 lahna kernel: A
May 28 10:24:32 lahna kernel: shrink[usb3/3-10/3-10.4/ltm_capable]; CPU 7 PID 1685 [systemd-udevd]
May 28 10:24:32 lahna kernel: A
May 28 10:24:32 lahna kernel: shrink[usb3/3-10/3-10.4/ltm_capable]; CPU 7 PID 1685 [systemd-udevd]
May 28 10:24:32 lahna kernel: A
May 28 10:24:32 lahna kernel: shrink[usb3/3-10/3-10.4/ltm_capable]; CPU 7 PID 1685 [systemd-udevd]
May 28 10:24:32 lahna kernel: A
May 28 10:24:32 lahna kernel: shrink[usb3/3-10/3-10.4/ltm_capable]; CPU 7 PID 1685 [systemd-udevd]
May 28 10:24:32 lahna kernel: A
May 28 10:24:32 lahna kernel: shrink[usb3/3-10/3-10.4/ltm_capable]; CPU 7 PID 1685 [systemd-udevd]
May 28 10:24:32 lahna kernel: A

This goes on and on with a variable amount of 'A's printed.
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Wed, May 28, 2014 at 12:43:24PM -0700, Linus Torvalds wrote:

> It does require that the dentry shrinking code always hold the RCU lock
> for reading, because others may actually be doing the final dput() while
> the thing is on the shrinking list (and holding the RCU lock is what
> protects the entry from actually being free'd).
>
> NOTE! I don't claim that this fixes anything, but I do think that it
> makes that whole cross-thread complexity of that DCACHE_MAY_FREE go
> away. I think it's easier to understand, and it removes code in the
> process. Comments?

Unless I'm badly misreading your patch, you are calling dentry_kill() with rcu_read_lock() held. And that can trigger all sorts of interesting things, starting with iput() and tons of disk IO... What am I missing here?
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Wed, May 28, 2014 at 01:02:23PM -0700, Linus Torvalds wrote:

> Anyway, one reason I looked at this is that most of the threads in
> Mika's NMI watchdog traces were stuck on _raw_spin_lock() in the whole
> d_walk() thing, and I had a *really* hard time convincing myself that
> this was all safe without the RCU lock. I'm wondering if Mika perhaps
> has CONFIG_PREEMPT_RCU set, which means that spinlocks (or the
> rename_lock sequence lock) do not end up being RCU-safe points.

d_walk() covers its arse with ->d_lock (and it wants the starting point to be pinned, obviously). So AFAICS RCU is not a problem.
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Wed, May 28, 2014 at 1:14 PM, Al Viro <v...@zeniv.linux.org.uk> wrote:

> Unless I'm badly misreading your patch, you are calling dentry_kill()
> with rcu_read_lock() held. And that can trigger all sorts of interesting
> things, starting with iput() and tons of disk IO...

Yes, right you are.

As to my worry:

> d_walk() covers its arse with ->d_lock (and it wants the starting point
> to be pinned, obviously). So AFAICS RCU is not a problem.

It's not RCU itself, it's that DCACHE_MAY_FREE bit. Yes, d_walk() gets ->d_lock, but dentry_kill() drops d_lock in the middle, and the whole "dentry_kill() can get called multiple times for the same dentry" just makes me go "Hmm". It is *not* obvious what happens the second vs third time that gets called. If it can get called two times, why not three times? And if it can get called three times, why can't _two_ of them set can_free to true? That is absolutely *not* obvious to me.

The aim of my patch (and I agree that sadly holding the rcu lock is not viable) was to make that "if it gets called twice" case _much_ more obvious. Because any but the first time will be very obviously a no-op (apart from taking and releasing the lock, which is ok for a rcu-delayed data structure). That's very much not the case with DCACHE_MAY_FREE. The behavior of that thing is very much non-obvious, because different callers into dentry_kill() all do different things.

Linus
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Wed, May 28, 2014 at 02:11:51PM -0700, Linus Torvalds wrote:

> > d_walk() covers its arse with ->d_lock (and it wants the starting
> > point to be pinned, obviously). So AFAICS RCU is not a problem.
>
> It's not RCU itself, it's that DCACHE_MAY_FREE bit. Yes, d_walk() gets
> ->d_lock, but dentry_kill() drops d_lock in the middle, and the whole
> "dentry_kill() can get called multiple times for the same dentry" just
> makes me go "Hmm". It is *not* obvious what happens the second vs third
> time that gets called. If it can get called two times, why not three
> times?

dentry_kill(dentry, 1) can only be called when ->d_count had been positive up to the point of the caller grabbing ->d_lock. And it either does lockref_mark_dead() before dropping ->d_lock or it fails trylock and buggers off having done nothing, including changes of ->d_count.

In case of dentry_kill(dentry, 0), ->d_lock has been held since the moment dentry had been found on the shrink list. It either does nothing and dentry gets returned to the same shrink list without ever dropping ->d_lock (or changing refcount) or it does lockref_mark_dead() before dropping ->d_lock. In the latter case dentry will *not* be put on any shrink lists again.

So we have at most one call of the first kind getting past the trylocks and at most one call of the second kind doing the same.
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Wed, May 28, 2014 at 07:39:54PM +0100, Al Viro wrote:

> OK, the warnings about averting your eyes very much apply; the thing
> below definitely needs more massage before it becomes acceptable (and
> no, it's not a single commit; I'm not that insane), but it changes
> behaviour in the way described above. Could you check if the livelock
> persists with it? No trace-generating code in there, so the logs should
> be compact enough...

Here's an updated patch, hopefully slightly less vomit-inducing. Should give the same behaviour as the previous one... Again, it's a cumulative diff - I'm still massaging the splitup here.

BTW, there's an interesting potential gotcha for ->d_delete() instances - it *is* possible to have it called several times (returning true every time it's called) for the same dentry. Suppose dput() is called and races with d_find_alias(). We get the following sequence:

CPU1: dput sees d_count about to reach 0
CPU1: dput grabs ->d_lock
CPU1: dput calls ->d_delete() and gets told that it shouldn't survive
CPU2: d_find_alias() grabs ->i_lock
CPU1: dput calls dentry_kill()
CPU1: dentry_kill() tries to grab ->i_lock and fails
CPU2: d_find_alias() calls __d_find_alias()
CPU1: dentry_kill() drops ->d_lock and returns the same dentry
CPU2: __d_find_alias() finds our dentry, grabs ->d_lock and bumps ->d_count
CPU1: dput() finds ->d_count greater than 1, decrements it and returns
CPU2: eventually calls dput(), leading to another call of ->d_delete().

Moral: don't assume that ->d_delete() returning 1 means that dentry won't survive; it's not a good place to destroy any data structures, etc. Fortunately, we have only 11 instances in the tree and only one has code making that assumption, the code in question being ifdefed out (lustre ;-/). Out-of-tree folks might easily step into that trap, so it's probably worth documenting - the comment in ll_ddelete() seems to indicate that its authors had only noticed the problem with the locking environment, not that it is the wrong place for such code in the first place...

diff --git a/fs/dcache.c b/fs/dcache.c
index 42ae01e..6c2a92e 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -441,42 +441,12 @@ void d_drop(struct dentry *dentry)
 }
 EXPORT_SYMBOL(d_drop);
 
-/*
- * Finish off a dentry we've decided to kill.
- * dentry->d_lock must be held, returns with it unlocked.
- * If ref is non-zero, then decrement the refcount too.
- * Returns dentry requiring refcount drop, or NULL if we're done.
- */
-static struct dentry *
-dentry_kill(struct dentry *dentry, int unlock_on_failure)
-	__releases(dentry->d_lock)
+static void __dentry_kill(struct dentry *dentry)
 {
-	struct inode *inode;
 	struct dentry *parent = NULL;
 	bool can_free = true;
-
-	if (unlikely(dentry->d_flags & DCACHE_DENTRY_KILLED)) {
-		can_free = dentry->d_flags & DCACHE_MAY_FREE;
-		spin_unlock(&dentry->d_lock);
-		goto out;
-	}
-
-	inode = dentry->d_inode;
-	if (inode && !spin_trylock(&inode->i_lock)) {
-relock:
-		if (unlock_on_failure) {
-			spin_unlock(&dentry->d_lock);
-			cpu_relax();
-		}
-		return dentry; /* try again with same dentry */
-	}
 	if (!IS_ROOT(dentry))
 		parent = dentry->d_parent;
-	if (parent && !spin_trylock(&parent->d_lock)) {
-		if (inode)
-			spin_unlock(&inode->i_lock);
-		goto relock;
-	}
 
 	/*
 	 * The dentry is now unrecoverably dead to the world.
@@ -520,10 +490,41 @@ relock:
 		can_free = false;
 	}
 	spin_unlock(&dentry->d_lock);
-out:
 	if (likely(can_free))
 		dentry_free(dentry);
+}
+
+/*
+ * Finish off a dentry we've decided to kill.
+ * dentry->d_lock must be held, returns with it unlocked.
+ * If ref is non-zero, then decrement the refcount too.
+ * Returns dentry requiring refcount drop, or NULL if we're done.
+ */
+static struct dentry *dentry_kill(struct dentry *dentry)
+	__releases(dentry->d_lock)
+{
+	struct inode *inode = dentry->d_inode;
+	struct dentry *parent = NULL;
+
+	if (inode && unlikely(!spin_trylock(&inode->i_lock)))
+		goto failed;
+
+	if (!IS_ROOT(dentry)) {
+		parent = dentry->d_parent;
+		if (unlikely(!spin_trylock(&parent->d_lock))) {
+			if (inode)
+				spin_unlock(&inode->i_lock);
+			goto failed;
+		}
+	}
+
+	__dentry_kill(dentry);
 	return parent;
+
+failed:
+	spin_unlock(&dentry->d_lock);
+	cpu_relax();
+	return dentry; /* try again with same dentry */
 }
 
 /*
@@ -579,7 +580,7 @@ repeat:
 	return;
 
 kill_it:
-	dentry = dentry_kill(dentry, 1);
+	dentry = dentry_kill(dentry);
 	if (dentry)
 		goto repeat;
 }
@@ -797,8 +798,26 @@ static void
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Thu, May 29, 2014 at 04:11:49AM +0100, Al Viro wrote:

> On Wed, May 28, 2014 at 07:39:54PM +0100, Al Viro wrote:
> > OK, the warnings about averting your eyes very much apply; the thing
> > below definitely needs more massage before it becomes acceptable (and
> > no, it's not a single commit; I'm not that insane), but it changes
> > behaviour in the way described above. Could you check if the livelock
> > persists with it? No trace-generating code in there, so the logs
> > should be compact enough...
>
> Here's an updated patch, hopefully slightly less vomit-inducing. Should
> give the same behaviour as the previous one... Again, it's a cumulative
> diff - I'm still massaging the splitup here.

BTW, it still leaves the "proceed to parent" case in shrink_dentry_list(); in theory, it's also vulnerable to the same livelock. Can be dealt with pretty much the same way; I'd rather leave that one for right after -final, though, if the already posted variant turns out to be sufficient...
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Wed, May 28, 2014 at 8:11 PM, Al Viro <v...@zeniv.linux.org.uk> wrote:

> Here's an updated patch, hopefully slightly less vomit-inducing.

Hmm. Less vomit-inducing, except for this part:

 		dentry = list_entry(list->prev, struct dentry, d_lru);
+
+		parent = NULL;
 		spin_lock(&dentry->d_lock);
+		if (!IS_ROOT(dentry)) {
+			parent = dentry->d_parent;
+			if (unlikely(!spin_trylock(&parent->d_lock))) {
+				spin_unlock(&dentry->d_lock);
+				parent = NULL;
+				read_seqlock_excl(&rename_lock);
+				if (!IS_ROOT(dentry)) {
+					parent = dentry->d_parent;
+					spin_lock(&parent->d_lock);
+				}
+				read_sequnlock_excl(&rename_lock);
+				spin_lock(&dentry->d_lock);
+			}
+		}

Ugh, that just *screams* for a helper function. Something like

	parent = get_parent_and_lock(dentry);

or whatever, with that trylock/renamelock dance separated out. The rule would be that it would lock the dentry, and return the (similarly locked) parent. Or NULL for a root dentry, of course.

Please?

Linus
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Wed, May 28, 2014 at 09:21:00PM -0700, Linus Torvalds wrote:

> On Wed, May 28, 2014 at 8:11 PM, Al Viro <v...@zeniv.linux.org.uk> wrote:
> > Here's an updated patch, hopefully slightly less vomit-inducing.
>
> Hmm. Less vomit-inducing, except for this part:
>
> Ugh, that just *screams* for a helper function. Something like
>
> 	parent = get_parent_and_lock(dentry);
>
> or whatever, with that trylock/renamelock dance separated out. The rule
> would be that it would lock the dentry, and return the (similarly
> locked) parent. Or NULL for a root dentry, of course.
>
> Please?

Already done in my current tree; see below for the next commit in there... I can separate that helper from the rest (dealing with the same livelock for the dentry_kill(dentry, 1) caller in shrink_dentry_list()). All of that is very much subject to reordering, resplitting, writing more decent commit messages, etc.

commit 67e2554f6ca62e20363139f6f2968063410f0f3b
Author: Al Viro <v...@zeniv.linux.org.uk>
Date:   Wed May 28 23:53:36 2014 -0400

    deal with "put the parent" side of shrink_dentry_list()

    Signed-off-by: Al Viro <v...@zeniv.linux.org.uk>

diff --git a/fs/dcache.c b/fs/dcache.c
index 6c2a92e..7434169 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -793,30 +793,31 @@ restart:
 }
 EXPORT_SYMBOL(d_prune_aliases);
 
-static void shrink_dentry_list(struct list_head *list)
+static inline struct dentry *lock_parent(struct dentry *dentry)
 {
-	struct dentry *dentry, *parent;
-
-	while (!list_empty(list)) {
-		struct inode *inode;
-		dentry = list_entry(list->prev, struct dentry, d_lru);
-
+	struct dentry *parent = dentry->d_parent;
+	if (IS_ROOT(dentry))
+		return NULL;
+	if (unlikely(!spin_trylock(&parent->d_lock))) {
+		spin_unlock(&dentry->d_lock);
+		read_seqlock_excl(&rename_lock);
 		parent = NULL;
-		spin_lock(&dentry->d_lock);
 		if (!IS_ROOT(dentry)) {
 			parent = dentry->d_parent;
-			if (unlikely(!spin_trylock(&parent->d_lock))) {
-				spin_unlock(&dentry->d_lock);
-				parent = NULL;
-				read_seqlock_excl(&rename_lock);
-				if (!IS_ROOT(dentry)) {
-					parent = dentry->d_parent;
-					spin_lock(&parent->d_lock);
-				}
-				read_sequnlock_excl(&rename_lock);
-				spin_lock(&dentry->d_lock);
-			}
+			spin_lock(&parent->d_lock);
 		}
+		read_sequnlock_excl(&rename_lock);
+		spin_lock(&dentry->d_lock);
+	}
+	return parent;
+}
+
+static void shrink_dentry_list(struct list_head *list)
+{
+	while (!list_empty(list)) {
+		struct inode *inode;
+		struct dentry *dentry = list_entry(list->prev, struct dentry, d_lru);
+		struct dentry *parent = lock_parent(dentry);
 
 		/*
 		 * The dispose list is isolated and dentries are not accounted
@@ -864,8 +865,26 @@ static void shrink_dentry_list(struct list_head *list)
 		 * fragmentation.
 		 */
 		dentry = parent;
-		while (dentry && !lockref_put_or_lock(&dentry->d_lockref))
-			dentry = dentry_kill(dentry);
+		while (dentry && !lockref_put_or_lock(&dentry->d_lockref)) {
+			parent = lock_parent(dentry);
+			if (dentry->d_lockref.count != 1) {
+				dentry->d_lockref.count--;
+				spin_unlock(&dentry->d_lock);
+				if (parent)
+					spin_unlock(&parent->d_lock);
+				break;
+			}
+			inode = dentry->d_inode;	/* can't be NULL */
+			if (unlikely(!spin_trylock(&inode->i_lock))) {
+				spin_unlock(&dentry->d_lock);
+				if (parent)
+					spin_unlock(&parent->d_lock);
+				cpu_relax();
+				continue;
+			}
+			__dentry_kill(dentry);
+			dentry = parent;
+		}
 	}
 }
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Thu, May 29, 2014 at 06:16:47AM +0100, Al Viro wrote:

> On Wed, May 28, 2014 at 09:21:00PM -0700, Linus Torvalds wrote:
> > On Wed, May 28, 2014 at 8:11 PM, Al Viro <v...@zeniv.linux.org.uk> wrote:
> > > Here's an updated patch, hopefully slightly less vomit-inducing.
> >
> > Hmm. Less vomit-inducing, except for this part:
> >
> > Ugh, that just *screams* for a helper function. Something like
> >
> > 	parent = get_parent_and_lock(dentry);
> >
> > or whatever, with that trylock/renamelock dance separated out. The
> > rule would be that it would lock the dentry, and return the (similarly
> > locked) parent. Or NULL for a root dentry, of course.
> >
> > Please?
>
> Already done in my current tree; see below for the next commit in
> there... I can separate that helper from the rest (dealing with the same
> livelock for the dentry_kill(dentry, 1) caller in shrink_dentry_list()).
> All of that is very much subject to reordering, resplitting, writing
> more decent commit messages, etc.

Except that it's from the wrong branch ;-/ Correct one follows:

commit fb860957449c07c6f1a1dd911e83a635b6f11f21
Author: Al Viro <v...@zeniv.linux.org.uk>
Date:   Thu May 29 00:37:49 2014 -0400

    deal with "put the parent" side of shrink_dentry_list()

diff --git a/fs/dcache.c b/fs/dcache.c
index 6c2a92e..628a791 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -793,30 +793,34 @@ restart:
 }
 EXPORT_SYMBOL(d_prune_aliases);
 
-static void shrink_dentry_list(struct list_head *list)
+static inline struct dentry *lock_parent(struct dentry *dentry)
 {
-	struct dentry *dentry, *parent;
+	struct dentry *parent = dentry->d_parent;
+	if (IS_ROOT(dentry))
+		return NULL;
+	if (unlikely(!spin_trylock(&parent->d_lock))) {
+		spin_unlock(&dentry->d_lock);
+		read_seqlock_excl(&rename_lock);
+		parent = NULL;
+		if (!IS_ROOT(dentry)) {
+			parent = dentry->d_parent;
+			spin_lock(&parent->d_lock);
+		}
+		read_sequnlock_excl(&rename_lock);
+		spin_lock(&dentry->d_lock);
+	}
+	return parent;
+}
 
+static void shrink_dentry_list(struct list_head *list)
+{
 	while (!list_empty(list)) {
 		struct inode *inode;
-		dentry = list_entry(list->prev, struct dentry, d_lru);
+		struct dentry *dentry = list_entry(list->prev, struct dentry, d_lru);
+		struct dentry *parent;
 
-		parent = NULL;
 		spin_lock(&dentry->d_lock);
-		if (!IS_ROOT(dentry)) {
-			parent = dentry->d_parent;
-			if (unlikely(!spin_trylock(&parent->d_lock))) {
-				spin_unlock(&dentry->d_lock);
-				parent = NULL;
-				read_seqlock_excl(&rename_lock);
-				if (!IS_ROOT(dentry)) {
-					parent = dentry->d_parent;
-					spin_lock(&parent->d_lock);
-				}
-				read_sequnlock_excl(&rename_lock);
-				spin_lock(&dentry->d_lock);
-			}
-		}
+		parent = lock_parent(dentry);
 
 		/*
 		 * The dispose list is isolated and dentries are not accounted
@@ -864,8 +868,26 @@ static void shrink_dentry_list(struct list_head *list)
 		 * fragmentation.
 		 */
 		dentry = parent;
-		while (dentry && !lockref_put_or_lock(&dentry->d_lockref))
-			dentry = dentry_kill(dentry);
+		while (dentry && !lockref_put_or_lock(&dentry->d_lockref)) {
+			parent = lock_parent(dentry);
+			if (dentry->d_lockref.count != 1) {
+				dentry->d_lockref.count--;
+				spin_unlock(&dentry->d_lock);
+				if (parent)
+					spin_unlock(&parent->d_lock);
+				break;
+			}
+			inode = dentry->d_inode;	/* can't be NULL */
+			if (unlikely(!spin_trylock(&inode->i_lock))) {
+				spin_unlock(&dentry->d_lock);
+				if (parent)
+					spin_unlock(&parent->d_lock);
+				cpu_relax();
+				continue;
+			}
+			__dentry_kill(dentry);
+			dentry = parent;
+		}
 	}
 }
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Thu, May 29, 2014 at 04:52:33AM +0100, Al Viro wrote:

> On Thu, May 29, 2014 at 04:11:49AM +0100, Al Viro wrote:
> > On Wed, May 28, 2014 at 07:39:54PM +0100, Al Viro wrote:
> > > OK, the warnings about averting your eyes very much apply; the
> > > thing below definitely needs more massage before it becomes
> > > acceptable (and no, it's not a single commit; I'm not that insane),
> > > but it changes behaviour in the way described above. Could you
> > > check if the livelock persists with it? No trace-generating code in
> > > there, so the logs should be compact enough...
> >
> > Here's an updated patch, hopefully slightly less vomit-inducing.
> > Should give the same behaviour as the previous one... Again, it's a
> > cumulative diff - I'm still massaging the splitup here.
>
> BTW, it still leaves the "proceed to parent" case in
> shrink_dentry_list(); in theory, it's also vulnerable to the same
> livelock. Can be dealt with pretty much the same way; I'd rather leave
> that one for right after -final, though, if the already posted variant
> turns out to be sufficient...

... which is (presumably) dealt with by the incremental I'd just sent to Linus; seeing what kind of dumb mistakes I'm making, I'd better call it quits for tonight - it's 1:30am here and I didn't have anywhere near enough sleep yesterday.

I'd appreciate it if you could test the patch immediately upthread (from Message-ID: 20140529031149.ge18...@zeniv.linux.org.uk) and see if it helps. There's an incremental on top of it (from Message-ID: 20140529052621.gh18...@zeniv.linux.org.uk) that might or might not be a good idea. I'm crawling to bed right now; back in ~7 hours...
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Wed, May 28, 2014 at 10:37:51AM +0300, Mika Westerberg wrote:

> I sent you the whole log privately so that I don't spam everyone.
>
> Summary is here:
>
> May 28 10:24:23 lahna kernel: scsi 14:0:0:0: Direct-Access JetFlash Transcend 4GB8.07 PQ: 0 ANSI: 4
> May 28 10:24:23 lahna kernel: sd 14:0:0:0: Attached scsi generic sg4 type 0
> May 28 10:24:23 lahna kernel: sd 14:0:0:0: [sdc] 7864320 512-byte logical blocks: (4.02 GB/3.75 GiB)
> May 28 10:24:23 lahna kernel: sd 14:0:0:0: [sdc] Write Protect is off
> May 28 10:24:23 lahna kernel: sd 14:0:0:0: [sdc] Mode Sense: 23 00 00 00
> May 28 10:24:23 lahna kernel: sd 14:0:0:0: [sdc] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
> May 28 10:24:23 lahna kernel: sdc: sdc1
> May 28 10:24:23 lahna kernel: sd 14:0:0:0: [sdc] Attached SCSI removable disk
>
> Here I detached the USB stick:
>
> May 28 10:24:32 lahna kernel: usb 3-10.4: USB disconnect, device number 6
> May 28 10:24:32 lahna kernel: check_submounts_and_drop[/dev/block/8:33]; CPU 4 PID 576 [systemd-udevd]
> May 28 10:24:32 lahna kernel: dput[/dev/block/8:33]; CPU 4 PID 576 [systemd-udevd]
> May 28 10:24:32 lahna kernel: check_submounts_and_drop[3-10/3-10.4/3-10.4:1.0/host14]; CPU 1 PID 1683 [systemd-udevd]
> May 28 10:24:32 lahna kernel: shrink[3-10.4:1.0/host14/target14:0:0/subsystem]; CPU 1 PID 1683 [systemd-udevd]
> May 28 10:24:32 lahna kernel: shrink[host14/target14:0:0/14:0:0:0/rev]; CPU 1 PID 1683 [systemd-udevd]
> May 28 10:24:32 lahna kernel: shrink[host14/target14:0:0/14:0:0:0/vendor]; CPU 1 PID 1683 [systemd-udevd]
> May 28 10:24:32 lahna kernel: shrink[host14/target14:0:0/14:0:0:0/model]; CPU 1 PID 1683 [systemd-udevd]
> May 28 10:24:32 lahna kernel: check_submounts_and_drop[/dev/block/8:32]; CPU 4 PID 576 [systemd-udevd]
> ...
> May 28 10:24:32 lahna kernel: check_submounts_and_drop[:00:14.0/usb3/3-10/3-10.4]; CPU 6 PID 1684 [systemd-udevd]
> May 28 10:24:32 lahna kernel: check_submounts_and_drop[:00:14.0/usb3/3-10/3-10.4]; CPU 7 PID 1685 [systemd-udevd]
> May 28 10:24:32 lahna kernel: shrink[usb3/3-10/3-10.4/ltm_capable]; CPU 7 PID 1685 [systemd-udevd]
> May 28 10:24:32 lahna kernel: A
> ...

Hmm... As it is, we have d_walk() trying _very_ hard in those situations. Could you add that on top of the previous and see if livelock changes?

diff --git a/fs/dcache.c b/fs/dcache.c
index 42ae01e..4ce58d3 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1209,7 +1209,7 @@ static enum d_walk_ret select_collect(void *_data, struct dentry *dentry)
 	 * ensures forward progress). We'll be coming back to find
 	 * the rest.
 	 */
-	if (!list_empty(&data->dispose))
+	if (data->found)
 		ret = need_resched() ? D_WALK_QUIT : D_WALK_NORETRY;
 out:
 	return ret;
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Wed, May 28, 2014 at 12:57:01PM +0100, Al Viro wrote:
> On Wed, May 28, 2014 at 10:37:51AM +0300, Mika Westerberg wrote:
> > I sent you the whole log privately so that I don't spam everyone.  Summary
> > is here:
> >
> > May 28 10:24:23 lahna kernel: scsi 14:0:0:0: Direct-Access     JetFlash Transcend 4GB    8.07 PQ: 0 ANSI: 4
> > May 28 10:24:23 lahna kernel: sd 14:0:0:0: Attached scsi generic sg4 type 0
> > May 28 10:24:23 lahna kernel: sd 14:0:0:0: [sdc] 7864320 512-byte logical blocks: (4.02 GB/3.75 GiB)
> > May 28 10:24:23 lahna kernel: sd 14:0:0:0: [sdc] Write Protect is off
> > May 28 10:24:23 lahna kernel: sd 14:0:0:0: [sdc] Mode Sense: 23 00 00 00
> > May 28 10:24:23 lahna kernel: sd 14:0:0:0: [sdc] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
> > May 28 10:24:23 lahna kernel: sdc: sdc1
> > May 28 10:24:23 lahna kernel: sd 14:0:0:0: [sdc] Attached SCSI removable disk
> >
> > Here I detached the USB stick:
> >
> > May 28 10:24:32 lahna kernel: usb 3-10.4: USB disconnect, device number 6
> > May 28 10:24:32 lahna kernel: check_submounts_and_drop[/dev/block/8:33]; CPU 4 PID 576 [systemd-udevd]
> > May 28 10:24:32 lahna kernel: dput[/dev/block/8:33]; CPU 4 PID 576 [systemd-udevd]
> > May 28 10:24:32 lahna kernel: check_submounts_and_drop[3-10/3-10.4/3-10.4:1.0/host14]; CPU 1 PID 1683 [systemd-udevd]
> > May 28 10:24:32 lahna kernel: shrink[3-10.4:1.0/host14/target14:0:0/subsystem]; CPU 1 PID 1683 [systemd-udevd]
> > May 28 10:24:32 lahna kernel: shrink[host14/target14:0:0/14:0:0:0/rev]; CPU 1 PID 1683 [systemd-udevd]
> > May 28 10:24:32 lahna kernel: shrink[host14/target14:0:0/14:0:0:0/vendor]; CPU 1 PID 1683 [systemd-udevd]
> > May 28 10:24:32 lahna kernel: shrink[host14/target14:0:0/14:0:0:0/model]; CPU 1 PID 1683 [systemd-udevd]
> > May 28 10:24:32 lahna kernel: check_submounts_and_drop[/dev/block/8:32]; CPU 4 PID 576 [systemd-udevd]
> > ...
> > May 28 10:24:32 lahna kernel: check_submounts_and_drop[:00:14.0/usb3/3-10/3-10.4]; CPU 6 PID 1684 [systemd-udevd]
> > May 28 10:24:32 lahna kernel: check_submounts_and_drop[:00:14.0/usb3/3-10/3-10.4]; CPU 7 PID 1685 [systemd-udevd]
> > May 28 10:24:32 lahna kernel: shrink[usb3/3-10/3-10.4/ltm_capable]; CPU 7 PID 1685 [systemd-udevd]
> > May 28 10:24:32 lahna kernel: A
> > ...
>
> Hmm...  As it is, we have d_walk() trying _very_ hard in those situations.
> Could you add that on top of the previous and see if livelock changes?
>
> diff --git a/fs/dcache.c b/fs/dcache.c
> index 42ae01e..4ce58d3 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -1209,7 +1209,7 @@ static enum d_walk_ret select_collect(void *_data, struct dentry *dentry)
>  	 * ensures forward progress). We'll be coming back to find
>  	 * the rest.
>  	 */
> -	if (!list_empty(&data->dispose))
> +	if (data->found)
>  		ret = need_resched() ? D_WALK_QUIT : D_WALK_NORETRY;
>  out:
>  	return ret;

I added this on top of your previous patch and unfortunately the livelock
is still there :-(

From the previous log, it looks like udev is doing exactly the same
operation (check_submounts_and_drop) on 4 of the CPUs in parallel:

May 28 10:24:32 lahna kernel: check_submounts_and_drop[:00:14.0/usb3/3-10/3-10.4]; CPU 6 PID 1684 [systemd-udevd]
May 28 10:24:32 lahna kernel: check_submounts_and_drop[:00:14.0/usb3/3-10/3-10.4]; CPU 7 PID 1685 [systemd-udevd]
May 28 10:24:32 lahna kernel: check_submounts_and_drop[:00:14.0/usb3/3-10/3-10.4]; CPU 1 PID 1683 [systemd-udevd]
May 28 10:24:32 lahna kernel: check_submounts_and_drop[:00:14.0/usb3/3-10/3-10.4]; CPU 0 PID 1686 [systemd-udevd]

I wonder why that is.
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Wed, May 28, 2014 at 04:11:36PM +0300, Mika Westerberg wrote:
> I added this on top of your previous patch and unfortunately the livelock
> is still there :-(
>
> From the previous log, it looks like udev is doing exactly the same
> operation (check_submounts_and_drop) on 4 of the CPUs in parallel:

4 threads trying to resolve some pathnames in that subtree for some reason.
All of them hitting ->d_revalidate() on the spot - basically, the first
invalidated directory...  OK, it's not ->i_lock, it's ->d_lock on the parent
being grabbed after that on the child, while d_walk() keeps taking them in
the opposite order.  Hmm...

In principle we could do the following:
	* split dentry_kill() into the part that is taking locks and
the rest of it.
	* in case of trylock failure have shrink_dentry_list() do
read_seqlock_excl(&rename_lock) (which will stabilize ->d_parent) and
take ->d_lock in the right order, drop rename_lock and call __dentry_kill().

AFAICS, that would kill the livelock for good.  We still have ->i_lock
trylock failures to deal with, but those are less of a problem - d_walk()
won't step on ->i_lock at all.  I'm going to grab a couple of hours of
sleep and try to put together something along those lines...
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Wed, May 28, 2014 at 03:19:37PM +0100, Al Viro wrote:
> OK, it's not ->i_lock, it's ->d_lock on the parent being grabbed after
> that on the child, while d_walk() keeps taking them in the opposite
> order.  Hmm...
>
> In principle we could do the following:
> 	* split dentry_kill() into the part that is taking locks and
> the rest of it.
> 	* in case of trylock failure have shrink_dentry_list() do
> read_seqlock_excl(&rename_lock) (which will stabilize ->d_parent) and
> take ->d_lock in the right order, drop rename_lock and call
> __dentry_kill().
>
> AFAICS, that would kill the livelock for good.  We still have ->i_lock
> trylock failures to deal with, but those are less of a problem - d_walk()
> won't step on ->i_lock at all.  I'm going to grab a couple of hours of
> sleep and try to put together something along those lines...

OK, the warnings about averting your eyes very much apply; the thing below
definitely needs more massage before it becomes acceptable (and no, it's
not a single commit; I'm not that insane), but it changes behaviour in the
way described above.  Could you check if the livelock persists with it?
No trace-generating code in there, so the logs should be compact enough...

diff --git a/fs/dcache.c b/fs/dcache.c
index 42ae01e..ed0cc62 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -441,42 +441,12 @@ void d_drop(struct dentry *dentry)
 }
 EXPORT_SYMBOL(d_drop);
 
-/*
- * Finish off a dentry we've decided to kill.
- * dentry->d_lock must be held, returns with it unlocked.
- * If ref is non-zero, then decrement the refcount too.
- * Returns dentry requiring refcount drop, or NULL if we're done.
- */
-static struct dentry *
-dentry_kill(struct dentry *dentry, int unlock_on_failure)
-	__releases(dentry->d_lock)
+static void __dentry_kill(struct dentry *dentry)
 {
-	struct inode *inode;
 	struct dentry *parent = NULL;
 	bool can_free = true;
-
-	if (unlikely(dentry->d_flags & DCACHE_DENTRY_KILLED)) {
-		can_free = dentry->d_flags & DCACHE_MAY_FREE;
-		spin_unlock(&dentry->d_lock);
-		goto out;
-	}
-
-	inode = dentry->d_inode;
-	if (inode && !spin_trylock(&inode->i_lock)) {
-relock:
-		if (unlock_on_failure) {
-			spin_unlock(&dentry->d_lock);
-			cpu_relax();
-		}
-		return dentry; /* try again with same dentry */
-	}
 	if (!IS_ROOT(dentry))
 		parent = dentry->d_parent;
-	if (parent && !spin_trylock(&parent->d_lock)) {
-		if (inode)
-			spin_unlock(&inode->i_lock);
-		goto relock;
-	}
 
 	/*
 	 * The dentry is now unrecoverably dead to the world.
@@ -520,10 +490,44 @@ relock:
 		can_free = false;
 	}
 	spin_unlock(&dentry->d_lock);
-out:
 	if (likely(can_free))
 		dentry_free(dentry);
+}
+
+/*
+ * Finish off a dentry we've decided to kill.
+ * dentry->d_lock must be held, returns with it unlocked.
+ * If ref is non-zero, then decrement the refcount too.
+ * Returns dentry requiring refcount drop, or NULL if we're done.
+ */
+static struct dentry *
+dentry_kill(struct dentry *dentry, int unlock_on_failure)
+	__releases(dentry->d_lock)
+{
+	struct inode *inode = dentry->d_inode;
+	struct dentry *parent = NULL;
+
+	if (inode && unlikely(!spin_trylock(&inode->i_lock)))
+		goto failed;
+
+	if (!IS_ROOT(dentry)) {
+		parent = dentry->d_parent;
+		if (unlikely(!spin_trylock(&parent->d_lock))) {
+			if (inode)
+				spin_unlock(&inode->i_lock);
+			goto failed;
+		}
+	}
+
+	__dentry_kill(dentry);
+	return parent;
+
+failed:
+	if (unlock_on_failure) {
+		spin_unlock(&dentry->d_lock);
+		cpu_relax();
+	}
+	return dentry; /* try again with same dentry */
 }
 
 /*
@@ -797,6 +801,7 @@ static void shrink_dentry_list(struct list_head *list)
 	struct dentry *dentry, *parent;
 
 	while (!list_empty(list)) {
+		struct inode *inode;
 		dentry = list_entry(list->prev, struct dentry, d_lru);
 		spin_lock(&dentry->d_lock);
 		/*
@@ -815,23 +820,52 @@ static void shrink_dentry_list(struct list_head *list)
 			continue;
 		}
 
-		parent = dentry_kill(dentry, 0);
-		/*
-		 * If dentry_kill returns NULL, we have nothing more to do.
-		 */
-		if (!parent)
+		if (unlikely(dentry->d_flags & DCACHE_DENTRY_KILLED)) {
+			bool can_free = dentry->d_flags & DCACHE_MAY_FREE;
+			spin_unlock(&dentry->d_lock);
+			if (can_free)
+				dentry_free(dentry);
 			continue;
+		}
 
-		if
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Wed, May 28, 2014 at 11:39 AM, Al Viro <v...@zeniv.linux.org.uk> wrote:
> OK, the warnings about averting your eyes very much apply; the thing
> below definitely needs more massage before it becomes acceptable

I've been looking at this too, and I have to say, I absolutely hate your
DCACHE_MAY_FREE logic. It makes it really hard to follow what the heck is
happening across threads.

So just to understand the code, how about this (UNTESTED!) patch? It gets
rid of the DCACHE_MAY_FREE logic entirely, and makes the rules imho much
more straightforward:

 - whoever calls dentry_kill() first is the one that frees things.

 - dentry_kill() does *not* touch the shrink list at all. In fact,
   *nothing* touches the shrink list, except for the shrinker.

 - the shrinkers remove entries from their own lists

 - the shrink list logic depends on the actual freeing of the dentry to
   be delayed until the RCU grace period (already true for RCU-lookup
   dentries)

In other words, that whole may-free thing goes away, the whole shrink-list
locking issues go away, there are no subtle rules. Nobody else ever touches
the shrink-list entries than the entity walking the shrink-lists. Once
DCACHE_SHRINK_LIST is set, nobody else is supposed to touch the dentry.

It does require that the dentry shrinking code always hold the RCU lock
for reading, because others may actually be doing the final dput() while
the thing is on the shrinking list (and holding the RCU lock is what
protects the entry from actually being free'd).

NOTE! I don't claim that this fixes anything, but I do think that it makes
that whole cross-thread complexity of that DCACHE_MAY_FREE go away. I
think it's easier to understand, and it removes code in the process.

Comments?
Linus

 fs/dcache.c            | 32 ++++++++++++++------------------
 include/linux/dcache.h |  2 --
 2 files changed, 14 insertions(+), 20 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 42ae01eefc07..6821479a378e 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -248,11 +248,7 @@ static void __d_free(struct rcu_head *head)
 
 static void dentry_free(struct dentry *dentry)
 {
-	/* if dentry was never visible to RCU, immediate free is OK */
-	if (!(dentry->d_flags & DCACHE_RCUACCESS))
-		__d_free(&dentry->d_u.d_rcu);
-	else
-		call_rcu(&dentry->d_u.d_rcu, __d_free);
+	call_rcu(&dentry->d_u.d_rcu, __d_free);
 }
 
 /**
@@ -453,12 +449,11 @@ dentry_kill(struct dentry *dentry, int unlock_on_failure)
 {
 	struct inode *inode;
 	struct dentry *parent = NULL;
-	bool can_free = true;
 
-	if (unlikely(dentry->d_flags & DCACHE_DENTRY_KILLED)) {
-		can_free = dentry->d_flags & DCACHE_MAY_FREE;
+	/* Did somebody else already free it? */
+	if (__lockref_is_dead(&dentry->d_lockref)) {
 		spin_unlock(&dentry->d_lock);
-		goto out;
+		return NULL;
 	}
 
 	inode = dentry->d_inode;
@@ -514,15 +509,7 @@ relock:
 	if (dentry->d_op && dentry->d_op->d_release)
 		dentry->d_op->d_release(dentry);
 
-	spin_lock(&dentry->d_lock);
-	if (dentry->d_flags & DCACHE_SHRINK_LIST) {
-		dentry->d_flags |= DCACHE_MAY_FREE;
-		can_free = false;
-	}
-	spin_unlock(&dentry->d_lock);
-out:
-	if (likely(can_free))
-		dentry_free(dentry);
+	dentry_free(dentry);
 	return parent;
 }
 
@@ -921,9 +908,11 @@ long prune_dcache_sb(struct super_block *sb, unsigned long nr_to_scan,
 	LIST_HEAD(dispose);
 	long freed;
 
+	rcu_read_lock();
 	freed = list_lru_walk_node(&sb->s_dentry_lru, nid, dentry_lru_isolate,
				   &dispose, &nr_to_scan);
 	shrink_dentry_list(&dispose);
+	rcu_read_unlock();
 	return freed;
 }
 
@@ -962,11 +951,13 @@ void shrink_dcache_sb(struct super_block *sb)
 	do {
 		LIST_HEAD(dispose);
 
+		rcu_read_lock();
 		freed = list_lru_walk(&sb->s_dentry_lru,
			dentry_lru_isolate_shrink, &dispose, UINT_MAX);
 
 		this_cpu_sub(nr_dentry_unused, freed);
 		shrink_dentry_list(&dispose);
+		rcu_read_unlock();
 	} while (freed > 0);
 }
 EXPORT_SYMBOL(shrink_dcache_sb);
@@ -1230,13 +1221,16 @@ void shrink_dcache_parent(struct dentry *parent)
 		data.start = parent;
 		data.found = 0;
 
+		rcu_read_lock();
 		d_walk(parent, &data, select_collect, NULL);
 		if (!data.found)
 			break;
 
 		shrink_dentry_list(&data.dispose);
+		rcu_read_unlock();
 		cond_resched();
 	}
+	rcu_read_unlock();
 }
 EXPORT_SYMBOL(shrink_dcache_parent);
@@ -1338,11 +1332,13 @@ int check_submounts_and_drop(struct dentry *dentry)
 		data.start = dentry;
 		data.found = 0;
 
+		rcu_read_lock();
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Wed, May 28, 2014 at 12:43 PM, Linus Torvalds
<torva...@linux-foundation.org> wrote:
>
>  - the shrink list logic depends on the actual freeing of the dentry to
>    be delayed until the RCU grace period (already true for RCU-lookup
>    dentries)

Side note: I just unconditionally removed the immediate __d_free() case,
but it could as well have become

-	if (!(dentry->d_flags & DCACHE_RCUACCESS))
+	if (!(dentry->d_flags & (DCACHE_RCUACCESS | DCACHE_SHRINK_LIST)))
 		__d_free(&dentry->d_u.d_rcu);
 	else
 		call_rcu(&dentry->d_u.d_rcu, __d_free);

instead.

Anyway, one reason I looked at this is that most of the threads in Mika's
NMI watchdog traces were stuck on _raw_spin_lock() in the whole d_walk()
thing, and I had a *really* hard time convincing myself that this was all
safe without the RCU lock.

I'm wondering if Mika perhaps has CONFIG_PREEMPT_RCU set, which means that
spinlocks (or the rename_lock sequence lock) do not end up being RCU-safe
points.

And once I started worrying about that, the whole "ok, who calls
dentry_kill() when shrinking races with dput()" question started just
making me worry about that whole DCACHE_MAY_FREE thing. Because one of the
things that Miklos/Al's series of patches did was to drop some of the RCU
locking, since it wasn't necessary any more as far as the lru list itself
was concerned..

Linus
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Tue, May 27, 2014 at 10:04:09AM +0300, Mika Westerberg wrote:
> On Tue, May 27, 2014 at 05:00:26AM +0100, Al Viro wrote:
> > On Tue, May 27, 2014 at 04:14:15AM +0100, Al Viro wrote:
> >
> > > As the matter of fact, let's try this instead - retry the same sucker
> > > immediately in case if trylocks fail.  Comments?
> >
> > Better yet, let's take "move back to shrink list" into dentry_kill()
> > itself.  Then we get consistent locking rules for dentry_kill() and
> > instead of unlock_on_failure we simply pass it NULL or the shrink
> > list to put the sucker back.  Mika, could you test this one and see
> > if it fixes that livelock?  The difference in behaviour is that in
> > case of trylock failure we hit that sucker again without letting
> > it ride all the way around the list, same as we do for other
> > dentry_kill() callers.
>
> I tried this patch and unfortunately it still results in the same sort of
> livelock.  I've attached the dmesg.
>
> I also tried the serialization patch from Linus and it seemed to fix the
> problem.  After several rounds of USB memory stick plug/unplug I haven't
> seen a single "soft lockup" warning in dmesg.
>
> I'm able to reproduce the problem pretty easily, so if you have
> something else to try I'm more than happy to give it a try.

Could you try this and post the resulting log?  I'd really like to
understand what's going on there - are we really hitting trylock failures
there and what dentries are involved.

diff --git a/fs/dcache.c b/fs/dcache.c
index 42ae01e..75f56a6 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -38,6 +38,7 @@
 #include <linux/prefetch.h>
 #include <linux/ratelimit.h>
 #include <linux/list_lru.h>
+#include <linux/magic.h>
 #include "internal.h"
 #include "mount.h"
@@ -448,7 +449,7 @@ EXPORT_SYMBOL(d_drop);
  * Returns dentry requiring refcount drop, or NULL if we're done.
  */
 static struct dentry *
-dentry_kill(struct dentry *dentry, int unlock_on_failure)
+dentry_kill(struct dentry *dentry, struct list_head *shrink_list)
 	__releases(dentry->d_lock)
 {
 	struct inode *inode;
@@ -464,10 +465,10 @@ dentry_kill(struct dentry *dentry, int unlock_on_failure)
 	inode = dentry->d_inode;
 	if (inode && !spin_trylock(&inode->i_lock)) {
 relock:
-		if (unlock_on_failure) {
-			spin_unlock(&dentry->d_lock);
-			cpu_relax();
-		}
+		if (shrink_list)
+			d_shrink_add(dentry, shrink_list);
+		spin_unlock(&dentry->d_lock);
+		cpu_relax();
 		return dentry; /* try again with same dentry */
 	}
 	if (!IS_ROOT(dentry))
@@ -542,6 +543,14 @@ out:
  * on the compiler to always get this right (gcc generally doesn't).
  * Real recursion would eat up our stack space.
  */
+static inline void dump(const char *s, struct dentry *dentry)
+{
+	if (unlikely(dentry->d_sb->s_magic == SYSFS_MAGIC)) {
+		printk(KERN_ERR "%s[%pd4]; CPU %d PID %d [%s]\n",
+			s, dentry, smp_processor_id(),
+			task_pid_nr(current), current->comm);
+	}
+}
 
 /*
  * dput - release a dentry
@@ -579,7 +588,9 @@ repeat:
 	return;
 
 kill_it:
-	dentry = dentry_kill(dentry, 1);
+	if (dentry->d_inode)
+		dump("dput", dentry);
+	dentry = dentry_kill(dentry, NULL);
 	if (dentry)
 		goto repeat;
 }
@@ -798,6 +809,7 @@ static void shrink_dentry_list(struct list_head *list)
 
 	while (!list_empty(list)) {
 		dentry = list_entry(list->prev, struct dentry, d_lru);
+again:
 		spin_lock(&dentry->d_lock);
 		/*
 		 * The dispose list is isolated and dentries are not accounted
@@ -815,22 +827,19 @@ static void shrink_dentry_list(struct list_head *list)
 			continue;
 		}
 
-		parent = dentry_kill(dentry, 0);
+		dump("shrink", dentry);
+		parent = dentry_kill(dentry, list);
 		/*
 		 * If dentry_kill returns NULL, we have nothing more to do.
 		 */
 		if (!parent)
 			continue;
 
+		/* if trylocks have failed; just do it again */
 		if (unlikely(parent == dentry)) {
-			/*
-			 * trylocks have failed and d_lock has been held the
-			 * whole time, so it could not have been added to any
-			 * other lists. Just add it back to the shrink list.
-			 */
-			d_shrink_add(dentry, list);
-			spin_unlock(&dentry->d_lock);
-			continue;
+			if (dentry->d_sb->s_magic == SYSFS_MAGIC)
+				printk(KERN_ERR "A");
+			goto again;
 		}
 		/*
 		 * We need to prune ancestors too. This is necessary to prevent
@@ -839,8 +848,10 @@ static void
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Tue, May 27, 2014 at 04:14:15AM +0100, Al Viro wrote:
> As the matter of fact, let's try this instead - retry the same sucker
> immediately in case if trylocks fail.  Comments?

Better yet, let's take "move back to shrink list" into dentry_kill()
itself.  Then we get consistent locking rules for dentry_kill() and
instead of unlock_on_failure we simply pass it NULL or the shrink list
to put the sucker back.  Mika, could you test this one and see if it
fixes that livelock?  The difference in behaviour is that in case of
trylock failure we hit that sucker again without letting it ride all
the way around the list, same as we do for other dentry_kill() callers.

Signed-off-by: Al Viro
---
diff --git a/fs/dcache.c b/fs/dcache.c
index 42ae01e..77c95e5 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -448,7 +448,7 @@ EXPORT_SYMBOL(d_drop);
  * Returns dentry requiring refcount drop, or NULL if we're done.
  */
 static struct dentry *
-dentry_kill(struct dentry *dentry, int unlock_on_failure)
+dentry_kill(struct dentry *dentry, struct list_head *shrink_list)
 	__releases(dentry->d_lock)
 {
 	struct inode *inode;
@@ -464,10 +464,10 @@ dentry_kill(struct dentry *dentry, int unlock_on_failure)
 	inode = dentry->d_inode;
 	if (inode && !spin_trylock(&inode->i_lock)) {
 relock:
-		if (unlock_on_failure) {
-			spin_unlock(&dentry->d_lock);
-			cpu_relax();
-		}
+		if (shrink_list)
+			d_shrink_add(dentry, shrink_list);
+		spin_unlock(&dentry->d_lock);
+		cpu_relax();
 		return dentry; /* try again with same dentry */
 	}
 	if (!IS_ROOT(dentry))
@@ -579,7 +579,7 @@ repeat:
 	return;
 
 kill_it:
-	dentry = dentry_kill(dentry, 1);
+	dentry = dentry_kill(dentry, NULL);
 	if (dentry)
 		goto repeat;
 }
@@ -798,6 +798,7 @@ static void shrink_dentry_list(struct list_head *list)
 
 	while (!list_empty(list)) {
 		dentry = list_entry(list->prev, struct dentry, d_lru);
+again:
 		spin_lock(&dentry->d_lock);
 		/*
 		 * The dispose list is isolated and dentries are not accounted
@@ -815,23 +816,16 @@ static void shrink_dentry_list(struct list_head *list)
 			continue;
 		}
 
-		parent = dentry_kill(dentry, 0);
+		parent = dentry_kill(dentry, list);
 		/*
 		 * If dentry_kill returns NULL, we have nothing more to do.
 		 */
 		if (!parent)
 			continue;
 
-		if (unlikely(parent == dentry)) {
-			/*
-			 * trylocks have failed and d_lock has been held the
-			 * whole time, so it could not have been added to any
-			 * other lists. Just add it back to the shrink list.
-			 */
-			d_shrink_add(dentry, list);
-			spin_unlock(&dentry->d_lock);
-			continue;
-		}
+		/* if trylocks have failed; just do it again */
+		if (unlikely(parent == dentry))
+			goto again;
 		/*
 		 * We need to prune ancestors too. This is necessary to prevent
 		 * quadratic behavior of shrink_dcache_parent(), but is also
@@ -840,7 +834,7 @@ static void shrink_dentry_list(struct list_head *list)
 		 */
 		dentry = parent;
 		while (dentry && !lockref_put_or_lock(&dentry->d_lockref))
-			dentry = dentry_kill(dentry, 1);
+			dentry = dentry_kill(dentry, NULL);
 	}
 }
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Tue, May 27, 2014 at 02:40:54AM +0100, Al Viro wrote:
> It looks plausible, but I doubt that serializing check_submounts_and_drop()
> will suffice - shrink_dcache_parent() is just as unpleasant and it *is*
> triggered in the same situations.  Moreover, the lack of loop in memory
> shrinkers doesn't help - we might get shrink_dentry_list() from one of
> those and loops that keep calling d_walk() from check_submounts_and_drop()
> or shrink_dcache_parent()...
>
> > Anyway, I'd like Mika to test the stupid "let's serialize the dentry
> > shrinking in check_submounts_and_drop()" to see if his problem goes
> > away. I agree that it's not the _proper_ fix, but we're damn late in
> > the rc series..
>
> That we are...  FWIW, if the nastiness matches the description above,
> the right place to do something probably would be when those two
> suckers get a positive return value from d_walk() along with an empty
> shrink list.  I wonder if we should do down_read() in shrink_dentry_list()
> and down_write();up_write() in that case in shrink_dcache_parent() and
> check_submounts_and_drop().  How about the following?

As the matter of fact, let's try this instead - retry the same sucker
immediately in case if trylocks fail.  Comments?

diff --git a/fs/dcache.c b/fs/dcache.c
index 42ae01e..d58d4cc 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -798,6 +798,7 @@ static void shrink_dentry_list(struct list_head *list)
 
 	while (!list_empty(list)) {
 		dentry = list_entry(list->prev, struct dentry, d_lru);
+again:
 		spin_lock(&dentry->d_lock);
 		/*
 		 * The dispose list is isolated and dentries are not accounted
@@ -830,7 +831,8 @@ static void shrink_dentry_list(struct list_head *list)
 			 */
 			d_shrink_add(dentry, list);
 			spin_unlock(&dentry->d_lock);
-			continue;
+			cpu_relax();
+			goto again;
 		}
 		/*
 		 * We need to prune ancestors too.  This is necessary to prevent
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Mon, May 26, 2014 at 01:24:52PM -0700, Linus Torvalds wrote:
> Two things.
>
>  (1) The facts.
>
> Just check the callchains on every single CPU in Mika's original email.

Point.

>  (2) The code.
>
> Yes, the whole looping over the dentry tree happens in other places
> too, but shrink_dcache_parents() is already called under s_umount

But that one's not true.  shrink_dcache_parent() is called from all kinds
of places, and it isn't guaranteed to be serialized at all.  For example,
d_invalidate() will do it, and I wouldn't be surprised to see it called in
an environment where we see shitloads of ->d_revalidate() hitting dentries
that ought to be invalidated.  In fact, unless we have something mounted
under sysfs, those calls of check_submounts_and_drop() will be followed
by d_invalidate().

> > I really, really wonder WTF is causing that - we have spent 20-odd
> > seconds spinning while dentries in there were being evicted by
> > something.  That - on sysfs, where dentry_kill() should be non-blocking
> > and very fast.  Something very fishy is going on and I'd really like
> > to understand the use pattern we are seeing there.
>
> I think it literally is just a livelock. Just look at the NMI
> backtraces for each stuck CPU: most of them are waiting for the dentry
> lock in d_walk(). They have probably all a few dentries on their own
> list. One of the CPUs is actually _in_ shrink_dentry_list().
>
> Now, the way our ticket spinlocks work, they are actually fair: which
> means that I can easily imagine us getting into a pattern, where if
> you have the right insane starting conditions, each CPU will basically
> get their own dentry list.
>
> That said, the only way I can see that nobody ever makes any progress
> is if somebody has the inode locked, and then dentry_kill() turns into
> a no-op. Otherwise one of those threads should always kill one or more
> dentries, afaik.  We do have that "trylock on i_lock, then trylock on
> parent->d_lock", and if either of those fails, drop and re-try loop. I
> wonder if we can get into a situation where lots of people hold each
> others dentries locks sufficiently that dentry_kill() just ends up
> failing and looping..

Umm...  Let me see if I understood you correctly - you think that it's
shrink_dentry_list() cycling through a bunch of dentries, failing trylocks
on all of them due to d_walk() from other threads that keeps hitting
->d_lock on parents (->i_lock is less likely, AFAICS).  Then we move the
sucker to the end of the shrink list and try the next one, ad infinitum.
And those d_walk() callers keep looping since they keep finding those
dentries and nothing else...  Right?

It looks plausible, but I doubt that serializing check_submounts_and_drop()
will suffice - shrink_dcache_parent() is just as unpleasant and it *is*
triggered in the same situations.  Moreover, the lack of loop in memory
shrinkers doesn't help - we might get shrink_dentry_list() from one of
those and loops that keep calling d_walk() from check_submounts_and_drop()
or shrink_dcache_parent()...

> Anyway, I'd like Mika to test the stupid "let's serialize the dentry
> shrinking in check_submounts_and_drop()" to see if his problem goes
> away. I agree that it's not the _proper_ fix, but we're damn late in
> the rc series..

That we are...  FWIW, if the nastiness matches the description above, the
right place to do something probably would be when those two suckers get
a positive return value from d_walk() along with an empty shrink list.
I wonder if we should do down_read() in shrink_dentry_list() and
down_write();up_write() in that case in shrink_dcache_parent() and
check_submounts_and_drop().  How about the following?
diff --git a/fs/dcache.c b/fs/dcache.c
index 42ae01e..72f2c95 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -795,7 +795,14 @@ EXPORT_SYMBOL(d_prune_aliases);
 static void shrink_dentry_list(struct list_head *list)
 {
 	struct dentry *dentry, *parent;
+	static DECLARE_RWSEM(shrink_sem);
 
+	if (unlikely(list_empty(list))) {
+		down_write(&shrink_sem);
+		up_write(&shrink_sem);
+		return;
+	}
+	down_read(&shrink_sem);
 	while (!list_empty(list)) {
 		dentry = list_entry(list->prev, struct dentry, d_lru);
 		spin_lock(&dentry->d_lock);
@@ -842,6 +849,7 @@ static void shrink_dentry_list(struct list_head *list)
 		while (dentry && !lockref_put_or_lock(&dentry->d_lockref))
 			dentry = dentry_kill(dentry, 1);
 	}
+	up_read(&shrink_sem);
 }
 
 static enum lru_status
@@ -923,7 +931,8 @@ long prune_dcache_sb(struct super_block *sb, unsigned long nr_to_scan,
 	freed = list_lru_walk_node(&sb->s_dentry_lru, nid,
 				   dentry_lru_isolate, &dispose, &nr_to_scan);
 
-	shrink_dentry_list(&dispose);
+	if (!list_empty(&dispose))
+		shrink_dentry_list(&dispose);
 	return freed;
 }
 
@@ -966,7 +975,8 @@ void shrink_dcache_sb(struct super_block *sb)
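The retry pattern under discussion (trylock fails, the dentry goes to the tail of the shrink list, we come back to it later) can be modelled with a small user-space toy loop. Everything below is a hypothetical sketch with made-up names, not kernel code: each failed "trylock" costs one extra visit, and if contenders re-locked an entry faster than we could revisit it, the loop would simply never drain the list, which is the livelock.

```c
/* Toy model (hypothetical, not kernel code) of the shrink_dentry_list()
 * retry pattern: an entry whose trylock fails is pushed to the back of
 * the list and revisited on a later pass. */
#include <assert.h>

#define NENTRIES 4

/*
 * tries_needed[i] = how many times entry i's trylock fails before it
 * succeeds (simulating concurrent d_walk() callers holding ->d_lock).
 * Returns the total number of lock attempts made before the list
 * drains; if some entry were re-locked forever, this would never
 * return.
 */
int shrink_attempts(const int tries_needed[NENTRIES])
{
	int remaining[NENTRIES];
	int alive = NENTRIES, attempts = 0;

	for (int i = 0; i < NENTRIES; i++)
		remaining[i] = tries_needed[i];

	while (alive > 0) {
		for (int i = 0; i < NENTRIES; i++) {
			if (remaining[i] < 0)
				continue;	/* already "killed" */
			attempts++;
			if (remaining[i]-- == 0)
				alive--;	/* trylock won: entry freed */
			/* else: moved to the tail, revisit next pass */
		}
	}
	return attempts;
}
```

With no contention the list drains in one pass; every extra trylock failure per entry adds a whole extra pass over the survivors.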
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Mon, May 26, 2014 at 11:26 AM, Al Viro wrote: > On Mon, May 26, 2014 at 11:17:42AM -0700, Linus Torvalds wrote: >> On Mon, May 26, 2014 at 8:27 AM, Al Viro wrote: >> > >> > That's the livelock. OK. >> >> Hmm. Is there any reason we don't have some exclusion around >> "check_submounts_and_drop()"? >> >> That would seem to be the simplest way to avoid any livelock: just >> don't allow concurrent calls (we could make the lock per-filesystem or >> whatever). This whole case should all be for just exceptional cases >> anyway. >> >> We already sleep in that thing (well, "cond_resched()"), so taking a >> mutex should be fine. > > What makes you think that it's another check_submounts_and_drop()? Two things. (1) The facts. Just check the callchains on every single CPU in Mika's original email. It *is* another check_submounts_and_drop() (in Mika's case, always through kernfs_dop_revalidate()). It's not me "thinking" so. You can see it for yourself. This seems to be triggered by systemd-udevd doing (in every single thread): readlink(): SyS_readlink -> user_path_at_empty -> filename_lookup -> path_lookupat -> link_path_walk -> lookup_fast -> kernfs_dop_revalidate -> check_submounts_and_drop and I suspect it's the "kernfs_active()" check that basically causes that storm of revalidate failures that leads to lots of check_submounts_and_drop() cases. (2) The code. Yes, the whole looping over the dentry tree happens in other places too, but shrink_dcache_parents() is already called under s_umount, and the out-of-memory pruning isn't done in a for-loop. So if it's a livelock, check_submounts_and_drop() really is pretty special. I agree that it's not necessarily unique from a race standpoint, but it does seem to be in a class of its own when it comes to be able to *trigger* any potential livelock. In particular, the fact that kernfs can generate that sudden storm of check_submounts_and_drop() calls when something goes away. 
> I really, really wonder WTF is causing that - we have spent 20-odd > seconds spinning while dentries in there were being evicted by > something. That - on sysfs, where dentry_kill() should be non-blocking > and very fast. Something very fishy is going on and I'd really like > to understand the use pattern we are seeing there. I think it literally is just a livelock. Just look at the NMI backtraces for each stuck CPU: most of them are waiting for the dentry lock in d_walk(). They probably all have a few dentries on their own lists. One of the CPUs is actually _in_ shrink_dentry_list(). Now, the way our ticket spinlocks work, they are actually fair: which means that I can easily imagine us getting into a pattern, where if you have the right insane starting conditions, each CPU will basically get their own dentry list. That said, the only way I can see that nobody ever makes any progress is if somebody has the inode locked, and then dentry_kill() turns into a no-op. Otherwise one of those threads should always kill one or more dentries, afaik. We do have that "trylock on i_lock, then trylock on parent->d_lock", and if either of those fails, drop and re-try loop. I wonder if we can get into a situation where lots of people hold each others dentries locks sufficiently that dentry_kill() just ends up failing and looping.. Anyway, I'd like Mika to test the stupid "let's serialize the dentry shrinking in check_submounts_and_drop()" to see if his problem goes away. I agree that it's not the _proper_ fix, but we're damn late in the rc series.. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
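The fairness property of ticket spinlocks that Linus refers to can be illustrated with a minimal user-space sketch using C11 atomics. This is a toy model with hypothetical names, not the kernel's arch-specific implementation: each waiter takes the next ticket at fetch-and-add time and is served strictly in arrival order, so nobody can barge ahead, and a pathological arrival pattern can keep handing each CPU its turn in a repeating cycle.

```c
/* Toy user-space ticket lock (hypothetical sketch): service order is
 * decided by fetch-and-add, so waiters are served strictly FIFO. */
#include <assert.h>
#include <stdatomic.h>

struct ticket_lock {
	atomic_uint next;	/* next ticket to hand out */
	atomic_uint owner;	/* ticket currently allowed in */
};

/* Take the lock; returns the ticket we were served with. */
unsigned int ticket_lock_take(struct ticket_lock *lock)
{
	unsigned int me = atomic_fetch_add(&lock->next, 1);

	while (atomic_load(&lock->owner) != me)
		;	/* spin until our number comes up */
	return me;
}

void ticket_lock_release(struct ticket_lock *lock)
{
	atomic_fetch_add(&lock->owner, 1);	/* serve the next waiter */
}
```

Because the queue position is fixed the moment the ticket is taken, contending CPUs can settle into a stable round-robin, which is the kind of steady state the message above is imagining.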
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Mon, May 26, 2014 at 11:17:42AM -0700, Linus Torvalds wrote: > On Mon, May 26, 2014 at 8:27 AM, Al Viro wrote: > > > > That's the livelock. OK. > > Hmm. Is there any reason we don't have some exclusion around > "check_submounts_and_drop()"? > > That would seem to be the simplest way to avoid any livelock: just > don't allow concurrent calls (we could make the lock per-filesystem or > whatever). This whole case should all be for just exceptional cases > anyway. > > We already sleep in that thing (well, "cond_resched()"), so taking a > mutex should be fine. What makes you think that it's another check_submounts_and_drop()? And not, e.g., shrink_dcache_parent(). Or memory shrinkers. Or some twit sitting in a subdirectory and doing stat(2) in a loop, for that matter... I really, really wonder WTF is causing that - we have spent 20-odd seconds spinning while dentries in there were being evicted by something. That - on sysfs, where dentry_kill() should be non-blocking and very fast. Something very fishy is going on and I'd really like to understand the use pattern we are seeing there. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Mon, May 26, 2014 at 8:27 AM, Al Viro wrote: > > That's the livelock. OK. Hmm. Is there any reason we don't have some exclusion around "check_submounts_and_drop()"? That would seem to be the simplest way to avoid any livelock: just don't allow concurrent calls (we could make the lock per-filesystem or whatever). This whole case should all be for just exceptional cases anyway. We already sleep in that thing (well, "cond_resched()"), so taking a mutex should be fine. The attached (TOTALLY UNTESTED!) patch as an example. Avert your eyes. Mika, does this make any difference? Linus

fs/dcache.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/fs/dcache.c b/fs/dcache.c
index 42ae01eefc07..663fd04614cc 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1333,16 +1333,19 @@ int check_submounts_and_drop(struct dentry *dentry)
 	for (;;) {
 		struct select_data data;
+		static DEFINE_MUTEX(mutex);
 
 		INIT_LIST_HEAD(&data.dispose);
 		data.start = dentry;
 		data.found = 0;
 
+		mutex_lock(&mutex);
 		d_walk(dentry, &data, check_and_collect, check_and_drop);
 		ret = data.found;
 
 		if (!list_empty(&data.dispose))
 			shrink_dentry_list(&data.dispose);
+		mutex_unlock(&mutex);
 
 		if (ret <= 0)
 			break;
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Mon, May 26, 2014 at 04:27:03PM +0100, Al Viro wrote: > Do these livelocks keep happening indefinitely, once triggered? IOW, > is that a buggered state of dcache and/or kernfs, or is it a transient pileup > that happens when we invalidate a subtree there? Could you slap if (dentry->d_sb->s_magic == SYSFS_MAGIC) { printk(KERN_INFO "killing %p4d", dentry); WARN_ON(1); } in the very beginning of dentry_kill(), if (dentry->d_sb->s_magic == SYSFS_MAGIC) { printk(KERN_INFO "invalidate %p4d", dentry); WARN_ON(1); } right after the if (!dentry->d_inode) { d_drop(dentry); goto out; } in check_submounts_and_drop(), reproduce that shite and see what gets into the log between USB disconnect and soft lockup? Warning: it will produce an obscene amount of output. If it gets _really_ excessive (as in "can't even get through the boot without drowning in syslog traffic"), we could add a sysctl turning those on, but let's try and see if it's survivable in that form first... -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Mon, May 26, 2014 at 05:29:48PM +0300, Mika Westerberg wrote:
> I attached the dmesg with 'echo t > /proc/sysrq-trigger' included.
> [ 133.826957] usb 3-10.3: USB disconnect, device number 7
> [ 159.326769] BUG: soft lockup - CPU#6 stuck for 22s! [systemd-udevd:1824]
> [ 159.326809] CPU: 6 PID: 1824 Comm: systemd-udevd Tainted: G I 3.15.0-rc7 #55
> [ 159.326810] Hardware name: Gigabyte Technology Co., Ltd. Z87X-UD7 TH/Z87X-UD7 TH-CF, BIOS F4 03/18/2014
> [ 159.326812] task: 880472854a80 ti: 8804747ec000 task.ti: 8804747ec000
[snip]
> [ 159.326834] Call Trace:
> [ 159.326838]  [<ffffffff811e74e6>] dentry_kill+0x36/0x280
> [ 159.326840]  [<ffffffff811e793a>] shrink_dentry_list+0x8a/0x110
> [ 159.326842]  [<ffffffff811e81c4>] check_submounts_and_drop+0x74/0xa0
> [ 159.326845]  [<ffffffff81245c5d>] kernfs_dop_revalidate+0x5d/0xd0
> [ 159.326847]  [<ffffffff811dba4d>] lookup_fast+0x26d/0x2c0
> [ 159.326849]  [<ffffffff811dd2b5>] path_lookupat+0x155/0x780
> [ 159.326851]  [<ffffffff811dc152>] ? final_putname+0x22/0x50
> [ 159.326853]  [<ffffffff811e198f>] ? user_path_at_empty+0x5f/0x90
> [ 159.326856]  [<ffffffff811b6eb5>] ? kmem_cache_alloc+0x35/0x1f0
> [ 159.326858]  [<ffffffff811dc1cf>] ? getname_flags+0x4f/0x1a0
> [ 159.326860]  [<ffffffff811dd90b>] filename_lookup+0x2b/0xc0
> [ 159.326862]  [<ffffffff811e1984>] user_path_at_empty+0x54/0x90
> [ 159.326865]  [<ffffffff811d65ff>] ? SYSC_newstat+0x1f/0x40
> [ 159.326867]  [<ffffffff811d69ec>] SyS_readlink+0x4c/0x130
> [ 159.326870]  [<ffffffff81113c76>] ? __audit_syscall_exit+0x1f6/0x2a0
> [ 159.326872]  [<ffffffff816ade69>] system_call_fastpath+0x16/0x1b

That's the livelock. OK. But in the stack traces below a) that systemd-udevd instance is happily running in userland and b) the only traces of dentry_kill() in the stack are noise in stacks of gdbus and dbus-daemon. *grumble* I wonder if we should instrument d_shrink_add()/d_lru_shrink_move() so that they would tag dentry with pointer to current, allowing us to report something useful in select_collect()... What's really strange is that the same livelock seems to repeat at least once more; dentries involved the first time around should've been dead and buried by then. 
And AFAICS, in the log you've originally posted exactly that has happened - both times to the same process... Do these livelocks keep happening indefinitely, once triggered? IOW, is that a buggered state of dcache and/or kernfs, or is it a transient pileup that happens when we invalidate a subtree there? -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
[fsdevel and folks who'd been on d_lru corruption thread Cc'd - that's a continuation of the same mess] On Mon, May 26, 2014 at 12:37:41PM +0300, Mika Westerberg wrote: > Hi, > > After v3.15-rc4 my Fedora 20 system with mainline kernel has been suffering > from the above lockup. > > This is easy to reproduce: > > 1) Plug in USB memory stick (to xHCI port) > 2) Unplug it > > Typically only one iteration is needed and suddenly I can see > systemd-udev taking 100% CPU and eventually the whole system becomes > unusable. > > I've tried to investigate and it looks like we just spin in > shrink_dentry_list() forever. Reverting following fs/dcache.c commits > the issue goes away: > > 60942f2f235ce7b817166cdf355eed729094834d dcache: don't need rcu in > shrink_dentry_list() > 9c8c10e262e0f62cb2530f1b076de979123183dd more graceful recovery in > umount_collect() > fe91522a7ba82ca1a51b07e19954b3825e4aaa22 don't remove from shrink list in > select_collect() Which means that we very likely have a reproducer for d_lru-corrupting races in earlier kernels here. I wonder if it can be simulated under KVM... > (The first two commits themselves don't seem to be related but reverting > them is needed so that the last one can be cleanly reverted). What I really wonder is what else is going on there; it keeps finding a bunch of dentries _already_ on shrink list(s) of somebody else. And spins (with eviction of everything worthy not already on shrink lists and cond_resched() thrown in) to give whoever's trying to evict those suckers do their job. This means that we either have somebody stuck trying to evict a dentry, or that more and more dentries keep being added and evicted there. Is somebody sitting in a subdirectory of invalid one and trying to do lookups there, perhaps? But in that case we would have the same livelock in the older kernels, possibly harder to hit, but still there... 
FWIW, older kernels just went ahead, picked those already-on-shrink-list dentries and did dentry_kill(), hopefully not at the time when the owner of shrink list got around to removing the neighbor from that list. With list corruption in case it happened at just the wrong moment. I don't have Fedora anywhere outside of KVM test images, and it'll take a while to inflict it on actual hardware; in the meanwhile, could you hit alt-sysrq-t after it gets stuck and post the results? At least that would give some idea whether it's somebody stuck on trying to evict a dentry or a stream of new dentries being added and killed there. AFAICS, kernfs ->d_release() isn't blocking and final iput() there also doesn't look like it's likely to get stuck, but I'd rather have that possibility excluded... -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
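The list corruption scenario Al describes, where one party unlinks an entry from a shrink list that another party believes it owns, can be shown with a minimal doubly-linked list in user space. The helpers below are a hypothetical sketch in the style of the kernel's list.h, not the real implementation: a second, stale unlink of an already-removed node rewires a live node's pointer at an entry that has since been taken off the list (and, in the kernel case, freed).

```c
/* Minimal doubly-linked list (hypothetical user-space sketch in the
 * style of the kernel's list.h). */
#include <assert.h>

struct dnode {
	struct dnode *prev, *next;
};

void list_init(struct dnode *head)
{
	head->prev = head->next = head;
}

/* Insert n right after head. */
void list_add_front(struct dnode *head, struct dnode *n)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* Unlink n, leaving n's own pointers stale. */
void list_unlink(struct dnode *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

/* Count entries reachable forward from head. */
int list_len(const struct dnode *head)
{
	int len = 0;

	for (const struct dnode *p = head->next; p != head; p = p->next)
		len++;
	return len;
}
```

If owner A unlinks b, the list then changes underneath (say c is unlinked and freed), and owner B later replays its own unlink of b, then b's stale neighbour pointers resurrect the freed c as a live node's ->prev. That is the "just the wrong moment" failure described above.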
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Tue, May 27, 2014 at 02:40:54AM +0100, Al Viro wrote:
> It looks plausible, but I doubt that serializing check_submounts_and_drop()
> will suffice - shrink_dcache_parent() is just as unpleasant and it *is*
> triggered in the same situations.  Moreover, the lack of loop in memory
> shrinkers doesn't help - we might get shrink_dentry_list() from one of
> those and loops that keep calling d_walk() from check_submounts_and_drop()
> or shrink_dcache_parent()...
> 
> > Anyway, I'd like Mika to test the stupid "let's serialize the dentry
> > shrinking in check_submounts_and_drop()" to see if his problem goes
> > away.  I agree that it's not the _proper_ fix, but we're damn late in
> > the rc series..
> 
> That we are...  FWIW, if the nastiness matches the description above, the
> right place to do something probably would be when those two suckers get
> positive return value from d_walk() along with an empty shrink list.  I
> wonder if we should do down_read() in shrink_dentry_list() and
> down_write();up_write() in that case in shrink_dcache_parent() and
> check_submounts_and_drop().  How about the following?

As the matter of fact, let's try this instead - retry the same sucker
immediately in case if trylocks fail.  Comments?

diff --git a/fs/dcache.c b/fs/dcache.c
index 42ae01e..d58d4cc 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -798,6 +798,7 @@ static void shrink_dentry_list(struct list_head *list)
 
 	while (!list_empty(list)) {
 		dentry = list_entry(list->prev, struct dentry, d_lru);
+again:
 		spin_lock(&dentry->d_lock);
 		/*
 		 * The dispose list is isolated and dentries are not accounted
@@ -830,7 +831,8 @@ static void shrink_dentry_list(struct list_head *list)
 		 */
 		d_shrink_add(dentry, list);
 		spin_unlock(&dentry->d_lock);
-		continue;
+		cpu_relax();
+		goto again;
 	}
 	/*
 	 * We need to prune ancestors too.  This is necessary to prevent
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
On Tue, May 27, 2014 at 04:14:15AM +0100, Al Viro wrote:
> As the matter of fact, let's try this instead - retry the same sucker
> immediately in case if trylocks fail.  Comments?

Better yet, let's take "move back to shrink list" into dentry_kill()
itself.  Then we get consistent locking rules for dentry_kill() and
instead of unlock_on_failure we simply pass it NULL or the shrink list to
put the sucker back.  Mika, could you test this one and see if it fixes
that livelock?  The difference in behaviour is that in case of trylock
failure we hit that sucker again without letting it ride all the way
around the list, same as we do for other dentry_kill() callers.

Signed-off-by: Al Viro <v...@zeniv.linux.org.uk>
---
diff --git a/fs/dcache.c b/fs/dcache.c
index 42ae01e..77c95e5 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -448,7 +448,7 @@ EXPORT_SYMBOL(d_drop);
  * Returns dentry requiring refcount drop, or NULL if we're done.
  */
 static struct dentry *
-dentry_kill(struct dentry *dentry, int unlock_on_failure)
+dentry_kill(struct dentry *dentry, struct list_head *shrink_list)
 	__releases(dentry->d_lock)
 {
 	struct inode *inode;
@@ -464,10 +464,10 @@ dentry_kill(struct dentry *dentry, int unlock_on_failure)
 	inode = dentry->d_inode;
 	if (inode && !spin_trylock(&inode->i_lock)) {
 relock:
-		if (unlock_on_failure) {
-			spin_unlock(&dentry->d_lock);
-			cpu_relax();
-		}
+		if (shrink_list)
+			d_shrink_add(dentry, shrink_list);
+		spin_unlock(&dentry->d_lock);
+		cpu_relax();
 		return dentry; /* try again with same dentry */
 	}
 	if (!IS_ROOT(dentry))
@@ -579,7 +579,7 @@ repeat:
 		return;
 
 kill_it:
-	dentry = dentry_kill(dentry, 1);
+	dentry = dentry_kill(dentry, NULL);
 	if (dentry)
 		goto repeat;
 }
@@ -798,6 +798,7 @@ static void shrink_dentry_list(struct list_head *list)
 
 	while (!list_empty(list)) {
 		dentry = list_entry(list->prev, struct dentry, d_lru);
+again:
 		spin_lock(&dentry->d_lock);
 		/*
 		 * The dispose list is isolated and dentries are not accounted
@@ -815,23 +816,16 @@ static void shrink_dentry_list(struct list_head *list)
 			continue;
 		}
 
-		parent = dentry_kill(dentry, 0);
+		parent = dentry_kill(dentry, list);
 		/*
 		 * If dentry_kill returns NULL, we have nothing more to do.
 		 */
 		if (!parent)
 			continue;
 
-		if (unlikely(parent == dentry)) {
-			/*
-			 * trylocks have failed and d_lock has been held the
-			 * whole time, so it could not have been added to any
-			 * other lists.  Just add it back to the shrink list.
-			 */
-			d_shrink_add(dentry, list);
-			spin_unlock(&dentry->d_lock);
-			continue;
-		}
+		/* if trylocks have failed, just do it again */
+		if (unlikely(parent == dentry))
+			goto again;
 		/*
 		 * We need to prune ancestors too. This is necessary to prevent
 		 * quadratic behavior of shrink_dcache_parent(), but is also
@@ -840,7 +834,7 @@ static void shrink_dentry_list(struct list_head *list)
 		 */
 		dentry = parent;
 		while (dentry && !lockref_put_or_lock(&dentry->d_lockref))
-			dentry = dentry_kill(dentry, 1);
+			dentry = dentry_kill(dentry, NULL);
 	}
 }