Re: possible lockdep regression introduced by 4d004099a668 ("lockdep: Fix lockdep recursion")

2020-11-09 Thread Boqun Feng
On Mon, Nov 09, 2020 at 09:57:05AM +, Filipe Manana wrote:
> 
> 
> On 09/11/20 08:44, Boqun Feng wrote:
> > Hi Filipe,
> > 
> > On Thu, Nov 05, 2020 at 09:10:12AM +0800, Boqun Feng wrote:
> >> On Wed, Nov 04, 2020 at 07:54:40PM +, Filipe Manana wrote:
> >> [...]
> >>>
> >>> Ok, so I ran 5.10-rc2 plus your two patches (the fix and the debug one):
> >>>
> >>> diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> >>> index b71ad8d9f1c9..b31d4ad482c7 100644
> >>> --- a/kernel/locking/lockdep.c
> >>> +++ b/kernel/locking/lockdep.c
> >>> @@ -539,8 +539,10 @@ static struct lock_trace *save_trace(void)
> >>> LOCK_TRACE_SIZE_IN_LONGS;
> >>>
> >>> if (max_entries <= 0) {
> >>> -   if (!debug_locks_off_graph_unlock())
> >>> +   if (!debug_locks_off_graph_unlock()) {
> >>> +   WARN_ON_ONCE(1);
> >>> return NULL;
> >>> +   }
> >>>
> >>> print_lockdep_off("BUG: MAX_STACK_TRACE_ENTRIES too low!");
> >>> dump_stack();
> >>> @@ -5465,7 +5467,7 @@ noinstr int lock_is_held_type(const struct
> >>> lockdep_map *lock, int read)
> >>> unsigned long flags;
> >>> int ret = 0;
> >>>
> >>> -   if (unlikely(!lockdep_enabled()))
> >>> +   if (unlikely(debug_locks && !lockdep_enabled()))
> >>> return 1; /* avoid false negative lockdep_assert_held() */
> >>>
> >>> raw_local_irq_save(flags);
> >>>
> >>> With 3 runs of all fstests, the WARN_ON_ONCE(1) wasn't triggered.
> >>> Unexpected, right?
> >>>
> >>
> >> Kinda, that means we still don't know why lockdep was turned off.
> >>
> >>> Should I try something else?
> >>>
> >>
> >> Thanks for trying this. Let me set up the reproducer on my side, and see
> >> if I could get more information.
> >>
> > 
> > I could hit this with btrfs/187. When we hit it, lockdep reports the
> > deadlock and turns itself off, and I think that is the root cause of the
> > original problem you hit. I will add some analysis after the lockdep
> > splat.
> > 
> > [12295.973309] 
> > [12295.974770] WARNING: possible recursive locking detected
> > [12295.974770] 5.10.0-rc2-btrfs-next-71 #20 Not tainted
> > [12295.974770] 
> > [12295.974770] zsh/701247 is trying to acquire lock:
> > [12295.974770] 92cef43480b8 (&eb->lock){}-{2:2}, at: 
> > btrfs_tree_read_lock_atomic+0x34/0x140 [btrfs]
> > [12295.974770] 
> >but task is already holding lock:
> > [12295.974770] 92cef434a038 (&eb->lock){}-{2:2}, at: 
> > btrfs_tree_read_lock_atomic+0x34/0x140 [btrfs]
> > [12295.974770] 
> >other info that might help us debug this:
> > [12295.974770]  Possible unsafe locking scenario:
> > 
> > [12295.974770]CPU0
> > [12295.974770]
> > [12295.974770]   lock(&eb->lock);
> > [12295.974770]   lock(&eb->lock);
> > [12295.974770] 
> > *** DEADLOCK ***
> > 
> > [12295.974770]  May be due to missing lock nesting notation
> > 
> > [12295.974770] 2 locks held by zsh/701247:
> > [12295.974770]  #0: 92cef3d315b0 (&sig->cred_guard_mutex){+.+.}-{3:3}, 
> > at: bprm_execve+0xaa/0x920
> > [12295.974770]  #1: 92cef434a038 (&eb->lock){}-{2:2}, at: 
> > btrfs_tree_read_lock_atomic+0x34/0x140 [btrfs]
> > [12295.974770] 
> >stack backtrace:
> > [12295.974770] CPU: 6 PID: 701247 Comm: zsh Not tainted 
> > 5.10.0-rc2-btrfs-next-71 #20
> > [12295.974770] Hardware name: Microsoft Corporation Virtual Machine/Virtual 
> > Machine, BIOS Hyper-V UEFI Release v4.0 12/17/2019
> > [12295.974770] Call Trace:
> > [12295.974770]  dump_stack+0x8b/0xb0
> > [12295.974770]  __lock_acquire.cold+0x175/0x2e9
> > [12295.974770]  lock_acquire+0x15b/0x490
> > [12295.974770]  ? btrfs_tree_read_lock_atomic+0x34/0x140 [btrfs]
> > [12295.974770]  ? read_block_for_search+0xf4/0x350 [btrfs]
> > [12295.974770]  _raw_read_lock+0x40/0xa0
> > [12295.974770]  ? btrfs_tree_read_lock_atomic+0x34/0x140 [btrfs]
> > [12295.974770]  btrfs_tree_read_lock_atomic+0x34/0x140 [btrfs]
> > [12295.974770]  btrfs_search_slot+0x6ac/0xca0 [btrfs]
> > [12295.974770]  btrfs_lookup_xattr+0x7d/0xd0 [btrfs]
> > [12295.974770]  btrfs_getxattr+0x67/0x130 [btrfs]
> > [12295.974770]  __vfs_getxattr+0x53/0x70
> > [12295.974770]  get_vfs_caps_from_disk+0x68/0x1a0
> > [12295.974770]  ? sched_clock_cpu+0x114/0x180
> > [12295.974770]  cap_bprm_creds_from_file+0x181/0x6c0
> > [12295.974770]  security_bprm_creds_from_file+0x2a/0x40
> > [12295.974770]  begin_new_exec+0xf4/0xc40
> > [12295.974770]  ? load_elf_phdrs+0x6b/0xb0
> > [12295.974770]  load_elf_binary+0x66b/0x1620
> > [12295.974770]  ? read_hv_sched_clock_tsc+0x5/0x20
> > [12295.974770]  ? sched_clock+0x5/0x10
> > [12295.974770]  ? sched_clock_local+0x12/0x80
> > [12295.974770]  ? sched_clock_cpu+0x114/0x180
> > [12295.974770]  bprm_execve+0x3ce/0x920
> > [12

Re: possible lockdep regression introduced by 4d004099a668 ("lockdep: Fix lockdep recursion")

2020-11-09 Thread Filipe Manana



On 09/11/20 08:44, Boqun Feng wrote:
> Hi Filipe,
> 
> On Thu, Nov 05, 2020 at 09:10:12AM +0800, Boqun Feng wrote:
>> On Wed, Nov 04, 2020 at 07:54:40PM +, Filipe Manana wrote:
>> [...]
>>>
>>> Ok, so I ran 5.10-rc2 plus your two patches (the fix and the debug one):
>>>
>>> diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
>>> index b71ad8d9f1c9..b31d4ad482c7 100644
>>> --- a/kernel/locking/lockdep.c
>>> +++ b/kernel/locking/lockdep.c
>>> @@ -539,8 +539,10 @@ static struct lock_trace *save_trace(void)
>>> LOCK_TRACE_SIZE_IN_LONGS;
>>>
>>> if (max_entries <= 0) {
>>> -   if (!debug_locks_off_graph_unlock())
>>> +   if (!debug_locks_off_graph_unlock()) {
>>> +   WARN_ON_ONCE(1);
>>> return NULL;
>>> +   }
>>>
>>> print_lockdep_off("BUG: MAX_STACK_TRACE_ENTRIES too low!");
>>> dump_stack();
>>> @@ -5465,7 +5467,7 @@ noinstr int lock_is_held_type(const struct
>>> lockdep_map *lock, int read)
>>> unsigned long flags;
>>> int ret = 0;
>>>
>>> -   if (unlikely(!lockdep_enabled()))
>>> +   if (unlikely(debug_locks && !lockdep_enabled()))
>>> return 1; /* avoid false negative lockdep_assert_held() */
>>>
>>> raw_local_irq_save(flags);
>>>
>>> With 3 runs of all fstests, the WARN_ON_ONCE(1) wasn't triggered.
>>> Unexpected, right?
>>>
>>
>> Kinda, that means we still don't know why lockdep was turned off.
>>
>>> Should I try something else?
>>>
>>
>> Thanks for trying this. Let me set up the reproducer on my side, and see
>> if I could get more information.
>>
> 
> I could hit this with btrfs/187. When we hit it, lockdep reports the
> deadlock and turns itself off, and I think that is the root cause of the
> original problem you hit. I will add some analysis after the lockdep
> splat.
> 
> [12295.973309] 
> [12295.974770] WARNING: possible recursive locking detected
> [12295.974770] 5.10.0-rc2-btrfs-next-71 #20 Not tainted
> [12295.974770] 
> [12295.974770] zsh/701247 is trying to acquire lock:
> [12295.974770] 92cef43480b8 (&eb->lock){}-{2:2}, at: 
> btrfs_tree_read_lock_atomic+0x34/0x140 [btrfs]
> [12295.974770] 
>but task is already holding lock:
> [12295.974770] 92cef434a038 (&eb->lock){}-{2:2}, at: 
> btrfs_tree_read_lock_atomic+0x34/0x140 [btrfs]
> [12295.974770] 
>other info that might help us debug this:
> [12295.974770]  Possible unsafe locking scenario:
> 
> [12295.974770]CPU0
> [12295.974770]
> [12295.974770]   lock(&eb->lock);
> [12295.974770]   lock(&eb->lock);
> [12295.974770] 
> *** DEADLOCK ***
> 
> [12295.974770]  May be due to missing lock nesting notation
> 
> [12295.974770] 2 locks held by zsh/701247:
> [12295.974770]  #0: 92cef3d315b0 (&sig->cred_guard_mutex){+.+.}-{3:3}, 
> at: bprm_execve+0xaa/0x920
> [12295.974770]  #1: 92cef434a038 (&eb->lock){}-{2:2}, at: 
> btrfs_tree_read_lock_atomic+0x34/0x140 [btrfs]
> [12295.974770] 
>stack backtrace:
> [12295.974770] CPU: 6 PID: 701247 Comm: zsh Not tainted 
> 5.10.0-rc2-btrfs-next-71 #20
> [12295.974770] Hardware name: Microsoft Corporation Virtual Machine/Virtual 
> Machine, BIOS Hyper-V UEFI Release v4.0 12/17/2019
> [12295.974770] Call Trace:
> [12295.974770]  dump_stack+0x8b/0xb0
> [12295.974770]  __lock_acquire.cold+0x175/0x2e9
> [12295.974770]  lock_acquire+0x15b/0x490
> [12295.974770]  ? btrfs_tree_read_lock_atomic+0x34/0x140 [btrfs]
> [12295.974770]  ? read_block_for_search+0xf4/0x350 [btrfs]
> [12295.974770]  _raw_read_lock+0x40/0xa0
> [12295.974770]  ? btrfs_tree_read_lock_atomic+0x34/0x140 [btrfs]
> [12295.974770]  btrfs_tree_read_lock_atomic+0x34/0x140 [btrfs]
> [12295.974770]  btrfs_search_slot+0x6ac/0xca0 [btrfs]
> [12295.974770]  btrfs_lookup_xattr+0x7d/0xd0 [btrfs]
> [12295.974770]  btrfs_getxattr+0x67/0x130 [btrfs]
> [12295.974770]  __vfs_getxattr+0x53/0x70
> [12295.974770]  get_vfs_caps_from_disk+0x68/0x1a0
> [12295.974770]  ? sched_clock_cpu+0x114/0x180
> [12295.974770]  cap_bprm_creds_from_file+0x181/0x6c0
> [12295.974770]  security_bprm_creds_from_file+0x2a/0x40
> [12295.974770]  begin_new_exec+0xf4/0xc40
> [12295.974770]  ? load_elf_phdrs+0x6b/0xb0
> [12295.974770]  load_elf_binary+0x66b/0x1620
> [12295.974770]  ? read_hv_sched_clock_tsc+0x5/0x20
> [12295.974770]  ? sched_clock+0x5/0x10
> [12295.974770]  ? sched_clock_local+0x12/0x80
> [12295.974770]  ? sched_clock_cpu+0x114/0x180
> [12295.974770]  bprm_execve+0x3ce/0x920
> [12295.974770]  do_execveat_common+0x1b0/0x1f0
> [12295.974770]  __x64_sys_execve+0x39/0x50
> [12295.974770]  do_syscall_64+0x33/0x80
> [12295.974770]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [12295.974770] RIP: 0033:0x7f6aaefc13cb
> [12295.974770] Code: 48 3d 00 f0 ff ff 76 e7 f7 d8 64 41 89 00 eb df 

Re: possible lockdep regression introduced by 4d004099a668 ("lockdep: Fix lockdep recursion")

2020-11-09 Thread Boqun Feng
Hi Filipe,

On Thu, Nov 05, 2020 at 09:10:12AM +0800, Boqun Feng wrote:
> On Wed, Nov 04, 2020 at 07:54:40PM +, Filipe Manana wrote:
> [...]
> > 
> > Ok, so I ran 5.10-rc2 plus your two patches (the fix and the debug one):
> > 
> > diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> > index b71ad8d9f1c9..b31d4ad482c7 100644
> > --- a/kernel/locking/lockdep.c
> > +++ b/kernel/locking/lockdep.c
> > @@ -539,8 +539,10 @@ static struct lock_trace *save_trace(void)
> > LOCK_TRACE_SIZE_IN_LONGS;
> > 
> > if (max_entries <= 0) {
> > -   if (!debug_locks_off_graph_unlock())
> > +   if (!debug_locks_off_graph_unlock()) {
> > +   WARN_ON_ONCE(1);
> > return NULL;
> > +   }
> > 
> > print_lockdep_off("BUG: MAX_STACK_TRACE_ENTRIES too low!");
> > dump_stack();
> > @@ -5465,7 +5467,7 @@ noinstr int lock_is_held_type(const struct
> > lockdep_map *lock, int read)
> > unsigned long flags;
> > int ret = 0;
> > 
> > -   if (unlikely(!lockdep_enabled()))
> > +   if (unlikely(debug_locks && !lockdep_enabled()))
> > return 1; /* avoid false negative lockdep_assert_held() */
> > 
> > raw_local_irq_save(flags);
> > 
> > With 3 runs of all fstests, the WARN_ON_ONCE(1) wasn't triggered.
> > Unexpected, right?
> > 
> 
> Kinda, that means we still don't know why lockdep was turned off.
> 
> > Should I try something else?
> > 
> 
> Thanks for trying this. Let me set up the reproducer on my side, and see
> if I could get more information.
> 

I could hit this with btrfs/187. When we hit it, lockdep reports the
deadlock and turns itself off, and I think that is the root cause of the
original problem you hit. I will add some analysis after the lockdep
splat.

[12295.973309] 
[12295.974770] WARNING: possible recursive locking detected
[12295.974770] 5.10.0-rc2-btrfs-next-71 #20 Not tainted
[12295.974770] 
[12295.974770] zsh/701247 is trying to acquire lock:
[12295.974770] 92cef43480b8 (&eb->lock){}-{2:2}, at: 
btrfs_tree_read_lock_atomic+0x34/0x140 [btrfs]
[12295.974770] 
   but task is already holding lock:
[12295.974770] 92cef434a038 (&eb->lock){}-{2:2}, at: 
btrfs_tree_read_lock_atomic+0x34/0x140 [btrfs]
[12295.974770] 
   other info that might help us debug this:
[12295.974770]  Possible unsafe locking scenario:

[12295.974770]CPU0
[12295.974770]
[12295.974770]   lock(&eb->lock);
[12295.974770]   lock(&eb->lock);
[12295.974770] 
*** DEADLOCK ***

[12295.974770]  May be due to missing lock nesting notation

[12295.974770] 2 locks held by zsh/701247:
[12295.974770]  #0: 92cef3d315b0 (&sig->cred_guard_mutex){+.+.}-{3:3}, at: 
bprm_execve+0xaa/0x920
[12295.974770]  #1: 92cef434a038 (&eb->lock){}-{2:2}, at: 
btrfs_tree_read_lock_atomic+0x34/0x140 [btrfs]
[12295.974770] 
   stack backtrace:
[12295.974770] CPU: 6 PID: 701247 Comm: zsh Not tainted 
5.10.0-rc2-btrfs-next-71 #20
[12295.974770] Hardware name: Microsoft Corporation Virtual Machine/Virtual 
Machine, BIOS Hyper-V UEFI Release v4.0 12/17/2019
[12295.974770] Call Trace:
[12295.974770]  dump_stack+0x8b/0xb0
[12295.974770]  __lock_acquire.cold+0x175/0x2e9
[12295.974770]  lock_acquire+0x15b/0x490
[12295.974770]  ? btrfs_tree_read_lock_atomic+0x34/0x140 [btrfs]
[12295.974770]  ? read_block_for_search+0xf4/0x350 [btrfs]
[12295.974770]  _raw_read_lock+0x40/0xa0
[12295.974770]  ? btrfs_tree_read_lock_atomic+0x34/0x140 [btrfs]
[12295.974770]  btrfs_tree_read_lock_atomic+0x34/0x140 [btrfs]
[12295.974770]  btrfs_search_slot+0x6ac/0xca0 [btrfs]
[12295.974770]  btrfs_lookup_xattr+0x7d/0xd0 [btrfs]
[12295.974770]  btrfs_getxattr+0x67/0x130 [btrfs]
[12295.974770]  __vfs_getxattr+0x53/0x70
[12295.974770]  get_vfs_caps_from_disk+0x68/0x1a0
[12295.974770]  ? sched_clock_cpu+0x114/0x180
[12295.974770]  cap_bprm_creds_from_file+0x181/0x6c0
[12295.974770]  security_bprm_creds_from_file+0x2a/0x40
[12295.974770]  begin_new_exec+0xf4/0xc40
[12295.974770]  ? load_elf_phdrs+0x6b/0xb0
[12295.974770]  load_elf_binary+0x66b/0x1620
[12295.974770]  ? read_hv_sched_clock_tsc+0x5/0x20
[12295.974770]  ? sched_clock+0x5/0x10
[12295.974770]  ? sched_clock_local+0x12/0x80
[12295.974770]  ? sched_clock_cpu+0x114/0x180
[12295.974770]  bprm_execve+0x3ce/0x920
[12295.974770]  do_execveat_common+0x1b0/0x1f0
[12295.974770]  __x64_sys_execve+0x39/0x50
[12295.974770]  do_syscall_64+0x33/0x80
[12295.974770]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[12295.974770] RIP: 0033:0x7f6aaefc13cb
[12295.974770] Code: 48 3d 00 f0 ff ff 76 e7 f7 d8 64 41 89 00 eb df 0f 1f 80 
00 00 00 00 f7 d8 64 41 89 00 eb dc f3 0f 1e fa b8 3b 00 00 00 0f 05 <48> 3d 01 
f0 ff ff 73 01 c3 48 8b 0d 75 4a 0f 00 f7 d8 64 89 01 48
[12295.974770] RSP: 002b:7ffd33b54

Re: possible lockdep regression introduced by 4d004099a668 ("lockdep: Fix lockdep recursion")

2020-11-04 Thread Boqun Feng
On Wed, Nov 04, 2020 at 07:54:40PM +, Filipe Manana wrote:
[...]
> 
> Ok, so I ran 5.10-rc2 plus your two patches (the fix and the debug one):
> 
> diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> index b71ad8d9f1c9..b31d4ad482c7 100644
> --- a/kernel/locking/lockdep.c
> +++ b/kernel/locking/lockdep.c
> @@ -539,8 +539,10 @@ static struct lock_trace *save_trace(void)
> LOCK_TRACE_SIZE_IN_LONGS;
> 
> if (max_entries <= 0) {
> -   if (!debug_locks_off_graph_unlock())
> +   if (!debug_locks_off_graph_unlock()) {
> +   WARN_ON_ONCE(1);
> return NULL;
> +   }
> 
> print_lockdep_off("BUG: MAX_STACK_TRACE_ENTRIES too low!");
> dump_stack();
> @@ -5465,7 +5467,7 @@ noinstr int lock_is_held_type(const struct
> lockdep_map *lock, int read)
> unsigned long flags;
> int ret = 0;
> 
> -   if (unlikely(!lockdep_enabled()))
> +   if (unlikely(debug_locks && !lockdep_enabled()))
> return 1; /* avoid false negative lockdep_assert_held() */
> 
> raw_local_irq_save(flags);
> 
> With 3 runs of all fstests, the WARN_ON_ONCE(1) wasn't triggered.
> Unexpected, right?
> 

Kinda, that means we still don't know why lockdep was turned off.

> Should I try something else?
> 

Thanks for trying this. Let me set up the reproducer on my side, and see
if I could get more information.

Regards,
Boqun

> Thanks.
> 
> 
> > 
> > Thanks!
> > 
> >>
> >> Alternatively, it's also helpful if you can try the following debug
> >> diff, with the full set of xfstests:
> >>
> >> Thanks! Just trying to understand the real problem.
> >>
> >> Regards,
> >> Boqun
> >>
> >> -->8
> >> diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> >> index b71ad8d9f1c9..9ae3e089e5c0 100644
> >> --- a/kernel/locking/lockdep.c
> >> +++ b/kernel/locking/lockdep.c
> >> @@ -539,8 +539,10 @@ static struct lock_trace *save_trace(void)
> >>LOCK_TRACE_SIZE_IN_LONGS;
> >>  
> >>if (max_entries <= 0) {
> >> -  if (!debug_locks_off_graph_unlock())
> >> +  if (!debug_locks_off_graph_unlock()) {
> >> +  WARN_ON_ONCE(1);
> >>return NULL;
> >> +  }
> >>  
> >>print_lockdep_off("BUG: MAX_STACK_TRACE_ENTRIES too low!");
> >>dump_stack();
> >>
> >>> I guess I will have to reproduce this myself for further analysis, could
> >>> you share your .config?
> >>>
> >>> Anyway, I think this fix still makes some sense, so I will send a proper
> >>> patch so that the problem won't block fs folks.
> >>>
> >>> Regards,
> >>> Boqun
> >>>
>  Thanks!
> 
> >
> >
> >
> >> What happens is percpu_rwsem_is_held() is apparently returning a false
> >> positive, so this makes __sb_start_write() do a
> >> percpu_down_read_trylock() on a percpu_rw_sem at a higher level, which
> >> is expected to always succeed, because if the calling task is holding a
> >> freeze percpu_rw_sem at level 1, it's supposed to be able to try_lock
> >> the semaphore at level 2, since the freeze semaphores are always
> >> acquired by increasing level order.
> >>
> >> But the try_lock fails, triggering the warning at __sb_start_write();
> >> its caller sb_start_pagefault() ignores the return value, and callers
> >> such as btrfs_page_mkwrite() assume the freeze semaphore was taken,
> >> proceed with their work, and later call sb_end_pagefault(), which does
> >> an up_read() on the percpu_rwsem at level 2 despite never having been
> >> able to down_read() the semaphore. This corrupts the semaphore's
> >> read_count state and later causes any task trying to down_write() it to
> >> hang forever.
> >>
> >> After such a hang I ran a drgn script to confirm it:
> >>
> >> $ cat dump_freeze_sems.py
> >> import sys
> >> import drgn
> >> from drgn import NULL, Object, cast, container_of, execscript, \
> >> reinterpret, sizeof
> >> from drgn.helpers.linux import *
> >>
> >> mnt_path = b'/home/fdmanana/btrfs-tests/scratch_1'
> >>
> >> mnt = None
> >> for mnt in for_each_mount(prog, dst = mnt_path):
> >>     pass
> >>
> >> if mnt is None:
> >>     sys.stderr.write(f'Error: mount point {mnt_path} not found\n')
> >>     sys.exit(1)
> >>
> >> def dump_sem(level_enum):
> >>     level = level_enum.value_()
> >>     sem = mnt.mnt.mnt_sb.s_writers.rw_sem[level - 1]
> >>     print(f'freeze semaphore at level {level}, {str(level_enum)}')
> >>     print(f'block {sem.block.counter.value_()}')
> >>     for i in for_each_possible_cpu(prog):
> >>         read_count = per_cpu_ptr(sem.read_count, i)
> >>         print(f'read_count at cpu {i} = {read_count}')
> >>     print()

Re: possible lockdep regression introduced by 4d004099a668 ("lockdep: Fix lockdep recursion")

2020-11-04 Thread Filipe Manana



On 04/11/20 09:49, Filipe Manana wrote:
> 
> 
> On 04/11/20 03:44, Boqun Feng wrote:
>> On Wed, Nov 04, 2020 at 10:22:36AM +0800, Boqun Feng wrote:
>>> On Tue, Nov 03, 2020 at 07:44:29PM +, Filipe Manana wrote:


 On 03/11/20 14:08, Boqun Feng wrote:
> Hi Filipe,
>
> On Mon, Oct 26, 2020 at 11:26:49AM +, Filipe Manana wrote:
>> Hello,
>>
>> I've recently started to hit a warning followed by tasks hanging after
>> attempts to freeze a filesystem. A git bisection pointed to the
>> following commit:
>>
>> commit 4d004099a668c41522242aa146a38cc4eb59cb1e
>> Author: Peter Zijlstra 
>> Date:   Fri Oct 2 11:04:21 2020 +0200
>>
>> lockdep: Fix lockdep recursion
>>
>> This happens very reliably when running all xfstests with lockdep
>> enabled, and the tested filesystem is btrfs (haven't tried other
>> filesystems, but it shouldn't matter). The warning and task hangs always
>> happen at either test generic/068 or test generic/390, and (oddly)
>> always have to run all tests for it to trigger, running those tests
>> individually on an infinite loop doesn't seem to trigger it (at least
>> for a couple hours).
>>
>> The warning triggered is at fs/super.c:__sb_start_write() which always
>> results later in several tasks hanging on a percpu rw_sem:
>>
>> https://pastebin.com/qnLvf94E
>>
>
> In your dmesg, I see line:
>
>   [ 9304.920151] INFO: lockdep is turned off.
>
> , which means debug_locks is 0; that usually happens when lockdep finds a
> problem (i.e. a deadlock) and turns itself off, because once a problem is
> found it's pointless for lockdep to continue to run.
>
> And I haven't found a lockdep splat in your dmesg, do you have a full
> dmesg so that I can have a look?
>
> This may be relevant because in commit 4d004099a66, we have
>
>   @@ -5056,13 +5081,13 @@ noinstr int lock_is_held_type(const struct 
> lockdep_map *lock, int read)
>   unsigned long flags;
>   int ret = 0;
>
>   -   if (unlikely(current->lockdep_recursion))
>   +   if (unlikely(!lockdep_enabled()))
>   return 1; /* avoid false negative lockdep_assert_held() 
> */
>
> Before this commit, lock_is_held_type() and its friends could return
> false if debug_locks == 0; after this commit, they always return true
> if debug_locks == 0. That could cause the behavior here.
>
> In case I'm correct, the following "fix" may be helpful. 
>
> Regards,
> Boqun
>
> -->8
> diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> index 3e99dfef8408..c0e27fb949ff 100644
> --- a/kernel/locking/lockdep.c
> +++ b/kernel/locking/lockdep.c
> @@ -5471,7 +5464,7 @@ noinstr int lock_is_held_type(const struct 
> lockdep_map *lock, int read)
>   unsigned long flags;
>   int ret = 0;
>  
> - if (unlikely(!lockdep_enabled()))
> + if (unlikely(debug_locks && !lockdep_enabled()))
>   return 1; /* avoid false negative lockdep_assert_held() */
>  
>   raw_local_irq_save(flags);

 Boqun, the patch fixes the problem for me!
 You can have Tested-by: Filipe Manana 

>>>
>>> Thanks. Although I think it still means that we have a lock issue when
>>> running xfstests (because we don't know why debug_locks gets cleared),
>>
>> I might have found a place where we turn lockdep off silently:
>>
>> in print_circular_bug(), we turn off lockdep via
>> debug_locks_off_graph_unlock(), and then we try to save the trace for
>> the lockdep splat; however, if we have used up the stack_trace buffer
>> (i.e. nr_stack_trace_entries), save_trace() will return NULL and we
>> return silently.
>>
>> Filipe, in order to check whether that happens, could you share your
>> /proc/lockdep_stats after the full set of xfstests is finished?
> 
> Here it is:
> 
> $ cat /proc/lockdep_stats
>  lock-classes: 1831 [max: 8192]
>  direct dependencies: 17774 [max: 32768]
>  indirect dependencies:   75662
>  all direct dependencies:325284
>  dependency chains:   34223 [max: 65536]
>  dependency chain hlocks used:   158129 [max: 327680]
>  dependency chain hlocks lost:0
>  in-hardirq chains:  57
>  in-softirq chains: 658
>  in-process chains:   33508
>  stack-trace entries:160748 [max: 524288]
>  number of stack traces:   9237
>  number of stack hash chains:  7076
>  combined max dependencies:  1280780998
>  hardirq-safe locks: 43
>  hardirq-unsafe locks: 1337
>  softirq-safe locks:179
>  softirq-unsafe locks: 1236
>  irq-safe lo

Re: possible lockdep regression introduced by 4d004099a668 ("lockdep: Fix lockdep recursion")

2020-11-04 Thread Filipe Manana



On 04/11/20 03:44, Boqun Feng wrote:
> On Wed, Nov 04, 2020 at 10:22:36AM +0800, Boqun Feng wrote:
>> On Tue, Nov 03, 2020 at 07:44:29PM +, Filipe Manana wrote:
>>>
>>>
>>> On 03/11/20 14:08, Boqun Feng wrote:
 Hi Filipe,

 On Mon, Oct 26, 2020 at 11:26:49AM +, Filipe Manana wrote:
> Hello,
>
> I've recently started to hit a warning followed by tasks hanging after
> attempts to freeze a filesystem. A git bisection pointed to the
> following commit:
>
> commit 4d004099a668c41522242aa146a38cc4eb59cb1e
> Author: Peter Zijlstra 
> Date:   Fri Oct 2 11:04:21 2020 +0200
>
> lockdep: Fix lockdep recursion
>
> This happens very reliably when running all xfstests with lockdep
> enabled, and the tested filesystem is btrfs (haven't tried other
> filesystems, but it shouldn't matter). The warning and task hangs always
> happen at either test generic/068 or test generic/390, and (oddly)
> always have to run all tests for it to trigger, running those tests
> individually on an infinite loop doesn't seem to trigger it (at least
> for a couple hours).
>
> The warning triggered is at fs/super.c:__sb_start_write() which always
> results later in several tasks hanging on a percpu rw_sem:
>
> https://pastebin.com/qnLvf94E
>

 In your dmesg, I see line:

[ 9304.920151] INFO: lockdep is turned off.

 , which means debug_locks is 0; that usually happens when lockdep finds a
 problem (i.e. a deadlock) and turns itself off, because once a problem is
 found it's pointless for lockdep to continue to run.

 And I haven't found a lockdep splat in your dmesg, do you have a full
 dmesg so that I can have a look?

 This may be relevant because in commit 4d004099a66, we have

@@ -5056,13 +5081,13 @@ noinstr int lock_is_held_type(const struct 
 lockdep_map *lock, int read)
unsigned long flags;
int ret = 0;

-   if (unlikely(current->lockdep_recursion))
+   if (unlikely(!lockdep_enabled()))
return 1; /* avoid false negative lockdep_assert_held() 
 */

 Before this commit, lock_is_held_type() and its friends could return
 false if debug_locks == 0; after this commit, they always return true
 if debug_locks == 0. That could cause the behavior here.

 In case I'm correct, the following "fix" may be helpful. 

 Regards,
 Boqun

 -->8
 diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
 index 3e99dfef8408..c0e27fb949ff 100644
 --- a/kernel/locking/lockdep.c
 +++ b/kernel/locking/lockdep.c
 @@ -5471,7 +5464,7 @@ noinstr int lock_is_held_type(const struct 
 lockdep_map *lock, int read)
unsigned long flags;
int ret = 0;
  
 -  if (unlikely(!lockdep_enabled()))
 +  if (unlikely(debug_locks && !lockdep_enabled()))
return 1; /* avoid false negative lockdep_assert_held() */
  
raw_local_irq_save(flags);
>>>
>>> Boqun, the patch fixes the problem for me!
>>> You can have Tested-by: Filipe Manana 
>>>
>>
>> Thanks. Although I think it still means that we have a lock issue when
>> running xfstests (because we don't know why debug_locks gets cleared),
> 
> I might have found a place where we turn lockdep off silently:
> 
> in print_circular_bug(), we turn off lockdep via
> debug_locks_off_graph_unlock(), and then we try to save the trace for
> the lockdep splat; however, if we have used up the stack_trace buffer
> (i.e. nr_stack_trace_entries), save_trace() will return NULL and we
> return silently.
> 
> Filipe, in order to check whether that happens, could you share your
> /proc/lockdep_stats after the full set of xfstests is finished?

Here it is:

$ cat /proc/lockdep_stats
 lock-classes: 1831 [max: 8192]
 direct dependencies: 17774 [max: 32768]
 indirect dependencies:   75662
 all direct dependencies:325284
 dependency chains:   34223 [max: 65536]
 dependency chain hlocks used:   158129 [max: 327680]
 dependency chain hlocks lost:0
 in-hardirq chains:  57
 in-softirq chains: 658
 in-process chains:   33508
 stack-trace entries:160748 [max: 524288]
 number of stack traces:   9237
 number of stack hash chains:  7076
 combined max dependencies:  1280780998
 hardirq-safe locks: 43
 hardirq-unsafe locks: 1337
 softirq-safe locks:179
 softirq-unsafe locks: 1236
 irq-safe locks:187
 irq-unsafe locks: 1337
 hardirq-read-safe locks: 2
 hardirq-read-unsafe locks: 209
 softirq-read-

Re: possible lockdep regression introduced by 4d004099a668 ("lockdep: Fix lockdep recursion")

2020-11-03 Thread Boqun Feng
On Wed, Nov 04, 2020 at 10:22:36AM +0800, Boqun Feng wrote:
> On Tue, Nov 03, 2020 at 07:44:29PM +, Filipe Manana wrote:
> > 
> > 
> > On 03/11/20 14:08, Boqun Feng wrote:
> > > Hi Filipe,
> > > 
> > > On Mon, Oct 26, 2020 at 11:26:49AM +, Filipe Manana wrote:
> > >> Hello,
> > >>
> > >> I've recently started to hit a warning followed by tasks hanging after
> > >> attempts to freeze a filesystem. A git bisection pointed to the
> > >> following commit:
> > >>
> > >> commit 4d004099a668c41522242aa146a38cc4eb59cb1e
> > >> Author: Peter Zijlstra 
> > >> Date:   Fri Oct 2 11:04:21 2020 +0200
> > >>
> > >> lockdep: Fix lockdep recursion
> > >>
> > >> This happens very reliably when running all xfstests with lockdep
> > >> enabled, and the tested filesystem is btrfs (haven't tried other
> > >> filesystems, but it shouldn't matter). The warning and task hangs always
> > >> happen at either test generic/068 or test generic/390, and (oddly)
> > >> always have to run all tests for it to trigger, running those tests
> > >> individually on an infinite loop doesn't seem to trigger it (at least
> > >> for a couple hours).
> > >>
> > >> The warning triggered is at fs/super.c:__sb_start_write() which always
> > >> results later in several tasks hanging on a percpu rw_sem:
> > >>
> > >> https://pastebin.com/qnLvf94E
> > >>
> > > 
> > > In your dmesg, I see line:
> > > 
> > >   [ 9304.920151] INFO: lockdep is turned off.
> > > 
> > > , which means debug_locks is 0; that usually happens when lockdep finds a
> > > problem (i.e. a deadlock) and turns itself off, because once a problem is
> > > found it's pointless for lockdep to continue to run.
> > > 
> > > And I haven't found a lockdep splat in your dmesg, do you have a full
> > > dmesg so that I can have a look?
> > > 
> > > This may be relevant because in commit 4d004099a66, we have
> > > 
> > >   @@ -5056,13 +5081,13 @@ noinstr int lock_is_held_type(const struct 
> > > lockdep_map *lock, int read)
> > >   unsigned long flags;
> > >   int ret = 0;
> > > 
> > >   -   if (unlikely(current->lockdep_recursion))
> > >   +   if (unlikely(!lockdep_enabled()))
> > >   return 1; /* avoid false negative lockdep_assert_held() 
> > > */
> > > 
> > > Before this commit, lock_is_held_type() and its friends could return
> > > false if debug_locks == 0; after this commit, they always return true
> > > if debug_locks == 0. That could cause the behavior here.
> > > 
> > > In case I'm correct, the following "fix" may be helpful. 
> > > 
> > > Regards,
> > > Boqun
> > > 
> > > -->8
> > > diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> > > index 3e99dfef8408..c0e27fb949ff 100644
> > > --- a/kernel/locking/lockdep.c
> > > +++ b/kernel/locking/lockdep.c
> > > @@ -5471,7 +5464,7 @@ noinstr int lock_is_held_type(const struct 
> > > lockdep_map *lock, int read)
> > >   unsigned long flags;
> > >   int ret = 0;
> > >  
> > > - if (unlikely(!lockdep_enabled()))
> > > + if (unlikely(debug_locks && !lockdep_enabled()))
> > >   return 1; /* avoid false negative lockdep_assert_held() */
> > >  
> > >   raw_local_irq_save(flags);
> > 
> > Boqun, the patch fixes the problem for me!
> > You can have Tested-by: Filipe Manana 
> > 
> 
> Thanks. Although I think it still means that we have a lock issue when
> running xfstests (because we don't know why debug_locks gets cleared),

I might have found a place where we turn lockdep off silently:

in print_circular_bug(), we turn off lockdep via
debug_locks_off_graph_unlock(), and then we try to save the trace for
the lockdep splat; however, if we have used up the stack_trace buffer
(i.e. nr_stack_trace_entries), save_trace() will return NULL and we
return silently.

Filipe, in order to check whether that happens, could you share your
/proc/lockdep_stats after the full set of xfstests is finished?

Alternatively, it's also helpful if you can try the following debug
diff, with the full set of xfstests:

Thanks! Just trying to understand the real problem.

Regards,
Boqun

-->8
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index b71ad8d9f1c9..9ae3e089e5c0 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -539,8 +539,10 @@ static struct lock_trace *save_trace(void)
LOCK_TRACE_SIZE_IN_LONGS;
 
if (max_entries <= 0) {
-   if (!debug_locks_off_graph_unlock())
+   if (!debug_locks_off_graph_unlock()) {
+   WARN_ON_ONCE(1);
return NULL;
+   }
 
print_lockdep_off("BUG: MAX_STACK_TRACE_ENTRIES too low!");
dump_stack();

> I guess I will have to reproduce this myself for further analysis, could
> you share your .config?
> 
> Anyway, I think this fix still makes a bit of sense, I will send a proper
> patch so that the problem won't block fs folks.
> 
> Regards,
> Boqun
> 
> > Thanks!

Re: possible lockdep regression introduced by 4d004099a668 ("lockdep: Fix lockdep recursion")

2020-11-03 Thread Boqun Feng
On Tue, Nov 03, 2020 at 07:44:29PM +, Filipe Manana wrote:
> 
> 
> On 03/11/20 14:08, Boqun Feng wrote:
> > Hi Filipe,
> > 
> > On Mon, Oct 26, 2020 at 11:26:49AM +, Filipe Manana wrote:
> >> Hello,
> >>
> >> I've recently started to hit a warning followed by tasks hanging after
> >> attempts to freeze a filesystem. A git bisection pointed to the
> >> following commit:
> >>
> >> commit 4d004099a668c41522242aa146a38cc4eb59cb1e
> >> Author: Peter Zijlstra 
> >> Date:   Fri Oct 2 11:04:21 2020 +0200
> >>
> >> lockdep: Fix lockdep recursion
> >>
> >> This happens very reliably when running all xfstests with lockdep
> >> enabled, and the tested filesystem is btrfs (haven't tried other
> >> filesystems, but it shouldn't matter). The warning and task hangs always
> >> happen at either test generic/068 or test generic/390, and (oddly)
> >> always have to run all tests for it to trigger, running those tests
> >> individually on an infinite loop doesn't seem to trigger it (at least
> >> for a couple hours).
> >>
> >> The warning triggered is at fs/super.c:__sb_start_write() which always
> >> results later in several tasks hanging on a percpu rw_sem:
> >>
> >> https://pastebin.com/qnLvf94E
> >>
> > 
> > In your dmesg, I see line:
> > 
> > [ 9304.920151] INFO: lockdep is turned off.
> > 
> > , that means debug_locks is 0, that usually happens when lockdep finds a
> > problem (i.e. a deadlock) and it turns itself off, because a problem is
> > found and it's pointless for lockdep to continue to run.
> > 
> > And I haven't found a lockdep splat in your dmesg, do you have a full
> > dmesg so that I can have a look?
> > 
> > This may be relevant because in commit 4d004099a66, we have
> > 
> > @@ -5056,13 +5081,13 @@ noinstr int lock_is_held_type(const struct 
> > lockdep_map *lock, int read)
> > unsigned long flags;
> > int ret = 0;
> > 
> > -   if (unlikely(current->lockdep_recursion))
> > +   if (unlikely(!lockdep_enabled()))
> > return 1; /* avoid false negative lockdep_assert_held() 
> > */
> > 
> > before this commit, lock_is_held_type() and its friends may return false
> > if debug_locks == 0; after this commit, they will always return true if
> > debug_locks == 0. That could cause the behavior here.
> > 
> > In case I'm correct, the following "fix" may be helpful. 
> > 
> > Regards,
> > Boqun
> > 
> > -->8
> > diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> > index 3e99dfef8408..c0e27fb949ff 100644
> > --- a/kernel/locking/lockdep.c
> > +++ b/kernel/locking/lockdep.c
> > @@ -5471,7 +5464,7 @@ noinstr int lock_is_held_type(const struct 
> > lockdep_map *lock, int read)
> > unsigned long flags;
> > int ret = 0;
> >  
> > -   if (unlikely(!lockdep_enabled()))
> > +   if (unlikely(debug_locks && !lockdep_enabled()))
> > return 1; /* avoid false negative lockdep_assert_held() */
> >  
> > raw_local_irq_save(flags);
> 
> Boqun, the patch fixes the problem for me!
> You can have Tested-by: Filipe Manana 
> 

Thanks. Although I think it still means that we have a lock issue when
running xfstests (because we don't know why debug_locks gets cleared),
I guess I will have to reproduce this myself for further analysis, could
you share your .config?

Anyway, I think this fix still makes a bit of sense, I will send a proper
patch so that the problem won't block fs folks.

Regards,
Boqun

> Thanks!
> 
> > 
> > 
> > 
> >> What happens is percpu_rwsem_is_held() is apparently returning a false
> >> positive, so this makes __sb_start_write() do a
> >> percpu_down_read_trylock() on a percpu_rw_sem at a higher level, which
> >> is expected to always succeed, because if the calling task is holding a
> >> freeze percpu_rw_sem at level 1, it's supposed to be able to try_lock
> >> the semaphore at level 2, since the freeze semaphores are always
> >> acquired by increasing level order.
> >>
> >> But the try_lock fails, it triggers the warning at __sb_start_write(),
> >> then its caller sb_start_pagefault() ignores the return value and
> >> callers such as btrfs_page_mkwrite() make the assumption the freeze
> >> semaphore was taken, proceed to do their stuff, and later call
>> sb_end_pagefault(), which will do an up_read() on the percpu_rwsem
>> at level 2 despite not having been able to down_read() the
> >> semaphore. This obviously corrupts the semaphore's read_count state, and
> >> later causes any task trying to down_write() it to hang forever.
> >>
> >> After such a hang I ran a drgn script to confirm it:
> >>
> >> $ cat dump_freeze_sems.py
> >> import sys
> >> import drgn
> >> from drgn import NULL, Object, cast, container_of, execscript, \
> >> reinterpret, sizeof
> >> from drgn.helpers.linux import *
> >>
> >> mnt_path = b'/home/fdmanana/btrfs-tests/scratch_1'
> >>
> >> mnt = None
> >> for mnt in for_each_mount(prog, dst = mnt_path):
> >> pass
> >>
> >> if mnt 

Re: possible lockdep regression introduced by 4d004099a668 ("lockdep: Fix lockdep recursion")

2020-11-03 Thread Filipe Manana



On 03/11/20 14:08, Boqun Feng wrote:
> Hi Filipe,
> 
> On Mon, Oct 26, 2020 at 11:26:49AM +, Filipe Manana wrote:
>> Hello,
>>
>> I've recently started to hit a warning followed by tasks hanging after
>> attempts to freeze a filesystem. A git bisection pointed to the
>> following commit:
>>
>> commit 4d004099a668c41522242aa146a38cc4eb59cb1e
>> Author: Peter Zijlstra 
>> Date:   Fri Oct 2 11:04:21 2020 +0200
>>
>> lockdep: Fix lockdep recursion
>>
>> This happens very reliably when running all xfstests with lockdep
>> enabled, and the tested filesystem is btrfs (haven't tried other
>> filesystems, but it shouldn't matter). The warning and task hangs always
>> happen at either test generic/068 or test generic/390, and (oddly)
>> always have to run all tests for it to trigger, running those tests
>> individually on an infinite loop doesn't seem to trigger it (at least
>> for a couple hours).
>>
>> The warning triggered is at fs/super.c:__sb_start_write() which always
>> results later in several tasks hanging on a percpu rw_sem:
>>
>> https://pastebin.com/qnLvf94E
>>
> 
> In your dmesg, I see line:
> 
>   [ 9304.920151] INFO: lockdep is turned off.
> 
> , that means debug_locks is 0, that usually happens when lockdep finds a
> problem (i.e. a deadlock) and it turns itself off, because a problem is
> found and it's pointless for lockdep to continue to run.
> 
> And I haven't found a lockdep splat in your dmesg, do you have a full
> dmesg so that I can have a look?
> 
> This may be relevant because in commit 4d004099a66, we have
> 
>   @@ -5056,13 +5081,13 @@ noinstr int lock_is_held_type(const struct 
> lockdep_map *lock, int read)
>   unsigned long flags;
>   int ret = 0;
> 
>   -   if (unlikely(current->lockdep_recursion))
>   +   if (unlikely(!lockdep_enabled()))
>   return 1; /* avoid false negative lockdep_assert_held() 
> */
> 
> before this commit, lock_is_held_type() and its friends may return false
> if debug_locks == 0; after this commit, they will always return true if
> debug_locks == 0. That could cause the behavior here.
> 
> In case I'm correct, the following "fix" may be helpful. 
> 
> Regards,
> Boqun
> 
> -->8
> diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> index 3e99dfef8408..c0e27fb949ff 100644
> --- a/kernel/locking/lockdep.c
> +++ b/kernel/locking/lockdep.c
> @@ -5471,7 +5464,7 @@ noinstr int lock_is_held_type(const struct lockdep_map 
> *lock, int read)
>   unsigned long flags;
>   int ret = 0;
>  
> - if (unlikely(!lockdep_enabled()))
> + if (unlikely(debug_locks && !lockdep_enabled()))
>   return 1; /* avoid false negative lockdep_assert_held() */
>  
>   raw_local_irq_save(flags);

Boqun, the patch fixes the problem for me!
You can have Tested-by: Filipe Manana 

Thanks!

> 
> 
> 
>> What happens is percpu_rwsem_is_held() is apparently returning a false
>> positive, so this makes __sb_start_write() do a
>> percpu_down_read_trylock() on a percpu_rw_sem at a higher level, which
>> is expected to always succeed, because if the calling task is holding a
>> freeze percpu_rw_sem at level 1, it's supposed to be able to try_lock
>> the semaphore at level 2, since the freeze semaphores are always
>> acquired by increasing level order.
>>
>> But the try_lock fails, it triggers the warning at __sb_start_write(),
>> then its caller sb_start_pagefault() ignores the return value and
>> callers such as btrfs_page_mkwrite() make the assumption the freeze
>> semaphore was taken, proceed to do their stuff, and later call
>> sb_end_pagefault(), which will do an up_read() on the percpu_rwsem
>> at level 2 despite not having been able to down_read() the
>> semaphore. This obviously corrupts the semaphore's read_count state, and
>> later causes any task trying to down_write() it to hang forever.
>>
>> After such a hang I ran a drgn script to confirm it:
>>
>> $ cat dump_freeze_sems.py
>> import sys
>> import drgn
>> from drgn import NULL, Object, cast, container_of, execscript, \
>> reinterpret, sizeof
>> from drgn.helpers.linux import *
>>
>> mnt_path = b'/home/fdmanana/btrfs-tests/scratch_1'
>>
>> mnt = None
>> for mnt in for_each_mount(prog, dst = mnt_path):
>> pass
>>
>> if mnt is None:
>> sys.stderr.write(f'Error: mount point {mnt_path} not found\n')
>> sys.exit(1)
>>
>> def dump_sem(level_enum):
>> level = level_enum.value_()
>> sem = mnt.mnt.mnt_sb.s_writers.rw_sem[level - 1]
>> print(f'freeze semaphore at level {level}, {str(level_enum)}')
>> print(f'block {sem.block.counter.value_()}')
>> for i in for_each_possible_cpu(prog):
>> read_count = per_cpu_ptr(sem.read_count, i)
>> print(f'read_count at cpu {i} = {read_count}')
>> print()
>>
>> # dump semaphore read counts for all freeze levels (fs.h)
>> dump_sem(prog['SB_FREEZE_WRITE'])
>> dump_sem(prog['SB_FREEZE_PAG

Re: possible lockdep regression introduced by 4d004099a668 ("lockdep: Fix lockdep recursion")

2020-11-03 Thread Filipe Manana



On 03/11/20 14:08, Boqun Feng wrote:
> Hi Filipe,
> 
> On Mon, Oct 26, 2020 at 11:26:49AM +, Filipe Manana wrote:
>> Hello,
>>
>> I've recently started to hit a warning followed by tasks hanging after
>> attempts to freeze a filesystem. A git bisection pointed to the
>> following commit:
>>
>> commit 4d004099a668c41522242aa146a38cc4eb59cb1e
>> Author: Peter Zijlstra 
>> Date:   Fri Oct 2 11:04:21 2020 +0200
>>
>> lockdep: Fix lockdep recursion
>>
>> This happens very reliably when running all xfstests with lockdep
>> enabled, and the tested filesystem is btrfs (haven't tried other
>> filesystems, but it shouldn't matter). The warning and task hangs always
>> happen at either test generic/068 or test generic/390, and (oddly)
>> always have to run all tests for it to trigger, running those tests
>> individually on an infinite loop doesn't seem to trigger it (at least
>> for a couple hours).
>>
>> The warning triggered is at fs/super.c:__sb_start_write() which always
>> results later in several tasks hanging on a percpu rw_sem:
>>
>> https://pastebin.com/qnLvf94E
>>
> 
> In your dmesg, I see line:
> 
>   [ 9304.920151] INFO: lockdep is turned off.
> 
> , that means debug_locks is 0, that usually happens when lockdep finds a
> problem (i.e. a deadlock) and it turns itself off, because a problem is
> found and it's pointless for lockdep to continue to run.
> 
> And I haven't found a lockdep splat in your dmesg, do you have a full
> dmesg so that I can have a look?

Every time I run into the issue it produces similar results. It always
starts with the WARN_ON() at __sb_start_write(), followed by several
"hung task" traces, and then, some time after, lockdep is turned off.

> 
> This may be relevant because in commit 4d004099a66, we have
> 
>   @@ -5056,13 +5081,13 @@ noinstr int lock_is_held_type(const struct 
> lockdep_map *lock, int read)
>   unsigned long flags;
>   int ret = 0;
> 
>   -   if (unlikely(current->lockdep_recursion))
>   +   if (unlikely(!lockdep_enabled()))
>   return 1; /* avoid false negative lockdep_assert_held() 
> */
> 
> before this commit, lock_is_held_type() and its friends may return false
> if debug_locks == 0; after this commit, they will always return true if
> debug_locks == 0. That could cause the behavior here.
> 
> In case I'm correct, the following "fix" may be helpful. 

I'll try it and let you know.
Thanks!

> 
> Regards,
> Boqun
> 
> -->8
> diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> index 3e99dfef8408..c0e27fb949ff 100644
> --- a/kernel/locking/lockdep.c
> +++ b/kernel/locking/lockdep.c
> @@ -5471,7 +5464,7 @@ noinstr int lock_is_held_type(const struct lockdep_map 
> *lock, int read)
>   unsigned long flags;
>   int ret = 0;
>  
> - if (unlikely(!lockdep_enabled()))
> + if (unlikely(debug_locks && !lockdep_enabled()))
>   return 1; /* avoid false negative lockdep_assert_held() */
>  
>   raw_local_irq_save(flags);
> 
> 
> 
>> What happens is percpu_rwsem_is_held() is apparently returning a false
>> positive, so this makes __sb_start_write() do a
>> percpu_down_read_trylock() on a percpu_rw_sem at a higher level, which
>> is expected to always succeed, because if the calling task is holding a
>> freeze percpu_rw_sem at level 1, it's supposed to be able to try_lock
>> the semaphore at level 2, since the freeze semaphores are always
>> acquired by increasing level order.
>>
>> But the try_lock fails, it triggers the warning at __sb_start_write(),
>> then its caller sb_start_pagefault() ignores the return value and
>> callers such as btrfs_page_mkwrite() make the assumption the freeze
>> semaphore was taken, proceed to do their stuff, and later call
>> sb_end_pagefault(), which will do an up_read() on the percpu_rwsem
>> at level 2 despite not having been able to down_read() the
>> semaphore. This obviously corrupts the semaphore's read_count state, and
>> later causes any task trying to down_write() it to hang forever.
>>
>> After such a hang I ran a drgn script to confirm it:
>>
>> $ cat dump_freeze_sems.py
>> import sys
>> import drgn
>> from drgn import NULL, Object, cast, container_of, execscript, \
>> reinterpret, sizeof
>> from drgn.helpers.linux import *
>>
>> mnt_path = b'/home/fdmanana/btrfs-tests/scratch_1'
>>
>> mnt = None
>> for mnt in for_each_mount(prog, dst = mnt_path):
>> pass
>>
>> if mnt is None:
>> sys.stderr.write(f'Error: mount point {mnt_path} not found\n')
>> sys.exit(1)
>>
>> def dump_sem(level_enum):
>> level = level_enum.value_()
>> sem = mnt.mnt.mnt_sb.s_writers.rw_sem[level - 1]
>> print(f'freeze semaphore at level {level}, {str(level_enum)}')
>> print(f'block {sem.block.counter.value_()}')
>> for i in for_each_possible_cpu(prog):
>> read_count = per_cpu_ptr(sem.read_count, i)
>> print(f'read_count at cpu {i} = {read_c

Re: possible lockdep regression introduced by 4d004099a668 ("lockdep: Fix lockdep recursion")

2020-11-03 Thread Boqun Feng
Hi Filipe,

On Mon, Oct 26, 2020 at 11:26:49AM +, Filipe Manana wrote:
> Hello,
> 
> I've recently started to hit a warning followed by tasks hanging after
> attempts to freeze a filesystem. A git bisection pointed to the
> following commit:
> 
> commit 4d004099a668c41522242aa146a38cc4eb59cb1e
> Author: Peter Zijlstra 
> Date:   Fri Oct 2 11:04:21 2020 +0200
> 
> lockdep: Fix lockdep recursion
> 
> This happens very reliably when running all xfstests with lockdep
> enabled, and the tested filesystem is btrfs (haven't tried other
> filesystems, but it shouldn't matter). The warning and task hangs always
> happen at either test generic/068 or test generic/390, and (oddly)
> always have to run all tests for it to trigger, running those tests
> individually on an infinite loop doesn't seem to trigger it (at least
> for a couple hours).
> 
> The warning triggered is at fs/super.c:__sb_start_write() which always
> results later in several tasks hanging on a percpu rw_sem:
> 
> https://pastebin.com/qnLvf94E
> 

In your dmesg, I see line:

[ 9304.920151] INFO: lockdep is turned off.

, that means debug_locks is 0, that usually happens when lockdep finds a
problem (i.e. a deadlock) and it turns itself off, because a problem is
found and it's pointless for lockdep to continue to run.

And I haven't found a lockdep splat in your dmesg, do you have a full
dmesg so that I can have a look?

This may be relevant because in commit 4d004099a66, we have

@@ -5056,13 +5081,13 @@ noinstr int lock_is_held_type(const struct 
lockdep_map *lock, int read)
unsigned long flags;
int ret = 0;

-   if (unlikely(current->lockdep_recursion))
+   if (unlikely(!lockdep_enabled()))
return 1; /* avoid false negative lockdep_assert_held() 
*/

before this commit, lock_is_held_type() and its friends may return false
if debug_locks == 0; after this commit, they will always return true if
debug_locks == 0. That could cause the behavior here.
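A minimal Python model of that difference (illustrative only; the real checks live in kernel/locking/lockdep.c and are heavily simplified here):

```python
# Illustrative model (not the kernel code) of the change in
# lock_is_held_type(): once debug_locks is cleared, the new check makes
# every "is this lock held?" query answer yes, so callers such as
# percpu_rwsem_is_held() can report a lock as held that was never taken.

class State:
    debug_locks = True       # cleared when lockdep turns itself off
    lockdep_recursion = 0

def lockdep_enabled(s):
    return s.debug_locks and s.lockdep_recursion == 0

def lock_is_held_old(s, really_held):
    # Before 4d004099a668: only the recursion count short-circuits.
    if s.lockdep_recursion:
        return True
    if not s.debug_locks:
        return False         # lockdep off: may report "not held"
    return really_held

def lock_is_held_new(s, really_held):
    # After 4d004099a668: debug_locks == 0 also short-circuits to "held".
    if not lockdep_enabled(s):
        return True          # false positive when debug_locks == 0
    return really_held

s = State()
s.debug_locks = False        # lockdep found a problem and turned off
old = lock_is_held_old(s, really_held=False)  # False
new = lock_is_held_new(s, really_held=False)  # True: the false positive
```

This matches the symptom in the thread: after lockdep silently turns itself off, held-lock assertions start reporting "held" unconditionally.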

In case I'm correct, the following "fix" may be helpful. 

Regards,
Boqun

-->8
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 3e99dfef8408..c0e27fb949ff 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -5471,7 +5464,7 @@ noinstr int lock_is_held_type(const struct lockdep_map 
*lock, int read)
unsigned long flags;
int ret = 0;
 
-   if (unlikely(!lockdep_enabled()))
+   if (unlikely(debug_locks && !lockdep_enabled()))
return 1; /* avoid false negative lockdep_assert_held() */
 
raw_local_irq_save(flags);



> What happens is percpu_rwsem_is_held() is apparently returning a false
> positive, so this makes __sb_start_write() do a
> percpu_down_read_trylock() on a percpu_rw_sem at a higher level, which
> is expected to always succeed, because if the calling task is holding a
> freeze percpu_rw_sem at level 1, it's supposed to be able to try_lock
> the semaphore at level 2, since the freeze semaphores are always
> acquired by increasing level order.
> 
> But the try_lock fails, it triggers the warning at __sb_start_write(),
> then its caller sb_start_pagefault() ignores the return value and
> callers such as btrfs_page_mkwrite() make the assumption the freeze
> semaphore was taken, proceed to do their stuff, and later call
> sb_end_pagefault(), which will do an up_read() on the percpu_rwsem
> at level 2 despite not having been able to down_read() the
> semaphore. This obviously corrupts the semaphore's read_count state, and
> later causes any task trying to down_write() it to hang forever.
> 
> After such a hang I ran a drgn script to confirm it:
> 
> $ cat dump_freeze_sems.py
> import sys
> import drgn
> from drgn import NULL, Object, cast, container_of, execscript, \
> reinterpret, sizeof
> from drgn.helpers.linux import *
> 
> mnt_path = b'/home/fdmanana/btrfs-tests/scratch_1'
> 
> mnt = None
> for mnt in for_each_mount(prog, dst = mnt_path):
> pass
> 
> if mnt is None:
> sys.stderr.write(f'Error: mount point {mnt_path} not found\n')
> sys.exit(1)
> 
> def dump_sem(level_enum):
> level = level_enum.value_()
> sem = mnt.mnt.mnt_sb.s_writers.rw_sem[level - 1]
> print(f'freeze semaphore at level {level}, {str(level_enum)}')
> print(f'block {sem.block.counter.value_()}')
> for i in for_each_possible_cpu(prog):
> read_count = per_cpu_ptr(sem.read_count, i)
> print(f'read_count at cpu {i} = {read_count}')
> print()
> 
> # dump semaphore read counts for all freeze levels (fs.h)
> dump_sem(prog['SB_FREEZE_WRITE'])
> dump_sem(prog['SB_FREEZE_PAGEFAULT'])
> dump_sem(prog['SB_FREEZE_FS'])
> 
> 
> $ drgn dump_freeze_sems.py
> freeze semaphore at level 1, (enum )SB_FREEZE_WRITE
> block 1
> read_count at cpu 0 = *(unsigned int *)0xc2ec3ee00c74 = 3
> read_count at cpu 1 = *(unsigned int *)0xc2ec3f200c7
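The unbalanced up_read() that Filipe describes (and that the drgn dump above confirms) can be sketched with a small Python model of a per-cpu rwsem. This is illustrative only, not the kernel implementation:

```python
# Illustrative model of a percpu rwsem: readers bump a per-cpu counter,
# a writer waits until the summed count is zero. A failed trylock paired
# with an unconditional up_read() drives the sum to -1, so a later
# writer (down_write) can never proceed.

class PercpuRwsem:
    def __init__(self, ncpus=2):
        self.read_count = [0] * ncpus  # per-cpu reader counts
        self.block = 0                 # non-zero while a writer is active

    def down_read_trylock(self, cpu):
        if self.block:                 # writer (freeze) in progress: fail
            return False
        self.read_count[cpu] += 1
        return True

    def up_read(self, cpu):
        self.read_count[cpu] -= 1

    def writer_can_proceed(self):
        # down_write() waits for the summed reader count to reach zero.
        return sum(self.read_count) == 0

sem = PercpuRwsem()
sem.block = 1                          # a filesystem freeze is under way

locked = sem.down_read_trylock(cpu=0)  # fails: WARN in __sb_start_write()
# btrfs_page_mkwrite() & co. ignored the failure and ran anyway ...
sem.up_read(cpu=0)                     # ... so sb_end_pagefault() still runs

sem.block = 0                          # freeze finishes; count stays corrupt
```

After the unmatched up_read() the summed read_count is -1, which is exactly the hang the drgn output illustrates: any subsequent down_write() waits forever.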

Re: possible lockdep regression introduced by 4d004099a668 ("lockdep: Fix lockdep recursion")

2020-11-03 Thread Filipe Manana



On 03/11/20 10:15, Jan Kara wrote:
> On Mon 02-11-20 17:58:54, Filipe Manana wrote:
>>
>>
>> On 26/10/20 15:22, Peter Zijlstra wrote:
>>> On Mon, Oct 26, 2020 at 01:55:24PM +0100, Peter Zijlstra wrote:
 On Mon, Oct 26, 2020 at 11:56:03AM +, Filipe Manana wrote:
>> That smells like the same issue reported here:
>>
>>   
>> https://lkml.kernel.org/r/20201022111700.gz2...@hirez.programming.kicks-ass.net
>>
>> Make sure you have commit:
>>
>>   f8e48a3dca06 ("lockdep: Fix preemption WARN for spurious IRQ-enable")
>>
>> (in Linus' tree by now) and do you have CONFIG_DEBUG_PREEMPT enabled?
>
> Yes, CONFIG_DEBUG_PREEMPT is enabled.

 Bummer :/

> I'll try with that commit and let you know, however it's gonna take a
> few hours to build a kernel and run all fstests (on that test box it
> takes over 3 hours) to confirm that fixes the issue.

 *ouch*, 3 hours is painful. How long to make it sick with the current
 kernel? quicker I would hope?

> Thanks for the quick reply!

 Anyway, I don't think that commit can actually explain the issue :/

 The false positive on lockdep_assert_held() happens when the recursion
 count is !0, however we _should_ be having IRQs disabled when
 lockdep_recursion > 0, so that should never be observable.

 My hope was that DEBUG_PREEMPT would trigger on one of the
 __this_cpu_{inc,dec}(lockdep_recursion) instance, because that would
 then be a clear violation.

 And you're seeing this on x86, right?

 Let me puzzle moar..
>>>
>>> So I might have an explanation for the Sparc64 fail, but that can't
>>> explain x86 :/
>>>
>>> I initially thought raw_cpu_read() was OK, since if it is !0 we have
>>> IRQs disabled and can't get migrated, so if we get migrated both CPUs
>>> must have 0 and it doesn't matter which 0 we read.
>>>
>>> And while that is true; it isn't the whole story, on pretty much all
>>> architectures (except x86) this can result in computing the address for
>>> one CPU, getting migrated, the old CPU continuing execution with another
>>> task (possibly setting recursion) and then the new CPU reading the value
>>> of the old CPU, which is no longer 0.
>>>
>>> I already fixed a bunch of that in:
>>>
>>>   baffd723e44d ("lockdep: Revert "lockdep: Use raw_cpu_*() for per-cpu 
>>> variables"")
>>>
>>> but clearly this one got crossed.
>>>
>>> Still, that leaves me puzzled over you seeing this on x86 :/
>>
>> Hi Peter,
>>
>> I still get the same issue with 5.10-rc2.
>> Is there any non-merged patch I should try, or anything I can help with?
> 
> BTW, I've just hit the same deadlock issue with ext4 on generic/390 so I
> confirm this isn't a btrfs-specific issue (as we already knew from the
> analysis but still it's good to have that confirmed).

Indeed, yesterday Darrick was mentioning on IRC that he has run into it
too with fstests on XFS (5.10-rc).

> 
>   Honza
> 


Re: possible lockdep regression introduced by 4d004099a668 ("lockdep: Fix lockdep recursion")

2020-11-03 Thread Jan Kara
On Mon 02-11-20 17:58:54, Filipe Manana wrote:
> 
> 
> On 26/10/20 15:22, Peter Zijlstra wrote:
> > On Mon, Oct 26, 2020 at 01:55:24PM +0100, Peter Zijlstra wrote:
> >> On Mon, Oct 26, 2020 at 11:56:03AM +, Filipe Manana wrote:
>  That smells like the same issue reported here:
> 
>    
>  https://lkml.kernel.org/r/20201022111700.gz2...@hirez.programming.kicks-ass.net
> 
>  Make sure you have commit:
> 
>    f8e48a3dca06 ("lockdep: Fix preemption WARN for spurious IRQ-enable")
> 
>  (in Linus' tree by now) and do you have CONFIG_DEBUG_PREEMPT enabled?
> >>>
> >>> Yes, CONFIG_DEBUG_PREEMPT is enabled.
> >>
> >> Bummer :/
> >>
> >>> I'll try with that commit and let you know, however it's gonna take a
> >>> few hours to build a kernel and run all fstests (on that test box it
> >>> takes over 3 hours) to confirm that fixes the issue.
> >>
> >> *ouch*, 3 hours is painful. How long to make it sick with the current
> >> kernel? quicker I would hope?
> >>
> >>> Thanks for the quick reply!
> >>
> >> Anyway, I don't think that commit can actually explain the issue :/
> >>
> >> The false positive on lockdep_assert_held() happens when the recursion
> >> count is !0, however we _should_ be having IRQs disabled when
> >> lockdep_recursion > 0, so that should never be observable.
> >>
> >> My hope was that DEBUG_PREEMPT would trigger on one of the
> >> __this_cpu_{inc,dec}(lockdep_recursion) instance, because that would
> >> then be a clear violation.
> >>
> >> And you're seeing this on x86, right?
> >>
> >> Let me puzzle moar..
> > 
> > So I might have an explanation for the Sparc64 fail, but that can't
> > explain x86 :/
> > 
> > I initially thought raw_cpu_read() was OK, since if it is !0 we have
> > IRQs disabled and can't get migrated, so if we get migrated both CPUs
> > must have 0 and it doesn't matter which 0 we read.
> > 
> > And while that is true; it isn't the whole story, on pretty much all
> > architectures (except x86) this can result in computing the address for
> > one CPU, getting migrated, the old CPU continuing execution with another
> > task (possibly setting recursion) and then the new CPU reading the value
> > of the old CPU, which is no longer 0.
> > 
> > I already fixed a bunch of that in:
> > 
> >   baffd723e44d ("lockdep: Revert "lockdep: Use raw_cpu_*() for per-cpu 
> > variables"")
> > 
> > but clearly this one got crossed.
> > 
> > Still, that leaves me puzzled over you seeing this on x86 :/
> 
> Hi Peter,
> 
> I still get the same issue with 5.10-rc2.
> Is there any non-merged patch I should try, or anything I can help with?

BTW, I've just hit the same deadlock issue with ext4 on generic/390 so I
confirm this isn't a btrfs-specific issue (as we already knew from the
analysis but still it's good to have that confirmed).

Honza
-- 
Jan Kara 
SUSE Labs, CR


Re: possible lockdep regression introduced by 4d004099a668 ("lockdep: Fix lockdep recursion")

2020-11-02 Thread Filipe Manana



On 26/10/20 15:22, Peter Zijlstra wrote:
> On Mon, Oct 26, 2020 at 01:55:24PM +0100, Peter Zijlstra wrote:
>> On Mon, Oct 26, 2020 at 11:56:03AM +, Filipe Manana wrote:
 That smells like the same issue reported here:

   
 https://lkml.kernel.org/r/20201022111700.gz2...@hirez.programming.kicks-ass.net

 Make sure you have commit:

   f8e48a3dca06 ("lockdep: Fix preemption WARN for spurious IRQ-enable")

 (in Linus' tree by now) and do you have CONFIG_DEBUG_PREEMPT enabled?
>>>
>>> Yes, CONFIG_DEBUG_PREEMPT is enabled.
>>
>> Bummer :/
>>
>>> I'll try with that commit and let you know, however it's gonna take a
>>> few hours to build a kernel and run all fstests (on that test box it
>>> takes over 3 hours) to confirm that fixes the issue.
>>
>> *ouch*, 3 hours is painful. How long to make it sick with the current
>> kernel? quicker I would hope?
>>
>>> Thanks for the quick reply!
>>
>> Anyway, I don't think that commit can actually explain the issue :/
>>
>> The false positive on lockdep_assert_held() happens when the recursion
>> count is !0, however we _should_ be having IRQs disabled when
>> lockdep_recursion > 0, so that should never be observable.
>>
>> My hope was that DEBUG_PREEMPT would trigger on one of the
>> __this_cpu_{inc,dec}(lockdep_recursion) instance, because that would
>> then be a clear violation.
>>
>> And you're seeing this on x86, right?
>>
>> Let me puzzle moar..
> 
> So I might have an explanation for the Sparc64 fail, but that can't
> explain x86 :/
> 
> I initially thought raw_cpu_read() was OK, since if it is !0 we have
> IRQs disabled and can't get migrated, so if we get migrated both CPUs
> must have 0 and it doesn't matter which 0 we read.
> 
> And while that is true; it isn't the whole story, on pretty much all
> architectures (except x86) this can result in computing the address for
> one CPU, getting migrated, the old CPU continuing execution with another
> task (possibly setting recursion) and then the new CPU reading the value
> of the old CPU, which is no longer 0.
> 
> I already fixed a bunch of that in:
> 
>   baffd723e44d ("lockdep: Revert "lockdep: Use raw_cpu_*() for per-cpu 
> variables"")
> 
> but clearly this one got crossed.
> 
> Still, that leaves me puzzled over you seeing this on x86 :/

Hi Peter,

I still get the same issue with 5.10-rc2.
Is there any non-merged patch I should try, or anything I can help with?

Thanks.

> 
> Anatoly, could you try linus+tip/locking/urgent and the below on your
> Sparc, please?
> 
> ---
> diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> index 3e99dfef8408..a3041463e42d 100644
> --- a/kernel/locking/lockdep.c
> +++ b/kernel/locking/lockdep.c
> @@ -84,7 +84,7 @@ static inline bool lockdep_enabled(void)
>   if (!debug_locks)
>   return false;
>  
> - if (raw_cpu_read(lockdep_recursion))
> + if (this_cpu_read(lockdep_recursion))
>   return false;
>  
>   if (current->lockdep_recursion)
> 
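The raw_cpu_read() migration race Peter describes above can be modeled with a short Python sketch. This is plain Python, not per-cpu machinery, and the two "steps" stand in for the address computation and the load that non-x86 architectures emit as separate instructions (x86 does the whole read as a single %gs-relative instruction, which is why it is not affected this way):

```python
# Illustrative model of the race: raw_cpu_read() computes the per-cpu
# address, the task migrates, and the subsequent load sees the *old*
# CPU's lockdep_recursion, which another task may have made non-zero.

lockdep_recursion = [0, 0]  # one counter per CPU
current_cpu = 0

def migrate_to(cpu):
    global current_cpu
    current_cpu = cpu

def other_task_sets_recursion(on_cpu):
    lockdep_recursion[on_cpu] = 1

def raw_cpu_read_racy():
    # Step 1: compute the address, i.e. capture the CPU index ...
    cpu_at_addr_calc = current_cpu
    # Step 2: ... migration happens between address calc and the load,
    # and another task on the old CPU sets its recursion count ...
    migrate_to(1)
    other_task_sets_recursion(on_cpu=cpu_at_addr_calc)
    # Step 3: ... and the load goes through the stale address.
    return lockdep_recursion[cpu_at_addr_calc]

value = raw_cpu_read_racy()
# The migrated task observes 1 even though its own CPU's counter is 0.
```

The this_cpu_read() variant in Peter's diff closes this window by pinning the access to the CPU the task is actually running on.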


Re: possible lockdep regression introduced by 4d004099a668 ("lockdep: Fix lockdep recursion")

2020-10-27 Thread Anatoly Pugachev
On Mon, Oct 26, 2020 at 6:23 PM Peter Zijlstra  wrote:
> On Mon, Oct 26, 2020 at 01:55:24PM +0100, Peter Zijlstra wrote:
> > On Mon, Oct 26, 2020 at 11:56:03AM +, Filipe Manana wrote:
> > > > That smells like the same issue reported here:
> > > >
> > > >   
> > > > https://lkml.kernel.org/r/20201022111700.gz2...@hirez.programming.kicks-ass.net
> > > >
> > > > Make sure you have commit:
> > > >
> > > >   f8e48a3dca06 ("lockdep: Fix preemption WARN for spurious IRQ-enable")
> > > >
> > > > (in Linus' tree by now) and do you have CONFIG_DEBUG_PREEMPT enabled?
> > >
> > > Yes, CONFIG_DEBUG_PREEMPT is enabled.
> >
> > Bummer :/
> >
> > > I'll try with that commit and let you know, however it's gonna take a
> > > few hours to build a kernel and run all fstests (on that test box it
> > > takes over 3 hours) to confirm that fixes the issue.
> >
> > *ouch*, 3 hours is painful. How long to make it sick with the current
> > kernel? quicker I would hope?
> >
> > > Thanks for the quick reply!
> >
> > Anyway, I don't think that commit can actually explain the issue :/
> >
> > The false positive on lockdep_assert_held() happens when the recursion
> > count is !0, however we _should_ be having IRQs disabled when
> > lockdep_recursion > 0, so that should never be observable.
> >
> > My hope was that DEBUG_PREEMPT would trigger on one of the
> > __this_cpu_{inc,dec}(lockdep_recursion) instance, because that would
> > then be a clear violation.
> >
> > And you're seeing this on x86, right?
> >
> > Let me puzzle moar..
>
> So I might have an explanation for the Sparc64 fail, but that can't
> explain x86 :/
>
> I initially thought raw_cpu_read() was OK, since if it is !0 we have
> IRQs disabled and can't get migrated, so if we get migrated both CPUs
> must have 0 and it doesn't matter which 0 we read.
>
> And while that is true; it isn't the whole story, on pretty much all
> architectures (except x86) this can result in computing the address for
> one CPU, getting migrated, the old CPU continuing execution with another
> task (possibly setting recursion) and then the new CPU reading the value
> of the old CPU, which is no longer 0.
>
> I already fixed a bunch of that in:
>
>   baffd723e44d ("lockdep: Revert "lockdep: Use raw_cpu_*() for per-cpu 
> variables"")
>
> but clearly this one got crossed.
>
> Still, that leaves me puzzled over you seeing this on x86 :/
>
> Anatoly, could you try linus+tip/locking/urgent and the below on your
> Sparc, please?

Peter,
let me test first. Thanks.

PS: sorry for the delay, it was the weekend and I got a bit ill ...


Re: possible lockdep regression introduced by 4d004099a668 ("lockdep: Fix lockdep recursion")

2020-10-26 Thread David Sterba
On Mon, Oct 26, 2020 at 01:55:24PM +0100, Peter Zijlstra wrote:
> On Mon, Oct 26, 2020 at 11:56:03AM +, Filipe Manana wrote:
> > > That smells like the same issue reported here:
> > > 
> > >   
> > > https://lkml.kernel.org/r/20201022111700.gz2...@hirez.programming.kicks-ass.net
> > > 
> > > Make sure you have commit:
> > > 
> > >   f8e48a3dca06 ("lockdep: Fix preemption WARN for spurious IRQ-enable")
> > > 
> > > (in Linus' tree by now) and do you have CONFIG_DEBUG_PREEMPT enabled?
> > 
> > Yes, CONFIG_DEBUG_PREEMPT is enabled.
> 
> Bummer :/

My builds don't have that enabled (CONFIG_PREEMPT_NONE=y) but I still
see the warning (same scenario as for Filipe). That is with today's
master branch + your fix from locking/urgent.

> > I'll try with that commit and let you know, however it's gonna take a
> > few hours to build a kernel and run all fstests (on that test box it
> > takes over 3 hours) to confirm that fixes the issue.
> 
> *ouch*, 3 hours is painful. How long to make it sick with the current
> kernel? quicker I would hope?
> 
> > Thanks for the quick reply!
> 
> Anyway, I don't think that commit can actually explain the issue :/
> 
> The false positive on lockdep_assert_held() happens when the recursion
> count is !0, however we _should_ be having IRQs disabled when
> lockdep_recursion > 0, so that should never be observable.
> 
> My hope was that DEBUG_PREEMPT would trigger on one of the
> __this_cpu_{inc,dec}(lockdep_recursion) instances, because that would
> then be a clear violation.

I can start another round (in my case it's more than 4 hours to
reproduce it) with DEBUG_PREEMPT, unless you have something else to
test.


Re: possible lockdep regression introduced by 4d004099a668 ("lockdep: Fix lockdep recursion")

2020-10-26 Thread Peter Zijlstra
On Mon, Oct 26, 2020 at 01:55:24PM +0100, Peter Zijlstra wrote:
> On Mon, Oct 26, 2020 at 11:56:03AM +, Filipe Manana wrote:
> > > That smells like the same issue reported here:
> > > 
> > >   
> > > https://lkml.kernel.org/r/20201022111700.gz2...@hirez.programming.kicks-ass.net
> > > 
> > > Make sure you have commit:
> > > 
> > >   f8e48a3dca06 ("lockdep: Fix preemption WARN for spurious IRQ-enable")
> > > 
> > > (in Linus' tree by now) and do you have CONFIG_DEBUG_PREEMPT enabled?
> > 
> > Yes, CONFIG_DEBUG_PREEMPT is enabled.
> 
> Bummer :/
> 
> > I'll try with that commit and let you know, however it's gonna take a
> > few hours to build a kernel and run all fstests (on that test box it
> > takes over 3 hours) to confirm that fixes the issue.
> 
> *ouch*, 3 hours is painful. How long to make it sick with the current
> kernel? quicker I would hope?
> 
> > Thanks for the quick reply!
> 
> Anyway, I don't think that commit can actually explain the issue :/
> 
> The false positive on lockdep_assert_held() happens when the recursion
> count is !0, however we _should_ be having IRQs disabled when
> lockdep_recursion > 0, so that should never be observable.
> 
> My hope was that DEBUG_PREEMPT would trigger on one of the
> __this_cpu_{inc,dec}(lockdep_recursion) instances, because that would
> then be a clear violation.
> 
> And you're seeing this on x86, right?
> 
> Let me puzzle moar..

So I might have an explanation for the Sparc64 fail, but that can't
explain x86 :/

I initially thought raw_cpu_read() was OK, since if it is !0 we have
IRQs disabled and can't get migrated, so if we get migrated both CPUs
must have 0 and it doesn't matter which 0 we read.

And while that is true, it isn't the whole story: on pretty much all
architectures (except x86) this can result in computing the address for
one CPU, getting migrated, the old CPU continuing execution with another
task (possibly setting recursion) and then the new CPU reading the value
of the old CPU, which is no longer 0.

I already fixed a bunch of that in:

  baffd723e44d ("lockdep: Revert "lockdep: Use raw_cpu_*() for per-cpu variables"")

but clearly this one got crossed.

Still, that leaves me puzzled over you seeing this on x86 :/

Anatoly, could you try linus+tip/locking/urgent and the below on your
Sparc, please?

---
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 3e99dfef8408..a3041463e42d 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -84,7 +84,7 @@ static inline bool lockdep_enabled(void)
if (!debug_locks)
return false;
 
-   if (raw_cpu_read(lockdep_recursion))
+   if (this_cpu_read(lockdep_recursion))
return false;
 
if (current->lockdep_recursion)


Re: possible lockdep regression introduced by 4d004099a668 ("lockdep: Fix lockdep recursion")

2020-10-26 Thread Filipe Manana



On 26/10/20 12:55, Peter Zijlstra wrote:
> On Mon, Oct 26, 2020 at 11:56:03AM +, Filipe Manana wrote:
>>> That smells like the same issue reported here:
>>>
>>>   
>>> https://lkml.kernel.org/r/20201022111700.gz2...@hirez.programming.kicks-ass.net
>>>
>>> Make sure you have commit:
>>>
>>>   f8e48a3dca06 ("lockdep: Fix preemption WARN for spurious IRQ-enable")
>>>
>>> (in Linus' tree by now) and do you have CONFIG_DEBUG_PREEMPT enabled?
>>
>> Yes, CONFIG_DEBUG_PREEMPT is enabled.
> 
> Bummer :/
> 
>> I'll try with that commit and let you know, however it's gonna take a
>> few hours to build a kernel and run all fstests (on that test box it
>> takes over 3 hours) to confirm that fixes the issue.
> 
> *ouch*, 3 hours is painful. How long to make it sick with the current
> kernel? quicker I would hope?

If generic/068 triggers the bug, then it's about 1 hour. If that passes,
which rarely happens, then I have to wait until it gets to generic/390,
which is over 2 hours.

It sucks that running those tests alone never triggers the issue, but
running all fstests (first the btrfs specific ones, followed by the
generic ones) reliably triggers the bug, almost always at generic/068;
when that passes, it's triggered by generic/390. To confirm everything
is ok, I let all tests run (the last generic test is 612).


> 
>> Thanks for the quick reply!
> 
> Anyway, I don't think that commit can actually explain the issue :/
> 
> The false positive on lockdep_assert_held() happens when the recursion
> count is !0, however we _should_ be having IRQs disabled when
> lockdep_recursion > 0, so that should never be observable.
> 
> My hope was that DEBUG_PREEMPT would trigger on one of the
> __this_cpu_{inc,dec}(lockdep_recursion) instances, because that would
> then be a clear violation.
> 
> And you're seeing this on x86, right?

Right.
It's in a qemu vm on x86, with '-cpu host' passed to qemu and kvm enabled.

Thanks.


> 
> Let me puzzle moar..
> 


Re: possible lockdep regression introduced by 4d004099a668 ("lockdep: Fix lockdep recursion")

2020-10-26 Thread Peter Zijlstra
On Mon, Oct 26, 2020 at 11:56:03AM +, Filipe Manana wrote:
> > That smells like the same issue reported here:
> > 
> >   
> > https://lkml.kernel.org/r/20201022111700.gz2...@hirez.programming.kicks-ass.net
> > 
> > Make sure you have commit:
> > 
> >   f8e48a3dca06 ("lockdep: Fix preemption WARN for spurious IRQ-enable")
> > 
> > (in Linus' tree by now) and do you have CONFIG_DEBUG_PREEMPT enabled?
> 
> Yes, CONFIG_DEBUG_PREEMPT is enabled.

Bummer :/

> I'll try with that commit and let you know, however it's gonna take a
> few hours to build a kernel and run all fstests (on that test box it
> takes over 3 hours) to confirm that fixes the issue.

*ouch*, 3 hours is painful. How long to make it sick with the current
kernel? quicker I would hope?

> Thanks for the quick reply!

Anyway, I don't think that commit can actually explain the issue :/

The false positive on lockdep_assert_held() happens when the recursion
count is !0, however we _should_ be having IRQs disabled when
lockdep_recursion > 0, so that should never be observable.

My hope was that DEBUG_PREEMPT would trigger on one of the
__this_cpu_{inc,dec}(lockdep_recursion) instances, because that would
then be a clear violation.

And you're seeing this on x86, right?

Let me puzzle moar..


Re: possible lockdep regression introduced by 4d004099a668 ("lockdep: Fix lockdep recursion")

2020-10-26 Thread Peter Zijlstra
On Mon, Oct 26, 2020 at 12:55:41PM +0100, Jan Kara wrote:
> > Make sure you have commit:
> > 
> >   f8e48a3dca06 ("lockdep: Fix preemption WARN for spurious IRQ-enable")
> > 
> > (in Linus' tree by now) and do you have CONFIG_DEBUG_PREEMPT enabled?
> 
> Hum, I am at 5.10-rc1 now and above mentioned commit doesn't appear to be
> there? Also googling for the title doesn't help...

*groan*... I seem to have forgotten to push it out to tip/locking/urgent on
Friday :/

Find it here:

  git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git locking/urgent




Re: possible lockdep regression introduced by 4d004099a668 ("lockdep: Fix lockdep recursion")

2020-10-26 Thread Filipe Manana



On 26/10/20 11:55, Jan Kara wrote:
> On Mon 26-10-20 12:40:09, Peter Zijlstra wrote:
>> On Mon, Oct 26, 2020 at 11:26:49AM +, Filipe Manana wrote:
>>> Hello,
>>>
>>> I've recently started to hit a warning followed by tasks hanging after
>>> attempts to freeze a filesystem. A git bisection pointed to the
>>> following commit:
>>>
>>> commit 4d004099a668c41522242aa146a38cc4eb59cb1e
>>> Author: Peter Zijlstra 
>>> Date:   Fri Oct 2 11:04:21 2020 +0200
>>>
>>> lockdep: Fix lockdep recursion
>>>
>>> This happens very reliably when running all xfstests with lockdep
>>> enabled, and the tested filesystem is btrfs (haven't tried other
>>> filesystems, but it shouldn't matter). The warning and task hangs always
>>> happen at either test generic/068 or test generic/390, and (oddly)
>>> always have to run all tests for it to trigger, running those tests
>>> individually on an infinite loop doesn't seem to trigger it (at least
>>> for a couple hours).
>>>
>>> The warning triggered is at fs/super.c:__sb_start_write() which always
>>> results later in several tasks hanging on a percpu rw_sem:
>>>
>>> https://pastebin.com/qnLvf94E
>>>
>>> What happens is percpu_rwsem_is_held() is apparently returning a false
>>> positive,
>>
>> That smells like the same issue reported here:
>>
>>   
>> https://lkml.kernel.org/r/20201022111700.gz2...@hirez.programming.kicks-ass.net
>>
>> Make sure you have commit:
>>
>>   f8e48a3dca06 ("lockdep: Fix preemption WARN for spurious IRQ-enable")
>>
>> (in Linus' tree by now) and do you have CONFIG_DEBUG_PREEMPT enabled?
> 
> Hum, I am at 5.10-rc1 now and above mentioned commit doesn't appear to be
> there? Also googling for the title doesn't help...

Same here, I can't find it in Linus' tree nor through Google.

> 
>   Honza
> 


Re: possible lockdep regression introduced by 4d004099a668 ("lockdep: Fix lockdep recursion")

2020-10-26 Thread Filipe Manana



On 26/10/20 11:40, Peter Zijlstra wrote:
> On Mon, Oct 26, 2020 at 11:26:49AM +, Filipe Manana wrote:
>> Hello,
>>
>> I've recently started to hit a warning followed by tasks hanging after
>> attempts to freeze a filesystem. A git bisection pointed to the
>> following commit:
>>
>> commit 4d004099a668c41522242aa146a38cc4eb59cb1e
>> Author: Peter Zijlstra 
>> Date:   Fri Oct 2 11:04:21 2020 +0200
>>
>> lockdep: Fix lockdep recursion
>>
>> This happens very reliably when running all xfstests with lockdep
>> enabled, and the tested filesystem is btrfs (haven't tried other
>> filesystems, but it shouldn't matter). The warning and task hangs always
>> happen at either test generic/068 or test generic/390, and (oddly)
>> always have to run all tests for it to trigger, running those tests
>> individually on an infinite loop doesn't seem to trigger it (at least
>> for a couple hours).
>>
>> The warning triggered is at fs/super.c:__sb_start_write() which always
>> results later in several tasks hanging on a percpu rw_sem:
>>
>> https://pastebin.com/qnLvf94E
>>
>> What happens is percpu_rwsem_is_held() is apparently returning a false
>> positive,
> 
> That smells like the same issue reported here:
> 
>   
> https://lkml.kernel.org/r/20201022111700.gz2...@hirez.programming.kicks-ass.net
> 
> Make sure you have commit:
> 
>   f8e48a3dca06 ("lockdep: Fix preemption WARN for spurious IRQ-enable")
> 
> (in Linus' tree by now) and do you have CONFIG_DEBUG_PREEMPT enabled?

Yes, CONFIG_DEBUG_PREEMPT is enabled.
I'll try with that commit and let you know, however it's gonna take a
few hours to build a kernel and run all fstests (on that test box it
takes over 3 hours) to confirm that fixes the issue.

Thanks for the quick reply!



Re: possible lockdep regression introduced by 4d004099a668 ("lockdep: Fix lockdep recursion")

2020-10-26 Thread Jan Kara
On Mon 26-10-20 12:40:09, Peter Zijlstra wrote:
> On Mon, Oct 26, 2020 at 11:26:49AM +, Filipe Manana wrote:
> > Hello,
> > 
> > I've recently started to hit a warning followed by tasks hanging after
> > attempts to freeze a filesystem. A git bisection pointed to the
> > following commit:
> > 
> > commit 4d004099a668c41522242aa146a38cc4eb59cb1e
> > Author: Peter Zijlstra 
> > Date:   Fri Oct 2 11:04:21 2020 +0200
> > 
> > lockdep: Fix lockdep recursion
> > 
> > This happens very reliably when running all xfstests with lockdep
> > enabled, and the tested filesystem is btrfs (haven't tried other
> > filesystems, but it shouldn't matter). The warning and task hangs always
> > happen at either test generic/068 or test generic/390, and (oddly)
> > always have to run all tests for it to trigger, running those tests
> > individually on an infinite loop doesn't seem to trigger it (at least
> > for a couple hours).
> > 
> > The warning triggered is at fs/super.c:__sb_start_write() which always
> > results later in several tasks hanging on a percpu rw_sem:
> > 
> > https://pastebin.com/qnLvf94E
> > 
> > What happens is percpu_rwsem_is_held() is apparently returning a false
> > positive,
> 
> That smells like the same issue reported here:
> 
>   
> https://lkml.kernel.org/r/20201022111700.gz2...@hirez.programming.kicks-ass.net
> 
> Make sure you have commit:
> 
>   f8e48a3dca06 ("lockdep: Fix preemption WARN for spurious IRQ-enable")
> 
> (in Linus' tree by now) and do you have CONFIG_DEBUG_PREEMPT enabled?

Hum, I am at 5.10-rc1 now and above mentioned commit doesn't appear to be
there? Also googling for the title doesn't help...

Honza
-- 
Jan Kara 
SUSE Labs, CR


Re: possible lockdep regression introduced by 4d004099a668 ("lockdep: Fix lockdep recursion")

2020-10-26 Thread Peter Zijlstra
On Mon, Oct 26, 2020 at 11:26:49AM +, Filipe Manana wrote:
> Hello,
> 
> I've recently started to hit a warning followed by tasks hanging after
> attempts to freeze a filesystem. A git bisection pointed to the
> following commit:
> 
> commit 4d004099a668c41522242aa146a38cc4eb59cb1e
> Author: Peter Zijlstra 
> Date:   Fri Oct 2 11:04:21 2020 +0200
> 
> lockdep: Fix lockdep recursion
> 
> This happens very reliably when running all xfstests with lockdep
> enabled, and the tested filesystem is btrfs (haven't tried other
> filesystems, but it shouldn't matter). The warning and task hangs always
> happen at either test generic/068 or test generic/390, and (oddly)
> always have to run all tests for it to trigger, running those tests
> individually on an infinite loop doesn't seem to trigger it (at least
> for a couple hours).
> 
> The warning triggered is at fs/super.c:__sb_start_write() which always
> results later in several tasks hanging on a percpu rw_sem:
> 
> https://pastebin.com/qnLvf94E
> 
> What happens is percpu_rwsem_is_held() is apparently returning a false
> positive,

That smells like the same issue reported here:

  
https://lkml.kernel.org/r/20201022111700.gz2...@hirez.programming.kicks-ass.net

Make sure you have commit:

  f8e48a3dca06 ("lockdep: Fix preemption WARN for spurious IRQ-enable")

(in Linus' tree by now) and do you have CONFIG_DEBUG_PREEMPT enabled?