Re: [d_alloc_parallel] WARNING: bad unlock balance detected!
On Tue, Nov 07, 2017 at 12:03:29PM +0800, Fengguang Wu wrote: > >>[ 428.512005] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow > >>Control: RX > >>LKP: HOSTNAME vm-lkp-wsx03-openwrt-i386-8, MAC , kernel 4.14.0-rc8 158, > >>serial console /dev/ttyS0 > >>[ 429.798345] Kernel tests: Boot OK! > >>[ 430.761760] [ 430.766166] = > >>[ 430.775297] WARNING: bad unlock balance detected! > >>[ 430.784342] 4.14.0-rc8 #158 Not tainted > >>[ 430.792153] - > >>[ 430.801319] pidof/1024 is trying to release lock (rcu_preempt_state) at: > >>[ 430.813514] [] rcu_read_unlock_special+0x5f8/0x620 > >>[ 430.824041] but there are no more locks to release! > > > >Er... yes? What of that? Since when is rcu_read_lock() not allowed to > >be used under an rwsem? That's not what it says, it is.. The warning is about trying to release a lock that's not held. And its right, RCU was doing that. It would acquire a lock without lockdep knowing about it and then telling lockdep about freeing it. This is fixed by commit: 02a7c234e540 ("rcu: Suppress lockdep false-positive ->boost_mtx complaints") The problem is that RCU boosting was mixing futex and !futex rt_mutex ops.
Re: [d_alloc_parallel] WARNING: bad unlock balance detected!
CC locking people. On Tue, Nov 07, 2017 at 02:33:28AM +, Al Viro wrote: On Tue, Nov 07, 2017 at 10:01:13AM +0800, Fengguang Wu wrote: Hi, Here is a warning in v4.14-rc8 -- it's not necessarily a new bug. Why is it a bug at all? [ 428.512005] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX LKP: HOSTNAME vm-lkp-wsx03-openwrt-i386-8, MAC , kernel 4.14.0-rc8 158, serial console /dev/ttyS0 [ 429.798345] Kernel tests: Boot OK! [ 430.761760] [ 430.766166] = [ 430.775297] WARNING: bad unlock balance detected! [ 430.784342] 4.14.0-rc8 #158 Not tainted [ 430.792153] - [ 430.801319] pidof/1024 is trying to release lock (rcu_preempt_state) at: [ 430.813514] [] rcu_read_unlock_special+0x5f8/0x620 [ 430.824041] but there are no more locks to release! Er... yes? What of that? Since when is rcu_read_lock() not allowed to be used under an rwsem? [ 430.833342] [ 430.833342] other info that might help us debug this: [ 430.845985] 2 locks held by pidof/1024: [ 430.853826] #0: (&sb->s_type->i_mutex_key){}, at: [] lookup_slow+0x8a/0x310 [ 430.869344] #1: (rcu_read_lock){}, at: [] d_alloc_parallel+0x7e/0xd10 No shit - we are doing RCU cache chain walk while holding ->i_rwsem. As in down_read(&rwsem); ... rcu_read_lock(); ... rcu_read_unlock(); Why is that a problem? If we are suddenly not allowed to have an RCU reader section while holding any kind of a blocking lock, a *lot* of places in the kernel are screwed. Please, explain. Good question! Actually it's not only you. There are dozens of occurrences for this "unlock balance" warning in RC8: ((console_sem).lock){-...}, at: up (gcov_lock){+.+.}, at: gcov_enable_events (&mm->mmap_sem){}, at: __do_page_fault (node_lock){+.+.}, at: gcov_event (&p->pi_lock){-.-.}, at: try_to_wake_up (rcu_read_lock){}, at: aa_file_perm (rcu_read_lock){}, at: copy_namespaces (rcu_read_lock){}, at: d_alloc_parallel (rcu_read_lock){}, at: __d_lookup (rcu_read_lock){}, at: dput (rcu_read_lock){}, at: find_get_entry (rcu_read_lock){}, at: insert_retry (rcu_read_lock){}, at: mntput_no_expire (rcu_read_lock){}, at: netlink_insert (rcu_read_lock){}, at: rcu_read_lock (rcu_read_lock){}, at: rcu_torture_read_lock (rcu_read_lock){}, at: rhashtable_insert_slow (rcu_read_lock){}, at: rhashtable_walk_start (rcu_read_lock){}, at: sock_def_readable (rcu_read_lock){}, at: SyS_setpriority (rcu_read_lock){}, at: T.947 (rcu_read_lock){}, at: T.949 (rcu_read_lock){}, at: test_rhashtable (rcu_read_lock){}, at: test_rht_lookup (rcu_read_lock){}, at: threadfunc (rcu_read_lock){}, at: thread_lookup_test (rcu_read_lock){}, at: watchdog (rcu_sched_state.exp_mutex){+.+.}, at: _synchronize_rcu_expedited (&sb->s_type->i_mutex_key#3){+.+.}, at: start_creating (&sb->s_type->i_mutex_key){}, at: lookup_slow (&sig->cred_guard_mutex){}, at: prepare_bprm_creds (sk_lock-AF_NETLINK){}, at: netlink_insert (tasklist_lock){}, at: debug_show_all_locks (tasklist_lock){.+.+}, at: debug_show_all_locks (tty_mutex){}, at: tty_open (tty_mutex){+.+.}, at: tty_open (tty_mutex){+.+.}, at: tty_release_struct Maybe some recent core locking changes triggered this warning. In particular, some warnings show up since this commit. commit cde50a67397c0da7d11795d4b4418384022ab8e6 Author: Levin, Alexander (Sasha Levin) AuthorDate: Sun Jun 18 14:06:01 2017 + Commit: Ingo Molnar CommitDate: Tue Jun 20 11:53:09 2017 +0200 locking/rtmutex: Don't initialize lockdep when not required pi_mutex isn't supposed to be tracked by lockdep, but just passing NULLs for name and key will cause lockdep to spew a warning and die, which is not what we want it to do. Skip lockdep initialization if the caller passed NULLs for name and key, suggesting such initialization isn't desired. Signed-off-by: Sasha Levin Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Fixes: f5694788ad8d ("rt_mutex: Add lockdep annotations") Link: http://lkml.kernel.org/r/20170618140548.4763-1-alexander.le...@verizon.com Signed-off-by: Ingo Molnar --- kernel/locking/rtmutex.c | 1 + 1 file changed, 1 insertion(+) diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c index 43123533e9b10..78069895032a9 100644 --- a/kernel/locking/rtmutex.c +++ b/kernel/locking/rtmutex.c @@ -1661,6 +1661,7 @@ void __rt_mutex_init(struct rt_mutex *lock, const char *name, lock->waiters = RB_ROOT; lock->waiters_leftmost = NULL; + if (name && key) debug_rt_mutex_init(lock, name, key); } EXPORT_SYMBOL_GPL(__rt_mutex_init); 018956d641 locking/selftest: Add RT-mutex support cde50a6739 locking/rtmutex: Don't initialize lockdep when not requir
Re: [d_alloc_parallel] WARNING: bad unlock balance detected!
On Tue, Nov 07, 2017 at 10:01:13AM +0800, Fengguang Wu wrote: > Hi, > > Here is a warning in v4.14-rc8 -- it's not necessarily a new bug. Why is it a bug at all? > [ 428.512005] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow > Control: RX > LKP: HOSTNAME vm-lkp-wsx03-openwrt-i386-8, MAC , kernel 4.14.0-rc8 158, > serial console /dev/ttyS0 > [ 429.798345] Kernel tests: Boot OK! > [ 430.761760] [ 430.766166] = > [ 430.775297] WARNING: bad unlock balance detected! > [ 430.784342] 4.14.0-rc8 #158 Not tainted > [ 430.792153] - > [ 430.801319] pidof/1024 is trying to release lock (rcu_preempt_state) at: > [ 430.813514] [] rcu_read_unlock_special+0x5f8/0x620 > [ 430.824041] but there are no more locks to release! Er... yes? What of that? Since when is rcu_read_lock() not allowed to be used under an rwsem? > [ 430.833342] [ 430.833342] other info that might help us debug this: > [ 430.845985] 2 locks held by pidof/1024: > [ 430.853826] #0: (&sb->s_type->i_mutex_key){}, at: [] > lookup_slow+0x8a/0x310 > [ 430.869344] #1: (rcu_read_lock){}, at: [] > d_alloc_parallel+0x7e/0xd10 No shit - we are doing RCU cache chain walk while holding ->i_rwsem. As in down_read(&rwsem); ... rcu_read_lock(); ... rcu_read_unlock(); Why is that a problem? If we are suddenly not allowed to have an RCU reader section while holding any kind of a blocking lock, a *lot* of places in the kernel are screwed. Please, explain.
[d_alloc_parallel] WARNING: bad unlock balance detected!
Hi, Here is a warning in v4.14-rc8 -- it's not necessarily a new bug. [ 428.512005] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX LKP: HOSTNAME vm-lkp-wsx03-openwrt-i386-8, MAC , kernel 4.14.0-rc8 158, serial console /dev/ttyS0 [ 429.798345] Kernel tests: Boot OK! [ 430.761760] [ 430.766166] = [ 430.775297] WARNING: bad unlock balance detected! [ 430.784342] 4.14.0-rc8 #158 Not tainted [ 430.792153] - [ 430.801319] pidof/1024 is trying to release lock (rcu_preempt_state) at: [ 430.813514] [] rcu_read_unlock_special+0x5f8/0x620 [ 430.824041] but there are no more locks to release! [ 430.833342] [ 430.833342] other info that might help us debug this: [ 430.845985] 2 locks held by pidof/1024: [ 430.853826] #0: (&sb->s_type->i_mutex_key){}, at: [] lookup_slow+0x8a/0x310 [ 430.869344] #1: (rcu_read_lock){}, at: [] d_alloc_parallel+0x7e/0xd10 [ 430.884215] [ 430.884215] stack backtrace: [ 430.893488] CPU: 0 PID: 1024 Comm: pidof Not tainted 4.14.0-rc8 #158 [ 430.905217] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 [ 430.920682] Call Trace: [ 430.926481] dump_stack+0x16/0x1c [ 430.933406] print_unlock_imbalance_bug+0xb9/0xd0 [ 430.942512] ? rcu_read_unlock_special+0x5f8/0x620 [ 430.951690] ? rcu_read_unlock_special+0x5f8/0x620 [ 430.960928] lock_release+0x1cc/0x490 [ 430.968446] ? rcu_gp_kthread_wake+0x34/0x50 [ 430.976960] ? rcu_read_unlock_special+0x5f8/0x620 [ 430.986108] rt_mutex_unlock+0x1e/0xb0 [ 430.993754] rcu_read_unlock_special+0x5f8/0x620 [ 431.003016] __rcu_read_unlock+0xa7/0xb0 [ 431.010665] d_alloc_parallel+0x345/0xd10 [ 431.018441] lookup_slow+0xe4/0x310 [ 431.025500] walk_component+0x146/0x590 [ 431.031915] path_lookupat+0xcc/0x3f0 [ 431.038634] ? 0xc100 [ 431.042398] ? 0xc100 [ 431.046246] filename_lookup+0xd8/0x290 [ 431.051246] ? strncpy_from_user+0x13f/0x260 [ 431.056818] user_path_at_empty+0x25/0x30 [ 431.062068] SyS_readlink+0x52/0x130 [ 431.066929] do_int80_syscall_32+0x95/0x290 [ 431.072306] entry_INT80_32+0x2f/0x2f [ 431.077230] EIP: 0xb7f0e6a2 [ 431.081187] EFLAGS: 0296 CPU: 0 [ 431.085910] EAX: ffda EBX: bf82714b ECX: 080964b8 EDX: 0050 [ 431.095247] ESI: 0050 EDI: bf82714b EBP: bf827124 ESP: bf827100 [ 431.107404] DS: 007b ES: 007b FS: GS: 0033 SS: 007b [ 431.617153] 8021q: adding VLAN 0 to HW filter on device eth0 [ 433.538079] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX Thanks, Fengguang