Re: [PATCH] Repair misuse of sv_lock in 5.10.16-rt30.
> On Feb 26, 2021, at 10:19 AM, Joe Korty wrote:
>
> On Fri, Feb 26, 2021 at 03:15:46PM +, Chuck Lever wrote:
>>
>>
>>> On Feb 26, 2021, at 10:00 AM, J. Bruce Fields wrote:
>>>
>>> Adding Chuck, linux-nfs.
>>>
>>> Makes sense to me.--b.
>>
>> Joe, I can add this to nfsd-5.12-rc. Would it be appropriate to add:
>>
>> Fixes: 719f8bcc883e ("svcrpc: fix xpt_list traversal locking on shutdown")
>
> Sure.
>
> And thanks, everybody, for the quick response.
>
> Joe

Your patch has been added to the for-rc topic branch at

git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git

with minor edits to the patch description.

--
Chuck Lever
Re: [PATCH] Repair misuse of sv_lock in 5.10.16-rt30.
On Fri, Feb 26, 2021 at 03:15:46PM +, Chuck Lever wrote:
>
>
> > On Feb 26, 2021, at 10:00 AM, J. Bruce Fields wrote:
> >
> > Adding Chuck, linux-nfs.
> >
> > Makes sense to me.--b.
>
> Joe, I can add this to nfsd-5.12-rc. Would it be appropriate to add:
>
> Fixes: 719f8bcc883e ("svcrpc: fix xpt_list traversal locking on shutdown")

Sure.

And thanks, everybody, for the quick response.

Joe
Re: [PATCH] Repair misuse of sv_lock in 5.10.16-rt30.
> On Feb 26, 2021, at 10:00 AM, J. Bruce Fields wrote:
>
> Adding Chuck, linux-nfs.
>
> Makes sense to me.--b.

Joe, I can add this to nfsd-5.12-rc. Would it be appropriate to add:

Fixes: 719f8bcc883e ("svcrpc: fix xpt_list traversal locking on shutdown")

> On Fri, Feb 26, 2021 at 09:38:20AM -0500, Joe Korty wrote:
>> Repair misuse of sv_lock in 5.10.16-rt30.
>> [...]
Re: [PATCH] Repair misuse of sv_lock in 5.10.16-rt30.
Adding Chuck, linux-nfs.

Makes sense to me.--b.

On Fri, Feb 26, 2021 at 09:38:20AM -0500, Joe Korty wrote:
> Repair misuse of sv_lock in 5.10.16-rt30.
> [...]
[PATCH] Repair misuse of sv_lock in 5.10.16-rt30.
Repair misuse of sv_lock in 5.10.16-rt30.

[ This problem is in mainline, but only rt has the chops to be
  able to detect it. ]

Lockdep reports a circular lock dependency between serv->sv_lock and
softirq_ctrl.lock on system shutdown when the kernel is built with
CONFIG_PREEMPT_RT=y and an NFS mount exists.

This is due to the definition of spin_lock_bh() on rt:

    local_bh_disable();
    rt_spin_lock(lock);

which forces a softirq_ctrl.lock -> serv->sv_lock dependency. This is
not a problem as long as _every_ lock of serv->sv_lock is a:

    spin_lock_bh(&serv->sv_lock);

but there is one of the form:

    spin_lock(&serv->sv_lock);

This is what is causing the circular dependency splat. The spin_lock()
grabs the lock without first grabbing softirq_ctrl.lock via
local_bh_disable(). If, later in the critical region, someone calls
local_bh_disable(), a serv->sv_lock -> softirq_ctrl.lock dependency is
established. Deadlock.

The fix is to lock serv->sv_lock with spin_lock_bh() everywhere, no
exceptions.

Signed-off-by: Joe Korty

[ OK ] Stopped target NFS client services.
         Stopping Logout off all iSCSI sessions on shutdown...
         Stopping NFS server and services...
[ 109.442380]
[ 109.442385] ==
[ 109.442386] WARNING: possible circular locking dependency detected
[ 109.442387] 5.10.16-rt30 #1 Not tainted
[ 109.442389] --
[ 109.442390] nfsd/1032 is trying to acquire lock:
[ 109.442392] 994237617f60 ((softirq_ctrl.lock).lock){+.+.}-{2:2}, at: __local_bh_disable_ip+0xd9/0x270
[ 109.442405]
[ 109.442405] but task is already holding lock:
[ 109.442406] 994245cb00b0 (&serv->sv_lock){+.+.}-{0:0}, at: svc_close_list+0x1f/0x90
[ 109.442415]
[ 109.442415] which lock already depends on the new lock.
[ 109.442415]
[ 109.442416]
[ 109.442416] the existing dependency chain (in reverse order) is:
[ 109.442417]
[ 109.442417] -> #1 (&serv->sv_lock){+.+.}-{0:0}:
[ 109.442421]        rt_spin_lock+0x2b/0xc0
[ 109.442428]        svc_add_new_perm_xprt+0x42/0xa0
[ 109.442430]        svc_addsock+0x135/0x220
[ 109.442434]        write_ports+0x4b3/0x620
[ 109.442438]        nfsctl_transaction_write+0x45/0x80
[ 109.442440]        vfs_write+0xff/0x420
[ 109.442444]        ksys_write+0x4f/0xc0
[ 109.442446]        do_syscall_64+0x33/0x40
[ 109.442450]        entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 109.442454]
[ 109.442454] -> #0 ((softirq_ctrl.lock).lock){+.+.}-{2:2}:
[ 109.442457]        __lock_acquire+0x1264/0x20b0
[ 109.442463]        lock_acquire+0xc2/0x400
[ 109.442466]        rt_spin_lock+0x2b/0xc0
[ 109.442469]        __local_bh_disable_ip+0xd9/0x270
[ 109.442471]        svc_xprt_do_enqueue+0xc0/0x4d0
[ 109.442474]        svc_close_list+0x60/0x90
[ 109.442476]        svc_close_net+0x49/0x1a0
[ 109.442478]        svc_shutdown_net+0x12/0x40
[ 109.442480]        nfsd_destroy+0xc5/0x180
[ 109.442482]        nfsd+0x1bc/0x270
[ 109.442483]        kthread+0x194/0x1b0
[ 109.442487]        ret_from_fork+0x22/0x30
[ 109.442492]
[ 109.442492] other info that might help us debug this:
[ 109.442492]
[ 109.442493]  Possible unsafe locking scenario:
[ 109.442493]
[ 109.442493]        CPU0                    CPU1
[ 109.442494]
[ 109.442495]   lock(&serv->sv_lock);
[ 109.442496]                                lock((softirq_ctrl.lock).lock);
[ 109.442498]                                lock(&serv->sv_lock);
[ 109.442499]   lock((softirq_ctrl.lock).lock);
[ 109.442501]
[ 109.442501]  *** DEADLOCK ***
[ 109.442501]
[ 109.442501] 3 locks held by nfsd/1032:
[ 109.442503]  #0: 93b49258 (nfsd_mutex){+.+.}-{3:3}, at: nfsd+0x19a/0x270
[ 109.442508]  #1: 994245cb00b0 (&serv->sv_lock){+.+.}-{0:0}, at: svc_close_list+0x1f/0x90
[ 109.442512]  #2: 93a81b20 (rcu_read_lock){}-{1:2}, at: rt_spin_lock+0x5/0xc0
[ 109.442518]
[ 109.442518] stack backtrace:
[ 109.442519] CPU: 0 PID: 1032 Comm: nfsd Not tainted 5.10.16-rt30 #1
[ 109.442522] Hardware name: Supermicro X9DRL-3F/iF/X9DRL-3F/iF, BIOS 3.2 09/22/2015
[ 109.442524] Call Trace:
[ 109.442527]  dump_stack+0x77/0x97
[ 109.442533]  check_noncircular+0xdc/0xf0
[ 109.442546]  __lock_acquire+0x1264/0x20b0
[ 109.442553]  lock_acquire+0xc2/0x400
[ 109.442564]  rt_spin_lock+0x2b/0xc0
[ 109.442570]  __local_bh_disable_ip+0xd9/0x270
[ 109.442573]  svc_xprt_do_enqueue+0xc0/0x4d0
[ 109.442577]  svc_close_list+0x60/0x90
[ 109.442581]  svc_close_net+0x49/0x1a0
[ 109.442585]  svc_shutdown_net+0x12/0x40
[ 109.442588]  nfsd_destroy+0xc5/0x180
[ 109.442590]  nfsd+0x1bc/0x270
[ 109.442595]  kthread+0x194/0x1b0
[ 109.442600]  ret_from_fork+0x22/0x30
[ 109.518225] nfsd: last server has exited, flushing export cache
[ OK ] Stopped NFSv4 ID-name mapping service.
[ OK ] Stopped GSSAPI Proxy Daemon.
[ OK ] Stopped NFS Mount Daemon.
[ OK ] Stopped NFS status
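The lock-ordering argument in the patch description can be replayed with a small userspace model. This is a hypothetical toy, not lockdep and not kernel code: each lock is a single node, taking lock B while holding lock A records the edge A -> B, and an acquisition that closes a cycle is reported. It assumes that only the outermost local_bh_disable() actually takes softirq_ctrl.lock on PREEMPT_RT, so a nested call adds no dependency; the names lock_graph, run_path() and cycle_detected() are invented for illustration.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical toy model of lockdep-style dependency tracking.
 * Taking lock B while holding lock A records the edge A -> B;
 * an acquisition that closes a cycle is flagged as a possible deadlock.
 */
enum lock_id { SOFTIRQ_CTRL, SV_LOCK, NLOCKS };

static const char *const lock_name[NLOCKS] = {
	[SOFTIRQ_CTRL] = "softirq_ctrl.lock",
	[SV_LOCK] = "serv->sv_lock",
};

struct lock_graph {
	bool edge[NLOCKS][NLOCKS];	/* edge[a][b]: b was taken while a was held */
};

/* True if 'to' can be reached from 'from' along recorded edges. */
static bool reachable(const struct lock_graph *g, int from, int to, bool seen[])
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NLOCKS; next++)
		if (g->edge[from][next] && !seen[next] &&
		    reachable(g, next, to, seen))
			return true;
	return false;
}

/*
 * Replay one code path that takes the listed locks in order and drops them
 * all at the end.  Re-taking a lock that is already held is treated as a
 * nested acquire and records nothing (the assumed local_bh_disable() nesting).
 * Returns false if any acquisition closed a dependency cycle.
 */
static bool run_path(struct lock_graph *g, const enum lock_id *order, int n)
{
	bool held[NLOCKS] = { false };
	bool ok = true;

	for (int i = 0; i < n; i++) {
		int lock = order[i];

		if (held[lock])
			continue;	/* nested acquire: no new dependency */
		for (int h = 0; h < NLOCKS; h++) {
			bool seen[NLOCKS] = { false };

			if (!held[h])
				continue;
			/* Adding h -> lock closes a cycle if lock already reaches h. */
			if (reachable(g, lock, h, seen)) {
				printf("possible deadlock: %s -> %s closes a cycle\n",
				       lock_name[h], lock_name[lock]);
				ok = false;
			}
			g->edge[h][lock] = true;
		}
		held[lock] = true;
	}
	return ok;
}

/*
 * Replay the two chains from the lockdep report.  'fixed' selects whether the
 * svc_close_list() path takes sv_lock with BHs disabled first (the patch) or
 * with plain spin_lock().  Returns true if a cycle was detected.
 */
bool cycle_detected(bool fixed)
{
	struct lock_graph g = { { { false } } };
	/* Chain #1: sv_lock taken in svc_add_new_perm_xprt() with softirq_ctrl.lock held. */
	const enum lock_id add_perm[] = { SOFTIRQ_CTRL, SV_LOCK };
	/* Before: spin_lock(), then local_bh_disable() from svc_xprt_do_enqueue(). */
	const enum lock_id close_old[] = { SV_LOCK, SOFTIRQ_CTRL };
	/* After: spin_lock_bh(), then a nested local_bh_disable(). */
	const enum lock_id close_new[] = { SOFTIRQ_CTRL, SV_LOCK, SOFTIRQ_CTRL };
	bool ok = run_path(&g, add_perm, 2);

	if (fixed)
		ok = run_path(&g, close_new, 3) && ok;
	else
		ok = run_path(&g, close_old, 2) && ok;
	return !ok;
}
```

With `fixed` false, the second path records serv->sv_lock -> softirq_ctrl.lock on top of the existing softirq_ctrl.lock -> serv->sv_lock edge, and the model reports the cycle, which is the same shape as the splat. With `fixed` true, every path takes the two locks in the same order and the nested local_bh_disable() adds nothing, so no cycle is reported.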