Hi,

On Thu, Oct 6, 2022 at 11:56 AM Alexander Aring <aahri...@redhat.com> wrote:
>
> This patch fixes the following false positive circular locking
> dependency:
>
> [ 61.272390] ======================================================
> [ 61.273416] WARNING: possible circular locking dependency detected
> [ 61.274474] 6.0.0+ #1949 Tainted: G        W
> [ 61.275318] ------------------------------------------------------
> [ 61.276336] umount/1205 is trying to acquire lock:
> [ 61.277135] ffffa17e7d1d2158 ((work_completion)(&con->rwork)){+.+.}-{0:0}, at: __flush_work+0x4d/0x4c0
> [ 61.278652]
>                but task is already holding lock:
> [ 61.279615] ffffa17e07888c80 (&con->sock_mutex){+.+.}-{3:3}, at: close_connection+0x67/0x180
> [ 61.281005]
>                which lock already depends on the new lock.
>
> [ 61.282375]
>                the existing dependency chain (in reverse order) is:
> [ 61.283583]
> -> #1 (&con->sock_mutex){+.+.}-{3:3}:
> [ 61.284595]        lock_acquire+0xd3/0x300
> [ 61.285280]        __mutex_lock+0x99/0x1040
> [ 61.285955]        mutex_lock_nested+0x1b/0x30
> [ 61.286679]        receive_from_sock+0x40/0x350
> [ 61.287449]        process_recv_sockets+0x15/0x20
> [ 61.288249]        process_one_work+0x286/0x5f0
> [ 61.288989]        worker_thread+0x44/0x390
> [ 61.289674]        kthread+0x107/0x130
> [ 61.290310]        ret_from_fork+0x1f/0x30
> [ 61.291006]
> -> #0 ((work_completion)(&con->rwork)){+.+.}-{0:0}:
> [ 61.292215]        check_prevs_add+0x18b/0x1040
> [ 61.292980]        __lock_acquire+0x11ec/0x1630
> [ 61.293721]        lock_acquire+0xd3/0x300
> [ 61.294403]        __flush_work+0x6d/0x4c0
> [ 61.295076]        __cancel_work_timer+0x156/0x1e0
> [ 61.295855]        cancel_work_sync+0x10/0x20
> [ 61.296581]        close_connection+0x12a/0x180
> [ 61.297338]        close_connection+0x150/0x180
> [ 61.298071]        free_conn+0x21/0xc0
> [ 61.298682]        foreach_conn+0x49/0x70
> [ 61.299347]        dlm_lowcomms_stop+0x75/0xf0
> [ 61.300071]        dlm_release_lockspace+0x3fa/0x520
> [ 61.300884]        gdlm_unmount+0x64/0x90
> [ 61.301544]        gfs2_lm_unmount+0x37/0x50
> [ 61.302262]        gfs2_put_super+0x193/0x220
> [ 61.303002]        generic_shutdown_super+0x77/0x130
> [ 61.303843]        kill_block_super+0x27/0x50
> [ 61.304567]        gfs2_kill_sb+0x68/0x80
> [ 61.305254]        deactivate_locked_super+0x32/0x80
> [ 61.306054]        deactivate_super+0x59/0x60
> [ 61.306760]        cleanup_mnt+0xbd/0x150
> [ 61.307431]        __cleanup_mnt+0x12/0x20
> [ 61.308109]        task_work_run+0x6f/0xc0
> [ 61.308768]        exit_to_user_mode_prepare+0x1c4/0x1d0
> [ 61.309633]        syscall_exit_to_user_mode+0x1d/0x50
> [ 61.310469]        do_syscall_64+0x46/0x90
> [ 61.311139]        entry_SYSCALL_64_after_hwframe+0x63/0xcd
> [ 61.312036]
>                other info that might help us debug this:
>
> [ 61.313328] Possible unsafe locking scenario:
>
> [ 61.314316]        CPU0                    CPU1
> [ 61.315077]        ----                    ----
> [ 61.315814]   lock(&con->sock_mutex);
> [ 61.316432]                                lock((work_completion)(&con->rwork));
> [ 61.317621]                                lock(&con->sock_mutex);
> [ 61.318628]   lock((work_completion)(&con->rwork));
> [ 61.319445]
>                 *** DEADLOCK ***
>
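To restate what lockdep is seeing there, a minimal sketch (simplified stand-in names, not the actual fs/dlm/lowcomms.c code): the rwork handler takes sock_mutex (the #1 chain above), while the teardown path holds sock_mutex and does cancel_work_sync() on rwork (the #0 chain), so lockdep chains the two into the reported cycle:

#include <linux/mutex.h>
#include <linux/workqueue.h>

/* simplified stand-in for the dlm lowcomms connection, not the real struct */
struct connection {
        struct mutex sock_mutex;
        struct work_struct rwork;
        /* struct socket *sock; ... */
};

/* workqueue handler, as in process_recv_sockets() -> receive_from_sock() */
static void recv_work(struct work_struct *work)
{
        struct connection *con = container_of(work, struct connection, rwork);

        mutex_lock(&con->sock_mutex);   /* dependency #1: rwork -> sock_mutex */
        /* ... read from the socket and process messages ... */
        mutex_unlock(&con->sock_mutex);
}

/* teardown path, as in close_connection() during dlm_lowcomms_stop() */
static void shutdown_connection(struct connection *con)
{
        mutex_lock(&con->sock_mutex);
        /* dependency #0: waiting on rwork completion with sock_mutex held */
        cancel_work_sync(&con->rwork);
        /* ... close and release the socket ... */
        mutex_unlock(&con->sock_mutex);
}

The sketch only shows the two edges lockdep combines into the report; the patch description above argues the resulting cycle is a false positive.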
I got another one... I will try to take a different approach to how the socket lock is used. We need it so that the socket doesn't get closed while we are still running some workqueue handling... The new approach will also allow more full-duplex dlm lowcomms handling. Currently it's kind of half-duplex because the socket mutex is held around both send and recv (incl. processing), roughly as sketched below...
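A rough sketch of that half-duplex behaviour (again simplified stand-ins, not the actual fs/dlm/lowcomms.c functions): the send and recv work handlers serialize on the same per-connection sock_mutex, so one direction always waits for the other on that connection:

#include <linux/mutex.h>
#include <linux/workqueue.h>

/* simplified stand-in for the per-connection state */
struct connection {
        struct mutex sock_mutex;
        struct work_struct rwork;       /* recv side */
        struct work_struct swork;       /* send side */
};

static void recv_work(struct work_struct *work)
{
        struct connection *con = container_of(work, struct connection, rwork);

        mutex_lock(&con->sock_mutex);
        /* receive from the socket and process the messages */
        mutex_unlock(&con->sock_mutex);
}

static void send_work(struct work_struct *work)
{
        struct connection *con = container_of(work, struct connection, swork);

        mutex_lock(&con->sock_mutex);
        /* drain the write queue onto the socket */
        mutex_unlock(&con->sock_mutex);
}

- Alex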