Hi,

On Thu, Oct 6, 2022 at 11:56 AM Alexander Aring <aahri...@redhat.com> wrote:
>
> This patch fixes the following false positive circular locking
> dependency:
>
> [   61.272390] ======================================================
> [   61.273416] WARNING: possible circular locking dependency detected
> [   61.274474] 6.0.0+ #1949 Tainted: G        W
> [   61.275318] ------------------------------------------------------
> [   61.276336] umount/1205 is trying to acquire lock:
> [   61.277135] ffffa17e7d1d2158 ((work_completion)(&con->rwork)){+.+.}-{0:0}, at: __flush_work+0x4d/0x4c0
> [   61.278652]
>                but task is already holding lock:
> [   61.279615] ffffa17e07888c80 (&con->sock_mutex){+.+.}-{3:3}, at: close_connection+0x67/0x180
> [   61.281005]
>                which lock already depends on the new lock.
>
> [   61.282375]
>                the existing dependency chain (in reverse order) is:
> [   61.283583]
>                -> #1 (&con->sock_mutex){+.+.}-{3:3}:
> [   61.284595]        lock_acquire+0xd3/0x300
> [   61.285280]        __mutex_lock+0x99/0x1040
> [   61.285955]        mutex_lock_nested+0x1b/0x30
> [   61.286679]        receive_from_sock+0x40/0x350
> [   61.287449]        process_recv_sockets+0x15/0x20
> [   61.288249]        process_one_work+0x286/0x5f0
> [   61.288989]        worker_thread+0x44/0x390
> [   61.289674]        kthread+0x107/0x130
> [   61.290310]        ret_from_fork+0x1f/0x30
> [   61.291006]
>                -> #0 ((work_completion)(&con->rwork)){+.+.}-{0:0}:
> [   61.292215]        check_prevs_add+0x18b/0x1040
> [   61.292980]        __lock_acquire+0x11ec/0x1630
> [   61.293721]        lock_acquire+0xd3/0x300
> [   61.294403]        __flush_work+0x6d/0x4c0
> [   61.295076]        __cancel_work_timer+0x156/0x1e0
> [   61.295855]        cancel_work_sync+0x10/0x20
> [   61.296581]        close_connection+0x12a/0x180
> [   61.297338]        close_connection+0x150/0x180
> [   61.298071]        free_conn+0x21/0xc0
> [   61.298682]        foreach_conn+0x49/0x70
> [   61.299347]        dlm_lowcomms_stop+0x75/0xf0
> [   61.300071]        dlm_release_lockspace+0x3fa/0x520
> [   61.300884]        gdlm_unmount+0x64/0x90
> [   61.301544]        gfs2_lm_unmount+0x37/0x50
> [   61.302262]        gfs2_put_super+0x193/0x220
> [   61.303002]        generic_shutdown_super+0x77/0x130
> [   61.303843]        kill_block_super+0x27/0x50
> [   61.304567]        gfs2_kill_sb+0x68/0x80
> [   61.305254]        deactivate_locked_super+0x32/0x80
> [   61.306054]        deactivate_super+0x59/0x60
> [   61.306760]        cleanup_mnt+0xbd/0x150
> [   61.307431]        __cleanup_mnt+0x12/0x20
> [   61.308109]        task_work_run+0x6f/0xc0
> [   61.308768]        exit_to_user_mode_prepare+0x1c4/0x1d0
> [   61.309633]        syscall_exit_to_user_mode+0x1d/0x50
> [   61.310469]        do_syscall_64+0x46/0x90
> [   61.311139]        entry_SYSCALL_64_after_hwframe+0x63/0xcd
> [   61.312036]
>                other info that might help us debug this:
>
> [   61.313328]  Possible unsafe locking scenario:
>
> [   61.314316]        CPU0                    CPU1
> [   61.315077]        ----                    ----
> [   61.315814]   lock(&con->sock_mutex);
> [   61.316432]                                lock((work_completion)(&con->rwork));
> [   61.317621]                                lock(&con->sock_mutex);
> [   61.318628]   lock((work_completion)(&con->rwork));
> [   61.319445]
>                 *** DEADLOCK ***
>

I got another one... I will try a different approach to the use of the
socket lock. We need it so that the socket doesn't get closed while we
are actually running some workqueue handling... The new approach should
also allow fully duplex dlm lowcomms handling. Currently it's kind of
half-duplex because the socket mutex is held around send and recv
(incl. processing)...
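
To make the inversion a bit more concrete, here is a minimal sketch of
the two paths lockdep is pairing up in the splat above. This is not the
actual fs/dlm/lowcomms.c code, just a reduction of the two call chains;
the struct only carries what the example needs:

#include <linux/mutex.h>
#include <linux/workqueue.h>

struct connection {
	struct socket *sock;
	struct mutex sock_mutex;
	struct work_struct rwork;	/* receive work */
};

/* Path #1 (receive side): rwork is running, then takes sock_mutex. */
static void receive_from_sock(struct connection *con)
{
	mutex_lock(&con->sock_mutex);
	/* ... read and process data from con->sock ... */
	mutex_unlock(&con->sock_mutex);
}

/* work function for con->rwork */
static void process_recv_sockets(struct work_struct *work)
{
	struct connection *con = container_of(work, struct connection, rwork);

	receive_from_sock(con);
}

/* Path #2 (unmount side): sock_mutex is held, then rwork is flushed. */
static void close_connection(struct connection *con)
{
	mutex_lock(&con->sock_mutex);
	cancel_work_sync(&con->rwork);	/* waits for process_recv_sockets() */
	/* ... shut down and release con->sock ... */
	mutex_unlock(&con->sock_mutex);
}

Path #1 records "rwork -> sock_mutex", path #2 records "sock_mutex ->
rwork", which is exactly the cycle in the report, even though the two
can't actually run against the same connection at that point.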

- Alex
