Hi,

On Thu, Oct 6, 2022 at 5:01 PM Alexander Aring <[email protected]> wrote:
>
> Hi,
>
> On Thu, Oct 6, 2022 at 11:56 AM Alexander Aring <[email protected]> wrote:
> >
> > This patch fixes the following false positive circular locking
> > dependency:
> >
> > [   61.272390] ======================================================
> > [   61.273416] WARNING: possible circular locking dependency detected
> > [   61.274474] 6.0.0+ #1949 Tainted: G        W
> > [   61.275318] ------------------------------------------------------
> > [   61.276336] umount/1205 is trying to acquire lock:
> > [   61.277135] ffffa17e7d1d2158 ((work_completion)(&con->rwork)){+.+.}-{0:0}, at: __flush_work+0x4d/0x4c0
> > [   61.278652]
> >                but task is already holding lock:
> > [   61.279615] ffffa17e07888c80 (&con->sock_mutex){+.+.}-{3:3}, at: close_connection+0x67/0x180
> > [   61.281005]
> >                which lock already depends on the new lock.
> >
> > [   61.282375]
> >                the existing dependency chain (in reverse order) is:
> > [   61.283583]
> >                -> #1 (&con->sock_mutex){+.+.}-{3:3}:
> > [   61.284595]        lock_acquire+0xd3/0x300
> > [   61.285280]        __mutex_lock+0x99/0x1040
> > [   61.285955]        mutex_lock_nested+0x1b/0x30
> > [   61.286679]        receive_from_sock+0x40/0x350
> > [   61.287449]        process_recv_sockets+0x15/0x20
> > [   61.288249]        process_one_work+0x286/0x5f0
> > [   61.288989]        worker_thread+0x44/0x390
> > [   61.289674]        kthread+0x107/0x130
> > [   61.290310]        ret_from_fork+0x1f/0x30
> > [   61.291006]
> >                -> #0 ((work_completion)(&con->rwork)){+.+.}-{0:0}:
> > [   61.292215]        check_prevs_add+0x18b/0x1040
> > [   61.292980]        __lock_acquire+0x11ec/0x1630
> > [   61.293721]        lock_acquire+0xd3/0x300
> > [   61.294403]        __flush_work+0x6d/0x4c0
> > [   61.295076]        __cancel_work_timer+0x156/0x1e0
> > [   61.295855]        cancel_work_sync+0x10/0x20
> > [   61.296581]        close_connection+0x12a/0x180
> > [   61.297338]        close_connection+0x150/0x180
> > [   61.298071]        free_conn+0x21/0xc0
> > [   61.298682]        foreach_conn+0x49/0x70
> > [   61.299347]        dlm_lowcomms_stop+0x75/0xf0
> > [   61.300071]        dlm_release_lockspace+0x3fa/0x520
> > [   61.300884]        gdlm_unmount+0x64/0x90
> > [   61.301544]        gfs2_lm_unmount+0x37/0x50
> > [   61.302262]        gfs2_put_super+0x193/0x220
> > [   61.303002]        generic_shutdown_super+0x77/0x130
> > [   61.303843]        kill_block_super+0x27/0x50
> > [   61.304567]        gfs2_kill_sb+0x68/0x80
> > [   61.305254]        deactivate_locked_super+0x32/0x80
> > [   61.306054]        deactivate_super+0x59/0x60
> > [   61.306760]        cleanup_mnt+0xbd/0x150
> > [   61.307431]        __cleanup_mnt+0x12/0x20
> > [   61.308109]        task_work_run+0x6f/0xc0
> > [   61.308768]        exit_to_user_mode_prepare+0x1c4/0x1d0
> > [   61.309633]        syscall_exit_to_user_mode+0x1d/0x50
> > [   61.310469]        do_syscall_64+0x46/0x90
> > [   61.311139]        entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > [   61.312036]
> >                other info that might help us debug this:
> >
> > [   61.313328]  Possible unsafe locking scenario:
> >
> > [   61.314316]        CPU0                    CPU1
> > [   61.315077]        ----                    ----
> > [   61.315814]   lock(&con->sock_mutex);
> > [   61.316432]                                lock((work_completion)(&con->rwork));
> > [   61.317621]                                lock(&con->sock_mutex);
> > [   61.318628]   lock((work_completion)(&con->rwork));
> > [   61.319445]
> >                 *** DEADLOCK ***
> >
>
> I got another one... I will try a different approach to the use of the
> socket lock. We need it so that a socket doesn't get closed

s/closed/released/

> while we are actually running some workqueue handling... The new
> approach will also allow more full-duplex dlm lowcomms handling.
> Currently it's effectively half-duplex because the socket mutex is held
> around send and recv (incl. processing)...

The internal socket handling will take a lock as well, but the recv/send
mutex lock is held around the whole worker loop.
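
To illustrate, the pattern lockdep complains about above boils down to
something like the following simplified sketch (not the actual lowcomms
code; the fake_* names are just placeholders): the rwork handler takes
sock_mutex, while the close path holds sock_mutex and waits for that
same work via cancel_work_sync().

#include <linux/mutex.h>
#include <linux/workqueue.h>

struct fake_con {
	struct mutex sock_mutex;
	struct work_struct rwork;	/* INIT_WORK(&con->rwork, fake_rwork_fn) elsewhere */
};

/* stands in for process_recv_sockets()/receive_from_sock() in the trace */
static void fake_rwork_fn(struct work_struct *work)
{
	struct fake_con *con = container_of(work, struct fake_con, rwork);

	mutex_lock(&con->sock_mutex);	/* work -> sock_mutex dependency */
	/* ... receive and process data from the socket ... */
	mutex_unlock(&con->sock_mutex);
}

/* stands in for close_connection() in the trace */
static void fake_close(struct fake_con *con)
{
	mutex_lock(&con->sock_mutex);
	/*
	 * sock_mutex -> work dependency: wait for rwork while holding the
	 * mutex the handler itself takes. Lockdep sees both orderings and
	 * reports a possible circular dependency.
	 */
	cancel_work_sync(&con->rwork);
	mutex_unlock(&con->sock_mutex);
}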

- Alex
