Bob,
On Tue, Aug 2, 2022 at 7:58 PM Bob Peterson <[email protected]> wrote:
> There are a couple places in function do_xmote where normal processing
> is circumvented due to withdraws in progress. However, since we bypass
> most of do_xmote(), we bypass telling dlm to lock the dlm lock, which
> means dlm will never respond with a completion callback. Since the
> completion callback ordinarily clears GLF_LOCK, this patch changes
> function do_xmote to handle those situations more gracefully so the
> file system may be unmounted after withdraw.
>
> A very similar situation happens with the GLF_DEMOTE_IN_PROGRESS flag,
> which is cleared by function finish_xmote(). Since the withdraw causes
> us to skip the majority of do_xmote, it therefore also skips the call
> to finish_xmote() so the DEMOTE_IN_PROGRESS flag needs to be cleared
> manually as well.
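A minimal user-space model of the flag lifecycle described above may make the problem easier to see. It is a sketch, not gfs2 code. Every name in it (struct glock_model, MODEL_LOCK, MODEL_DEMOTE_IN_PROGRESS, model_xmote(), model_complete()) is hypothetical. It also simplifies by letting a single completion step clear both bits, whereas per the description GLF_LOCK is cleared via the completion callback and GLF_DEMOTE_IN_PROGRESS by finish_xmote():

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bits modelled on GLF_LOCK and GLF_DEMOTE_IN_PROGRESS. */
#define MODEL_LOCK               (1u << 0)
#define MODEL_DEMOTE_IN_PROGRESS (1u << 1)

struct glock_model {
	unsigned int flags;
	bool request_pending;	/* a lock request went to the lock manager */
};

/* Completion path: only a request that was actually issued gets a
 * reply, and only the reply clears the bits (simplified to one step). */
static void model_complete(struct glock_model *gl)
{
	if (!gl->request_pending)
		return;		/* nothing issued, so no reply arrives */
	gl->request_pending = false;
	gl->flags &= ~(MODEL_LOCK | MODEL_DEMOTE_IN_PROGRESS);
}

/* State change entry point; the caller has already set both bits. */
static void model_xmote(struct glock_model *gl, bool withdrawn,
			bool clear_on_bypass)
{
	assert(!gl->request_pending);
	if (withdrawn) {
		/* Bypass: no request is issued, so model_complete() will
		 * never clear the bits; the patch clears them here. */
		if (clear_on_bypass)
			gl->flags &= ~(MODEL_LOCK | MODEL_DEMOTE_IN_PROGRESS);
		return;
	}
	gl->request_pending = true;
}
```

In the model, taking the bypass without clear_on_bypass leaves both bits set permanently, which corresponds to the state the description says keeps the file system from being unmounted after withdraw.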
>
> Signed-off-by: Bob Peterson <[email protected]>
> ---
> fs/gfs2/glock.c | 19 ++++++++++++++++++-
> 1 file changed, 18 insertions(+), 1 deletion(-)
>
> diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
> index 0bfecffd71f1..d508d8fa0838 100644
> --- a/fs/gfs2/glock.c
> +++ b/fs/gfs2/glock.c
> @@ -59,6 +59,8 @@ typedef void (*glock_examiner) (struct gfs2_glock * gl);
>
> static void do_xmote(struct gfs2_glock *gl, struct gfs2_holder *gh, unsigned int target);
> static void __gfs2_glock_dq(struct gfs2_holder *gh);
> +static void handle_callback(struct gfs2_glock *gl, unsigned int state,
> + unsigned long delay, bool remote);
>
> static struct dentry *gfs2_root;
> static struct workqueue_struct *glock_workqueue;
> @@ -762,8 +764,21 @@ __acquires(&gl->gl_lockref.lock)
> int ret;
>
> if (target != LM_ST_UNLOCKED && glock_blocked_by_withdraw(gl) &&
> - gh && !(gh->gh_flags & LM_FLAG_NOEXP))
> + gh && !(gh->gh_flags & LM_FLAG_NOEXP)) {
> + /*
> + * We won't tell dlm to perform the lock, so we won't get a
> + * reply that would otherwise clear GLF_LOCK. So we clear it.
> + */
> + handle_callback(gl, LM_ST_UNLOCKED, 0, false);
> + clear_bit(GLF_LOCK, &gl->gl_flags);
> + clear_bit(GLF_DEMOTE_IN_PROGRESS, &gl->gl_flags);
> + /*
> + * Don't increment lockref here. The next time the worker runs it will do
> + * glock_put, which will decrement it to 0, and free the glock.
> + */
I don't understand the reference counting logic here: where does the
reference that we're supposedly passing on to the work function here
come from?
Note that further below in do_xmote(), we call gfs2_glock_hold()
followed by gfs2_glock_queue_work(), so the reference counting logic
seems normal there -- except that when ->lm_lock returns an error,
we apparently leak a reference. So maybe gfs2_glock_hold() should be
moved right in front of the gfs2_glock_queue_work() calls to make the
code less fragile?
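To illustrate the suggested reordering, here is a hypothetical user-space model. struct obj, obj_hold(), obj_put(), obj_queue_work(), obj_run_work() and request_lock() are made-up stand-ins, not the gfs2 functions. The model assumes the queueing helper consumes the reference it is handed and drops it right away if the work is already pending:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical refcounted object standing in for a glock. */
struct obj {
	int refcount;
	bool work_queued;
};

static void obj_hold(struct obj *o)
{
	o->refcount++;
}

static void obj_put(struct obj *o)
{
	assert(o->refcount > 0);
	o->refcount--;
}

/* Assumed semantics: the caller's reference is handed to the work item;
 * if work is already pending, that extra reference is dropped at once. */
static void obj_queue_work(struct obj *o)
{
	if (o->work_queued)
		obj_put(o);
	else
		o->work_queued = true;
}

/* The worker runs once and drops the reference it was handed. */
static void obj_run_work(struct obj *o)
{
	if (o->work_queued) {
		o->work_queued = false;
		obj_put(o);
	}
}

/* Hypothetical stand-in for ->lm_lock: returns nonzero on failure. */
static int request_lock(bool fail)
{
	return fail ? -1 : 0;
}

/* Fragile: the reference is taken before the call that can fail,
 * so the error return leaks it. */
static int xmote_fragile(struct obj *o, bool fail)
{
	obj_hold(o);
	if (request_lock(fail))
		return -1;	/* reference leaked here */
	obj_queue_work(o);
	return 0;
}

/* Suggested: hold immediately before queueing, so a reference is
 * taken only on paths that actually hand it to the worker. */
static int xmote_robust(struct obj *o, bool fail)
{
	if (request_lock(fail))
		return -1;
	obj_hold(o);
	obj_queue_work(o);
	return 0;
}
```

With the hold placed next to the queueing call, a failing request returns without touching the reference count, so every hold is paired with exactly one put.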
> + __gfs2_glock_queue_work(gl, GL_GLOCK_DFT_HOLD);
> return;
> + }
> lck_flags &= (LM_FLAG_TRY | LM_FLAG_TRY_1CB | LM_FLAG_NOEXP |
> LM_FLAG_PRIORITY);
> GLOCK_BUG_ON(gl, gl->gl_state == target);
> @@ -848,6 +863,8 @@ __acquires(&gl->gl_lockref.lock)
> (target != LM_ST_UNLOCKED ||
> test_bit(SDF_WITHDRAW_RECOVERY, &sdp->sd_flags))) {
> if (!is_system_glock(gl)) {
> + clear_bit(GLF_LOCK, &gl->gl_flags);
> + clear_bit(GLF_DEMOTE_IN_PROGRESS, &gl->gl_flags);
> gfs2_glock_queue_work(gl, GL_GLOCK_DFT_HOLD);
> goto out;
> } else {
> --
> 2.36.1
>
Thanks,
Andreas