Hi Bob, On Mon, Jan 24, 2022 at 6:28 PM Bob Peterson <rpete...@redhat.com> wrote: > Sometimes when gfs2 cancels a glock request, dlm needs time to take the > request off its Conversion queue. During that time, we get -EBUSY from > dlm, which confuses the glock state machine. Ideally we want dlm to > not return -EBUSY but wait until the operation has completed. This is > a stop-gap measure until dlm has a solution in place.
I'm not going to hold my breath until then. We can do better with a more in-depth patch description here. Also, > Signed-off-by: Bob Peterson <rpete...@redhat.com> > --- > fs/gfs2/lock_dlm.c | 22 +++++++++++++++------- > 1 file changed, 15 insertions(+), 7 deletions(-) > > diff --git a/fs/gfs2/lock_dlm.c b/fs/gfs2/lock_dlm.c > index 50578f881e6d..bf03c77b6757 100644 > --- a/fs/gfs2/lock_dlm.c > +++ b/fs/gfs2/lock_dlm.c > @@ -258,7 +258,7 @@ static int gdlm_lock(struct gfs2_glock *gl, unsigned int > req_state, > unsigned int flags) > { > struct lm_lockstruct *ls = &gl->gl_name.ln_sbd->sd_lockstruct; > - int req; > + int req, ret; > u32 lkf; > char strname[GDLM_STRNAME_BYTES] = ""; > > @@ -278,9 +278,12 @@ static int gdlm_lock(struct gfs2_glock *gl, unsigned int > req_state, > /* > * Submit the actual lock request. > */ > - > - return dlm_lock(ls->ls_dlm, req, &gl->gl_lksb, lkf, strname, > - GDLM_STRNAME_BYTES - 1, 0, gdlm_ast, gl, gdlm_bast); > + do { > + ret = dlm_lock(ls->ls_dlm, req, &gl->gl_lksb, lkf, strname, > + GDLM_STRNAME_BYTES - 1, 0, gdlm_ast, gl, > + gdlm_bast); we should sleep here and in gdlm_put_lock() so that in the rare cases when dlm ends up in that busy state, we won't eat up 100% cpu time at least. > + } while (ret == -EBUSY); > + return ret; > } > > static void gdlm_put_lock(struct gfs2_glock *gl) > @@ -312,8 +315,11 @@ static void gdlm_put_lock(struct gfs2_glock *gl) > return; > } > > - error = dlm_unlock(ls->ls_dlm, gl->gl_lksb.sb_lkid, DLM_LKF_VALBLK, > - NULL, gl); > + do { > + error = dlm_unlock(ls->ls_dlm, gl->gl_lksb.sb_lkid, > + DLM_LKF_VALBLK, NULL, gl); > + } while (error == -EBUSY); > + > if (error) { > fs_err(sdp, "gdlm_unlock %x,%llx err=%d\n", > gl->gl_name.ln_type, > @@ -506,7 +512,9 @@ static int sync_unlock(struct gfs2_sbd *sdp, struct > dlm_lksb *lksb, char *name) > struct lm_lockstruct *ls = &sdp->sd_lockstruct; > int error; > > - error = dlm_unlock(ls->ls_dlm, lksb->sb_lkid, 0, lksb, ls); > + do { > + error = dlm_unlock(ls->ls_dlm, lksb->sb_lkid, 0, lksb, ls); > + } while (error == -EBUSY); There's a sync_lock() as well as a sync_unlock(). But those locks are never canceled, so we don't really bother about -EBUSY. > if (error) { > fs_err(sdp, "%s lkid %x error %d\n", > name, lksb->sb_lkid, error); > -- > 2.34.1 Let me post an updated version. Thanks, Andreas