Bob, On Thu, May 23, 2019 at 3:05 PM Bob Peterson <rpete...@redhat.com> wrote: > When a node fails, user space informs dlm of the node failure, > and dlm instructs gfs2 on the surviving nodes to perform journal > recovery. It does this by calling various callback functions in > lock_dlm.c. To mark its progress, it keeps generation numbers > and recover bits in a dlm "control" lock lvb, which is seen by > all nodes to determine which journals need to be replayed. > > The gfs2 on all nodes get the same recovery requests from dlm, > so they all try to do the recovery, but only one will be > granted the exclusive lock on the journal. The others fail > with a "Busy" message on their "try lock." > > However, when a node is withdrawn, it cannot safely do any > recovery or safely replay any journals. To make matters worse, > gfs2 might withdraw as a result of attempting recovery. For > example, this might happen if the device goes offline, or if > an hba fails. But in today's gfs2 code, it doesn't check for > being withdrawn at any step in the recovery process. What's > worse if that these callbacks from dlm have no return code, > so there is no way to indicate failure back to dlm. We can > send a "Recovery failed" uevent eventually, but that tells > user space what happened, not dlm's kernel code. > > Before this patch, lock_dlm would perform its recovery steps but > ignore the result, and eventually it would still update its > generation number in the lvb, despite the fact that it may have > withdrawn or encountered an error. The other nodes would then > see the newer generation number in the lvb and conclude that > they don't need to do recovery because the generation number > is newer than the last one they saw. They think a different > node has already recovered the journal. > > This patch adds checks to several of the callbacks used by dlm > in its recovery state machine so that the functions are ignored > and skipped if an io error has occurred or if the file system > is withdrawn. That prevents the lvb bits from being updated, and > therefore dlm and user space still see the need for recovery to > take place. > > Signed-off-by: Bob Peterson <rpete...@redhat.com> > --- > fs/gfs2/lock_dlm.c | 18 ++++++++++++++++++ > fs/gfs2/recovery.c | 5 +++++ > 2 files changed, 23 insertions(+) > > diff --git a/fs/gfs2/lock_dlm.c b/fs/gfs2/lock_dlm.c > index 31df26ed7854..9329f86ffcbe 100644 > --- a/fs/gfs2/lock_dlm.c > +++ b/fs/gfs2/lock_dlm.c > @@ -1081,6 +1081,10 @@ static void gdlm_recover_prep(void *arg) > struct gfs2_sbd *sdp = arg; > struct lm_lockstruct *ls = &sdp->sd_lockstruct; > > + if (gfs2_withdrawn(sdp)) { > + fs_err(sdp, "recover_prep ignored due to withdraw.\n"); > + return; > + } > spin_lock(&ls->ls_recover_spin); > ls->ls_recover_block = ls->ls_recover_start; > set_bit(DFL_DLM_RECOVERY, &ls->ls_recover_flags); > @@ -1103,6 +1107,11 @@ static void gdlm_recover_slot(void *arg, struct > dlm_slot *slot) > struct lm_lockstruct *ls = &sdp->sd_lockstruct; > int jid = slot->slot - 1; > > + if (gfs2_withdrawn(sdp)) { > + fs_err(sdp, "recover_slot jid %d ignored due to withdraw.\n", > + jid); > + return; > + } > spin_lock(&ls->ls_recover_spin); > if (ls->ls_recover_size < jid + 1) { > fs_err(sdp, "recover_slot jid %d gen %u short size %d\n", > @@ -1127,6 +1136,10 @@ static void gdlm_recover_done(void *arg, struct > dlm_slot *slots, int num_slots, > struct gfs2_sbd *sdp = arg; > struct lm_lockstruct *ls = &sdp->sd_lockstruct; > > + if (gfs2_withdrawn(sdp)) { > + fs_err(sdp, "recover_done ignored due to withdraw.\n"); > + return; > + }
In gdlm_recover_prep, we're setting the DFL_DLM_RECOVERY flag to indicate that we're in recovery. Assume that a withdraw occurs after that. We'll then skip clearing the DFL_DLM_RECOVERY flag because of the above check. Won't that leave waiters on DFL_DLM_RECOVERY stuck? > /* ensure the ls jid arrays are large enough */ > set_recover_size(sdp, slots, num_slots); > > @@ -1154,6 +1167,11 @@ static void gdlm_recovery_result(struct gfs2_sbd *sdp, > unsigned int jid, > { > struct lm_lockstruct *ls = &sdp->sd_lockstruct; > > + if (gfs2_withdrawn(sdp)) { > + fs_err(sdp, "recovery_result jid %d ignored due to > withdraw.\n", > + jid); > + return; > + } > if (test_bit(DFL_NO_DLM_OPS, &ls->ls_recover_flags)) > return; > > diff --git a/fs/gfs2/recovery.c b/fs/gfs2/recovery.c > index 4ce2bfdbefdc..9e15b5aa2cfb 100644 > --- a/fs/gfs2/recovery.c > +++ b/fs/gfs2/recovery.c > @@ -307,6 +307,11 @@ void gfs2_recover_func(struct work_struct *work) > int jlocked = 0; > > t_start = ktime_get(); > + if (gfs2_withdrawn(sdp)) { > + fs_err(sdp, "jid=%u: Recovery not attempted due to > withdraw.\n", > + jd->jd_jid); > + goto fail; > + } > if (sdp->sd_args.ar_spectator) > goto fail; > if (jd->jd_jid != sdp->sd_lockstruct.ls_jid) { > -- > 2.21.0 > Thanks, Andreas