Bob,
On Thu, May 23, 2019 at 3:05 PM Bob Peterson wrote:
> When a node fails, user space informs dlm of the node failure,
> and dlm instructs gfs2 on the surviving nodes to perform journal
> recovery. It does this by calling various callback functions in
> lock_dlm.c. To mark its progress, it keeps generation numbers
> and recover bits in a dlm "control" lock lvb, which is seen by
> all nodes to determine which journals need to be replayed.
>
> The gfs2 on all nodes get the same recovery requests from dlm,
> so they all try to do the recovery, but only one will be
> granted the exclusive lock on the journal. The others fail
> with a "Busy" message on their "try lock."
>
> However, when a node is withdrawn, it cannot safely do any
> recovery or safely replay any journals. To make matters worse,
> gfs2 might withdraw as a result of attempting recovery. For
> example, this might happen if the device goes offline, or if
> an hba fails. But in today's gfs2 code, it doesn't check for
> being withdrawn at any step in the recovery process. What's
> worse if that these callbacks from dlm have no return code,
> so there is no way to indicate failure back to dlm. We can
> send a "Recovery failed" uevent eventually, but that tells
> user space what happened, not dlm's kernel code.
>
> Before this patch, lock_dlm would perform its recovery steps but
> ignore the result, and eventually it would still update its
> generation number in the lvb, despite the fact that it may have
> withdrawn or encountered an error. The other nodes would then
> see the newer generation number in the lvb and conclude that
> they don't need to do recovery because the generation number
> is newer than the last one they saw. They think a different
> node has already recovered the journal.
>
> This patch adds checks to several of the callbacks used by dlm
> in its recovery state machine so that the functions are ignored
> and skipped if an io error has occurred or if the file system
> is withdrawn. That prevents the lvb bits from being updated, and
> therefore dlm and user space still see the need for recovery to
> take place.
>
> Signed-off-by: Bob Peterson
> ---
> fs/gfs2/lock_dlm.c | 18 ++
> fs/gfs2/recovery.c | 5 +
> 2 files changed, 23 insertions(+)
>
> diff --git a/fs/gfs2/lock_dlm.c b/fs/gfs2/lock_dlm.c
> index 31df26ed7854..9329f86ffcbe 100644
> --- a/fs/gfs2/lock_dlm.c
> +++ b/fs/gfs2/lock_dlm.c
> @@ -1081,6 +1081,10 @@ static void gdlm_recover_prep(void *arg)
> struct gfs2_sbd *sdp = arg;
> struct lm_lockstruct *ls = >sd_lockstruct;
>
> + if (gfs2_withdrawn(sdp)) {
> + fs_err(sdp, "recover_prep ignored due to withdraw.\n");
> + return;
> + }
> spin_lock(>ls_recover_spin);
> ls->ls_recover_block = ls->ls_recover_start;
> set_bit(DFL_DLM_RECOVERY, >ls_recover_flags);
> @@ -1103,6 +1107,11 @@ static void gdlm_recover_slot(void *arg, struct
> dlm_slot *slot)
> struct lm_lockstruct *ls = >sd_lockstruct;
> int jid = slot->slot - 1;
>
> + if (gfs2_withdrawn(sdp)) {
> + fs_err(sdp, "recover_slot jid %d ignored due to withdraw.\n",
> + jid);
> + return;
> + }
> spin_lock(>ls_recover_spin);
> if (ls->ls_recover_size < jid + 1) {
> fs_err(sdp, "recover_slot jid %d gen %u short size %d\n",
> @@ -1127,6 +1136,10 @@ static void gdlm_recover_done(void *arg, struct
> dlm_slot *slots, int num_slots,
> struct gfs2_sbd *sdp = arg;
> struct lm_lockstruct *ls = >sd_lockstruct;
>
> + if (gfs2_withdrawn(sdp)) {
> + fs_err(sdp, "recover_done ignored due to withdraw.\n");
> + return;
> + }
In gdlm_recover_prep, we're setting the DFL_DLM_RECOVERY flag to
indicate that we're in recovery. Assume that a withdraw occurs after
that. We'll then skip clearing the DFL_DLM_RECOVERY flag because of
the above check. Won't that leave waiters on DFL_DLM_RECOVERY stuck?
> /* ensure the ls jid arrays are large enough */
> set_recover_size(sdp, slots, num_slots);
>
> @@ -1154,6 +1167,11 @@ static void gdlm_recovery_result(struct gfs2_sbd *sdp,
> unsigned int jid,
> {
> struct lm_lockstruct *ls = >sd_lockstruct;
>
> + if (gfs2_withdrawn(sdp)) {
> + fs_err(sdp, "recovery_result jid %d ignored due to
> withdraw.\n",
> + jid);
> + return;
> + }
> if (test_bit(DFL_NO_DLM_OPS, >ls_recover_flags))
> return;
>
> diff --git a/fs/gfs2/recovery.c b/fs/gfs2/recovery.c
> index 4ce2bfdbefdc..9e15b5aa2cfb 100644
> --- a/fs/gfs2/recovery.c
> +++ b/fs/gfs2/recovery.c
> @@ -307,6 +307,11 @@ void gfs2_recover_func(struct work_struct *work)
> int jlocked = 0;
>
> t_start = ktime_get();
> + if (gfs2_withdrawn(sdp)) {
> + fs_err(sdp, "jid=%u: Recovery not attempted due to
>