Bob,

On Thu, May 23, 2019 at 3:05 PM Bob Peterson <rpete...@redhat.com> wrote:
> When a node fails, user space informs dlm of the node failure,
> and dlm instructs gfs2 on the surviving nodes to perform journal
> recovery. It does this by calling various callback functions in
> lock_dlm.c. To track recovery progress, gfs2 keeps generation
> numbers and recover bits in the lvb of a dlm "control" lock,
> which all nodes read to determine which journals need to be
> replayed.
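
Just to make the mechanism concrete for readers: each surviving node
effectively compares the lvb generation number against the last one it
saw, plus the per-journal recover bit, to decide whether a journal still
needs replaying. A rough sketch of that decision (the struct layout and
names here are illustrative, not the actual lock_dlm.c code):

    #include <stdbool.h>
    #include <stdint.h>

    /* Illustrative stand-in for the control lock lvb contents. */
    struct control_lvb {
            uint32_t gen;           /* recovery generation number */
            uint32_t recover_bits;  /* bit N set => journal N needs replay */
    };

    /*
     * A node skips recovery when the lvb generation is already newer
     * than the last one it saw, on the assumption that another node
     * has completed that recovery pass.
     */
    static bool skip_recovery(const struct control_lvb *lvb,
                              uint32_t last_seen_gen, unsigned int jid)
    {
            if (lvb->gen > last_seen_gen)
                    return true;
            return !(lvb->recover_bits & (1u << jid));
    }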
>
> The gfs2 code on all nodes gets the same recovery requests from
> dlm, so they all try to do the recovery, but only one node is
> granted the exclusive lock on the journal. The others fail their
> "try lock" with a "Busy" message.
>
> However, when a node is withdrawn, it cannot safely do any
> recovery or safely replay any journals. To make matters worse,
> gfs2 might withdraw as a result of attempting recovery. For
> example, this might happen if the device goes offline, or if
> an HBA fails. But today's gfs2 code doesn't check for being
> withdrawn at any step in the recovery process. What's worse is
> that these callbacks from dlm have no return code, so there is
> no way to indicate failure back to dlm. We can eventually send
> a "Recovery failed" uevent, but that tells user space what
> happened, not dlm's kernel code.
>
> Before this patch, lock_dlm would perform its recovery steps but
> ignore the result, and eventually it would still update its
> generation number in the lvb, despite the fact that it may have
> withdrawn or encountered an error. The other nodes would then
> see the newer generation number in the lvb and conclude that
> they don't need to do recovery because the generation number
> is newer than the last one they saw. They think a different
> node has already recovered the journal.
>
> This patch adds checks to several of the callbacks used by dlm
> in its recovery state machine so that they are skipped if an I/O
> error has occurred or if the file system is withdrawn. That
> prevents the lvb bits from being updated, and therefore dlm and
> user space still see the need for recovery to take place.
>
> Signed-off-by: Bob Peterson <rpete...@redhat.com>
> ---
>  fs/gfs2/lock_dlm.c | 18 ++++++++++++++++++
>  fs/gfs2/recovery.c |  5 +++++
>  2 files changed, 23 insertions(+)
>
> diff --git a/fs/gfs2/lock_dlm.c b/fs/gfs2/lock_dlm.c
> index 31df26ed7854..9329f86ffcbe 100644
> --- a/fs/gfs2/lock_dlm.c
> +++ b/fs/gfs2/lock_dlm.c
> @@ -1081,6 +1081,10 @@ static void gdlm_recover_prep(void *arg)
>         struct gfs2_sbd *sdp = arg;
>         struct lm_lockstruct *ls = &sdp->sd_lockstruct;
>
> +       if (gfs2_withdrawn(sdp)) {
> +               fs_err(sdp, "recover_prep ignored due to withdraw.\n");
> +               return;
> +       }
>         spin_lock(&ls->ls_recover_spin);
>         ls->ls_recover_block = ls->ls_recover_start;
>         set_bit(DFL_DLM_RECOVERY, &ls->ls_recover_flags);
> @@ -1103,6 +1107,11 @@ static void gdlm_recover_slot(void *arg, struct dlm_slot *slot)
>         struct lm_lockstruct *ls = &sdp->sd_lockstruct;
>         int jid = slot->slot - 1;
>
> +       if (gfs2_withdrawn(sdp)) {
> +               fs_err(sdp, "recover_slot jid %d ignored due to withdraw.\n",
> +                      jid);
> +               return;
> +       }
>         spin_lock(&ls->ls_recover_spin);
>         if (ls->ls_recover_size < jid + 1) {
>                 fs_err(sdp, "recover_slot jid %d gen %u short size %d\n",
> @@ -1127,6 +1136,10 @@ static void gdlm_recover_done(void *arg, struct dlm_slot *slots, int num_slots,
>         struct gfs2_sbd *sdp = arg;
>         struct lm_lockstruct *ls = &sdp->sd_lockstruct;
>
> +       if (gfs2_withdrawn(sdp)) {
> +               fs_err(sdp, "recover_done ignored due to withdraw.\n");
> +               return;
> +       }

In gdlm_recover_prep, we're setting the DFL_DLM_RECOVERY flag to
indicate that we're in recovery. Assume that a withdraw occurs after
that. We'll then skip clearing the DFL_DLM_RECOVERY flag because of
the above check. Won't that leave waiters on DFL_DLM_RECOVERY stuck?
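
If so, one way to avoid that might be for the withdraw check in
gdlm_recover_done to still clear the flag and wake any waiters before
bailing out, something like the following (untested, just to illustrate
the concern):

        if (gfs2_withdrawn(sdp)) {
                /* Untested sketch: don't leave DFL_DLM_RECOVERY set on
                 * withdraw, otherwise anything waiting on that bit never
                 * wakes up. */
                clear_bit(DFL_DLM_RECOVERY, &ls->ls_recover_flags);
                smp_mb__after_atomic();
                wake_up_bit(&ls->ls_recover_flags, DFL_DLM_RECOVERY);
                fs_err(sdp, "recover_done ignored due to withdraw.\n");
                return;
        }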

>         /* ensure the ls jid arrays are large enough */
>         set_recover_size(sdp, slots, num_slots);
>
> @@ -1154,6 +1167,11 @@ static void gdlm_recovery_result(struct gfs2_sbd *sdp, unsigned int jid,
>  {
>         struct lm_lockstruct *ls = &sdp->sd_lockstruct;
>
> +       if (gfs2_withdrawn(sdp)) {
> +               fs_err(sdp, "recovery_result jid %d ignored due to withdraw.\n",
> +                      jid);
> +               return;
> +       }
>         if (test_bit(DFL_NO_DLM_OPS, &ls->ls_recover_flags))
>                 return;
>
> diff --git a/fs/gfs2/recovery.c b/fs/gfs2/recovery.c
> index 4ce2bfdbefdc..9e15b5aa2cfb 100644
> --- a/fs/gfs2/recovery.c
> +++ b/fs/gfs2/recovery.c
> @@ -307,6 +307,11 @@ void gfs2_recover_func(struct work_struct *work)
>         int jlocked = 0;
>
>         t_start = ktime_get();
> +       if (gfs2_withdrawn(sdp)) {
> +               fs_err(sdp, "jid=%u: Recovery not attempted due to withdraw.\n",
> +                      jd->jd_jid);
> +               goto fail;
> +       }
>         if (sdp->sd_args.ar_spectator)
>                 goto fail;
>         if (jd->jd_jid != sdp->sd_lockstruct.ls_jid) {
> --
> 2.21.0
>

Thanks,
Andreas
