Re: [Cluster-devel] [GFS2 PATCH v6 14/26] gfs2: Ignore dlm recovery requests if gfs2 is withdrawn

2019-08-27 Thread Andreas Gruenbacher
Bob,

On Thu, May 23, 2019 at 3:05 PM Bob Peterson  wrote:
> When a node fails, user space informs dlm of the node failure,
> and dlm instructs gfs2 on the surviving nodes to perform journal
> recovery. It does this by calling various callback functions in
> lock_dlm.c. To mark its progress, it keeps generation numbers
> and recover bits in a dlm "control" lock lvb, which is seen by
> all nodes to determine which journals need to be replayed.
>
> The gfs2 on all nodes get the same recovery requests from dlm,
> so they all try to do the recovery, but only one will be
> granted the exclusive lock on the journal. The others fail
> with a "Busy" message on their "try lock."
>
> However, when a node is withdrawn, it cannot safely do any
> recovery or safely replay any journals. To make matters worse,
> gfs2 might withdraw as a result of attempting recovery. For
> example, this might happen if the device goes offline, or if
> an hba fails. But in today's gfs2 code, it doesn't check for
> being withdrawn at any step in the recovery process. What's
> worse if that these callbacks from dlm have no return code,
> so there is no way to indicate failure back to dlm. We can
> send a "Recovery failed" uevent eventually, but that tells
> user space what happened, not dlm's kernel code.
>
> Before this patch, lock_dlm would perform its recovery steps but
> ignore the result, and eventually it would still update its
> generation number in the lvb, despite the fact that it may have
> withdrawn or encountered an error. The other nodes would then
> see the newer generation number in the lvb and conclude that
> they don't need to do recovery because the generation number
> is newer than the last one they saw. They think a different
> node has already recovered the journal.
>
> This patch adds checks to several of the callbacks used by dlm
> in its recovery state machine so that the functions are ignored
> and skipped if an io error has occurred or if the file system
> is withdrawn. That prevents the lvb bits from being updated, and
> therefore dlm and user space still see the need for recovery to
> take place.
>
> Signed-off-by: Bob Peterson 
> ---
>  fs/gfs2/lock_dlm.c | 18 ++
>  fs/gfs2/recovery.c |  5 +
>  2 files changed, 23 insertions(+)
>
> diff --git a/fs/gfs2/lock_dlm.c b/fs/gfs2/lock_dlm.c
> index 31df26ed7854..9329f86ffcbe 100644
> --- a/fs/gfs2/lock_dlm.c
> +++ b/fs/gfs2/lock_dlm.c
> @@ -1081,6 +1081,10 @@ static void gdlm_recover_prep(void *arg)
> struct gfs2_sbd *sdp = arg;
> struct lm_lockstruct *ls = >sd_lockstruct;
>
> +   if (gfs2_withdrawn(sdp)) {
> +   fs_err(sdp, "recover_prep ignored due to withdraw.\n");
> +   return;
> +   }
> spin_lock(>ls_recover_spin);
> ls->ls_recover_block = ls->ls_recover_start;
> set_bit(DFL_DLM_RECOVERY, >ls_recover_flags);
> @@ -1103,6 +1107,11 @@ static void gdlm_recover_slot(void *arg, struct 
> dlm_slot *slot)
> struct lm_lockstruct *ls = >sd_lockstruct;
> int jid = slot->slot - 1;
>
> +   if (gfs2_withdrawn(sdp)) {
> +   fs_err(sdp, "recover_slot jid %d ignored due to withdraw.\n",
> +  jid);
> +   return;
> +   }
> spin_lock(>ls_recover_spin);
> if (ls->ls_recover_size < jid + 1) {
> fs_err(sdp, "recover_slot jid %d gen %u short size %d\n",
> @@ -1127,6 +1136,10 @@ static void gdlm_recover_done(void *arg, struct 
> dlm_slot *slots, int num_slots,
> struct gfs2_sbd *sdp = arg;
> struct lm_lockstruct *ls = >sd_lockstruct;
>
> +   if (gfs2_withdrawn(sdp)) {
> +   fs_err(sdp, "recover_done ignored due to withdraw.\n");
> +   return;
> +   }

In gdlm_recover_prep, we're setting the DFL_DLM_RECOVERY flag to
indicate that we're in recovery. Assume that a withdraw occurs after
that. We'll then skip clearing the DFL_DLM_RECOVERY flag because of
the above check. Won't that leave waiters on DFL_DLM_RECOVERY stuck?

> /* ensure the ls jid arrays are large enough */
> set_recover_size(sdp, slots, num_slots);
>
> @@ -1154,6 +1167,11 @@ static void gdlm_recovery_result(struct gfs2_sbd *sdp, 
> unsigned int jid,
>  {
> struct lm_lockstruct *ls = >sd_lockstruct;
>
> +   if (gfs2_withdrawn(sdp)) {
> +   fs_err(sdp, "recovery_result jid %d ignored due to 
> withdraw.\n",
> +  jid);
> +   return;
> +   }
> if (test_bit(DFL_NO_DLM_OPS, >ls_recover_flags))
> return;
>
> diff --git a/fs/gfs2/recovery.c b/fs/gfs2/recovery.c
> index 4ce2bfdbefdc..9e15b5aa2cfb 100644
> --- a/fs/gfs2/recovery.c
> +++ b/fs/gfs2/recovery.c
> @@ -307,6 +307,11 @@ void gfs2_recover_func(struct work_struct *work)
> int jlocked = 0;
>
> t_start = ktime_get();
> +   if (gfs2_withdrawn(sdp)) {
> +   fs_err(sdp, "jid=%u: Recovery not attempted due to 
> 

[Cluster-devel] [GFS2 PATCH v6 14/26] gfs2: Ignore dlm recovery requests if gfs2 is withdrawn

2019-05-23 Thread Bob Peterson
When a node fails, user space informs dlm of the node failure,
and dlm instructs gfs2 on the surviving nodes to perform journal
recovery. It does this by calling various callback functions in
lock_dlm.c. To mark its progress, it keeps generation numbers
and recover bits in a dlm "control" lock lvb, which is seen by
all nodes to determine which journals need to be replayed.

The gfs2 on all nodes get the same recovery requests from dlm,
so they all try to do the recovery, but only one will be
granted the exclusive lock on the journal. The others fail
with a "Busy" message on their "try lock."

However, when a node is withdrawn, it cannot safely do any
recovery or safely replay any journals. To make matters worse,
gfs2 might withdraw as a result of attempting recovery. For
example, this might happen if the device goes offline, or if
an hba fails. But in today's gfs2 code, it doesn't check for
being withdrawn at any step in the recovery process. What's
worse if that these callbacks from dlm have no return code,
so there is no way to indicate failure back to dlm. We can
send a "Recovery failed" uevent eventually, but that tells
user space what happened, not dlm's kernel code.

Before this patch, lock_dlm would perform its recovery steps but
ignore the result, and eventually it would still update its
generation number in the lvb, despite the fact that it may have
withdrawn or encountered an error. The other nodes would then
see the newer generation number in the lvb and conclude that
they don't need to do recovery because the generation number
is newer than the last one they saw. They think a different
node has already recovered the journal.

This patch adds checks to several of the callbacks used by dlm
in its recovery state machine so that the functions are ignored
and skipped if an io error has occurred or if the file system
is withdrawn. That prevents the lvb bits from being updated, and
therefore dlm and user space still see the need for recovery to
take place.

Signed-off-by: Bob Peterson 
---
 fs/gfs2/lock_dlm.c | 18 ++
 fs/gfs2/recovery.c |  5 +
 2 files changed, 23 insertions(+)

diff --git a/fs/gfs2/lock_dlm.c b/fs/gfs2/lock_dlm.c
index 31df26ed7854..9329f86ffcbe 100644
--- a/fs/gfs2/lock_dlm.c
+++ b/fs/gfs2/lock_dlm.c
@@ -1081,6 +1081,10 @@ static void gdlm_recover_prep(void *arg)
struct gfs2_sbd *sdp = arg;
struct lm_lockstruct *ls = >sd_lockstruct;
 
+   if (gfs2_withdrawn(sdp)) {
+   fs_err(sdp, "recover_prep ignored due to withdraw.\n");
+   return;
+   }
spin_lock(>ls_recover_spin);
ls->ls_recover_block = ls->ls_recover_start;
set_bit(DFL_DLM_RECOVERY, >ls_recover_flags);
@@ -1103,6 +1107,11 @@ static void gdlm_recover_slot(void *arg, struct dlm_slot 
*slot)
struct lm_lockstruct *ls = >sd_lockstruct;
int jid = slot->slot - 1;
 
+   if (gfs2_withdrawn(sdp)) {
+   fs_err(sdp, "recover_slot jid %d ignored due to withdraw.\n",
+  jid);
+   return;
+   }
spin_lock(>ls_recover_spin);
if (ls->ls_recover_size < jid + 1) {
fs_err(sdp, "recover_slot jid %d gen %u short size %d\n",
@@ -1127,6 +1136,10 @@ static void gdlm_recover_done(void *arg, struct dlm_slot 
*slots, int num_slots,
struct gfs2_sbd *sdp = arg;
struct lm_lockstruct *ls = >sd_lockstruct;
 
+   if (gfs2_withdrawn(sdp)) {
+   fs_err(sdp, "recover_done ignored due to withdraw.\n");
+   return;
+   }
/* ensure the ls jid arrays are large enough */
set_recover_size(sdp, slots, num_slots);
 
@@ -1154,6 +1167,11 @@ static void gdlm_recovery_result(struct gfs2_sbd *sdp, 
unsigned int jid,
 {
struct lm_lockstruct *ls = >sd_lockstruct;
 
+   if (gfs2_withdrawn(sdp)) {
+   fs_err(sdp, "recovery_result jid %d ignored due to withdraw.\n",
+  jid);
+   return;
+   }
if (test_bit(DFL_NO_DLM_OPS, >ls_recover_flags))
return;
 
diff --git a/fs/gfs2/recovery.c b/fs/gfs2/recovery.c
index 4ce2bfdbefdc..9e15b5aa2cfb 100644
--- a/fs/gfs2/recovery.c
+++ b/fs/gfs2/recovery.c
@@ -307,6 +307,11 @@ void gfs2_recover_func(struct work_struct *work)
int jlocked = 0;
 
t_start = ktime_get();
+   if (gfs2_withdrawn(sdp)) {
+   fs_err(sdp, "jid=%u: Recovery not attempted due to withdraw.\n",
+  jd->jd_jid);
+   goto fail;
+   }
if (sdp->sd_args.ar_spectator)
goto fail;
if (jd->jd_jid != sdp->sd_lockstruct.ls_jid) {
-- 
2.21.0