This patch adds some comment sections to make aware that the ls_recover()
function should never fail before membership handling. Membership
handling means to add/remove nodes from the lockspace ls_nodes
attribute in dlm_recover_members().

This is because there are functionality like dlm_midcomms_add_member(),
dlm_midcomms_remove_member() or dlm_lsop_recover_slot() which should
always get aware of any join or leave of lockspace members. If we add a
e.g. dlm_locking_stopped() before dlm_recover_members() to check if the
recovery was interrupted and abort it we might skip to call
dlm_midcomms_add_member(), dlm_midcomms_remove_member() or
dlm_lsop_recover_slot().

A reason because the recovery is interrupted could be that the cluster
manager notified about a new configuration .e.g. members joined or
leaved. It is fine to interrupt or fail the recovery handling after
the mentioned handling of dlm_recover_members() but never before.

Signed-off-by: Alexander Aring <aahri...@redhat.com>
---
 fs/dlm/member.c   | 6 +++++-
 fs/dlm/recoverd.c | 4 ++++
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/fs/dlm/member.c b/fs/dlm/member.c
index 98084e0cfccf..7e5f5aefefb5 100644
--- a/fs/dlm/member.c
+++ b/fs/dlm/member.c
@@ -534,7 +534,11 @@ int dlm_recover_members(struct dlm_ls *ls, struct 
dlm_recover *rv, int *neg_out)
        int i, error, neg = 0, low = -1;
 
        /* previously removed members that we've not finished removing need to
-          count as a negative change so the "neg" recovery steps will happen */
+        * count as a negative change so the "neg" recovery steps will happen
+        *
+        * This functionality must report all member changes to lsops or
+        * midcomms layer and must never return before.
+        */
 
        list_for_each_entry(memb, &ls->ls_nodes_gone, list) {
                log_rinfo(ls, "prev removed member %d", memb->nodeid);
diff --git a/fs/dlm/recoverd.c b/fs/dlm/recoverd.c
index a55dfce705dd..b5b519cde20b 100644
--- a/fs/dlm/recoverd.c
+++ b/fs/dlm/recoverd.c
@@ -70,6 +70,10 @@ static int ls_recover(struct dlm_ls *ls, struct dlm_recover 
*rv)
 
        /*
         * Add or remove nodes from the lockspace's ls_nodes list.
+        *
+        * Due the fact we must report all membership changes to lsops or
+        * midcomms layer it is not permitted to abort ls_recover() until
+        * this is done.
         */
 
        error = dlm_recover_members(ls, rv, &neg);
-- 
2.31.1

Reply via email to