This patch adds some comment sections to make aware that the ls_recover() function should never fail before membership handling. Membership handling means to add/remove nodes from the lockspace ls_nodes attribute in dlm_recover_members().
This is because there are functionality like dlm_midcomms_add_member(), dlm_midcomms_remove_member() or dlm_lsop_recover_slot() which should always get aware of any join or leave of lockspace members. If we add a e.g. dlm_locking_stopped() before dlm_recover_members() to check if the recovery was interrupted and abort it we might skip to call dlm_midcomms_add_member(), dlm_midcomms_remove_member() or dlm_lsop_recover_slot(). A reason because the recovery is interrupted could be that the cluster manager notified about a new configuration .e.g. members joined or leaved. It is fine to interrupt or fail the recovery handling after the mentioned handling of dlm_recover_members() but never before. Signed-off-by: Alexander Aring <aahri...@redhat.com> --- fs/dlm/member.c | 6 +++++- fs/dlm/recoverd.c | 4 ++++ 2 files changed, 9 insertions(+), 1 deletion(-) diff --git a/fs/dlm/member.c b/fs/dlm/member.c index 98084e0cfccf..7e5f5aefefb5 100644 --- a/fs/dlm/member.c +++ b/fs/dlm/member.c @@ -534,7 +534,11 @@ int dlm_recover_members(struct dlm_ls *ls, struct dlm_recover *rv, int *neg_out) int i, error, neg = 0, low = -1; /* previously removed members that we've not finished removing need to - count as a negative change so the "neg" recovery steps will happen */ + * count as a negative change so the "neg" recovery steps will happen + * + * This functionality must report all member changes to lsops or + * midcomms layer and must never return before. + */ list_for_each_entry(memb, &ls->ls_nodes_gone, list) { log_rinfo(ls, "prev removed member %d", memb->nodeid); diff --git a/fs/dlm/recoverd.c b/fs/dlm/recoverd.c index a55dfce705dd..b5b519cde20b 100644 --- a/fs/dlm/recoverd.c +++ b/fs/dlm/recoverd.c @@ -70,6 +70,10 @@ static int ls_recover(struct dlm_ls *ls, struct dlm_recover *rv) /* * Add or remove nodes from the lockspace's ls_nodes list. + * + * Due the fact we must report all membership changes to lsops or + * midcomms layer it is not permitted to abort ls_recover() until + * this is done. */ error = dlm_recover_members(ls, rv, &neg); -- 2.31.1