From: Guoqing Jiang <gqji...@suse.com>

[ Upstream commit 010228e4a932ca1e8365e3b58c8e1e44c16ff793 ]

When a node leaves the cluster or stops resyncing
(resync or recovery) an array, the other nodes need to
call recover_bitmaps to continue the unfinished task.

But we need to clear the suspend_area later, after the
other nodes have copied the resync information to their
bitmap (by calling bitmap_copy_from_slot). Otherwise, all
nodes could write to the suspend_area even though it is
not yet handled by any node, because area_resyncing
returns 0 at the beginning of raid1_write_request.
This means one node could write to the suspend_area while
another node is resyncing the same area, and the data
could then become inconsistent.

So let's clear the suspend_area later, under the
protection of the bm lock, to avoid the above issue. It is
also more straightforward to clear the suspend_area after
the nodes have copied the resync info to their bitmap.
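
The ordering argument above can be illustrated with a small
user-space model. This is only a sketch under simplifying
assumptions: struct model, area_blocked, write_is_unprotected
and takeover_has_unsafe_window are hypothetical names, not
md-cluster code, and the model reduces the suspend list and
the bitmap to two flags for a single region.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one region; not the md-cluster structures. */
struct model {
	bool suspended;     /* a suspend_area entry still covers the region */
	bool bitmap_copied; /* resync info was copied into the local bitmap */
};

/* Plays the role of area_resyncing(): true means writers must wait. */
static bool area_blocked(const struct model *m)
{
	return m->suspended;
}

/* A write is unprotected if it is allowed while no node tracks the region. */
static bool write_is_unprotected(const struct model *m)
{
	return !area_blocked(m) && !m->bitmap_copied;
}

enum order { CLEAR_FIRST, COPY_FIRST };

/*
 * Run the two takeover steps in the given order and report whether any
 * intermediate state would let an unprotected write through.
 */
static bool takeover_has_unsafe_window(enum order o)
{
	struct model m = { .suspended = true, .bitmap_copied = false };
	bool unsafe = write_is_unprotected(&m);

	if (o == CLEAR_FIRST) {
		m.suspended = false;      /* old order: clear suspend_area */
		unsafe = unsafe || write_is_unprotected(&m);
		m.bitmap_copied = true;   /* then copy the bitmap */
	} else {
		m.bitmap_copied = true;   /* new order: copy the bitmap */
		unsafe = unsafe || write_is_unprotected(&m);
		m.suspended = false;      /* then clear suspend_area */
	}

	/* Both orders reach the same final state. */
	assert(m.bitmap_copied && !m.suspended);
	return unsafe || write_is_unprotected(&m);
}
```

In this model only the CLEAR_FIRST order passes through a
state where writes are allowed before the resync information
has been copied, which is the window the patch closes.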

Signed-off-by: Guoqing Jiang <gqji...@suse.com>
Reviewed-by: NeilBrown <ne...@suse.com>
Signed-off-by: Shaohua Li <s...@fb.com>
Signed-off-by: Sasha Levin <alexander.le...@microsoft.com>
---
 drivers/md/md-cluster.c | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/drivers/md/md-cluster.c b/drivers/md/md-cluster.c
index a7a561af05c9..617a0aefc1c4 100644
--- a/drivers/md/md-cluster.c
+++ b/drivers/md/md-cluster.c
@@ -239,15 +239,6 @@ static void recover_bitmaps(struct md_thread *thread)
        while (cinfo->recovery_map) {
                slot = fls64((u64)cinfo->recovery_map) - 1;
 
-               /* Clear suspend_area associated with the bitmap */
-               spin_lock_irq(&cinfo->suspend_lock);
-               list_for_each_entry_safe(s, tmp, &cinfo->suspend_list, list)
-                       if (slot == s->slot) {
-                               list_del(&s->list);
-                               kfree(s);
-                       }
-               spin_unlock_irq(&cinfo->suspend_lock);
-
                snprintf(str, 64, "bitmap%04d", slot);
                bm_lockres = lockres_init(mddev, str, NULL, 1);
                if (!bm_lockres) {
@@ -266,6 +257,16 @@ static void recover_bitmaps(struct md_thread *thread)
                        pr_err("md-cluster: Could not copy data from bitmap %d\n", slot);
                        goto dlm_unlock;
                }
+
+               /* Clear suspend_area associated with the bitmap */
+               spin_lock_irq(&cinfo->suspend_lock);
+               list_for_each_entry_safe(s, tmp, &cinfo->suspend_list, list)
+                       if (slot == s->slot) {
+                               list_del(&s->list);
+                               kfree(s);
+                       }
+               spin_unlock_irq(&cinfo->suspend_lock);
+
                if (hi > 0) {
                        /* TODO:Wait for current resync to get over */
                        set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
-- 
2.17.1
