On 3/25/19 3:29 PM, Bob Peterson wrote:
----- Original Message -----
I think I've found the cause of the assertion I was hitting. Recovery
sets sd_log_flush_head but does not take locks which means a concurrent
call to gfs2_log_flush() can result in sd_log_head being set to
sd_log_flush_head. A later call to gfs2_log_flush() will then hit an
assertion failure in log_pull_tail() because the mismatch between
sd_log_head and sd_log_tail means too many blocks are freed.

I've worked around it by taking the log_flush lock in the patch below
and it seems to avoid the problem. However, tracing the recovery process
I see that it sets sd_log_flush_head and then calls clean_journal() ->
gfs2_write_log_header() -> gfs2_log_bmap() -> gfs2_log_incr_head(). This
has:

        BUG_ON((sdp->sd_log_flush_head == sdp->sd_log_tail) &&
               (sdp->sd_log_flush_head != sdp->sd_log_head));

... but sd_log_tail and sd_log_head have not been set by
gfs2_recover_func() so it might still BUG_ON() during recovery if you're
particularly unlucky.

I had a look at your "GFS2: Withdraw corruption patches [V2]" series but
I didn't see anything that might fix this.

If you think this patch is useful then I can send it as a proper patch
to the list.

Thanks,
Ross

--------------

gfs2: Take log_flush lock during recovery

Recovery sets sd_log_flush_head but does not take any locks which means
a concurrent call to gfs2_log_flush can result in sd_log_head being set
to sd_log_flush_head. A later call to gfs2_log_flush will then hit an
assertion failure in log_pull_tail because the mismatch between
sd_log_head and sd_log_tail means too many blocks are freed.

gfs2: fsid=xapi-clusterd:88a31b8e-4072-b0.1: fatal: assertion
"atomic_read(&sdp->sd_log_blks_free) <= sdp->sd_jdesc->jd_blocks" failed
function = log_pull_tail, file = fs/gfs2/log.c, line = 510

Hi Ross,

I think you found a valid bug, but that's not the proper solution.
The reason is: I think journal replay and journal flushing should both be
protected by the exclusive (EX) glock taken on the journal itself.

I think the problem may actually be a regression with patch 588bff95c94ef.
Because of that patch, function clean_journal now sets sdp->sd_log_flush_head
but its caller, gfs2_recover_func, is used to recover any node's journal, not
just its own. The bug is that clean_journal should only set sd_log_flush_head
if (and only if) it's replaying its own journal, not someone else's.
If it sets sd_log_flush_head while replaying another node's journal, that
will only lead to a problem like this.

I'll try and whip up another patch and perhaps you can test it for me.

FWIW, I've never seen this problem manifest on my recovery tests, but it
still might be causing some of the weird problems I'm seeing.


(Replying on this thread since I didn't get CCed on the patch.)

I still hit the assertion with this patch. gfs2_log_write_header() is unconditionally called and it calls gfs2_log_bmap() which changes sd_log_flush_head in the success path.

FWIW we hit this assertion periodically during testing but I found that I could get it to trigger almost every time during recovery with some well-placed sleeps and a kthread that continuously calls gfs2_log_flush().

-----8<----------

diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c
index ee20ea42e7b5..8bd87082a110 100644
--- a/fs/gfs2/log.c
+++ b/fs/gfs2/log.c
@@ -820,6 +820,10 @@ void gfs2_log_flush(struct gfs2_sbd *sdp, struct gfs2_glock *gl, u32 flags)
        }
        lops_after_commit(sdp, tr);

+       if (flags & GFS2_LFC_LOGD_TEST) {
+               msleep(1000);
+       }
+
        gfs2_log_lock(sdp);
        sdp->sd_log_head = sdp->sd_log_flush_head;
        sdp->sd_log_blks_reserved = 0;
diff --git a/fs/gfs2/log.h b/fs/gfs2/log.h
index 20241436126d..9d658efdb57a 100644
--- a/fs/gfs2/log.h
+++ b/fs/gfs2/log.h
@@ -79,6 +79,7 @@ extern void gfs2_ail1_flush(struct gfs2_sbd *sdp, struct writeback_control *wbc)

 extern void gfs2_log_shutdown(struct gfs2_sbd *sdp);
 extern int gfs2_logd(void *data);
+extern int gfs2_logd2(void *data);
extern void gfs2_add_revoke(struct gfs2_sbd *sdp, struct gfs2_bufdata *bd);
 extern void gfs2_write_revokes(struct gfs2_sbd *sdp);

diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index a971862b186e..1d7f32c1960a 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -378,6 +378,7 @@ static int init_threads(struct gfs2_sbd *sdp)
        int error = 0;

        p = kthread_run(gfs2_logd, sdp, "gfs2_logd");
+       p = kthread_run(gfs2_logd2, sdp, "gfs2_logd2");
        if (IS_ERR(p)) {
                error = PTR_ERR(p);
                fs_err(sdp, "can't start logd thread: %d\n", error);
@@ -1763,3 +1764,17 @@ const struct super_operations gfs2_super_ops = {
        .show_options           = gfs2_show_options,
 };

+int gfs2_logd2(void *data)
+{
+       struct gfs2_sbd *sdp = data;
+
+       msleep(30000);
+
+       for (;;) {
+               msleep(100);
+               gfs2_log_flush(sdp, NULL, GFS2_LOG_HEAD_FLUSH_NORMAL |
+                              GFS2_LFC_LOGD_TEST);
+       }
+
+       return 0;
+}
diff --git a/include/uapi/linux/gfs2_ondisk.h b/include/uapi/linux/gfs2_ondisk.h
index 2dc10a034de1..ebae3a21c98e 100644
--- a/include/uapi/linux/gfs2_ondisk.h
+++ b/include/uapi/linux/gfs2_ondisk.h
@@ -431,6 +431,7 @@ struct gfs2_ea_header {
 #define GFS2_LFC_TRANS_END             0x01000000
 #define GFS2_LFC_LOGD_JFLUSH_REQD      0x02000000
 #define GFS2_LFC_LOGD_AIL_FLUSH_REQD   0x04000000
+#define GFS2_LFC_LOGD_TEST     0x08000000

 #define LH_V1_SIZE (offsetofend(struct gfs2_log_header, lh_hash))


Thanks,
--
Ross Lagerwall

Reply via email to