On Fri, Apr 21, 2023 at 9:07 PM Bob Peterson <[email protected]> wrote:
> Before this patch function gfs2_make_fs_ro called gfs2_log_flush once to
> finalize the log. However, if there's dirty metadata, log flushes tend
> to sync the metadata and formulate revokes. Before this patch, those
> revokes may not be written out to the journal immediately, which meant
> unresolved glocks could still have revokes in their ail lists. When the
> glock worker runs, it tries to transition the glock, but the unresolved
> revokes in the ail still need to be written, so it tries to start a
> transaction. It's impossible to start a transaction because at that
> point the GFS2_LOG_HEAD_FLUSH_SHUTDOWN has been set by gfs2_make_fs_ro.
Do you mean that at that point, the SDF_JOURNAL_LIVE flag has already
been cleared?
> That cause the glock worker to get an error, and unable to write the
> revokes. The calling sequence looked something like this:
>
> gfs2_make_fs_ro
> gfs2_log_flush - with GFS2_LOG_HEAD_FLUSH_SHUTDOWN flag set
> if (flags & GFS2_LOG_HEAD_FLUSH_SHUTDOWN)
> clear_bit(SDF_JOURNAL_LIVE, &sdp->sd_flags);
> ...meanwhile...
> glock_work_func
> do_xmote
> rgrp_go_sync (or possibly inode_go_sync)
> ...
> gfs2_ail_empty_gl
> __gfs2_trans_begin
> if (unlikely(!test_bit(SDF_JOURNAL_LIVE, &sdp->sd_flags))) {
> ...
> return -EROFS;
>
> The previous patch in the series ("gfs2: return errors from
> gfs2_ail_empty_gl") now causes the transaction error to no longer be
> ignored, so it causes a warning from MOST of the xfstests:
>
> WARNING: CPU: 11 PID: X at fs/gfs2/super.c:603 gfs2_put_super [gfs2]
>
> which corresponds to:
>
> WARN_ON(gfs2_withdrawing(sdp));
>
> The withdraw was triggered silently from do_xmote by:
>
> if (unlikely(sdp->sd_log_error && !gfs2_withdrawn(sdp)))
> gfs2_withdraw_delayed(sdp);
>
> This patch adds a second log_flush to gfs2_make_fs_ro: one to sync the
> data and one to sync any outstanding revokes and finalize the journal.
> Note that both of these log flushes need to be "special," in other
> words, not GFS2_LOG_HEAD_FLUSH_NORMAL.
>
> Signed-off-by: Bob Peterson <[email protected]>
> ---
> fs/gfs2/super.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
> index a83fa62106f0..5eed8c237500 100644
> --- a/fs/gfs2/super.c
> +++ b/fs/gfs2/super.c
> @@ -552,6 +552,15 @@ void gfs2_make_fs_ro(struct gfs2_sbd *sdp)
> gfs2_quota_sync(sdp->sd_vfs, 0);
> gfs2_statfs_sync(sdp->sd_vfs, 0);
>
> + /* We do two log flushes here. The first one commits dirty
> inodes
> + * and rgrps to the journal, but queues up revokes to the ail
> list.
> + * The second flush writes out and removes the revokes.
> + *
> + * The first must be done before the FLUSH_SHUTDOWN code
> + * clears the LIVE flag, otherwise it will not be able to
> start
> + * a transaction to write its revokes, and the error will
> cause
> + * a withdraw of the file system. */
> + gfs2_log_flush(sdp, NULL, GFS2_LFC_MAKE_FS_RO);
> gfs2_log_flush(sdp, NULL, GFS2_LOG_HEAD_FLUSH_SHUTDOWN |
> GFS2_LFC_MAKE_FS_RO);
> wait_event_timeout(sdp->sd_log_waitq,
> --
> 2.39.2
>
Thanks,
Andreas