On Fri, Apr 21, 2023 at 9:07 PM Bob Peterson <rpete...@redhat.com> wrote:
> Before this patch function gfs2_make_fs_ro called gfs2_log_flush once to
> finalize the log. However, if there's dirty metadata, log flushes tend
> to sync the metadata and formulate revokes. Before this patch, those
> revokes may not be written out to the journal immediately, which meant
> unresolved glocks could still have revokes in their ail lists. When the
> glock worker runs, it tries to transition the glock, but the unresolved
> revokes in the ail still need to be written, so it tries to start a
> transaction. It's impossible to start a transaction because at that
> point the GFS2_LOG_HEAD_FLUSH_SHUTDOWN has been set by gfs2_make_fs_ro.

Do you mean that at that point, the SDF_JOURNAL_LIVE flag has already
been cleared?

> That cause the glock worker to get an error, and unable to write the
> revokes. The calling sequence looked something like this:
>
> gfs2_make_fs_ro
>    gfs2_log_flush - with GFS2_LOG_HEAD_FLUSH_SHUTDOWN flag set
>         if (flags & GFS2_LOG_HEAD_FLUSH_SHUTDOWN)
>                 clear_bit(SDF_JOURNAL_LIVE, &sdp->sd_flags);
> ...meanwhile...
> glock_work_func
>    do_xmote
>       rgrp_go_sync (or possibly inode_go_sync)
>          ...
>          gfs2_ail_empty_gl
>             __gfs2_trans_begin
>                if (unlikely(!test_bit(SDF_JOURNAL_LIVE, &sdp->sd_flags))) {
>                ...
>                   return -EROFS;
>
> The previous patch in the series ("gfs2: return errors from
> gfs2_ail_empty_gl") now causes the transaction error to no longer be
> ignored, so it causes a warning from MOST of the xfstests:
>
> WARNING: CPU: 11 PID: X at fs/gfs2/super.c:603 gfs2_put_super [gfs2]
>
> which corresponds to:
>
> WARN_ON(gfs2_withdrawing(sdp));
>
> The withdraw was triggered silently from do_xmote by:
>
>         if (unlikely(sdp->sd_log_error && !gfs2_withdrawn(sdp)))
>                 gfs2_withdraw_delayed(sdp);
>
> This patch adds a second log_flush to gfs2_make_fs_ro: one to sync the
> data and one to sync any outstanding revokes and finalize the journal.
> Note that both of these log flushes need to be "special," in other
> words, not GFS2_LOG_HEAD_FLUSH_NORMAL.
>
> Signed-off-by: Bob Peterson <rpete...@redhat.com>
> ---
>  fs/gfs2/super.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
> index a83fa62106f0..5eed8c237500 100644
> --- a/fs/gfs2/super.c
> +++ b/fs/gfs2/super.c
> @@ -552,6 +552,15 @@ void gfs2_make_fs_ro(struct gfs2_sbd *sdp)
>                 gfs2_quota_sync(sdp->sd_vfs, 0);
>                 gfs2_statfs_sync(sdp->sd_vfs, 0);
>
> +               /* We do two log flushes here. The first one commits dirty 
> inodes
> +                * and rgrps to the journal, but queues up revokes to the ail 
> list.
> +                * The second flush writes out and removes the revokes.
> +                *
> +                * The first must be done before the FLUSH_SHUTDOWN code
> +                * clears the LIVE flag, otherwise it will not be able to 
> start
> +                * a transaction to write its revokes, and the error will 
> cause
> +                * a withdraw of the file system. */
> +               gfs2_log_flush(sdp, NULL, GFS2_LFC_MAKE_FS_RO);
>                 gfs2_log_flush(sdp, NULL, GFS2_LOG_HEAD_FLUSH_SHUTDOWN |
>                                GFS2_LFC_MAKE_FS_RO);
>                 wait_event_timeout(sdp->sd_log_waitq,
> --
> 2.39.2
>

Thanks,
Andreas

Reply via email to