On Fri, Apr 21, 2023 at 9:07 PM Bob Peterson <rpete...@redhat.com> wrote: > Before this patch function gfs2_make_fs_ro called gfs2_log_flush once to > finalize the log. However, if there's dirty metadata, log flushes tend > to sync the metadata and formulate revokes. Before this patch, those > revokes may not be written out to the journal immediately, which meant > unresolved glocks could still have revokes in their ail lists. When the > glock worker runs, it tries to transition the glock, but the unresolved > revokes in the ail still need to be written, so it tries to start a > transaction. It's impossible to start a transaction because at that > point the GFS2_LOG_HEAD_FLUSH_SHUTDOWN has been set by gfs2_make_fs_ro.
Do you mean that at that point, the SDF_JOURNAL_LIVE flag has already been cleared? > That cause the glock worker to get an error, and unable to write the > revokes. The calling sequence looked something like this: > > gfs2_make_fs_ro > gfs2_log_flush - with GFS2_LOG_HEAD_FLUSH_SHUTDOWN flag set > if (flags & GFS2_LOG_HEAD_FLUSH_SHUTDOWN) > clear_bit(SDF_JOURNAL_LIVE, &sdp->sd_flags); > ...meanwhile... > glock_work_func > do_xmote > rgrp_go_sync (or possibly inode_go_sync) > ... > gfs2_ail_empty_gl > __gfs2_trans_begin > if (unlikely(!test_bit(SDF_JOURNAL_LIVE, &sdp->sd_flags))) { > ... > return -EROFS; > > The previous patch in the series ("gfs2: return errors from > gfs2_ail_empty_gl") now causes the transaction error to no longer be > ignored, so it causes a warning from MOST of the xfstests: > > WARNING: CPU: 11 PID: X at fs/gfs2/super.c:603 gfs2_put_super [gfs2] > > which corresponds to: > > WARN_ON(gfs2_withdrawing(sdp)); > > The withdraw was triggered silently from do_xmote by: > > if (unlikely(sdp->sd_log_error && !gfs2_withdrawn(sdp))) > gfs2_withdraw_delayed(sdp); > > This patch adds a second log_flush to gfs2_make_fs_ro: one to sync the > data and one to sync any outstanding revokes and finalize the journal. > Note that both of these log flushes need to be "special," in other > words, not GFS2_LOG_HEAD_FLUSH_NORMAL. > > Signed-off-by: Bob Peterson <rpete...@redhat.com> > --- > fs/gfs2/super.c | 9 +++++++++ > 1 file changed, 9 insertions(+) > > diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c > index a83fa62106f0..5eed8c237500 100644 > --- a/fs/gfs2/super.c > +++ b/fs/gfs2/super.c > @@ -552,6 +552,15 @@ void gfs2_make_fs_ro(struct gfs2_sbd *sdp) > gfs2_quota_sync(sdp->sd_vfs, 0); > gfs2_statfs_sync(sdp->sd_vfs, 0); > > + /* We do two log flushes here. The first one commits dirty > inodes > + * and rgrps to the journal, but queues up revokes to the ail > list. > + * The second flush writes out and removes the revokes. > + * > + * The first must be done before the FLUSH_SHUTDOWN code > + * clears the LIVE flag, otherwise it will not be able to > start > + * a transaction to write its revokes, and the error will > cause > + * a withdraw of the file system. */ > + gfs2_log_flush(sdp, NULL, GFS2_LFC_MAKE_FS_RO); > gfs2_log_flush(sdp, NULL, GFS2_LOG_HEAD_FLUSH_SHUTDOWN | > GFS2_LFC_MAKE_FS_RO); > wait_event_timeout(sdp->sd_log_waitq, > -- > 2.39.2 > Thanks, Andreas