On 17/12/2018 13:54, Bob Peterson wrote:
> Hi,
> 
> Before this patch, gfs2 would try to withdraw when it encountered
> io errors writing to its journal. That's incorrect behavior
> because if it can't write to the journal, it cannot write revokes
> for the metadata it sends down. A withdraw will cause gfs2 to
> unmount the file system from dlm, which is a controlled shutdown,
> but the io error means it cannot write the UNMOUNT log header
> to the journal. The controlled shutdown will cause dlm to release
> all its locks, allowing other nodes to update the metadata.
> When the node rejoins the cluster and sees no UNMOUNT log header
> it will see the journal is dirty and replay it, but after the
> other nodes may have changed the metadata, thus corrupting the
> file system.
> 
> If we get an io error writing to the journal, the only correct
> thing to do is to kernel panic. 

Hi,

That may be required for correctness; however, are we sure there is no
other way to force DLM recovery (or could another mechanism be
introduced)?
Consider that there might be multiple GFS2 filesystems mounted from
different iSCSI backends: just because one of them encountered an I/O
error doesn't mean the others can't continue.
(The host might also have other filesystems mounted, e.g. local or NFS,
and might still be able to perform I/O on those, so bringing the whole
host down is best avoided.)

Best regards,
--Edwin

> That will force dlm to go through
> its full recovery process on the other cluster nodes, freeze all
> locks, and make sure the journal is replayed by a node in the
> cluster before any other nodes get the affected locks and try to
> modify the metadata in the unfinished portion of the journal.
> 
> This patch changes the behavior so that io errors encountered
> in the journals cause an immediate kernel panic with a message.
> However, quota update errors are still allowed to withdraw as
> before.
> 
> Signed-off-by: Bob Peterson <rpete...@redhat.com>
> ---
>  fs/gfs2/lops.c | 8 +++-----
>  1 file changed, 3 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c
> index 94dcab655bc0..44b85f7675d4 100644
> --- a/fs/gfs2/lops.c
> +++ b/fs/gfs2/lops.c
> @@ -209,11 +209,9 @@ static void gfs2_end_log_write(struct bio *bio)
>       struct page *page;
>       int i;
>  
> -     if (bio->bi_status) {
> -             fs_err(sdp, "Error %d writing to journal, jid=%u\n",
> -                    bio->bi_status, sdp->sd_jdesc->jd_jid);
> -             wake_up(&sdp->sd_logd_waitq);
> -     }
> +     if (bio->bi_status)
> +             panic("Error %d writing to journal, jid=%u\n", bio->bi_status,
> +                   sdp->sd_jdesc->jd_jid);
>  
>       bio_for_each_segment_all(bvec, bio, i) {
>               page = bvec->bv_page;
> 
