Re: [Cluster-devel] [GFS2 PATCH 11/12] gfs2: Fix iomap write page reclaim deadlock

2019-06-11 Thread Ross Lagerwall

On 6/8/19 1:16 PM, Andreas Gruenbacher wrote:

Hi Ross,

On Fri, 7 Jun 2019 at 18:21, Ross Lagerwall  wrote:

On 5/7/19 9:32 PM, Andreas Gruenbacher wrote:

Since commit 64bc06bb32ee ("gfs2: iomap buffered write support"), gfs2 is doing
buffered writes by starting a transaction in iomap_begin, writing a range of
pages, and ending that transaction in iomap_end.  This approach suffers from
two problems:

(1) Any allocations necessary for the write are done in iomap_begin, so when
the data aren't journaled, there is no need for keeping the transaction open
until iomap_end.

(2) Transactions keep the gfs2 log flush lock held.  When
iomap_file_buffered_write calls balance_dirty_pages, this can end up calling
gfs2_write_inode, which will try to flush the log.  This requires taking the
log flush lock which is already held, resulting in a deadlock.

Fix both of these issues by not keeping transactions open from iomap_begin to
iomap_end.  Instead, start a small transaction in page_prepare and end it in
page_done when necessary.


Unfortunately, this patch broke growing gfs2 filesystems. It is easy to
reproduce:

$ mkfs.gfs2 -t xxx:yyy /dev/xvdb  4369065
$ mount /dev/xvdb /mnt
$ gfs2_grow /mnt (doesn't finish)
FS: Mount point: /mnt
FS: Device:  /dev/xvdb
FS: Size:4369062 (0x42aaa6)
DEV: Length: 13107200 (0xc8)
The file system will grow by 34133MB.

Looking at the kernel log, I see it hits the following assertion and
then hangs trying to withdraw the filesystem (which is a separate
problem, presumably):

gfs2: fsid=xxx:yyy.0: fatal: assertion "(nbuf <= tr->tr_blocks) &&
(tr->tr_num_revoke <= tr->tr_revokes)" failed
 function = gfs2_trans_end, file = fs/gfs2/trans.c, line = 117
gfs2: fsid=xxx:yyy.0: about to withdraw this file system

Rearranging the code so that it prints information about the transaction
before the failed withdrawal attempt shows:
gfs2: fsid=xxx:yyy.0: Transaction created at:
iomap_write_begin.constprop.45+0xbc/0x380
gfs2: fsid=xxx:yyy.0: blocks=1 revokes=0 reserved=8 touched=1
gfs2: fsid=xxx:yyy.0: Buf 1/0 Databuf 1/0 Revoke 0/0

Reverting this commit fixes the issue. Tested with git master as of
today (16d72dd4891fe).


thanks for the error report. This turns out to be a rounding error in
gfs2_iomap_page_prepare; the attached patch should help.


Yes, that fixes the problem. Thanks for the quick response.

--
Ross Lagerwall



Re: [Cluster-devel] [GFS2 PATCH 11/12] gfs2: Fix iomap write page reclaim deadlock

2019-06-08 Thread Andreas Gruenbacher
Hi Ross,

On Fri, 7 Jun 2019 at 18:21, Ross Lagerwall  wrote:
> On 5/7/19 9:32 PM, Andreas Gruenbacher wrote:
> > Since commit 64bc06bb32ee ("gfs2: iomap buffered write support"), gfs2 is 
> > doing
> > buffered writes by starting a transaction in iomap_begin, writing a range of
> > pages, and ending that transaction in iomap_end.  This approach suffers from
> > two problems:
> >
> >(1) Any allocations necessary for the write are done in iomap_begin, so 
> > when
> >the data aren't journaled, there is no need for keeping the transaction 
> > open
> >until iomap_end.
> >
> >(2) Transactions keep the gfs2 log flush lock held.  When
> >iomap_file_buffered_write calls balance_dirty_pages, this can end up 
> > calling
> >gfs2_write_inode, which will try to flush the log.  This requires taking 
> > the
> >log flush lock which is already held, resulting in a deadlock.
> >
> > Fix both of these issues by not keeping transactions open from iomap_begin 
> > to
> > iomap_end.  Instead, start a small transaction in page_prepare and end it in
> > page_done when necessary.
> >
> Unfortunately, this patch broke growing gfs2 filesystems. It is easy to
> reproduce:
>
> $ mkfs.gfs2 -t xxx:yyy /dev/xvdb  4369065
> $ mount /dev/xvdb /mnt
> $ gfs2_grow /mnt (doesn't finish)
> FS: Mount point: /mnt
> FS: Device:  /dev/xvdb
> FS: Size:4369062 (0x42aaa6)
> DEV: Length: 13107200 (0xc8)
> The file system will grow by 34133MB.
>
> Looking at the kernel log, I see it hits the following assertion and
> then hangs trying to withdraw the filesystem (which is a separate
> problem, presumably):
>
> gfs2: fsid=xxx:yyy.0: fatal: assertion "(nbuf <= tr->tr_blocks) &&
> (tr->tr_num_revoke <= tr->tr_revokes)" failed
> function = gfs2_trans_end, file = fs/gfs2/trans.c, line = 117
> gfs2: fsid=xxx:yyy.0: about to withdraw this file system
>
> Rearranging the code so that it prints information about the transaction
> before the failed withdrawal attempt shows:
> gfs2: fsid=xxx:yyy.0: Transaction created at:
> iomap_write_begin.constprop.45+0xbc/0x380
> gfs2: fsid=xxx:yyy.0: blocks=1 revokes=0 reserved=8 touched=1
> gfs2: fsid=xxx:yyy.0: Buf 1/0 Databuf 1/0 Revoke 0/0
>
> Reverting this commit fixes the issue. Tested with git master as of
> today (16d72dd4891fe).

thanks for the error report. This turns out to be a rounding error in
gfs2_iomap_page_prepare; the attached patch should help.

Thanks,
Andreas

---
 fs/gfs2/bmap.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index f42718d..d2a3f038 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -994,9 +994,11 @@ static void gfs2_write_unlock(struct inode *inode)
 static int gfs2_iomap_page_prepare(struct inode *inode, loff_t pos,
   unsigned len, struct iomap *iomap)
 {
+   unsigned int blocks;
struct gfs2_sbd *sdp = GFS2_SB(inode);
 
-   return gfs2_trans_begin(sdp, RES_DINODE + (len >> inode->i_blkbits), 0);
+   blocks = (len + i_blocksize(inode) - 1) >> inode->i_blkbits;
+   return gfs2_trans_begin(sdp, RES_DINODE + blocks, 0);
 }
 
 static void gfs2_iomap_page_done(struct inode *inode, loff_t pos,
-- 
1.8.3.1



Re: [Cluster-devel] [GFS2 PATCH 11/12] gfs2: Fix iomap write page reclaim deadlock

2019-06-07 Thread Ross Lagerwall

On 5/7/19 9:32 PM, Andreas Gruenbacher wrote:

Since commit 64bc06bb32ee ("gfs2: iomap buffered write support"), gfs2 is doing
buffered writes by starting a transaction in iomap_begin, writing a range of
pages, and ending that transaction in iomap_end.  This approach suffers from
two problems:

   (1) Any allocations necessary for the write are done in iomap_begin, so when
   the data aren't journaled, there is no need for keeping the transaction open
   until iomap_end.

   (2) Transactions keep the gfs2 log flush lock held.  When
   iomap_file_buffered_write calls balance_dirty_pages, this can end up calling
   gfs2_write_inode, which will try to flush the log.  This requires taking the
   log flush lock which is already held, resulting in a deadlock.

Fix both of these issues by not keeping transactions open from iomap_begin to
iomap_end.  Instead, start a small transaction in page_prepare and end it in
page_done when necessary.

Unfortunately, this patch broke growing gfs2 filesystems. It is easy to 
reproduce:


$ mkfs.gfs2 -t xxx:yyy /dev/xvdb  4369065
$ mount /dev/xvdb /mnt
$ gfs2_grow /mnt (doesn't finish)
FS: Mount point: /mnt
FS: Device:  /dev/xvdb
FS: Size:4369062 (0x42aaa6)
DEV: Length: 13107200 (0xc8)
The file system will grow by 34133MB.

Looking at the kernel log, I see it hits the following assertion and 
then hangs trying to withdraw the filesystem (which is a separate 
problem, presumably):


gfs2: fsid=xxx:yyy.0: fatal: assertion "(nbuf <= tr->tr_blocks) && 
(tr->tr_num_revoke <= tr->tr_revokes)" failed

   function = gfs2_trans_end, file = fs/gfs2/trans.c, line = 117
gfs2: fsid=xxx:yyy.0: about to withdraw this file system

Rearranging the code so that it prints information about the transaction 
before the failed withdrawal attempt shows:
gfs2: fsid=xxx:yyy.0: Transaction created at: 
iomap_write_begin.constprop.45+0xbc/0x380

gfs2: fsid=xxx:yyy.0: blocks=1 revokes=0 reserved=8 touched=1
gfs2: fsid=xxx:yyy.0: Buf 1/0 Databuf 1/0 Revoke 0/0

Reverting this commit fixes the issue. Tested with git master as of 
today (16d72dd4891fe).


Thanks,
--
Ross Lagerwall