Re: [Cluster-devel] [GFS2 PATCH 11/12] gfs2: Fix iomap write page reclaim deadlock
On 6/8/19 1:16 PM, Andreas Gruenbacher wrote: Hi Ross, On Fri, 7 Jun 2019 at 18:21, Ross Lagerwall wrote: On 5/7/19 9:32 PM, Andreas Gruenbacher wrote: Since commit 64bc06bb32ee ("gfs2: iomap buffered write support"), gfs2 is doing buffered writes by starting a transaction in iomap_begin, writing a range of pages, and ending that transaction in iomap_end. This approach suffers from two problems: (1) Any allocations necessary for the write are done in iomap_begin, so when the data aren't journaled, there is no need for keeping the transaction open until iomap_end. (2) Transactions keep the gfs2 log flush lock held. When iomap_file_buffered_write calls balance_dirty_pages, this can end up calling gfs2_write_inode, which will try to flush the log. This requires taking the log flush lock which is already held, resulting in a deadlock. Fix both of these issues by not keeping transactions open from iomap_begin to iomap_end. Instead, start a small transaction in page_prepare and end it in page_done when necessary. Unfortunately, this patch broke growing gfs2 filesystems. It is easy to reproduce: $ mkfs.gfs2 -t xxx:yyy /dev/xvdb 4369065 $ mount /dev/xvdb /mnt $ gfs2_grow /mnt (doesn't finish) FS: Mount point: /mnt FS: Device: /dev/xvdb FS: Size:4369062 (0x42aaa6) DEV: Length: 13107200 (0xc8) The file system will grow by 34133MB. Looking at the kernel log, I see it hits the following assertion and then hangs trying to withdraw the filesystem (which is a separate problem, presumably): gfs2: fsid=xxx:yyy.0: fatal: assertion "(nbuf <= tr->tr_blocks) && (tr->tr_num_revoke <= tr->tr_revokes)" failed function = gfs2_trans_end, file = fs/gfs2/trans.c, line = 117 gfs2: fsid=xxx:yyy.0: about to withdraw this file system Rearranging the code so that it prints information about the transaction before the failed withdrawal attempt shows: gfs2: fsid=xxx:yyy.0: Transaction created at: iomap_write_begin.constprop.45+0xbc/0x380 gfs2: fsid=xxx:yyy.0: blocks=1 revokes=0 reserved=8 touched=1 gfs2: fsid=xxx:yyy.0: Buf 1/0 Databuf 1/0 Revoke 0/0 Reverting this commit fixes the issue. Tested with git master as of today (16d72dd4891fe). thanks for the error report. This turns out to be a rounding error in gfs2_iomap_page_prepare; the attached patch should help. Yes, that fixes the problem. Thanks for the quick response. -- Ross Lagerwall
Re: [Cluster-devel] [GFS2 PATCH 11/12] gfs2: Fix iomap write page reclaim deadlock
Hi Ross, On Fri, 7 Jun 2019 at 18:21, Ross Lagerwall wrote: > On 5/7/19 9:32 PM, Andreas Gruenbacher wrote: > > Since commit 64bc06bb32ee ("gfs2: iomap buffered write support"), gfs2 is > > doing > > buffered writes by starting a transaction in iomap_begin, writing a range of > > pages, and ending that transaction in iomap_end. This approach suffers from > > two problems: > > > >(1) Any allocations necessary for the write are done in iomap_begin, so > > when > >the data aren't journaled, there is no need for keeping the transaction > > open > >until iomap_end. > > > >(2) Transactions keep the gfs2 log flush lock held. When > >iomap_file_buffered_write calls balance_dirty_pages, this can end up > > calling > >gfs2_write_inode, which will try to flush the log. This requires taking > > the > >log flush lock which is already held, resulting in a deadlock. > > > > Fix both of these issues by not keeping transactions open from iomap_begin > > to > > iomap_end. Instead, start a small transaction in page_prepare and end it in > > page_done when necessary. > > > Unfortunately, this patch broke growing gfs2 filesystems. It is easy to > reproduce: > > $ mkfs.gfs2 -t xxx:yyy /dev/xvdb 4369065 > $ mount /dev/xvdb /mnt > $ gfs2_grow /mnt (doesn't finish) > FS: Mount point: /mnt > FS: Device: /dev/xvdb > FS: Size:4369062 (0x42aaa6) > DEV: Length: 13107200 (0xc8) > The file system will grow by 34133MB. > > Looking at the kernel log, I see it hits the following assertion and > then hangs trying to withdraw the filesystem (which is a separate > problem, presumably): > > gfs2: fsid=xxx:yyy.0: fatal: assertion "(nbuf <= tr->tr_blocks) && > (tr->tr_num_revoke <= tr->tr_revokes)" failed > function = gfs2_trans_end, file = fs/gfs2/trans.c, line = 117 > gfs2: fsid=xxx:yyy.0: about to withdraw this file system > > Rearranging the code so that it prints information about the transaction > before the failed withdrawal attempt shows: > gfs2: fsid=xxx:yyy.0: Transaction created at: > iomap_write_begin.constprop.45+0xbc/0x380 > gfs2: fsid=xxx:yyy.0: blocks=1 revokes=0 reserved=8 touched=1 > gfs2: fsid=xxx:yyy.0: Buf 1/0 Databuf 1/0 Revoke 0/0 > > Reverting this commit fixes the issue. Tested with git master as of > today (16d72dd4891fe). thanks for the error report. This turns out to be a rounding error in gfs2_iomap_page_prepare; the attached patch should help. Thanks, Andreas --- fs/gfs2/bmap.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c index f42718d..d2a3f038 100644 --- a/fs/gfs2/bmap.c +++ b/fs/gfs2/bmap.c @@ -994,9 +994,11 @@ static void gfs2_write_unlock(struct inode *inode) static int gfs2_iomap_page_prepare(struct inode *inode, loff_t pos, unsigned len, struct iomap *iomap) { + unsigned int blocks; struct gfs2_sbd *sdp = GFS2_SB(inode); - return gfs2_trans_begin(sdp, RES_DINODE + (len >> inode->i_blkbits), 0); + blocks = (len + i_blocksize(inode) - 1) >> inode->i_blkbits; + return gfs2_trans_begin(sdp, RES_DINODE + blocks, 0); } static void gfs2_iomap_page_done(struct inode *inode, loff_t pos, -- 1.8.3.1
Re: [Cluster-devel] [GFS2 PATCH 11/12] gfs2: Fix iomap write page reclaim deadlock
On 5/7/19 9:32 PM, Andreas Gruenbacher wrote: Since commit 64bc06bb32ee ("gfs2: iomap buffered write support"), gfs2 is doing buffered writes by starting a transaction in iomap_begin, writing a range of pages, and ending that transaction in iomap_end. This approach suffers from two problems: (1) Any allocations necessary for the write are done in iomap_begin, so when the data aren't journaled, there is no need for keeping the transaction open until iomap_end. (2) Transactions keep the gfs2 log flush lock held. When iomap_file_buffered_write calls balance_dirty_pages, this can end up calling gfs2_write_inode, which will try to flush the log. This requires taking the log flush lock which is already held, resulting in a deadlock. Fix both of these issues by not keeping transactions open from iomap_begin to iomap_end. Instead, start a small transaction in page_prepare and end it in page_done when necessary. Unfortunately, this patch broke growing gfs2 filesystems. It is easy to reproduce: $ mkfs.gfs2 -t xxx:yyy /dev/xvdb 4369065 $ mount /dev/xvdb /mnt $ gfs2_grow /mnt (doesn't finish) FS: Mount point: /mnt FS: Device: /dev/xvdb FS: Size:4369062 (0x42aaa6) DEV: Length: 13107200 (0xc8) The file system will grow by 34133MB. Looking at the kernel log, I see it hits the following assertion and then hangs trying to withdraw the filesystem (which is a separate problem, presumably): gfs2: fsid=xxx:yyy.0: fatal: assertion "(nbuf <= tr->tr_blocks) && (tr->tr_num_revoke <= tr->tr_revokes)" failed function = gfs2_trans_end, file = fs/gfs2/trans.c, line = 117 gfs2: fsid=xxx:yyy.0: about to withdraw this file system Rearranging the code so that it prints information about the transaction before the failed withdrawal attempt shows: gfs2: fsid=xxx:yyy.0: Transaction created at: iomap_write_begin.constprop.45+0xbc/0x380 gfs2: fsid=xxx:yyy.0: blocks=1 revokes=0 reserved=8 touched=1 gfs2: fsid=xxx:yyy.0: Buf 1/0 Databuf 1/0 Revoke 0/0 Reverting this commit fixes the issue. Tested with git master as of today (16d72dd4891fe). Thanks, -- Ross Lagerwall