On 05/11, Chao Yu wrote: > On 2020/5/10 3:03, Jaegeuk Kim wrote: > > On 05/09, Chao Yu wrote: > >> On 2020/5/9 0:10, Jaegeuk Kim wrote: > >>> Hi Sayali, > >>> > >>> In order to address the perf regression, how about this? > >>> > >>> >From 48418af635884803ffb35972df7958a2e6649322 Mon Sep 17 00:00:00 2001 > >>> From: Jaegeuk Kim <jaeg...@kernel.org> > >>> Date: Fri, 8 May 2020 09:08:37 -0700 > >>> Subject: [PATCH] f2fs: avoid double lock for cp_rwsem during checkpoint > >>> > >>> There could be a scenario where f2fs_sync_node_pages gets > >>> called during checkpoint, which in turn tries to flush > >>> inline data and calls iput(). This results in deadlock as > >>> iput() tries to hold cp_rwsem, which is already held at the > >>> beginning by checkpoint->block_operations(). > >>> > >>> Call stack : > >>> > >>> Thread A Thread B > >>> f2fs_write_checkpoint() > >>> - block_operations(sbi) > >>> - f2fs_lock_all(sbi); > >>> - down_write(&sbi->cp_rwsem); > >>> > >>> - open() > >>> - igrab() > >>> - write() write inline data > >>> - unlink() > >>> - f2fs_sync_node_pages() > >>> - if (is_inline_node(page)) > >>> - flush_inline_data() > >>> - ilookup() > >>> page = f2fs_pagecache_get_page() > >>> if (!page) > >>> goto iput_out; > >>> iput_out: > >>> -close() > >>> -iput() > >>> iput(inode); > >>> - f2fs_evict_inode() > >>> - f2fs_truncate_blocks() > >>> - f2fs_lock_op() > >>> - down_read(&sbi->cp_rwsem); > >>> > >>> Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to > >>> inline_data") > >>> Signed-off-by: Sayali Lokhande <saya...@codeaurora.org> > >>> Signed-off-by: Jaegeuk Kim <jaeg...@kernel.org> > >>> --- > >>> fs/f2fs/node.c | 4 ++-- > >>> 1 file changed, 2 insertions(+), 2 deletions(-) > >>> > >>> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c > >>> index 1db8cabf727ef..626d7daca09de 100644 > >>> --- a/fs/f2fs/node.c > >>> +++ b/fs/f2fs/node.c > >>> @@ -1870,8 +1870,8 @@ int f2fs_sync_node_pages(struct f2fs_sb_info *sbi, > >>> goto continue_unlock; > >>> } > >>> > >>> - /* flush inline_data */ > >>> - if (is_inline_node(page)) { > >>> + /* flush inline_data, if it's not sync path. */ > >>> + if (do_balance && is_inline_node(page)) { > >> > >> IIRC, this flow was designed to avoid running out of free space issue > >> during checkpoint: > >> > >> 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data") > >> > >> The sceanrio is: > >> 1. create fully node blocks > >> 2. flush node blocks > >> 3. write inline_data for all the node blocks again > >> 4. flush node blocks redundantly > >> > >> I guess this may cause failing one case of fstest. > > > > Yeah, actually I was hitting 204 failure, and thus, revised like this. > > Now, I don't see any regression in fstest. > > > >>From 8f1882acfb0a5fc43e5a2bbd576a8f3c681a7d2c Mon Sep 17 00:00:00 2001 > > From: Sayali Lokhande <saya...@codeaurora.org> > > Date: Thu, 30 Apr 2020 16:28:29 +0530 > > Subject: [PATCH] f2fs: Avoid double lock for cp_rwsem during checkpoint > > > > There could be a scenario where f2fs_sync_node_pages gets > > called during checkpoint, which in turn tries to flush > > inline data and calls iput(). This results in deadlock as > > iput() tries to hold cp_rwsem, which is already held at the > > beginning by checkpoint->block_operations(). > > > > Call stack : > > > > Thread A Thread B > > f2fs_write_checkpoint() > > - block_operations(sbi) > > - f2fs_lock_all(sbi); > > - down_write(&sbi->cp_rwsem); > > > > - open() > > - igrab() > > - write() write inline data > > - unlink() > > - f2fs_sync_node_pages() > > - if (is_inline_node(page)) > > - flush_inline_data() > > - ilookup() > > page = f2fs_pagecache_get_page() > > if (!page) > > goto iput_out; > > iput_out: > > -close() > > -iput() > > iput(inode); > > - f2fs_evict_inode() > > - f2fs_truncate_blocks() > > - f2fs_lock_op() > > - down_read(&sbi->cp_rwsem); > > > > Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to > > inline_data") > > Signed-off-by: Sayali Lokhande <saya...@codeaurora.org> > > Signed-off-by: Jaegeuk Kim <jaeg...@kernel.org> > > --- > > fs/f2fs/checkpoint.c | 9 ++++++++- > > fs/f2fs/f2fs.h | 4 ++-- > > fs/f2fs/node.c | 10 +++++----- > > 3 files changed, 15 insertions(+), 8 deletions(-) > > > > diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c > > index d49f7a01d8a26..928aea4ff663d 100644 > > --- a/fs/f2fs/checkpoint.c > > +++ b/fs/f2fs/checkpoint.c > > @@ -1168,6 +1168,12 @@ static int block_operations(struct f2fs_sb_info *sbi) > > }; > > int err = 0, cnt = 0; > > > > + /* > > + * Let's flush node pages first to flush inline_data. > > + * We'll actually guarantee everything below under f2fs_lock_all. > > + */ > > + f2fs_sync_node_pages(sbi, &wbc, false, false, FS_CP_NODE_IO); > > It is possible that user write a large number of inline data in between > f2fs_sync_node_pages() and f2fs_lock_all(), it will cause the no-space issue > in > race condition. > > Also, if there is huge number of F2FS_DIRTY_IMETA, after this change, we will > flush inode page twice which is unneeded. > > f2fs_sync_node_pages() --- flush dirty inode page > f2fs_lock_all() > ... > f2fs_sync_inode_meta() --- update dirty inode page > f2fs_sync_node_pages() --- flush dirty inode page again. >
Another version: >From 6b430b72af57c65c20dea7b87f7ba7e9df36be98 Mon Sep 17 00:00:00 2001 From: Sayali Lokhande <saya...@codeaurora.org> Date: Thu, 30 Apr 2020 16:28:29 +0530 Subject: [PATCH] f2fs: Avoid double lock for cp_rwsem during checkpoint There could be a scenario where f2fs_sync_node_pages gets called during checkpoint, which in turn tries to flush inline data and calls iput(). This results in deadlock as iput() tries to hold cp_rwsem, which is already held at the beginning by checkpoint->block_operations(). Call stack : Thread A Thread B f2fs_write_checkpoint() - block_operations(sbi) - f2fs_lock_all(sbi); - down_write(&sbi->cp_rwsem); - open() - igrab() - write() write inline data - unlink() - f2fs_sync_node_pages() - if (is_inline_node(page)) - flush_inline_data() - ilookup() page = f2fs_pagecache_get_page() if (!page) goto iput_out; iput_out: -close() -iput() iput(inode); - f2fs_evict_inode() - f2fs_truncate_blocks() - f2fs_lock_op() - down_read(&sbi->cp_rwsem); Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data") Signed-off-by: Sayali Lokhande <saya...@codeaurora.org> Signed-off-by: Jaegeuk Kim <jaeg...@kernel.org> --- fs/f2fs/checkpoint.c | 5 +++++ fs/f2fs/f2fs.h | 1 + fs/f2fs/node.c | 51 ++++++++++++++++++++++++++++++++++++++++++-- 3 files changed, 55 insertions(+), 2 deletions(-) diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c index d49f7a01d8a26..79e605f38f4fa 100644 --- a/fs/f2fs/checkpoint.c +++ b/fs/f2fs/checkpoint.c @@ -1168,6 +1168,11 @@ static int block_operations(struct f2fs_sb_info *sbi) }; int err = 0, cnt = 0; + /* + * Let's flush inline_data in dirty node pages. + */ + f2fs_flush_inline_data(sbi); + retry_flush_quotas: f2fs_lock_all(sbi); if (__need_flush_quota(sbi)) { diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index 2a8ea81c52a15..7f3d259e7e376 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -3282,6 +3282,7 @@ void f2fs_ra_node_page(struct f2fs_sb_info *sbi, nid_t nid); struct page *f2fs_get_node_page(struct f2fs_sb_info *sbi, pgoff_t nid); struct page *f2fs_get_node_page_ra(struct page *parent, int start); int f2fs_move_node_page(struct page *node_page, int gc_type); +int f2fs_flush_inline_data(struct f2fs_sb_info *sbi); int f2fs_fsync_node_pages(struct f2fs_sb_info *sbi, struct inode *inode, struct writeback_control *wbc, bool atomic, unsigned int *seq_id); diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c index 1db8cabf727ef..e632de10aedab 100644 --- a/fs/f2fs/node.c +++ b/fs/f2fs/node.c @@ -1807,6 +1807,53 @@ static bool flush_dirty_inode(struct page *page) return true; } +int f2fs_flush_inline_data(struct f2fs_sb_info *sbi) +{ + pgoff_t index = 0; + struct pagevec pvec; + int nr_pages; + int ret = 0; + + pagevec_init(&pvec); + + while ((nr_pages = pagevec_lookup_tag(&pvec, + NODE_MAPPING(sbi), &index, PAGECACHE_TAG_DIRTY))) { + int i; + + for (i = 0; i < nr_pages; i++) { + struct page *page = pvec.pages[i]; + + if (!IS_DNODE(page)) + continue; + + lock_page(page); + + if (unlikely(page->mapping != NODE_MAPPING(sbi))) { +continue_unlock: + unlock_page(page); + continue; + } + + if (!PageDirty(page)) { + /* someone wrote it for us */ + goto continue_unlock; + } + + /* flush inline_data, if it's async context. */ + if (is_inline_node(page)) { + clear_inline_node(page); + unlock_page(page); + flush_inline_data(sbi, ino_of_node(page)); + continue; + } + unlock_page(page); + } + pagevec_release(&pvec); + cond_resched(); + } + return ret; +} + int f2fs_sync_node_pages(struct f2fs_sb_info *sbi, struct writeback_control *wbc, bool do_balance, enum iostat_type io_type) @@ -1870,8 +1917,8 @@ int f2fs_sync_node_pages(struct f2fs_sb_info *sbi, goto continue_unlock; } - /* flush inline_data */ - if (is_inline_node(page)) { + /* flush inline_data, if it's async context. */ + if (do_balance && is_inline_node(page)) { clear_inline_node(page); unlock_page(page); flush_inline_data(sbi, ino_of_node(page)); -- 2.26.2.645.ge9eca65c58-goog