Re: [f2fs-dev] [PATCH v2] f2fs: fix to avoid racing in between read and OPU dio write
On 2024/6/26 10:01, Zhiguo Niu wrote: Chao Yu 于2024年6月25日周二 22:29写道: If lfs mode is on, buffered read may race w/ OPU dio write as below, it may cause buffered read hits unwritten data unexpectly, and for dio read, the race condition exists as well. Thread AThread B - f2fs_file_write_iter - f2fs_dio_write_iter - __iomap_dio_rw - f2fs_iomap_begin - f2fs_map_blocks - __allocate_data_block - allocated blkaddr #x - iomap_dio_submit_bio - f2fs_file_read_iter - filemap_read - f2fs_read_data_folio - f2fs_mpage_readpages - f2fs_map_blocks : get blkaddr #x - f2fs_submit_read_bio IRQ - f2fs_read_end_io : read IO on blkaddr #x complete IRQ - iomap_dio_bio_end_io : direct write IO on blkaddr #x complete In LFS mode, if there is inflight dio, let's force read to buffered IO, this policy won't cover all race cases, however it is a tradeoff which avoids abusing lock around IO paths. Fixes: f847c699cff3 ("f2fs: allow out-place-update for direct IO in LFS mode") Signed-off-by: Chao Yu --- fs/f2fs/file.c | 4 1 file changed, 4 insertions(+) diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c index 278573974db4..866f1a34e92b 100644 --- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c @@ -882,6 +882,10 @@ static bool f2fs_force_buffered_io(struct inode *inode, int rw) return true; if (is_sbi_flag_set(sbi, SBI_CP_DISABLED)) return true; + /* In LFS mode, if there is inflight dio, force read to buffered IO */ + if (rw == READ && f2fs_lfs_mode(sbi) && + atomic_read(&inode->i_dio_count)) + return false; Hi Chao, A little doubt:),force “buffered IO” should return "true"? Oops, too rush to send the patch... another want to confirm is, "thread B" in commit msg just doing buffer read, so this modification just cover direct read case? Oh, the fix is incorrect, will look into it soon. Thanks, thanks! return false; } -- 2.40.1 ___ Linux-f2fs-devel mailing list Linux-f2fs-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel ___ Linux-f2fs-devel mailing list Linux-f2fs-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
Re: [f2fs-dev] [PATCH v2] f2fs: fix to avoid racing in between read and OPU dio write
Chao Yu 于2024年6月25日周二 22:29写道: > > If lfs mode is on, buffered read may race w/ OPU dio write as below, > it may cause buffered read hits unwritten data unexpectly, and for > dio read, the race condition exists as well. > > Thread AThread B > - f2fs_file_write_iter > - f2fs_dio_write_iter > - __iomap_dio_rw >- f2fs_iomap_begin > - f2fs_map_blocks > - __allocate_data_block > - allocated blkaddr #x >- iomap_dio_submit_bio > - f2fs_file_read_iter > - filemap_read > - f2fs_read_data_folio >- f2fs_mpage_readpages > - f2fs_map_blocks > : get blkaddr #x > - f2fs_submit_read_bio > IRQ > - f2fs_read_end_io > : read IO on blkaddr #x complete > IRQ > - iomap_dio_bio_end_io > : direct write IO on blkaddr #x complete > > In LFS mode, if there is inflight dio, let's force read to buffered > IO, this policy won't cover all race cases, however it is a tradeoff > which avoids abusing lock around IO paths. > > Fixes: f847c699cff3 ("f2fs: allow out-place-update for direct IO in LFS mode") > Signed-off-by: Chao Yu > --- > fs/f2fs/file.c | 4 > 1 file changed, 4 insertions(+) > > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c > index 278573974db4..866f1a34e92b 100644 > --- a/fs/f2fs/file.c > +++ b/fs/f2fs/file.c > @@ -882,6 +882,10 @@ static bool f2fs_force_buffered_io(struct inode *inode, > int rw) > return true; > if (is_sbi_flag_set(sbi, SBI_CP_DISABLED)) > return true; > + /* In LFS mode, if there is inflight dio, force read to buffered IO */ > + if (rw == READ && f2fs_lfs_mode(sbi) && > + atomic_read(&inode->i_dio_count)) > + return false; Hi Chao, A little doubt:),force “buffered IO” should return "true"? another want to confirm is, "thread B" in commit msg just doing buffer read, so this modification just cover direct read case? thanks! > > return false; > } > -- > 2.40.1 > > > > ___ > Linux-f2fs-devel mailing list > Linux-f2fs-devel@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel ___ Linux-f2fs-devel mailing list Linux-f2fs-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
Re: [f2fs-dev] [PATCH v2] f2fs: fix to avoid racing in between read and OPU dio write
On 2024/5/15 14:38, Chao Yu wrote: On 2024/5/15 12:42, Jaegeuk Kim wrote: On 05/15, Chao Yu wrote: On 2024/5/15 0:09, Jaegeuk Kim wrote: On 05/10, Chao Yu wrote: If lfs mode is on, buffered read may race w/ OPU dio write as below, it may cause buffered read hits unwritten data unexpectly, and for dio read, the race condition exists as well. Thread A Thread B - f2fs_file_write_iter - f2fs_dio_write_iter - __iomap_dio_rw - f2fs_iomap_begin - f2fs_map_blocks - __allocate_data_block - allocated blkaddr #x - iomap_dio_submit_bio - f2fs_file_read_iter - filemap_read - f2fs_read_data_folio - f2fs_mpage_readpages - f2fs_map_blocks : get blkaddr #x - f2fs_submit_read_bio IRQ - f2fs_read_end_io : read IO on blkaddr #x complete IRQ - iomap_dio_bio_end_io : direct write IO on blkaddr #x complete This patch introduces a new per-inode i_opu_rwsem lock to avoid such race condition. Wasn't this supposed to be managed by user-land? Actually, the test case is: 1. mount w/ lfs mode 2. touch file; 3. initialize file w/ 4k zeroed data; fsync; 4. continue triggering dio write 4k zeroed data to file; 5. and meanwhile, continue triggering buf/dio 4k read in file, use md5sum to verify the 4k data; It expects data is all zero, however it turned out it's not. Can we check outstanding write bios instead of abusing locks? Jaegeuk, seems it can solve partial race cases, not all of them. Do you suggest to use this compromised solution? Thanks, I didn't figure out a way to solve this w/o lock, due to: - write bios can be issued after outstanding write bios check condition, result in the race. - once read() detects that there are outstanding write bios, we need to delay read flow rather than fail it, right? It looks using a lock is more proper here? Any suggestion? Thanks, Thanks, Fixes: f847c699cff3 ("f2fs: allow out-place-update for direct IO in LFS mode") Signed-off-by: Chao Yu --- v2: - fix to cover dio read path w/ i_opu_rwsem as well. fs/f2fs/f2fs.h | 1 + fs/f2fs/file.c | 28 ++-- fs/f2fs/super.c | 1 + 3 files changed, 28 insertions(+), 2 deletions(-) diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index 30058e16a5d0..91cf4b3d6bc6 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -847,6 +847,7 @@ struct f2fs_inode_info { /* avoid racing between foreground op and gc */ struct f2fs_rwsem i_gc_rwsem[2]; struct f2fs_rwsem i_xattr_sem; /* avoid racing between reading and changing EAs */ + struct f2fs_rwsem i_opu_rwsem; /* avoid racing between buf read and opu dio write */ int i_extra_isize; /* size of extra space located in i_addr */ kprojid_t i_projid; /* id for project quota */ diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c index 72ce1a522fb2..4ec260af321f 100644 --- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c @@ -4445,6 +4445,7 @@ static ssize_t f2fs_dio_read_iter(struct kiocb *iocb, struct iov_iter *to) const loff_t pos = iocb->ki_pos; const size_t count = iov_iter_count(to); struct iomap_dio *dio; + bool do_opu = f2fs_lfs_mode(sbi); ssize_t ret; if (count == 0) @@ -4457,8 +4458,14 @@ static ssize_t f2fs_dio_read_iter(struct kiocb *iocb, struct iov_iter *to) ret = -EAGAIN; goto out; } + if (do_opu && !f2fs_down_read_trylock(&fi->i_opu_rwsem)) { + f2fs_up_read(&fi->i_gc_rwsem[READ]); + ret = -EAGAIN; + goto out; + } } else { f2fs_down_read(&fi->i_gc_rwsem[READ]); + f2fs_down_read(&fi->i_opu_rwsem); } /* @@ -4477,6 +4484,7 @@ static ssize_t f2fs_dio_read_iter(struct kiocb *iocb, struct iov_iter *to) ret = iomap_dio_complete(dio); } + f2fs_up_read(&fi->i_opu_rwsem); f2fs_up_read(&fi->i_gc_rwsem[READ]); file_accessed(file); @@ -4523,7 +4531,13 @@ static ssize_t f2fs_file_read_iter(struct kiocb *iocb, struct iov_iter *to) if (f2fs_should_use_dio(inode, iocb, to)) { ret = f2fs_dio_read_iter(iocb, to); } else { + bool do_opu = f2fs_lfs_mode(F2FS_I_SB(inode)); + + if (do_opu) + f2fs_down_read(&F2FS_I(inode)->i_opu_rwsem); ret = filemap_read(iocb, to, 0); + if (do_opu) + f2fs_up_read(&F2FS_I(inode)->i_opu_rwsem); if (ret > 0) f2fs_update_iostat(F2FS_I_SB(
Re: [f2fs-dev] [PATCH v2] f2fs: fix to avoid racing in between read and OPU dio write
On 2024/5/15 16:32, Wu Bo wrote: On Fri, May 10, 2024 at 10:39:06AM +0800, Chao Yu wrote: If lfs mode is on, buffered read may race w/ OPU dio write as below, it may cause buffered read hits unwritten data unexpectly, and for dio read, the race condition exists as well. Thread A Thread B - f2fs_file_write_iter - f2fs_dio_write_iter - __iomap_dio_rw - f2fs_iomap_begin - f2fs_map_blocks - __allocate_data_block - allocated blkaddr #x - iomap_dio_submit_bio - f2fs_file_read_iter - filemap_read - f2fs_read_data_folio - f2fs_mpage_readpages - f2fs_map_blocks : get blkaddr #x - f2fs_submit_read_bio IRQ - f2fs_read_end_io : read IO on blkaddr #x complete IRQ - iomap_dio_bio_end_io : direct write IO on blkaddr #x complete Looks like every COW filesystem would meet this situation. What's the solution of other FS? I missed to reply this... Other cow filesystem like btrfs, it will update metadata after data IO completion, so it is safe. Thanks, This patch introduces a new per-inode i_opu_rwsem lock to avoid such race condition. ___ Linux-f2fs-devel mailing list Linux-f2fs-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
Re: [f2fs-dev] [PATCH v2] f2fs: fix to avoid racing in between read and OPU dio write
Hello, kernel test robot noticed "WARNING:at_kernel/locking/rwsem.c:#down_read" on: commit: abf7df61e5c60fed520a09102d576fd41ed4d5ee ("[PATCH v2] f2fs: fix to avoid racing in between read and OPU dio write") url: https://github.com/intel-lab-lkp/linux/commits/Chao-Yu/f2fs-fix-to-avoid-racing-in-between-read-and-OPU-dio-write/20240510-104020 base: https://git.kernel.org/cgit/linux/kernel/git/jaegeuk/f2fs.git dev patch link: https://lore.kernel.org/all/20240510023906.281700-1-c...@kernel.org/ patch subject: [PATCH v2] f2fs: fix to avoid racing in between read and OPU dio write in testcase: xfstests version: xfstests-x86_64-0e5c12df-1_20240511 with following parameters: disk: 4HDD fs: f2fs test: generic-617 compiler: gcc-13 test machine: 4 threads Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz (Skylake) with 32G memory (please refer to attached dmesg/kmsg for entire log/backtrace) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot | Closes: https://lore.kernel.org/oe-lkp/202405171532.760e22d3-oliver.s...@intel.com [ 160.985543][ T2255] [ cut here ] [ 160.990864][ T2255] WARNING: CPU: 3 PID: 2255 at kernel/locking/rwsem.c:245 down_read (kernel/locking/rwsem.c:245 (discriminator 1) kernel/locking/rwsem.c:1249 (discriminator 1) kernel/locking/rwsem.c:1263 (discriminator 1) kernel/locking/rwsem.c:1528 (discriminator 1)) [ 160.999715][ T2255] Modules linked in: f2fs crc32_generic intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal btrfs intel_powerclamp blake2b_generic coretemp xor zstd_compress kvm_intel raid6_pq libcrc32c i915 kvm sd_mod t10_pi crc64_rocksoft_generic crct10dif_pclmul crc32_pclmul crc64_rocksoft crc64 crc32c_intel ghash_clmulni_intel sha512_ssse3 sg drm_buddy rapl ipmi_devintf intel_cstate intel_gtt ipmi_msghandler mei_wdt wmi_bmof intel_uncore i2c_i801 drm_display_helper i2c_smbus ahci ttm drm_kms_helper libahci video libata mei_me mei acpi_pad intel_pch_thermal wmi binfmt_misc loop fuse drm dm_mod ip_tables [ 161.053654][ T2255] CPU: 3 PID: 2255 Comm: fsx Tainted: G I 6.9.0-rc1-00036-gabf7df61e5c6 #1 [ 161.063438][ T2255] Hardware name: Dell Inc. OptiPlex 7040/0Y7WYT, BIOS 1.1.1 10/07/2015 [ 161.071504][ T2255] RIP: 0010:down_read (kernel/locking/rwsem.c:245 (discriminator 1) kernel/locking/rwsem.c:1249 (discriminator 1) kernel/locking/rwsem.c:1263 (discriminator 1) kernel/locking/rwsem.c:1528 (discriminator 1)) [ 161.076376][ T2255] Code: b8 00 00 00 00 00 fc ff df 83 e3 02 48 c1 ea 03 4c 09 fb 48 83 cb 01 80 3c 02 00 0f 85 9a 00 00 00 48 89 5d 08 e9 50 ff ff ff <0f> 0b 4c 8d 7d 08 be 08 00 00 00 4c 89 ff e8 c1 59 e9 fd 4c 89 f8 All code 0: b8 00 00 00 00 mov$0x0,%eax 5: 00 fc add%bh,%ah 7: ff (bad) 8: df 83 e3 02 48 c1 filds -0x3eb7fd1d(%rbx) e: ea (bad) f: 03 4c 09 fb add-0x5(%rcx,%rcx,1),%ecx 13: 48 83 cb 01 or $0x1,%rbx 17: 80 3c 02 00 cmpb $0x0,(%rdx,%rax,1) 1b: 0f 85 9a 00 00 00 jne0xbb 21: 48 89 5d 08 mov%rbx,0x8(%rbp) 25: e9 50 ff ff ff jmpq 0xff7a 2a:* 0f 0b ud2 <-- trapping instruction 2c: 4c 8d 7d 08 lea0x8(%rbp),%r15 30: be 08 00 00 00 mov$0x8,%esi 35: 4c 89 ffmov%r15,%rdi 38: e8 c1 59 e9 fd callq 0xfde959fe 3d: 4c 89 f8mov%r15,%rax Code starting with the faulting instruction === 0: 0f 0b ud2 2: 4c 8d 7d 08 lea0x8(%rbp),%r15 6: be 08 00 00 00 mov$0x8,%esi b: 4c 89 ffmov%r15,%rdi e: e8 c1 59 e9 fd callq 0xfde959d4 13: 4c 89 f8mov%r15,%rax [ 161.095753][ T2255] RSP: 0018:c90002c0f8b8 EFLAGS: 00010286 [ 161.101662][ T2255] RAX: fe00 RBX: fe00 RCX: 83bdf543 [ 161.109469][ T2255] RDX: ed10236632f0 RSI: 0008 RDI: 88811b319778 [ 161.117276][ T2255] RBP: 88811b319778 R08: 0001 R09: ed10236632ef [ 161.125080][ T2255] R10: 88811b31977f R11: 85fe96a2 R12: 192000581f18 [ 161.132885][ T2255] R13: dc00 R14: c90002c0fa70 R15: [ 161.140691][ T2255] FS: 7f623f19d740() GS:8887e938() knlGS: [ 161.149446][ T2255] CS: 0010 DS: ES: CR0: 80050033 [ 161.155871][ T2255] CR2: 7f623f05b000 CR3: 0001b2026006 CR4: 003706f0 [ 161.163705][ T2255] DR0: DR1: DR2: [ 161.171512][ T2255] DR3:
Re: [f2fs-dev] [PATCH v2] f2fs: fix to avoid racing in between read and OPU dio write
… > This patch introduces a new … Please choose a corresponding imperative wording for an improved change description. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?h=v6.9#n94 Regards, Markus ___ Linux-f2fs-devel mailing list Linux-f2fs-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
Re: [f2fs-dev] [PATCH v2] f2fs: fix to avoid racing in between read and OPU dio write
On Fri, May 10, 2024 at 10:39:06AM +0800, Chao Yu wrote: > If lfs mode is on, buffered read may race w/ OPU dio write as below, > it may cause buffered read hits unwritten data unexpectly, and for > dio read, the race condition exists as well. > > Thread A Thread B > - f2fs_file_write_iter > - f2fs_dio_write_iter > - __iomap_dio_rw >- f2fs_iomap_begin > - f2fs_map_blocks > - __allocate_data_block > - allocated blkaddr #x >- iomap_dio_submit_bio > - f2fs_file_read_iter >- filemap_read > - f2fs_read_data_folio > - f2fs_mpage_readpages > - f2fs_map_blocks >: get blkaddr #x > - f2fs_submit_read_bio > IRQ > - f2fs_read_end_io >: read IO on blkaddr #x complete > IRQ > - iomap_dio_bio_end_io > : direct write IO on blkaddr #x complete > Looks like every COW filesystem would meet this situation. What's the solution of other FS? > This patch introduces a new per-inode i_opu_rwsem lock to avoid > such race condition. > ___ Linux-f2fs-devel mailing list Linux-f2fs-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
Re: [f2fs-dev] [PATCH v2] f2fs: fix to avoid racing in between read and OPU dio write
On 2024/5/15 12:42, Jaegeuk Kim wrote: On 05/15, Chao Yu wrote: On 2024/5/15 0:09, Jaegeuk Kim wrote: On 05/10, Chao Yu wrote: If lfs mode is on, buffered read may race w/ OPU dio write as below, it may cause buffered read hits unwritten data unexpectly, and for dio read, the race condition exists as well. Thread A Thread B - f2fs_file_write_iter - f2fs_dio_write_iter - __iomap_dio_rw - f2fs_iomap_begin - f2fs_map_blocks - __allocate_data_block - allocated blkaddr #x - iomap_dio_submit_bio - f2fs_file_read_iter - filemap_read - f2fs_read_data_folio - f2fs_mpage_readpages - f2fs_map_blocks : get blkaddr #x - f2fs_submit_read_bio IRQ - f2fs_read_end_io : read IO on blkaddr #x complete IRQ - iomap_dio_bio_end_io : direct write IO on blkaddr #x complete This patch introduces a new per-inode i_opu_rwsem lock to avoid such race condition. Wasn't this supposed to be managed by user-land? Actually, the test case is: 1. mount w/ lfs mode 2. touch file; 3. initialize file w/ 4k zeroed data; fsync; 4. continue triggering dio write 4k zeroed data to file; 5. and meanwhile, continue triggering buf/dio 4k read in file, use md5sum to verify the 4k data; It expects data is all zero, however it turned out it's not. Can we check outstanding write bios instead of abusing locks? I didn't figure out a way to solve this w/o lock, due to: - write bios can be issued after outstanding write bios check condition, result in the race. - once read() detects that there are outstanding write bios, we need to delay read flow rather than fail it, right? It looks using a lock is more proper here? Any suggestion? Thanks, Thanks, Fixes: f847c699cff3 ("f2fs: allow out-place-update for direct IO in LFS mode") Signed-off-by: Chao Yu --- v2: - fix to cover dio read path w/ i_opu_rwsem as well. fs/f2fs/f2fs.h | 1 + fs/f2fs/file.c | 28 ++-- fs/f2fs/super.c | 1 + 3 files changed, 28 insertions(+), 2 deletions(-) diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index 30058e16a5d0..91cf4b3d6bc6 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -847,6 +847,7 @@ struct f2fs_inode_info { /* avoid racing between foreground op and gc */ struct f2fs_rwsem i_gc_rwsem[2]; struct f2fs_rwsem i_xattr_sem; /* avoid racing between reading and changing EAs */ + struct f2fs_rwsem i_opu_rwsem; /* avoid racing between buf read and opu dio write */ int i_extra_isize; /* size of extra space located in i_addr */ kprojid_t i_projid; /* id for project quota */ diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c index 72ce1a522fb2..4ec260af321f 100644 --- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c @@ -4445,6 +4445,7 @@ static ssize_t f2fs_dio_read_iter(struct kiocb *iocb, struct iov_iter *to) const loff_t pos = iocb->ki_pos; const size_t count = iov_iter_count(to); struct iomap_dio *dio; + bool do_opu = f2fs_lfs_mode(sbi); ssize_t ret; if (count == 0) @@ -4457,8 +4458,14 @@ static ssize_t f2fs_dio_read_iter(struct kiocb *iocb, struct iov_iter *to) ret = -EAGAIN; goto out; } + if (do_opu && !f2fs_down_read_trylock(&fi->i_opu_rwsem)) { + f2fs_up_read(&fi->i_gc_rwsem[READ]); + ret = -EAGAIN; + goto out; + } } else { f2fs_down_read(&fi->i_gc_rwsem[READ]); + f2fs_down_read(&fi->i_opu_rwsem); } /* @@ -4477,6 +4484,7 @@ static ssize_t f2fs_dio_read_iter(struct kiocb *iocb, struct iov_iter *to) ret = iomap_dio_complete(dio); } + f2fs_up_read(&fi->i_opu_rwsem); f2fs_up_read(&fi->i_gc_rwsem[READ]); file_accessed(file); @@ -4523,7 +4531,13 @@ static ssize_t f2fs_file_read_iter(struct kiocb *iocb, struct iov_iter *to) if (f2fs_should_use_dio(inode, iocb, to)) { ret = f2fs_dio_read_iter(iocb, to); } else { + bool do_opu = f2fs_lfs_mode(F2FS_I_SB(inode)); + + if (do_opu) + f2fs_down_read(&F2FS_I(inode)->i_opu_rwsem); ret = filemap_read(iocb, to, 0); + if (do_opu) + f2fs_up_read(&F2FS_I(inode)->i_opu_rwsem); if (ret > 0) f2fs_update_iostat(F2FS_I_SB(inode), inode, APP_BUFFERED_READ_IO, ret); @@ -4748,14 +4762,22 @@ static ssize_t f2fs_dio_write_iter(struct kiocb
Re: [f2fs-dev] [PATCH v2] f2fs: fix to avoid racing in between read and OPU dio write
On 05/15, Chao Yu wrote: > On 2024/5/15 0:09, Jaegeuk Kim wrote: > > On 05/10, Chao Yu wrote: > > > If lfs mode is on, buffered read may race w/ OPU dio write as below, > > > it may cause buffered read hits unwritten data unexpectly, and for > > > dio read, the race condition exists as well. > > > > > > Thread A Thread B > > > - f2fs_file_write_iter > > > - f2fs_dio_write_iter > > >- __iomap_dio_rw > > > - f2fs_iomap_begin > > > - f2fs_map_blocks > > > - __allocate_data_block > > >- allocated blkaddr #x > > > - iomap_dio_submit_bio > > >- f2fs_file_read_iter > > > - filemap_read > > > - f2fs_read_data_folio > > > - f2fs_mpage_readpages > > >- f2fs_map_blocks > > > : get blkaddr #x > > >- f2fs_submit_read_bio > > >IRQ > > >- f2fs_read_end_io > > > : read IO on blkaddr #x complete > > > IRQ > > > - iomap_dio_bio_end_io > > > : direct write IO on blkaddr #x complete > > > > > > This patch introduces a new per-inode i_opu_rwsem lock to avoid > > > such race condition. > > > > Wasn't this supposed to be managed by user-land? > > Actually, the test case is: > > 1. mount w/ lfs mode > 2. touch file; > 3. initialize file w/ 4k zeroed data; fsync; > 4. continue triggering dio write 4k zeroed data to file; > 5. and meanwhile, continue triggering buf/dio 4k read in file, > use md5sum to verify the 4k data; > > It expects data is all zero, however it turned out it's not. Can we check outstanding write bios instead of abusing locks? > > Thanks, > > > > > > > > > Fixes: f847c699cff3 ("f2fs: allow out-place-update for direct IO in LFS > > > mode") > > > Signed-off-by: Chao Yu > > > --- > > > v2: > > > - fix to cover dio read path w/ i_opu_rwsem as well. > > > fs/f2fs/f2fs.h | 1 + > > > fs/f2fs/file.c | 28 ++-- > > > fs/f2fs/super.c | 1 + > > > 3 files changed, 28 insertions(+), 2 deletions(-) > > > > > > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h > > > index 30058e16a5d0..91cf4b3d6bc6 100644 > > > --- a/fs/f2fs/f2fs.h > > > +++ b/fs/f2fs/f2fs.h > > > @@ -847,6 +847,7 @@ struct f2fs_inode_info { > > >/* avoid racing between foreground op and gc */ > > >struct f2fs_rwsem i_gc_rwsem[2]; > > >struct f2fs_rwsem i_xattr_sem; /* avoid racing between reading and > > > changing EAs */ > > > + struct f2fs_rwsem i_opu_rwsem; /* avoid racing between buf read > > > and opu dio write */ > > > > > >int i_extra_isize; /* size of extra space located in > > > i_addr */ > > >kprojid_t i_projid; /* id for project quota */ > > > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c > > > index 72ce1a522fb2..4ec260af321f 100644 > > > --- a/fs/f2fs/file.c > > > +++ b/fs/f2fs/file.c > > > @@ -4445,6 +4445,7 @@ static ssize_t f2fs_dio_read_iter(struct kiocb > > > *iocb, struct iov_iter *to) > > >const loff_t pos = iocb->ki_pos; > > >const size_t count = iov_iter_count(to); > > >struct iomap_dio *dio; > > > + bool do_opu = f2fs_lfs_mode(sbi); > > >ssize_t ret; > > > > > >if (count == 0) > > > @@ -4457,8 +4458,14 @@ static ssize_t f2fs_dio_read_iter(struct kiocb > > > *iocb, struct iov_iter *to) > > >ret = -EAGAIN; > > >goto out; > > >} > > > + if (do_opu && !f2fs_down_read_trylock(&fi->i_opu_rwsem)) { > > > + f2fs_up_read(&fi->i_gc_rwsem[READ]); > > > + ret = -EAGAIN; > > > + goto out; > > > + } > > >} else { > > >f2fs_down_read(&fi->i_gc_rwsem[READ]); > > > + f2fs_down_read(&fi->i_opu_rwsem); > > >} > > > > > >/* > > > @@ -4477,6 +4484,7 @@ static ssize_t f2fs_dio_read_iter(struct kiocb > > > *iocb, struct iov_iter *to) > > >ret = iomap_dio_complete(dio); > > >} > > > > > > + f2fs_up_read(&fi->i_opu_rwsem); > > >f2fs_up_read(&fi->i_gc_rwsem[READ]); > > > > > >file_accessed(file); > > > @@ -4523,7 +4531,13 @@ static ssize_t f2fs_file_read_iter(struct kiocb > > > *iocb, struct iov_iter *to) > > >if (f2fs_should_use_dio(inode, iocb, to)) { > > >ret = f2fs_dio_read_iter(iocb, to); > > >} else { > > > + bool do_opu = f2fs_lfs_mode(F2FS_I_SB(inode)); > > > + > > > + if (do_opu) > > > + f2fs_down_read(&F2FS_I(inode)->i_opu_rwsem); > > >ret = filemap_read(iocb, to, 0); > > > + if (do_opu) > > > + f2fs_up_read(&F2FS_I(inode)->i_opu_rwsem);
Re: [f2fs-dev] [PATCH v2] f2fs: fix to avoid racing in between read and OPU dio write
On 2024/5/15 0:09, Jaegeuk Kim wrote: On 05/10, Chao Yu wrote: If lfs mode is on, buffered read may race w/ OPU dio write as below, it may cause buffered read hits unwritten data unexpectly, and for dio read, the race condition exists as well. Thread A Thread B - f2fs_file_write_iter - f2fs_dio_write_iter - __iomap_dio_rw - f2fs_iomap_begin - f2fs_map_blocks - __allocate_data_block - allocated blkaddr #x - iomap_dio_submit_bio - f2fs_file_read_iter - filemap_read - f2fs_read_data_folio - f2fs_mpage_readpages - f2fs_map_blocks : get blkaddr #x - f2fs_submit_read_bio IRQ - f2fs_read_end_io : read IO on blkaddr #x complete IRQ - iomap_dio_bio_end_io : direct write IO on blkaddr #x complete This patch introduces a new per-inode i_opu_rwsem lock to avoid such race condition. Wasn't this supposed to be managed by user-land? Actually, the test case is: 1. mount w/ lfs mode 2. touch file; 3. initialize file w/ 4k zeroed data; fsync; 4. continue triggering dio write 4k zeroed data to file; 5. and meanwhile, continue triggering buf/dio 4k read in file, use md5sum to verify the 4k data; It expects data is all zero, however it turned out it's not. Thanks, Fixes: f847c699cff3 ("f2fs: allow out-place-update for direct IO in LFS mode") Signed-off-by: Chao Yu --- v2: - fix to cover dio read path w/ i_opu_rwsem as well. fs/f2fs/f2fs.h | 1 + fs/f2fs/file.c | 28 ++-- fs/f2fs/super.c | 1 + 3 files changed, 28 insertions(+), 2 deletions(-) diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index 30058e16a5d0..91cf4b3d6bc6 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -847,6 +847,7 @@ struct f2fs_inode_info { /* avoid racing between foreground op and gc */ struct f2fs_rwsem i_gc_rwsem[2]; struct f2fs_rwsem i_xattr_sem; /* avoid racing between reading and changing EAs */ + struct f2fs_rwsem i_opu_rwsem; /* avoid racing between buf read and opu dio write */ int i_extra_isize; /* size of extra space located in i_addr */ kprojid_t i_projid; /* id for project quota */ diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c index 72ce1a522fb2..4ec260af321f 100644 --- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c @@ -4445,6 +4445,7 @@ static ssize_t f2fs_dio_read_iter(struct kiocb *iocb, struct iov_iter *to) const loff_t pos = iocb->ki_pos; const size_t count = iov_iter_count(to); struct iomap_dio *dio; + bool do_opu = f2fs_lfs_mode(sbi); ssize_t ret; if (count == 0) @@ -4457,8 +4458,14 @@ static ssize_t f2fs_dio_read_iter(struct kiocb *iocb, struct iov_iter *to) ret = -EAGAIN; goto out; } + if (do_opu && !f2fs_down_read_trylock(&fi->i_opu_rwsem)) { + f2fs_up_read(&fi->i_gc_rwsem[READ]); + ret = -EAGAIN; + goto out; + } } else { f2fs_down_read(&fi->i_gc_rwsem[READ]); + f2fs_down_read(&fi->i_opu_rwsem); } /* @@ -4477,6 +4484,7 @@ static ssize_t f2fs_dio_read_iter(struct kiocb *iocb, struct iov_iter *to) ret = iomap_dio_complete(dio); } + f2fs_up_read(&fi->i_opu_rwsem); f2fs_up_read(&fi->i_gc_rwsem[READ]); file_accessed(file); @@ -4523,7 +4531,13 @@ static ssize_t f2fs_file_read_iter(struct kiocb *iocb, struct iov_iter *to) if (f2fs_should_use_dio(inode, iocb, to)) { ret = f2fs_dio_read_iter(iocb, to); } else { + bool do_opu = f2fs_lfs_mode(F2FS_I_SB(inode)); + + if (do_opu) + f2fs_down_read(&F2FS_I(inode)->i_opu_rwsem); ret = filemap_read(iocb, to, 0); + if (do_opu) + f2fs_up_read(&F2FS_I(inode)->i_opu_rwsem); if (ret > 0) f2fs_update_iostat(F2FS_I_SB(inode), inode, APP_BUFFERED_READ_IO, ret); @@ -4748,14 +4762,22 @@ static ssize_t f2fs_dio_write_iter(struct kiocb *iocb, struct iov_iter *from, ret = -EAGAIN; goto out; } + if (do_opu && !f2fs_down_write_trylock(&fi->i_opu_rwsem)) { + f2fs_up_read(&fi->i_gc_rwsem[READ]); + f2fs_up_read(&fi->i_gc_rwsem[WRITE]); + ret = -EAGAIN; + goto out; + } } else { ret = f2fs_convert_inline_inode(inode); if (ret) goto ou
Re: [f2fs-dev] [PATCH v2] f2fs: fix to avoid racing in between read and OPU dio write
On 05/10, Chao Yu wrote: > If lfs mode is on, buffered read may race w/ OPU dio write as below, > it may cause buffered read hits unwritten data unexpectly, and for > dio read, the race condition exists as well. > > Thread A Thread B > - f2fs_file_write_iter > - f2fs_dio_write_iter > - __iomap_dio_rw >- f2fs_iomap_begin > - f2fs_map_blocks > - __allocate_data_block > - allocated blkaddr #x >- iomap_dio_submit_bio > - f2fs_file_read_iter >- filemap_read > - f2fs_read_data_folio > - f2fs_mpage_readpages > - f2fs_map_blocks >: get blkaddr #x > - f2fs_submit_read_bio > IRQ > - f2fs_read_end_io >: read IO on blkaddr #x complete > IRQ > - iomap_dio_bio_end_io > : direct write IO on blkaddr #x complete > > This patch introduces a new per-inode i_opu_rwsem lock to avoid > such race condition. Wasn't this supposed to be managed by user-land? > > Fixes: f847c699cff3 ("f2fs: allow out-place-update for direct IO in LFS mode") > Signed-off-by: Chao Yu > --- > v2: > - fix to cover dio read path w/ i_opu_rwsem as well. > fs/f2fs/f2fs.h | 1 + > fs/f2fs/file.c | 28 ++-- > fs/f2fs/super.c | 1 + > 3 files changed, 28 insertions(+), 2 deletions(-) > > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h > index 30058e16a5d0..91cf4b3d6bc6 100644 > --- a/fs/f2fs/f2fs.h > +++ b/fs/f2fs/f2fs.h > @@ -847,6 +847,7 @@ struct f2fs_inode_info { > /* avoid racing between foreground op and gc */ > struct f2fs_rwsem i_gc_rwsem[2]; > struct f2fs_rwsem i_xattr_sem; /* avoid racing between reading and > changing EAs */ > + struct f2fs_rwsem i_opu_rwsem; /* avoid racing between buf read and > opu dio write */ > > int i_extra_isize; /* size of extra space located in > i_addr */ > kprojid_t i_projid; /* id for project quota */ > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c > index 72ce1a522fb2..4ec260af321f 100644 > --- a/fs/f2fs/file.c > +++ b/fs/f2fs/file.c > @@ -4445,6 +4445,7 @@ static ssize_t f2fs_dio_read_iter(struct kiocb *iocb, > struct iov_iter *to) > const loff_t pos = iocb->ki_pos; > const size_t count = iov_iter_count(to); > struct iomap_dio *dio; > + bool do_opu = f2fs_lfs_mode(sbi); > ssize_t ret; > > if (count == 0) > @@ -4457,8 +4458,14 @@ static ssize_t f2fs_dio_read_iter(struct kiocb *iocb, > struct iov_iter *to) > ret = -EAGAIN; > goto out; > } > + if (do_opu && !f2fs_down_read_trylock(&fi->i_opu_rwsem)) { > + f2fs_up_read(&fi->i_gc_rwsem[READ]); > + ret = -EAGAIN; > + goto out; > + } > } else { > f2fs_down_read(&fi->i_gc_rwsem[READ]); > + f2fs_down_read(&fi->i_opu_rwsem); > } > > /* > @@ -4477,6 +4484,7 @@ static ssize_t f2fs_dio_read_iter(struct kiocb *iocb, > struct iov_iter *to) > ret = iomap_dio_complete(dio); > } > > + f2fs_up_read(&fi->i_opu_rwsem); > f2fs_up_read(&fi->i_gc_rwsem[READ]); > > file_accessed(file); > @@ -4523,7 +4531,13 @@ static ssize_t f2fs_file_read_iter(struct kiocb *iocb, > struct iov_iter *to) > if (f2fs_should_use_dio(inode, iocb, to)) { > ret = f2fs_dio_read_iter(iocb, to); > } else { > + bool do_opu = f2fs_lfs_mode(F2FS_I_SB(inode)); > + > + if (do_opu) > + f2fs_down_read(&F2FS_I(inode)->i_opu_rwsem); > ret = filemap_read(iocb, to, 0); > + if (do_opu) > + f2fs_up_read(&F2FS_I(inode)->i_opu_rwsem); > if (ret > 0) > f2fs_update_iostat(F2FS_I_SB(inode), inode, > APP_BUFFERED_READ_IO, ret); > @@ -4748,14 +4762,22 @@ static ssize_t f2fs_dio_write_iter(struct kiocb > *iocb, struct iov_iter *from, > ret = -EAGAIN; > goto out; > } > + if (do_opu && !f2fs_down_write_trylock(&fi->i_opu_rwsem)) { > + f2fs_up_read(&fi->i_gc_rwsem[READ]); > + f2fs_up_read(&fi->i_gc_rwsem[WRITE]); > + ret = -EAGAIN; > + goto out; > + } > } else { > ret = f2fs_convert_inline_inode(inode); > if (ret) > goto out; > > f2fs_down_read(&fi->i_gc_rwsem[WRITE]); > - if (do_opu) > + if (do_opu) { > f2fs_down_read(&fi->i_gc_rwsem[READ]); > +