Re: [f2fs-dev] [External Mail] Re: [External Mail] Re: [PATCH] f2fs: prevent the current section from being selected as a victim during garbage collection
> From: Chao Yu > Sent: Thursday, March 27, 2025 4:30 PM > To: 정요한(JOUNG YOHAN) Mobile AE ; Yohan Joung > ; jaeg...@kernel.org; daeh...@gmail.com > Cc: c...@kernel.org; linux-f2fs-devel@lists.sourceforge.net; linux- > ker...@vger.kernel.org; 김필현(KIM PILHYUN) Mobile AE > Subject: [External Mail] Re: [External Mail] Re: [PATCH] f2fs: prevent the > current section from being selected as a victim during garbage collection > > On 3/27/25 14:43, yohan.jo...@sk.com wrote: > >> From: Chao Yu > >> Sent: Thursday, March 27, 2025 3:02 PM > >> To: Yohan Joung ; jaeg...@kernel.org; > >> daeh...@gmail.com > >> Cc: c...@kernel.org; linux-f2fs-devel@lists.sourceforge.net; linux- > >> ker...@vger.kernel.org; 정요한(JOUNG YOHAN) Mobile AE > >> > >> Subject: [External Mail] Re: [PATCH] f2fs: prevent the current > >> section from being selected as a victim during garbage collection > >> > >> On 3/26/25 22:14, Yohan Joung wrote: > >>> When selecting a victim using next_victim_seg in a large section, > >>> the selected section might already have been cleared and designated > >>> as the new current section, making it actively in use. > >>> This behavior causes inconsistency between the SIT and SSA. > >> > >> Hi, does this fix your issue? > > > > This is an issue that arises when dividing a large section into > > segments for garbage collection. > > caused by the background GC (garbage collection) thread in large > > section > > f2fs_gc(victim_section) -> > > f2fs_clear_prefree_segments(victim_section)-> > > cursec(victim_section) -> f2fs_gc(victim_section by next_victim_seg) > > I didn't get it, why f2fs_get_victim() will return section which is used > by curseg? It should be avoided by checking w/ sec_usage_check(). > > Or we missed to check gcing section which next_victim_seg points to during > get_new_segment()? > > Can this happen? > > e.g. > - bggc selects sec #0 > - next_victim_seg: seg #0 > - migrate seg #0 and stop > - next_victim_seg: seg #1 > - checkpoint, set sec #0 free if sec #0 has no valid blocks > - allocate seg #0 in sec #0 for curseg > - curseg moves to seg #1 after allocation > - bggc tries to migrate seg #1 > > Thanks, That's correct In f2fs_get_victim, we use next_victim_seg to directly jump to got_result, thereby bypassing sec_usage_check What do you think about this change? @@ -850,15 +850,20 @@ int f2fs_get_victim(struct f2fs_sb_info *sbi, unsigned int *result, p.min_segno = sbi->next_victim_seg[BG_GC]; *result = p.min_segno; sbi->next_victim_seg[BG_GC] = NULL_SEGNO; - goto got_result; } if (gc_type == FG_GC && sbi->next_victim_seg[FG_GC] != NULL_SEGNO) { p.min_segno = sbi->next_victim_seg[FG_GC]; *result = p.min_segno; sbi->next_victim_seg[FG_GC] = NULL_SEGNO; - goto got_result; } + + secno = GET_SEC_FROM_SEG(sbi, segno); + + if (sec_usage_check(sbi, secno)) + goto next; + + goto got_result; } > > > > > Because the call stack is different, > > I think that in order to handle everything at once, we need to address > > it within do_garbage_collect, or otherwise include it on both sides. > > What do you think? > > > > [30146.337471][ T1300] F2FS-fs (dm-54): Inconsistent segment (70961) > > type [0, 1] in SSA and SIT [30146.346151][ T1300] Call trace: > > [30146.346152][ T1300] dump_backtrace+0xe8/0x10c [30146.346157][ > > T1300] show_stack+0x18/0x28 [30146.346158][ T1300] > > dump_stack_lvl+0x50/0x6c [30146.346161][ T1300] dump_stack+0x18/0x28 > > [30146.346162][ T1300] f2fs_stop_checkpoint+0x1c/0x3c [30146.346165][ > > T1300] do_garbage_collect+0x41c/0x271c [30146.346167][ T1300] > > f2fs_gc+0x27c/0x828 [30146.346168][ T1300] gc_thread_func+0x290/0x88c > > [30146.346169][ T1300] kthread+0x11c/0x164 [30146.346172][ T1300] > > ret_from_fork+0x10/0x20 > > > > struct curseg_info : 0xff803f95e800 { > > segno: 0x11531 : 70961 > > } > > > > struct f2fs_sb_info : 0xff8811d12000 { > > next_victim_seg[0] : 0x11531 : 70961 > > } > > > >> > >> https://lore.kernel.org/linux-f2fs-devel/20250325080646.3291947-2- > >> c...@kernel.org > >> > >> Thanks, > >> > >>> > >>> Signed-off-b
Re: [f2fs-dev] [External Mail] Re: [PATCH] f2fs: prevent the current section from being selected as a victim during garbage collection
> From: Chao Yu > Sent: Thursday, March 27, 2025 3:02 PM > To: Yohan Joung ; jaeg...@kernel.org; daeh...@gmail.com > Cc: c...@kernel.org; linux-f2fs-devel@lists.sourceforge.net; linux- > ker...@vger.kernel.org; 정요한(JOUNG YOHAN) Mobile AE > Subject: [External Mail] Re: [PATCH] f2fs: prevent the current section > from being selected as a victim during garbage collection > > On 3/26/25 22:14, Yohan Joung wrote: > > When selecting a victim using next_victim_seg in a large section, the > > selected section might already have been cleared and designated as the > > new current section, making it actively in use. > > This behavior causes inconsistency between the SIT and SSA. > > Hi, does this fix your issue? This is an issue that arises when dividing a large section into segments for garbage collection. caused by the background GC (garbage collection) thread in large section f2fs_gc(victim_section) -> f2fs_clear_prefree_segments(victim_section)-> cursec(victim_section) -> f2fs_gc(victim_section by next_victim_seg) Because the call stack is different, I think that in order to handle everything at once, we need to address it within do_garbage_collect, or otherwise include it on both sides. What do you think? [30146.337471][ T1300] F2FS-fs (dm-54): Inconsistent segment (70961) type [0, 1] in SSA and SIT [30146.346151][ T1300] Call trace: [30146.346152][ T1300] dump_backtrace+0xe8/0x10c [30146.346157][ T1300] show_stack+0x18/0x28 [30146.346158][ T1300] dump_stack_lvl+0x50/0x6c [30146.346161][ T1300] dump_stack+0x18/0x28 [30146.346162][ T1300] f2fs_stop_checkpoint+0x1c/0x3c [30146.346165][ T1300] do_garbage_collect+0x41c/0x271c [30146.346167][ T1300] f2fs_gc+0x27c/0x828 [30146.346168][ T1300] gc_thread_func+0x290/0x88c [30146.346169][ T1300] kthread+0x11c/0x164 [30146.346172][ T1300] ret_from_fork+0x10/0x20 struct curseg_info : 0xff803f95e800 { segno: 0x11531 : 70961 } struct f2fs_sb_info : 0xff8811d12000 { next_victim_seg[0] : 0x11531 : 70961 } > > https://lore.kernel.org/linux-f2fs-devel/20250325080646.3291947-2- > c...@kernel.org > > Thanks, > > > > > Signed-off-by: Yohan Joung > > --- > > fs/f2fs/gc.c | 4 > > 1 file changed, 4 insertions(+) > > > > diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c index > > 2b8f9239bede..4b5d18e395eb 100644 > > --- a/fs/f2fs/gc.c > > +++ b/fs/f2fs/gc.c > > @@ -1926,6 +1926,10 @@ int f2fs_gc(struct f2fs_sb_info *sbi, struct > f2fs_gc_control *gc_control) > > goto stop; > > } > > > > + if (__is_large_section(sbi) && > > + IS_CURSEC(sbi, GET_SEC_FROM_SEG(sbi, segno))) > > + goto stop; > > + > > seg_freed = do_garbage_collect(sbi, segno, &gc_list, gc_type, > > gc_control->should_migrate_blocks, > > gc_control->one_time); ___ Linux-f2fs-devel mailing list Linux-f2fs-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
Re: [f2fs-dev] [External Mail] Re: [PATCH] f2fs: optimize f2fs DIO overwrites
> From: Daeho Jeong > Sent: Wednesday, March 12, 2025 6:14 AM > To: Chao Yu > Cc: Yohan Joung ; jaeg...@kernel.org; linux-f2fs- > de...@lists.sourceforge.net; linux-ker...@vger.kernel.org; 정요한(JOUNG > YOHAN) Mobile AE > Subject: [External Mail] Re: [PATCH] f2fs: optimize f2fs DIO overwrites > > On Tue, Mar 11, 2025 at 5:00 AM Chao Yu wrote: > > > > On 3/7/25 22:56, Yohan Joung wrote: > > > this is unnecessary when we know we are overwriting already > > > allocated blocks and the overhead of starting a transaction can be > > > significant especially for multithreaded workloads doing small writes. > > > > Hi Yohan, > > > > So you're trying to avoid f2fs_map_lock() in dio write path, right? > > > > Thanks, > > > > > > > > Signed-off-by: Yohan Joung > > > --- > > > fs/f2fs/data.c | 20 fs/f2fs/f2fs.h | 1 + > > > fs/f2fs/file.c | 5 - > > > 3 files changed, 25 insertions(+), 1 deletion(-) > > > > > > diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c index > > > 09437dbd1b42..728630037b74 100644 > > > --- a/fs/f2fs/data.c > > > +++ b/fs/f2fs/data.c > > > @@ -4267,6 +4267,26 @@ static int f2fs_iomap_begin(struct inode *inode, > loff_t offset, loff_t length, > > > return 0; > > > } > > > > > > +static int f2fs_iomap_overwrite_begin(struct inode *inode, loff_t > offset, > > > + loff_t length, unsigned flags, struct iomap *iomap, > > > + struct iomap *srcmap) > > > +{ > > > + int ret; > > > + > > > + /* > > > + * Even for writes we don't need to allocate blocks, so just > pretend > > > + * we are reading to save overhead of starting a transaction. > > > + */ > > > + flags &= ~IOMAP_WRITE; > > > + ret = f2fs_iomap_begin(inode, offset, length, flags, iomap, > srcmap); > > > + WARN_ON_ONCE(!ret && iomap->type != IOMAP_MAPPED); > > > + return ret; > > > +} > > > + > > > const struct iomap_ops f2fs_iomap_ops = { > > > .iomap_begin= f2fs_iomap_begin, > > > }; > > > + > > > +const struct iomap_ops f2fs_iomap_overwrite_ops = { > > > + .iomap_begin= f2fs_iomap_overwrite_begin, > > > +}; > > > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index > > > c6cc2694f9ac..0511ab5ed42a 100644 > > > --- a/fs/f2fs/f2fs.h > > > +++ b/fs/f2fs/f2fs.h > > > @@ -3936,6 +3936,7 @@ void f2fs_destroy_post_read_processing(void); > > > int f2fs_init_post_read_wq(struct f2fs_sb_info *sbi); void > > > f2fs_destroy_post_read_wq(struct f2fs_sb_info *sbi); extern const > > > struct iomap_ops f2fs_iomap_ops; > > > +extern const struct iomap_ops f2fs_iomap_overwrite_ops; > > > > > > static inline struct page *f2fs_find_data_page(struct inode *inode, > > > pgoff_t index, pgoff_t *next_pgofs) diff --git > > > a/fs/f2fs/file.c b/fs/f2fs/file.c index 82b21baf5628..bb2fe6dac9b6 > > > 100644 > > > --- a/fs/f2fs/file.c > > > +++ b/fs/f2fs/file.c > > > @@ -4985,6 +4985,7 @@ static ssize_t f2fs_dio_write_iter(struct kiocb > *iocb, struct iov_iter *from, > > > const ssize_t count = iov_iter_count(from); > > > unsigned int dio_flags; > > > struct iomap_dio *dio; > > > + const struct iomap_ops *iomap_ops = &f2fs_iomap_ops; > > > ssize_t ret; > > > > > > trace_f2fs_direct_IO_enter(inode, iocb, count, WRITE); @@ > > > -5025,7 +5026,9 @@ static ssize_t f2fs_dio_write_iter(struct kiocb > *iocb, struct iov_iter *from, > > > dio_flags = 0; > > > if (pos + count > inode->i_size) > > > dio_flags |= IOMAP_DIO_FORCE_WAIT; > > > - dio = __iomap_dio_rw(iocb, from, &f2fs_iomap_ops, > > > + else if (f2fs_overwrite_io(inode, pos, count)) > > > + iomap_ops = &f2fs_iomap_overwrite_ops; > > > + dio = __iomap_dio_rw(iocb, from, iomap_ops, > > >&f2fs_iomap_dio_write_ops, dio_flags, NULL, 0); > > > if (IS_ERR_OR_NULL(dio)) { > > > ret = PTR_ERR_OR_ZERO(dio); > > I think we can add the overwrite check in f2fs_iomap_begin() before > setting the map.m_may_create, instead of adding a new function > f2fs_iomap_overwrite_begin(). > What do you think? Daeho, Is this the way you want it changed? If so, I'll upload it like this static int f2fs_iomap_begin(struct inode *inode, loff_t offset, loff_t length, unsigned int flags, struct iomap *iomap, struct iomap *srcmap) { struct f2fs_map_blocks map = {}; pgoff_t next_pgofs = 0; int err; map.m_lblk = F2FS_BYTES_TO_BLK(offset); map.m_len = F2FS_BYTES_TO_BLK(offset + length - 1) - map.m_lblk + 1; map.m_next_pgofs = &next_pgofs; map.m_seg_type = f2fs_rw_hint_to_seg_type(F2FS_I_SB(inode), inode->i_write_hint); if ((flags & IOMAP_WRITE) && !f2fs_overwrite_io(inode, offset, length)) map.m_may_create = true; > > > ___ Linux-f2fs-devel mailing list Linux-f2fs-deve