Re: [f2fs-dev] [DISCUSSION] f2fs for desktop
Hi Chao, Thanks for the patch. I'll try it out on both my laptop and workstation soon. One question though: would it make sense to see if it works fine on Android too? (With userspace's explicit GC trigger disabled.) Maybe it could be an indication on whether it works properly or not? Thanks, On Thu, May 18, 2023 at 4:53 PM Chao Yu wrote: > > On 2023/4/21 1:26, Juhyung Park wrote: > > Hi Chao, > > > > On Fri, Apr 21, 2023 at 1:19 AM Chao Yu wrote: > >> > >> Hi JuHyung, > >> > >> Sorry for delay reply. > >> > >> On 2023/4/11 1:03, Juhyung Park wrote: > >>> Hi Chao, > >>> > >>> On Tue, Apr 11, 2023 at 12:44 AM Chao Yu wrote: > > Hi Juhyung, > > On 2023/4/4 15:36, Juhyung Park wrote: > > Hi everyone, > > > > I want to start a discussion on using f2fs for regular > > desktops/workstations. > > > > There are growing number of interests in using f2fs as the general > > root file-system: > > 2018: https://www.phoronix.com/news/GRUB-Now-Supports-F2FS > > 2020: https://www.phoronix.com/news/Clear-Linux-F2FS-Root-Option > > 2023: > > https://code.launchpad.net/~nexusprism/curtin/+git/curtin/+merge/439880 > > 2023: > > https://code.launchpad.net/~nexusprism/grub/+git/ubuntu/+merge/440193 > > > > I've been personally running f2fs on all of my x86 Linux boxes since > > 2015, and I have several concerns that I think we need to collectively > > address for regular non-Android normies to use f2fs: > > > > A. Bootloader and installer support > > B. Host-side GC > > C. Extended node bitmap > > > > I'll go through each one. > > > > === A. Bootloader and installer support === > > > > It seems that both GRUB and systemd-boot supports f2fs without the > > need for a separate ext4-formatted /boot partition. 
> > Some distros are seemingly disabling f2fs module for GRUB though for > > security reasons: > > https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1868664 > > > > It's ultimately up to the distro folks to enable this, and still in > > the worst-case scenario, they can specify a separate /boot partition > > and format it to ext4 upon installation. > > > > The installer itself to show f2fs and call mkfs.f2fs is being worked > > on currently on Ubuntu. See the 2023 links above. > > > > Nothing f2fs mainline developers should do here, imo. > > > > === B. Host-side GC === > > > > f2fs relieves most of the device-side GC but introduces a new > > host-side GC. This is extremely confusing for people who have no > > background in SSDs and flash storage to understand, let alone > > discard/trim/erase complications. > > > > In most consumer-grade blackbox SSDs, device-side GCs are handled > > automatically for various workloads. f2fs, however, leaves that > > responsibility to the userspace with conservative tuning on the > > We've proposed a f2fs feature named "space awared garbage collection" > and shipped it in huawei/honor's devices, but forgot to try upstreaming > it. :-P > > In this feature, we introduced three mode: > - performance mode: something like write-gc in ftl, it can trigger > background gc more frequently and tune its speed according to free > segs and reclaimable blks ratio. > - lifetime mode: slow down background gc to avoid high waf if there > is less free space. > - balance mode: behave as usual. > > I guess this may be helpful for Linux desktop distros since there is > no such storage service trigger gc_urgent. > > >>> > >>> That indeed sounds interesting. > >>> > >>> If you need me to test something out, feel free to ask. > >> > >> Thanks a lot for that. :) > >> > >> I'm trying to figure out a patch... > > Juhyung, > > Are you interesting to try this patch in distros? 
> > https://git.kernel.org/pub/scm/linux/kernel/git/chao/linux.git/commit/?h=dev-test=4736e55bc967e91cf8a275b678739b006c2617f0 > > There are some tunable parameters, I can export them via sysfs entry, > let me update later. > > Thanks, ___ Linux-f2fs-devel mailing list Linux-f2fs-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
Re: [f2fs-dev] [PATCH 1/1] f2fs: pass I_NEW flag to trace event
On 2023/5/18 08:32, Jaegeuk Kim wrote:
> On 05/17, Wu Bo wrote:
>> On 2023/5/17 16:36, Chao Yu wrote:
>>> On 2023/5/17 11:59, Wu Bo wrote:
>>>> On 2023/5/17 10:44, Chao Yu wrote:
>>>>> On 2023/5/16 20:07, Wu Bo wrote:
>>>>>> Modify the order between 'trace_f2fs_iget' & 'unlock_new_inode', so the
>>>>>> I_NEW can pass to the trace event when the inode initialised.
>>>>>
>>>>> Why is it needed? And trace_f2fs_iget() won't print inode->i_state?
>>>>
>>>> When connect a trace_probe to f2fs_iget, it will be able to determine
>>>> whether the inode is new initialised in order to do different process.
>>>
>>> I didn't get it, you want to hook __tracepoint_f2fs_iget() w/ your own
>>> callback?
>>
>> Yes, to use 'tracepoint_probe_register ' to register a probe at
>> trace_f2fs_iget
>
> Why?

Sorry, I don't understand what is your real question.

In my understanding, a trace_event is also a non-volatile point in kernel
for probing. And for my case, I want to develop a tool by trace_probe to
collect some information.

Thanks

>>> Thanks,
>>>
>>>>> Thanks,
>>>>>
>>>>>> Signed-off-by: Wu Bo
>>>>>> ---
>>>>>>  fs/f2fs/inode.c | 2 +-
>>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
>>>>>> index cf4327ad106c..caf959289fe7 100644
>>>>>> --- a/fs/f2fs/inode.c
>>>>>> +++ b/fs/f2fs/inode.c
>>>>>> @@ -577,8 +577,8 @@ struct inode *f2fs_iget(struct super_block *sb, unsigned long ino)
>>>>>>  		file_dont_truncate(inode);
>>>>>>  	}
>>>>>>
>>>>>> -	unlock_new_inode(inode);
>>>>>>  	trace_f2fs_iget(inode);
>>>>>> +	unlock_new_inode(inode);
>>>>>>  	return inode;
>>>>>>
>>>>>>  bad_inode:
Re: [f2fs-dev] [PATCH v5] fsck.f2fs: Detect and fix looped node chain efficiently
On 2023/5/18 20:26, Chunhai Guo wrote:
> find_fsync_inode() detect the looped node chain by comparing the loop
> counter with free blocks. While it may take tens of seconds to quit when
> the free blocks are large enough. We can use Floyd's cycle detection
> algorithm to make the detection more efficient, and fix the issue by
> filling a NULL address in the last node of the chain.
>
> Below is the log we encounter on a 256GB UFS storage and it takes about
> 25 seconds to detect looped node chain. After changing the algorithm, it
> takes about 20ms to finish the same job.
>
> [ 10.822904] fsck.f2fs: Info: version timestamp cur: 17, prev: 430
> [ 10.822949] fsck.f2fs: [update_superblock: 762] Info: Done to update superblock
> [ 10.822953] fsck.f2fs: Info: superblock features = 1499 : encrypt verity extra_attr project_quota quota_ino casefold
> [ 10.822956] fsck.f2fs: Info: superblock encrypt level = 0, salt =
> [ 10.822960] fsck.f2fs: Info: total FS sectors = 59249811 (231444 MB)
> [ 35.852827] fsck.f2fs: detect looped node chain, blkaddr:1114802, next:1114803
> [ 35.852842] fsck.f2fs: [f2fs_do_mount:3846] record_fsync_data failed
> [ 35.856106] fsck.f2fs: fsck.f2fs terminated by exit(255)
>
> Signed-off-by: Chunhai Guo

Reviewed-by: Chao Yu

Thanks,
[f2fs-dev] [PATCH v5] fsck.f2fs: Detect and fix looped node chain efficiently
find_fsync_inode() detect the looped node chain by comparing the loop
counter with free blocks. While it may take tens of seconds to quit when
the free blocks are large enough. We can use Floyd's cycle detection
algorithm to make the detection more efficient, and fix the issue by
filling a NULL address in the last node of the chain.

Below is the log we encounter on a 256GB UFS storage and it takes about
25 seconds to detect looped node chain. After changing the algorithm, it
takes about 20ms to finish the same job.

[ 10.822904] fsck.f2fs: Info: version timestamp cur: 17, prev: 430
[ 10.822949] fsck.f2fs: [update_superblock: 762] Info: Done to update superblock
[ 10.822953] fsck.f2fs: Info: superblock features = 1499 : encrypt verity extra_attr project_quota quota_ino casefold
[ 10.822956] fsck.f2fs: Info: superblock encrypt level = 0, salt =
[ 10.822960] fsck.f2fs: Info: total FS sectors = 59249811 (231444 MB)
[ 35.852827] fsck.f2fs: detect looped node chain, blkaddr:1114802, next:1114803
[ 35.852842] fsck.f2fs: [f2fs_do_mount:3846] record_fsync_data failed
[ 35.856106] fsck.f2fs: fsck.f2fs terminated by exit(255)

Signed-off-by: Chunhai Guo
---
v4 -> v5 : Use IS_INODE() to make the code more clear.
v3 -> v4 : Set c.bug_on with ASSERT_MSG() when issue is detected and fix
           it only if c.fix_on is 1.
v2 -> v3 : Write inode with write_inode() to avoid chksum being broken.
v1 -> v2 : Fix looped node chain directly after it is detected.
---
 fsck/mount.c | 128 +++
 1 file changed, 110 insertions(+), 18 deletions(-)

diff --git a/fsck/mount.c b/fsck/mount.c
index df0314d57caf..c98b7ba00b21 100644
--- a/fsck/mount.c
+++ b/fsck/mount.c
@@ -3394,22 +3394,91 @@ static void destroy_fsync_dnodes(struct list_head *head)
 		del_fsync_inode(entry);
 }

+static int find_node_blk_fast(struct f2fs_sb_info *sbi, block_t *blkaddr_fast,
+		struct f2fs_node *node_blk_fast, bool *is_detecting)
+{
+	int i, err;
+
+	for (i = 0; i < 2; i++) {
+		if (!f2fs_is_valid_blkaddr(sbi, *blkaddr_fast, META_POR)) {
+			*is_detecting = false;
+			return 0;
+		}
+
+		err = dev_read_block(node_blk_fast, *blkaddr_fast);
+		if (err)
+			return err;
+
+		if (!is_recoverable_dnode(sbi, node_blk_fast)) {
+			*is_detecting = false;
+			return 0;
+		}
+
+		*blkaddr_fast = next_blkaddr_of_node(node_blk_fast);
+	}
+
+	return 0;
+}
+
+static int loop_node_chain_fix(struct f2fs_sb_info *sbi,
+		block_t blkaddr_fast, struct f2fs_node *node_blk_fast,
+		block_t blkaddr, struct f2fs_node *node_blk)
+{
+	block_t blkaddr_entry, blkaddr_tmp;
+	int err;
+
+	/* find the entry point of the looped node chain */
+	while (blkaddr_fast != blkaddr) {
+		err = dev_read_block(node_blk_fast, blkaddr_fast);
+		if (err)
+			return err;
+		blkaddr_fast = next_blkaddr_of_node(node_blk_fast);
+
+		err = dev_read_block(node_blk, blkaddr);
+		if (err)
+			return err;
+		blkaddr = next_blkaddr_of_node(node_blk);
+	}
+	blkaddr_entry = blkaddr;
+
+	/* find the last node of the chain */
+	do {
+		blkaddr_tmp = blkaddr;
+		err = dev_read_block(node_blk, blkaddr);
+		if (err)
+			return err;
+		blkaddr = next_blkaddr_of_node(node_blk);
+	} while (blkaddr != blkaddr_entry);
+
+	/* fix the blkaddr of last node with NULL_ADDR. */
+	node_blk->footer.next_blkaddr = NULL_ADDR;
+	if (IS_INODE(node_blk))
+		err = write_inode(node_blk, blkaddr_tmp);
+	else
+		err = dev_write_block(node_blk, blkaddr_tmp);
+	if (!err)
+		FIX_MSG("Fix looped node chain on blkaddr %u\n",
+				blkaddr_tmp);
+	return err;
+}
+
 static int find_fsync_inode(struct f2fs_sb_info *sbi, struct list_head *head)
 {
 	struct curseg_info *curseg;
-	struct f2fs_node *node_blk;
-	block_t blkaddr;
-	unsigned int loop_cnt = 0;
-	unsigned int free_blocks = MAIN_SEGS(sbi) * sbi->blocks_per_seg -
-						sbi->total_valid_block_count;
+	struct f2fs_node *node_blk, *node_blk_fast;
+	block_t blkaddr, blkaddr_fast;
+	bool is_detecting = true;
 	int err = 0;

+	node_blk = calloc(F2FS_BLKSIZE, 1);
+	node_blk_fast = calloc(F2FS_BLKSIZE, 1);
+	ASSERT(node_blk && node_blk_fast);
+
+retry:
 	/* get node pages in the current segment */
 	curseg = CURSEG_I(sbi, CURSEG_WARM_NODE);
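
For readers unfamiliar with the technique named in the commit message, the three steps (detect the loop, find its entry point, find the last node and cut the link) can be sketched outside fsck as below. This is only an illustration under stated assumptions: the node chain is modeled as an in-memory array of next-indices rather than on-disk blocks, and `find_loop_tail` and `NIL` are hypothetical names that do not exist in f2fs-tools.

```c
#include <assert.h>
#include <stddef.h>

#define NIL ((size_t)-1)	/* hypothetical end-of-chain marker */

/*
 * Walk the chain that starts at 'head'. If the chain loops, return the
 * index of its last node (the node whose next link closes the loop);
 * otherwise return NIL.
 */
static size_t find_loop_tail(const size_t *next, size_t head)
{
	size_t slow = head, fast = head, entry, tail;

	assert(next != NULL);

	/* Phase 1: slow advances one step, fast two, until they meet. */
	do {
		if (fast == NIL || next[fast] == NIL)
			return NIL;	/* reached the end: no loop */
		fast = next[next[fast]];
		slow = next[slow];
	} while (slow != fast);

	/* Phase 2: restart slow from head; both now meet at the loop entry. */
	slow = head;
	while (slow != fast) {
		slow = next[slow];
		fast = next[fast];
	}
	entry = slow;

	/* Phase 3: go around the loop once to find the node before entry. */
	tail = entry;
	while (next[tail] != entry)
		tail = next[tail];
	return tail;
}
```

With a chain 0 -> 1 -> 2 -> 3 -> 4 -> 2, the function returns 4; setting `next[4] = NIL` then plays the role of writing NULL_ADDR into the last node, after which a second call reports no loop.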
[f2fs-dev] [PATCH] f2fs: fix to use le32_to_cpu() in RAW_IS_INODE()
__le32 type variable should be converted w/ le32_to_cpu() before access.

Signed-off-by: Chao Yu
---
 fs/f2fs/f2fs.h | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 7f6c51a6b930..a4bff3b5b887 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -2840,7 +2840,11 @@ static inline void f2fs_radix_tree_insert(struct radix_tree_root *root,
 	cond_resched();
 }

-#define RAW_IS_INODE(p)	((p)->footer.nid == (p)->footer.ino)
+static inline bool RAW_IS_INODE(struct f2fs_node *node)
+{
+	return le32_to_cpu(node->footer.ino) ==
+				le32_to_cpu(node->footer.nid);
+}

 static inline bool IS_INODE(struct page *page)
 {
--
2.40.1
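
The idea of decoding on-disk little-endian fields before using them can be sketched in userspace as follows. This is an illustration only: `le32_decode`, `struct footer_sketch` and `raw_is_inode_sketch` are hypothetical stand-ins, not the kernel's le32_to_cpu() or struct f2fs_node, and the byte-array layout is an assumption made so the example compiles on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a node footer: fields kept as raw LE bytes. */
struct footer_sketch {
	uint8_t nid[4];
	uint8_t ino[4];
};

/* Decode a little-endian 32-bit value; independent of host byte order. */
static uint32_t le32_decode(const uint8_t b[4])
{
	return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* A node is treated as an inode when its node id equals its inode number. */
static bool raw_is_inode_sketch(const struct footer_sketch *f)
{
	assert(f != NULL);
	return le32_decode(f->ino) == le32_decode(f->nid);
}
```

Routing every access through one helper, as the patch does with an inline function, keeps the conversion in a single place instead of repeating it at each comparison.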
[f2fs-dev] [PATCH] fsck.f2fs: fix to use le32_to_cpu() in IS_INODE()
And use IS_INODE() to clean up codes.

Signed-off-by: Chao Yu
---
 fsck/fsck.c  | 7 +++
 fsck/mount.c | 4 ++--
 fsck/node.h  | 3 ++-
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/fsck/fsck.c b/fsck/fsck.c
index d03f1da..ac4cd98 100644
--- a/fsck/fsck.c
+++ b/fsck/fsck.c
@@ -247,7 +247,7 @@ static int is_valid_summary(struct f2fs_sb_info *sbi, struct f2fs_summary *sum,
 		goto out;

 	/* check its block address */
-	if (node_blk->footer.nid == node_blk->footer.ino) {
+	if (IS_INODE(node_blk)) {
 		int ofs = get_extra_isize(node_blk);

 		if (ofs + ofs_in_node >= DEF_ADDRS_PER_INODE)
@@ -447,8 +447,7 @@ static int sanity_check_nid(struct f2fs_sb_info *sbi, u32 nid,
 				nid, ni->ino, le32_to_cpu(node_blk->footer.ino));
 		return -EINVAL;
 	}
-	if (ntype != TYPE_INODE &&
-			node_blk->footer.nid == node_blk->footer.ino) {
+	if (ntype != TYPE_INODE && IS_INODE(node_blk)) {
 		ASSERT_MSG("nid[0x%x] footer.nid[0x%x] footer.ino[0x%x]",
 				nid, le32_to_cpu(node_blk->footer.nid),
 				le32_to_cpu(node_blk->footer.ino));
@@ -3081,7 +3080,7 @@ static int fsck_reconnect_file(struct f2fs_sb_info *sbi)
 			ASSERT(err >= 0);

 			/* reconnection will restore these nodes if needed */
-			if (node->footer.ino != node->footer.nid) {
+			if (!IS_INODE(node)) {
 				DBG(1, "Not support non-inode node [0x%x]\n",
 					nid);
 				continue;
diff --git a/fsck/mount.c b/fsck/mount.c
index 4c74888..70619c9 100644
--- a/fsck/mount.c
+++ b/fsck/mount.c
@@ -2420,7 +2420,7 @@ void update_data_blkaddr(struct f2fs_sb_info *sbi, nid_t nid,
 	ASSERT(ret >= 0);

 	/* check its block address */
-	if (node_blk->footer.nid == node_blk->footer.ino) {
+	if (IS_INODE(node_blk)) {
 		int ofs = get_extra_isize(node_blk);

 		oldaddr = le32_to_cpu(node_blk->i.i_addr[ofs + ofs_in_node]);
@@ -2435,7 +2435,7 @@ void update_data_blkaddr(struct f2fs_sb_info *sbi, nid_t nid,
 	}

 	/* check extent cache entry */
-	if (node_blk->footer.nid != node_blk->footer.ino) {
+	if (!IS_INODE(node_blk)) {
 		get_node_info(sbi, le32_to_cpu(node_blk->footer.ino), );

 		/* read inode block */
diff --git a/fsck/node.h b/fsck/node.h
index 99139b1..2ba7b8c 100644
--- a/fsck/node.h
+++ b/fsck/node.h
@@ -20,7 +20,8 @@

 static inline int IS_INODE(struct f2fs_node *node)
 {
-	return ((node)->footer.nid == (node)->footer.ino);
+	return le32_to_cpu(node->footer.ino) ==
+				le32_to_cpu(node->footer.nid);
 }

 static inline unsigned int ADDRS_PER_PAGE(struct f2fs_sb_info *sbi,
--
2.40.1
Re: [f2fs-dev] [PATCH v4] fsck.f2fs: Detect and fix looped node chain efficiently
On 2023/5/18 17:10, Chao Yu wrote: On 2023/5/18 12:11, Chunhai Guo wrote: find_fsync_inode() detect the looped node chain by comparing the loop counter with free blocks. While it may take tens of seconds to quit when the free blocks are large enough. We can use Floyd's cycle detection algorithm to make the detection more efficient, and fix the issue by filling a NULL address in the last node of the chain. Below is the log we encounter on a 256GB UFS storage and it takes about 25 seconds to detect looped node chain. After changing the algorithm, it takes about 20ms to finish the same job. [ 10.822904] fsck.f2fs: Info: version timestamp cur: 17, prev: 430 [ 10.822949] fsck.f2fs: [update_superblock: 762] Info: Done to update superblock [ 10.822953] fsck.f2fs: Info: superblock features = 1499 : encrypt verity extra_attr project_quota quota_ino casefold [ 10.822956] fsck.f2fs: Info: superblock encrypt level = 0, salt = [ 10.822960] fsck.f2fs: Info: total FS sectors = 59249811 (231444 MB) [ 35.852827] fsck.f2fs: detect looped node chain, blkaddr:1114802, next:1114803 [ 35.852842] fsck.f2fs: [f2fs_do_mount:3846] record_fsync_data failed [ 35.856106] fsck.f2fs: fsck.f2fs terminated by exit(255) Signed-off-by: Chunhai Guo --- fsck/mount.c | 128 +++ 1 file changed, 110 insertions(+), 18 deletions(-) diff --git a/fsck/mount.c b/fsck/mount.c index df0314d57caf..755b659f0c27 100644 --- a/fsck/mount.c +++ b/fsck/mount.c @@ -3394,22 +3394,91 @@ static void destroy_fsync_dnodes(struct list_head *head) del_fsync_inode(entry); } +static int find_node_blk_fast(struct f2fs_sb_info *sbi, block_t *blkaddr_fast, + struct f2fs_node *node_blk_fast, bool *is_detecting) +{ + int i, err; + + for (i = 0; i < 2; i++) { + if (!f2fs_is_valid_blkaddr(sbi, *blkaddr_fast, META_POR)) { + *is_detecting = false; + return 0; + } + + err = dev_read_block(node_blk_fast, *blkaddr_fast); + if (err) + return err; + + if (!is_recoverable_dnode(sbi, node_blk_fast)) { + *is_detecting = false; + return 0; + } + 
+ *blkaddr_fast = next_blkaddr_of_node(node_blk_fast); + } + + return 0; +} + +static int loop_node_chain_fix(struct f2fs_sb_info *sbi, + block_t blkaddr_fast, struct f2fs_node *node_blk_fast, + block_t blkaddr, struct f2fs_node *node_blk) +{ + block_t blkaddr_entry, blkaddr_tmp; + int err; + + /* find the entry point of the looped node chain */ + while (blkaddr_fast != blkaddr) { + err = dev_read_block(node_blk_fast, blkaddr_fast); + if (err) + return err; + blkaddr_fast = next_blkaddr_of_node(node_blk_fast); + + err = dev_read_block(node_blk, blkaddr); + if (err) + return err; + blkaddr = next_blkaddr_of_node(node_blk); + } + blkaddr_entry = blkaddr; + + /* find the last node of the chain */ + do { + blkaddr_tmp = blkaddr; + err = dev_read_block(node_blk, blkaddr); + if (err) + return err; + blkaddr = next_blkaddr_of_node(node_blk); + } while (blkaddr != blkaddr_entry); + + /* fix the blkaddr of last node with NULL_ADDR. */ + node_blk->footer.next_blkaddr = NULL_ADDR; + if (node_blk->footer.nid == node_blk->footer.ino) if (le32_to_cpu(node_blk->footer.nid) == le32_to_cpu(node_blk->footer.ino)) Oh, we can use IS_INODE() here? Thanks, Otherwise, it looks good to me. 
Thanks, + err = write_inode(node_blk, blkaddr_tmp); + else + err = dev_write_block(node_blk, blkaddr_tmp); + if (!err) + FIX_MSG("Fix looped node chain on blkaddr %u\n", + blkaddr_tmp); + return err; +} + static int find_fsync_inode(struct f2fs_sb_info *sbi, struct list_head *head) { struct curseg_info *curseg; - struct f2fs_node *node_blk; - block_t blkaddr; - unsigned int loop_cnt = 0; - unsigned int free_blocks = MAIN_SEGS(sbi) * sbi->blocks_per_seg - - sbi->total_valid_block_count; + struct f2fs_node *node_blk, *node_blk_fast; + block_t blkaddr, blkaddr_fast; + bool is_detecting = true; int err = 0; + node_blk = calloc(F2FS_BLKSIZE, 1); + node_blk_fast = calloc(F2FS_BLKSIZE, 1); + ASSERT(node_blk && node_blk_fast); + +retry: /* get node pages in the current segment */ curseg = CURSEG_I(sbi, CURSEG_WARM_NODE); blkaddr = NEXT_FREE_BLKADDR(sbi, curseg); - - node_blk = calloc(F2FS_BLKSIZE, 1); - ASSERT(node_blk); + blkaddr_fast = blkaddr; while (1) { struct fsync_inode_entry *entry; @@ -3440,19 +3509,42 @@ static int find_fsync_inode(struct f2fs_sb_info *sbi, struct list_head *head) if (IS_INODE(node_blk) && is_dent_dnode(node_blk)) entry->last_dentry = blkaddr;
Re: [f2fs-dev] [PATCH v4] fsck.f2fs: Detect and fix looped node chain efficiently
On 2023/5/18 12:11, Chunhai Guo wrote: find_fsync_inode() detect the looped node chain by comparing the loop counter with free blocks. While it may take tens of seconds to quit when the free blocks are large enough. We can use Floyd's cycle detection algorithm to make the detection more efficient, and fix the issue by filling a NULL address in the last node of the chain. Below is the log we encounter on a 256GB UFS storage and it takes about 25 seconds to detect looped node chain. After changing the algorithm, it takes about 20ms to finish the same job. [ 10.822904] fsck.f2fs: Info: version timestamp cur: 17, prev: 430 [ 10.822949] fsck.f2fs: [update_superblock: 762] Info: Done to update superblock [ 10.822953] fsck.f2fs: Info: superblock features = 1499 : encrypt verity extra_attr project_quota quota_ino casefold [ 10.822956] fsck.f2fs: Info: superblock encrypt level = 0, salt = [ 10.822960] fsck.f2fs: Info: total FS sectors = 59249811 (231444 MB) [ 35.852827] fsck.f2fs: detect looped node chain, blkaddr:1114802, next:1114803 [ 35.852842] fsck.f2fs: [f2fs_do_mount:3846] record_fsync_data failed [ 35.856106] fsck.f2fs: fsck.f2fs terminated by exit(255) Signed-off-by: Chunhai Guo --- fsck/mount.c | 128 +++ 1 file changed, 110 insertions(+), 18 deletions(-) diff --git a/fsck/mount.c b/fsck/mount.c index df0314d57caf..755b659f0c27 100644 --- a/fsck/mount.c +++ b/fsck/mount.c @@ -3394,22 +3394,91 @@ static void destroy_fsync_dnodes(struct list_head *head) del_fsync_inode(entry); } +static int find_node_blk_fast(struct f2fs_sb_info *sbi, block_t *blkaddr_fast, + struct f2fs_node *node_blk_fast, bool *is_detecting) +{ + int i, err; + + for (i = 0; i < 2; i++) { + if (!f2fs_is_valid_blkaddr(sbi, *blkaddr_fast, META_POR)) { + *is_detecting = false; + return 0; + } + + err = dev_read_block(node_blk_fast, *blkaddr_fast); + if (err) + return err; + + if (!is_recoverable_dnode(sbi, node_blk_fast)) { + *is_detecting = false; + return 0; + } + + *blkaddr_fast = 
next_blkaddr_of_node(node_blk_fast); + } + + return 0; +} + +static int loop_node_chain_fix(struct f2fs_sb_info *sbi, + block_t blkaddr_fast, struct f2fs_node *node_blk_fast, + block_t blkaddr, struct f2fs_node *node_blk) +{ + block_t blkaddr_entry, blkaddr_tmp; + int err; + + /* find the entry point of the looped node chain */ + while (blkaddr_fast != blkaddr) { + err = dev_read_block(node_blk_fast, blkaddr_fast); + if (err) + return err; + blkaddr_fast = next_blkaddr_of_node(node_blk_fast); + + err = dev_read_block(node_blk, blkaddr); + if (err) + return err; + blkaddr = next_blkaddr_of_node(node_blk); + } + blkaddr_entry = blkaddr; + + /* find the last node of the chain */ + do { + blkaddr_tmp = blkaddr; + err = dev_read_block(node_blk, blkaddr); + if (err) + return err; + blkaddr = next_blkaddr_of_node(node_blk); + } while (blkaddr != blkaddr_entry); + + /* fix the blkaddr of last node with NULL_ADDR. */ + node_blk->footer.next_blkaddr = NULL_ADDR; + if (node_blk->footer.nid == node_blk->footer.ino) if (le32_to_cpu(node_blk->footer.nid) == le32_to_cpu(node_blk->footer.ino)) Otherwise, it looks good to me. 
Thanks, + err = write_inode(node_blk, blkaddr_tmp); + else + err = dev_write_block(node_blk, blkaddr_tmp); + if (!err) + FIX_MSG("Fix looped node chain on blkaddr %u\n", + blkaddr_tmp); + return err; +} + static int find_fsync_inode(struct f2fs_sb_info *sbi, struct list_head *head) { struct curseg_info *curseg; - struct f2fs_node *node_blk; - block_t blkaddr; - unsigned int loop_cnt = 0; - unsigned int free_blocks = MAIN_SEGS(sbi) * sbi->blocks_per_seg - - sbi->total_valid_block_count; + struct f2fs_node *node_blk, *node_blk_fast; + block_t blkaddr, blkaddr_fast; + bool is_detecting = true; int err = 0; + node_blk = calloc(F2FS_BLKSIZE, 1); + node_blk_fast = calloc(F2FS_BLKSIZE, 1); + ASSERT(node_blk && node_blk_fast); + +retry: /* get node pages in the current segment */ curseg = CURSEG_I(sbi, CURSEG_WARM_NODE); blkaddr = NEXT_FREE_BLKADDR(sbi, curseg); - - node_blk = calloc(F2FS_BLKSIZE, 1); -
[f2fs-dev] [PATCH v2] f2fs: support background_gc=adjust mount option
As JuHyung reported in [1]:

"In most consumer-grade blackbox SSDs, device-side GCs are handled
automatically for various workloads. f2fs, however, leaves that
responsibility to the userspace with conservative tuning on the
kernel-side by default. Android handles this by init.rc tunings and a
separate code running in vold to trigger gc_urgent.

For regular Linux desktop distros, f2fs just runs on the default
configuration set on the kernel and unless it’s running 24/7 with
plentiful idle time, it quickly runs out of free segments and starts
triggering foreground GC. This is giving people the wrong impression
that f2fs slows down far drastically than other file-systems when
that’s quite the contrary (i.e., less fragmentation overtime)."

This patch supports background_gc=adjust mount option. If
background_gc=adjust, gc will adjust its policy depending on conditions:
speed up if there are no free segments, and slow down if there is no
free space.

The main logic is as below:

1. performance mode
   - condition: if free_segments is less than 10 * ovp_segments and
     reclaimable_block is more than 20 * unused_user_block
   - action:
     a) reduce sleep time of GC thread based on free user block ratio,
        that is to say, the more reclaimable blocks, the less time the
        thread will sleep
     b) disable IO aware
2. lifetime mode
   - condition: if free space is less than 90%
   - action:
     a) reset min_sleep_time to default 3 ms
     b) reduce cost weight of age when calculating cost of dirty
        segment, so that GC may select victim which contains less blocks
     c) disable IO aware
3. balance mode
   - condition: it is default mode
   - action:
     a) reduce min_sleep_time from 3 ms to 1 ms
     b) enable IO aware

[1] https://lore.kernel.org/linux-f2fs-devel/CAD14+f3z=kS9E+NTKH7t1J2xL1PpLOVMNx=CabD_t2K6U=t...@mail.gmail.com

Original patch was developed by Weichao Guo, I refactor it a bit and
rebase the code.
Signed-off-by: Weichao Guo
Signed-off-by: Chao Yu
---
v2:
- fix typo
- disable IO aware for perf/lifetime mode
- check bggc mode in get_max_age()
 Documentation/filesystems/f2fs.rst |  7 ++-
 fs/f2fs/f2fs.h                     |  4 ++
 fs/f2fs/gc.c                       | 94 +-
 fs/f2fs/gc.h                       | 23
 fs/f2fs/super.c                    |  4 ++
 5 files changed, 128 insertions(+), 4 deletions(-)

diff --git a/Documentation/filesystems/f2fs.rst b/Documentation/filesystems/f2fs.rst
index 9359978a5af2..764301f7391e 100644
--- a/Documentation/filesystems/f2fs.rst
+++ b/Documentation/filesystems/f2fs.rst
@@ -112,8 +112,11 @@ background_gc=%s	 Turn on/off cleaning operations, namely garbage collection
 			 and if background_gc=off, garbage collection will be turned off.
 			 If background_gc=sync, it will turn on synchronous garbage
 			 collection running in background.
-			 Default value for this option is on. So garbage
-			 collection is on by default.
+			 If background_gc=adjust, gc will adjust its policy depends
+			 on conditions: speed up if there no free segments, and slow
+			 down if there is no free space.
+			 Default value for this option is on. So garbage collection
+			 is on by default.
 gc_merge		 When background_gc is on, this option can be enabled to
 			 let background GC thread to handle foreground GC requests,
 			 it can eliminate the sluggish issue caused by slow foreground
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 8d4eaf4d2246..e82af8a09d11 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1333,6 +1333,10 @@ enum {
 				 * background gc is on, migrating blocks
 				 * like foreground gc
 				 */
+	BGGC_MODE_ADJUST,	/*
+				 * background gc is on, and tune its speed
+				 * depends on conditions
+				 */
 };

 enum {
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 51d7e8d29bf1..35b95b3d57ef 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -28,6 +28,67 @@ static struct kmem_cache *victim_entry_slab;
 static unsigned int count_bits(const unsigned long *addr,
 				unsigned int offset, unsigned int len);

+static inline int free_user_block_ratio(struct f2fs_sb_info *sbi)
+{
+	block_t unused_user_blocks = sbi->user_block_count -
+						written_block_count(sbi);
+	return unused_user_blocks == 0 ? 100 :
+			(100 * free_user_blocks(sbi) / unused_user_blocks);
+}
+
+static bool has_few_free_segments(struct f2fs_sb_info *sbi)
+{
+	unsigned int free_segs = free_segments(sbi);
+	unsigned int ovp_segs = overprovision_segments(sbi);
+
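
The two helpers visible in the hunk above can be tried out in isolation with a userspace sketch like the one below. It is not the kernel code: `ratio_sketch`, `few_free_segments_sketch` and `FEW_FREE_SEGMENT_MULTIPLE` are hypothetical names, the multiple of 10 is taken from the commit message and assumed to match the patch's macro, the `<=` comparison is likewise an assumption about the helper's body, and plain 64-bit integers replace the `sbi` accessors so that the multiplication by 100 cannot overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed from the commit message ("10 * ovp_segments"); hypothetical name. */
#define FEW_FREE_SEGMENT_MULTIPLE 10u

/*
 * Same shape as free_user_block_ratio() in the patch:
 * 100 * free_user_blocks / unused_user_blocks, or 100 when no user
 * blocks are unused (which also avoids dividing by zero).
 */
static unsigned int ratio_sketch(uint64_t user_blocks, uint64_t written_blocks,
				 uint64_t free_user_blocks)
{
	uint64_t unused;

	assert(written_blocks <= user_blocks);
	unused = user_blocks - written_blocks;
	if (unused == 0)
		return 100;
	return (unsigned int)(100 * free_user_blocks / unused);
}

/* True when free segments are at or below a multiple of overprovision. */
static bool few_free_segments_sketch(uint64_t free_segs, uint64_t ovp_segs)
{
	return free_segs <= FEW_FREE_SEGMENT_MULTIPLE * ovp_segs;
}
```

For example, with 1000 user blocks, 600 written and 100 free user blocks the ratio is 25, and 50 free segments against 5 overprovision segments counts as "few".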
Re: [f2fs-dev] [DISCUSSION] f2fs for desktop
On 2023/4/21 1:26, Juhyung Park wrote: Hi Chao, On Fri, Apr 21, 2023 at 1:19 AM Chao Yu wrote: Hi JuHyung, Sorry for delay reply. On 2023/4/11 1:03, Juhyung Park wrote: Hi Chao, On Tue, Apr 11, 2023 at 12:44 AM Chao Yu wrote: Hi Juhyung, On 2023/4/4 15:36, Juhyung Park wrote: Hi everyone, I want to start a discussion on using f2fs for regular desktops/workstations. There are growing number of interests in using f2fs as the general root file-system: 2018: https://www.phoronix.com/news/GRUB-Now-Supports-F2FS 2020: https://www.phoronix.com/news/Clear-Linux-F2FS-Root-Option 2023: https://code.launchpad.net/~nexusprism/curtin/+git/curtin/+merge/439880 2023: https://code.launchpad.net/~nexusprism/grub/+git/ubuntu/+merge/440193 I've been personally running f2fs on all of my x86 Linux boxes since 2015, and I have several concerns that I think we need to collectively address for regular non-Android normies to use f2fs: A. Bootloader and installer support B. Host-side GC C. Extended node bitmap I'll go through each one. === A. Bootloader and installer support === It seems that both GRUB and systemd-boot supports f2fs without the need for a separate ext4-formatted /boot partition. Some distros are seemingly disabling f2fs module for GRUB though for security reasons: https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1868664 It's ultimately up to the distro folks to enable this, and still in the worst-case scenario, they can specify a separate /boot partition and format it to ext4 upon installation. The installer itself to show f2fs and call mkfs.f2fs is being worked on currently on Ubuntu. See the 2023 links above. Nothing f2fs mainline developers should do here, imo. === B. Host-side GC === f2fs relieves most of the device-side GC but introduces a new host-side GC. This is extremely confusing for people who have no background in SSDs and flash storage to understand, let alone discard/trim/erase complications. 
In most consumer-grade blackbox SSDs, device-side GCs are handled automatically for various workloads. f2fs, however, leaves that responsibility to the userspace with conservative tuning on the We've proposed a f2fs feature named "space awared garbage collection" and shipped it in huawei/honor's devices, but forgot to try upstreaming it. :-P In this feature, we introduced three mode: - performance mode: something like write-gc in ftl, it can trigger background gc more frequently and tune its speed according to free segs and reclaimable blks ratio. - lifetime mode: slow down background gc to avoid high waf if there is less free space. - balance mode: behave as usual. I guess this may be helpful for Linux desktop distros since there is no such storage service trigger gc_urgent. That indeed sounds interesting. If you need me to test something out, feel free to ask. Thanks a lot for that. :) I'm trying to figure out a patch... Juhyung, Are you interesting to try this patch in distros? https://git.kernel.org/pub/scm/linux/kernel/git/chao/linux.git/commit/?h=dev-test=4736e55bc967e91cf8a275b678739b006c2617f0 There are some tunable parameters, I can export them via sysfs entry, let me update later. Thanks, ___ Linux-f2fs-devel mailing list Linux-f2fs-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
[f2fs-dev] [PATCH] f2fs: support background_gc=adjust mount option
As JuHyung reported in [1]:

"In most consumer-grade blackbox SSDs, device-side GCs are handled
automatically for various workloads. f2fs, however, leaves that
responsibility to the userspace with conservative tuning on the
kernel-side by default. Android handles this by init.rc tunings and a
separate code running in vold to trigger gc_urgent.

For regular Linux desktop distros, f2fs just runs on the default
configuration set on the kernel and unless it’s running 24/7 with
plentiful idle time, it quickly runs out of free segments and starts
triggering foreground GC. This is giving people the wrong impression
that f2fs slows down far drastically than other file-systems when
that’s quite the contrary (i.e., less fragmentation overtime)."

This patch supports background_gc=adjust mount option. If
background_gc=adjust, gc will adjust its policy depending on conditions:
speed up if there are no free segments, and slow down if there is no
free space.

The main logic is as below:

1. performance mode
   - condition: if free_segments is less than 10 * ovp_segments and
     reclaimable_block is more than 20 * unused_user_block
   - action: reduce sleep time of GC thread based on free user block
     ratio, that is to say, the more reclaimable blocks, the less time
     the thread will sleep
2. lifetime mode
   - condition: if free space is less than 90%
   - action:
     a) reset min_sleep_time to default 3 ms
     b) reduce cost weight of age when calculating cost of dirty
        segment, so that GC may select victim which contains less blocks
3. balance mode
   - condition: it is default mode
   - action: reduce min_sleep_time from 3 ms to 1 ms

[1] https://lore.kernel.org/linux-f2fs-devel/CAD14+f3z=kS9E+NTKH7t1J2xL1PpLOVMNx=CabD_t2K6U=t...@mail.gmail.com

Original patch was developed by Weichao Guo, I refactor it a bit and
rebase the code.
Signed-off-by: Weichao Guo
Signed-off-by: Chao Yu
---
 Documentation/filesystems/f2fs.rst |  7 ++-
 fs/f2fs/f2fs.h                     |  4 ++
 fs/f2fs/gc.c                       | 92 +-
 fs/f2fs/gc.h                       | 23
 fs/f2fs/super.c                    |  4 ++
 5 files changed, 126 insertions(+), 4 deletions(-)

diff --git a/Documentation/filesystems/f2fs.rst b/Documentation/filesystems/f2fs.rst
index 9359978a5af2..764301f7391e 100644
--- a/Documentation/filesystems/f2fs.rst
+++ b/Documentation/filesystems/f2fs.rst
@@ -112,8 +112,11 @@ background_gc=%s	 Turn on/off cleaning operations, namely garbage collection
 			 and if background_gc=off, garbage collection will be turned off.
 			 If background_gc=sync, it will turn on synchronous garbage
 			 collection running in background.
-			 Default value for this option is on. So garbage
-			 collection is on by default.
+			 If background_gc=adjust, gc will adjust its policy depends
+			 on conditions: speed up if there no free segments, and slow
+			 down if there is no free space.
+			 Default value for this option is on. So garbage collection
+			 is on by default.
 gc_merge		 When background_gc is on, this option can be enabled to
 			 let background GC thread to handle foreground GC requests,
 			 it can eliminate the sluggish issue caused by slow foreground
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 8d4eaf4d2246..4c2f65d3c208 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1333,6 +1333,10 @@ enum {
 				 * background gc is on, migrating blocks
 				 * like foreground gc
 				 */
+	BGGC_MODE_ADJUST,	/*
+				 * background gc is on, and tune its speed
+				 * dependso n conditions
+				 */
 };

 enum {
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 51d7e8d29bf1..43f935c2502a 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -28,6 +28,67 @@ static struct kmem_cache *victim_entry_slab;
 static unsigned int count_bits(const unsigned long *addr,
 				unsigned int offset, unsigned int len);

+static inline int free_user_block_ratio(struct f2fs_sb_info *sbi)
+{
+	block_t unused_user_blocks = sbi->user_block_count -
+						written_block_count(sbi);
+	return unused_user_blocks == 0 ? 100 :
+			(100 * free_user_blocks(sbi) / unused_user_blocks);
+}
+
+static bool has_few_free_segments(struct f2fs_sb_info *sbi)
+{
+	unsigned int free_segs = free_segments(sbi);
+	unsigned int ovp_segs = overprovision_segments(sbi);
+
+	return free_segs <= DEF_FEW_FREE_SEGMENT_MULTIPLE * ovp_segs;
+}
+
+static bool has_few_free_space(struct f2fs_sb_info *sbi)
+{
+	block_t total_user_block =