Re: [f2fs-dev] [DISCUSSION] f2fs for desktop

2023-05-18 Thread Juhyung Park
Hi Chao,

Thanks for the patch. I'll try it out on both my laptop and workstation soon.

One question, though: would it make sense to check whether it also works
well on Android (with userspace's explicit GC trigger disabled)? That
could be an indication of whether it works properly.

Thanks,

On Thu, May 18, 2023 at 4:53 PM Chao Yu  wrote:
>
> On 2023/4/21 1:26, Juhyung Park wrote:
> > Hi Chao,
> >
> > On Fri, Apr 21, 2023 at 1:19 AM Chao Yu  wrote:
> >>
> >> Hi JuHyung,
> >>
> >> Sorry for the delayed reply.
> >>
> >> On 2023/4/11 1:03, Juhyung Park wrote:
> >>> Hi Chao,
> >>>
> >>> On Tue, Apr 11, 2023 at 12:44 AM Chao Yu  wrote:
> >>>>
> >>>> Hi Juhyung,
> >>>>
> >>>> On 2023/4/4 15:36, Juhyung Park wrote:
> >>>>> Hi everyone,
> >>>>>
> >>>>> I want to start a discussion on using f2fs for regular desktops/workstations.
> >>>>>
> >>>>> There is a growing interest in using f2fs as the general
> >>>>> root file-system:
> >>>>> 2018: https://www.phoronix.com/news/GRUB-Now-Supports-F2FS
> >>>>> 2020: https://www.phoronix.com/news/Clear-Linux-F2FS-Root-Option
> >>>>> 2023: https://code.launchpad.net/~nexusprism/curtin/+git/curtin/+merge/439880
> >>>>> 2023: https://code.launchpad.net/~nexusprism/grub/+git/ubuntu/+merge/440193
> >>>>>
> >>>>> I've been personally running f2fs on all of my x86 Linux boxes since
> >>>>> 2015, and I have several concerns that I think we need to collectively
> >>>>> address for regular non-Android normies to use f2fs:
> >>>>>
> >>>>> A. Bootloader and installer support
> >>>>> B. Host-side GC
> >>>>> C. Extended node bitmap
> >>>>>
> >>>>> I'll go through each one.
> >>>>>
> >>>>> === A. Bootloader and installer support ===
> >>>>>
> >>>>> It seems that both GRUB and systemd-boot support f2fs without the
> >>>>> need for a separate ext4-formatted /boot partition.
> >>>>> Some distros seem to be disabling the f2fs module for GRUB, though, for
> >>>>> security reasons:
> >>>>> https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1868664
> >>>>>
> >>>>> It's ultimately up to the distro folks to enable this, and even in
> >>>>> the worst-case scenario, they can specify a separate /boot partition
> >>>>> and format it to ext4 upon installation.
> >>>>>
> >>>>> Installer support for offering f2fs and calling mkfs.f2fs is currently
> >>>>> being worked on for Ubuntu. See the 2023 links above.
> >>>>>
> >>>>> Nothing f2fs mainline developers should do here, imo.
> >>>>>
> >>>>> === B. Host-side GC ===
> >>>>>
> >>>>> f2fs relieves most of the device-side GC but introduces a new
> >>>>> host-side GC. This is extremely confusing for people who have no
> >>>>> background in SSDs and flash storage, let alone the
> >>>>> discard/trim/erase complications.
> >>>>>
> >>>>> In most consumer-grade blackbox SSDs, device-side GCs are handled
> >>>>> automatically for various workloads. f2fs, however, leaves that
> >>>>> responsibility to the userspace with conservative tuning on the
> >>>>
> >>>> We've proposed an f2fs feature named "space-aware garbage collection"
> >>>> and shipped it in Huawei/Honor devices, but forgot to try upstreaming
> >>>> it. :-P
> >>>>
> >>>> In this feature, we introduced three modes:
> >>>> - performance mode: something like write-GC in the FTL; it can trigger
> >>>> background GC more frequently and tune its speed according to the free
> >>>> segment and reclaimable block ratios.
> >>>> - lifetime mode: slow down background GC to avoid high WAF when there
> >>>> is less free space.
> >>>> - balance mode: behave as usual.
> >>>>
> >>>> I guess this may be helpful for Linux desktop distros, since they have
> >>>> no storage service to trigger gc_urgent.
> >>>>
> >>>
> >>> That indeed sounds interesting.
> >>>
> >>> If you need me to test something out, feel free to ask.
> >>
> >> Thanks a lot for that. :)
> >>
> >> I'm trying to figure out a patch...
>
> Juhyung,
>
> Are you interested in trying this patch on distros?
>
> https://git.kernel.org/pub/scm/linux/kernel/git/chao/linux.git/commit/?h=dev-test=4736e55bc967e91cf8a275b678739b006c2617f0
>
> There are some tunable parameters; I can export them via sysfs entries
> and will update later.
>
> Thanks,


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH 1/1] f2fs: pass I_NEW flag to trace event

2023-05-18 Thread Wu Bo

On 2023/5/18 08:32, Jaegeuk Kim wrote:

On 05/17, Wu Bo wrote:

On 2023/5/17 16:36, Chao Yu wrote:

On 2023/5/17 11:59, Wu Bo wrote:

On 2023/5/17 10:44, Chao Yu wrote:

On 2023/5/16 20:07, Wu Bo wrote:

Swap the order of 'trace_f2fs_iget' and 'unlock_new_inode', so that
I_NEW is still set when the trace event fires for a newly initialised
inode.

Why is it needed? And trace_f2fs_iget() won't print inode->i_state?

When a trace probe is connected to f2fs_iget, it can then determine
whether the inode was newly initialised and process it differently.

I didn't get it, you want to hook __tracepoint_f2fs_iget() w/ your own
callback?

Yes, to use 'tracepoint_probe_register' to register a probe at
trace_f2fs_iget.

Why?


Sorry, I don't understand what your real question is.

In my understanding, a trace event is also a non-volatile point in the
kernel for probing.

In my case, I want to develop a tool that uses a trace probe to collect
some information.

Thanks

Thanks,


Thanks,


Signed-off-by: Wu Bo 
---
 fs/f2fs/inode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
index cf4327ad106c..caf959289fe7 100644
--- a/fs/f2fs/inode.c
+++ b/fs/f2fs/inode.c
@@ -577,8 +577,8 @@ struct inode *f2fs_iget(struct super_block *sb, unsigned long ino)
 		file_dont_truncate(inode);
 	}
 
-	unlock_new_inode(inode);
 	trace_f2fs_iget(inode);
+	unlock_new_inode(inode);
 	return inode;
 
 bad_inode:
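The effect of the reordering in the hunk above can be sketched with a small userspace toy model. Everything in it (the bit value of TOY_I_NEW, struct toy_inode, and the helpers standing in for trace_f2fs_iget() and unlock_new_inode()) is hypothetical and is not the kernel's API; it only illustrates that a probe attached at the trace point can observe I_NEW when the trace call runs before the unlock, and cannot when it runs after.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model only: the bit value, the struct and both helpers are
 * hypothetical stand-ins, not the kernel's struct inode or its API.
 */
#define TOY_I_NEW (1UL << 3)

struct toy_inode {
	unsigned long i_state;
};

/* Stand-in for a probe on trace_f2fs_iget(): reports whether I_NEW is set. */
static bool toy_trace_iget(const struct toy_inode *inode)
{
	return (inode->i_state & TOY_I_NEW) != 0;
}

/* Stand-in for unlock_new_inode(): clears I_NEW on the inode. */
static void toy_unlock_new_inode(struct toy_inode *inode)
{
	assert(inode->i_state & TOY_I_NEW);
	inode->i_state &= ~TOY_I_NEW;
}

/* Old order: unlock first, then trace. Returns what the probe saw. */
static bool toy_iget_old_order(void)
{
	struct toy_inode inode = { .i_state = TOY_I_NEW };

	toy_unlock_new_inode(&inode);
	return toy_trace_iget(&inode);
}

/* Patched order: trace first, then unlock. Returns what the probe saw. */
static bool toy_iget_new_order(void)
{
	struct toy_inode inode = { .i_state = TOY_I_NEW };
	bool saw_new = toy_trace_iget(&inode);

	toy_unlock_new_inode(&inode);
	return saw_new;
}
```

In this model toy_iget_old_order() reports false and toy_iget_new_order() reports true, which is the distinction the probe in this thread wants to make.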




Re: [f2fs-dev] [PATCH v5] fsck.f2fs: Detect and fix looped node chain efficiently

2023-05-18 Thread Chao Yu

On 2023/5/18 20:26, Chunhai Guo wrote:

find_fsync_inode() detects a looped node chain by comparing the loop
counter with the number of free blocks, which may take tens of seconds
when there are many free blocks. Use Floyd's cycle detection algorithm
to make the detection more efficient, and fix the issue by setting the
next block address of the last node in the chain to NULL_ADDR.

Below is the log we encountered on a 256GB UFS storage device: it took
about 25 seconds to detect the looped node chain. After changing the
algorithm, the same job takes about 20ms.

 [   10.822904] fsck.f2fs: Info: version timestamp cur: 17, prev: 430
 [   10.822949] fsck.f2fs: [update_superblock: 762] Info: Done to update superblock
 [   10.822953] fsck.f2fs: Info: superblock features = 1499 : encrypt verity extra_attr project_quota quota_ino casefold
 [   10.822956] fsck.f2fs: Info: superblock encrypt level = 0, salt =
 [   10.822960] fsck.f2fs: Info: total FS sectors = 59249811 (231444 MB)
 [   35.852827] fsck.f2fs:  detect looped node chain, blkaddr:1114802, next:1114803
 [   35.852842] fsck.f2fs: [f2fs_do_mount:3846] record_fsync_data failed
 [   35.856106] fsck.f2fs: fsck.f2fs terminated by exit(255)

Signed-off-by: Chunhai Guo 


Reviewed-by: Chao Yu 

Thanks,




[f2fs-dev] [PATCH v5] fsck.f2fs: Detect and fix looped node chain efficiently

2023-05-18 Thread Chunhai Guo via Linux-f2fs-devel
find_fsync_inode() detects a looped node chain by comparing the loop
counter with the number of free blocks, which may take tens of seconds
when there are many free blocks. Use Floyd's cycle detection algorithm
to make the detection more efficient, and fix the issue by setting the
next block address of the last node in the chain to NULL_ADDR.
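The detection and fix described above can be sketched in userspace C on an in-memory chain. This is an illustrative model under stated assumptions, not the patch itself: the next[] array stands in for reading each node block with dev_read_block() and following next_blkaddr_of_node(), TOY_NULL_ADDR stands in for NULL_ADDR, and every block number is assumed to be a valid index (the real code instead stops when f2fs_is_valid_blkaddr() or is_recoverable_dnode() fails). As in find_node_blk_fast(), the fast pointer takes two steps per iteration. After the pointers meet, the standard second phase of Floyd's algorithm restarts one pointer from the head to find the loop entry, and the node pointing back at the entry is the one whose next pointer gets cleared, roughly corresponding to the two loops in loop_node_chain_fix().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Toy model, not f2fs-tools code: next[b] holds the "next_blkaddr" of the
 * node stored at block b, and TOY_NULL_ADDR (0) ends the chain. Block
 * numbers are assumed to be valid indices into next[].
 */
#define TOY_NULL_ADDR 0u

typedef uint32_t toy_block_t;

/*
 * Floyd's cycle detection: slow moves one node per step, fast moves two.
 * If fast reaches TOY_NULL_ADDR the chain is acyclic. Otherwise the loop's
 * last node is patched to TOY_NULL_ADDR. Returns the patched block, or
 * TOY_NULL_ADDR if no loop was found.
 */
static toy_block_t toy_fix_looped_chain(toy_block_t *next, toy_block_t head)
{
	toy_block_t slow = head, fast = head, entry, last;

	assert(next != NULL);
	if (head == TOY_NULL_ADDR)
		return TOY_NULL_ADDR;

	/* Phase 1: detect whether the two pointers ever meet. */
	for (;;) {
		if (next[fast] == TOY_NULL_ADDR ||
		    next[next[fast]] == TOY_NULL_ADDR)
			return TOY_NULL_ADDR;	/* reached the end: no loop */
		slow = next[slow];
		fast = next[next[fast]];
		if (slow == fast)
			break;
	}

	/* Phase 2: restart slow from the head; they meet at the loop entry. */
	for (slow = head; slow != fast; ) {
		slow = next[slow];
		fast = next[fast];
	}
	entry = slow;

	/* Phase 3: walk the loop to the node whose next is the entry. */
	last = entry;
	while (next[last] != entry)
		last = next[last];

	next[last] = TOY_NULL_ADDR;
	return last;
}
```

For a chain 1 -> 2 -> 3 -> 4 -> 5 -> 3, the loop entry is block 3 and block 5 is patched. The walk is linear in the chain length with constant memory, so its cost no longer depends on the number of free blocks.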

Below is the log we encountered on a 256GB UFS storage device: it took
about 25 seconds to detect the looped node chain. After changing the
algorithm, the same job takes about 20ms.

[   10.822904] fsck.f2fs: Info: version timestamp cur: 17, prev: 430
[   10.822949] fsck.f2fs: [update_superblock: 762] Info: Done to update superblock
[   10.822953] fsck.f2fs: Info: superblock features = 1499 : encrypt verity extra_attr project_quota quota_ino casefold
[   10.822956] fsck.f2fs: Info: superblock encrypt level = 0, salt =
[   10.822960] fsck.f2fs: Info: total FS sectors = 59249811 (231444 MB)
[   35.852827] fsck.f2fs:   detect looped node chain, blkaddr:1114802, next:1114803
[   35.852842] fsck.f2fs: [f2fs_do_mount:3846] record_fsync_data failed
[   35.856106] fsck.f2fs: fsck.f2fs terminated by exit(255)

Signed-off-by: Chunhai Guo 
---
v4 -> v5 : Use IS_INODE() to make the code more clear.
v3 -> v4 : Set c.bug_on with ASSERT_MSG() when issue is detected and fix
it only if c.fix_on is 1.
v2 -> v3 : Write inode with write_inode() to avoid chksum being broken.
v1 -> v2 : Fix looped node chain directly after it is detected.
---
 fsck/mount.c | 128 +++
 1 file changed, 110 insertions(+), 18 deletions(-)

diff --git a/fsck/mount.c b/fsck/mount.c
index df0314d57caf..c98b7ba00b21 100644
--- a/fsck/mount.c
+++ b/fsck/mount.c
@@ -3394,22 +3394,91 @@ static void destroy_fsync_dnodes(struct list_head *head)
del_fsync_inode(entry);
 }
 
+static int find_node_blk_fast(struct f2fs_sb_info *sbi, block_t *blkaddr_fast,
+   struct f2fs_node *node_blk_fast, bool *is_detecting)
+{
+   int i, err;
+
+   for (i = 0; i < 2; i++) {
+   if (!f2fs_is_valid_blkaddr(sbi, *blkaddr_fast, META_POR)) {
+   *is_detecting = false;
+   return 0;
+   }
+
+   err = dev_read_block(node_blk_fast, *blkaddr_fast);
+   if (err)
+   return err;
+
+   if (!is_recoverable_dnode(sbi, node_blk_fast)) {
+   *is_detecting = false;
+   return 0;
+   }
+
+   *blkaddr_fast = next_blkaddr_of_node(node_blk_fast);
+   }
+
+   return 0;
+}
+
+static int loop_node_chain_fix(struct f2fs_sb_info *sbi,
+   block_t blkaddr_fast, struct f2fs_node *node_blk_fast,
+   block_t blkaddr, struct f2fs_node *node_blk)
+{
+   block_t blkaddr_entry, blkaddr_tmp;
+   int err;
+
+   /* find the entry point of the looped node chain */
+   while (blkaddr_fast != blkaddr) {
+   err = dev_read_block(node_blk_fast, blkaddr_fast);
+   if (err)
+   return err;
+   blkaddr_fast = next_blkaddr_of_node(node_blk_fast);
+
+   err = dev_read_block(node_blk, blkaddr);
+   if (err)
+   return err;
+   blkaddr = next_blkaddr_of_node(node_blk);
+   }
+   blkaddr_entry = blkaddr;
+
+   /* find the last node of the chain */
+   do {
+   blkaddr_tmp = blkaddr;
+   err = dev_read_block(node_blk, blkaddr);
+   if (err)
+   return err;
+   blkaddr = next_blkaddr_of_node(node_blk);
+   } while (blkaddr != blkaddr_entry);
+
+   /* fix the blkaddr of last node with NULL_ADDR. */
+   node_blk->footer.next_blkaddr = NULL_ADDR;
+   if (IS_INODE(node_blk))
+   err = write_inode(node_blk, blkaddr_tmp);
+   else
+   err = dev_write_block(node_blk, blkaddr_tmp);
+   if (!err)
+   FIX_MSG("Fix looped node chain on blkaddr %u\n",
+   blkaddr_tmp);
+   return err;
+}
+
 static int find_fsync_inode(struct f2fs_sb_info *sbi, struct list_head *head)
 {
struct curseg_info *curseg;
-   struct f2fs_node *node_blk;
-   block_t blkaddr;
-   unsigned int loop_cnt = 0;
-   unsigned int free_blocks = MAIN_SEGS(sbi) * sbi->blocks_per_seg -
-   sbi->total_valid_block_count;
+   struct f2fs_node *node_blk, *node_blk_fast;
+   block_t blkaddr, blkaddr_fast;
+   bool is_detecting = true;
int err = 0;
 
+   node_blk = calloc(F2FS_BLKSIZE, 1);
+   node_blk_fast = calloc(F2FS_BLKSIZE, 1);
+   ASSERT(node_blk && node_blk_fast);
+
+retry:
/* get node pages in the current segment */
curseg = CURSEG_I(sbi, CURSEG_WARM_NODE);
   

[f2fs-dev] [PATCH] f2fs: fix to use le32_to_cpu() in RAW_IS_INODE()

2023-05-18 Thread Chao Yu
A __le32 variable should be converted with le32_to_cpu() before it is accessed.

Signed-off-by: Chao Yu 
---
 fs/f2fs/f2fs.h | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 7f6c51a6b930..a4bff3b5b887 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -2840,7 +2840,11 @@ static inline void f2fs_radix_tree_insert(struct radix_tree_root *root,
cond_resched();
 }
 
-#define RAW_IS_INODE(p)((p)->footer.nid == (p)->footer.ino)
+static inline bool RAW_IS_INODE(struct f2fs_node *node)
+{
+   return le32_to_cpu(node->footer.ino) ==
+   le32_to_cpu(node->footer.nid);
+}
 
 static inline bool IS_INODE(struct page *page)
 {
-- 
2.40.1





[f2fs-dev] [PATCH] fsck.f2fs: fix to use le32_to_cpu() in IS_INODE()

2023-05-18 Thread Chao Yu
And use IS_INODE() to clean up the code.

Signed-off-by: Chao Yu 
---
 fsck/fsck.c  | 7 +++
 fsck/mount.c | 4 ++--
 fsck/node.h  | 3 ++-
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/fsck/fsck.c b/fsck/fsck.c
index d03f1da..ac4cd98 100644
--- a/fsck/fsck.c
+++ b/fsck/fsck.c
@@ -247,7 +247,7 @@ static int is_valid_summary(struct f2fs_sb_info *sbi, struct f2fs_summary *sum,
goto out;
 
/* check its block address */
-   if (node_blk->footer.nid == node_blk->footer.ino) {
+   if (IS_INODE(node_blk)) {
int ofs = get_extra_isize(node_blk);
 
if (ofs + ofs_in_node >= DEF_ADDRS_PER_INODE)
@@ -447,8 +447,7 @@ static int sanity_check_nid(struct f2fs_sb_info *sbi, u32 nid,
nid, ni->ino, le32_to_cpu(node_blk->footer.ino));
return -EINVAL;
}
-   if (ntype != TYPE_INODE &&
-   node_blk->footer.nid == node_blk->footer.ino) {
+   if (ntype != TYPE_INODE && IS_INODE(node_blk)) {
ASSERT_MSG("nid[0x%x] footer.nid[0x%x] footer.ino[0x%x]",
nid, le32_to_cpu(node_blk->footer.nid),
le32_to_cpu(node_blk->footer.ino));
@@ -3081,7 +3080,7 @@ static int fsck_reconnect_file(struct f2fs_sb_info *sbi)
ASSERT(err >= 0);
 
/* reconnection will restore these nodes if needed */
-   if (node->footer.ino != node->footer.nid) {
+   if (!IS_INODE(node)) {
DBG(1, "Not support non-inode node [0x%x]\n",
nid);
continue;
diff --git a/fsck/mount.c b/fsck/mount.c
index 4c74888..70619c9 100644
--- a/fsck/mount.c
+++ b/fsck/mount.c
@@ -2420,7 +2420,7 @@ void update_data_blkaddr(struct f2fs_sb_info *sbi, nid_t nid,
ASSERT(ret >= 0);
 
/* check its block address */
-   if (node_blk->footer.nid == node_blk->footer.ino) {
+   if (IS_INODE(node_blk)) {
int ofs = get_extra_isize(node_blk);
 
oldaddr = le32_to_cpu(node_blk->i.i_addr[ofs + ofs_in_node]);
@@ -2435,7 +2435,7 @@ void update_data_blkaddr(struct f2fs_sb_info *sbi, nid_t nid,
}
 
/* check extent cache entry */
-   if (node_blk->footer.nid != node_blk->footer.ino) {
+   if (!IS_INODE(node_blk)) {
get_node_info(sbi, le32_to_cpu(node_blk->footer.ino), );
 
/* read inode block */
diff --git a/fsck/node.h b/fsck/node.h
index 99139b1..2ba7b8c 100644
--- a/fsck/node.h
+++ b/fsck/node.h
@@ -20,7 +20,8 @@
 
 static inline int IS_INODE(struct f2fs_node *node)
 {
-   return ((node)->footer.nid == (node)->footer.ino);
+   return le32_to_cpu(node->footer.ino) ==
+   le32_to_cpu(node->footer.nid);
 }
 
 static inline unsigned int ADDRS_PER_PAGE(struct f2fs_sb_info *sbi,
-- 
2.40.1





Re: [f2fs-dev] [PATCH v4] fsck.f2fs: Detect and fix looped node chain efficiently

2023-05-18 Thread Chao Yu

On 2023/5/18 17:10, Chao Yu wrote:

On 2023/5/18 12:11, Chunhai Guo wrote:

find_fsync_inode() detects a looped node chain by comparing the loop
counter with the number of free blocks, which may take tens of seconds
when there are many free blocks. Use Floyd's cycle detection algorithm
to make the detection more efficient, and fix the issue by setting the
next block address of the last node in the chain to NULL_ADDR.

Below is the log we encountered on a 256GB UFS storage device: it took
about 25 seconds to detect the looped node chain. After changing the
algorithm, the same job takes about 20ms.

 [   10.822904] fsck.f2fs: Info: version timestamp cur: 17, prev: 430
 [   10.822949] fsck.f2fs: [update_superblock: 762] Info: Done to update superblock
 [   10.822953] fsck.f2fs: Info: superblock features = 1499 : encrypt verity extra_attr project_quota quota_ino casefold
 [   10.822956] fsck.f2fs: Info: superblock encrypt level = 0, salt =
 [   10.822960] fsck.f2fs: Info: total FS sectors = 59249811 (231444 MB)
 [   35.852827] fsck.f2fs:    detect looped node chain, blkaddr:1114802, next:1114803
 [   35.852842] fsck.f2fs: [f2fs_do_mount:3846] record_fsync_data failed
 [   35.856106] fsck.f2fs: fsck.f2fs terminated by exit(255)

Signed-off-by: Chunhai Guo 
---
  fsck/mount.c | 128 +++
  1 file changed, 110 insertions(+), 18 deletions(-)

diff --git a/fsck/mount.c b/fsck/mount.c
index df0314d57caf..755b659f0c27 100644
--- a/fsck/mount.c
+++ b/fsck/mount.c
@@ -3394,22 +3394,91 @@ static void destroy_fsync_dnodes(struct list_head *head)
  del_fsync_inode(entry);
  }
+static int find_node_blk_fast(struct f2fs_sb_info *sbi, block_t *blkaddr_fast,
+    struct f2fs_node *node_blk_fast, bool *is_detecting)
+{
+    int i, err;
+
+    for (i = 0; i < 2; i++) {
+    if (!f2fs_is_valid_blkaddr(sbi, *blkaddr_fast, META_POR)) {
+    *is_detecting = false;
+    return 0;
+    }
+
+    err = dev_read_block(node_blk_fast, *blkaddr_fast);
+    if (err)
+    return err;
+
+    if (!is_recoverable_dnode(sbi, node_blk_fast)) {
+    *is_detecting = false;
+    return 0;
+    }
+
+    *blkaddr_fast = next_blkaddr_of_node(node_blk_fast);
+    }
+
+    return 0;
+}
+
+static int loop_node_chain_fix(struct f2fs_sb_info *sbi,
+    block_t blkaddr_fast, struct f2fs_node *node_blk_fast,
+    block_t blkaddr, struct f2fs_node *node_blk)
+{
+    block_t blkaddr_entry, blkaddr_tmp;
+    int err;
+
+    /* find the entry point of the looped node chain */
+    while (blkaddr_fast != blkaddr) {
+    err = dev_read_block(node_blk_fast, blkaddr_fast);
+    if (err)
+    return err;
+    blkaddr_fast = next_blkaddr_of_node(node_blk_fast);
+
+    err = dev_read_block(node_blk, blkaddr);
+    if (err)
+    return err;
+    blkaddr = next_blkaddr_of_node(node_blk);
+    }
+    blkaddr_entry = blkaddr;
+
+    /* find the last node of the chain */
+    do {
+    blkaddr_tmp = blkaddr;
+    err = dev_read_block(node_blk, blkaddr);
+    if (err)
+    return err;
+    blkaddr = next_blkaddr_of_node(node_blk);
+    } while (blkaddr != blkaddr_entry);
+
+    /* fix the blkaddr of last node with NULL_ADDR. */
+    node_blk->footer.next_blkaddr = NULL_ADDR;
+    if (node_blk->footer.nid == node_blk->footer.ino)


if (le32_to_cpu(node_blk->footer.nid) == le32_to_cpu(node_blk->footer.ino))


Oh, we can use IS_INODE() here?

Thanks,



Otherwise, it looks good to me.

Thanks,


+    err = write_inode(node_blk, blkaddr_tmp);
+    else
+    err = dev_write_block(node_blk, blkaddr_tmp);
+    if (!err)
+    FIX_MSG("Fix looped node chain on blkaddr %u\n",
+    blkaddr_tmp);
+    return err;
+}
+
  static int find_fsync_inode(struct f2fs_sb_info *sbi, struct list_head *head)
  {
  struct curseg_info *curseg;
-    struct f2fs_node *node_blk;
-    block_t blkaddr;
-    unsigned int loop_cnt = 0;
-    unsigned int free_blocks = MAIN_SEGS(sbi) * sbi->blocks_per_seg -
-    sbi->total_valid_block_count;
+    struct f2fs_node *node_blk, *node_blk_fast;
+    block_t blkaddr, blkaddr_fast;
+    bool is_detecting = true;
  int err = 0;
+    node_blk = calloc(F2FS_BLKSIZE, 1);
+    node_blk_fast = calloc(F2FS_BLKSIZE, 1);
+    ASSERT(node_blk && node_blk_fast);
+
+retry:
  /* get node pages in the current segment */
  curseg = CURSEG_I(sbi, CURSEG_WARM_NODE);
  blkaddr = NEXT_FREE_BLKADDR(sbi, curseg);
-
-    node_blk = calloc(F2FS_BLKSIZE, 1);
-    ASSERT(node_blk);
+    blkaddr_fast = blkaddr;
  while (1) {
  struct fsync_inode_entry *entry;
@@ -3440,19 +3509,42 @@ static int find_fsync_inode(struct f2fs_sb_info *sbi, struct list_head *head)
  if (IS_INODE(node_blk) && is_dent_dnode(node_blk))
  entry->last_dentry = blkaddr;

Re: [f2fs-dev] [PATCH v4] fsck.f2fs: Detect and fix looped node chain efficiently

2023-05-18 Thread Chao Yu

On 2023/5/18 12:11, Chunhai Guo wrote:

find_fsync_inode() detects a looped node chain by comparing the loop
counter with the number of free blocks, which may take tens of seconds
when there are many free blocks. Use Floyd's cycle detection algorithm
to make the detection more efficient, and fix the issue by setting the
next block address of the last node in the chain to NULL_ADDR.

Below is the log we encountered on a 256GB UFS storage device: it took
about 25 seconds to detect the looped node chain. After changing the
algorithm, the same job takes about 20ms.

 [   10.822904] fsck.f2fs: Info: version timestamp cur: 17, prev: 430
 [   10.822949] fsck.f2fs: [update_superblock: 762] Info: Done to update superblock
 [   10.822953] fsck.f2fs: Info: superblock features = 1499 : encrypt verity extra_attr project_quota quota_ino casefold
 [   10.822956] fsck.f2fs: Info: superblock encrypt level = 0, salt =
 [   10.822960] fsck.f2fs: Info: total FS sectors = 59249811 (231444 MB)
 [   35.852827] fsck.f2fs:  detect looped node chain, blkaddr:1114802, next:1114803
 [   35.852842] fsck.f2fs: [f2fs_do_mount:3846] record_fsync_data failed
 [   35.856106] fsck.f2fs: fsck.f2fs terminated by exit(255)

Signed-off-by: Chunhai Guo 
---
  fsck/mount.c | 128 +++
  1 file changed, 110 insertions(+), 18 deletions(-)

diff --git a/fsck/mount.c b/fsck/mount.c
index df0314d57caf..755b659f0c27 100644
--- a/fsck/mount.c
+++ b/fsck/mount.c
@@ -3394,22 +3394,91 @@ static void destroy_fsync_dnodes(struct list_head *head)
del_fsync_inode(entry);
  }
  
+static int find_node_blk_fast(struct f2fs_sb_info *sbi, block_t *blkaddr_fast,
+   struct f2fs_node *node_blk_fast, bool *is_detecting)
+{
+   int i, err;
+
+   for (i = 0; i < 2; i++) {
+   if (!f2fs_is_valid_blkaddr(sbi, *blkaddr_fast, META_POR)) {
+   *is_detecting = false;
+   return 0;
+   }
+
+   err = dev_read_block(node_blk_fast, *blkaddr_fast);
+   if (err)
+   return err;
+
+   if (!is_recoverable_dnode(sbi, node_blk_fast)) {
+   *is_detecting = false;
+   return 0;
+   }
+
+   *blkaddr_fast = next_blkaddr_of_node(node_blk_fast);
+   }
+
+   return 0;
+}
+
+static int loop_node_chain_fix(struct f2fs_sb_info *sbi,
+   block_t blkaddr_fast, struct f2fs_node *node_blk_fast,
+   block_t blkaddr, struct f2fs_node *node_blk)
+{
+   block_t blkaddr_entry, blkaddr_tmp;
+   int err;
+
+   /* find the entry point of the looped node chain */
+   while (blkaddr_fast != blkaddr) {
+   err = dev_read_block(node_blk_fast, blkaddr_fast);
+   if (err)
+   return err;
+   blkaddr_fast = next_blkaddr_of_node(node_blk_fast);
+
+   err = dev_read_block(node_blk, blkaddr);
+   if (err)
+   return err;
+   blkaddr = next_blkaddr_of_node(node_blk);
+   }
+   blkaddr_entry = blkaddr;
+
+   /* find the last node of the chain */
+   do {
+   blkaddr_tmp = blkaddr;
+   err = dev_read_block(node_blk, blkaddr);
+   if (err)
+   return err;
+   blkaddr = next_blkaddr_of_node(node_blk);
+   } while (blkaddr != blkaddr_entry);
+
+   /* fix the blkaddr of last node with NULL_ADDR. */
+   node_blk->footer.next_blkaddr = NULL_ADDR;
+   if (node_blk->footer.nid == node_blk->footer.ino)


if (le32_to_cpu(node_blk->footer.nid) == le32_to_cpu(node_blk->footer.ino))

Otherwise, it looks good to me.

Thanks,


+   err = write_inode(node_blk, blkaddr_tmp);
+   else
+   err = dev_write_block(node_blk, blkaddr_tmp);
+   if (!err)
+   FIX_MSG("Fix looped node chain on blkaddr %u\n",
+   blkaddr_tmp);
+   return err;
+}
+
  static int find_fsync_inode(struct f2fs_sb_info *sbi, struct list_head *head)
  {
struct curseg_info *curseg;
-   struct f2fs_node *node_blk;
-   block_t blkaddr;
-   unsigned int loop_cnt = 0;
-   unsigned int free_blocks = MAIN_SEGS(sbi) * sbi->blocks_per_seg -
-   sbi->total_valid_block_count;
+   struct f2fs_node *node_blk, *node_blk_fast;
+   block_t blkaddr, blkaddr_fast;
+   bool is_detecting = true;
int err = 0;
  
+   node_blk = calloc(F2FS_BLKSIZE, 1);
+   node_blk_fast = calloc(F2FS_BLKSIZE, 1);
+   ASSERT(node_blk && node_blk_fast);
+
+retry:
/* get node pages in the current segment */
curseg = CURSEG_I(sbi, CURSEG_WARM_NODE);
blkaddr = NEXT_FREE_BLKADDR(sbi, curseg);
-
-   node_blk = calloc(F2FS_BLKSIZE, 1);
-   

[f2fs-dev] [PATCH v2] f2fs: support background_gc=adjust mount option

2023-05-18 Thread Chao Yu
As JuHyung reported in [1]:

"In most consumer-grade blackbox SSDs, device-side GCs are handled
automatically for various workloads. f2fs, however, leaves that
responsibility to the userspace with conservative tuning on the
kernel-side by default. Android handles this by init.rc tunings and a
separate code running in vold to trigger gc_urgent.

For regular Linux desktop distros, f2fs just runs on the default
configuration set on the kernel and unless it’s running 24/7 with
plentiful idle time, it quickly runs out of free segments and starts
triggering foreground GC. This is giving people the wrong impression
that f2fs slows down far drastically than other file-systems when
that’s quite the contrary (i.e., less fragmentation overtime)."

This patch supports background_gc=adjust mount option.

If background_gc=adjust is set, GC adjusts its policy depending on
conditions: it speeds up if there are no free segments, and slows
down if there is no free space.

The main logic is as below:

1. performance mode
- condition: if free_segments is less than 10 * ovp_segments and
reclaimable_block is more than 20 * unused_user_block
- action:
 a) reduce sleep time of GC thread based on free user block
ratio, that is to say, the more reclaimable blocks, the less time
thread will sleep
 b) disable IO aware

2. lifetime mode:
- condition: if free space is less than 90%
- action:
 a) reset min_sleep_time to default 3 ms
 b) reduce cost weight of age when calculating cost of dirty
 segment, so that GC may select victims which contain fewer blocks
 c) disable IO aware

3. balance mode
- condition: it is default mode
- action:
 a) reduce min_sleep_time from 3 ms to 1 ms
 b) enable IO aware

[1] 
https://lore.kernel.org/linux-f2fs-devel/CAD14+f3z=kS9E+NTKH7t1J2xL1PpLOVMNx=CabD_t2K6U=t...@mail.gmail.com

The original patch was developed by Weichao Guo; I refactored it a bit
and rebased the code.

Signed-off-by: Weichao Guo 
Signed-off-by: Chao Yu 
---
v2:
- fix typo
- disable IO aware for perf/lifetime mode
- check bggc mode in get_max_age()
 Documentation/filesystems/f2fs.rst |  7 ++-
 fs/f2fs/f2fs.h |  4 ++
 fs/f2fs/gc.c   | 94 +-
 fs/f2fs/gc.h   | 23 
 fs/f2fs/super.c|  4 ++
 5 files changed, 128 insertions(+), 4 deletions(-)

diff --git a/Documentation/filesystems/f2fs.rst b/Documentation/filesystems/f2fs.rst
index 9359978a5af2..764301f7391e 100644
--- a/Documentation/filesystems/f2fs.rst
+++ b/Documentation/filesystems/f2fs.rst
@@ -112,8 +112,11 @@ background_gc=%s Turn on/off cleaning operations, namely garbage
 collection and if background_gc=off, garbage collection
 will be turned off. If background_gc=sync, it will turn
 on synchronous garbage collection running in background.
-Default value for this option is on. So garbage
-collection is on by default.
+If background_gc=adjust, gc will adjust its policy depends
+on conditions: speed up if there no free segments, and slow
+down if there is no free space.
+Default value for this option is on. So garbage collection
+is on by default.
 gc_merge When background_gc is on, this option can be enabled to
 let background GC thread to handle foreground GC requests,
 it can eliminate the sluggish issue caused by slow foreground
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 8d4eaf4d2246..e82af8a09d11 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1333,6 +1333,10 @@ enum {
 * background gc is on, migrating blocks
 * like foreground gc
 */
+   BGGC_MODE_ADJUST,   /*
+* background gc is on, and tune its speed
+* depends on conditions
+*/
 };
 
 enum {
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 51d7e8d29bf1..35b95b3d57ef 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -28,6 +28,67 @@ static struct kmem_cache *victim_entry_slab;
 static unsigned int count_bits(const unsigned long *addr,
unsigned int offset, unsigned int len);
 
+static inline int free_user_block_ratio(struct f2fs_sb_info *sbi)
+{
+   block_t unused_user_blocks = sbi->user_block_count -
+   written_block_count(sbi);
+   return unused_user_blocks == 0 ? 100 :
+   (100 * free_user_blocks(sbi) / unused_user_blocks);
+}
+
+static bool has_few_free_segments(struct f2fs_sb_info *sbi)
+{
+   unsigned int free_segs = free_segments(sbi);
+   unsigned int ovp_segs = overprovision_segments(sbi);
+
+   

Re: [f2fs-dev] [DISCUSSION] f2fs for desktop

2023-05-18 Thread Chao Yu

On 2023/4/21 1:26, Juhyung Park wrote:
> Hi Chao,
>
> On Fri, Apr 21, 2023 at 1:19 AM Chao Yu  wrote:
>>
>> Hi JuHyung,
>>
>> Sorry for the delayed reply.
>>
>> On 2023/4/11 1:03, Juhyung Park wrote:
>>> Hi Chao,
>>>
>>> On Tue, Apr 11, 2023 at 12:44 AM Chao Yu  wrote:
>>>>
>>>> Hi Juhyung,
>>>>
>>>> On 2023/4/4 15:36, Juhyung Park wrote:
>>>>> Hi everyone,
>>>>>
>>>>> I want to start a discussion on using f2fs for regular desktops/workstations.
>>>>>
>>>>> There is a growing interest in using f2fs as the general
>>>>> root file-system:
>>>>> 2018: https://www.phoronix.com/news/GRUB-Now-Supports-F2FS
>>>>> 2020: https://www.phoronix.com/news/Clear-Linux-F2FS-Root-Option
>>>>> 2023: https://code.launchpad.net/~nexusprism/curtin/+git/curtin/+merge/439880
>>>>> 2023: https://code.launchpad.net/~nexusprism/grub/+git/ubuntu/+merge/440193
>>>>>
>>>>> I've been personally running f2fs on all of my x86 Linux boxes since
>>>>> 2015, and I have several concerns that I think we need to collectively
>>>>> address for regular non-Android normies to use f2fs:
>>>>>
>>>>> A. Bootloader and installer support
>>>>> B. Host-side GC
>>>>> C. Extended node bitmap
>>>>>
>>>>> I'll go through each one.
>>>>>
>>>>> === A. Bootloader and installer support ===
>>>>>
>>>>> It seems that both GRUB and systemd-boot support f2fs without the
>>>>> need for a separate ext4-formatted /boot partition.
>>>>> Some distros seem to be disabling the f2fs module for GRUB, though, for
>>>>> security reasons:
>>>>> https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1868664
>>>>>
>>>>> It's ultimately up to the distro folks to enable this, and even in
>>>>> the worst-case scenario, they can specify a separate /boot partition
>>>>> and format it to ext4 upon installation.
>>>>>
>>>>> Installer support for offering f2fs and calling mkfs.f2fs is currently
>>>>> being worked on for Ubuntu. See the 2023 links above.
>>>>>
>>>>> Nothing f2fs mainline developers should do here, imo.
>>>>>
>>>>> === B. Host-side GC ===
>>>>>
>>>>> f2fs relieves most of the device-side GC but introduces a new
>>>>> host-side GC. This is extremely confusing for people who have no
>>>>> background in SSDs and flash storage, let alone the
>>>>> discard/trim/erase complications.
>>>>>
>>>>> In most consumer-grade blackbox SSDs, device-side GCs are handled
>>>>> automatically for various workloads. f2fs, however, leaves that
>>>>> responsibility to the userspace with conservative tuning on the
>>>>
>>>> We've proposed an f2fs feature named "space-aware garbage collection"
>>>> and shipped it in Huawei/Honor devices, but forgot to try upstreaming
>>>> it. :-P
>>>>
>>>> In this feature, we introduced three modes:
>>>> - performance mode: something like write-GC in the FTL; it can trigger
>>>> background GC more frequently and tune its speed according to the free
>>>> segment and reclaimable block ratios.
>>>> - lifetime mode: slow down background GC to avoid high WAF when there
>>>> is less free space.
>>>> - balance mode: behave as usual.
>>>>
>>>> I guess this may be helpful for Linux desktop distros, since they have
>>>> no storage service to trigger gc_urgent.
>>>>
>>>
>>> That indeed sounds interesting.
>>>
>>> If you need me to test something out, feel free to ask.
>>
>> Thanks a lot for that. :)
>>
>> I'm trying to figure out a patch...

Juhyung,

Are you interested in trying this patch on distros?

https://git.kernel.org/pub/scm/linux/kernel/git/chao/linux.git/commit/?h=dev-test=4736e55bc967e91cf8a275b678739b006c2617f0

There are some tunable parameters; I can export them via sysfs entries
and will update later.

Thanks,


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


[f2fs-dev] [PATCH] f2fs: support background_gc=adjust mount option

2023-05-18 Thread Chao Yu
As JuHyung reported in [1]:

"In most consumer-grade blackbox SSDs, device-side GCs are handled
automatically for various workloads. f2fs, however, leaves that
responsibility to the userspace with conservative tuning on the
kernel-side by default. Android handles this by init.rc tunings and a
separate code running in vold to trigger gc_urgent.

For regular Linux desktop distros, f2fs just runs on the default
configuration set on the kernel and unless it’s running 24/7 with
plentiful idle time, it quickly runs out of free segments and starts
triggering foreground GC. This is giving people the wrong impression
that f2fs slows down far more drastically than other file-systems when
that’s quite the contrary (i.e., less fragmentation overtime)."

This patch adds the background_gc=adjust mount option.

With background_gc=adjust, GC adjusts its policy depending on
conditions: it speeds up when there are few free segments, and slows
down when there is little free space.

The main logic is as below:

1. performance mode
- condition: if free_segments is less than 10 * ovp_segments and
reclaimable_block is more than 20 * unused_user_block
- action: reduce the GC thread's sleep time based on the free user
block ratio; that is, the more reclaimable blocks there are, the less
time the thread sleeps

2. lifetime mode
- condition: if free space is less than 90%
- action:
 a) reset min_sleep_time to default 3 ms
 b) reduce the cost weight of age when calculating the cost of a dirty
 segment, so that GC may select victims that contain fewer blocks

3. balance mode
- condition: it is the default mode
- action: reduce min_sleep_time from 3 ms to 1 ms
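
The three conditions above can be transcribed into a small decision function. This is a sketch, not the patch's gc.c code: the struct and function names are hypothetical, the thresholds are taken literally from the description, and checking performance mode before lifetime mode is an assumption, since the description does not say which wins when both conditions hold.

```c
#include <stdint.h>

enum gc_adjust_mode {
	GC_MODE_BALANCE,	/* default */
	GC_MODE_PERFORMANCE,	/* few free segments, much reclaimable space */
	GC_MODE_LIFETIME,	/* free space below the threshold */
};

/* Hypothetical snapshot of the counters the conditions refer to. */
struct gc_space_stats {
	uint64_t free_segments;
	uint64_t ovp_segments;
	uint64_t reclaimable_blocks;
	uint64_t unused_user_blocks;
	unsigned int free_space_pct;	/* free space, percent of capacity */
};

enum gc_adjust_mode pick_gc_mode(const struct gc_space_stats *s)
{
	/*
	 * Condition 1 as described ("less than"); note that
	 * has_few_free_segments() in the diff compares with <=.
	 */
	if (s->free_segments < 10 * s->ovp_segments &&
	    s->reclaimable_blocks > 20 * s->unused_user_blocks)
		return GC_MODE_PERFORMANCE;

	/* Condition 2: free space below 90%. */
	if (s->free_space_pct < 90)
		return GC_MODE_LIFETIME;

	/* Condition 3: everything else. */
	return GC_MODE_BALANCE;
}
```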

[1] 
https://lore.kernel.org/linux-f2fs-devel/CAD14+f3z=kS9E+NTKH7t1J2xL1PpLOVMNx=CabD_t2K6U=t...@mail.gmail.com

The original patch was developed by Weichao Guo; I refactored it a bit
and rebased the code.

Signed-off-by: Weichao Guo 
Signed-off-by: Chao Yu 
---
 Documentation/filesystems/f2fs.rst |  7 ++-
 fs/f2fs/f2fs.h |  4 ++
 fs/f2fs/gc.c   | 92 +-
 fs/f2fs/gc.h   | 23 
 fs/f2fs/super.c|  4 ++
 5 files changed, 126 insertions(+), 4 deletions(-)

diff --git a/Documentation/filesystems/f2fs.rst b/Documentation/filesystems/f2fs.rst
index 9359978a5af2..764301f7391e 100644
--- a/Documentation/filesystems/f2fs.rst
+++ b/Documentation/filesystems/f2fs.rst
@@ -112,8 +112,11 @@ background_gc=%s Turn on/off cleaning operations, namely garbage
 collection and if background_gc=off, garbage collection
 will be turned off. If background_gc=sync, it will turn
 on synchronous garbage collection running in background.
-Default value for this option is on. So garbage
-collection is on by default.
+If background_gc=adjust, gc will adjust its policy depending
+on conditions: speed up if there are few free segments, and slow
+down if there is little free space.
+Default value for this option is on. So garbage collection
+is on by default.
 gc_merge When background_gc is on, this option can be enabled to
 let background GC thread to handle foreground GC requests,
 it can eliminate the sluggish issue caused by slow foreground
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 8d4eaf4d2246..4c2f65d3c208 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1333,6 +1333,10 @@ enum {
 * background gc is on, migrating blocks
 * like foreground gc
 */
+   BGGC_MODE_ADJUST,   /*
+* background gc is on, and tune its speed
+* depending on conditions
+*/
 };
 
 enum {
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 51d7e8d29bf1..43f935c2502a 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -28,6 +28,67 @@ static struct kmem_cache *victim_entry_slab;
 static unsigned int count_bits(const unsigned long *addr,
unsigned int offset, unsigned int len);
 
+static inline int free_user_block_ratio(struct f2fs_sb_info *sbi)
+{
+   block_t unused_user_blocks = sbi->user_block_count -
+   written_block_count(sbi);
+   return unused_user_blocks == 0 ? 100 :
+   (100 * free_user_blocks(sbi) / unused_user_blocks);
+}
+
+static bool has_few_free_segments(struct f2fs_sb_info *sbi)
+{
+   unsigned int free_segs = free_segments(sbi);
+   unsigned int ovp_segs = overprovision_segments(sbi);
+
+   return free_segs <= DEF_FEW_FREE_SEGMENT_MULTIPLE * ovp_segs;
+}
+
+static bool has_few_free_space(struct f2fs_sb_info *sbi)
+{
+   block_t total_user_block =