Hi Juhyung,

On 04/04, Juhyung Park wrote:
> Hi everyone,
>
> I want to start a discussion on using f2fs for regular desktops/workstations.
>
> There is growing interest in using f2fs as a general
> root file-system:
> 2018: https://www.phoronix.com/news/GRUB-Now-Supports-F2FS
> 2020: https://www.phoronix.com/news/Clear-Linux-F2FS-Root-Option
> 2023: https://code.launchpad.net/~nexusprism/curtin/+git/curtin/+merge/439880
> 2023: https://code.launchpad.net/~nexusprism/grub/+git/ubuntu/+merge/440193
This is quite promising. :)

>
> I've been personally running f2fs on all of my x86 Linux boxes since
> 2015, and I have several concerns that I think we need to collectively
> address for regular non-Android normies to use f2fs:
>
> A. Bootloader and installer support
> B. Host-side GC
> C. Extended node bitmap
>
> I'll go through each one.
>
> === A. Bootloader and installer support ===
>
> It seems that both GRUB and systemd-boot support f2fs without the
> need for a separate ext4-formatted /boot partition.
> Some distros are seemingly disabling the f2fs module for GRUB, though,
> for security reasons:
> https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1868664
>
> It's ultimately up to the distro folks to enable this, and in
> the worst-case scenario, they can still specify a separate /boot partition
> and format it to ext4 upon installation.
>
> Installer support for showing f2fs and calling mkfs.f2fs is currently
> being worked on for Ubuntu. See the 2023 links above.
>
> There's nothing f2fs mainline developers should do here, imo.
>
> === B. Host-side GC ===
>
> f2fs relieves the device of most of its GC burden but introduces a new
> host-side GC. This is extremely confusing to understand for people with
> no background in SSDs and flash storage, let alone the
> discard/trim/erase complications.
>
> In most consumer-grade blackbox SSDs, device-side GC is handled
> automatically for various workloads. f2fs, however, leaves that
> responsibility to userspace, with conservative tuning on the
> kernel side by default. Android handles this with init.rc tunings and
> separate code running in vold to trigger gc_urgent.
>
> For regular Linux desktop distros, f2fs just runs on the default
> configuration set in the kernel, and unless it's running 24/7 with
> plentiful idle time, it quickly runs out of free segments and starts
> triggering foreground GC. This gives people the wrong impression
> that f2fs slows down far more drastically than other file-systems, when
> it's quite the contrary (i.e., less fragmentation over time).
>
> This is almost the equivalent of re-living the nightmare of trim. On
> SSDs with very little to no over-provisioned space, running a
> file-system with no discard whatsoever (sadly still a common case
> when an external SSD is used without UAS) will also drastically slow
> performance down. On file-systems with no asynchronous discard,
> mounting with the discard option adds non-negligible
> overhead to every remove/delete operation, so most distros now
> (thankfully) use a timer job registered with systemd to trigger fstrim:
> https://github.com/util-linux/util-linux/commits/master/sys-utils/fstrim.timer
>
> This is still far from ideal. The default file-system, ext4, slows
> down drastically, almost to a halt, when fstrim -a is called, especially
> on SATA. For some reason that is still a mystery to me, people seem
> to be happy with it. No one has bothered to improve it for years
> ¯\_(ツ)_/¯.
>
> So here's my proposal:
> As Linux distros don't have a good mechanism for hinting when to
> trigger GC, introduce a new Kconfig option, CONFIG_F2FS_GC_UPON_FSTRIM,
> and enable it by default.
> This config will hook up ioctl(FITRIM), which is currently ignored on
> f2fs -
> https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git/commit/?h=master&id=e555da9f31210d2b62805cd7faf29228af7c3cfb
> , to perform discard and GC on all invalid segments.
> Userspace configurations with enough f2fs/GC knowledge, such as Android,
> should disable it.
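For readers who haven't seen the Android side of this: vold essentially flips
the per-device gc_urgent sysfs knob during idle maintenance and flips it back
afterwards. A rough userspace sketch of that idea (the dm-0 device name and
the program around it are only illustrative):

/* Illustrative only: mimic vold-style idle maintenance by toggling the
 * f2fs gc_urgent sysfs knob.  "dm-0" is an example; the real path depends
 * on which block device backs the f2fs mount. */
#include <errno.h>
#include <stdio.h>
#include <string.h>

static int write_sysfs(const char *path, const char *val)
{
        FILE *f = fopen(path, "w");

        if (!f) {
                fprintf(stderr, "open %s: %s\n", path, strerror(errno));
                return -1;
        }
        fputs(val, f);
        return fclose(f);
}

int main(void)
{
        const char *knob = "/sys/fs/f2fs/dm-0/gc_urgent";

        /* 1 = urgent mode: the background GC thread skips its idle
         * heuristics and reclaims segments back-to-back. */
        if (write_sysfs(knob, "1"))
                return 1;

        /* ... userspace waits for an idle window or a free-segment
         * target here (this is the part vold implements) ... */

        /* 0 = back to the default, conservative background GC. */
        return write_sysfs(knob, "0") ? 1 : 0;
}

A distro could ship something equivalent behind a systemd timer, but as you
say below, expecting every distro to grow such a job is unrealistic, which is
why hooking an existing trigger is attractive.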
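And for reference, the trigger the proposal piggybacks on is the plain FITRIM
ioctl that fstrim (and the systemd fstrim.timer) already issues against
mounted file-systems; a minimal sketch of that call (the mount-point argument
is illustrative):

/* Minimal sketch of what "fstrim <mountpoint>" boils down to.  Under the
 * proposed CONFIG_F2FS_GC_UPON_FSTRIM, f2fs would respond to this same
 * ioctl by also garbage-collecting invalid segments. */
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/fs.h>           /* FITRIM, struct fstrim_range */

int main(int argc, char **argv)
{
        const char *mnt = argc > 1 ? argv[1] : "/";
        struct fstrim_range range;
        int fd = open(mnt, O_RDONLY | O_DIRECTORY);

        if (fd < 0) {
                perror("open");
                return 1;
        }

        memset(&range, 0, sizeof(range));
        range.len = ULLONG_MAX;         /* trim the whole file-system */

        if (ioctl(fd, FITRIM, &range) < 0) {
                perror("ioctl(FITRIM)");
                return 1;
        }

        /* The kernel updates range.len to the number of bytes trimmed. */
        printf("%s: %llu bytes trimmed\n", mnt, (unsigned long long)range.len);
        return 0;
}

fstrim -a just loops this over every mounted file-system, so a distro that
already ships fstrim.timer would get the GC pass "for free".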
How about adding an option like "memory=high" to tune background GC
parameters seamlessly?

>
> This will ensure that Linux distros that blindly call fstrim will at
> least avoid constant slowdowns when free segments are depleted, with
> the occasional (once a week) slowdown, which *people are already
> living with on ext4*. I'll even go further and mention that since f2fs
> GC is a regular R/W workload, it doesn't cause an extreme slowdown
> comparable to that of a full file-system trim operation.
>
> If this is acceptable, I'll cook up a patch.
>
> In an ideal world, all Linux distros would have an explicit f2fs GC
> trigger mechanism (akin to
> https://github.com/kdave/btrfsmaintenance#distro-integration ), but
> it's practically unrealistic to expect that, given the installer
> doesn't even support f2fs for now.
>
> === C. Extended node bitmap ===
>
> f2fs by default has a very limited number of allowed inodes compared
> to other file-systems. Just 2 AOSP syncs are enough to exhaust them
> and result in -ENOSPC.
>
> Here are some stats collected from machines that my colleague and I
> use daily as regular desktops, with GUI, web browsing and everything:
> 1. Laptop
> Utilization: 68% (182914850 valid blocks, 462 discard blocks)
> - Node: 10234905 (Inode: 10106526, Other: 128379)
> - Data: 172679945
> - Inline_xattr Inode: 2004827
> - Inline_data Inode: 867204
> - Inline_dentry Inode: 51456
>
> 2. Desktop #1
> Utilization: 55% (133310465 valid blocks, 0 discard blocks)
> - Node: 6389660 (Inode: 6289765, Other: 99895)
> - Data: 126920805
> - Inline_xattr Inode: 2253838
> - Inline_data Inode: 1119109
> - Inline_dentry Inode: 187958
>
> 3. Desktop #2
> Utilization: 83% (202222003 valid blocks, 1 discard blocks)
> - Node: 21887836 (Inode: 21757139, Other: 130697)
> - Data: 180334167
> - Inline_xattr Inode: 39292
> - Inline_data Inode: 35213
> - Inline_dentry Inode: 1127
>
> 4. Colleague
> Utilization: 22% (108652929 valid blocks, 362420605 discard blocks)
> - Node: 5629348 (Inode: 5542909, Other: 86439)
> - Data: 103023581
> - Inline_xattr Inode: 655752
> - Inline_data Inode: 259900
> - Inline_dentry Inode: 193000
>
> 5. Android phone (for reference)
> Utilization: 78% (36505713 valid blocks, 1074 discard blocks)
> - Node: 704698 (Inode: 683337, Other: 21361)
> - Data: 35801015
> - Inline_xattr Inode: 683333
> - Inline_data Inode: 237470
> - Inline_dentry Inode: 112177
>
> Chao Yu added functionality to expand this via the -i flag of
> mkfs.f2fs back in 2018 -
> https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs-tools.git/commit/?id=baaa076b4d576042913cfe34169442dfda651ca4
>
> I occasionally find myself in the weird position of having to tell
> people "Oh, you should use the -i option of mkfs.f2fs" when they
> encounter this issue only after they've migrated most of their data and
> ask back "Why isn't this enabled by default?".
>
> While this might not be an issue for the foreseeable future on
> Android, I'd argue that this is a feature that needs to be enabled by
> default for desktop environments, preferably with a robust testing
> infrastructure. Guarding this with #ifndef __ANDROID__ doesn't seem to
> make much sense, as it introduces more complications to how
> fuzzing/testing should be done.
>
> I'll also add that it's common practice for userspace mkfs tools to
> introduce default changes that break older kernels (with options to
> produce a legacy image, of course).

Do you have any measurements regarding the additional space that a large
NAT occupies?
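Regarding the default inode limit: the exhaustion is at least easy to see
coming with df -i, or programmatically via statvfs(); a small sketch (the
mount-point argument is illustrative):

/* Print inode usage for a mount point - the number that, per the report
 * above, can hit 100% well before block usage does on a
 * default-formatted f2fs image. */
#include <stdio.h>
#include <sys/statvfs.h>

int main(int argc, char **argv)
{
        const char *mnt = argc > 1 ? argv[1] : "/";
        struct statvfs st;

        if (statvfs(mnt, &st) != 0) {
                perror("statvfs");
                return 1;
        }

        /* f_files = total inodes the file-system can hold,
         * f_ffree = inodes still available. */
        unsigned long long total = st.f_files;
        unsigned long long used = st.f_files - st.f_ffree;

        printf("%s: %llu / %llu inodes used (%.1f%%)\n",
               mnt, used, total, total ? 100.0 * used / total : 0.0);
        return 0;
}

Since -i changes the node bitmap layout at mkfs time, as far as I know the
only remedy after the fact is reformatting, which is exactly why hitting the
limit after migrating data is so painful.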
Thanks,

>
> This was a lengthy email, but I hope I was being reasonable.
>
> Jaegeuk and Chao, let me know what you think.
> And as always, thanks for your hard work :)
>
> Thanks,
> regards


_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel