Hi Juhyung,

On 04/04, Juhyung Park wrote:
> Hi everyone,
> 
> I want to start a discussion on using f2fs for regular desktops/workstations.
> 
> There is growing interest in using f2fs as the general
> root file-system:
> 2018: https://www.phoronix.com/news/GRUB-Now-Supports-F2FS
> 2020: https://www.phoronix.com/news/Clear-Linux-F2FS-Root-Option
> 2023: https://code.launchpad.net/~nexusprism/curtin/+git/curtin/+merge/439880
> 2023: https://code.launchpad.net/~nexusprism/grub/+git/ubuntu/+merge/440193

This is quite promising. :)

> 
> I've been personally running f2fs on all of my x86 Linux boxes since
> 2015, and I have several concerns that I think we need to collectively
> address for regular non-Android normies to use f2fs:
> 
> A. Bootloader and installer support
> B. Host-side GC
> C. Extended node bitmap
> 
> I'll go through each one.
> 
> === A. Bootloader and installer support ===
> 
> It seems that both GRUB and systemd-boot support f2fs without the
> need for a separate ext4-formatted /boot partition.
> Some distros are seemingly disabling the f2fs module for GRUB, though,
> for security reasons:
> https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1868664
> 
> It's ultimately up to the distro folks to enable this, and still in
> the worst-case scenario, they can specify a separate /boot partition
> and format it to ext4 upon installation.
> 
> Installer support for showing f2fs and calling mkfs.f2fs is currently
> being worked on in Ubuntu. See the 2023 links above.
> 
> Nothing f2fs mainline developers should do here, imo.
> 
> === B. Host-side GC ===
> 
> f2fs relieves most of the device-side GC but introduces a new
> host-side GC. This is extremely confusing for people who have no
> background in SSDs and flash storage to understand, let alone
> discard/trim/erase complications.
> 
> In most consumer-grade blackbox SSDs, device-side GCs are handled
> automatically for various workloads. f2fs, however, leaves that
> responsibility to the userspace with conservative tuning on the
> kernel-side by default. Android handles this by init.rc tunings and a
> separate code running in vold to trigger gc_urgent.
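
For anyone unfamiliar with the knob mentioned here: gc_urgent is a
per-device sysfs attribute under /sys/fs/f2fs/<device>/. A minimal
sketch of toggling it from userspace might look like the following.
The helper name and the "sda1" device are hypothetical, this is not
how vold implements it, and writing the real node requires root.

```python
from pathlib import Path


def set_gc_urgent(device, enabled, sysfs_root="/sys/fs/f2fs"):
    """Write 1 or 0 to <sysfs_root>/<device>/gc_urgent and return the path.

    `device` is the block device name as it appears under /sys/fs/f2fs
    (for example "sda1"); it is a placeholder here, not a recommendation.
    """
    node = Path(sysfs_root) / device / "gc_urgent"
    node.write_text("1" if enabled else "0")
    return node


# Example (as root, on a machine with an f2fs volume on sda1):
#   set_gc_urgent("sda1", True)    # request aggressive background GC
#   set_gc_urgent("sda1", False)   # return to the default behaviour
```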
> 
> For regular Linux desktop distros, f2fs just runs on the default
> configuration set on the kernel and unless it’s running 24/7 with
> plentiful idle time, it quickly runs out of free segments and starts
> triggering foreground GC. This is giving people the wrong impression
> that f2fs slows down far more drastically than other file-systems when
> that’s quite the contrary (i.e., less fragmentation over time).
> 
> This is almost the equivalent of re-living the nightmare of trim. On
> SSDs with very small to no over-provisioned space, running a
> file-system with no discard whatsoever (sadly still a common case
> when an external SSD is used with no UAS) will also drastically slow
> the performance down. On file-systems with no asynchronous discard,
> mounting a file-system with the discard option adds a non-negligible
> overhead on every remove/delete operation, so most distros now
> (thankfully) use a timer job registered to systemd to trigger fstrim:
> https://github.com/util-linux/util-linux/commits/master/sys-utils/fstrim.timer
> 
> This is still far from ideal. The default file-system, ext4, slows
> down drastically almost to a halt when fstrim -a is called, especially
> on SATA. For some reason that is still a mystery to me, people seem
> to be happy with it. No one bothered to improve it for years
> ¯\_(ツ)_/¯.
> 
> So here’s my proposal:
> As Linux distros don’t have a good mechanism for hinting when to
> trigger GC, introduce a new Kconfig, CONFIG_F2FS_GC_UPON_FSTRIM and
> enable it by default.
> This config will hook up ioctl(FITRIM), which is currently ignored on
> f2fs
> (https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git/commit/?h=master&id=e555da9f31210d2b62805cd7faf29228af7c3cfb),
> to perform discard and GC on all invalid segments.
> Userspace configuration with enough f2fs/GC knowledge such as Android
> should disable it.

How about adding an option like "memory=high" to tune background GC parameters
seamlessly?
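
For context, the ioctl quoted above is the one fstrim(8) issues against
a mounted file-system. A rough sketch of that call follows. It assumes
the x86-64/arm64 ioctl number encoding (other architectures may
differ), the helper names are illustrative only, and the call itself
needs CAP_SYS_ADMIN.

```python
import fcntl
import os
import struct

# FITRIM is _IOWR('X', 121, struct fstrim_range) in <linux/fs.h>.
# The value below assumes the common encoding used on x86-64 and arm64.
FITRIM = (3 << 30) | (24 << 16) | (ord("X") << 8) | 121  # 0xC0185879

# struct fstrim_range { __u64 start; __u64 len; __u64 minlen; }
FSTRIM_RANGE = "=QQQ"


def pack_fstrim_range(start=0, length=2**64 - 1, minlen=0):
    """Build the 24-byte argument; the defaults cover the whole file-system."""
    return struct.pack(FSTRIM_RANGE, start, length, minlen)


def fitrim(mountpoint):
    """Issue FITRIM on a mount point and return the byte count the kernel reports."""
    buf = bytearray(pack_fstrim_range())
    fd = os.open(mountpoint, os.O_RDONLY | os.O_DIRECTORY)
    try:
        fcntl.ioctl(fd, FITRIM, buf)  # the kernel updates `len` in place
    finally:
        os.close(fd)
    _, trimmed, _ = struct.unpack(FSTRIM_RANGE, buf)
    return trimmed
```

Whatever ends up hooked behind this entry point would run every time
the distro's fstrim.timer fires.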

> 
> This will ensure that Linux distros that blindly call fstrim will at
> least avoid constant slowdowns when free segments are depleted with
> the occasional (once a week) slowdown, which *people are already
> living with on ext4*. I'll even go further and mention that since f2fs
> GC is a regular R/W workload, it doesn't cause an extreme slowdown
> comparable to that of a full file-system trim operation.
> 
> If this is acceptable, I’ll cook up a patch.
> 
> In an ideal world, all Linux distros should have an explicit f2fs GC
> trigger mechanism (akin to
> https://github.com/kdave/btrfsmaintenance#distro-integration ), but
> it’s practically unrealistic to expect that, given the installer
> doesn’t even support f2fs for now.
> 
> === C. Extended node bitmap ===
> 
> f2fs by default has a very limited number of allowed inodes compared
> to other file-systems. Just 2 AOSP syncs are enough to exhaust f2fs
> and result in -ENOSPC.
> 
> Here are some stats collected from machines that my colleague and I
> use daily as regular desktops with GUI, web-browsing and everything:
> 1. Laptop
> Utilization: 68% (182914850 valid blocks, 462 discard blocks)
>   - Node: 10234905 (Inode: 10106526, Other: 128379)
>   - Data: 172679945
>   - Inline_xattr Inode: 2004827
>   - Inline_data Inode: 867204
>   - Inline_dentry Inode: 51456
> 
> 2. Desktop #1
> Utilization: 55% (133310465 valid blocks, 0 discard blocks)
>   - Node: 6389660 (Inode: 6289765, Other: 99895)
>   - Data: 126920805
>   - Inline_xattr Inode: 2253838
>   - Inline_data Inode: 1119109
>   - Inline_dentry Inode: 187958
> 
> 3. Desktop #2
> Utilization: 83% (202222003 valid blocks, 1 discard blocks)
>   - Node: 21887836 (Inode: 21757139, Other: 130697)
>   - Data: 180334167
>   - Inline_xattr Inode: 39292
>   - Inline_data Inode: 35213
>   - Inline_dentry Inode: 1127
> 
> 4. Colleague
> Utilization: 22% (108652929 valid blocks, 362420605 discard blocks)
>   - Node: 5629348 (Inode: 5542909, Other: 86439)
>   - Data: 103023581
>   - Inline_xattr Inode: 655752
>   - Inline_data Inode: 259900
>   - Inline_dentry Inode: 193000
> 
> 5. Android phone (for reference)
> Utilization: 78% (36505713 valid blocks, 1074 discard blocks)
>   - Node: 704698 (Inode: 683337, Other: 21361)
>   - Data: 35801015
>   - Inline_xattr Inode: 683333
>   - Inline_data Inode: 237470
>   - Inline_dentry Inode: 112177
> 
> Chao Yu added a functionality to expand this via the -i flag passed to
> mkfs.f2fs back in 2018 -
> https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs-tools.git/commit/?id=baaa076b4d576042913cfe34169442dfda651ca4
> 
> I occasionally find myself in a weird position of having to tell
> people "Oh you should use the -i option from mkfs.f2fs" when they
> encounter this issue only after they’ve migrated most of the data and
> ask back "Why isn’t this enabled by default?".
> 
> While this might not be an issue for the foreseeable future in
> Android, I’d argue that this is a feature that needs to be enabled by
> default for desktop environments with preferably a robust testing
> infrastructure. Guarding this with #ifndef __ANDROID__ doesn’t seem to
> make much sense as it introduces more complications to how
> fuzzing/testing should be done.
> 
> I’ll also add that it’s a common practice for userspace mkfs tools to
> introduce default changes that break compatibility with older kernels
> (with options to produce a legacy image, of course).

Do you have any measurements of the additional space that a large NAT
occupies?
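
Short of real measurements, a back-of-the-envelope lower bound can be
derived from the node counts quoted above. The sketch below assumes
4 KiB blocks, a 9-byte on-disk NAT entry (455 entries per block) and
two NAT copies. It estimates the space needed to address a given
number of nodes, not what mkfs.f2fs -i actually reserves.

```python
BLOCK_SIZE = 4096        # assumed f2fs block size in bytes
NAT_ENTRY_SIZE = 9       # version (1) + ino (4) + block_addr (4), packed
NAT_ENTRIES_PER_BLOCK = BLOCK_SIZE // NAT_ENTRY_SIZE  # 455
NAT_COPIES = 2           # assumed: NAT blocks are kept as a pair

# "Node" counts taken from the stats quoted in this thread.
NODE_COUNTS = {
    "Laptop": 10234905,
    "Desktop #1": 6389660,
    "Desktop #2": 21887836,
    "Colleague": 5629348,
    "Android phone": 704698,
}


def nat_blocks(nodes):
    """NAT blocks (per copy) needed to hold one entry per node, rounded up."""
    return -(-nodes // NAT_ENTRIES_PER_BLOCK)


def nat_bytes(nodes):
    """Bytes of NAT needed for `nodes` entries across all copies."""
    return nat_blocks(nodes) * BLOCK_SIZE * NAT_COPIES


for name, nodes in NODE_COUNTS.items():
    mib = nat_bytes(nodes) / 2**20
    print(f"{name}: {nodes} nodes -> {mib:.1f} MiB of NAT")
```

Under those assumptions, Desktop #2 (21887836 nodes) needs 48106 NAT
blocks per copy, roughly 376 MiB for both copies.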

Thanks,

> 
> This was a lengthy email, but I hope I was being reasonable.
> 
> Jaegeuk and Chao, let me know what you think.
> And as always, thanks for your hard work :)
> 
> Thanks,
> regards


_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
