Hi Jaegeuk, sorry for the late reply.

On Wed, Apr 5, 2023 at 8:35 AM Jaegeuk Kim <jaeg...@kernel.org> wrote:
>
> Hi Juhyung,
>
> > So here’s my proposal:
> > As Linux distros don’t have a good mechanism for hinting when to
> > trigger GC, introduce a new Kconfig, CONFIG_F2FS_GC_UPON_FSTRIM and
> > enable it by default.
> > This config will hook up ioctl(FITRIM), which is currently ignored on
> > f2fs -
> > https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git/commit/?h=master&id=e555da9f31210d2b62805cd7faf29228af7c3cfb
> > , to perform discard and GC on all invalid segments.
> > Userspace configuration with enough f2fs/GC knowledge such as Android
> > should disable it.
>
> How about adding an option like "memory=high" to tune background GC parameters
> seamlessly?
>

Can you elaborate on this design? I'm not sure what either "memory" or
"high" means in that context.

Even if we tune BG GC parameters, I still think the same problem exists:
we don't know which heuristic covers all workloads and aged environments.

I like the idea of a dynamic GC tuner in the kernel (if that's what you
meant), but I think it's complementary to the proposed
CONFIG_F2FS_GC_UPON_FSTRIM, not a replacement, unless we're 100% certain
that it can cover all workloads.

My proposed CONFIG_F2FS_GC_UPON_FSTRIM acts as a safeguard so that we do
*not* introduce any more performance slowdowns compared to other
file-systems.

> >
> > === C. Extended node bitmap ===
> >
> > f2fs by default have a very limited number of allowed inodes compared
> > to other file-systems. Just 2 AOSP syncs are enough to exhaust f2fs
> > and result in -ENOSPC.
> >
> > Here are some of the stats collected from me and my colleague that we
> > use daily as a regular desktop with GUI, web-browsing and everything:
> > 1. Laptop
> > Utilization: 68% (182914850 valid blocks, 462 discard blocks)
> > - Node: 10234905 (Inode: 10106526, Other: 128379)
> > - Data: 172679945
> > - Inline_xattr Inode: 2004827
> > - Inline_data Inode: 867204
> > - Inline_dentry Inode: 51456
> >
> > 2. Desktop #1
> > Utilization: 55% (133310465 valid blocks, 0 discard blocks)
> > - Node: 6389660 (Inode: 6289765, Other: 99895)
> > - Data: 126920805
> > - Inline_xattr Inode: 2253838
> > - Inline_data Inode: 1119109
> > - Inline_dentry Inode: 187958
> >
> > 3. Desktop #2
> > Utilization: 83% (202222003 valid blocks, 1 discard blocks)
> > - Node: 21887836 (Inode: 21757139, Other: 130697)
> > - Data: 180334167
> > - Inline_xattr Inode: 39292
> > - Inline_data Inode: 35213
> > - Inline_dentry Inode: 1127
> >
> > 4. Colleague
> > Utilization: 22% (108652929 valid blocks, 362420605 discard blocks)
> > - Node: 5629348 (Inode: 5542909, Other: 86439)
> > - Data: 103023581
> > - Inline_xattr Inode: 655752
> > - Inline_data Inode: 259900
> > - Inline_dentry Inode: 193000
> >
> > 5. Android phone (for reference)
> > Utilization: 78% (36505713 valid blocks, 1074 discard blocks)
> > - Node: 704698 (Inode: 683337, Other: 21361)
> > - Data: 35801015
> > - Inline_xattr Inode: 683333
> > - Inline_data Inode: 237470
> > - Inline_dentry Inode: 112177
> >
> > Chao Yu added a functionality to expand this via the -i flag passed to
> > mkfs.f2fs back in 2018 -
> > https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs-tools.git/commit/?id=baaa076b4d576042913cfe34169442dfda651ca4
> >
> > I occasionally find myself in a weird position of having to tell
> > people "Oh you should use the -i option from mkfs.f2fs" when they
> > encounter this issue only after they’ve migrated most of the data and
> > ask back "Why isn’t this enabled by default?".
> >
> > While this might not be an issue for the foreseeable future in
> > Android, I’d argue that this is a feature that needs to be enabled by
> > default for desktop environments with preferably a robust testing
> > infrastructure. Guarding this with #ifndef __ANDROID__ doesn’t seem to
> > make much sense as it introduces more complications to how
> > fuzzing/testing should be done.
> >
> > I’ll also add that it’s a common practice for userspace mkfs tools to
> > introduce breaking default changes to older kernels (with options to
> > produce a legacy image, of course).
>
> Do you have some measurements regarding to the additional space that large NAT
> occupies?
>

Is this something that f2fs' debugfs captures? If not, please let me know
if there's a standard way to measure it.

Here is the debugfs output from each system, in the same order as the
list above:

=====[ partition info(nvme0n1p4). #0, RW, CP: Good]=====
[SBI: fs_dirty recovered]
[SB: 1] [CP: 2] [SIT: 38] [NAT: 460] [SSA: 1024] [MAIN: 522763(OverProv:2050 Resv:1007)]

Current Time Sec: 229327 / Mounted Time Sec: 0

Policy:
  - IPU: [ FSYNC ]

Utilization: 68% (182885118 valid blocks, 598 discard blocks)
  - Node: 10222831 (Inode: 10094469, Other: 128362)
  - Data: 172662287
  - Inline_xattr Inode: 244470
  - Inline_data Inode: 92414
  - Inline_dentry Inode: 16497
  - Compressed Inode: 0, Blocks: 0
  - Swapfile Inode: 0
  - Orphan/Append/Update Inode: 26, 0, 0

Main area: 522763 segs, 522763 secs 522763 zones
    TYPE           segno   secno  zoneno dirty_seg full_seg valid_blk
  - COLD data:    496287  496287  496287      2128    73057  38372383
  - WARM data:    496410  496410  496410       661   242675 124379666
  - HOT data:      22368   22368   22368       161    19195   9856098
  - Dir dnode:     20964   20964   20964        70     2481   1301404
  - File dnode:    22893   22893   22893       753    16742   8913189
  - Indir nodes:     822     822     822         3       14      8225
  - Pinned file:      -1      -1      -1
  - ATGC data:        -1      -1      -1

  - Valid: 354170
  - Dirty: 3770
  - Prefree: 0
  - Free: 164823 (164823)

CP calls: 1600 (BG: 420)
  - cp blocks : 12986
  - sit blocks : 25046
  - nat blocks : 35336
  - ssa blocks : 8504
CP merge:
  - Queued : 0
  - Issued : 1660
  - Total : 1663
  - Cur time : 1(ms)
  - Peak time : 165(ms)
GC calls: 1155 (BG: 1156)
  - data segments : 993 (993)
  - node segments : 162 (162)
  - Reclaimed segs :
    - Normal : 7
    - Idle CB : 0
    - Idle Greedy : 0
    - Idle AT : 0
    - Urgent High : 1148
    - Urgent Mid : 0
    - Urgent Low : 0
Try to move 177920 blocks (BG: 177920)
  - data blocks : 152337 (152337)
  - node blocks : 25583 (25583)
BG skip : IO: 143, Other: 10

Extent Cache (Read):
  - Hit Count: L1-1:290494 L1-2:36316 L2:3872
  - Hit Ratio: 17% (330682 / 1885295)
  - Inner Struct Count: tree: 223660(0), node: 108586

Extent Cache (Block Age):
  - Allocated Data Blocks: 2561142
  - Hit Count: L1:0 L2:0
  - Hit Ratio: 0% (0 / 0)
  - Inner Struct Count: tree: 0(0), node: 0

Balancing F2FS Async:
  - DIO (R: 0, W: 0)
  - IO_R (Data: 0, Node: 0, Meta: 0
  - IO_W (CP: 0, Data: 0, Flush: ( 0 15980 1), Discard: ( 0 11499)) cmd: 24771 undiscard:592388
  - atomic IO: 0 (Max. 3)
  - compress: 0, hit: 0
  - nodes: 36 in 48840
  - dents: 19 in dirs: 5 ( 45)
  - datas: 249 in files: 0
  - quota datas: 0 in quota files: 0
  - meta: 2 in 4893
  - imeta: 31
  - fsync mark: 22
  - NATs: 66/ 99805
  - SITs: 35/ 522763
  - free_nids: 9360/ 43357933
  - alloc_nids: 0

Distribution of User Blocks: [ valid | invalid | free ]
  [----------------------------------|-|---------------]

IPU: 30928 blocks
SSR: 157339 blocks in 3514 segments
LFS: 2548926 blocks in 4979 segments

BDF: 99, avg. vblocks: 396

Memory: 374747 KB
  - static: 126900 KB
  - cached all: 32914 KB
  - read extent cache: 26855 KB
  - block age extent cache: 0 KB
  - paged : 214932 KB

=====[ partition info(nvme0n1p3). #0, RW, CP: Good]=====
[SB: 1] [CP: 2] [SIT: 34] [NAT: 418] [SSA: 930] [MAIN: 474781(OverProv:1955 Resv:960)]

Current Time Sec: 2144822 / Mounted Time Sec: 2

Policy:
  - IPU: [ FSYNC ]

Utilization: 55% (133484583 valid blocks, 0 discard blocks)
  - Node: 6365874 (Inode: 6265128, Other: 100746)
  - Data: 127118709
  - Inline_xattr Inode: 2271411
  - Inline_data Inode: 1120167
  - Inline_dentry Inode: 226245
  - Compressed Inode: 0, Blocks: 0
  - Swapfile Inode: 0
  - Orphan/Append/Update Inode: 12, 0, 0

Main area: 474781 segs, 474781 secs 474781 zones
    TYPE           segno   secno  zoneno dirty_seg full_seg valid_blk
  - COLD data:    326480  326480  326480        59    33625  17245779
  - WARM data:     94665   94665   94665        22   194916  99808056
  - HOT data:      11739   11739   11739         7    19473   9973372
  - Dir dnode:      6525    6525    6525       103     1231    682165
  - File dnode:     8957    8957    8957       621    10477   5679709
  - Indir nodes:     609     609     609         1        7      4000
  - Pinned file:      -1      -1      -1
  - ATGC data:        -1      -1      -1

  - Valid: 259735
  - Dirty: 807
  - Prefree: 0
  - Free: 214239 (214239)

CP calls: 39861 (BG: 34749)
  - cp blocks : 322043
  - sit blocks : 218279
  - nat blocks : 2070086
  - ssa blocks : 55514
CP merge:
  - Queued : 0
  - Issued : 40665
  - Total : 40679
  - Cur time : 0(ms)
  - Peak time : 668(ms)
GC calls: 32936 (BG: 33191)
  - data segments : 12319 (12319)
  - node segments : 20617 (20617)
  - Reclaimed segs :
    - Normal : 32936
    - Idle CB : 0
    - Idle Greedy : 0
    - Idle AT : 0
    - Urgent High : 0
    - Urgent Mid : 0
    - Urgent Low : 0
Try to move 13622714 blocks (BG: 13622714)
  - data blocks : 4354751 (4354751)
  - node blocks : 9267963 (9267963)
BG skip : IO: 740, Other: 0

Extent Cache (Read):
  - Hit Count: L1-1:4404282 L1-2:2449632 L2:61448
  - Hit Ratio: 34% (6915362 / 20237092)
  - Inner Struct Count: tree: 2024530(0), node: 848681

Extent Cache (Block Age):
  - Allocated Data Blocks: 17851435
  - Hit Count: L1:0 L2:0
  - Hit Ratio: 0% (0 / 0)
  - Inner Struct Count: tree: 0(0), node: 0

Balancing F2FS Async:
  - DIO (R: 0, W: 0)
  - IO_R (Data: 0, Node: 0, Meta: 0
  - IO_W (CP: 0, Data: 0, Flush: ( 0 0 1), Discard: ( 0 78857)) cmd: 1852 undiscard:2873
  - atomic IO: 0 (Max. 0)
  - compress: 0, hit: 0
  - nodes: 0 in 3568509
  - dents: 0 in dirs: 0 ( 2)
  - datas: 0 in files: 0
  - quota datas: 0 in quota files: 0
  - meta: 0 in 80118
  - imeta: 0
  - fsync mark: 0
  - NATs: 0/ 99802
  - SITs: 0/ 474781
  - free_nids: 75842/ 42322763
  - alloc_nids: 0

Distribution of User Blocks: [ valid | invalid | free ]
  [---------------------------|-|----------------------]

IPU: 1060837 blocks
SSR: 0 blocks in 0 segments
LFS: 28422813 blocks in 55514 segments

BDF: 99, avg. vblocks: 506

Memory: 14948530 KB
  - static: 115259 KB
  - cached all: 238763 KB
  - read extent cache: 233655 KB
  - block age extent cache: 0 KB
  - paged : 14594508 KB

=====[ partition info(nvme1n1p1). #1, RW, CP: Good]=====
[SB: 1] [CP: 2] [SIT: 34] [NAT: 418] [SSA: 931] [MAIN: 475548(OverProv:1956 Resv:960)]

Current Time Sec: 2144822 / Mounted Time Sec: 3

Policy:
  - IPU: [ FSYNC ]

Utilization: 83% (202224148 valid blocks, 26 discard blocks)
  - Node: 21888175 (Inode: 21757478, Other: 130697)
  - Data: 180335973
  - Inline_xattr Inode: 39665
  - Inline_data Inode: 35530
  - Inline_dentry Inode: 1133
  - Compressed Inode: 0, Blocks: 0
  - Swapfile Inode: 0
  - Orphan/Append/Update Inode: 0, 0, 0

Main area: 475548 segs, 475548 secs 475548 zones
    TYPE           segno   secno  zoneno dirty_seg full_seg valid_blk
  - COLD data:    371394  371394  371394         4   125054  64029544
  - WARM data:     62872   62872   62872         2   213588 109357995
  - HOT data:      56416   56416   56416         1    13571   6948434
  - Dir dnode:     38945   38945   38945        15     3828   1967280
  - File dnode:    26083   26083   26083        25    38827  19891815
  - Indir nodes:   32035   32035   32035         1       56     29080
  - Pinned file:      -1      -1      -1
  - ATGC data:        -1      -1      -1

  - Valid: 394930
  - Dirty: 42
  - Prefree: 0
  - Free: 80576 (80576)

CP calls: 2507 (BG: 9077)
  - cp blocks : 17756
  - sit blocks : 12427
  - nat blocks : 92338
  - ssa blocks : 8736
CP merge:
  - Queued : 0
  - Issued : 9091
  - Total : 9091
  - Cur time : 0(ms)
  - Peak time : 1095(ms)
GC calls: 2192 (BG: 8833)
  - data segments : 781 (781)
  - node segments : 1411 (1411)
  - Reclaimed segs :
    - Normal : 2192
    - Idle CB : 0
    - Idle Greedy : 0
    - Idle AT : 0
    - Urgent High : 0
    - Urgent Mid : 0
    - Urgent Low : 0
Try to move 816343 blocks (BG: 816343)
  - data blocks : 241539 (241539)
  - node blocks : 574804 (574804)
BG skip : IO: 65, Other: 0

Extent Cache (Read):
  - Hit Count: L1-1:154989 L1-2:162385 L2:444
  - Hit Ratio: 64% (317818 / 495270)
  - Inner Struct Count: tree: 38118(0), node: 661

Extent Cache (Block Age):
  - Allocated Data Blocks: 3447097
  - Hit Count: L1:0 L2:0
  - Hit Ratio: 0% (0 / 0)
  - Inner Struct Count: tree: 0(0), node: 0

Balancing F2FS Async:
  - DIO (R: 0, W: 0)
  - IO_R (Data: 0, Node: 0, Meta: 0
  - IO_W (CP: 0, Data: 0, Flush: ( 0 0 1), Discard: ( 0 25545)) cmd: 0 undiscard: 0
  - atomic IO: 0 (Max. 0)
  - compress: 0, hit: 0
  - nodes: 0 in 114151
  - dents: 0 in dirs: 0 ( 0)
  - datas: 0 in files: 0
  - quota datas: 0 in quota files: 0
  - meta: 0 in 31810
  - imeta: 0
  - fsync mark: 0
  - NATs: 0/ 99571
  - SITs: 0/ 475548
  - free_nids: 10557/ 26800462
  - alloc_nids: 0

Distribution of User Blocks: [ valid | invalid | free ]
  [-----------------------------------------|-|--------]

IPU: 55 blocks
SSR: 0 blocks in 0 segments
LFS: 4473987 blocks in 8736 segments

BDF: 99, avg. vblocks: 480

Memory: 705967 KB
  - static: 115434 KB
  - cached all: 6689 KB
  - read extent cache: 3322 KB
  - block age extent cache: 0 KB
  - paged : 583844 KB

=====[ partition info(nvme1n1p3). #0, RW, CP: Good]=====
[SBI: fs_dirty recovered]
[SB: 1] [CP: 2] [SIT: 68] [NAT: 52] [SSA: 1861] [MAIN: 950830(OverProv:2765 Resv:1341)]

Current Time Sec: 5410 / Mounted Time Sec: 4

Utilization: 22% (108652929 valid blocks, 362420605 discard blocks)
  - Node: 5629348 (Inode: 5542909, Other: 86439)
  - Data: 103023581
  - Inline_xattr Inode: 655752
  - Inline_data Inode: 259900
  - Inline_dentry Inode: 193000
  - Compressed Inode: 0, Blocks: 0
  - Swapfile Inode: 0
  - Orphan/Append/Update Inode: 52, 0, 0

Main area: 950830 segs, 950830 secs 950830 zones
    TYPE           segno   secno  zoneno dirty_seg full_seg valid_blk
  - COLD data:    292597  292597  292597       128    50139  25718354
  - WARM data:    685385  685385  685385      8937   142173  74591050
  - HOT data:      21291   21291   21291      1372     4332   2689025
  - Dir dnode:     23471   23471   23471      1041      110    476470
  - File dnode:    21255   21255   21255     11021     3507   5149194
  - Indir nodes:   22875   22875   22875        40        1      3683
  - Pinned file:      -1      -1      -1
  - ATGC data:        -1      -1      -1

  - Valid: 200268
  - Dirty: 22533
  - Prefree: 0
  - Free: 728029 (728029)

CP calls: 1736 (BG: 2664)
  - cp blocks : 8821
  - sit blocks : 12481
  - nat blocks : 29761
  - ssa blocks : 31070
CP merge:
  - Queued : 0
  - Issued : 3280
  - Total : 3350
  - Cur time : 0(ms)
  - Peak time : 61(ms)
GC calls: 0 (BG: 0)
  - data segments : 0 (0)
  - node segments : 0 (0)
  - Reclaimed segs :
    - Normal : 0
    - Idle CB : 0
    - Idle Greedy : 0
    - Idle AT : 0
    - Urgent High : 0
    - Urgent Mid : 0
    - Urgent Low : 0
Try to move 0 blocks (BG: 0)
  - data blocks : 0 (0)
  - node blocks : 0 (0)
BG skip : IO: 88, Other: 0

Extent Cache (Read):
  - Hit Count: L1-1:163381 L1-2:3828 L2:2956
  - Hit Ratio: 2% (170165 / 7168533)
  - Inner Struct Count: tree: 453025(0), node: 193727

Extent Cache (Block Age):
  - Allocated Data Blocks: 15227950
  - Hit Count: L1:0 L2:0
  - Hit Ratio: 0% (0 / 0)
  - Inner Struct Count: tree: 0(0), node: 0

Balancing F2FS Async:
  - DIO (R: 0, W: 0)
  - IO_R (Data: 0, Node: 0, Meta: 0
  - IO_W (CP: 0, Data: 0, Flush: ( 0 3920 1), Discard: ( 0 550)) cmd: 23117 undiscard:15649654
  - atomic IO: 0 (Max. 2)
  - compress: 0, hit: 0
  - nodes: 52 in 1200554
  - dents: 4 in dirs: 1 ( 16)
  - datas: 1796 in files: 0
  - quota datas: 0 in quota files: 0
  - meta: 2 in 63898
  - imeta: 9
  - fsync mark: 5
  - NATs: 18/ 99583
  - SITs: 90/ 950830
  - free_nids: 427597/ 427597
  - alloc_nids: 0

Distribution of User Blocks: [ valid | invalid | free ]
  [-----------|-|--------------------------------------]

IPU: 7883 blocks
SSR: 0 blocks in 0 segments
LFS: 15907965 blocks in 31070 segments

BDF: 98, avg. vblocks: 270

Memory: 5343760 KB
  - static: 217726 KB
  - cached all: 68225 KB
  - read extent cache: 52553 KB
  - block age extent cache: 0 KB
  - paged : 5057808 KB

=====[ partition info(sda20). #0, RW, CP: Good]=====
[SBI: fs_dirty recovered quota_need_flush]
[SB: 1] [CP: 2] [SIT: 8] [NAT: 112] [SSA: 180] [MAIN: 91857(OverProv:862 Resv:433)]

Current Time Sec: 774835 / Mounted Time Sec: 3

Policy:
  - IPU: [ FSYNC ]

Utilization: 79% (37048389 valid blocks, 430 discard blocks)
  - Node: 719422 (Inode: 697722, Other: 21700)
  - Data: 36328967
  - Inline_xattr Inode: 445204
  - Inline_data Inode: 195233
  - Inline_dentry Inode: 51879
  - Compressed Inode: 0, Blocks: 0
  - Swapfile Inode: 0
  - Orphan/Append/Update Inode: 368, 0, 0

Main area: 91857 segs, 91857 secs 91857 zones
    TYPE           segno   secno  zoneno dirty_seg full_seg valid_blk
  - COLD data:     91264   91264   91264      1893    42822  22726490
  - WARM data:      8424    8424    8424       766    25037  12909190
  - HOT data:       7453    7453    7453       538     1282    692980
  - Dir dnode:      8372    8372    8372       192      111    111076
  - File dnode:     8051    8051    8051      1309      393    608259
  - Indir nodes:     256     256     256         1        0        85
  - Pinned file:      -1      -1      -1
  - ATGC data:        -1      -1      -1

  - Valid: 69651
  - Dirty: 4693
  - Prefree: 0
  - Free: 17513 (17513)

CP calls: 130942 (BG: 2181)
  - cp blocks : 662119
  - sit blocks : 874869
  - nat blocks : 3043527
  - ssa blocks : 136941
CP merge:
  - Queued : 0
  - Issued : 134360
  - Total : 134806
  - Cur time : 22(ms)
  - Peak time : 525(ms)
GC calls: 30425 (BG: 30455)
  - data segments : 21321 (21321)
  - node segments : 9104 (9104)
  - Reclaimed segs :
    - Normal : 417
    - Idle CB : 0
    - Idle Greedy : 0
    - Idle AT : 0
    - Urgent High : 30008
    - Urgent Mid : 0
    - Urgent Low : 0
Try to move 3257936 blocks (BG: 3257936)
  - data blocks : 2470378 (2470378)
  - node blocks : 787558 (787558)
BG skip : IO: 7577, Other: 36953

Extent Cache (Read):
  - Hit Count: L1-1:13680740 L1-2:907686 L2:450749
  - Hit Ratio: 14% (15039175 / 105839291)
  - Inner Struct Count: tree: 369427(0), node: 447

Extent Cache (Block Age):
  - Allocated Data Blocks: 36567055
  - Hit Count: L1:0 L2:0
  - Hit Ratio: 0% (0 / 0)
  - Inner Struct Count: tree: 0(0), node: 0

Balancing F2FS Async:
  - DIO (R: 0, W: 0)
  - IO_R (Data: 0, Node: 0, Meta: 0
  - IO_W (CP: 0, Data: 0, Flush: ( 0 0 1), Discard: ( 0 406517)) cmd: 28565 undiscard:110127
  - atomic IO: 0 (Max. 8)
  - compress: 0, hit: 0
  - nodes: 6 in 2914
  - dents: 1 in dirs: 1 ( 43)
  - datas: 505 in files: 0
  - quota datas: 1 in quota files: 3
  - meta: 0 in 714
  - imeta: 6
  - fsync mark: 5
  - NATs: 12/ 31418
  - SITs: 7/ 91857
  - free_nids: 10902/ 12326330
  - alloc_nids: 0

Distribution of User Blocks: [ valid | invalid | free ]
  [---------------------------------------|--|---------]

IPU: 3055488 blocks
SSR: 1881000 blocks in 52764 segments
LFS: 41233030 blocks in 80532 segments

BDF: 98, avg. vblocks: 295

Memory: 73523 KB
  - static: 22853 KB
  - cached all: 36158 KB
  - read extent cache: 31779 KB
  - block age extent cache: 0 KB
  - paged : 14512 KB

Thanks,

> Thanks,
>
> >
> > This was a lengthy email, but I hope I was being reasonable.
> >
> > Jaegeuk and Chao, let me know what you think.
> > And as always, thanks for your hard work :)
> >
> > Thanks,
> > regards

_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
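
On the question of how much space the NAT occupies: one rough way to read it off the debugfs dumps above is to convert the bracketed header values into sizes. The sketch below is only an illustration, not something taken from f2fs-tools. It assumes that the [NAT: n] and [MAIN: n] fields are segment counts and that every segment is the default 2 MiB (512 blocks of 4 KiB). The helper name nat_overhead is made up for this example.

```python
# Segment counts copied from the "[NAT: n] ... [MAIN: n(...)]" header line
# of each debugfs dump in this mail.
# Assumption: default f2fs geometry, i.e. 512 blocks x 4 KiB = 2 MiB per segment.
SEGMENT_MIB = 2

DUMPS = {
    "nvme0n1p4": {"nat_segs": 460, "main_segs": 522763},
    "nvme0n1p3": {"nat_segs": 418, "main_segs": 474781},
    "nvme1n1p1": {"nat_segs": 418, "main_segs": 475548},
    "nvme1n1p3": {"nat_segs": 52, "main_segs": 950830},
    "sda20": {"nat_segs": 112, "main_segs": 91857},
}


def nat_overhead(nat_segs, main_segs):
    """Return (NAT area size in MiB, NAT segments as a fraction of main-area segments)."""
    nat_mib = nat_segs * SEGMENT_MIB
    ratio = nat_segs / main_segs
    return nat_mib, ratio


if __name__ == "__main__":
    for name, segs in DUMPS.items():
        nat_mib, ratio = nat_overhead(**segs)
        print(f"{name}: NAT area {nat_mib} MiB, {ratio:.3%} of main area")
```

Under those assumptions, nvme0n1p4 comes out at 920 MiB of NAT area and nvme1n1p3 at 104 MiB, so the on-disk cost is under 1 GiB on each of these systems.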