On Mon, Jun 10, 2024 at 08:44:37PM +0200, Mateusz Guzik wrote:
> On Mon, Jun 10, 2024 at 8:13 PM Kent Overstreet
> <kent.overstr...@linux.dev> wrote:
> >
> > On Sat, Jun 08, 2024 at 11:24:37AM +0200, Mateusz Guzik wrote:
> > > On Fri, Jun 07, 2024 at 02:10:05PM -0400, Kent Overstreet wrote:
> > > > Does the following patch help? I think the hammering on the key cache
> > > > lock may be correlated with the key cache being mostly empty (and it
> > > > looks like the shrinker code is behaving badly and trying very hard to
> > > > free from a mostly empty cache)
> > > >
> > > > [snip the patch]
> > >
> > > I see you committed the patch along with some other stuff.
> > >
> > > I git pulled to 46eb7b6c7420c2313bde44ab8f74f303b042e754 ("bcachefs: add
> > > might_sleep() annotations for fsck_err()").
> > >
> > > For me the problem is still there.
> > >
> > > This time around I populated the fs and *rebooted*, then ran the walktrees
> > > test in a loop, so the first run has no state cached.
> > >
> > > It looked like this:
> > > # while true; do sh walktrees /testfs 20; done
> > > 3.17s user 605.22s system 1409% cpu 43.150 total
> > > 2.77s user 887.92s system 693% cpu 2:08.34 total
> > > 3.05s user 726.20s system 777% cpu 1:33.77 total
> > > 3.19s user 600.90s system 1476% cpu 40.904 total
> > > 3.10s user 956.80s system 599% cpu 2:40.19 total
> > > 3.15s user 575.78s system 1331% cpu 43.464 total
> > > 3.30s user 865.48s system 1386% cpu 1:02.64 total
> > > 2.79s user 470.09s system 1404% cpu 33.666 total
> > > 2.78s user 884.00s system 718% cpu 2:03.36 total
> > > 2.95s user 568.63s system 1439% cpu 39.714 total
> > > 3.02s user 964.90s system 729% cpu 2:12.72 total
> > > 2.95s user 829.68s system 703% cpu 1:58.44 total
> > > 2.77s user 597.92s system 1486% cpu 40.403 total
> > >
> > > So the time varies wildly and there is tons of off-CPU time -- with 20
> > > workers eating 100% CPU for the duration it would be 2000%.
> > >
> > > I also started seeing splats in dmesg (example at the end of the mail).
> >
> > this confirms that the SRCU splats we've been seeing are partly caused
> > by this, as I suspected
> >
> > > That is to say, I would suggest you also leave it running in a loop.
> > > The expected behavior is that it approaches complete CPU usage for the
> > > period and takes in the ballpark of 40s (well, shorter on your hw, but
> > > you get the idea).
> >
> > Yeah, I do see it with more iterations.
> >
> > I'm redoing the key cache to get rid of the key cache lock - that solves
> > it nicely. If you want to take a look at what I'm working on:
> >
> > https://evilpiepirate.org/git/bcachefs.git/log/?h=bcachefs-vfs-inodes-rhashtable
> >
> > This is unfinished - I'm going to be switching back to the polling SRCU
> > interface (so we can allocate from pending frees), and the patch to
> > switch VFS inodes to an rhashtable currently breaks writeback - but it's
> > much faster.
>
> So you are not patching up the VFS inode hash with rhashtables, just
> migrating bcachefs away from it to your own rhashtable variant?
Yeah, doing it for every filesystem will be more involved, since the
lifetime rules of fs/inode.c are tricky - and right now I just want to be
able to profile without inode hash table lock contention blowing
everything up.

> I don't blame you, but this also means I'm going to do some touch-ups
> to my patch and send a v2. I was waiting for the bcachefs (and maybe
> inode hash) situation to resolve itself.

Well, we really should figure out some sort of a plan for fs/inode.c :/