On Mon, Jun 10, 2024 at 08:44:37PM +0200, Mateusz Guzik wrote:
> On Mon, Jun 10, 2024 at 8:13 PM Kent Overstreet
> <kent.overstr...@linux.dev> wrote:
> >
> > On Sat, Jun 08, 2024 at 11:24:37AM +0200, Mateusz Guzik wrote:
> > > On Fri, Jun 07, 2024 at 02:10:05PM -0400, Kent Overstreet wrote:
> > > > Does the following patch help? I think the hammering on the key cache
> > > > lock may be correlated with the key cache being mostly empty (and it
> > > > looks like the shrinker code is behaving badly and trying very hard to
> > > > free from a mostly empty cache)
> > > >
> > > > [snip the patch]
> > >
> > > I see you committed the patch along with some other stuff.
> > >
> > > I git pulled to 46eb7b6c7420c2313bde44ab8f74f303b042e754 ("bcachefs: add
> > > might_sleep() annotations for fsck_err()").
> > >
> > > For me the problem is still there.
> > >
> > > This time around I populated the fs and *rebooted*, then ran the walktrees
> > > test in a loop, so the first run has no state cached.
> > >
> > > It looked like this:
> > > # while true; do sh walktrees /testfs 20; done
> > > 3.17s user 605.22s system 1409% cpu 43.150 total
> > > 2.77s user 887.92s system 693% cpu 2:08.34 total
> > > 3.05s user 726.20s system 777% cpu 1:33.77 total
> > > 3.19s user 600.90s system 1476% cpu 40.904 total
> > > 3.10s user 956.80s system 599% cpu 2:40.19 total
> > > 3.15s user 575.78s system 1331% cpu 43.464 total
> > > 3.30s user 865.48s system 1386% cpu 1:02.64 total
> > > 2.79s user 470.09s system 1404% cpu 33.666 total
> > > 2.78s user 884.00s system 718% cpu 2:03.36 total
> > > 2.95s user 568.63s system 1439% cpu 39.714 total
> > > 3.02s user 964.90s system 729% cpu 2:12.72 total
> > > 2.95s user 829.68s system 703% cpu 1:58.44 total
> > > 2.77s user 597.92s system 1486% cpu 40.403 total
> > >
> > > So the time varies wildly and there is tons of off-CPU time -- with 20
> > > workers eating 100% CPU for the duration it would be 2000%.
> > >
> > > I also started seeing splats in dmesg (example at the end of the mail).
> >
> > this confirms that the SRCU splats we've been seeing are partly caused
> > by this, as I suspected
> >
> > > That is to say, I would suggest you also leave it running in a loop.
> > > The expected behavior is that it approaches complete CPU usage for the
> > > period and takes in the ballpark of 40s (well, shorter on your hw, but
> > > you get the idea).
> >
> > Yeah, I do see it with more iterations.
> >
> > I'm redoing the key cache to get rid of the key cache lock - that solves
> > it nicely. If you want to take a look at what I'm working on:
> >
> > https://evilpiepirate.org/git/bcachefs.git/log/?h=bcachefs-vfs-inodes-rhashtable
> >
> > This is unfinished - I'm going to be switching back to the polling SRCU
> > interface (so we can allocate from pending frees), and the patch to
> > switch VFS inodes to an rhashtable currently breaks writeback - but it's
> > much faster.
>
> So you are not patching up the VFS inode hash with rhashtables, just
> migrating bcachefs away from it to your own rhashtable variant?
Yeah, doing it for every filesystem will be more involved, since the
lifetime rules of fs/inode.c are tricky - and right now I just want to be
able to profile without inode hash table lock contention blowing
everything up.

> I don't blame you, but this also means I'm going to do some touch-ups
> to my patch and send a v2. I was waiting for the bcachefs (and maybe
> inode hash) situation to resolve itself.

Well, we really should figure out some sort of a plan for fs/inode.c :/