On Fri, Jun 07, 2024 at 06:51:05PM +0200, Mateusz Guzik wrote:
> On Fri, Jun 7, 2024 at 6:28 PM Mateusz Guzik <mjgu...@gmail.com> wrote:
> >
> > On Fri, Jun 7, 2024 at 6:07 PM Kent Overstreet
> > <kent.overstr...@linux.dev> wrote:
> > >
> > > On Fri, Jun 07, 2024 at 12:10:34PM +0200, Mateusz Guzik wrote:
> > > > On Fri, Jun 7, 2024 at 11:13 AM Kent Overstreet
> > > > <kent.overstr...@linux.dev> wrote:
> > > > >
> > > > > On Fri, Jun 07, 2024 at 08:50:40AM +0200, Mateusz Guzik wrote:
> > > > > > On Fri, Jun 7, 2024 at 2:31 AM Kent Overstreet
> > > > > > <kent.overstr...@linux.dev> wrote:
> > > > > > >
> > > > > > > On Thu, Jun 06, 2024 at 08:40:50PM +0200, Mateusz Guzik wrote:
> > > > > > > > So I tried out bcachefs again and it once more fails to complete
> > > > > > > > parallel creation of 20 mln files -- processes doing the work use cpu
> > > > > > > > time in the kernel indefinitely.
> > > > > > >
> > > > > > > Hey thanks for the report - can you try my branch? It's behaving
> > > > > > > reasonably well in my testing now
> > > > > > >
> > > > > > > https://evilpiepirate.org/git/bcachefs.git bcachefs-for-upstream
> > > > > >
> > > > > > No dice, booted up top of your tree: 6.10.0-rc2-00013-gb7d8959f5ce9
> > > > >
> > > > > That commit completely fixes the lock contention on the btree key cache
> > > > > lock I was seeing before, are you sure that's the commit you were
> > > > > running?
> > > >
> > > > Yea, I have CONFIG_LOCALVERSION_AUTO=y and included part of uname -a
> > > > from the running kernel to indicate the commit. That's b7d8959f5ce9
> > > > "bcachefs: Fix reporting of freed objects from key cache shrinker".
> > > >
> > > > I also have CONFIG_BCACHEFS=y
> > > >
> > > > But to put this to bed I added this:
> > > > diff --git a/fs/bcachefs/btree_key_cache.c b/fs/bcachefs/btree_key_cache.c
> > > > index e73162f9af37..459f0871c9a4 100644
> > > > --- a/fs/bcachefs/btree_key_cache.c
> > > > +++ b/fs/bcachefs/btree_key_cache.c
> > > > @@ -821,8 +821,14 @@ static unsigned long bch2_btree_key_cache_scan(struct shrinker *shrink,
> > > >  	size_t scanned = 0, freed = 0, nr = sc->nr_to_scan;
> > > >  	unsigned start, flags;
> > > >  	int srcu_idx;
> > > > +	static bool printed;
> > > >  
> > > >  	mutex_lock(&bc->lock);
> > > > +	if (!printed) {
> > > > +		printk(KERN_EMERG "here\n");
> > > > +		printed = true;
> > > > +	}
> > > > +
> > > >  	bc->requested_to_free += sc->nr_to_scan;
> > > >  
> > > >  	srcu_idx = srcu_read_lock(&c->btree_trans_barrier);
> > > >
> > > > it shows up and my procs are still stuck in the kernel
> > > >
> > > > Is it a problem for you to get a similar setup to mine? It's not very
> > > > outlandish -- 24 cores and 24G of ram. You can probably get away with
> > > > both less cores and ram.
> > >
> > > I've tried with 4 GB of ram and with 24, and giving the vm 24 cores -
> > > your test passes for me, latest run was in 71 seconds.
> > >
> > > Host machine is a ryzen 5950x, 16 physical cores, 32 hyperthreaded - the
> > > lack of 24 physical cores is the only possible difference I can see. But
> > > I'm just not seeing the key cache lock contention anymore - I am seeing
> > > massive lock contention on the inode hash table lock, though.
> > >
> > > Here's a call graph profile of part of the last run. There's still a bit
> > > of key cache lock contention, but it's only 2% of cpu time now.
> > >
> >
> > Huh, it does appear fixed after all.
> >
> > I tried it out again, but this time without any of the debug.
> > It *did* finish, in 3 minutes. I ran it again just to be sure.
> > Also the stalls I was seeing while trying to poke around are gone (for
> > example bpftrace would hang).
> >
> > The kernel with your fixes saddled with all the debug probably needed
> > significantly more than said 3 minutes and I concluded things went bad
> > before it could legitimately finish. Kind of a PEBKAC here.
> >
> > tl;dr I confirm it is fixed, thanks
> >
>
> While things do *work*, the walktree part of the test is very slow
> compared to ext4 and btrfs, both of which do it in less than a
> minute.
> bcachefs repeatedly needs over two.
>
> mkfs.bcachefs /dev/vdb
> mount.bcachefs.sh -o noatime /dev/vdb /testfs
> sh createtrees /testfs 20
> while true; do sh walktrees /testfs 20; done
>
> I get times:
> 2.80s user 986.09s system 697% cpu 2:21.84 total
> 2.77s user 993.15s system 748% cpu 2:13.07 total
> 2.95s user 992.67s system 747% cpu 2:13.26 total
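(The createtrees/walktrees scripts referenced above aren't included in this
thread. For readers trying to follow along, here is a rough, hypothetical
sketch of what a createtrees-style workload might look like - the worker
count, per-worker file count and directory layout are all assumptions, not
the actual test:)

#!/bin/sh
# createtrees sketch: spawn $2 workers, each building its own tree of
# empty files under $1/tree$i. 20 workers x 1M files would match the
# "20 mln files" mentioned earlier; both numbers are assumptions.
FS=$1
NPROC=$2
FILES_PER_WORKER=1000000

i=0
while [ "$i" -lt "$NPROC" ]; do
	(
		mkdir -p "$FS/tree$i"
		j=0
		while [ "$j" -lt "$FILES_PER_WORKER" ]; do
			: > "$FS/tree$i/f$j"	# create an empty file
			j=$((j + 1))
		done
	) &
	i=$((i + 1))
done
wait

(A walktrees-style pass would then presumably stat every file back in
parallel, e.g. one find(1) walk per tree.)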
ext4 gives me

WATCHDOG 300
mke2fs 1.47.1-rc2 (01-May-2024)
Discarding device blocks: done
Creating filesystem with 8386560 4k blocks and 41943040 inodes
Filesystem UUID: 4e72e045-ef0d-4d49-82d8-bc83e853a492
Superblock backups stored on blocks:
	6552, 19656, 32760, 45864, 58968, 163800, 176904, 321048, 530712,
	819000, 1592136, 2247336, 4095000, 4776408

Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

EXT4-fs (vdb): mounted filesystem 4e72e045-ef0d-4d49-82d8-bc83e853a492 r/w with ordered data mode. Quota mode: disabled.

createtrees
real	0m27.552s
user	0m6.678s
sys	4m17.175s

walktrees
real	0m34.830s
user	0m2.266s
sys	10m38.617s

and I get similar from bcachefs - except I did just have a run where
createtrees took ~3m with key cache lock contention spiking again, so it
appears that's not completely fixed.

How are you getting sub 3s?