On Fri, Jun 07, 2024 at 02:10:05PM -0400, Kent Overstreet wrote:
> Does the following patch help? I think the hammering on the key cache
> lock may be correlated with the key cache being mostly empty (and it
> looks like the shrinker code is behaving badly and trying very hard to
> free from a mostly empty cache)
> 
> [snip the patch]

I see you committed the patch along with some other stuff.

I git pulled to 46eb7b6c7420c2313bde44ab8f74f303b042e754 ("bcachefs: add
might_sleep() annotations for fsck_err()").

For me the problem is still there.

This time around I populated the fs and *rebooted*, then did walktrees
test in a loop. So the first run has 0 state cached.

It looked like this:
# while true; do sh walktrees /testfs 20; done
3.17s user 605.22s system 1409% cpu 43.150 total
2.77s user 887.92s system 693% cpu 2:08.34 total
3.05s user 726.20s system 777% cpu 1:33.77 total
3.19s user 600.90s system 1476% cpu 40.904 total
3.10s user 956.80s system 599% cpu 2:40.19 total
3.15s user 575.78s system 1331% cpu 43.464 total
3.30s user 865.48s system 1386% cpu 1:02.64 total
2.79s user 470.09s system 1404% cpu 33.666 total
2.78s user 884.00s system 718% cpu 2:03.36 total
2.95s user 568.63s system 1439% cpu 39.714 total
3.02s user 964.90s system 729% cpu 2:12.72 total
2.95s user 829.68s system 703% cpu 1:58.44 total
2.77s user 597.92s system 1486% cpu 40.403 total

So the time varies wildly and there is tons of off cpu time -- with 20
workers eating 100% cpu for the duration it would be 2000%.

I also started seeing splats in dmesg (example at the end of the mail).

That is to say I would suggest you also leave it running in a loop.
Expected setting is that it approaches complete CPU usage for the period
and takes in the ballpark of 40s (well shorter on your hw, but you get
the idea).

It may be there is something in my config which makes this easier to run
into, you can find it here:
https://people.freebsd.org/~mjg/.linux-crap/.config

[everything that's built is =y, as in no modules, usable with direct
boot with kvm without kernel-specific initrd]

trace:
[  135.911692] btree trans held srcu lock (delaying memory reclaim) for 23 
seconds
[  135.911712] WARNING: CPU: 4 PID: 1011 at fs/bcachefs/btree_iter.c:2996 
bch2_trans_srcu_unlock+0x108/0x120
[  135.911732] Modules linked in:
[  135.911735] CPU: 4 PID: 1011 Comm: walktree Not tainted 
6.10.0-rc2-00019-g46eb7b6c7420 #25
[  135.911737] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
rel-1.16.1-0-g3208b098f51a-prebuilt.qemu.org 04/01/2014
[  135.911739] RIP: 0010:bch2_trans_srcu_unlock+0x108/0x120
[  135.911741] Code: 6e 8d 47 01 48 c7 c7 10 84 25 9f 48 b8 cf f7 53 e3 a5 9b 
c4 20 48 29 ca 48 d1 ea 48 f7 e2 48 89 d6 48 c1 ee 04 e8 68 61 a1 ff <0f> 0b e9 
68 ff ff ff 0f 0b e9 77 ff ff ff 66 2e 0f 1f 84 00 00
00
[  135.911743] RSP: 0018:ff8d1078010d3b08 EFLAGS: 00010286
[  135.911746] RAX: 0000000000000000 RBX: ff4d738ec7d78000 RCX: 0000000000000027
[  135.911747] RDX: ff4d739427221788 RSI: 0000000000000001 RDI: ff4d739427221780
[  135.911748] RBP: ff4d73903b2c0000 R08: 0000000000000000 R09: 0000000000000003
[  135.911749] R10: ff8d1078010d39a8 R11: ff4d73943fd8a8a8 R12: fffffffffffff77a
[  135.911750] R13: ff4d738ec7d78000 R14: ff4d738e33d40780 R15: ff4d738ec7d78000
[  135.911751] FS:  00007fee88948740(0000) GS:ff4d739427200000(0000) 
knlGS:0000000000000000
[  135.911753] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  135.911754] CR2: 00007f3273d214e0 CR3: 0000000667376006 CR4: 0000000000371ef0
[  135.911758] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  135.911759] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[  135.911760] Call Trace:
[  135.911766]  <TASK>
[  135.911769]  ? __warn+0x80/0x120
[  135.911773]  ? bch2_trans_srcu_unlock+0x108/0x120
[  135.911775]  ? report_bug+0x164/0x190
[  135.911779]  ? handle_bug+0x3c/0x80
[  135.911782]  ? exc_invalid_op+0x17/0x70
[  135.911784]  ? asm_exc_invalid_op+0x1a/0x20
[  135.911789]  ? bch2_trans_srcu_unlock+0x108/0x120
[  135.911791]  ? bch2_trans_srcu_unlock+0x108/0x120
[  135.911792]  ? bch2_trans_begin+0xf8/0x5d0
[  135.911794]  bch2_trans_begin+0x56b/0x5d0
[  135.911796]  bch2_lookup+0x77/0x180
[  135.911801]  __lookup_slow+0x82/0x130
[  135.911808]  walk_component+0xdb/0x150
[  135.911811]  path_lookupat+0x6a/0x1a0
[  135.911813]  filename_lookup+0xcc/0x1c0
[  135.911816]  ? _raw_spin_unlock+0xe/0x30
[  135.911819]  ? list_lru_add+0xb3/0x100
[  135.911822]  ? from_kgid_munged+0x12/0x30
[  135.911825]  vfs_statx+0x80/0x1b0
[  135.911829]  vfs_fstatat+0x85/0xb0
[  135.911831]  __do_sys_newfstatat+0x26/0x60
[  135.911835]  ? syscall_exit_to_user_mode+0x5b/0x1e0
[  135.911837]  ? clear_bhb_loop+0x45/0xa0
[  135.911838]  do_syscall_64+0x57/0x160
[  135.911842]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  135.911845] RIP: 0033:0x7fee88a4275a
[  135.911848] Code: 48 89 f2 b9 00 01 00 00 48 89 fe bf 9c ff ff ff e9 0b 00 
00 00 66 2e 0f 1f 84 00 00 00 00 00 90 41 89 ca b8 06 01 00 00 0f 05 <3d> 00 f0 
ff ff 77 07 31 c0 c3 0f 1f 40 00 48 8b 15 71 a6 0d 00
f7
[  135.911849] RSP: 002b:00007fff556a8a18 EFLAGS: 00000246 ORIG_RAX: 
0000000000000106
[  135.911851] RAX: ffffffffffffffda RBX: 00007fff556a8ac0 RCX: 00007fee88a4275a
[  135.911852] RDX: 00007fff556a8a30 RSI: 00007fff556a8ac0 RDI: 00000000ffffff9c
[  135.911853] RBP: 0000000000000263 R08: 0000000000000000 R09: 0000000000000064
[  135.911855] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff556a8a30
[  135.911856] R13: 000055ef2c27f013 R14: 00000000000003e8 R15: 00000000000002df
[  135.911858]  </TASK>



Reply via email to