Hello,

Note: I've also followed up on the LKML with this inquiry, but as Debian stable uses an older kernel (6.1.0), I was wondering if anyone on this list has run into this problem.

Kernel: 6.1.0-17-amd64
Distribution: Debian stable
Arch: x86_64

I have 2 NVMe drives as part of a BTRFS RAID-1. The first time this happened, I added the following to the kernel cmdline at boot:

nvme_core.default_ps_max_latency_us=0 pcie_aspm=off

This greatly reduced the frequency of the issue (last uptime was ~70 days). However, it has occurred twice since then; this time I had netconsole up to capture the crash.

The full kernel netconsole output from before, during, and after the crash:
https://installkernel.tripod.com/20240701-6.1.0-crash.txt

The model & firmware version of both drives are identical:
Model Number: Samsung SSD 990 PRO with Heatsink 4TB
Firmware Version: 4B2QJXD7

Motherboard being used:
Manufacturer: ASUSTeK COMPUTER INC.
Product Name: Pro WS W680-ACE IPMI

Is there a workaround or potential fix for this issue?

The issue starts when this occurs:

[6078737.345641] nvme nvme2: I/O 154 (I/O Cmd) QID 6 timeout, aborting
[6078737.348143] nvme nvme2: I/O 155 (I/O Cmd) QID 6 timeout, aborting

Then later, a kernel panic:

[6078894.702941] BTRFS error (device nvme0n1p2): error writing primary super block to device 2
[6078894.707920] BTRFS warning (device nvme0n1p2): csum hole found for disk bytenr range [3659038877598419968, 3659038877598424064)
[6078894.708310] BTRFS critical (device nvme0n1p2): unable to find chunk map for logical 3659038877598419968 length 4096
[6078894.708652] BUG: kernel NULL pointer dereference, address: 000000000000005a
[6078894.708879] #PF: supervisor read access in kernel mode
[6078894.709107] #PF: error_code(0x0000) - not-present page
[6078894.709292] PGD 0 P4D 0
[6078894.709509] Oops: 0000 [#1] PREEMPT SMP NOPTI
[6078894.709692] CPU: 12 PID: 3349611 Comm: kworker/u64:18 Not tainted 6.1.0-17-amd64 #1 Debian 6.1.69-1
[6078894.709856] Hardware name: ASUSTeK COMPUTER INC. System Product Name/Pro WS W680-ACE IPMI, BIOS 3401 03/19/2024
[6078894.710022] Workqueue: btrfs-endio btrfs_end_bio_work [btrfs]
[6078894.710267] RIP: 0010:btrfs_get_io_geometry+0x13/0xf0 [btrfs]
[6078894.710483] Code: f4 ff ff ff e9 67 ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 0f 1f 44 00 00 41 56 49 89 c9 48 89 cf 41 55 41 54 55 53 <4c> 8b 76 70 89 d3 31 d2 4c 8b 5e 18 41 8b 4e 10 45 8b 6e 14 4d 29
[6078894.710692] RSP: 0018:ffffa9cfc6657c08 EFLAGS: 00010286
[6078894.710711] BTRFS error (device nvme0n1p2): error writing primary super block to device 2
[6078894.710876] RAX: ffffffffffffffea RBX: ffffffffffffffea RCX: 32c7847906990c00
[6078894.710882] RDX: 0000000000000000 RSI: ffffffffffffffea RDI: 32c7847906990c00
[6078894.710882] RBP: ffffa9cfc6657d28 R08: ffffa9cfc6657cc8 R09: 32c7847906990c00
[6078894.710882] R10: 0000000000000003 R11: ffff9a3efff6dc28 R12: ffff9a2018195000
[6078894.710883] R13: 0000000000000001 R14: 0000000000001000 R15: ffffa9cfc6657d50
[6078894.710884] FS: 0000000000000000(0000) GS:ffff9a3e7fb00000(0000) knlGS:0000000000000000
[6078894.710884] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[6078894.710885] CR2: 000000000000005a CR3: 0000000bfc210000 CR4: 0000000000750ee0
[6078894.710885] PKRU: 55555554
[6078894.710885] Call Trace:
[6078894.710887] <TASK>
[6078894.710891] ? page_fault_oops+0xd2/0x2b0
[6078894.710889] ? __die_body.cold+0x1a/0x1f
[6078894.710893] ? exc_page_fault+0x70/0x170
[6078894.715724] ? asm_exc_page_fault+0x22/0x30
[6078894.716084] ? btrfs_get_io_geometry+0x13/0xf0 [btrfs]
[6078894.716470] BTRFS error (device nvme0n1p2): error writing primary super block to device 2
[6078894.716462] ? btrfs_get_chunk_map.cold+0x15/0x42 [btrfs]
[6078894.717384] __btrfs_map_block+0xc4/0xe40 [btrfs]
[6078894.717771] ? kmem_cache_free+0x15/0x310
[6078894.718147] btrfs_submit_bio+0xa2/0x240 [btrfs]
[6078894.718571] btrfs_repair_one_sector+0x29f/0x3a0 [btrfs]
[6078894.718972] ? btrfs_submit_data_write_bio+0x110/0x110 [btrfs]
[6078894.719364] end_compressed_bio_read+0x118/0x2f0 [btrfs]
[6078894.719753] process_one_work+0x1c4/0x380
[6078894.720135] worker_thread+0x4d/0x380
[6078894.720510] ? rescuer_thread+0x3a0/0x3a0
[6078894.720864] kthread+0xd7/0x100
[6078894.721224] ? kthread_complete_and_exit+0x20/0x20
[6078894.721579] ret_from_fork+0x1f/0x30
[6078894.721984] </TASK>

Justin