Public bug reported:

After upgrading to jammy kernel 5.15.0-144-generic we encountered a serious regression when the weekly fstrim timer ran.

This bug was introduced by commit "md/raid10: fix missing discard IO accounting"
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=4a05f7ae33716d996c5ce56478a36a3ede1d76f2
which was backported to all stable kernels and became part of 5.15.181.

The issue was discovered earlier upstream[1] and also in Debian[2], which resulted in a fix being added to the Debian kernel and subsequently into 6.1. However the missing patch[3] did not make it into the 5.15-stable kernel, triggering the regression also in Ubuntu jammy.

[1] https://lists.linaro.org/archives/list/[email protected]/thread/TM2PPS3XKE6M5H2FW63MLZV2T7HTM3QJ/
[2] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1104460
[3] https://lore.kernel.org/all/[email protected]/

dmesg:

kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
kernel: #PF: supervisor instruction fetch in kernel mode
kernel: #PF: error_code(0x0010) - not-present page
kernel: PGD 0 P4D 0
kernel: Oops: 0010 [#1] SMP PTI
kernel: CPU: 5 PID: 784107 Comm: fstrim Not tainted 5.15.0-144-generic #157-Ubuntu
kernel: Hardware name: FUJITSU /D3417-B2, BIOS V5.0.0.12 R1.27.0.SR.1 for D3417-B2x 06/10/2020
kernel: RIP: 0010:0x0
kernel: Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
kernel: RSP: 0018:ffffb576409c7858 EFLAGS: 00010206
kernel: RAX: 0000000000000000 RBX: 0000000000092800 RCX: 0000000000000001
kernel: RDX: ffff8e7e012426f0 RSI: 0000000000000000 RDI: 0000000000092800
kernel: RBP: ffffb576409c78c8 R08: ffff8e884ec966c0 R09: ffff8e7e07c6b050
kernel: R10: 0000000000002ecb R11: 00000000000030c8 R12: 0000000000092c00
kernel: R13: 0000000000000400 R14: ffff8e7e01242708 R15: ffff8e7e10743400
kernel: FS: 00007f6fff9f0800(0000) GS:ffff8e8cee540000(0000) knlGS:0000000000000000
kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: ffffffffffffffd6 CR3: 00000001090f6005 CR4: 00000000003706e0
kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
kernel: Call Trace:
kernel: <TASK>
kernel: mempool_alloc+0x61/0x1b0
kernel: ? __kmalloc+0x179/0x330
kernel: bio_alloc_bioset+0x9d/0x370
kernel: ? r10bio_pool_alloc+0x26/0x30 [raid10]
kernel: bio_clone_fast+0x1f/0x90
kernel: md_account_bio+0x42/0x80
kernel: raid10_handle_discard+0x56f/0x6b0 [raid10]
kernel: raid10_make_request+0x147/0x180 [raid10]
kernel: md_handle_request+0x12a/0x1b0
kernel: ? submit_bio_checks+0x1a5/0x580
kernel: md_submit_bio+0x76/0xc0
kernel: __submit_bio+0x1a2/0x220
kernel: ? mempool_alloc_slab+0x17/0x20
kernel: ? mempool_alloc+0x61/0x1b0
kernel: ? schedule_timeout+0x91/0x140
kernel: __submit_bio_noacct+0x85/0x200
kernel: submit_bio_noacct+0x4e/0x120
kernel: ? __cond_resched+0x1a/0x60
kernel: submit_bio+0x4a/0x130
kernel: submit_bio_wait+0x5a/0xc0
kernel: blkdev_issue_discard+0x7e/0xd0
kernel: ext4_try_to_trim_range+0x2db/0x520
kernel: ? ext4_mb_load_buddy_gfp+0x91/0x3e0
kernel: ext4_trim_fs+0x313/0x510
kernel: __ext4_ioctl+0x82c/0xef0
kernel: ext4_ioctl+0xe/0x20
kernel: __x64_sys_ioctl+0x92/0xd0
kernel: x64_sys_call+0x1e5f/0x1fa0
kernel: do_syscall_64+0x56/0xb0
kernel: entry_SYSCALL_64_after_hwframe+0x6c/0xd6
kernel: RIP: 0033:0x7f6fffc0994f
kernel: Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 >
kernel: RSP: 002b:00007ffdce979c30 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
kernel: RAX: ffffffffffffffda RBX: 00007ffdce979d80 RCX: 00007f6fffc0994f
kernel: RDX: 00007ffdce979ca0 RSI: 00000000c0185879 RDI: 0000000000000003
kernel: RBP: 0000558436acccb0 R08: 0000558436acccb0 R09: 0000000000000000
kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003
kernel: R13: 0000558436accfa0 R14: 0000558436acce80 R15: 0000558436acce80
kernel: </TASK>
kernel: Modules linked in: tls tcp_diag udp_diag inet_diag bridge stp llc nft_counter nft_chain_nat nf_nat >
kernel: xhci_pci_renesas wmi video
kernel: CR2: 0000000000000000
kernel: ---[ end trace db9334d27f904581 ]---
kernel: RIP: 0010:0x0
kernel: Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
kernel: RSP: 0018:ffffb576409c7858 EFLAGS: 00010206
kernel: RAX: 0000000000000000 RBX: 0000000000092800 RCX: 0000000000000001
kernel: RDX: ffff8e7e012426f0 RSI: 0000000000000000 RDI: 0000000000092800
kernel: RBP: ffffb576409c78c8 R08: ffff8e884ec966c0 R09: ffff8e7e07c6b050
kernel: R10: 0000000000002ecb R11: 00000000000030c8 R12: 0000000000092c00
kernel: R13: 0000000000000400 R14: ffff8e7e01242708 R15: ffff8e7e10743400
kernel: FS: 00007f6fff9f0800(0000) GS:ffff8e8cee540000(0000) knlGS:0000000000000000
kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: ffffffffffffffd6 CR3: 00000001090f6005 CR4: 00000000003706e0
kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
kernel: BUG: unable to handle page fault for address: ffffb57600000010

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: Confirmed

** Summary changed:

- {Regression] kernel 5.15.0-144-generic - discard broken with RAID10
+ [Regression] kernel 5.15.0-144-generic - discard broken with RAID10

https://bugs.launchpad.net/bugs/2117395

Title:
  [Regression] kernel 5.15.0-144-generic - discard broken with RAID10

Status in linux package in Ubuntu:
  Confirmed
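
To make the call trace in the dmesg output above easier to read, here is a minimal Python sketch that keeps only the frames the kernel did not mark with "?" (entries marked "?" are addresses found on the stack that the unwinder could not confirm as part of the call chain). The names `FRAME`, `reliable_frames`, and `SAMPLE` are illustrative and not part of any tool; the sample lines are copied from the top of the trace. Read this way, the chain runs from raid10_handle_discard through md_account_bio, bio_clone_fast, and bio_alloc_bioset into mempool_alloc, after which the instruction fetch at address 0 occurs. That would be consistent with a call through a NULL function pointer inside mempool_alloc, but this is an interpretation of the log, not a confirmed root cause.

```python
import re

# One trace entry looks like:
#   kernel: ? r10bio_pool_alloc+0x26/0x30 [raid10]
# i.e. optional "?" marker, symbol+offset/size, optional module name in brackets.
FRAME = re.compile(
    r"kernel:\s+(?P<guess>\?\s+)?"
    r"(?P<func>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def reliable_frames(trace_lines):
    """Return (function, module) pairs for frames not marked with '?'.

    The order of the input is preserved, so the innermost frame comes first.
    `module` is None for symbols that belong to the core kernel image.
    """
    frames = []
    for line in trace_lines:
        match = FRAME.search(line)
        if match and not match.group("guess"):
            frames.append((match.group("func"), match.group("module")))
    return frames


# First eight entries of the call trace from the report.
SAMPLE = """\
kernel: mempool_alloc+0x61/0x1b0
kernel: ? __kmalloc+0x179/0x330
kernel: bio_alloc_bioset+0x9d/0x370
kernel: ? r10bio_pool_alloc+0x26/0x30 [raid10]
kernel: bio_clone_fast+0x1f/0x90
kernel: md_account_bio+0x42/0x80
kernel: raid10_handle_discard+0x56f/0x6b0 [raid10]
kernel: raid10_make_request+0x147/0x180 [raid10]
""".splitlines()

if __name__ == "__main__":
    # Print outermost caller first, innermost callee last.
    for func, module in reversed(reliable_frames(SAMPLE)):
        print(func, f"[{module}]" if module else "")
```

The same function can be fed the full trace (for example, the lines between `<TASK>` and `</TASK>`); register dumps and other non-frame lines do not match the pattern and are skipped.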

