[Bug 1929591] Re: MD RAID 6 Periodic Kernel Panic Stack Overflow Double-Fault
Another one today on 5.8.0-55:

[ OK ] Started Hostname Service.
[ OK ] Started User Login Management.
[ OK ] Started Docker Application Container Engine.

Ubuntu 20.10 babylon ttyS0

babylon login:
[ 43.284962] cloud-init[6278]: Cloud-init v. 21.2-3-g899bfaa9-0ubuntu2~20.10.1 running 'modules:config' at Thu, 17 Jun 2021 04:50:23 +. Up 43.23 seconds.
[ 43.555449] cloud-init[6294]: Cloud-init v. 21.2-3-g899bfaa9-0ubuntu2~20.10.1 running 'modules:final' at Thu, 17 Jun 2021 04:50:23 +. Up 43.48 seconds.
[ 43.59] cloud-init[6294]: Cloud-init v. 21.2-3-g899bfaa9-0ubuntu2~20.10.1 finished at Thu, 17 Jun 2021 04:50:23 +. Datasource DataSourceNone. Up 43.55 seconds
[ 43.98] cloud-init[6294]: 2021-06-17 04:50:23,906 - cc_final_message.py[WARNING]: Used fallback datasource
[470667.791418] BUG: stack guard page was hit at 6cd7c52c (stack is b38fb7cf..d2b542d2)
[470667.791418] kernel stack overflow (double-fault): [#1] SMP NOPTI
[470667.791418] CPU: 15 PID: 514 Comm: md0_raid6 Tainted: P OE 5.8.0-55-generic #62-Ubuntu
[470667.791419] Hardware name: Gigabyte Technology Co., Ltd. B550M AORUS PRO/B550M AORUS PRO, BIOS F13h 04/23/2021
[470667.791419] RIP: 0010:slab_free_freelist_hook+0x35/0x120
[470667.791419] Code: 89 d5 41 54 49 89 f4 53 48 89 fb 48 83 ec 08 48 8b 02 4c 8b 36 48 c7 06 00 00 00 00 48 c7 02 00 00 00 00 48 85 c0 49 0f 44 c6 <48> 89 45 d0 eb 06 4c 3b 7d d0 74 5d 8b 53 20 4d 89 f7 49 8d 34 16
[470667.791419] RSP: 0018:9b13808b3ff8 EFLAGS: 00010246
[470667.791420] RAX: 8c43bd86d9c0 RBX: 8c459b407800 RCX: 0001
[470667.791420] RDX: 9b13808b4040 RSI: 9b13808b4038 RDI: 8c459b407800
[470667.791420] RBP: 9b13808b4028 R08: 0001 R09: ae641900
[470667.791421] R10: 8c43bd86d1e0 R11: 0001 R12: 9b13808b4038
[470667.791421] R13: 9b13808b4040 R14: 8c43bd86d9c0 R15: 8c4585ea1070
[470667.791421] FS: () GS:8c459edc() knlGS:
[470667.791421] CS: 0010 DS: ES: CR0: 80050033
[470667.791422] CR2: 9b13808b3fe8 CR3: 0007948a8000 CR4: 00340ee0
[470667.791422] Call Trace:
[470667.791422] ? mempool_kfree+0xe/0x10
[470667.791422] ? kfree+0xb8/0x220
[470667.791422] ? mempool_kfree+0xe/0x10
[470667.791422] ? mempool_free+0x2f/0x80
[470667.791422] ? md_end_io+0x4b/0x70
[470667.791422] ? bio_endio+0xe6/0x150
[470667.791423] ? bio_chain_endio+0x2d/0x40
[470667.791423] ? md_end_io+0x5d/0x70
[470667.791423] ? bio_endio+0xe6/0x150
[470667.791423] ? bio_chain_endio+0x2d/0x40
[470667.791423] ? md_end_io+0x5d/0x70
[470667.791423] ? bio_endio+0xe6/0x150
[470667.791424] ? bio_chain_endio+0x2d/0x40
[470667.791424] ? md_end_io+0x5d/0x70
[470667.791424] ? bio_endio+0xe6/0x150
[470667.791424] ? bio_chain_endio+0x2d/0x40
[470667.791424] ? md_end_io+0x5d/0x70
[470667.791424] ? bio_endio+0xe6/0x150
[470667.791424] ? bio_chain_endio+0x2d/0x40
[470667.791424] ? md_end_io+0x5d/0x70
[470667.791425] ? bio_endio+0xe6/0x150
[470667.791425] ? bio_chain_endio+0x2d/0x40
[470667.791425] ? md_end_io+0x5d/0x70
[470667.791425] ? bio_endio+0xe6/0x150
[470667.791425] ? bio_chain_endio+0x2d/0x40
[470667.791425] ? md_end_io+0x5d/0x70
[470667.791425] ? bio_endio+0xe6/0x150
[470667.791425] ? bio_chain_endio+0x2d/0x40
[470667.791426] ? md_end_io+0x5d/0x70
[470667.791426] ? bio_endio+0xe6/0x150
[470667.791426] ? bio_chain_endio+0x2d/0x40
[470667.791426] ? md_end_io+0x5d/0x70
[470667.791426] ? bio_endio+0xe6/0x150
[470667.791426] ? bio_chain_endio+0x2d/0x40
[470667.791426] ? md_end_io+0x5d/0x70
[470667.791427] ? bio_endio+0xe6/0x150
[470667.791427] ? bio_chain_endio+0x2d/0x40
[470667.791427] ? md_end_io+0x5d/0x70
[470667.791427] ? bio_endio+0xe6/0x150
[470667.791427] ? bio_chain_endio+0x2d/0x40
[470667.791427] ? md_end_io+0x5d/0x70
[470667.791427] ? bio_endio+0xe6/0x150
[470667.791427] ? bio_chain_endio+0x2d/0x40
[470667.791428] ? md_end_io+0x5d/0x70
[470667.791428] ? bio_endio+0xe6/0x150
[470667.791428] ? bio_chain_endio+0x2d/0x40
[470667.791428] ? md_end_io+0x5d/0x70
[470667.791428] ? bio_endio+0xe6/0x150
[470667.791428] ? bio_chain_endio+0x2d/0x40
[470667.791428] ? md_end_io+0x5d/0x70
[470667.791429] ? bio_endio+0xe6/0x150
[470667.791429] ? bio_chain_endio+0x2d/0x40
[470667.791429] ? md_end_io+0x5d/0x70
[470667.791429] ? bio_endio+0xe6/0x150
[470667.791429] ? bio_chain_endio+0x2d/0x40
[470667.791429] ? md_end_io+0x5d/0x70
[470667.791429] ? bio_endio+0xe6/0x150
[470667.791429] ? bio_chain_endio+0x2d/0x40
[470667.791430] ? md_end_io+0x5d/0x70
[470667.791430] ? bio_endio+0xe6/0x150
[470667.791430] ? bio_chain_endio+0x2d/0x40
[470667.791430] ? md_end_io+0x5d/0x70
[470667.791430] ? bio_endio+0xe6/0x150
[470667.791430] ? bio_chain_endio+0x2d/0x40
[470667.791430] ? md_end_io+0x5d/0x70
[470667.791430] ? bio_endio+0xe6/0x150
[470667.791431] ? bio_chain
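A rough way to gauge how deep the completion recursion went before the guard page was hit is to count the call-trace entries per function. The sketch below is a hypothetical helper (the names `frame_counts` and `FRAME_RE` are illustrative, not part of any kernel tool). It assumes the capture has been pasted as plain text containing a single panic, and it treats every `function+0xOFF/0xSIZE` token after `Call Trace:` as one frame. Frames printed with a leading `?` are ones the unwinder could not verify, so the counts are indicative rather than exact.

```python
import re
from collections import Counter

# A call-trace entry looks like "? md_end_io+0x5d/0x70": function name,
# hex offset into the function, and hex function size. The "? " prefix is
# not part of the pattern, so verified and unverified frames both count.
FRAME_RE = re.compile(r"\b([A-Za-z_]\w*)\+0x[0-9a-f]+/0x[0-9a-f]+")


def frame_counts(panic_text: str) -> Counter:
    """Count how often each function name appears after "Call Trace:".

    Pass the text of ONE panic (hypothetical helper, assumption: a second
    panic in the same string would be mixed into the counts). Entries cut
    off mid-token, such as a trailing "? bio_chain", do not match the
    pattern and are skipped, so counts for a truncated capture are lower
    bounds on the real depth.
    """
    _, found, trace = panic_text.partition("Call Trace:")
    if not found:
        # No call trace in the input: nothing to count.
        return Counter()
    return Counter(FRAME_RE.findall(trace))
```

For example, `frame_counts(capture).most_common(3)` on any capture in this thread is dominated by the `bio_endio` / `bio_chain_endio` / `md_end_io` trio. Because every capture here is cut off, those numbers are lower bounds.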
[Bug 1929591] Re: MD RAID 6 Periodic Kernel Panic Stack Overflow Double-Fault
Hey so this is totally still happening on kernel 5.8.0-53. Just got this serial console capture:

babylon login:
[1457468.880947] BUG: stack guard page was hit at 7aef1a4a (stack is af9c61cd..7ccda653)
[1457468.880948] kernel stack overflow (double-fault): [#1] SMP NOPTI
[1457468.880948] CPU: 3 PID: 512 Comm: md0_raid6 Tainted: P OE 5.8.0-53-generic #60-Ubuntu
[1457468.880949] Hardware name: Gigabyte Technology Co., Ltd. B550M AORUS PRO/B550M AORUS PRO, BIOS F13h 04/23/2021
[1457468.880949] RIP: 0010:slab_free_freelist_hook+0x35/0x120
[1457468.880950] Code: 89 d5 41 54 49 89 f4 53 48 89 fb 48 83 ec 08 48 8b 02 4c 8b 36 48 c7 06 00 00 00 00 48 c7 02 00 00 00 00 48 85 c0 49 0f 44 c6 <48> 89 45 d0 eb 06 4c 3b 7d d0 74 5d 8b 53 20 4d 89 f7 49 8d 34 16
[1457468.880951] RSP: 0018:bcda805efff8 EFLAGS: 00010246
[1457468.880952] RAX: 9bfb8ccc42a0 RBX: 9bfcdb407800 RCX: 0001
[1457468.880952] RDX: bcda805f0040 RSI: bcda805f0038 RDI: 9bfcdb407800
[1457468.880953] RBP: bcda805f0028 R08: 0001 R09: 90841600
[1457468.880953] R10: 9bfb8ccc4f40 R11: 0001 R12: bcda805f0038
[1457468.880953] R13: bcda805f0040 R14: 9bfb8ccc42a0 R15: 9bf766967940
[1457468.880954] FS: () GS:9bfcdeac() knlGS:
[1457468.880954] CS: 0010 DS: ES: CR0: 80050033
[1457468.880955] CR2: bcda805effe8 CR3: 0003a65ee000 CR4: 00340ee0
[1457468.880955] Call Trace:
[1457468.880955] ? mempool_kfree+0xe/0x10
[1457468.880956] ? kfree+0xb8/0x220
[1457468.880956] ? mempool_kfree+0xe/0x10
[1457468.880956] ? mempool_free+0x2f/0x80
[1457468.880956] ? md_end_io+0x4b/0x70
[1457468.880957] ? bio_endio+0xe6/0x150
[1457468.880957] ? bio_chain_endio+0x2d/0x40
[1457468.880957] ? md_end_io+0x5d/0x70
[1457468.880958] ? bio_endio+0xe6/0x150
[1457468.880958] ? bio_chain_endio+0x2d/0x40
[1457468.880958] ? md_end_io+0x5d/0x70
[1457468.880959] ? bio_endio+0xe6/0x150
[1457468.880959] ? bio_chain_endio+0x2d/0x40
[1457468.880959] ? md_end_io+0x5d/0x70
[1457468.880959] ? bio_endio+0xe6/0x150
[1457468.880960] ? bio_chain_endio+0x2d/0x40
[1457468.880960] ? md_end_io+0x5d/0x70
[1457468.880960] ? bio_endio+0xe6/0x150
[1457468.880960] ? bio_chain_endio+0x2d/0x40
[1457468.880961] ? md_end_io+0x5d/0x70
[1457468.880961] ? bio_endio+0xe6/0x150
[1457468.880961] ? bio_chain_endio+0x2d/0x40
[1457468.880962] ? md_end_io+0x5d/0x70
[1457468.880962] ? bio_endio+0xe6/0x150
[1457468.880962] ? bio_chain_endio+0x2d/0x40
[1457468.880962] ? md_end_io+0x5d/0x70
[1457468.880963] ? bio_endio+0xe6/0x150
[1457468.880963] ? bio_chain_endio+0x2d/0x40
[1457468.880963] ? md_end_io+0x5d/0x70
[1457468.880963] ? bio_endio+0xe6/0x150
[1457468.880964] ? bio_chain_endio+0x2d/0x40
[1457468.880964] ? md_end_io+0x5d/0x70
[1457468.880964] ? bio_endio+0xe6/0x150
[1457468.880965] ? bio_chain_endio+0x2d/0x40
[1457468.880965] ? md_end_io+0x5d/0x70
[1457468.880965] ? bio_endio+0xe6/0x150
[1457468.880965] ? bio_chain_endio+0x2d/0x40
[1457468.880966] ? md_end_io+0x5d/0x70
[1457468.880966] ? bio_endio+0xe6/0x150
[1457468.880966] ? bio_chain_endio+0x2d/0x40
[1457468.880966] ? md_end_io+0x5d/0x70
[1457468.880967] ? bio_endio+0xe6/0x150
[1457468.880967] ? bio_chain_endio+0x2d/0x40
[1457468.880967] ? md_end_io+0x5d/0x70
[1457468.880968] ? bio_endio+0xe6/0x150
[1457468.880968] ? bio_chain_endio+0x2d/0x40
[1457468.880968] ? md_end_io+0x5d/0x70
[1457468.880968] ? bio_endio+0xe6/0x150
[1457468.880969] ? bio_chain_endio+0x2d/0x40
[1457468.880969] ? md_end_io+0x5d/0x70
[1457468.880969] ? bio_endio+0xe6/0x150
[1457468.880969] ? bio_chain_endio+0x2d/0x40
[1457468.880970] ? md_end_io+0x5d/0x70
[1457468.880970] ? bio_endio+0xe6/0x150
[1457468.880970] ? bio_chain_endio+0x2d/0x40
[1457468.880971] ? md_end_io+0x5d/0x70
[1457468.880971] ? bio_endio+0xe6/0x150
[1457468.880971] ? bio_chain_endio+0x2d/0x40
[1457468.880971] ? md_end_io+0x5d/0x70
[1457468.880972] ? bio_endio+0xe6/0x150
[1457468.880972] ? bio_chain_endio+0x2d/0x40
[1457468.880972] ? md_end_io+0x5d/0x70
[1457468.880972] ? bio_endio+0xe6/0x150
[1457468.880973] ? bio_chain_endio+0x2d/0x40
[1457468.880973] ? md_end_io+0x5d/0x70
[1457468.880973] ? bio_endio+0xe6/0x150
[1457468.880973] ? bio_chain_endio+0x2d/0x40
[1457468.880974] ? md_end_io+0x5d/0x70
[1457468.880974] ? bio_endio+0xe6/0x150
[1457468.880974] ? bio_chain_endio+0x2d/0x40
[1457468.880975] ? md_end_io+0x5d/0x70
[1457468.880975] ? bio_endio+0xe6/0x150
[1457468.880975] ? bio_chain_endio+0x2d/0x40
[1457468.880975] ? md_end_io+0x5d/0x70
[1457468.880976] ? bio_endio+0xe6/0x150
[1457468.880976] ? bio_chain_endio+0x2d/0x40
[1457468.880976] ? md_end_io+0x5d/0x70
[1457468.880976] ? bio_endio+0xe6/0x150
[1457468.880977] ? bio_chain_endio+0x2d/0x40
[1457468.880977] ? md_end_io+0x5d/0x70
[1457468.880977] ? bio_endio+0xe6/0x150
[1457468.88097
[Bug 1929591] [NEW] MD RAID 6 Periodic Kernel Panic Stack Overflow Double-Fault
Public bug reported:

Hello:

Every few days I get a kernel panic on my Ubuntu Server 20.10 box, which was recently upgraded to a Ryzen 3700X. I have 7 WD Red Pro HDDs in a RAID 6 array with Linux MD, and they're all attached to an LSI 9211-8ik PCIe card. The motherboard is currently a Gigabyte B550M Aorus Pro. My Ubuntu install is running the latest 5.8.0-53 kernel.

This is the second hardware configuration to produce the exact same kernel panic text. Previously I had these HDDs connected directly to the SATA controller of an ASRock X570 Pro4 ATX mobo with the same 3700X. I was also previously running Ubuntu Server 20.04 LTS; I upgraded to 20.10 in the hope that the newer kernel would fix it, but it did not.

I posted the whole story of this journey on Super User, if you're interested: https://superuser.com/questions/1615400/md-raid-6-periodic-kernel-panic-possible-kernel-bug

However, I am now convinced this is a Linux kernel bug in the MD driver.

Example 1 kernel panic:

[406005.583315] BUG: stack guard page was hit at 7cbff150 (stack is 3b7072a2..dac5ed08)
[406005.583315] kernel stack overflow (double-fault): [#1] SMP NOPTI
[406005.583315] CPU: 15 PID: 514 Comm: md0_raid6 Tainted: P OE 5.8.0-36-generic #40-Ubuntu
[406005.583316] Hardware name: Gigabyte Technology Co., Ltd. B550M AORUS PRO/B550M AORUS PRO, BIOS F1 05/19/2020
[406005.583316] RIP: 0010:slab_free_freelist_hook+0x35/0x120
[406005.583316] Code: 89 d5 41 54 49 89 f4 53 48 89 fb 48 83 ec 08 48 8b 02 4c 8b 36 48 c7 06 00 00 00 00 48 c7 02 00 00 00 00 48 85 c0 49 0f 44 c6 <48> 89 45 d0 eb 06 4c 3b 7d d0 74 5d 8b 53 20 4d 89 f7 49 8d 34 16
[406005.583316] RSP: 0018:a620c06e3ff8 EFLAGS: 00010246
[406005.583317] RAX: 9aaf36f54720 RBX: 9ab34b407800 RCX: 0001
[406005.583317] RDX: a620c06e4040 RSI: a620c06e4038 RDI: 9ab34b407800
[406005.583317] RBP: a620c06e4028 R08: 0001 R09: b9c54500
[406005.583318] R10: 9aaf36f54fe0 R11: 0001 R12: a620c06e4038
[406005.583318] R13: a620c06e4040 R14: 9aaf36f54720 R15: 9ab2925cbd10
[406005.583318] FS: () GS:9ab34edc() knlGS:
[406005.583318] CS: 0010 DS: ES: CR0: 80050033
[406005.583318] CR2: a620c06e3fe8 CR3: 0005d52ac000 CR4: 00340ee0
[406005.583319] Call Trace:
[406005.583319] ? mempool_kfree+0xe/0x10
[406005.583319] ? kfree+0xb8/0x220
[406005.583319] ? mempool_kfree+0xe/0x10
[406005.583319] ? mempool_free+0x2f/0x80
[406005.583319] ? md_end_io+0x4b/0x70
[406005.583319] ? bio_endio+0xe6/0x150

Example 2 kernel panic with the old mobo:

[161342.301305] BUG: stack guard page was hit at fc60f228 (stack is 875efe77..3f38a379)
[161342.301306] kernel stack overflow (double-fault): [#1] SMP NOPTI
[161342.301306] CPU: 10 PID: 465 Comm: md0_raid6 Tainted: P OE 5.8.0-33-generic #36-Ubuntu
[161342.301307] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X570 Pro4, BIOS P3.60 12/01/2020
[161342.301307] RIP: 0010:slab_free_freelist_hook+0x35/0x120
[161342.301308] Code: 89 d5 41 54 49 89 f4 53 48 89 fb 48 83 ec 08 48 8b 02 4c 8b 36 48 c7 06 00 00 00 00 48 c7 02 00 00 00 00 48 85 c0 49 0f 44 c6 <48> 89 45 d0 eb 06 4c 3b 7d d0 74 5d 8b 53 20 4d 89 f7 49 8d 34 16
[161342.301308] RSP: 0018:a86b00c6fff8 EFLAGS: 00010246
[161342.301309] RAX: 98edc21cac40 RBX: 98ef0b407800 RCX: 0001
[161342.301310] RDX: a86b00c70040 RSI: a86b00c70038 RDI: 98ef0b407800
[161342.301310] RBP: a86b00c70028 R08: 0001 R09: 85854500
[161342.301311] R10: 98edc21ca100 R11: 0001 R12: a86b00c70038
[161342.301311] R13: a86b00c70040 R14: 98edc21cac40 R15: 98e9b53d74d8
[161342.301311] FS: () GS:98ef0ec8() knlGS:
[161342.301312] CS: 0010 DS: ES: CR0: 80050033
[161342.301312] CR2: a86b00c6ffe8 CR3: 0007fa766000 CR4: 00340ee0
[161342.301312] Call Trace:
[161342.301313] ? mempool_kfree+0xe/0x10
[161342.301313] ? kfree+0xb8/0x220
[161342.301313] ? mempool_kfree+0xe/0x10
[161342.301313] ? mempool_free+0x2f/0x80
[161342.301314] ? md_end_io+0x4b/0x70
[161342.301314] ? bio_endio+0xe6/0x150
[161342.301314] ? bio_chain_endio+0x2d/0x40
[161342.301315] ? md_end_io+0x5d/0x70
[161342.301315] ? bio_endio+0xe6/0x150
[161342.301315] ? bio_chain_endio+0x2d/0x40
[161342.301315] ? md_end_io+0x5d/0x70
[161342.301316] ? bio_endio+0xe6/0x150
[161342.301316] ? bio_chain_endio+0x2d/0x40
[161342.301316] ? md_end_io+0x5d/0x70
[161342.301316] ? bio_endio+0xe6/0x150
[161342.301317] ? bio_chain_endio+0x2d/0x40
[161342.301317] ? md_end_io+0x5d/0x70
[161342.301317] ? bio_endio+0xe6/0x150
[161342.301317] ? bio_chain_endio+0x2d/0x40
...
[161342.301379] ? md_end_io+0x5d/0x70
[161342.301379] ? bio_endio+0xe6
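The claim that both motherboards produce the same panic can be checked mechanically by comparing only the fields that pin down the crash site: the task name (`Comm:`), the faulting function on the `RIP:` line, the `Code:` bytes, and the call-trace frames. Timestamps, CPU/PID, kernel build, hardware name, and register values are ignored because they are expected to differ between otherwise identical crashes. The sketch below is a hypothetical helper (`signature` and `same_crash_site` are illustrative names, not an existing tool). It assumes each input string holds exactly one x86-64 panic, and it compares call traces only over the length of the shorter capture, since captures get truncated at different depths.

```python
import re

# Fields of an x86-64 oops that identify where it crashed (assumption:
# the kernel prints them in the "Comm:", "RIP: SEG:func+off/size" and
# "Code: ..." forms seen in this thread).
COMM_RE = re.compile(r"Comm: (\S+)")
RIP_RE = re.compile(r"RIP: [0-9a-f]+:(\S+)")
CODE_RE = re.compile(r"Code: ([0-9a-f<> ]+)")
FRAME_RE = re.compile(r"\b([A-Za-z_]\w*\+0x[0-9a-f]+/0x[0-9a-f]+)")


def signature(panic_text: str) -> tuple:
    """Return (task name, faulting function, code bytes, call-trace frames).

    Hypothetical helper: pass ONE panic per string, otherwise frames from a
    second panic end up in the trace.
    """
    head, _, trace = panic_text.partition("Call Trace:")
    comm = COMM_RE.search(head)
    rip = RIP_RE.search(head)
    code = CODE_RE.search(head)
    return (
        comm.group(1) if comm else None,
        rip.group(1) if rip else None,
        code.group(1).strip() if code else None,
        tuple(FRAME_RE.findall(trace)),
    )


def same_crash_site(panic_a: str, panic_b: str) -> bool:
    """True when both captures have a RIP line, share task name, RIP and
    code bytes, and their call traces agree over the shorter length."""
    *head_a, frames_a = signature(panic_a)
    *head_b, frames_b = signature(panic_b)
    depth = min(len(frames_a), len(frames_b))
    return (
        head_a[1] is not None
        and head_a == head_b
        and frames_a[:depth] == frames_b[:depth]
    )
```

Applied to Example 1 and Example 2 above (each pasted separately), the task (`md0_raid6`), the RIP (`slab_free_freelist_hook+0x35/0x120`), the code bytes, and the first six frames all match, even though the two came from different boards and kernel builds (5.8.0-36 vs 5.8.0-33). Comparing only the common prefix keeps a short capture like Example 1 from being reported as a different crash just because it was cut off earlier.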