Re: [PATCH] delayacct: Fix crash in delayacct_blkio_end() after delayacct init failure
Hello, Andrew.

On Tue, Jul 24, 2018 at 12:29:13PM -0700, Andrew Morton wrote:
> How did you make this happen, btw? Fault injection, or did a small
> GFP_KERNEL allocation fail?

We have a group of machines which are pushing memory really hard and
this actually triggered in prod on several of them.

Thanks.

--
tejun
Re: [PATCH] delayacct: Fix crash in delayacct_blkio_end() after delayacct init failure
On Tue, 24 Jul 2018 10:55:42 -0700 Tejun Heo wrote:

> While forking, if delayacct init fails due to memory shortage, it
> continues expecting all delayacct users to check task->delays pointer
> against NULL before dereferencing it, which all of them used to do.
>
> c96f5471ce7d ("delayacct: Account blkio completion on the correct
> task"), while updating delayacct_blkio_end() to take the target task
> instead of always using %current, made the function test NULL on
> %current->delays and then continue to operate on @p->delays. If
> %current succeeded init while @p didn't, it leads to the following
> crash.
>

lgtm.

How did you make this happen, btw? Fault injection, or did a small
GFP_KERNEL allocation fail?
[PATCH] delayacct: Fix crash in delayacct_blkio_end() after delayacct init failure
While forking, if delayacct init fails due to memory shortage, it
continues expecting all delayacct users to check task->delays pointer
against NULL before dereferencing it, which all of them used to do.

c96f5471ce7d ("delayacct: Account blkio completion on the correct
task"), while updating delayacct_blkio_end() to take the target task
instead of always using %current, made the function test NULL on
%current->delays and then continue to operate on @p->delays. If
%current succeeded init while @p didn't, it leads to the following
crash.

 BUG: unable to handle kernel NULL pointer dereference at 0004
 IP: __delayacct_blkio_end+0xc/0x40
 PGD 801fd07e1067 P4D 801fd07e1067 PUD 1fcffbb067 PMD 0
 Oops: [#1] SMP PTI
 CPU: 4 PID: 25774 Comm: QIOThread0 Not tainted 4.16.0-9_fbk1_rc2_1180_g6b593215b4d7 #9
 Hardware name: Quanta Leopard ORv2-DDR4/Leopard ORv2-DDR4, BIOS F06_3B12 08/17/2017
 RIP: 0010:__delayacct_blkio_end+0xc/0x40
 RSP: :881fff703bf8 EFLAGS: 00010086
 RAX: 881f1ec8b800 RBX: 8804f735cd54 RCX: 881fff703cb0
 RDX: 0002 RSI: 0003 RDI:
 RBP: R08: R09: 881fff703cc0
 R10: 1000 R11: 881fd3f73d00 R12: 8804f735c600
 R13: R14: 001d R15: 881fff703cb0
 FS: 7f5003f7d700() GS:881fff70() knlGS:
 CS: 0010 DS: ES: CR0: 80050033
 CR2: 0004 CR3: 001f401a6006 CR4: 003606e0
 DR0: DR1: DR2:
 DR3: DR6: fffe0ff0 DR7: 0400
 Call Trace:
  try_to_wake_up+0x2c0/0x600
  autoremove_wake_function+0xe/0x30
  __wake_up_common+0x74/0x120
  wake_up_page_bit+0x9c/0xe0
  mpage_end_io+0x27/0x70
  blk_update_request+0x78/0x2c0
  scsi_end_request+0x2c/0x1e0
  scsi_io_completion+0x20b/0x5f0
  blk_mq_complete_request+0xa2/0x100
  ata_scsi_qc_complete+0x79/0x400
  ata_qc_complete_multiple+0x86/0xd0
  ahci_handle_port_interrupt+0xc9/0x5c0
  ahci_handle_port_intr+0x54/0xb0
  ahci_single_level_irq_intr+0x3b/0x60
  __handle_irq_event_percpu+0x43/0x190
  handle_irq_event_percpu+0x20/0x50
  handle_irq_event+0x2a/0x50
  handle_edge_irq+0x80/0x1c0
  handle_irq+0xaf/0x120
  do_IRQ+0x41/0xc0
  common_interrupt+0xf/0xf

Fix it by updating delayacct_blkio_end() to check @p->delays instead.

Signed-off-by: Tejun Heo
Reported-and-debugged-by: Dave Jones
Cc: Josh Snyder
Fixes: c96f5471ce7d ("delayacct: Account blkio completion on the correct task")
Cc: sta...@vger.kernel.org # v4.15+
---
 include/linux/delayacct.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/delayacct.h b/include/linux/delayacct.h
index e6c0448ebcc7..31c865d1842e 100644
--- a/include/linux/delayacct.h
+++ b/include/linux/delayacct.h
@@ -124,7 +124,7 @@ static inline void delayacct_blkio_start(void)
 
 static inline void delayacct_blkio_end(struct task_struct *p)
 {
-	if (current->delays)
+	if (p->delays)
 		__delayacct_blkio_end(p);
 	delayacct_clear_flag(DELAYACCT_PF_BLKIO);
 }
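
For readers following the thread, the mismatch described in the commit
message can be modelled in userspace. This is only a sketch: the two
structs are simplified stand-ins for the kernel's, and
model_blkio_end_inner(), buggy_blkio_end(), fixed_blkio_end() and demo()
are hypothetical names made up for the illustration. The buggy variant
reports the crash case instead of actually dereferencing NULL.

```c
#include <stddef.h>

/* Hypothetical userspace stand-ins; not the kernel definitions. */
struct task_delay_info {
	int blkio_count;
};

struct task_struct {
	struct task_delay_info *delays;	/* NULL if delayacct init failed */
};

/* Stand-in for __delayacct_blkio_end(): assumes p->delays is valid. */
static void model_blkio_end_inner(struct task_struct *p)
{
	p->delays->blkio_count++;
}

/*
 * Guard as it was before the fix: tests the waker but accounts on p.
 * Returns 1 if the inner function would have been reached with a NULL
 * p->delays (the crash case), 0 otherwise.
 */
static int buggy_blkio_end(struct task_struct *cur, struct task_struct *p)
{
	if (cur->delays) {
		if (!p->delays)
			return 1;	/* the kernel would dereference NULL here */
		model_blkio_end_inner(p);
	}
	return 0;
}

/* Guard after the fix: tests the task that is actually dereferenced. */
static int fixed_blkio_end(struct task_struct *cur, struct task_struct *p)
{
	(void)cur;	/* the waker's state is irrelevant to the guard */
	if (p->delays)
		model_blkio_end_inner(p);
	return 0;
}

/* Scenario from the report: waker has delays, wakee's init failed. */
static int demo(int use_fix)
{
	struct task_delay_info d = { 0 };
	struct task_struct waker = { &d };
	struct task_struct wakee = { NULL };

	return use_fix ? fixed_blkio_end(&waker, &wakee)
		       : buggy_blkio_end(&waker, &wakee);
}
```

In this model demo(0) flags the crash case because the guard looks at the
waker while the dereference happens on the wakee, and demo(1) skips the
accounting for the wakee, which is the behaviour the one-line change
above restores.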