Re: [PATCH] delayacct: Fix crash in delayacct_blkio_end() after delayacct init failure

2018-07-24 Thread Tejun Heo
Hello, Andrew.

On Tue, Jul 24, 2018 at 12:29:13PM -0700, Andrew Morton wrote:
> How did you make this happen, btw?  Fault injection, or did a small
> GFP_KERNEL allocation fail?

We have a group of machines which are pushing memory really hard and
this actually triggered in prod on several of them.

Thanks.

-- 
tejun


Re: [PATCH] delayacct: Fix crash in delayacct_blkio_end() after delayacct init failure

2018-07-24 Thread Andrew Morton
On Tue, 24 Jul 2018 10:55:42 -0700 Tejun Heo  wrote:

> While forking, if delayacct init fails due to memory shortage, it
> continues, expecting all delayacct users to check the task->delays
> pointer against NULL before dereferencing it, which all of them used to do.
> 
> c96f5471ce7d ("delayacct: Account blkio completion on the correct
> task"), while updating delayacct_blkio_end() to take the target task
> instead of always using %current, made the function test NULL on
> %current->delays and then continue to operate on @p->delays.  If
> %current's init succeeded while @p's didn't, this leads to the following
> crash.
> 

lgtm.

How did you make this happen, btw?  Fault injection, or did a small
GFP_KERNEL allocation fail?



[PATCH] delayacct: Fix crash in delayacct_blkio_end() after delayacct init failure

2018-07-24 Thread Tejun Heo
While forking, if delayacct init fails due to memory shortage, it
continues, expecting all delayacct users to check the task->delays
pointer against NULL before dereferencing it, which all of them used to do.
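The contract described above (init may fail and leave the pointer NULL, so every user checks before dereferencing) can be modelled in userspace as below. This is a minimal sketch, not kernel code: `struct task`, `struct delay_info`, `task_delays_init()` and `account_blkio()` are hypothetical stand-ins, and the `simulate_oom` flag simply fakes a failed allocation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel's task_delay_info and task_struct. */
struct delay_info {
	unsigned long long blkio_delay;	/* accumulated block I/O wait, ns */
	unsigned int blkio_count;	/* number of block I/O waits */
};

struct task {
	struct delay_info *delays;	/* NULL if init failed at fork */
};

/*
 * Fork-time init: an allocation failure is tolerated and simply leaves
 * delays == NULL rather than failing the fork. simulate_oom stands in
 * for a failed allocation under memory pressure.
 */
static void task_delays_init(struct task *t, int simulate_oom)
{
	t->delays = simulate_oom ? NULL : calloc(1, sizeof(*t->delays));
}

/* Contract: test the pointer of the very task about to be updated. */
static void account_blkio(struct task *t, unsigned long long ns)
{
	if (!t->delays)
		return;		/* accounting silently skipped for this task */
	t->delays->blkio_delay += ns;
	t->delays->blkio_count++;
}
```

A task whose init failed just loses its accounting; nothing crashes as long as every caller tests the same task it then updates.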

c96f5471ce7d ("delayacct: Account blkio completion on the correct
task"), while updating delayacct_blkio_end() to take the target task
instead of always using %current, made the function test NULL on
%current->delays and then continue to operate on @p->delays.  If
%current's init succeeded while @p's didn't, this leads to the following
crash.

 BUG: unable to handle kernel NULL pointer dereference at 0004
 IP: __delayacct_blkio_end+0xc/0x40
 PGD 801fd07e1067 P4D 801fd07e1067 PUD 1fcffbb067 PMD 0 
 Oops:  [#1] SMP PTI
 CPU: 4 PID: 25774 Comm: QIOThread0 Not tainted 4.16.0-9_fbk1_rc2_1180_g6b593215b4d7 #9
 Hardware name: Quanta Leopard ORv2-DDR4/Leopard ORv2-DDR4, BIOS F06_3B12 08/17/2017
 RIP: 0010:__delayacct_blkio_end+0xc/0x40
 RSP: :881fff703bf8 EFLAGS: 00010086
 RAX: 881f1ec8b800 RBX: 8804f735cd54 RCX: 881fff703cb0
 RDX: 0002 RSI: 0003 RDI: 
 RBP:  R08:  R09: 881fff703cc0
 R10: 1000 R11: 881fd3f73d00 R12: 8804f735c600
 R13:  R14: 001d R15: 881fff703cb0
 FS:  7f5003f7d700() GS:881fff70() knlGS:
 CS:  0010 DS:  ES:  CR0: 80050033
 CR2: 0004 CR3: 001f401a6006 CR4: 003606e0
 DR0:  DR1:  DR2: 
 DR3:  DR6: fffe0ff0 DR7: 0400
 Call Trace:
  <IRQ>
  try_to_wake_up+0x2c0/0x600
  autoremove_wake_function+0xe/0x30
  __wake_up_common+0x74/0x120
  wake_up_page_bit+0x9c/0xe0
  mpage_end_io+0x27/0x70
  blk_update_request+0x78/0x2c0
  scsi_end_request+0x2c/0x1e0
  scsi_io_completion+0x20b/0x5f0
  blk_mq_complete_request+0xa2/0x100
  ata_scsi_qc_complete+0x79/0x400
  ata_qc_complete_multiple+0x86/0xd0
  ahci_handle_port_interrupt+0xc9/0x5c0
  ahci_handle_port_intr+0x54/0xb0
  ahci_single_level_irq_intr+0x3b/0x60
  __handle_irq_event_percpu+0x43/0x190
  handle_irq_event_percpu+0x20/0x50
  handle_irq_event+0x2a/0x50
  handle_edge_irq+0x80/0x1c0
  handle_irq+0xaf/0x120
  do_IRQ+0x41/0xc0
  common_interrupt+0xf/0xf
  </IRQ>

Fix it by updating delayacct_blkio_end() to check @p->delays instead.

Signed-off-by: Tejun Heo 
Reported-and-debugged-by: Dave Jones 
Cc: Josh Snyder 
Fixes: c96f5471ce7d ("delayacct: Account blkio completion on the correct task")
Cc: sta...@vger.kernel.org # v4.15+
---
 include/linux/delayacct.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/delayacct.h b/include/linux/delayacct.h
index e6c0448ebcc7..31c865d1842e 100644
--- a/include/linux/delayacct.h
+++ b/include/linux/delayacct.h
@@ -124,7 +124,7 @@ static inline void delayacct_blkio_start(void)
 
 static inline void delayacct_blkio_end(struct task_struct *p)
 {
-	if (current->delays)
+	if (p->delays)
 		__delayacct_blkio_end(p);
 	delayacct_clear_flag(DELAYACCT_PF_BLKIO);
 }
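To make the failure mode concrete, here is a userspace model of the guard before and after the fix. It is a sketch under stated assumptions: `struct task`, `struct delay_info`, `account_blkio_end()`, `old_guard_lets_null_through()` and `blkio_end_fixed()` are hypothetical names, not the kernel's, and the DELAYACCT_PF_BLKIO flag handling is left out. As the call trace above shows, the call is reached from try_to_wake_up() in interrupt context, so the running task and the woken task are unrelated; the model only captures which task's pointer gets tested.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins, not the kernel's definitions. */
struct delay_info {
	unsigned int blkio_count;
};

struct task {
	struct delay_info *delays;	/* NULL if delayacct init failed */
};

/* Like __delayacct_blkio_end(): dereferences p->delays unconditionally. */
static void account_blkio_end(struct task *p)
{
	p->delays->blkio_count++;
}

/*
 * Pre-fix guard tested the task that happens to be running (the waker),
 * not p (the wakee). Returns 1 when that guard passes even though
 * p->delays is NULL, i.e. when the old code would dereference NULL.
 */
static int old_guard_lets_null_through(const struct task *running,
				       const struct task *p)
{
	return running->delays != NULL && p->delays == NULL;
}

/* Post-fix shape: guard and dereference refer to the same task. */
static void blkio_end_fixed(struct task *p)
{
	if (p->delays)
		account_blkio_end(p);
}
```

With the fix, a woken task whose init failed is simply skipped, regardless of whether the interrupted task has a valid delays pointer.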
