Re: [PATCH] libiscsi: add lock around task lists to fix list corruption regression

2018-12-18 Thread Alan Adamson

Hi,

Was this issue ever resolved?  We are seeing this on 4.14.35.

Thanks,

Alan Adamson

On 5/9/18 3:27 PM, Ilkka Sovanto wrote:

Hi again, Chris and others!

We're hitting this again on 4.14.32.

Looks like the path iscsi_queuecommand -> prepd_reject/prepd_fault
results in iscsi_complete_task getting called with frwd_lock held
instead of the requisite back_lock which might explain the task list
corruption under heavy loads.

An alternate possibly problematic situation seems to be
fail_mgmt_tasks which calls iscsi_complete_task while holding
frwd_lock from iscsi_start_session_recovery.

See also: https://github.com/open-iscsi/open-iscsi/issues/56

Attached an ugly attempt at ensuring back_lock is held instead.

  - Ilkka

On 5 March 2017 at 22:49, Ilkka Sovanto  wrote:

Hi!

Was running pretty nicely for a week or so, until we got this one:

[534630.679965] BUG: unable to handle kernel at 0078
[534630.684035] IP: [] iscsi_xmit_task+0x29/0xc0
[534630.685724] PGD a1fcd3067
[534630.687346] Oops: 0002 [#1]
[534630.688846] Modules linked in:
[534630.690348] CPU: 17 PID: 7988 Comm: kworker/u56:6 Tainted: GW
[534630.693327] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 41113_115728-nilsson.home.kraxel.org 04/01/2014
[534630.696404] Workqueue: iscsi_q_8 iscsi_xmitworker
[534630.697948] task: 8800bab62f00 ti: 880c0d9d8000 task.ti:
880c0d9d8000
[534630.700837] RIP: 0010:[] []
iscsi_xmit_task+0x29/0xc0
[534630.703689] RSP: 0018:880c0d9dbdb0  EFLAGS: 00010246
[534630.705209] RAX: ffc3 RBX: 880fe3dfda10 RCX:
dead0200
[534630.708062] RDX:  RSI: 0200 RDI:
880fe3dfda10
[534630.710773] RBP: 880c0d9dbdc8 R08: 880fe3881c80 R09:

[534630.713539] R10: 880fb799f710 R11: 0260 R12:
880fe3dfda10
[534630.716344] R13:  R14: 880fe3881c80 R15:
880fe3dfdae0
[534630.719143] FS:  () GS:880fff42()
[534630.722001] CS:  0010 DS:  ES:  CR0: 80050033
[534630.723492] CR2: 0078 CR3: 000aa8e44000 CR4:
000406e0
[534630.726163] Stack:
[534630.727622]  880fe3dfdaa8 880fe3dfda10 880fe3dfdac0
880c0d9dbe18
[534630.730906]  8171226d 880fe3dfdad0 880fe3881c00
880fe3dfdab0
[534630.767489]  880ff7d98780 880fe3dfdae0 880ffec99400
880fe3dfdae8

[534630.770543] Call Trace:
[534630.771967]  [] iscsi_xmitworker+0x1dd/0x310
[534630.773523]  [] process_one_work+0x149/0x3e0
[534630.775084]  [] worker_thread+0x69/0x470
[534630.776585]  [] ? process_one_work+0x3e0/0x3e0
[534630.778131]  [] ? process_one_work+0x3e0/0x3e0
[534630.779682]  [] kthread+0xea/0x100
[534630.781300]  [] ? kthread_create_on_node+0x1a0/0x1a0
[534630.782872]  [] ret_from_fork+0x3f/0x70
[534630.784340]  [] ? kthread_create_on_node+0x1a0/0x1a0
[534630.785885] Code:
[534630.792699] RIP [] iscsi_xmit_task+0x29/0xc0
[534630.794266]  RSP 
[534630.795617] CR2: 0078
[534630.798051] ---[ end trace 9d963f95212dfbe2 ]---
[534630.799788] Kernel panic - not syncing: Fatal exception in interrupt
[534630.802712] Kernel Offset: disabled
[534630.804304] ---[ end Kernel panic - not syncing: Fatal exception
in interrupt

And here's the disassembly for iscsi_xmit_task:

8170efa0 :
8170efa0:   e8 0b fd 52 00  call
81c3ecb0 <__fentry__>
8170efa5:   55  push   rbp
8170efa6:   b8 c3 ff ff ff  moveax,0xffc3
8170efab:   48 89 e5movrbp,rsp
8170efae:   41 55   push   r13
8170efb0:   41 54   push   r12
8170efb2:   53  push   rbx
8170efb3:   48 8b 97 f0 00 00 00movrdx,QWORD PTR [rdi+0xf0]
8170efba:   4c 8b af 90 00 00 00movr13,QWORD PTR [rdi+0x90]
8170efc1:   83 e2 02andedx,0x2
8170efc4:   75 71   jne
8170f037 
8170efc6:   48 89 fbmovrbx,rdi
***
8170efc9:   f0 41 ff 45 78  lock inc DWORD PTR [r13+0x78]
***
8170efce:   48 8b 47 10 movrax,QWORD PTR [rdi+0x10]
8170efd2:   48 8d b8 18 01 00 00leardi,[rax+0x118]
8170efd9:   e8 b2 d3 52 00  call
81c3c390 <_raw_spin_unlock_bh>
8170efde:   48 8b 43 10 movrax,QWORD PTR [rbx+0x10]
8170efe2:   4c 89 efmovrdi,r13
8170efe5:   48 8b 80 00 01 00 00movrax,QWORD PTR [rax+0x100]
8170efec:   ff 90 98 00 00 00   call   QWORD PTR [rax+0x98]
8170eff2:   41 89 c4movr12d,eax
8170eff5:   48 8b 43 10 movrax,QWORD PTR [rbx+0x10]
8170eff9:   48 8d b8 18 01 00 00leardi,[rax+0x11

Re: [PATCH] libiscsi: add lock around task lists to fix list corruption regression

2018-05-09 Thread Ilkka Sovanto
Hi again, Chris and others!

We're hitting this again on 4.14.32.

Looks like the path iscsi_queuecommand -> prepd_reject/prepd_fault
results in iscsi_complete_task getting called with frwd_lock held
instead of the requisite back_lock which might explain the task list
corruption under heavy loads.

An alternate possibly problematic situation seems to be
fail_mgmt_tasks which calls iscsi_complete_task while holding
frwd_lock from iscsi_start_session_recovery.

See also: https://github.com/open-iscsi/open-iscsi/issues/56

Attached an ugly attempt at ensuring back_lock is held instead.

 - Ilkka

On 5 March 2017 at 22:49, Ilkka Sovanto  wrote:
> Hi!
>
> Was running pretty nicely for a week or so, until we got this one:
>
> [534630.679965] BUG: unable to handle kernel at 0078
> [534630.684035] IP: [] iscsi_xmit_task+0x29/0xc0
> [534630.685724] PGD a1fcd3067
> [534630.687346] Oops: 0002 [#1]
> [534630.688846] Modules linked in:
> [534630.690348] CPU: 17 PID: 7988 Comm: kworker/u56:6 Tainted: GW
> [534630.693327] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS 41113_115728-nilsson.home.kraxel.org 04/01/2014
> [534630.696404] Workqueue: iscsi_q_8 iscsi_xmitworker
> [534630.697948] task: 8800bab62f00 ti: 880c0d9d8000 task.ti:
> 880c0d9d8000
> [534630.700837] RIP: 0010:[] []
> iscsi_xmit_task+0x29/0xc0
> [534630.703689] RSP: 0018:880c0d9dbdb0  EFLAGS: 00010246
> [534630.705209] RAX: ffc3 RBX: 880fe3dfda10 RCX:
> dead0200
> [534630.708062] RDX:  RSI: 0200 RDI:
> 880fe3dfda10
> [534630.710773] RBP: 880c0d9dbdc8 R08: 880fe3881c80 R09:
> 
> [534630.713539] R10: 880fb799f710 R11: 0260 R12:
> 880fe3dfda10
> [534630.716344] R13:  R14: 880fe3881c80 R15:
> 880fe3dfdae0
> [534630.719143] FS:  () GS:880fff42()
> [534630.722001] CS:  0010 DS:  ES:  CR0: 80050033
> [534630.723492] CR2: 0078 CR3: 000aa8e44000 CR4:
> 000406e0
> [534630.726163] Stack:
> [534630.727622]  880fe3dfdaa8 880fe3dfda10 880fe3dfdac0
> 880c0d9dbe18
> [534630.730906]  8171226d 880fe3dfdad0 880fe3881c00
> 880fe3dfdab0
> [534630.767489]  880ff7d98780 880fe3dfdae0 880ffec99400
> 880fe3dfdae8
>
> [534630.770543] Call Trace:
> [534630.771967]  [] iscsi_xmitworker+0x1dd/0x310
> [534630.773523]  [] process_one_work+0x149/0x3e0
> [534630.775084]  [] worker_thread+0x69/0x470
> [534630.776585]  [] ? process_one_work+0x3e0/0x3e0
> [534630.778131]  [] ? process_one_work+0x3e0/0x3e0
> [534630.779682]  [] kthread+0xea/0x100
> [534630.781300]  [] ? kthread_create_on_node+0x1a0/0x1a0
> [534630.782872]  [] ret_from_fork+0x3f/0x70
> [534630.784340]  [] ? kthread_create_on_node+0x1a0/0x1a0
> [534630.785885] Code:
> [534630.792699] RIP [] iscsi_xmit_task+0x29/0xc0
> [534630.794266]  RSP 
> [534630.795617] CR2: 0078
> [534630.798051] ---[ end trace 9d963f95212dfbe2 ]---
> [534630.799788] Kernel panic - not syncing: Fatal exception in interrupt
> [534630.802712] Kernel Offset: disabled
> [534630.804304] ---[ end Kernel panic - not syncing: Fatal exception
> in interrupt
>
> And here's the disassembly for iscsi_xmit_task:
>
> 8170efa0 :
> 8170efa0:   e8 0b fd 52 00  call
> 81c3ecb0 <__fentry__>
> 8170efa5:   55  push   rbp
> 8170efa6:   b8 c3 ff ff ff  moveax,0xffc3
> 8170efab:   48 89 e5movrbp,rsp
> 8170efae:   41 55   push   r13
> 8170efb0:   41 54   push   r12
> 8170efb2:   53  push   rbx
> 8170efb3:   48 8b 97 f0 00 00 00movrdx,QWORD PTR 
> [rdi+0xf0]
> 8170efba:   4c 8b af 90 00 00 00movr13,QWORD PTR 
> [rdi+0x90]
> 8170efc1:   83 e2 02andedx,0x2
> 8170efc4:   75 71   jne
> 8170f037 
> 8170efc6:   48 89 fbmovrbx,rdi
> ***
> 8170efc9:   f0 41 ff 45 78  lock inc DWORD PTR [r13+0x78]
> ***
> 8170efce:   48 8b 47 10 movrax,QWORD PTR 
> [rdi+0x10]
> 8170efd2:   48 8d b8 18 01 00 00leardi,[rax+0x118]
> 8170efd9:   e8 b2 d3 52 00  call
> 81c3c390 <_raw_spin_unlock_bh>
> 8170efde:   48 8b 43 10 movrax,QWORD PTR 
> [rbx+0x10]
> 8170efe2:   4c 89 efmovrdi,r13
> 8170efe5:   48 8b 80 00 01 00 00movrax,QWORD PTR 
> [rax+0x100]
> 8170efec:   ff 90 98 00 00 00   call   QWORD PTR [rax+0x98]
> 8170eff2:   41 89 c4movr12d,eax
> 8170eff5:   48 8b 43 10 movrax,QWORD PTR 
> [rbx+0x10]
> 

Re: [PATCH] libiscsi: add lock around task lists to fix list corruption regression

2017-03-05 Thread Ilkka Sovanto
Hi!

Was running pretty nicely for a week or so, until we got this one:

[534630.679965] BUG: unable to handle kernel at 0078
[534630.684035] IP: [] iscsi_xmit_task+0x29/0xc0
[534630.685724] PGD a1fcd3067
[534630.687346] Oops: 0002 [#1]
[534630.688846] Modules linked in:
[534630.690348] CPU: 17 PID: 7988 Comm: kworker/u56:6 Tainted: GW
[534630.693327] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 41113_115728-nilsson.home.kraxel.org 04/01/2014
[534630.696404] Workqueue: iscsi_q_8 iscsi_xmitworker
[534630.697948] task: 8800bab62f00 ti: 880c0d9d8000 task.ti:
880c0d9d8000
[534630.700837] RIP: 0010:[] []
iscsi_xmit_task+0x29/0xc0
[534630.703689] RSP: 0018:880c0d9dbdb0  EFLAGS: 00010246
[534630.705209] RAX: ffc3 RBX: 880fe3dfda10 RCX:
dead0200
[534630.708062] RDX:  RSI: 0200 RDI:
880fe3dfda10
[534630.710773] RBP: 880c0d9dbdc8 R08: 880fe3881c80 R09:

[534630.713539] R10: 880fb799f710 R11: 0260 R12:
880fe3dfda10
[534630.716344] R13:  R14: 880fe3881c80 R15:
880fe3dfdae0
[534630.719143] FS:  () GS:880fff42()
[534630.722001] CS:  0010 DS:  ES:  CR0: 80050033
[534630.723492] CR2: 0078 CR3: 000aa8e44000 CR4:
000406e0
[534630.726163] Stack:
[534630.727622]  880fe3dfdaa8 880fe3dfda10 880fe3dfdac0
880c0d9dbe18
[534630.730906]  8171226d 880fe3dfdad0 880fe3881c00
880fe3dfdab0
[534630.767489]  880ff7d98780 880fe3dfdae0 880ffec99400
880fe3dfdae8

[534630.770543] Call Trace:
[534630.771967]  [] iscsi_xmitworker+0x1dd/0x310
[534630.773523]  [] process_one_work+0x149/0x3e0
[534630.775084]  [] worker_thread+0x69/0x470
[534630.776585]  [] ? process_one_work+0x3e0/0x3e0
[534630.778131]  [] ? process_one_work+0x3e0/0x3e0
[534630.779682]  [] kthread+0xea/0x100
[534630.781300]  [] ? kthread_create_on_node+0x1a0/0x1a0
[534630.782872]  [] ret_from_fork+0x3f/0x70
[534630.784340]  [] ? kthread_create_on_node+0x1a0/0x1a0
[534630.785885] Code:
[534630.792699] RIP [] iscsi_xmit_task+0x29/0xc0
[534630.794266]  RSP 
[534630.795617] CR2: 0078
[534630.798051] ---[ end trace 9d963f95212dfbe2 ]---
[534630.799788] Kernel panic - not syncing: Fatal exception in interrupt
[534630.802712] Kernel Offset: disabled
[534630.804304] ---[ end Kernel panic - not syncing: Fatal exception
in interrupt

And here's the disassembly for iscsi_xmit_task:

8170efa0 :
8170efa0:   e8 0b fd 52 00  call
81c3ecb0 <__fentry__>
8170efa5:   55  push   rbp
8170efa6:   b8 c3 ff ff ff  moveax,0xffc3
8170efab:   48 89 e5movrbp,rsp
8170efae:   41 55   push   r13
8170efb0:   41 54   push   r12
8170efb2:   53  push   rbx
8170efb3:   48 8b 97 f0 00 00 00movrdx,QWORD PTR [rdi+0xf0]
8170efba:   4c 8b af 90 00 00 00movr13,QWORD PTR [rdi+0x90]
8170efc1:   83 e2 02andedx,0x2
8170efc4:   75 71   jne
8170f037 
8170efc6:   48 89 fbmovrbx,rdi
***
8170efc9:   f0 41 ff 45 78  lock inc DWORD PTR [r13+0x78]
***
8170efce:   48 8b 47 10 movrax,QWORD PTR [rdi+0x10]
8170efd2:   48 8d b8 18 01 00 00leardi,[rax+0x118]
8170efd9:   e8 b2 d3 52 00  call
81c3c390 <_raw_spin_unlock_bh>
8170efde:   48 8b 43 10 movrax,QWORD PTR [rbx+0x10]
8170efe2:   4c 89 efmovrdi,r13
8170efe5:   48 8b 80 00 01 00 00movrax,QWORD PTR [rax+0x100]
8170efec:   ff 90 98 00 00 00   call   QWORD PTR [rax+0x98]
8170eff2:   41 89 c4movr12d,eax
8170eff5:   48 8b 43 10 movrax,QWORD PTR [rbx+0x10]
8170eff9:   48 8d b8 18 01 00 00leardi,[rax+0x118]
8170f000:   e8 9b d4 52 00  call
81c3c4a0 <_raw_spin_lock_bh>
8170f005:   45 85 e4test   r12d,r12d
8170f008:   74 34   je
8170f03e 
8170f00a:   48 8b 43 10 movrax,QWORD PTR [rbx+0x10]
8170f00e:   48 8d b8 1c 01 00 00leardi,[rax+0x11c]
8170f015:   e8 e6 d3 52 00  call
81c3c400 <_raw_spin_lock>
8170f01a:   4c 89 efmovrdi,r13
8170f01d:   e8 ce f5 ff ff  call
8170e5f0 <__iscsi_put_task>
8170f022:   48 8b 7b 10 movrdi,QWORD PTR [rbx+0x10]
8170f026:   48 81 c7 1c 01 00 00addrdi,0x11c
8170f02d: 

Re: [PATCH] libiscsi: add lock around task lists to fix list corruption regression

2017-02-24 Thread Guilherme G. Piccoli
On 02/23/2017 07:25 PM, Chris Leech wrote:
> Yikes, my git-send-email settings suppressed the important CCs.  Sorry!
> 
> Guilherme and Ilkka, can you comment about your testing results or review 
> please?

Hi Chris, thanks for looping me. Patch seems nice, I do have some points
below, most nitpicks heheh
Feel free to accept or not the suggestions!

Also, you can add my:

Reviewed-by: Guilherme G. Piccoli 


> 
> - Chris Leech
> 
> - Original Message -
>> There's a rather long standing regression from commit
>> 659743b [SCSI] libiscsi: Reduce locking contention in fast path
>>

Perhaps worth to follow the checkpatch "rule" of citing commits here?
659743b02c41 ("[SCSI] libiscsi: Reduce locking contention in fast path")


>> Depending on iSCSI target behavior, it's possible to hit the case in
>> iscsi_complete_task where the task is still on a pending list
>> (!list_empty(&task->running)).  When that happens the task is removed
>> from the list while holding the session back_lock, but other task list
>> modification occur under the frwd_lock.  That leads to linked list
>> corruption and eventually a panicked system.
>>
>> Rather than back out the session lock split entirely, in order to try
>> and keep some of the performance gains this patch adds another lock to
>> maintain the task lists integrity.
>>
>> Major enterprise supported kernels have been backing out the lock split
>> for while now, thanks to the efforts at IBM where a lab setup has the
>> most reliable reproducer I've seen on this issue.  This patch has been
>> tested there successfully.
>>
>> Signed-off-by: Chris Leech 

Guess we're missing here:

(a) Fixes: 659743b02c41 ("[SCSI] libiscsi: Reduce locking contention in
fast path")

(b) CC:  #v3.15+

Also, would be nice to point the original reporter, if possible:

Reported-by: Prashantha Subbarao 


>> ---
>>  drivers/scsi/libiscsi.c | 26 +-
>>  include/scsi/libiscsi.h |  1 +
>>  2 files changed, 26 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
>> index 834d121..acb5ef3 100644
>> --- a/drivers/scsi/libiscsi.c
>> +++ b/drivers/scsi/libiscsi.c
>> @@ -560,8 +560,12 @@ static void iscsi_complete_task(struct iscsi_task *task,
>> int state)
>>  WARN_ON_ONCE(task->state == ISCSI_TASK_FREE);
>>  task->state = state;
>>  
>> -if (!list_empty(&task->running))
>> +spin_lock_bh(&conn->taskqueuelock);
>> +if (!list_empty(&task->running)) {
>> +WARN_ONCE(1, "iscsi_complete_task while task on list");

I'm retesting this patch right now, and again I saw this giant warning.
Perhaps worth to make it pr_debug()? So, it can be enabled as user
desire and don't alarm all users too much.

Thanks,


Guilherme


>>  list_del_init(&task->running);
>> +}
>> +spin_unlock_bh(&conn->taskqueuelock);
>>  
>>  if (conn->task == task)
>>  conn->task = NULL;
>> @@ -783,7 +787,9 @@ __iscsi_conn_send_pdu(struct iscsi_conn *conn, struct
>> iscsi_hdr *hdr,
>>  if (session->tt->xmit_task(task))
>>  goto free_task;
>>  } else {
>> +spin_lock_bh(&conn->taskqueuelock);
>>  list_add_tail(&task->running, &conn->mgmtqueue);
>> +spin_unlock_bh(&conn->taskqueuelock);
>>  iscsi_conn_queue_work(conn);
>>  }
>>  
>> @@ -1474,8 +1480,10 @@ void iscsi_requeue_task(struct iscsi_task *task)
>>   * this may be on the requeue list already if the xmit_task callout
>>   * is handling the r2ts while we are adding new ones
>>   */
>> +spin_lock_bh(&conn->taskqueuelock);
>>  if (list_empty(&task->running))
>>  list_add_tail(&task->running, &conn->requeue);
>> +spin_unlock_bh(&conn->taskqueuelock);
>>  iscsi_conn_queue_work(conn);
>>  }
>>  EXPORT_SYMBOL_GPL(iscsi_requeue_task);
>> @@ -1512,22 +1520,26 @@ static int iscsi_data_xmit(struct iscsi_conn *conn)
>>   * only have one nop-out as a ping from us and targets should not
>>   * overflow us with nop-ins
>>   */
>> +spin_lock_bh(&conn->taskqueuelock);
>>  check_mgmt:
>>  while (!list_empty(&conn->mgmtqueue)) {
>>  conn->task = list_entry(conn->mgmtqueue.next,
>>   struct iscsi_task, running);
>>  list_del_init(&conn->task->running);
>> +spin_unlock_bh(&conn->taskqueuelock);
>>  if (iscsi_prep_mgmt_task(conn, conn->task)) {
>>  /* regular RX path uses back_lock */
>>  spin_lock_bh(&conn->session->back_lock);
>>  __iscsi_put_task(conn->task);
>>  spin_unlock_bh(&conn->session->back_lock);
>>  conn->task = NULL;
>> +spin_lock_bh(&conn->taskqueuelock);
>>  continue;
>>  }
>>  rc = iscsi_xmit_task(conn);
>>  if (rc)
>>  goto done;
>> + 

Re: [PATCH] libiscsi: add lock around task lists to fix list corruption regression

2017-02-23 Thread Chris Leech
Yikes, my git-send-email settings suppressed the important CCs.  Sorry!

Guilherme and Ilkka, can you comment about your testing results or review 
please?

- Chris Leech

- Original Message -
> There's a rather long standing regression from commit
> 659743b [SCSI] libiscsi: Reduce locking contention in fast path
> 
> Depending on iSCSI target behavior, it's possible to hit the case in
> iscsi_complete_task where the task is still on a pending list
> (!list_empty(&task->running)).  When that happens the task is removed
> from the list while holding the session back_lock, but other task list
> modification occur under the frwd_lock.  That leads to linked list
> corruption and eventually a panicked system.
> 
> Rather than back out the session lock split entirely, in order to try
> and keep some of the performance gains this patch adds another lock to
> maintain the task lists integrity.
> 
> Major enterprise supported kernels have been backing out the lock split
> for while now, thanks to the efforts at IBM where a lab setup has the
> most reliable reproducer I've seen on this issue.  This patch has been
> tested there successfully.
> 
> Signed-off-by: Chris Leech 
> ---
>  drivers/scsi/libiscsi.c | 26 +-
>  include/scsi/libiscsi.h |  1 +
>  2 files changed, 26 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
> index 834d121..acb5ef3 100644
> --- a/drivers/scsi/libiscsi.c
> +++ b/drivers/scsi/libiscsi.c
> @@ -560,8 +560,12 @@ static void iscsi_complete_task(struct iscsi_task *task,
> int state)
>   WARN_ON_ONCE(task->state == ISCSI_TASK_FREE);
>   task->state = state;
>  
> - if (!list_empty(&task->running))
> + spin_lock_bh(&conn->taskqueuelock);
> + if (!list_empty(&task->running)) {
> + WARN_ONCE(1, "iscsi_complete_task while task on list");
>   list_del_init(&task->running);
> + }
> + spin_unlock_bh(&conn->taskqueuelock);
>  
>   if (conn->task == task)
>   conn->task = NULL;
> @@ -783,7 +787,9 @@ __iscsi_conn_send_pdu(struct iscsi_conn *conn, struct
> iscsi_hdr *hdr,
>   if (session->tt->xmit_task(task))
>   goto free_task;
>   } else {
> + spin_lock_bh(&conn->taskqueuelock);
>   list_add_tail(&task->running, &conn->mgmtqueue);
> + spin_unlock_bh(&conn->taskqueuelock);
>   iscsi_conn_queue_work(conn);
>   }
>  
> @@ -1474,8 +1480,10 @@ void iscsi_requeue_task(struct iscsi_task *task)
>* this may be on the requeue list already if the xmit_task callout
>* is handling the r2ts while we are adding new ones
>*/
> + spin_lock_bh(&conn->taskqueuelock);
>   if (list_empty(&task->running))
>   list_add_tail(&task->running, &conn->requeue);
> + spin_unlock_bh(&conn->taskqueuelock);
>   iscsi_conn_queue_work(conn);
>  }
>  EXPORT_SYMBOL_GPL(iscsi_requeue_task);
> @@ -1512,22 +1520,26 @@ static int iscsi_data_xmit(struct iscsi_conn *conn)
>* only have one nop-out as a ping from us and targets should not
>* overflow us with nop-ins
>*/
> + spin_lock_bh(&conn->taskqueuelock);
>  check_mgmt:
>   while (!list_empty(&conn->mgmtqueue)) {
>   conn->task = list_entry(conn->mgmtqueue.next,
>struct iscsi_task, running);
>   list_del_init(&conn->task->running);
> + spin_unlock_bh(&conn->taskqueuelock);
>   if (iscsi_prep_mgmt_task(conn, conn->task)) {
>   /* regular RX path uses back_lock */
>   spin_lock_bh(&conn->session->back_lock);
>   __iscsi_put_task(conn->task);
>   spin_unlock_bh(&conn->session->back_lock);
>   conn->task = NULL;
> + spin_lock_bh(&conn->taskqueuelock);
>   continue;
>   }
>   rc = iscsi_xmit_task(conn);
>   if (rc)
>   goto done;
> + spin_lock_bh(&conn->taskqueuelock);
>   }
>  
>   /* process pending command queue */
> @@ -1535,19 +1547,24 @@ static int iscsi_data_xmit(struct iscsi_conn *conn)
>   conn->task = list_entry(conn->cmdqueue.next, struct iscsi_task,
>   running);
>   list_del_init(&conn->task->running);
> + spin_unlock_bh(&conn->taskqueuelock);
>   if (conn->session->state == ISCSI_STATE_LOGGING_OUT) {
>   fail_scsi_task(conn->task, DID_IMM_RETRY);
> + spin_lock_bh(&conn->taskqueuelock);
>   continue;
>   }
>   rc = iscsi_prep_scsi_cmd_pdu(conn->task);
>   if (rc) {
>   if (rc == -ENOMEM || rc == -EACCES) {
> + spin_lock_bh(&conn->taskqueuelock);
> 

Re: [PATCH] libiscsi: add lock around task lists to fix list corruption regression

2017-02-23 Thread Martin K. Petersen
> "Chris" == Chris Leech  writes:

Chris> There's a rather long standing regression from commit 659743b
Chris> [SCSI] libiscsi: Reduce locking contention in fast path

Chris> Depending on iSCSI target behavior, it's possible to hit the case
Chris> in iscsi_complete_task where the task is still on a pending list
Chris> (!list_empty(&task->running)).  When that happens the task is
Chris> removed from the list while holding the session back_lock, but
Chris> other task list modification occur under the frwd_lock.  That
Chris> leads to linked list corruption and eventually a panicked system.

Chris> Rather than back out the session lock split entirely, in order to
Chris> try and keep some of the performance gains this patch adds
Chris> another lock to maintain the task lists integrity.

Chris> Major enterprise supported kernels have been backing out the lock
Chris> split for while now, thanks to the efforts at IBM where a lab
Chris> setup has the most reliable reproducer I've seen on this issue.
Chris> This patch has been tested there successfully.

Reviews, please!

-- 
Martin K. Petersen  Oracle Linux Engineering


[PATCH] libiscsi: add lock around task lists to fix list corruption regression

2017-02-21 Thread Chris Leech
There's a rather long standing regression from commit
659743b [SCSI] libiscsi: Reduce locking contention in fast path

Depending on iSCSI target behavior, it's possible to hit the case in
iscsi_complete_task where the task is still on a pending list
(!list_empty(&task->running)).  When that happens the task is removed
from the list while holding the session back_lock, but other task list
modification occur under the frwd_lock.  That leads to linked list
corruption and eventually a panicked system.

Rather than back out the session lock split entirely, in order to try
and keep some of the performance gains this patch adds another lock to
maintain the task lists integrity.

Major enterprise supported kernels have been backing out the lock split
for while now, thanks to the efforts at IBM where a lab setup has the
most reliable reproducer I've seen on this issue.  This patch has been
tested there successfully.

Signed-off-by: Chris Leech 
---
 drivers/scsi/libiscsi.c | 26 +-
 include/scsi/libiscsi.h |  1 +
 2 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index 834d121..acb5ef3 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -560,8 +560,12 @@ static void iscsi_complete_task(struct iscsi_task *task, 
int state)
WARN_ON_ONCE(task->state == ISCSI_TASK_FREE);
task->state = state;
 
-   if (!list_empty(&task->running))
+   spin_lock_bh(&conn->taskqueuelock);
+   if (!list_empty(&task->running)) {
+   WARN_ONCE(1, "iscsi_complete_task while task on list");
list_del_init(&task->running);
+   }
+   spin_unlock_bh(&conn->taskqueuelock);
 
if (conn->task == task)
conn->task = NULL;
@@ -783,7 +787,9 @@ __iscsi_conn_send_pdu(struct iscsi_conn *conn, struct 
iscsi_hdr *hdr,
if (session->tt->xmit_task(task))
goto free_task;
} else {
+   spin_lock_bh(&conn->taskqueuelock);
list_add_tail(&task->running, &conn->mgmtqueue);
+   spin_unlock_bh(&conn->taskqueuelock);
iscsi_conn_queue_work(conn);
}
 
@@ -1474,8 +1480,10 @@ void iscsi_requeue_task(struct iscsi_task *task)
 * this may be on the requeue list already if the xmit_task callout
 * is handling the r2ts while we are adding new ones
 */
+   spin_lock_bh(&conn->taskqueuelock);
if (list_empty(&task->running))
list_add_tail(&task->running, &conn->requeue);
+   spin_unlock_bh(&conn->taskqueuelock);
iscsi_conn_queue_work(conn);
 }
 EXPORT_SYMBOL_GPL(iscsi_requeue_task);
@@ -1512,22 +1520,26 @@ static int iscsi_data_xmit(struct iscsi_conn *conn)
 * only have one nop-out as a ping from us and targets should not
 * overflow us with nop-ins
 */
+   spin_lock_bh(&conn->taskqueuelock);
 check_mgmt:
while (!list_empty(&conn->mgmtqueue)) {
conn->task = list_entry(conn->mgmtqueue.next,
 struct iscsi_task, running);
list_del_init(&conn->task->running);
+   spin_unlock_bh(&conn->taskqueuelock);
if (iscsi_prep_mgmt_task(conn, conn->task)) {
/* regular RX path uses back_lock */
spin_lock_bh(&conn->session->back_lock);
__iscsi_put_task(conn->task);
spin_unlock_bh(&conn->session->back_lock);
conn->task = NULL;
+   spin_lock_bh(&conn->taskqueuelock);
continue;
}
rc = iscsi_xmit_task(conn);
if (rc)
goto done;
+   spin_lock_bh(&conn->taskqueuelock);
}
 
/* process pending command queue */
@@ -1535,19 +1547,24 @@ static int iscsi_data_xmit(struct iscsi_conn *conn)
conn->task = list_entry(conn->cmdqueue.next, struct iscsi_task,
running);
list_del_init(&conn->task->running);
+   spin_unlock_bh(&conn->taskqueuelock);
if (conn->session->state == ISCSI_STATE_LOGGING_OUT) {
fail_scsi_task(conn->task, DID_IMM_RETRY);
+   spin_lock_bh(&conn->taskqueuelock);
continue;
}
rc = iscsi_prep_scsi_cmd_pdu(conn->task);
if (rc) {
if (rc == -ENOMEM || rc == -EACCES) {
+   spin_lock_bh(&conn->taskqueuelock);
list_add_tail(&conn->task->running,
  &conn->cmdqueue);
conn->task = NULL;
+   spin_unlock_bh(&conn->taskqueuelock);
goto d