Re: [PATCH] libiscsi: add lock around task lists to fix list corruption regression
Hi,

Was this issue ever resolved? We are seeing this on 4.14.35.

Thanks,

Alan Adamson

On 5/9/18 3:27 PM, Ilkka Sovanto wrote:
> Hi again, Chris and others!
>
> We're hitting this again on 4.14.32.
>
> Looks like the path iscsi_queuecommand -> prepd_reject/prepd_fault
> results in iscsi_complete_task getting called with frwd_lock held
> instead of the requisite back_lock, which might explain the task list
> corruption under heavy loads.
>
> Another possibly problematic path is fail_mgmt_tasks, which calls
> iscsi_complete_task while holding frwd_lock from
> iscsi_start_session_recovery.
>
> See also: https://github.com/open-iscsi/open-iscsi/issues/56
>
> Attached an ugly attempt at ensuring back_lock is held instead.
>
>  - Ilkka
>
> [quoted oops trace and iscsi_xmit_task disassembly from the 5 March
> 2017 report snipped; both appear in full in the original message
> below]
Re: [PATCH] libiscsi: add lock around task lists to fix list corruption regression
Hi again, Chris and others!

We're hitting this again on 4.14.32.

Looks like the path iscsi_queuecommand -> prepd_reject/prepd_fault
results in iscsi_complete_task getting called with frwd_lock held
instead of the requisite back_lock, which might explain the task list
corruption under heavy loads.

Another possibly problematic path is fail_mgmt_tasks, which calls
iscsi_complete_task while holding frwd_lock from
iscsi_start_session_recovery.

See also: https://github.com/open-iscsi/open-iscsi/issues/56

Attached an ugly attempt at ensuring back_lock is held instead.

 - Ilkka

On 5 March 2017 at 22:49, Ilkka Sovanto wrote:
> Hi!
>
> Was running pretty nicely for a week or so, until we got this one:
>
> [quoted oops trace and iscsi_xmit_task disassembly snipped; see the
> original message below for the full copy]
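For context, a minimal sketch of the kind of change Ilkka describes:
taking back_lock around the completion in the iscsi_queuecommand() error
labels, nested inside the already-held frwd_lock the same way
iscsi_data_xmit() nests it around __iscsi_put_task(). This illustrates
the idea against 4.14-era code and is not necessarily the attached
patch:

	 prepd_reject:
	+	/* iscsi_complete_task() expects back_lock, but this
	+	 * path runs under frwd_lock */
	+	spin_lock_bh(&session->back_lock);
	 	iscsi_complete_task(task, ISCSI_TASK_REQUEUE_SCSIQ);
	+	spin_unlock_bh(&session->back_lock);
	 reject:
	 	spin_unlock_bh(&session->frwd_lock);

The prepd_fault label would need the same treatment, and
fail_mgmt_tasks() in iscsi_start_session_recovery() has the analogous
problem of completing tasks with only frwd_lock held.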
Re: [PATCH] libiscsi: add lock around task lists to fix list corruption regression
Hi!

Was running pretty nicely for a week or so, until we got this one:

[534630.679965] BUG: unable to handle kernel at 0078
[534630.684035] IP: [] iscsi_xmit_task+0x29/0xc0
[534630.685724] PGD a1fcd3067
[534630.687346] Oops: 0002 [#1]
[534630.688846] Modules linked in:
[534630.690348] CPU: 17 PID: 7988 Comm: kworker/u56:6 Tainted: GW
[534630.693327] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 41113_115728-nilsson.home.kraxel.org 04/01/2014
[534630.696404] Workqueue: iscsi_q_8 iscsi_xmitworker
[534630.697948] task: 8800bab62f00 ti: 880c0d9d8000 task.ti: 880c0d9d8000
[534630.700837] RIP: 0010:[] [] iscsi_xmit_task+0x29/0xc0
[534630.703689] RSP: 0018:880c0d9dbdb0 EFLAGS: 00010246
[534630.705209] RAX: ffc3 RBX: 880fe3dfda10 RCX: dead0200
[534630.708062] RDX: RSI: 0200 RDI: 880fe3dfda10
[534630.710773] RBP: 880c0d9dbdc8 R08: 880fe3881c80 R09:
[534630.713539] R10: 880fb799f710 R11: 0260 R12: 880fe3dfda10
[534630.716344] R13: R14: 880fe3881c80 R15: 880fe3dfdae0
[534630.719143] FS: () GS:880fff42()
[534630.722001] CS: 0010 DS: ES: CR0: 80050033
[534630.723492] CR2: 0078 CR3: 000aa8e44000 CR4: 000406e0
[534630.726163] Stack:
[534630.727622] 880fe3dfdaa8 880fe3dfda10 880fe3dfdac0 880c0d9dbe18
[534630.730906] 8171226d 880fe3dfdad0 880fe3881c00 880fe3dfdab0
[534630.767489] 880ff7d98780 880fe3dfdae0 880ffec99400 880fe3dfdae8
[534630.770543] Call Trace:
[534630.771967] [] iscsi_xmitworker+0x1dd/0x310
[534630.773523] [] process_one_work+0x149/0x3e0
[534630.775084] [] worker_thread+0x69/0x470
[534630.776585] [] ? process_one_work+0x3e0/0x3e0
[534630.778131] [] ? process_one_work+0x3e0/0x3e0
[534630.779682] [] kthread+0xea/0x100
[534630.781300] [] ? kthread_create_on_node+0x1a0/0x1a0
[534630.782872] [] ret_from_fork+0x3f/0x70
[534630.784340] [] ? kthread_create_on_node+0x1a0/0x1a0
[534630.785885] Code:
[534630.792699] RIP [] iscsi_xmit_task+0x29/0xc0
[534630.794266] RSP
[534630.795617] CR2: 0078
[534630.798051] ---[ end trace 9d963f95212dfbe2 ]---
[534630.799788] Kernel panic - not syncing: Fatal exception in interrupt
[534630.802712] Kernel Offset: disabled
[534630.804304] ---[ end Kernel panic - not syncing: Fatal exception in interrupt

And here's the disassembly for iscsi_xmit_task:

8170efa0 <iscsi_xmit_task>:
8170efa0:	e8 0b fd 52 00       	call   81c3ecb0 <__fentry__>
8170efa5:	55                   	push   rbp
8170efa6:	b8 c3 ff ff ff       	mov    eax,0xffc3
8170efab:	48 89 e5             	mov    rbp,rsp
8170efae:	41 55                	push   r13
8170efb0:	41 54                	push   r12
8170efb2:	53                   	push   rbx
8170efb3:	48 8b 97 f0 00 00 00 	mov    rdx,QWORD PTR [rdi+0xf0]
8170efba:	4c 8b af 90 00 00 00 	mov    r13,QWORD PTR [rdi+0x90]
8170efc1:	83 e2 02             	and    edx,0x2
8170efc4:	75 71                	jne    8170f037
8170efc6:	48 89 fb             	mov    rbx,rdi
***
8170efc9:	f0 41 ff 45 78       	lock inc DWORD PTR [r13+0x78]
***
8170efce:	48 8b 47 10          	mov    rax,QWORD PTR [rdi+0x10]
8170efd2:	48 8d b8 18 01 00 00 	lea    rdi,[rax+0x118]
8170efd9:	e8 b2 d3 52 00       	call   81c3c390 <_raw_spin_unlock_bh>
8170efde:	48 8b 43 10          	mov    rax,QWORD PTR [rbx+0x10]
8170efe2:	4c 89 ef             	mov    rdi,r13
8170efe5:	48 8b 80 00 01 00 00 	mov    rax,QWORD PTR [rax+0x100]
8170efec:	ff 90 98 00 00 00    	call   QWORD PTR [rax+0x98]
8170eff2:	41 89 c4             	mov    r12d,eax
8170eff5:	48 8b 43 10          	mov    rax,QWORD PTR [rbx+0x10]
8170eff9:	48 8d b8 18 01 00 00 	lea    rdi,[rax+0x118]
8170f000:	e8 9b d4 52 00       	call   81c3c4a0 <_raw_spin_lock_bh>
8170f005:	45 85 e4             	test   r12d,r12d
8170f008:	74 34                	je     8170f03e
8170f00a:	48 8b 43 10          	mov    rax,QWORD PTR [rbx+0x10]
8170f00e:	48 8d b8 1c 01 00 00 	lea    rdi,[rax+0x11c]
8170f015:	e8 e6 d3 52 00       	call   81c3c400 <_raw_spin_lock>
8170f01a:	4c 89 ef             	mov    rdi,r13
8170f01d:	e8 ce f5 ff ff       	call   8170e5f0 <__iscsi_put_task>
8170f022:	48 8b 7b 10          	mov    rdi,QWORD PTR [rbx+0x10]
8170f026:	48 81 c7 1c 01 00 00 	add    rdi,0x11c
8170f02d:
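For reference, the faulting instruction lines up with the reference
count get at the top of iscsi_xmit_task(). Roughly, in the libiscsi
code of that era (a best-effort annotation of the disassembly above,
not an exact source listing; the field offsets are from this
particular build):

	static int iscsi_xmit_task(struct iscsi_conn *conn)
	{
		struct iscsi_task *task = conn->task;	/* r13 = [rdi+0x90] */
		int rc;

		/* [rdi+0xf0] & 0x2: ISCSI_SUSPEND_BIT test */
		if (test_bit(ISCSI_SUSPEND_BIT, &conn->suspend_tx))
			return -ENODATA;		/* eax = 0xffc3 = -61 */

		/*
		 * lock inc DWORD PTR [r13+0x78] is the task refcount
		 * increment; task (r13) is NULL here, hence the fault
		 * with CR2 = 0x78.
		 */
		__iscsi_get_task(task);
		spin_unlock_bh(&conn->session->frwd_lock);
		rc = conn->session->tt->xmit_task(task);
		spin_lock_bh(&conn->session->frwd_lock);
		...
	}

conn->task should never be NULL at this point, which is consistent
with the queue lists having been corrupted earlier.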
Re: [PATCH] libiscsi: add lock around task lists to fix list corruption regression
On 02/23/2017 07:25 PM, Chris Leech wrote:
> Yikes, my git-send-email settings suppressed the important CCs. Sorry!
>
> Guilherme and Ilkka, can you comment about your testing results or review
> please?

Hi Chris, thanks for looping me. The patch seems nice; I do have some
points below, mostly nitpicks heheh. Feel free to accept the suggestions
or not!

Also, you can add my:

Reviewed-by: Guilherme G. Piccoli

> - Chris Leech
>
> ----- Original Message -----
>> There's a rather long standing regression from commit
>> 659743b [SCSI] libiscsi: Reduce locking contention in fast path

Perhaps it's worth following the checkpatch "rule" of citing commits
here: 659743b02c41 ("[SCSI] libiscsi: Reduce locking contention in
fast path")

>> Depending on iSCSI target behavior, it's possible to hit the case in
>> iscsi_complete_task where the task is still on a pending list
>> (!list_empty(&task->running)). When that happens the task is removed
>> from the list while holding the session back_lock, but other task list
>> modifications occur under the frwd_lock. That leads to linked list
>> corruption and eventually a panicked system.
>>
>> Rather than back out the session lock split entirely, in order to try
>> and keep some of the performance gains this patch adds another lock to
>> maintain the task lists' integrity.
>>
>> Major enterprise supported kernels have been backing out the lock split
>> for a while now, thanks to the efforts at IBM where a lab setup has the
>> most reliable reproducer I've seen on this issue. This patch has been
>> tested there successfully.
>>
>> Signed-off-by: Chris Leech

Guess we're missing here:

(a) Fixes: 659743b02c41 ("[SCSI] libiscsi: Reduce locking contention in fast path")
(b) Cc: # v3.15+

Also, it would be nice to credit the original reporter, if possible:

Reported-by: Prashantha Subbarao

>> ---
>>  drivers/scsi/libiscsi.c | 26 +++++++++++++++++++++++++-
>>  include/scsi/libiscsi.h |  1 +
>>  2 files changed, 26 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
>> index 834d121..acb5ef3 100644
>> --- a/drivers/scsi/libiscsi.c
>> +++ b/drivers/scsi/libiscsi.c
>> @@ -560,8 +560,12 @@ static void iscsi_complete_task(struct iscsi_task *task, int state)
>>  	WARN_ON_ONCE(task->state == ISCSI_TASK_FREE);
>>  	task->state = state;
>>
>> -	if (!list_empty(&task->running))
>> +	spin_lock_bh(&conn->taskqueuelock);
>> +	if (!list_empty(&task->running)) {
>> +		WARN_ONCE(1, "iscsi_complete_task while task on list");

I'm retesting this patch right now, and again I saw this giant warning.
Perhaps it's worth making it a pr_debug()? That way it can be enabled as
the user desires and doesn't alarm all users too much.

Thanks,

Guilherme

>> +		list_del_init(&task->running);
>> +	}
>> +	spin_unlock_bh(&conn->taskqueuelock);

[remainder of quoted patch snipped; see the original posting below]
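For reference, a sketch of what that suggestion might look like in the
hunk above (illustrative only; libiscsi could equally use its own
ISCSI_DBG_SESSION() helper rather than plain pr_debug()):

	spin_lock_bh(&conn->taskqueuelock);
	if (!list_empty(&task->running)) {
		pr_debug("iscsi_complete_task while task on list\n");
		list_del_init(&task->running);
	}
	spin_unlock_bh(&conn->taskqueuelock);

With CONFIG_DYNAMIC_DEBUG enabled, the message can then be switched on
at runtime via /sys/kernel/debug/dynamic_debug/control instead of
firing a one-time warning for every user.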
Re: [PATCH] libiscsi: add lock around task lists to fix list corruption regression
Yikes, my git-send-email settings suppressed the important CCs. Sorry!

Guilherme and Ilkka, can you comment about your testing results or review
please?

- Chris Leech

----- Original Message -----
> There's a rather long standing regression from commit
> 659743b [SCSI] libiscsi: Reduce locking contention in fast path
>
> Depending on iSCSI target behavior, it's possible to hit the case in
> iscsi_complete_task where the task is still on a pending list
> (!list_empty(&task->running)). When that happens the task is removed
> from the list while holding the session back_lock, but other task list
> modifications occur under the frwd_lock. That leads to linked list
> corruption and eventually a panicked system.
>
> [rest of the description and diff snipped; the patch is quoted in full
> in the original posting below]
Re: [PATCH] libiscsi: add lock around task lists to fix list corruption regression
> "Chris" == Chris Leech writes: Chris> There's a rather long standing regression from commit 659743b Chris> [SCSI] libiscsi: Reduce locking contention in fast path Chris> Depending on iSCSI target behavior, it's possible to hit the case Chris> in iscsi_complete_task where the task is still on a pending list Chris> (!list_empty(&task->running)). When that happens the task is Chris> removed from the list while holding the session back_lock, but Chris> other task list modification occur under the frwd_lock. That Chris> leads to linked list corruption and eventually a panicked system. Chris> Rather than back out the session lock split entirely, in order to Chris> try and keep some of the performance gains this patch adds Chris> another lock to maintain the task lists integrity. Chris> Major enterprise supported kernels have been backing out the lock Chris> split for while now, thanks to the efforts at IBM where a lab Chris> setup has the most reliable reproducer I've seen on this issue. Chris> This patch has been tested there successfully. Reviews, please! -- Martin K. Petersen Oracle Linux Engineering
[PATCH] libiscsi: add lock around task lists to fix list corruption regression
There's a rather long standing regression from commit
659743b [SCSI] libiscsi: Reduce locking contention in fast path

Depending on iSCSI target behavior, it's possible to hit the case in
iscsi_complete_task where the task is still on a pending list
(!list_empty(&task->running)). When that happens the task is removed
from the list while holding the session back_lock, but other task list
modifications occur under the frwd_lock. That leads to linked list
corruption and eventually a panicked system.

Rather than back out the session lock split entirely, in order to try
and keep some of the performance gains this patch adds another lock to
maintain the task lists' integrity.

Major enterprise supported kernels have been backing out the lock split
for a while now, thanks to the efforts at IBM where a lab setup has the
most reliable reproducer I've seen on this issue. This patch has been
tested there successfully.

Signed-off-by: Chris Leech
---
 drivers/scsi/libiscsi.c | 26 +++++++++++++++++++++++++-
 include/scsi/libiscsi.h |  1 +
 2 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index 834d121..acb5ef3 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -560,8 +560,12 @@ static void iscsi_complete_task(struct iscsi_task *task, int state)
 	WARN_ON_ONCE(task->state == ISCSI_TASK_FREE);
 	task->state = state;
 
-	if (!list_empty(&task->running))
+	spin_lock_bh(&conn->taskqueuelock);
+	if (!list_empty(&task->running)) {
+		WARN_ONCE(1, "iscsi_complete_task while task on list");
 		list_del_init(&task->running);
+	}
+	spin_unlock_bh(&conn->taskqueuelock);
 
 	if (conn->task == task)
 		conn->task = NULL;
@@ -783,7 +787,9 @@ __iscsi_conn_send_pdu(struct iscsi_conn *conn, struct iscsi_hdr *hdr,
 		if (session->tt->xmit_task(task))
 			goto free_task;
 	} else {
+		spin_lock_bh(&conn->taskqueuelock);
 		list_add_tail(&task->running, &conn->mgmtqueue);
+		spin_unlock_bh(&conn->taskqueuelock);
 		iscsi_conn_queue_work(conn);
 	}
 
@@ -1474,8 +1480,10 @@ void iscsi_requeue_task(struct iscsi_task *task)
 	 * this may be on the requeue list already if the xmit_task callout
 	 * is handling the r2ts while we are adding new ones
 	 */
+	spin_lock_bh(&conn->taskqueuelock);
 	if (list_empty(&task->running))
 		list_add_tail(&task->running, &conn->requeue);
+	spin_unlock_bh(&conn->taskqueuelock);
 	iscsi_conn_queue_work(conn);
 }
 EXPORT_SYMBOL_GPL(iscsi_requeue_task);
@@ -1512,22 +1520,26 @@ static int iscsi_data_xmit(struct iscsi_conn *conn)
 	 * only have one nop-out as a ping from us and targets should not
 	 * overflow us with nop-ins
 	 */
+	spin_lock_bh(&conn->taskqueuelock);
 check_mgmt:
 	while (!list_empty(&conn->mgmtqueue)) {
 		conn->task = list_entry(conn->mgmtqueue.next,
 					 struct iscsi_task, running);
 		list_del_init(&conn->task->running);
+		spin_unlock_bh(&conn->taskqueuelock);
 		if (iscsi_prep_mgmt_task(conn, conn->task)) {
 			/* regular RX path uses back_lock */
 			spin_lock_bh(&conn->session->back_lock);
 			__iscsi_put_task(conn->task);
 			spin_unlock_bh(&conn->session->back_lock);
 			conn->task = NULL;
+			spin_lock_bh(&conn->taskqueuelock);
 			continue;
 		}
 		rc = iscsi_xmit_task(conn);
 		if (rc)
 			goto done;
+		spin_lock_bh(&conn->taskqueuelock);
 	}
 
 	/* process pending command queue */
@@ -1535,19 +1547,24 @@ static int iscsi_data_xmit(struct iscsi_conn *conn)
 		conn->task = list_entry(conn->cmdqueue.next, struct iscsi_task,
 					running);
 		list_del_init(&conn->task->running);
+		spin_unlock_bh(&conn->taskqueuelock);
 		if (conn->session->state == ISCSI_STATE_LOGGING_OUT) {
 			fail_scsi_task(conn->task, DID_IMM_RETRY);
+			spin_lock_bh(&conn->taskqueuelock);
 			continue;
 		}
 		rc = iscsi_prep_scsi_cmd_pdu(conn->task);
 		if (rc) {
 			if (rc == -ENOMEM || rc == -EACCES) {
+				spin_lock_bh(&conn->taskqueuelock);
 				list_add_tail(&conn->task->running,
 					      &conn->cmdqueue);
 				conn->task = NULL;
+				spin_unlock_bh(&conn->taskqueuelock);
 				goto done;
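To see why the extra lock helps, here is a sketch of the interleaving
the patch closes (an illustration of the race from the commit message,
not code from the driver):

	/*
	 * Without taskqueuelock, task->running can be unlinked and the
	 * queue heads relinked under two different locks:
	 *
	 *   completion path (RX)                xmit/queueing path (TX)
	 *   --------------------                -----------------------
	 *   spin_lock_bh(&session->back_lock);  spin_lock_bh(&session->frwd_lock);
	 *   iscsi_complete_task(task, state);   list_add_tail(&other->running,
	 *     list_del_init(&task->running);                  &conn->cmdqueue);
	 *
	 * Neither CPU holds the lock the other one does, so the two list
	 * operations can interleave and tear the prev/next pointers.
	 * With conn->taskqueuelock, every add/del on the mgmtqueue,
	 * cmdqueue and requeue lists is serialized no matter which
	 * session lock the caller already holds.
	 */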