Commit 659743b02c41 ("[SCSI] libiscsi: Reduce locking contention in
fast path") introduced a locking regression.

According to the comment for iscsi_complete_task, back_lock must be
held while calling it:
/*
 * iscsi_complete_task - finish a task
 * @task: iscsi cmd task
 * @state: state to complete task with
 *
 * Must be called with session back_lock.
 */

However, at three locations in libiscsi.c iscsi_complete_task is called
without holding back_lock.

This causes a race condition when processing certain iSCSI responses,
leading to list corruption, which in turn stalls iSCSI traffic.

A production box here at Kapsi with 44 active iSCSI paths was crashing
because of this about five times a day at worst.

Fix this by acquiring back_lock before calling iscsi_complete_task at
the remaining spots.

Please check if spin_lock_bh or spin_lock is more appropriate.

Fixes: 659743b02c41 ("[SCSI] libiscsi: Reduce locking contention in fast path")
Signed-off-by: Ilkka Sovanto <il...@kapsi.fi>
Tested-by: Ilkka Sovanto <il...@kapsi.fi>
Cc: sta...@vger.kernel.org
---
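For reviewers, a minimal userspace sketch of the locking pattern this
patch applies at the three call sites. It is an illustration only:
pthread_mutex_t stands in for the kernel spinlock taken with
spin_lock_bh(), and struct session, struct task, complete_task(),
complete_task_locked() and run_demo() are hypothetical names, not
libiscsi code.

```c
#include <pthread.h>

#define NTHREADS 4
#define NITER 10000

/* Hypothetical stand-in for the session; not the libiscsi structure. */
struct session {
	pthread_mutex_t back_lock;	/* plays the role of session->back_lock */
	int completed;			/* shared state protected by back_lock */
};

struct task {
	struct session *session;
	int state;
};

/* Contract mirrored from the comment: caller must hold back_lock. */
static void complete_task(struct task *task, int state)
{
	task->state = state;
	task->session->completed++;	/* unsafe without back_lock */
}

/* Caller-side pattern from the patch: lock, complete, unlock. */
static void complete_task_locked(struct task *task, int state)
{
	pthread_mutex_lock(&task->session->back_lock);
	complete_task(task, state);
	pthread_mutex_unlock(&task->session->back_lock);
}

static void *worker(void *arg)
{
	struct task task = { .session = arg, .state = 0 };
	int i;

	for (i = 0; i < NITER; i++)
		complete_task_locked(&task, 1);
	return NULL;
}

/* Returns the number of completions counted, or -1 on thread setup failure. */
int run_demo(void)
{
	struct session session = { .completed = 0 };
	pthread_t threads[NTHREADS];
	int created = 0;
	int i;

	pthread_mutex_init(&session.back_lock, NULL);
	for (i = 0; i < NTHREADS; i++) {
		if (pthread_create(&threads[i], NULL, worker, &session) != 0)
			break;
		created++;
	}
	for (i = 0; i < created; i++)
		pthread_join(threads[i], NULL);
	pthread_mutex_destroy(&session.back_lock);

	return created == NTHREADS ? session.completed : -1;
}
```

Build with -pthread and call run_demo() from main(). With the lock held
around every complete_task() call the count is NTHREADS * NITER.
Without the lock/unlock pair the increments can race and the count is
no longer reliable, which is the userspace analogue of the unprotected
calls in libiscsi.c.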

------------[ cut here ]------------
WARNING: CPU: 8 PID: 155 at lib/list_debug.c:62 __list_del_entry+0xc3/0xd0()
list_del corruption. next->prev should be ffff880feaff3080, but was ffff880ff7eddac8
Modules linked in:
CPU: 8 PID: 155 Comm: kworker/u56:4 Not tainted 4.4.48-kapsi #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5.1-0-g8936dbb-20141113_115728-nilsson.home.kraxel.org 04/01/2014
Workqueue: iscsi_q_2 iscsi_xmitworker
 0000000000000286 00000000764ebb7a ffff880ff8383d20 ffffffff81589063
 ffff880ff8383d68 ffffffff820ff7b7 ffff880ff8383d58 ffffffff81090b96
 ffff880ff7eddad8 ffff880ff7edda10 ffff880ff7eddab8 ffff880ff7eddaa8
Call Trace:
 [<ffffffff81589063>] dump_stack+0x63/0x90
 [<ffffffff81090b96>] warn_slowpath_common+0x86/0xc0
 [<ffffffff81090c2c>] warn_slowpath_fmt+0x5c/0x80
 [<ffffffff815a4983>] __list_del_entry+0xc3/0xd0
 [<ffffffff817128a6>] iscsi_xmitworker+0x186/0x2d0
 [<ffffffff810a7e29>] process_one_work+0x149/0x3e0
 [<ffffffff810a8129>] worker_thread+0x69/0x470
 [<ffffffff810a80c0>] ? process_one_work+0x3e0/0x3e0
 [<ffffffff810ad72a>] kthread+0xea/0x100
 [<ffffffff810ad640>] ? kthread_create_on_node+0x1a0/0x1a0
 [<ffffffff81c3c90f>] ret_from_fork+0x3f/0x70
 [<ffffffff810ad640>] ? kthread_create_on_node+0x1a0/0x1a0
---[ end trace 3b6d957784e71ffe ]---
------------[ cut here ]------------
WARNING: CPU: 12 PID: 13505 at lib/list_debug.c:33 __list_add+0x8e/0xc0()
list_add corruption. prev->next should be next (ffff880ff7eddab8), but was ffff880feaff3080. (prev=ffff880feaff3080).
Modules linked in:
CPU: 12 PID: 13505 Comm: nfsd Tainted: G        W       4.4.48-kapsi #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5.1-0-g8936dbb-20141113_115728-nilsson.home.kraxel.org 04/01/2014
 0000000000000286 000000005c16edac ffff880fb6167000 ffffffff81589063
 ffff880fb6167048 ffffffff820ff7b7 ffff880fb6167038 ffffffff81090b96
 ffff880feaff3080 ffff880ff7eddab8 ffff880feaff3080 ffff880ff7ede800
Call Trace:
 [<ffffffff81589063>] dump_stack+0x63/0x90
 [<ffffffff81090b96>] warn_slowpath_common+0x86/0xc0
 [<ffffffff81090c2c>] warn_slowpath_fmt+0x5c/0x80
 [<ffffffff8159dc59>] ? kfifo_copy_out.isra.5+0x59/0x70
 [<ffffffff815a488e>] __list_add+0x8e/0xc0
 [<ffffffff81712658>] iscsi_queuecommand+0x3a8/0x470
 [<ffffffff816e62e6>] scsi_dispatch_cmd+0xb6/0x240
 [<ffffffff816e7a35>] scsi_queue_rq+0x5c5/0x6c0
 [<ffffffff8156ad37>] __blk_mq_run_hw_queue+0x237/0x380
 [<ffffffff8156aaf7>] blk_mq_run_hw_queue+0x77/0x80
 [<ffffffff8156c5d3>] blk_mq_insert_request+0xa3/0xc0
 ...
---[ end trace 3b6d957784e71fff ]---
------------[ cut here ]------------
WARNING: CPU: 12 PID: 13505 at lib/list_debug.c:36 __list_add+0xb3/0xc0()
list_add double add: new=ffff880feaff3080, prev=ffff880feaff3080, next=ffff880ff7eddab8.
Modules linked in:
CPU: 12 PID: 13505 Comm: nfsd Tainted: G        W       4.4.48-kapsi #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5.1-0-g8936dbb-20141113_115728-nilsson.home.kraxel.org 04/01/2014
 0000000000000286 000000005c16edac ffff880fb6167000 ffffffff81589063
 ffff880fb6167048 ffffffff820ff7b7 ffff880fb6167038 ffffffff81090b96
 ffff880feaff3080 ffff880ff7eddab8 ffff880feaff3080 ffff880ff7ede800
Call Trace:
 [<ffffffff81589063>] dump_stack+0x63/0x90
 [<ffffffff81090b96>] warn_slowpath_common+0x86/0xc0
 [<ffffffff81090c2c>] warn_slowpath_fmt+0x5c/0x80
 [<ffffffff8159dc59>] ? kfifo_copy_out.isra.5+0x59/0x70
 [<ffffffff815a48b3>] __list_add+0xb3/0xc0
 [<ffffffff81712658>] iscsi_queuecommand+0x3a8/0x470
 [<ffffffff816e62e6>] scsi_dispatch_cmd+0xb6/0x240
 [<ffffffff816e7a35>] scsi_queue_rq+0x5c5/0x6c0
 [<ffffffff8156ad37>] __blk_mq_run_hw_queue+0x237/0x380
 [<ffffffff8156aaf7>] blk_mq_run_hw_queue+0x77/0x80
 [<ffffffff8156c5d3>] blk_mq_insert_request+0xa3/0xc0
 ...
 [<ffffffff810ad640>] ? kthread_create_on_node+0x1a0/0x1a0
---[ end trace 3b6d957784e72000 ]---
...
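
To help read the list_del warning above, here is a simplified userspace
sketch of the kind of neighbour consistency check that reports
"next->prev should be X, but was Y". It is not the code in
lib/list_debug.c; checked_list_del() and list_debug_demo() are
hypothetical names, and the corruption is simulated by hand rather than
by a real race.

```c
#include <stdbool.h>
#include <stdio.h>

struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert entry right after head. */
static void list_add(struct list_head *entry, struct list_head *head)
{
	struct list_head *next = head->next;

	next->prev = entry;
	entry->next = next;
	entry->prev = head;
	head->next = entry;
}

/* Unlink entry only if both neighbours still point back at it. */
static bool checked_list_del(struct list_head *entry)
{
	struct list_head *prev = entry->prev;
	struct list_head *next = entry->next;

	if (prev->next != entry) {
		fprintf(stderr, "list_del corruption. prev->next should be %p, but was %p\n",
			(void *)entry, (void *)prev->next);
		return false;
	}
	if (next->prev != entry) {
		fprintf(stderr, "list_del corruption. next->prev should be %p, but was %p\n",
			(void *)entry, (void *)next->prev);
		return false;
	}
	next->prev = prev;
	prev->next = next;
	return true;
}

/* Returns 0 if a clean delete succeeds and a corrupted one is rejected. */
int list_debug_demo(void)
{
	struct list_head head, a, stray;

	/* Clean case: head <-> a, delete succeeds. */
	list_init(&head);
	list_add(&a, &head);
	if (!checked_list_del(&a))
		return 1;

	/* Simulated damage: another writer redirected head.prev. */
	list_init(&head);
	list_init(&stray);
	list_add(&a, &head);
	head.prev = &stray;
	if (checked_list_del(&a))
		return 2;

	return 0;
}
```

In the second case a.next is head, but head.prev no longer points at a,
so the check refuses to unlink the entry and prints a message of the
same shape as the first warning in the trace.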

diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -1747,7 +1747,9 @@ int iscsi_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *sc)
  return 0;

 prepd_reject:
+ spin_lock_bh(&session->back_lock);
  iscsi_complete_task(task, ISCSI_TASK_REQUEUE_SCSIQ);
+ spin_unlock_bh(&session->back_lock);
 reject:
  spin_unlock_bh(&session->frwd_lock);
  ISCSI_DBG_SESSION(session, "cmd 0x%x rejected (%d)\n",
@@ -1755,7 +1757,9 @@ reject:
  return SCSI_MLQUEUE_TARGET_BUSY;

 prepd_fault:
+ spin_lock_bh(&session->back_lock);
  iscsi_complete_task(task, ISCSI_TASK_REQUEUE_SCSIQ);
+ spin_unlock_bh(&session->back_lock);
 fault:
  spin_unlock_bh(&session->frwd_lock);
  ISCSI_DBG_SESSION(session, "iscsi: cmd 0x%x is not queued (%d)\n",
@@ -3068,7 +3072,9 @@ fail_mgmt_tasks(struct iscsi_session *session, struct iscsi_conn *conn)
  state = ISCSI_TASK_ABRT_SESS_RECOV;
  if (task->state == ISCSI_TASK_PENDING)
  state = ISCSI_TASK_COMPLETED;
+ spin_lock_bh(&session->back_lock);
  iscsi_complete_task(task, state);
+ spin_unlock_bh(&session->back_lock);

  }
 }
