Re: [PATCH 6/7] Fix race between starved list processing and device removal

2012-11-21 Thread Bart Van Assche
On 11/02/12 07:32, Chanho Min wrote:
>> Yes. Here's the warning.
>> For the trace below, I used scsi_device_get/scsi_device_put() in 
>> scsi_run_queue(). (A little different
>> from your patch). But I think it's the same.
> 
> I think it's correct. cancel_work_sync can sleep. It is caught under 
> CONFIG_DEBUG_ATOMIC_SLEEP.
> What if we only enable irq at cancel_work_sync as the patch below?
> 
> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> index bb7c482..6e17db9 100644
> --- a/drivers/scsi/scsi_sysfs.c
> +++ b/drivers/scsi/scsi_sysfs.c
> @@ -350,7 +350,9 @@ static void scsi_device_dev_release_usercontext(struct 
> work_struct *work)
>  list_del(&sdev->starved_entry);
>  spin_unlock_irqrestore(sdev->host->host_lock, flags);
>   
> +   local_irq_enable();
>  cancel_work_sync(&sdev->event_work);
> +   local_irq_restore(flags);
>   
>  list_for_each_safe(this, tmp, &sdev->event_list) {
>  struct scsi_event *evt;
> 

As far as I can see this should work but unfortunately this change
creates a nontrivial dependency between scsi_run_queue() and
scsi_device_dev_release_usercontext(). Personally I would prefer
something like this follow-up patch:

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 71bddec..20ea2e9 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -453,15 +453,12 @@ static void scsi_run_queue(struct request_queue *q)
}
 
get_device(&sdev->sdev_gendev);
-   spin_unlock(shost->host_lock);
-
-   spin_lock(sdev->request_queue->queue_lock);
-   __blk_run_queue(sdev->request_queue);
-   spin_unlock(sdev->request_queue->queue_lock);
+   spin_unlock_irqrestore(shost->host_lock, flags);
 
+   blk_run_queue(sdev->request_queue);
put_device(&sdev->sdev_gendev);
 
-   spin_lock(shost->host_lock);
+   spin_lock_irqsave(shost->host_lock, flags);
}
/* put any unprocessed entries back */
list_splice(&starved_list, &shost->starved_list);

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 6/7] Fix race between starved list processing and device removal

2012-11-21 Thread Bart Van Assche

On 11/02/12 11:48, Bart Van Assche wrote:

[PATCH] Fix race between starved list processing and device removal
[ ... ]
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index ce5224c..2f0f31e 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -348,7 +348,6 @@ static void scsi_device_dev_release_usercontext(struct 
work_struct *work)
starget->reap_ref++;
list_del(&sdev->siblings);
list_del(&sdev->same_target_siblings);
-   list_del(&sdev->starved_entry);
spin_unlock_irqrestore(sdev->host->host_lock, flags);

cancel_work_sync(&sdev->event_work);
@@ -956,6 +955,8 @@ int scsi_sysfs_add_sdev(struct scsi_device *sdev)
  void __scsi_remove_device(struct scsi_device *sdev)
  {
struct device *dev = &sdev->sdev_gendev;
+   struct Scsi_Host *shost = sdev->host;
+   unsigned long flags;

if (sdev->is_visible) {
if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0)
@@ -973,7 +974,13 @@ void __scsi_remove_device(struct scsi_device *sdev)
 * scsi_run_queue() invocations have finished before tearing down the
 * device.
 */
+
scsi_device_set_state(sdev, SDEV_DEL);
+
+   spin_lock_irqsave(shost->host_lock, flags);
+   list_del(&sdev->starved_entry);
+   spin_unlock_irqrestore(shost->host_lock, flags);
+
blk_cleanup_queue(sdev->request_queue);
cancel_work_sync(&sdev->requeue_work);



Please ignore this patch. Even with this patch applied there is still a 
race condition present, namely that the __blk_run_queue() call in 
scsi_run_queue() can get invoked after __scsi_remove_device() invoked 
put_device().


Bart.

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 6/7] Fix race between starved list processing and device removal

2012-11-02 Thread Bart Van Assche
On 10/30/12 06:40, Zhuang, Jin Can wrote:
> Yes. Here's the warning.
> For the trace below, I used scsi_device_get/scsi_device_put() in 
> scsi_run_queue(). (A little different from your patch). But I think it's the 
> same.
> 
> 10-23 18:15:53.309 8 8 I KERNEL  : [  268.994556] BUG: sleeping 
> function called from invalid context at linux-2.6/kernel/workqueue.c:2500
> 10-23 18:15:53.309 8 8 I KERNEL  : [  269.006898] in_atomic(): 0, 
> irqs_disabled(): 1, pid: 8, name: kworker/0:1
> 10-23 18:15:53.309 8 8 I KERNEL  : [  269.013689] Pid: 8, comm: 
> kworker/0:1 Tainted: GWC  3.0.34-140359-g85a6d67-dirty #43
> 10-23 18:15:53.309 8 8 I KERNEL  : [  269.022113] Call Trace:
> 10-23 18:15:53.309 8 8 I KERNEL  : [  269.028828]  [] 
> __might_sleep+0x10a/0x110
> 10-23 18:15:53.309 8 8 I KERNEL  : [  269.033695]  [] 
> wait_on_work+0x23/0x1a0
> 10-23 18:15:53.309 8 8 I KERNEL  : [  269.054913]  [] 
> __cancel_work_timer+0x6a/0x110
> 10-23 18:15:53.309 8 8 I KERNEL  : [  269.060217]  [] 
> cancel_work_sync+0xf/0x20
> 10-23 18:15:53.309 8 8 I KERNEL  : [  269.065087]  [] 
> scsi_device_dev_release_usercontext+0x6d/0x100
> 10-23 18:15:53.309 8 8 I KERNEL  : [  269.071785]  [] 
> execute_in_process_context+0x42/0x50
> 10-23 18:15:53.309 8 8 I KERNEL  : [  269.077609]  [] 
> scsi_device_dev_release+0x18/0x20
> 10-23 18:15:53.309 8 8 I KERNEL  : [  269.083174]  [] 
> device_release+0x20/0x80
> 10-23 18:15:53.309 8 8 I KERNEL  : [  269.092479]  [] 
> kobject_release+0x84/0x1f0
> 10-23 18:15:53.309 8 8 I KERNEL  : [  269.107430]  [] 
> kref_put+0x2c/0x60
> 10-23 18:15:53.309 8 8 I KERNEL  : [  269.111688]  [] 
> kobject_put+0x1d/0x50
> 10-23 18:15:53.309 8 8 I KERNEL  : [  269.116209]  [] 
> put_device+0x14/0x20
> 10-23 18:15:53.309 8 8 I KERNEL  : [  269.120646]  [] 
> scsi_device_put+0x37/0x60
> 10-23 18:15:53.309 8 8 I KERNEL  : [  269.125515]  [] 
> scsi_run_queue+0x247/0x320
> 10-23 18:15:53.309 8 8 I KERNEL  : [  269.130470]  [] 
> scsi_requeue_run_queue+0x13/0x20
> 10-23 18:15:53.309 8 8 I KERNEL  : [  269.135941]  [] 
> process_one_work+0xfe/0x3f0
> 10-23 18:15:53.309 8 8 I KERNEL  : [  269.146384]  [] 
> worker_thread+0x121/0x2f0
> 10-23 18:15:53.309 8 8 I KERNEL  : [  269.156383]  [] 
> kthread+0x6d/0x80
> 10-23 18:15:53.309 8 8 I KERNEL  : [  269.166124]  [] 
> kernel_thread_helper+0x6/0x10

Thanks for the feedback. Something that kept me busy since I posted
the patch at the start of this thread is how to avoid adding two
atomic operations in a hot path (the get_device() and put_device()
calls in scsi_run_queue()). The patch below should realize that.
However, since I haven't been able so far to trigger the above call
trace that means that the test I ran wasn't sufficient to trigger
all code paths. So it would be appreciated if anyone could help
testing the patch below.

[PATCH] Fix race between starved list processing and device removal

---
 block/blk-core.c  |9 +
 drivers/scsi/scsi_lib.c   |   20 ++--
 drivers/scsi/scsi_sysfs.c |9 -
 3 files changed, 27 insertions(+), 11 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index e4f4e06..565484f 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -407,10 +407,11 @@ static void __blk_drain_queue(struct request_queue *q, 
bool drain_all)
 
/*
 * This function might be called on a queue which failed
-* driver init after queue creation or is not yet fully
-* active yet.  Some drivers (e.g. fd and loop) get unhappy
-* in such cases.  Kick queue iff dispatch queue has
-* something on it and @q has request_fn set.
+* driver init after queue creation, is not yet fully active
+* or is being cleaned up and doesn't make progress anymore
+* (e.g. a SCSI device in state SDEV_DEL). Kick queue iff
+* dispatch queue has something on it and @q has request_fn
+* set.
 */
if (!list_empty(&q->queue_head) && q->request_fn)
__blk_run_queue(q);
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 488035b..1763181 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -447,8 +447,9 @@ static void scsi_run_queue(struct request_queue *q)
  struct scsi_device, starved_entry);
list_del_init(&sdev->starved_entry);
if (scsi_target_is_busy(scsi_target(sdev))) {
-   list_move_tail(&sdev->starved_entry,
-  &shost->starved_list);
+   if (sdev->sdev_state != SDEV_DEL)
+   list_add_tail(&sdev->starved_entry,
+   

RE: [PATCH 6/7] Fix race between starved list processing and device removal

2012-10-29 Thread Zhuang, Jin Can
Hi Bart,

Yes. Here's the warning. 
For the trace below, I used scsi_device_get/scsi_device_put() in 
scsi_run_queue(). (A little different from your patch). But I think it's the 
same.

10-23 18:15:53.309 8 8 I KERNEL  : [  268.994556] BUG: sleeping 
function called from invalid context at linux-2.6/kernel/workqueue.c:2500
10-23 18:15:53.309 8 8 I KERNEL  : [  269.006898] in_atomic(): 0, 
irqs_disabled(): 1, pid: 8, name: kworker/0:1
10-23 18:15:53.309 8 8 I KERNEL  : [  269.013689] Pid: 8, comm: 
kworker/0:1 Tainted: GWC  3.0.34-140359-g85a6d67-dirty #43
10-23 18:15:53.309 8 8 I KERNEL  : [  269.022113] Call Trace:
10-23 18:15:53.309 8 8 I KERNEL  : [  269.024567]  [] ? 
printk+0x1d/0x1f
10-23 18:15:53.309 8 8 I KERNEL  : [  269.028828]  [] 
__might_sleep+0x10a/0x110
10-23 18:15:53.309 8 8 I KERNEL  : [  269.033695]  [] 
wait_on_work+0x23/0x1a0
10-23 18:15:53.309 8 8 I KERNEL  : [  269.038390]  [] ? 
_raw_spin_unlock_irqrestore+0x26/0x50
10-23 18:15:53.309 8 8 I KERNEL  : [  269.044476]  [] ? 
__pm_runtime_idle+0x66/0xf0
10-23 18:15:53.309 8 8 I KERNEL  : [  269.049706]  [] ? 
ram_console_write+0x4e/0xa0
10-23 18:15:53.309 8 8 I KERNEL  : [  269.054913]  [] 
__cancel_work_timer+0x6a/0x110
10-23 18:15:53.309 8 8 I KERNEL  : [  269.060217]  [] 
cancel_work_sync+0xf/0x20
10-23 18:15:53.309 8 8 I KERNEL  : [  269.065087]  [] 
scsi_device_dev_release_usercontext+0x6d/0x100
10-23 18:15:53.309 8 8 I KERNEL  : [  269.071785]  [] 
execute_in_process_context+0x42/0x50
10-23 18:15:53.309 8 8 I KERNEL  : [  269.077609]  [] 
scsi_device_dev_release+0x18/0x20
10-23 18:15:53.309 8 8 I KERNEL  : [  269.083174]  [] 
device_release+0x20/0x80
10-23 18:15:53.309 8 8 I KERNEL  : [  269.087958]  [] ? 
vprintk+0x2be/0x4e0
10-23 18:15:53.309 8 8 I KERNEL  : [  269.092479]  [] 
kobject_release+0x84/0x1f0
10-23 18:15:53.309 8 8 I KERNEL  : [  269.097439]  [] ? 
_raw_spin_lock_irq+0x22/0x30
10-23 18:15:53.309 8 8 I KERNEL  : [  269.102732]  [] ? 
kobject_del+0x70/0x70
10-23 18:15:53.309 8 8 I KERNEL  : [  269.107430]  [] 
kref_put+0x2c/0x60
10-23 18:15:53.309 8 8 I KERNEL  : [  269.111688]  [] 
kobject_put+0x1d/0x50
10-23 18:15:53.309 8 8 I KERNEL  : [  269.116209]  [] 
put_device+0x14/0x20
10-23 18:15:53.309 8 8 I KERNEL  : [  269.120646]  [] 
scsi_device_put+0x37/0x60
10-23 18:15:53.309 8 8 I KERNEL  : [  269.125515]  [] 
scsi_run_queue+0x247/0x320
10-23 18:15:53.309 8 8 I KERNEL  : [  269.130470]  [] 
scsi_requeue_run_queue+0x13/0x20
10-23 18:15:53.309 8 8 I KERNEL  : [  269.135941]  [] 
process_one_work+0xfe/0x3f0
10-23 18:15:53.309 8 8 I KERNEL  : [  269.140997]  [] ? 
scsi_softirq_done+0x120/0x120
10-23 18:15:53.309 8 8 I KERNEL  : [  269.146384]  [] 
worker_thread+0x121/0x2f0
10-23 18:15:53.309 8 8 I KERNEL  : [  269.151254]  [] ? 
rescuer_thread+0x1e0/0x1e0
10-23 18:15:53.309 8 8 I KERNEL  : [  269.156383]  [] 
kthread+0x6d/0x80
10-23 18:15:53.309 8 8 I KERNEL  : [  269.160558]  [] ? 
__init_kthread_worker+0x30/0x30
10-23 18:15:53.309 8 8 I KERNEL  : [  269.166124]  [] 
kernel_thread_helper+0x6/0x10

-Jincan

-Original Message-
From: linux-scsi-ow...@vger.kernel.org 
[mailto:linux-scsi-ow...@vger.kernel.org] On Behalf Of Bart Van Assche
Sent: Monday, October 29, 2012 10:32 PM
To: Zhuang, Jin Can
Cc: linux-scsi; James Bottomley; Mike Christie; Jens Axboe; Tejun Heo; Chanho 
Min
Subject: Re: [PATCH 6/7] Fix race between starved list processing and device 
removal

On 10/28/12 19:01, Zhuang, Jin Can wrote:
> I recently ran into the same issue
> The test I did is plug/unplug u-disk in an interval of 1 second. And
> I found when sdev1 is being removed, scsi_run_queue is triggered by
> sdev2, which then accesses all the starving scsi device including sdev1.
>
> I have adopted the solution below which works fine for me so far.
> But there's one thing to fix in the patch below. When it put_device
> in scsi_run_queue, irq is disabled. As put_device may get into sleep,
> irq should be enabled before it's called.

Hello Jincan,

Thanks for testing and the feedback. However, are you sure that
put_device() for a SCSI device may sleep ? Have you noticed the
execute_in_process_context() call in scsi_device_dev_release() ?

Bart.

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the 
body of a message to majord...@vger.kernel.org More majordomo info at  
http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 6/7] Fix race between starved list processing and device removal

2012-10-29 Thread Bart Van Assche

On 10/28/12 19:01, Zhuang, Jin Can wrote:

I recently ran into the same issue
The test I did is plug/unplug u-disk in an interval of 1 second. And

> I found when sdev1 is being removed, scsi_run_queue is triggered by
> sdev2, which then accesses all the starving scsi device including sdev1.


I have adopted the solution below which works fine for me so far.
But there's one thing to fix in the patch below. When it put_device

> in scsi_run_queue, irq is disabled. As put_device may get into sleep,
> irq should be enabled before it's called.

Hello Jincan,

Thanks for testing and the feedback. However, are you sure that 
put_device() for a SCSI device may sleep ? Have you noticed the 
execute_in_process_context() call in scsi_device_dev_release() ?


Bart.

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 6/7] Fix race between starved list processing and device removal

2012-10-28 Thread Tejun Heo
Hello, Bart.

On Fri, Oct 26, 2012 at 02:05:01PM +0200, Bart Van Assche wrote:
> The SCSI core maintains a "starved list" per SCSI host. This is a
> list of devices for which one or more requests have been queued
> but that have not yet been passed to the SCSI LLD. The function
> scsi_run_queue() examines all SCSI devices on the starved list.

New paragraph.

> Since scsi_remove_device() can be invoked concurrently with
> scsi_run_queue() it is important to avoid that a SCSI device is
> accessed by that function after it has been freed.

New paragraph.

> Avoid that the
> sdev reference count can drop to zero before the queue is run by
> scsi_run_queue() by inserting a get_device() / put_device() pair
> in that function. Move the code for removing a device from the
> starved list from scsi_device_dev_release_usercontext() to
> __scsi_remove_device() such that it is guaranteed that the newly
> added get_device() call succeeds.
>
> Reported-and-tested-by: Chanho Min 
> Reference: http://lkml.org/lkml/2012/8/2/96
> Cc: Jens Axboe 
> Cc: Tejun Heo 
> Reviewed-by: Mike Christie 
> Signed-off-by: Bart Van Assche 

Heh, for some reason, the commit message is a hard read for me but I
think it should do.

 Acked-by: Tejun Heo 

Thanks!

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 6/7] Fix race between starved list processing and device removal

2012-10-28 Thread Zhuang, Jin Can
I recently ran into the same issue
The test I did is plug/unplug u-disk in an interval of 1 second. And I found 
when sdev1 is being removed, scsi_run_queue is triggered by sdev2, which then 
accesses all the starving scsi device including sdev1.

I have adopted the solution below which works fine for me so far.
But there's one thing to fix in the patch below. When it put_device in 
scsi_run_queue, irq is disabled. As put_device may get into sleep, irq should 
be enabled before it's called.
So I change it to:
spin_unlock_irq(sdev->request_queue->queue_lock);

put_device(&sdev->sdev_gendev);

spin_lock_irq(shost->host_lock);

-Jincan

-Original Message-
From: linux-scsi-ow...@vger.kernel.org 
[mailto:linux-scsi-ow...@vger.kernel.org] On Behalf Of Bart Van Assche
Sent: Friday, October 26, 2012 8:05 PM
To: linux-scsi
Cc: James Bottomley; Mike Christie; Jens Axboe; Tejun Heo; Chanho Min
Subject: [PATCH 6/7] Fix race between starved list processing and device removal

The SCSI core maintains a "starved list" per SCSI host. This is a list of 
devices for which one or more requests have been queued but that have not yet 
been passed to the SCSI LLD. The function
scsi_run_queue() examines all SCSI devices on the starved list.
Since scsi_remove_device() can be invoked concurrently with
scsi_run_queue() it is important to avoid that a SCSI device is accessed by 
that function after it has been freed. Avoid that the sdev reference count can 
drop to zero before the queue is run by
scsi_run_queue() by inserting a get_device() / put_device() pair in that 
function. Move the code for removing a device from the starved list from 
scsi_device_dev_release_usercontext() to
__scsi_remove_device() such that it is guaranteed that the newly added 
get_device() call succeeds.

Reported-and-tested-by: Chanho Min 
Reference: http://lkml.org/lkml/2012/8/2/96
Cc: Jens Axboe 
Cc: Tejun Heo 
Reviewed-by: Mike Christie 
Signed-off-by: Bart Van Assche 
---
 drivers/scsi/scsi_lib.c   |5 +
 drivers/scsi/scsi_sysfs.c |7 ++-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index 
f29a1a9..c5d4ec2 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -452,10 +452,15 @@ static void scsi_run_queue(struct request_queue *q)
continue;
}
 
+   get_device(&sdev->sdev_gendev);
spin_unlock(shost->host_lock);
+
spin_lock(sdev->request_queue->queue_lock);
__blk_run_queue(sdev->request_queue);
spin_unlock(sdev->request_queue->queue_lock);
+
+   put_device(&sdev->sdev_gendev);
+
spin_lock(shost->host_lock);
}
/* put any unprocessed entries back */ diff --git 
a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c index ce5224c..2661a957 
100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -348,7 +348,6 @@ static void scsi_device_dev_release_usercontext(struct 
work_struct *work)
starget->reap_ref++;
list_del(&sdev->siblings);
list_del(&sdev->same_target_siblings);
-   list_del(&sdev->starved_entry);
spin_unlock_irqrestore(sdev->host->host_lock, flags);
 
cancel_work_sync(&sdev->event_work);
@@ -956,6 +955,8 @@ int scsi_sysfs_add_sdev(struct scsi_device *sdev)  void 
__scsi_remove_device(struct scsi_device *sdev)  {
struct device *dev = &sdev->sdev_gendev;
+   struct Scsi_Host *shost = sdev->host;
+   unsigned long flags;
 
if (sdev->is_visible) {
if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0) @@ -977,6 
+978,10 @@ void __scsi_remove_device(struct scsi_device *sdev)
blk_cleanup_queue(sdev->request_queue);
cancel_work_sync(&sdev->requeue_work);
 
+   spin_lock_irqsave(shost->host_lock, flags);
+   list_del(&sdev->starved_entry);
+   spin_unlock_irqrestore(shost->host_lock, flags);
+
if (sdev->host->hostt->slave_destroy)
sdev->host->hostt->slave_destroy(sdev);
transport_destroy_device(dev);
--
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the 
body of a message to majord...@vger.kernel.org More majordomo info at  
http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 6/7] Fix race between starved list processing and device removal

2012-10-26 Thread Bart Van Assche
The SCSI core maintains a "starved list" per SCSI host. This is a
list of devices for which one or more requests have been queued
but that have not yet been passed to the SCSI LLD. The function
scsi_run_queue() examines all SCSI devices on the starved list.
Since scsi_remove_device() can be invoked concurrently with
scsi_run_queue() it is important to avoid that a SCSI device is
accessed by that function after it has been freed. Avoid that the
sdev reference count can drop to zero before the queue is run by
scsi_run_queue() by inserting a get_device() / put_device() pair
in that function. Move the code for removing a device from the
starved list from scsi_device_dev_release_usercontext() to
__scsi_remove_device() such that it is guaranteed that the newly
added get_device() call succeeds.

Reported-and-tested-by: Chanho Min 
Reference: http://lkml.org/lkml/2012/8/2/96
Cc: Jens Axboe 
Cc: Tejun Heo 
Reviewed-by: Mike Christie 
Signed-off-by: Bart Van Assche 
---
 drivers/scsi/scsi_lib.c   |5 +
 drivers/scsi/scsi_sysfs.c |7 ++-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index f29a1a9..c5d4ec2 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -452,10 +452,15 @@ static void scsi_run_queue(struct request_queue *q)
continue;
}
 
+   get_device(&sdev->sdev_gendev);
spin_unlock(shost->host_lock);
+
spin_lock(sdev->request_queue->queue_lock);
__blk_run_queue(sdev->request_queue);
spin_unlock(sdev->request_queue->queue_lock);
+
+   put_device(&sdev->sdev_gendev);
+
spin_lock(shost->host_lock);
}
/* put any unprocessed entries back */
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index ce5224c..2661a957 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -348,7 +348,6 @@ static void scsi_device_dev_release_usercontext(struct 
work_struct *work)
starget->reap_ref++;
list_del(&sdev->siblings);
list_del(&sdev->same_target_siblings);
-   list_del(&sdev->starved_entry);
spin_unlock_irqrestore(sdev->host->host_lock, flags);
 
cancel_work_sync(&sdev->event_work);
@@ -956,6 +955,8 @@ int scsi_sysfs_add_sdev(struct scsi_device *sdev)
 void __scsi_remove_device(struct scsi_device *sdev)
 {
struct device *dev = &sdev->sdev_gendev;
+   struct Scsi_Host *shost = sdev->host;
+   unsigned long flags;
 
if (sdev->is_visible) {
if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0)
@@ -977,6 +978,10 @@ void __scsi_remove_device(struct scsi_device *sdev)
blk_cleanup_queue(sdev->request_queue);
cancel_work_sync(&sdev->requeue_work);
 
+   spin_lock_irqsave(shost->host_lock, flags);
+   list_del(&sdev->starved_entry);
+   spin_unlock_irqrestore(shost->host_lock, flags);
+
if (sdev->host->hostt->slave_destroy)
sdev->host->hostt->slave_destroy(sdev);
transport_destroy_device(dev);
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html