Re: [PATCH] scsi_sysfs: protect against double execution of __scsi_remove_device()

2015-10-23 Thread Vitaly Kuznetsov
Bart Van Assche writes:

> On 10/22/2015 10:12 AM, Vitaly Kuznetsov wrote:
>> On some host errors storvsc module tries to remove sdev by scheduling a job
>> which does the following:
>>
>>     sdev = scsi_device_lookup(wrk->host, 0, 0, wrk->lun);
>>     if (sdev) {
>>             scsi_remove_device(sdev);
>>             scsi_device_put(sdev);
>>     }
>>
>> While this code seems correct the following crash is observed:
>>
>>   general protection fault:  [#1] SMP DEBUG_PAGEALLOC
>>   RIP: 0010:[]  [] bdi_destroy+0x39/0x220
>>   ...
>>   [] ? _raw_spin_unlock_irq+0x2c/0x40
>>   [] blk_cleanup_queue+0x17b/0x270
>>   [] __scsi_remove_device+0x54/0xd0 [scsi_mod]
>>   [] scsi_remove_device+0x2b/0x40 [scsi_mod]
>>   [] storvsc_remove_lun+0x3d/0x60 [hv_storvsc]
>>   [] process_one_work+0x1b1/0x530
>>   ...
>>
>> The problem comes with the fact that many such jobs (for the same device)
>> are being scheduled simultaneously. While scsi_remove_device() uses
>> shost->scan_mutex and scsi_device_lookup() will fail for a device in
>> SDEV_DEL state there is no protection against someone who did
>> scsi_device_lookup() before we actually entered __scsi_remove_device(). So
>> the whole scenario looks like that: two callers do simultaneous (or
>> preemption happens) calls to scsi_device_lookup() and these calls succeed
>> for all of them, after that both callers try doing scsi_remove_device().
>> shost->scan_mutex only serializes their calls to __scsi_remove_device()
>> and we end up doing the cleanup path twice.
>>
>> Signed-off-by: Vitaly Kuznetsov 
>> ---
>>   drivers/scsi/scsi_sysfs.c | 8 
>>   1 file changed, 8 insertions(+)
>>
>> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
>> index b89..e0d2707 100644
>> --- a/drivers/scsi/scsi_sysfs.c
>> +++ b/drivers/scsi/scsi_sysfs.c
>> @@ -1076,6 +1076,14 @@ void __scsi_remove_device(struct scsi_device *sdev)
>>   {
>>  	struct device *dev = &sdev->sdev_gendev;
>>
>> +	/*
>> +	 * This cleanup path is not reentrant and while it is impossible
>> +	 * to get a new reference with scsi_device_get() someone can still
>> +	 * hold a previously acquired one.
>> +	 */
>> +	if (sdev->sdev_state == SDEV_DEL)
>> +		return;
>> +
>>  	if (sdev->is_visible) {
>>  		if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0)
>>  			return;
>
> Hello Vitaly,
>
> Sorry but I don't see how the above patch could be a proper fix. If
> two calls to __scsi_remove_device() occur concurrently the crash
> explained above can still occur. The storvsc driver should be modified
> such that concurrent __scsi_remove_device() calls do not occur. How
> about preventing concurrent calls via a mutex?

Nobody is supposed to call __scsi_remove_device() without holding
shost->scan_mutex, and scsi_remove_device() takes it. Here I'm trying to
protect against two *consecutive* calls to __scsi_remove_device(). Since
we set sdev_state to SDEV_DEL on the cleanup path, checking it should be
enough.
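
The argument above can be illustrated with a small user-space model. This
is only a sketch under stated assumptions: model_sdev, model_remove() and
the other model_* names are hypothetical stand-ins, not kernel API, and a
plain flag takes the place of shost->scan_mutex because the two calls are
already serialized. It shows that with the SDEV_DEL check the second of two
consecutive removals becomes a no-op.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names are kernel API. */
enum model_state { MODEL_RUNNING, MODEL_CANCEL, MODEL_DEL };

struct model_sdev {
	bool scan_mutex_held;   /* stands in for shost->scan_mutex */
	enum model_state state; /* stands in for sdev->sdev_state */
	int cleanups;           /* how many times the teardown body ran */
};

/* Models __scsi_remove_device(): the caller must hold the mutex. */
static void model_remove_locked(struct model_sdev *sdev)
{
	assert(sdev->scan_mutex_held);
	if (sdev->state == MODEL_DEL)   /* the check the patch adds */
		return;
	sdev->state = MODEL_CANCEL;
	sdev->cleanups++;               /* one-time teardown work */
	sdev->state = MODEL_DEL;
}

/* Models scsi_remove_device(): take the mutex, remove, release. */
static void model_remove(struct model_sdev *sdev)
{
	sdev->scan_mutex_held = true;
	model_remove_locked(sdev);
	sdev->scan_mutex_held = false;
}

/* Two holders of an old reference call remove one after the other. */
static int model_two_consecutive_removals(void)
{
	struct model_sdev sdev = { false, MODEL_RUNNING, 0 };

	model_remove(&sdev);
	model_remove(&sdev);
	return sdev.cleanups;
}
```

In this model, dropping the MODEL_DEL check makes the teardown body run
twice, which corresponds to the double cleanup described in the patch.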

> Another possible
> approach is to use the workqueue mechanism. An example can be found in
> the SRP initiator driver (ib_srp).

Yes, but I think the existing approach is good enough:
1) Every caller is supposed to get a reference to the device with
scsi_device_get() (scsi_device_lookup() does that).
2) shost->scan_mutex is supposed to be held by all __scsi_remove_device()
callers.
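
The window that rule 1) leaves open can be sketched in user space. The
ref_* names below are hypothetical and only model the behaviour described
in this thread: a lookup takes a reference and fails once the device is in
the deleted state, but references taken earlier stay valid.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model; not kernel API. */
enum ref_state { REF_RUNNING, REF_DEL };

struct ref_sdev {
	enum ref_state state;   /* stands in for sdev->sdev_state */
	int refs;               /* stands in for the device reference count */
};

/* Models scsi_device_lookup(): refuses a device that is already deleted. */
static struct ref_sdev *ref_lookup(struct ref_sdev *sdev)
{
	if (sdev->state == REF_DEL)
		return NULL;
	sdev->refs++;
	return sdev;
}

/* Models scsi_device_put(). */
static void ref_put(struct ref_sdev *sdev)
{
	sdev->refs--;
}

/* Returns how many of the two early lookups succeeded. */
static int ref_window_demo(void)
{
	struct ref_sdev dev = { REF_RUNNING, 0 };
	struct ref_sdev *a = ref_lookup(&dev);  /* worker A looks up */
	struct ref_sdev *b = ref_lookup(&dev);  /* worker B looks up before A removes */
	int early = (a != NULL) + (b != NULL);

	dev.state = REF_DEL;                    /* worker A finishes the removal */
	assert(ref_lookup(&dev) == NULL);       /* late lookups now fail */
	/* B still holds its earlier reference and will go on to call remove. */
	ref_put(a);
	ref_put(b);
	return early;
}
```

Both early lookups succeed here, so both workers reach the removal path
with a valid reference; only lookups made after the state change are
rejected.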

-- 
  Vitaly
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi_sysfs: protect against double execution of __scsi_remove_device()

2015-10-22 Thread Bart Van Assche

On 10/22/2015 10:12 AM, Vitaly Kuznetsov wrote:

> On some host errors storvsc module tries to remove sdev by scheduling a job
> which does the following:
>
>     sdev = scsi_device_lookup(wrk->host, 0, 0, wrk->lun);
>     if (sdev) {
>             scsi_remove_device(sdev);
>             scsi_device_put(sdev);
>     }
>
> While this code seems correct the following crash is observed:
>
>   general protection fault:  [#1] SMP DEBUG_PAGEALLOC
>   RIP: 0010:[]  [] bdi_destroy+0x39/0x220
>   ...
>   [] ? _raw_spin_unlock_irq+0x2c/0x40
>   [] blk_cleanup_queue+0x17b/0x270
>   [] __scsi_remove_device+0x54/0xd0 [scsi_mod]
>   [] scsi_remove_device+0x2b/0x40 [scsi_mod]
>   [] storvsc_remove_lun+0x3d/0x60 [hv_storvsc]
>   [] process_one_work+0x1b1/0x530
>   ...
>
> The problem comes with the fact that many such jobs (for the same device)
> are being scheduled simultaneously. While scsi_remove_device() uses
> shost->scan_mutex and scsi_device_lookup() will fail for a device in
> SDEV_DEL state there is no protection against someone who did
> scsi_device_lookup() before we actually entered __scsi_remove_device(). So
> the whole scenario looks like that: two callers do simultaneous (or
> preemption happens) calls to scsi_device_lookup() and these calls succeed
> for all of them, after that both callers try doing scsi_remove_device().
> shost->scan_mutex only serializes their calls to __scsi_remove_device()
> and we end up doing the cleanup path twice.
>
> Signed-off-by: Vitaly Kuznetsov 
> ---
>   drivers/scsi/scsi_sysfs.c | 8
>   1 file changed, 8 insertions(+)
>
> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> index b89..e0d2707 100644
> --- a/drivers/scsi/scsi_sysfs.c
> +++ b/drivers/scsi/scsi_sysfs.c
> @@ -1076,6 +1076,14 @@ void __scsi_remove_device(struct scsi_device *sdev)
>   {
>  	struct device *dev = &sdev->sdev_gendev;
>
> +	/*
> +	 * This cleanup path is not reentrant and while it is impossible
> +	 * to get a new reference with scsi_device_get() someone can still
> +	 * hold a previously acquired one.
> +	 */
> +	if (sdev->sdev_state == SDEV_DEL)
> +		return;
> +
>  	if (sdev->is_visible) {
>  		if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0)
>  			return;


Hello Vitaly,

Sorry but I don't see how the above patch could be a proper fix. If two
calls to __scsi_remove_device() occur concurrently the crash explained
above can still occur. The storvsc driver should be modified such that
concurrent __scsi_remove_device() calls do not occur. How about
preventing concurrent calls via a mutex? Another possible approach is
to use the workqueue mechanism. An example can be found in the SRP
initiator driver (ib_srp).


Bart.
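
One way to picture the driver-side alternative suggested above is to make
sure only one removal job per LUN is ever queued. The following user-space
sketch uses a C11 atomic flag for that. The lun_tracker structure and the
lun_* helpers are hypothetical; this is not how storvsc or ib_srp
implement it, only an illustration of the idea under that assumption.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-LUN bookkeeping; not part of storvsc or the SCSI core. */
struct lun_tracker {
	atomic_flag removal_queued;
};

/* Returns true only for the first caller, which should queue the job. */
static bool lun_should_queue_removal(struct lun_tracker *t)
{
	return !atomic_flag_test_and_set(&t->removal_queued);
}

/* Counts how many removal jobs would be queued for a burst of errors. */
static int lun_count_queued(int error_events)
{
	struct lun_tracker t = { ATOMIC_FLAG_INIT };
	int queued = 0;

	for (int i = 0; i < error_events; i++)
		if (lun_should_queue_removal(&t))
			queued++;
	return queued;
}
```

With this kind of gate, a burst of host errors for the same LUN results in
a single removal job, so the removal path is entered only once for that
device.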


[PATCH] scsi_sysfs: protect against double execution of __scsi_remove_device()

2015-10-22 Thread Vitaly Kuznetsov
On some host errors storvsc module tries to remove sdev by scheduling a job
which does the following:

   sdev = scsi_device_lookup(wrk->host, 0, 0, wrk->lun);
   if (sdev) {
           scsi_remove_device(sdev);
           scsi_device_put(sdev);
   }

While this code seems correct the following crash is observed:

 general protection fault:  [#1] SMP DEBUG_PAGEALLOC
 RIP: 0010:[]  [] bdi_destroy+0x39/0x220
 ...
 [] ? _raw_spin_unlock_irq+0x2c/0x40
 [] blk_cleanup_queue+0x17b/0x270
 [] __scsi_remove_device+0x54/0xd0 [scsi_mod]
 [] scsi_remove_device+0x2b/0x40 [scsi_mod]
 [] storvsc_remove_lun+0x3d/0x60 [hv_storvsc]
 [] process_one_work+0x1b1/0x530
 ...

The problem comes with the fact that many such jobs (for the same device)
are being scheduled simultaneously. While scsi_remove_device() uses
shost->scan_mutex and scsi_device_lookup() will fail for a device in
SDEV_DEL state there is no protection against someone who did
scsi_device_lookup() before we actually entered __scsi_remove_device(). So
the whole scenario looks like that: two callers do simultaneous (or
preemption happens) calls to scsi_device_lookup() and these calls succeed
for all of them, after that both callers try doing scsi_remove_device().
shost->scan_mutex only serializes their calls to __scsi_remove_device()
and we end up doing the cleanup path twice.

Signed-off-by: Vitaly Kuznetsov 
---
 drivers/scsi/scsi_sysfs.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index b89..e0d2707 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -1076,6 +1076,14 @@ void __scsi_remove_device(struct scsi_device *sdev)
 {
 	struct device *dev = &sdev->sdev_gendev;
 
+	/*
+	 * This cleanup path is not reentrant and while it is impossible
+	 * to get a new reference with scsi_device_get() someone can still
+	 * hold a previously acquired one.
+	 */
+	if (sdev->sdev_state == SDEV_DEL)
+		return;
+
 	if (sdev->is_visible) {
 		if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0)
 			return;
-- 
2.4.3
