Re: [RFC 2/2] dual scan thread bug fix

2014-01-08 Thread Alan Stern
On Wed, 8 Jan 2014, James Bottomley wrote:

 OK, Agreed, but that means modifying the 1/2 patch with the below.  This
 should make the proposed diff to 2/2 correct.
 
 James
 
 ---
 diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
 index ef3f958..5fad646 100644
 --- a/drivers/scsi/scsi_scan.c
 +++ b/drivers/scsi/scsi_scan.c
 @@ -417,7 +417,7 @@ static struct scsi_target *scsi_alloc_target(struct 
 device *parent,
   + shost-transportt-target_size;
   struct scsi_target *starget;
   struct scsi_target *found_target;
 - int error;
 + int error, ref_got;
  
   starget = kzalloc(size, GFP_KERNEL);
   if (!starget) {
 @@ -466,15 +466,15 @@ static struct scsi_target *scsi_alloc_target(struct 
 device *parent,
   return starget;
  
   found:
 - if (!kref_get_unless_zero(found_target-reap_ref))
 - /*
 -  * release routine already fired.  Target is dead, but
 -  * STARGET_DEL may not yet be set (set in the release
 -  * routine), so set here as well, just in case
 -  */
 - found_target-state = STARGET_DEL;
 + /*
 +  * release routine already fired if kref is zero, so if we can still
 +  * take the reference, the target must be alive.  If we can't, it must
 +  * be dying and we need to wait for a new target
 +  */
 + ref_got = kref_get_unless_zero(found_target-reap_ref);
 +
   spin_unlock_irqrestore(shost-host_lock, flags);
 - if (found_target-state != STARGET_DEL) {
 + if (ref_got) {
   put_device(dev);
   return found_target;
   }
 @@ -482,8 +482,8 @@ static struct scsi_target *scsi_alloc_target(struct 
 device *parent,
* Unfortunately, we found a dying target; need to wait until it's
* dead before we can get a new one.  There is an anomaly here.  We
* *should* call scsi_target_reap() to balance the kref_get() of the
 -  * reap_ref above.  However, since the target is in state STARGET_DEL,
 -  * it's already invisible and the reap_ref is irrelevant.  If we call
 +  * reap_ref above.  However, since the target being released, it's
 +  * already invisible and the reap_ref is irrelevant.  If we call
* scsi_target_reap() we might spuriously do another device_del() on
* an already invisible target.
*/

In fact, most of this comment (everything after the first sentence) is
no longer needed.  If the target is dying then kref_get_unless_zero
must have failed, so there is no need to worry about unbalanced
refcounts.

Otherwise this looks good.

Alan Stern

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 2/2] dual scan thread bug fix

2014-01-07 Thread James Bottomley
On Sat, 2014-01-04 at 11:27 -0500, Alan Stern wrote:
 On Fri, 3 Jan 2014, James Bottomley wrote:
 
   I'm still concerned about one thing.  The previous patch does this in
   scsi_alloc_target():
   
  found:
-   found_target-reap_ref++;
+   if (!kref_get_unless_zero(found_target-reap_ref))
+   /*
+* release routine already fired.  Target is dead, but
+* STARGET_DEL may not yet be set (set in the release
+* routine), so set here as well, just in case
+*/
+   found_target-state = STARGET_DEL;
spin_unlock_irqrestore(shost-host_lock, flags);
   
   As a result, the two comments in this patch aren't right:
   
@@ -384,9 +385,15 @@ static void scsi_target_reap_ref_release(struct 
kref *kref)
struct scsi_target *starget
= container_of(kref, struct scsi_target, reap_ref);
 
-   transport_remove_device(starget-dev);
-   device_del(starget-dev);
-   starget-state = STARGET_DEL;
+   /*
+* if we get here and the target is still in the CREATED state 
that
+* means it was allocated but never made visible (because a scan
+* turned up no LUNs), so don't call device_del() on it.
+*/
+   if (starget-state == STARGET_RUNNING) {
+   transport_remove_device(starget-dev);
+   device_del(starget-dev);
+   }
   
   Here the state could already be STARGET_DEL, even though the target is
   still visible.
  
  Well, I agree with the theory.  In practise, there are only a few
  machine instructions between the kref going to zero and us reaching that
  point, because kref_release will jump into this routine next, so the
  condition would be very hard to see.
 
 It's true that the window is very small and not at all likely to be
 hit.  Still, I prefer eliminating such things entirely.
 
   However, I suppose it's easy to
  close by checking for != STARGET_CREATED and there's no reason not to do
  that, so I'll change it.
 
 Checking for != STARGET_CREATED is also wrong in principle.  The state
 could already be STARGET_DEL even though the target was never made
 visible.
 
 The basic problem is that you are relying on the state to be an
 accurate description of the target's visibility, but
 scsi_alloc_target() changes the state without changing the visibility.  
 I really think you should get rid of that extra assignment in
 scsi_alloc_target().

OK, Agreed, but that means modifying the 1/2 patch with the below.  This
should make the proposed diff to 2/2 correct.

James

---
diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index ef3f958..5fad646 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -417,7 +417,7 @@ static struct scsi_target *scsi_alloc_target(struct device 
*parent,
+ shost-transportt-target_size;
struct scsi_target *starget;
struct scsi_target *found_target;
-   int error;
+   int error, ref_got;
 
starget = kzalloc(size, GFP_KERNEL);
if (!starget) {
@@ -466,15 +466,15 @@ static struct scsi_target *scsi_alloc_target(struct 
device *parent,
return starget;
 
  found:
-   if (!kref_get_unless_zero(found_target-reap_ref))
-   /*
-* release routine already fired.  Target is dead, but
-* STARGET_DEL may not yet be set (set in the release
-* routine), so set here as well, just in case
-*/
-   found_target-state = STARGET_DEL;
+   /*
+* release routine already fired if kref is zero, so if we can still
+* take the reference, the target must be alive.  If we can't, it must
+* be dying and we need to wait for a new target
+*/
+   ref_got = kref_get_unless_zero(found_target-reap_ref);
+
spin_unlock_irqrestore(shost-host_lock, flags);
-   if (found_target-state != STARGET_DEL) {
+   if (ref_got) {
put_device(dev);
return found_target;
}
@@ -482,8 +482,8 @@ static struct scsi_target *scsi_alloc_target(struct device 
*parent,
 * Unfortunately, we found a dying target; need to wait until it's
 * dead before we can get a new one.  There is an anomaly here.  We
 * *should* call scsi_target_reap() to balance the kref_get() of the
-* reap_ref above.  However, since the target is in state STARGET_DEL,
-* it's already invisible and the reap_ref is irrelevant.  If we call
+* reap_ref above.  However, since the target being released, it's
+* already invisible and the reap_ref is irrelevant.  If we call
 * scsi_target_reap() we might spuriously do another device_del() on
 * an already invisible target.
 */




--
To unsubscribe from this list: send the line 

Re: [RFC 2/2] dual scan thread bug fix

2014-01-04 Thread Alan Stern
On Fri, 3 Jan 2014, James Bottomley wrote:

  I'm still concerned about one thing.  The previous patch does this in
  scsi_alloc_target():
  
 found:
   - found_target-reap_ref++;
   + if (!kref_get_unless_zero(found_target-reap_ref))
   + /*
   +  * release routine already fired.  Target is dead, but
   +  * STARGET_DEL may not yet be set (set in the release
   +  * routine), so set here as well, just in case
   +  */
   + found_target-state = STARGET_DEL;
 spin_unlock_irqrestore(shost-host_lock, flags);
  
  As a result, the two comments in this patch aren't right:
  
   @@ -384,9 +385,15 @@ static void scsi_target_reap_ref_release(struct kref 
   *kref)
 struct scsi_target *starget
 = container_of(kref, struct scsi_target, reap_ref);

   - transport_remove_device(starget-dev);
   - device_del(starget-dev);
   - starget-state = STARGET_DEL;
   + /*
   +  * if we get here and the target is still in the CREATED state that
   +  * means it was allocated but never made visible (because a scan
   +  * turned up no LUNs), so don't call device_del() on it.
   +  */
   + if (starget-state == STARGET_RUNNING) {
   + transport_remove_device(starget-dev);
   + device_del(starget-dev);
   + }
  
  Here the state could already be STARGET_DEL, even though the target is
  still visible.
 
 Well, I agree with the theory.  In practise, there are only a few
 machine instructions between the kref going to zero and us reaching that
 point, because kref_release will jump into this routine next, so the
 condition would be very hard to see.

It's true that the window is very small and not at all likely to be
hit.  Still, I prefer eliminating such things entirely.

  However, I suppose it's easy to
 close by checking for != STARGET_CREATED and there's no reason not to do
 that, so I'll change it.

Checking for != STARGET_CREATED is also wrong in principle.  The state
could already be STARGET_DEL even though the target was never made
visible.

The basic problem is that you are relying on the state to be an
accurate description of the target's visibility, but
scsi_alloc_target() changes the state without changing the visibility.  
I really think you should get rid of that extra assignment in
scsi_alloc_target().

 How does the attached diff look?

Better but still not right.

Alan Stern

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 2/2] dual scan thread bug fix

2014-01-03 Thread Alan Stern
On Thu, 2 Jan 2014, James Bottomley wrote:

 In the highly unusual case where two threads are running concurrently through
 the scanning code scanning the same target, we run into the situation where
 one may allocate the target while the other is still using it.  In this case,
 because the reap checks for STARGET_CREATED and kills the target without
 reference counting, the second thread will do the wrong thing on reap.
 
 Fix this by reference counting even creates and doing the STARGET_CREATED
 check in the final put.

I'm still concerned about one thing.  The previous patch does this in
scsi_alloc_target():

   found:
 - found_target-reap_ref++;
 + if (!kref_get_unless_zero(found_target-reap_ref))
 + /*
 +  * release routine already fired.  Target is dead, but
 +  * STARGET_DEL may not yet be set (set in the release
 +  * routine), so set here as well, just in case
 +  */
 + found_target-state = STARGET_DEL;
   spin_unlock_irqrestore(shost-host_lock, flags);

As a result, the two comments in this patch aren't right:

 @@ -384,9 +385,15 @@ static void scsi_target_reap_ref_release(struct kref 
 *kref)
   struct scsi_target *starget
   = container_of(kref, struct scsi_target, reap_ref);
  
 - transport_remove_device(starget-dev);
 - device_del(starget-dev);
 - starget-state = STARGET_DEL;
 + /*
 +  * if we get here and the target is still in the CREATED state that
 +  * means it was allocated but never made visible (because a scan
 +  * turned up no LUNs), so don't call device_del() on it.
 +  */
 + if (starget-state == STARGET_RUNNING) {
 + transport_remove_device(starget-dev);
 + device_del(starget-dev);
 + }

Here the state could already be STARGET_DEL, even though the target is
still visible.

Also, it's a little odd that the comment talks about CREATED but the 
code really checks for RUNNING.  They should be consistent.

 @@ -506,11 +513,13 @@ static struct scsi_target *scsi_alloc_target(struct 
 device *parent,
   */
  void scsi_target_reap(struct scsi_target *starget)
  {
 + /*
 +  * serious problem if this triggers: STARGET_DEL is only set in the
 +  * kref release routine, so we're doing another final put on an
 +  * already released kref
 +  */
   BUG_ON(starget-state == STARGET_DEL);

Here the code is okay but the comment is wrong: STARGET_DEL is set in 
_two_ places (but neither of them runs until reap_ref has reached 0).

Would it be better in scsi_alloc_target() to behave as though the state 
were STARGET_DEL without actually setting it?

Alan Stern


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 2/2] dual scan thread bug fix

2014-01-03 Thread James Bottomley

On Fri, 2014-01-03 at 10:58 -0500, Alan Stern wrote:
 On Thu, 2 Jan 2014, James Bottomley wrote:
 
  In the highly unusual case where two threads are running concurrently 
  through
  the scanning code scanning the same target, we run into the situation where
  one may allocate the target while the other is still using it.  In this 
  case,
  because the reap checks for STARGET_CREATED and kills the target without
  reference counting, the second thread will do the wrong thing on reap.
  
  Fix this by reference counting even creates and doing the STARGET_CREATED
  check in the final put.
 
 I'm still concerned about one thing.  The previous patch does this in
 scsi_alloc_target():
 
found:
  -   found_target-reap_ref++;
  +   if (!kref_get_unless_zero(found_target-reap_ref))
  +   /*
  +* release routine already fired.  Target is dead, but
  +* STARGET_DEL may not yet be set (set in the release
  +* routine), so set here as well, just in case
  +*/
  +   found_target-state = STARGET_DEL;
  spin_unlock_irqrestore(shost-host_lock, flags);
 
 As a result, the two comments in this patch aren't right:
 
  @@ -384,9 +385,15 @@ static void scsi_target_reap_ref_release(struct kref 
  *kref)
  struct scsi_target *starget
  = container_of(kref, struct scsi_target, reap_ref);
   
  -   transport_remove_device(starget-dev);
  -   device_del(starget-dev);
  -   starget-state = STARGET_DEL;
  +   /*
  +* if we get here and the target is still in the CREATED state that
  +* means it was allocated but never made visible (because a scan
  +* turned up no LUNs), so don't call device_del() on it.
  +*/
  +   if (starget-state == STARGET_RUNNING) {
  +   transport_remove_device(starget-dev);
  +   device_del(starget-dev);
  +   }
 
 Here the state could already be STARGET_DEL, even though the target is
 still visible.

Well, I agree with the theory.  In practise, there are only a few
machine instructions between the kref going to zero and us reaching that
point, because kref_release will jump into this routine next, so the
condition would be very hard to see.  However, I suppose it's easy to
close by checking for != STARGET_CREATED and there's no reason not to do
that, so I'll change it.

 Also, it's a little odd that the comment talks about CREATED but the 
 code really checks for RUNNING.  They should be consistent.

!= STARGET_CREATED should solve this.

  @@ -506,11 +513,13 @@ static struct scsi_target *scsi_alloc_target(struct 
  device *parent,
*/
   void scsi_target_reap(struct scsi_target *starget)
   {
  +   /*
  +* serious problem if this triggers: STARGET_DEL is only set in the
  +* kref release routine, so we're doing another final put on an
  +* already released kref
  +*/
  BUG_ON(starget-state == STARGET_DEL);
 
 Here the code is okay but the comment is wrong: STARGET_DEL is set in 
 _two_ places (but neither of them runs until reap_ref has reached 0).
 
 Would it be better in scsi_alloc_target() to behave as though the state 
 were STARGET_DEL without actually setting it?

Yes, I'll update the comment to it only goes to DEL after the kref goes
to zero.

How does the attached diff look?

James

---

diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 82cf902..2f7de33 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -390,7 +390,7 @@ static void scsi_target_reap_ref_release(struct kref *kref)
 * means it was allocated but never made visible (because a scan
 * turned up no LUNs), so don't call device_del() on it.
 */
-   if (starget-state == STARGET_RUNNING) {
+   if (starget-state != STARGET_CREATED) {
transport_remove_device(starget-dev);
device_del(starget-dev);
}
@@ -514,9 +514,9 @@ static struct scsi_target *scsi_alloc_target(struct device 
*parent,
 void scsi_target_reap(struct scsi_target *starget)
 {
/*
-* serious problem if this triggers: STARGET_DEL is only set in the
-* kref release routine, so we're doing another final put on an
-* already released kref
+* serious problem if this triggers: STARGET_DEL is only set in the if
+* the reap_ref drops to zero, so we're trying to do another final put
+* on an already released kref
 */
BUG_ON(starget-state == STARGET_DEL);
scsi_target_reap_ref_put(starget);




--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC 2/2] dual scan thread bug fix

2014-01-02 Thread James Bottomley
In the highly unusual case where two threads are running concurrently through
the scanning code scanning the same target, we run into the situation where
one may allocate the target while the other is still using it.  In this case,
because the reap checks for STARGET_CREATED and kills the target without
reference counting, the second thread will do the wrong thing on reap.

Fix this by reference counting even creates and doing the STARGET_CREATED
check in the final put.

Signed-off-by: James Bottomley jbottom...@parallels.com
---
 drivers/scsi/scsi_scan.c | 23 ---
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index ef3f958..82cf902 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -320,6 +320,7 @@ static void scsi_target_destroy(struct scsi_target *starget)
struct Scsi_Host *shost = dev_to_shost(dev-parent);
unsigned long flags;
 
+   starget-state = STARGET_DEL;
transport_destroy_device(dev);
spin_lock_irqsave(shost-host_lock, flags);
if (shost-hostt-target_destroy)
@@ -384,9 +385,15 @@ static void scsi_target_reap_ref_release(struct kref *kref)
struct scsi_target *starget
= container_of(kref, struct scsi_target, reap_ref);
 
-   transport_remove_device(starget-dev);
-   device_del(starget-dev);
-   starget-state = STARGET_DEL;
+   /*
+* if we get here and the target is still in the CREATED state that
+* means it was allocated but never made visible (because a scan
+* turned up no LUNs), so don't call device_del() on it.
+*/
+   if (starget-state == STARGET_RUNNING) {
+   transport_remove_device(starget-dev);
+   device_del(starget-dev);
+   }
scsi_target_destroy(starget);
 }
 
@@ -506,11 +513,13 @@ static struct scsi_target *scsi_alloc_target(struct 
device *parent,
  */
 void scsi_target_reap(struct scsi_target *starget)
 {
+   /*
+* serious problem if this triggers: STARGET_DEL is only set in the
+* kref release routine, so we're doing another final put on an
+* already released kref
+*/
BUG_ON(starget-state == STARGET_DEL);
-   if (starget-state == STARGET_CREATED)
-   scsi_target_destroy(starget);
-   else
-   scsi_target_reap_ref_put(starget);
+   scsi_target_reap_ref_put(starget);
 }
 
 /**
-- 
1.8.4.3



--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC 2/2] dual scan thread bug fix

2013-12-16 Thread James Bottomley
In the highly unusual case where two threads are running concurrently through
the scanning code scanning the same target, we run into the situation where
one may allocate the target while the other is still using it.  In this case,
because the reap checks for STARGET_CREATED and kills the target without
reference counting, the second thread will do the wrong thing on reap.

Fix this by reference counting even creates and doing the STARGET_CREATED
check in the final put.

---
 drivers/scsi/scsi_scan.c | 23 ---
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 39e5c85..2d7aafa 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -320,6 +320,7 @@ static void scsi_target_destroy(struct scsi_target *starget)
struct Scsi_Host *shost = dev_to_shost(dev-parent);
unsigned long flags;
 
+   starget-state = STARGET_DEL;
transport_destroy_device(dev);
spin_lock_irqsave(shost-host_lock, flags);
if (shost-hostt-target_destroy)
@@ -384,9 +385,15 @@ static void scsi_target_reap_ref_release(struct kref *kref)
struct scsi_target *starget
= container_of(kref, struct scsi_target, reap_ref);
 
-   transport_remove_device(starget-dev);
-   device_del(starget-dev);
-   starget-state = STARGET_DEL;
+   /*
+* if we get here and the target is still in the CREATED state that
+* means it was allocated but never made visible (because a scan
+* turned up no LUNs), so don't call device_del() on it.
+*/
+   if (starget-state == STARGET_RUNNING) {
+   transport_remove_device(starget-dev);
+   device_del(starget-dev);
+   }
scsi_target_destroy(starget);
 }
 
@@ -502,11 +509,13 @@ static struct scsi_target *scsi_alloc_target(struct 
device *parent,
  */
 void scsi_target_reap(struct scsi_target *starget)
 {
+   /*
+* serious problem if this triggers: STARGET_DEL is only set in the
+* kref release routine, so we're doing another final put on an
+* already released kref
+*/
BUG_ON(starget-state == STARGET_DEL);
-   if (starget-state == STARGET_CREATED)
-   scsi_target_destroy(starget);
-   else
-   scsi_target_reap_ref_put(starget);
+   scsi_target_reap_ref_put(starget);
 }
 
 /**
-- 
1.8.4.3



--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html