Re: [RFC 2/2] dual scan thread bug fix
On Wed, 8 Jan 2014, James Bottomley wrote: OK, Agreed, but that means modifying the 1/2 patch with the below. This should make the proposed diff to 2/2 correct. James --- diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c index ef3f958..5fad646 100644 --- a/drivers/scsi/scsi_scan.c +++ b/drivers/scsi/scsi_scan.c @@ -417,7 +417,7 @@ static struct scsi_target *scsi_alloc_target(struct device *parent, + shost-transportt-target_size; struct scsi_target *starget; struct scsi_target *found_target; - int error; + int error, ref_got; starget = kzalloc(size, GFP_KERNEL); if (!starget) { @@ -466,15 +466,15 @@ static struct scsi_target *scsi_alloc_target(struct device *parent, return starget; found: - if (!kref_get_unless_zero(found_target-reap_ref)) - /* - * release routine already fired. Target is dead, but - * STARGET_DEL may not yet be set (set in the release - * routine), so set here as well, just in case - */ - found_target-state = STARGET_DEL; + /* + * release routine already fired if kref is zero, so if we can still + * take the reference, the target must be alive. If we can't, it must + * be dying and we need to wait for a new target + */ + ref_got = kref_get_unless_zero(found_target-reap_ref); + spin_unlock_irqrestore(shost-host_lock, flags); - if (found_target-state != STARGET_DEL) { + if (ref_got) { put_device(dev); return found_target; } @@ -482,8 +482,8 @@ static struct scsi_target *scsi_alloc_target(struct device *parent, * Unfortunately, we found a dying target; need to wait until it's * dead before we can get a new one. There is an anomaly here. We * *should* call scsi_target_reap() to balance the kref_get() of the - * reap_ref above. However, since the target is in state STARGET_DEL, - * it's already invisible and the reap_ref is irrelevant. If we call + * reap_ref above. However, since the target being released, it's + * already invisible and the reap_ref is irrelevant. If we call * scsi_target_reap() we might spuriously do another device_del() on * an already invisible target. */ In fact, most of this comment (everything after the first sentence) is no longer needed. If the target is dying then kref_get_unless_zero must have failed, so there is no need to worry about unbalanced refcounts. Otherwise this looks good. Alan Stern -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 2/2] dual scan thread bug fix
On Sat, 2014-01-04 at 11:27 -0500, Alan Stern wrote: On Fri, 3 Jan 2014, James Bottomley wrote: I'm still concerned about one thing. The previous patch does this in scsi_alloc_target(): found: - found_target-reap_ref++; + if (!kref_get_unless_zero(found_target-reap_ref)) + /* +* release routine already fired. Target is dead, but +* STARGET_DEL may not yet be set (set in the release +* routine), so set here as well, just in case +*/ + found_target-state = STARGET_DEL; spin_unlock_irqrestore(shost-host_lock, flags); As a result, the two comments in this patch aren't right: @@ -384,9 +385,15 @@ static void scsi_target_reap_ref_release(struct kref *kref) struct scsi_target *starget = container_of(kref, struct scsi_target, reap_ref); - transport_remove_device(starget-dev); - device_del(starget-dev); - starget-state = STARGET_DEL; + /* +* if we get here and the target is still in the CREATED state that +* means it was allocated but never made visible (because a scan +* turned up no LUNs), so don't call device_del() on it. +*/ + if (starget-state == STARGET_RUNNING) { + transport_remove_device(starget-dev); + device_del(starget-dev); + } Here the state could already be STARGET_DEL, even though the target is still visible. Well, I agree with the theory. In practise, there are only a few machine instructions between the kref going to zero and us reaching that point, because kref_release will jump into this routine next, so the condition would be very hard to see. It's true that the window is very small and not at all likely to be hit. Still, I prefer eliminating such things entirely. However, I suppose it's easy to close by checking for != STARGET_CREATED and there's no reason not to do that, so I'll change it. Checking for != STARGET_CREATED is also wrong in principle. The state could already be STARGET_DEL even though the target was never made visible. The basic problem is that you are relying on the state to be an accurate description of the target's visibility, but scsi_alloc_target() changes the state without changing the visibility. I really think you should get rid of that extra assignment in scsi_alloc_target(). OK, Agreed, but that means modifying the 1/2 patch with the below. This should make the proposed diff to 2/2 correct. James --- diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c index ef3f958..5fad646 100644 --- a/drivers/scsi/scsi_scan.c +++ b/drivers/scsi/scsi_scan.c @@ -417,7 +417,7 @@ static struct scsi_target *scsi_alloc_target(struct device *parent, + shost-transportt-target_size; struct scsi_target *starget; struct scsi_target *found_target; - int error; + int error, ref_got; starget = kzalloc(size, GFP_KERNEL); if (!starget) { @@ -466,15 +466,15 @@ static struct scsi_target *scsi_alloc_target(struct device *parent, return starget; found: - if (!kref_get_unless_zero(found_target-reap_ref)) - /* -* release routine already fired. Target is dead, but -* STARGET_DEL may not yet be set (set in the release -* routine), so set here as well, just in case -*/ - found_target-state = STARGET_DEL; + /* +* release routine already fired if kref is zero, so if we can still +* take the reference, the target must be alive. If we can't, it must +* be dying and we need to wait for a new target +*/ + ref_got = kref_get_unless_zero(found_target-reap_ref); + spin_unlock_irqrestore(shost-host_lock, flags); - if (found_target-state != STARGET_DEL) { + if (ref_got) { put_device(dev); return found_target; } @@ -482,8 +482,8 @@ static struct scsi_target *scsi_alloc_target(struct device *parent, * Unfortunately, we found a dying target; need to wait until it's * dead before we can get a new one. There is an anomaly here. We * *should* call scsi_target_reap() to balance the kref_get() of the -* reap_ref above. However, since the target is in state STARGET_DEL, -* it's already invisible and the reap_ref is irrelevant. If we call +* reap_ref above. However, since the target being released, it's +* already invisible and the reap_ref is irrelevant. If we call * scsi_target_reap() we might spuriously do another device_del() on * an already invisible target. */ -- To unsubscribe from this list: send the line
Re: [RFC 2/2] dual scan thread bug fix
On Fri, 3 Jan 2014, James Bottomley wrote: I'm still concerned about one thing. The previous patch does this in scsi_alloc_target(): found: - found_target-reap_ref++; + if (!kref_get_unless_zero(found_target-reap_ref)) + /* + * release routine already fired. Target is dead, but + * STARGET_DEL may not yet be set (set in the release + * routine), so set here as well, just in case + */ + found_target-state = STARGET_DEL; spin_unlock_irqrestore(shost-host_lock, flags); As a result, the two comments in this patch aren't right: @@ -384,9 +385,15 @@ static void scsi_target_reap_ref_release(struct kref *kref) struct scsi_target *starget = container_of(kref, struct scsi_target, reap_ref); - transport_remove_device(starget-dev); - device_del(starget-dev); - starget-state = STARGET_DEL; + /* + * if we get here and the target is still in the CREATED state that + * means it was allocated but never made visible (because a scan + * turned up no LUNs), so don't call device_del() on it. + */ + if (starget-state == STARGET_RUNNING) { + transport_remove_device(starget-dev); + device_del(starget-dev); + } Here the state could already be STARGET_DEL, even though the target is still visible. Well, I agree with the theory. In practise, there are only a few machine instructions between the kref going to zero and us reaching that point, because kref_release will jump into this routine next, so the condition would be very hard to see. It's true that the window is very small and not at all likely to be hit. Still, I prefer eliminating such things entirely. However, I suppose it's easy to close by checking for != STARGET_CREATED and there's no reason not to do that, so I'll change it. Checking for != STARGET_CREATED is also wrong in principle. The state could already be STARGET_DEL even though the target was never made visible. The basic problem is that you are relying on the state to be an accurate description of the target's visibility, but scsi_alloc_target() changes the state without changing the visibility. I really think you should get rid of that extra assignment in scsi_alloc_target(). How does the attached diff look? Better but still not right. Alan Stern -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 2/2] dual scan thread bug fix
On Thu, 2 Jan 2014, James Bottomley wrote: In the highly unusual case where two threads are running concurrently through the scanning code scanning the same target, we run into the situation where one may allocate the target while the other is still using it. In this case, because the reap checks for STARGET_CREATED and kills the target without reference counting, the second thread will do the wrong thing on reap. Fix this by reference counting even creates and doing the STARGET_CREATED check in the final put. I'm still concerned about one thing. The previous patch does this in scsi_alloc_target(): found: - found_target-reap_ref++; + if (!kref_get_unless_zero(found_target-reap_ref)) + /* + * release routine already fired. Target is dead, but + * STARGET_DEL may not yet be set (set in the release + * routine), so set here as well, just in case + */ + found_target-state = STARGET_DEL; spin_unlock_irqrestore(shost-host_lock, flags); As a result, the two comments in this patch aren't right: @@ -384,9 +385,15 @@ static void scsi_target_reap_ref_release(struct kref *kref) struct scsi_target *starget = container_of(kref, struct scsi_target, reap_ref); - transport_remove_device(starget-dev); - device_del(starget-dev); - starget-state = STARGET_DEL; + /* + * if we get here and the target is still in the CREATED state that + * means it was allocated but never made visible (because a scan + * turned up no LUNs), so don't call device_del() on it. + */ + if (starget-state == STARGET_RUNNING) { + transport_remove_device(starget-dev); + device_del(starget-dev); + } Here the state could already be STARGET_DEL, even though the target is still visible. Also, it's a little odd that the comment talks about CREATED but the code really checks for RUNNING. They should be consistent. @@ -506,11 +513,13 @@ static struct scsi_target *scsi_alloc_target(struct device *parent, */ void scsi_target_reap(struct scsi_target *starget) { + /* + * serious problem if this triggers: STARGET_DEL is only set in the + * kref release routine, so we're doing another final put on an + * already released kref + */ BUG_ON(starget-state == STARGET_DEL); Here the code is okay but the comment is wrong: STARGET_DEL is set in _two_ places (but neither of them runs until reap_ref has reached 0). Would it be better in scsi_alloc_target() to behave as though the state were STARGET_DEL without actually setting it? Alan Stern -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 2/2] dual scan thread bug fix
On Fri, 2014-01-03 at 10:58 -0500, Alan Stern wrote: On Thu, 2 Jan 2014, James Bottomley wrote: In the highly unusual case where two threads are running concurrently through the scanning code scanning the same target, we run into the situation where one may allocate the target while the other is still using it. In this case, because the reap checks for STARGET_CREATED and kills the target without reference counting, the second thread will do the wrong thing on reap. Fix this by reference counting even creates and doing the STARGET_CREATED check in the final put. I'm still concerned about one thing. The previous patch does this in scsi_alloc_target(): found: - found_target-reap_ref++; + if (!kref_get_unless_zero(found_target-reap_ref)) + /* +* release routine already fired. Target is dead, but +* STARGET_DEL may not yet be set (set in the release +* routine), so set here as well, just in case +*/ + found_target-state = STARGET_DEL; spin_unlock_irqrestore(shost-host_lock, flags); As a result, the two comments in this patch aren't right: @@ -384,9 +385,15 @@ static void scsi_target_reap_ref_release(struct kref *kref) struct scsi_target *starget = container_of(kref, struct scsi_target, reap_ref); - transport_remove_device(starget-dev); - device_del(starget-dev); - starget-state = STARGET_DEL; + /* +* if we get here and the target is still in the CREATED state that +* means it was allocated but never made visible (because a scan +* turned up no LUNs), so don't call device_del() on it. +*/ + if (starget-state == STARGET_RUNNING) { + transport_remove_device(starget-dev); + device_del(starget-dev); + } Here the state could already be STARGET_DEL, even though the target is still visible. Well, I agree with the theory. In practise, there are only a few machine instructions between the kref going to zero and us reaching that point, because kref_release will jump into this routine next, so the condition would be very hard to see. However, I suppose it's easy to close by checking for != STARGET_CREATED and there's no reason not to do that, so I'll change it. Also, it's a little odd that the comment talks about CREATED but the code really checks for RUNNING. They should be consistent. != STARGET_CREATED should solve this. @@ -506,11 +513,13 @@ static struct scsi_target *scsi_alloc_target(struct device *parent, */ void scsi_target_reap(struct scsi_target *starget) { + /* +* serious problem if this triggers: STARGET_DEL is only set in the +* kref release routine, so we're doing another final put on an +* already released kref +*/ BUG_ON(starget-state == STARGET_DEL); Here the code is okay but the comment is wrong: STARGET_DEL is set in _two_ places (but neither of them runs until reap_ref has reached 0). Would it be better in scsi_alloc_target() to behave as though the state were STARGET_DEL without actually setting it? Yes, I'll update the comment to it only goes to DEL after the kref goes to zero. How does the attached diff look? James --- diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c index 82cf902..2f7de33 100644 --- a/drivers/scsi/scsi_scan.c +++ b/drivers/scsi/scsi_scan.c @@ -390,7 +390,7 @@ static void scsi_target_reap_ref_release(struct kref *kref) * means it was allocated but never made visible (because a scan * turned up no LUNs), so don't call device_del() on it. */ - if (starget-state == STARGET_RUNNING) { + if (starget-state != STARGET_CREATED) { transport_remove_device(starget-dev); device_del(starget-dev); } @@ -514,9 +514,9 @@ static struct scsi_target *scsi_alloc_target(struct device *parent, void scsi_target_reap(struct scsi_target *starget) { /* -* serious problem if this triggers: STARGET_DEL is only set in the -* kref release routine, so we're doing another final put on an -* already released kref +* serious problem if this triggers: STARGET_DEL is only set in the if +* the reap_ref drops to zero, so we're trying to do another final put +* on an already released kref */ BUG_ON(starget-state == STARGET_DEL); scsi_target_reap_ref_put(starget); -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC 2/2] dual scan thread bug fix
In the highly unusual case where two threads are running concurrently through the scanning code scanning the same target, we run into the situation where one may allocate the target while the other is still using it. In this case, because the reap checks for STARGET_CREATED and kills the target without reference counting, the second thread will do the wrong thing on reap. Fix this by reference counting even creates and doing the STARGET_CREATED check in the final put. Signed-off-by: James Bottomley jbottom...@parallels.com --- drivers/scsi/scsi_scan.c | 23 --- 1 file changed, 16 insertions(+), 7 deletions(-) diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c index ef3f958..82cf902 100644 --- a/drivers/scsi/scsi_scan.c +++ b/drivers/scsi/scsi_scan.c @@ -320,6 +320,7 @@ static void scsi_target_destroy(struct scsi_target *starget) struct Scsi_Host *shost = dev_to_shost(dev-parent); unsigned long flags; + starget-state = STARGET_DEL; transport_destroy_device(dev); spin_lock_irqsave(shost-host_lock, flags); if (shost-hostt-target_destroy) @@ -384,9 +385,15 @@ static void scsi_target_reap_ref_release(struct kref *kref) struct scsi_target *starget = container_of(kref, struct scsi_target, reap_ref); - transport_remove_device(starget-dev); - device_del(starget-dev); - starget-state = STARGET_DEL; + /* +* if we get here and the target is still in the CREATED state that +* means it was allocated but never made visible (because a scan +* turned up no LUNs), so don't call device_del() on it. +*/ + if (starget-state == STARGET_RUNNING) { + transport_remove_device(starget-dev); + device_del(starget-dev); + } scsi_target_destroy(starget); } @@ -506,11 +513,13 @@ static struct scsi_target *scsi_alloc_target(struct device *parent, */ void scsi_target_reap(struct scsi_target *starget) { + /* +* serious problem if this triggers: STARGET_DEL is only set in the +* kref release routine, so we're doing another final put on an +* already released kref +*/ BUG_ON(starget-state == STARGET_DEL); - if (starget-state == STARGET_CREATED) - scsi_target_destroy(starget); - else - scsi_target_reap_ref_put(starget); + scsi_target_reap_ref_put(starget); } /** -- 1.8.4.3 -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC 2/2] dual scan thread bug fix
In the highly unusual case where two threads are running concurrently through the scanning code scanning the same target, we run into the situation where one may allocate the target while the other is still using it. In this case, because the reap checks for STARGET_CREATED and kills the target without reference counting, the second thread will do the wrong thing on reap. Fix this by reference counting even creates and doing the STARGET_CREATED check in the final put. --- drivers/scsi/scsi_scan.c | 23 --- 1 file changed, 16 insertions(+), 7 deletions(-) diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c index 39e5c85..2d7aafa 100644 --- a/drivers/scsi/scsi_scan.c +++ b/drivers/scsi/scsi_scan.c @@ -320,6 +320,7 @@ static void scsi_target_destroy(struct scsi_target *starget) struct Scsi_Host *shost = dev_to_shost(dev-parent); unsigned long flags; + starget-state = STARGET_DEL; transport_destroy_device(dev); spin_lock_irqsave(shost-host_lock, flags); if (shost-hostt-target_destroy) @@ -384,9 +385,15 @@ static void scsi_target_reap_ref_release(struct kref *kref) struct scsi_target *starget = container_of(kref, struct scsi_target, reap_ref); - transport_remove_device(starget-dev); - device_del(starget-dev); - starget-state = STARGET_DEL; + /* +* if we get here and the target is still in the CREATED state that +* means it was allocated but never made visible (because a scan +* turned up no LUNs), so don't call device_del() on it. +*/ + if (starget-state == STARGET_RUNNING) { + transport_remove_device(starget-dev); + device_del(starget-dev); + } scsi_target_destroy(starget); } @@ -502,11 +509,13 @@ static struct scsi_target *scsi_alloc_target(struct device *parent, */ void scsi_target_reap(struct scsi_target *starget) { + /* +* serious problem if this triggers: STARGET_DEL is only set in the +* kref release routine, so we're doing another final put on an +* already released kref +*/ BUG_ON(starget-state == STARGET_DEL); - if (starget-state == STARGET_CREATED) - scsi_target_destroy(starget); - else - scsi_target_reap_ref_put(starget); + scsi_target_reap_ref_put(starget); } /** -- 1.8.4.3 -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html