RE: [PATCH] scsi_debug: deadlock between completions and surprise module removal
> -Original Message- > From: linux-scsi-ow...@vger.kernel.org [mailto:linux-scsi- > ow...@vger.kernel.org] On Behalf Of Christoph Hellwig > Sent: Thursday, 25 September, 2014 7:13 AM > To: Douglas Gilbert > Cc: SCSI development list; linux-kernel; James Bottomley; Christoph Hellwig; > Milan Broz > Subject: Re: [PATCH] scsi_debug: deadlock between completions and surprise > module removal > > Review ping again? > > While I think the shutdown code in scsi_debug needs a bit more of an > overhault I'd really like to include the fix at least for 3.18 and > 3.17-stable now that we have missed the 3.17 window. > > On Sun, Aug 31, 2014 at 07:09:59PM -0400, Douglas Gilbert wrote: > > A deadlock has been reported when the completion > > of SCSI commands (simulated by a timer) was surprised > > by a module removal. This patch removes one half of > > the offending locks around timer deletions. This fix > > is applied both to stop_all_queued() which is were > > the deadlock was discovered and stop_queued_cmnd() > > which has very similar logic. > > > > This patch should be applied both to the lk 3.17 tree > > and Christoph's drivers-for-3.18 tree. > > > > Tested-and-reported-by: Milan Broz > > Signed-off-by: Douglas Gilbert Reviewed-by: Robert Elliott -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] scsi_debug: deadlock between completions and surprise module removal
Review ping again? While I think the shutdown code in scsi_debug needs a bit more of an overhault I'd really like to include the fix at least for 3.18 and 3.17-stable now that we have missed the 3.17 window. On Sun, Aug 31, 2014 at 07:09:59PM -0400, Douglas Gilbert wrote: > A deadlock has been reported when the completion > of SCSI commands (simulated by a timer) was surprised > by a module removal. This patch removes one half of > the offending locks around timer deletions. This fix > is applied both to stop_all_queued() which is were > the deadlock was discovered and stop_queued_cmnd() > which has very similar logic. > > This patch should be applied both to the lk 3.17 tree > and Christoph's drivers-for-3.18 tree. > > Tested-and-reported-by: Milan Broz > Signed-off-by: Douglas Gilbert > --- a/drivers/scsi/scsi_debug.c 2014-08-26 13:24:51.646948507 -0400 > +++ b/drivers/scsi/scsi_debug.c 2014-08-30 18:04:54.589226679 -0400 > @@ -2743,6 +2743,13 @@ static int stop_queued_cmnd(struct scsi_ > if (test_bit(k, queued_in_use_bm)) { > sqcp = &queued_arr[k]; > if (cmnd == sqcp->a_cmnd) { > + devip = (struct sdebug_dev_info *) > + cmnd->device->hostdata; > + if (devip) > + atomic_dec(&devip->num_in_q); > + sqcp->a_cmnd = NULL; > + spin_unlock_irqrestore(&queued_arr_lock, > +iflags); > if (scsi_debug_ndelay > 0) { > if (sqcp->sd_hrtp) > hrtimer_cancel( > @@ -2755,18 +2762,13 @@ static int stop_queued_cmnd(struct scsi_ > if (sqcp->tletp) > tasklet_kill(sqcp->tletp); > } > - __clear_bit(k, queued_in_use_bm); > - devip = (struct sdebug_dev_info *) > - cmnd->device->hostdata; > - if (devip) > - atomic_dec(&devip->num_in_q); > - sqcp->a_cmnd = NULL; > - break; > + clear_bit(k, queued_in_use_bm); > + return 1; > } > } > } > spin_unlock_irqrestore(&queued_arr_lock, iflags); > - return (k < qmax) ? 1 : 0; > + return 0; > } > > /* Deletes (stops) timers or tasklets of all queued commands */ > @@ -2782,6 +2784,13 @@ static void stop_all_queued(void) > if (test_bit(k, queued_in_use_bm)) { > sqcp = &queued_arr[k]; > if (sqcp->a_cmnd) { > + devip = (struct sdebug_dev_info *) > + sqcp->a_cmnd->device->hostdata; > + if (devip) > + atomic_dec(&devip->num_in_q); > + sqcp->a_cmnd = NULL; > + spin_unlock_irqrestore(&queued_arr_lock, > +iflags); > if (scsi_debug_ndelay > 0) { > if (sqcp->sd_hrtp) > hrtimer_cancel( > @@ -2794,12 +2803,8 @@ static void stop_all_queued(void) > if (sqcp->tletp) > tasklet_kill(sqcp->tletp); > } > - __clear_bit(k, queued_in_use_bm); > - devip = (struct sdebug_dev_info *) > - sqcp->a_cmnd->device->hostdata; > - if (devip) > - atomic_dec(&devip->num_in_q); > - sqcp->a_cmnd = NULL; > + clear_bit(k, queued_in_use_bm); > + spin_lock_irqsave(&queued_arr_lock, iflags); > } > } > } ---end quoted text--- -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] scsi_debug: deadlock between completions and surprise module removal
On Mon, Sep 08, 2014 at 04:31:01PM -0400, Douglas Gilbert wrote: > stop_all_queued() is doing hrtimer_cancel(), del_timer_sync() > or tasklet_kill() on all the scsi_cmnd objects that are > "in play". Unless another mechanism calls the .eh_abort_handler > entry point reliably on each "in play" command then the module > cannot be removed. That is because some timer expiry callbacks > are pending. scsi_remove_host disabled all queueing of new commands, so all these timers and tasklets will eventually expire or run and allow the removal to complete. Of course this could be sped up by cancelling them, but you don't need the sync version > >Something like the (untested) patch below would do the trick. > >We'd still need Dougs patch for the EH case, though. > > The only other call to stop_all_queued() is from the > .eh_host_reset_handler entry point. True, but you also have stop_queued_cmnd for a abort case which also needs that treatment. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] scsi_debug: deadlock between completions and surprise module removal
On 14-09-08 11:07 AM, Christoph Hellwig wrote: On Mon, Sep 08, 2014 at 11:11:23AM +0200, Bart Van Assche wrote: Hello Doug, In the scsi_debug driver scsi_remove_host() is called from inside the sdebug_driver_remove() callback function. Unless I have missed something it is not guaranteed that that callback function is invoked before unloading of the scsi_debug driver has finished. I think most of the code in sdebug_driver_remove() should be moved to sdebug_remove_adapter(). I'm not sure that's right. scsi_debug uses the driver mode in a slightly unusual way, and includes both the bus driver, device and device driver. sdebug_driver_remove is a bus method, but as we don't have driver methods should act very much like all other _remove callbacks. sdebug_remove_adapter is more a "bus-level" function that calls into the driver model to unbind devices from the driver. But we defintively shouldn't stop and free queued command before we fully remove the hosts. As far as I can tell the stop_all_queued call can be entirely removed from the remove path, as the midlayer will take care of waiting for all commands to return, and the free_all_queued should be after all hosts are unregistered. stop_all_queued() is doing hrtimer_cancel(), del_timer_sync() or tasklet_kill() on all the scsi_cmnd objects that are "in play". Unless another mechanism calls the .eh_abort_handler entry point reliably on each "in play" command then the module cannot be removed. That is because some timer expiry callbacks are pending. Something like the (untested) patch below would do the trick. We'd still need Dougs patch for the EH case, though. diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c index d19c0e3..d022c2f 100644 --- a/drivers/scsi/scsi_debug.c +++ b/drivers/scsi/scsi_debug.c @@ -3983,14 +3983,13 @@ static void __exit scsi_debug_exit(void) { int k = scsi_debug_add_host; - stop_all_queued(); - free_all_queued(); for (; k; k--) sdebug_remove_adapter(); driver_unregister(&sdebug_driverfs_driver); bus_unregister(&pseudo_lld_bus); root_device_unregister(pseudo_primary); + free_all_queued(); if (dif_storep) vfree(dif_storep); -- The only other call to stop_all_queued() is from the .eh_host_reset_handler entry point. Doug Gilbert -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] scsi_debug: deadlock between completions and surprise module removal
On Mon, Sep 08, 2014 at 11:11:23AM +0200, Bart Van Assche wrote: > Hello Doug, > > In the scsi_debug driver scsi_remove_host() is called from inside the > sdebug_driver_remove() callback function. Unless I have missed something it > is not guaranteed that that callback function is invoked before unloading of > the scsi_debug driver has finished. I think most of the code in > sdebug_driver_remove() should be moved to sdebug_remove_adapter(). I'm not sure that's right. scsi_debug uses the driver mode in a slightly unusual way, and includes both the bus driver, device and device driver. sdebug_driver_remove is a bus method, but as we don't have driver methods should act very much like all other _remove callbacks. sdebug_remove_adapter is more a "bus-level" function that calls into the driver model to unbind devices from the driver. But we defintively shouldn't stop and free queued command before we fully remove the hosts. As far as I can tell the stop_all_queued call can be entirely removed from the remove path, as the midlayer will take care of waiting for all commands to return, and the free_all_queued should be after all hosts are unregistered. Something like the (untested) patch below would do the trick. We'd still need Dougs patch for the EH case, though. diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c index d19c0e3..d022c2f 100644 --- a/drivers/scsi/scsi_debug.c +++ b/drivers/scsi/scsi_debug.c @@ -3983,14 +3983,13 @@ static void __exit scsi_debug_exit(void) { int k = scsi_debug_add_host; - stop_all_queued(); - free_all_queued(); for (; k; k--) sdebug_remove_adapter(); driver_unregister(&sdebug_driverfs_driver); bus_unregister(&pseudo_lld_bus); root_device_unregister(pseudo_primary); + free_all_queued(); if (dif_storep) vfree(dif_storep); -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] scsi_debug: deadlock between completions and surprise module removal
On 09/06/14 16:40, Douglas Gilbert wrote: On 14-09-05 11:25 AM, Bart Van Assche wrote: An LLD must call scsi_remove_host() directly or indirectly from the module cleanup path. scsi_remove_host() triggers a call to blk_cleanup_queue(). That last function sets the flag QUEUE_FLAG_DYING which prevents that new I/O is queued and waits until previously queued requests have finished before returning. And they do call scsi_remove_host(). But they do that toward the end of their clean-up. The problem that I observed has already happened before that. IOW I think the QUEUE_FLAG_DYING state needs to be set and acknowledged as the first order of business by the code that implements 'rmmod LLD'. Hello Doug, In the scsi_debug driver scsi_remove_host() is called from inside the sdebug_driver_remove() callback function. Unless I have missed something it is not guaranteed that that callback function is invoked before unloading of the scsi_debug driver has finished. I think most of the code in sdebug_driver_remove() should be moved to sdebug_remove_adapter(). Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] scsi_debug: deadlock between completions and surprise module removal
On Sat, Sep 06, 2014 at 10:40:06AM -0400, Douglas Gilbert wrote: > And they do call scsi_remove_host(). But they do that toward > the end of their clean-up. The problem that I observed has > already happened before that. > > IOW I think the QUEUE_FLAG_DYING state needs to be set and > acknowledged as the first order of business by the code > that implements 'rmmod LLD'. That's how driver should implement their ->remove driver callback: foo_remove() { scsi_remove_host() < actual cleanup here> scsi_host_put(); } if a driver doesn't do that, thats a bug in the driver which needs fixing. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] scsi_debug: deadlock between completions and surprise module removal
On 14-09-05 11:25 AM, Bart Van Assche wrote: On 09/05/14 15:56, Douglas Gilbert wrote: With scsi-mq I think many LLDs probably have a new race possibility between a surprise rmmod of the LLD and another thread presenting a new command at about the same time (or another thread's command completing around that time). Does anything above the LLD stop this happening? Looking at mpt3sas and hpsa module exit calls, they don't seem to guard against this possibility. The test is pretty easy: build the LLD as a module, load it and fire up a multi-thread, libaio fio test on one or more devices (SSDs would probably be good) on that LLD. While the test is running, do 'rmmod LLD'. An LLD must call scsi_remove_host() directly or indirectly from the module cleanup path. scsi_remove_host() triggers a call to blk_cleanup_queue(). That last function sets the flag QUEUE_FLAG_DYING which prevents that new I/O is queued and waits until previously queued requests have finished before returning. And they do call scsi_remove_host(). But they do that toward the end of their clean-up. The problem that I observed has already happened before that. IOW I think the QUEUE_FLAG_DYING state needs to be set and acknowledged as the first order of business by the code that implements 'rmmod LLD'. Doug Gilbert -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] scsi_debug: deadlock between completions and surprise module removal
On 09/05/14 15:56, Douglas Gilbert wrote: With scsi-mq I think many LLDs probably have a new race possibility between a surprise rmmod of the LLD and another thread presenting a new command at about the same time (or another thread's command completing around that time). Does anything above the LLD stop this happening? Looking at mpt3sas and hpsa module exit calls, they don't seem to guard against this possibility. The test is pretty easy: build the LLD as a module, load it and fire up a multi-thread, libaio fio test on one or more devices (SSDs would probably be good) on that LLD. While the test is running, do 'rmmod LLD'. An LLD must call scsi_remove_host() directly or indirectly from the module cleanup path. scsi_remove_host() triggers a call to blk_cleanup_queue(). That last function sets the flag QUEUE_FLAG_DYING which prevents that new I/O is queued and waits until previously queued requests have finished before returning. Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] scsi_debug: deadlock between completions and surprise module removal
With scsi-mq I think many LLDs probably have a new race possibility between a surprise rmmod of the LLD and another thread presenting a new command at about the same time (or another thread's command completing around that time). Does anything above the LLD stop this happening? Looking at mpt3sas and hpsa module exit calls, they don't seem to guard against this possibility. The test is pretty easy: build the LLD as a module, load it and fire up a multi-thread, libaio fio test on one or more devices (SSDs would probably be good) on that LLD. While the test is running, do 'rmmod LLD'. Doug Gilbert On 14-09-05 01:24 AM, Christoph Hellwig wrote: Can I get another review for this one? On Sun, Aug 31, 2014 at 07:09:59PM -0400, Douglas Gilbert wrote: A deadlock has been reported when the completion of SCSI commands (simulated by a timer) was surprised by a module removal. This patch removes one half of the offending locks around timer deletions. This fix is applied both to stop_all_queued() which is were the deadlock was discovered and stop_queued_cmnd() which has very similar logic. This patch should be applied both to the lk 3.17 tree and Christoph's drivers-for-3.18 tree. Tested-and-reported-by: Milan Broz Signed-off-by: Douglas Gilbert --- a/drivers/scsi/scsi_debug.c 2014-08-26 13:24:51.646948507 -0400 +++ b/drivers/scsi/scsi_debug.c 2014-08-30 18:04:54.589226679 -0400 @@ -2743,6 +2743,13 @@ static int stop_queued_cmnd(struct scsi_ if (test_bit(k, queued_in_use_bm)) { sqcp = &queued_arr[k]; if (cmnd == sqcp->a_cmnd) { + devip = (struct sdebug_dev_info *) + cmnd->device->hostdata; + if (devip) + atomic_dec(&devip->num_in_q); + sqcp->a_cmnd = NULL; + spin_unlock_irqrestore(&queued_arr_lock, + iflags); if (scsi_debug_ndelay > 0) { if (sqcp->sd_hrtp) hrtimer_cancel( @@ -2755,18 +2762,13 @@ static int stop_queued_cmnd(struct scsi_ if (sqcp->tletp) tasklet_kill(sqcp->tletp); } - __clear_bit(k, queued_in_use_bm); - devip = (struct sdebug_dev_info *) - cmnd->device->hostdata; - if (devip) - atomic_dec(&devip->num_in_q); - sqcp->a_cmnd = NULL; - break; + clear_bit(k, queued_in_use_bm); + return 1; } } } spin_unlock_irqrestore(&queued_arr_lock, iflags); - return (k < qmax) ? 1 : 0; + return 0; } /* Deletes (stops) timers or tasklets of all queued commands */ @@ -2782,6 +2784,13 @@ static void stop_all_queued(void) if (test_bit(k, queued_in_use_bm)) { sqcp = &queued_arr[k]; if (sqcp->a_cmnd) { + devip = (struct sdebug_dev_info *) + sqcp->a_cmnd->device->hostdata; + if (devip) + atomic_dec(&devip->num_in_q); + sqcp->a_cmnd = NULL; + spin_unlock_irqrestore(&queued_arr_lock, + iflags); if (scsi_debug_ndelay > 0) { if (sqcp->sd_hrtp) hrtimer_cancel( @@ -2794,12 +2803,8 @@ static void stop_all_queued(void) if (sqcp->tletp) tasklet_kill(sqcp->tletp); } - __clear_bit(k, queued_in_use_bm); - devip = (struct sdebug_dev_info *) - sqcp->a_cmnd->device->hostdata; - if (devip) - atomic_dec(&devip->num_in_q); - sqcp->a_cmnd = NULL; + clear_bit(k, queued_in_use_bm); + spin_lock_irqsave(&queued_arr_lock, iflags); } } } ---end quoted text--- -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo
Re: [PATCH] scsi_debug: deadlock between completions and surprise module removal
Can I get another review for this one? On Sun, Aug 31, 2014 at 07:09:59PM -0400, Douglas Gilbert wrote: > A deadlock has been reported when the completion > of SCSI commands (simulated by a timer) was surprised > by a module removal. This patch removes one half of > the offending locks around timer deletions. This fix > is applied both to stop_all_queued() which is were > the deadlock was discovered and stop_queued_cmnd() > which has very similar logic. > > This patch should be applied both to the lk 3.17 tree > and Christoph's drivers-for-3.18 tree. > > Tested-and-reported-by: Milan Broz > Signed-off-by: Douglas Gilbert > --- a/drivers/scsi/scsi_debug.c 2014-08-26 13:24:51.646948507 -0400 > +++ b/drivers/scsi/scsi_debug.c 2014-08-30 18:04:54.589226679 -0400 > @@ -2743,6 +2743,13 @@ static int stop_queued_cmnd(struct scsi_ > if (test_bit(k, queued_in_use_bm)) { > sqcp = &queued_arr[k]; > if (cmnd == sqcp->a_cmnd) { > + devip = (struct sdebug_dev_info *) > + cmnd->device->hostdata; > + if (devip) > + atomic_dec(&devip->num_in_q); > + sqcp->a_cmnd = NULL; > + spin_unlock_irqrestore(&queued_arr_lock, > +iflags); > if (scsi_debug_ndelay > 0) { > if (sqcp->sd_hrtp) > hrtimer_cancel( > @@ -2755,18 +2762,13 @@ static int stop_queued_cmnd(struct scsi_ > if (sqcp->tletp) > tasklet_kill(sqcp->tletp); > } > - __clear_bit(k, queued_in_use_bm); > - devip = (struct sdebug_dev_info *) > - cmnd->device->hostdata; > - if (devip) > - atomic_dec(&devip->num_in_q); > - sqcp->a_cmnd = NULL; > - break; > + clear_bit(k, queued_in_use_bm); > + return 1; > } > } > } > spin_unlock_irqrestore(&queued_arr_lock, iflags); > - return (k < qmax) ? 1 : 0; > + return 0; > } > > /* Deletes (stops) timers or tasklets of all queued commands */ > @@ -2782,6 +2784,13 @@ static void stop_all_queued(void) > if (test_bit(k, queued_in_use_bm)) { > sqcp = &queued_arr[k]; > if (sqcp->a_cmnd) { > + devip = (struct sdebug_dev_info *) > + sqcp->a_cmnd->device->hostdata; > + if (devip) > + atomic_dec(&devip->num_in_q); > + sqcp->a_cmnd = NULL; > + spin_unlock_irqrestore(&queued_arr_lock, > +iflags); > if (scsi_debug_ndelay > 0) { > if (sqcp->sd_hrtp) > hrtimer_cancel( > @@ -2794,12 +2803,8 @@ static void stop_all_queued(void) > if (sqcp->tletp) > tasklet_kill(sqcp->tletp); > } > - __clear_bit(k, queued_in_use_bm); > - devip = (struct sdebug_dev_info *) > - sqcp->a_cmnd->device->hostdata; > - if (devip) > - atomic_dec(&devip->num_in_q); > - sqcp->a_cmnd = NULL; > + clear_bit(k, queued_in_use_bm); > + spin_lock_irqsave(&queued_arr_lock, iflags); > } > } > } ---end quoted text--- -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] scsi_debug: deadlock between completions and surprise module removal
On 14-09-01 11:36 AM, Christoph Hellwig wrote: --- a/drivers/scsi/scsi_debug.c 2014-08-26 13:24:51.646948507 -0400 +++ b/drivers/scsi/scsi_debug.c 2014-08-30 18:04:54.589226679 -0400 @@ -2743,6 +2743,13 @@ static int stop_queued_cmnd(struct scsi_ if (test_bit(k, queued_in_use_bm)) { sqcp = &queued_arr[k]; if (cmnd == sqcp->a_cmnd) { + devip = (struct sdebug_dev_info *) + cmnd->device->hostdata; + if (devip) + atomic_dec(&devip->num_in_q); + sqcp->a_cmnd = NULL; Why would the hostdata every be NULL here? We should never call ->slave_destroy on a device that has outstanding commands. To your first question, perhaps it could not be. In the scsi_debug driver almost all uses of 'devip' check for NULL, so it may not always have been true. 'rmmod scsi_debug' would lead to scsi_debug_exit() being called and that called stop_all_queued() which deadlocked with a command completion, or worse command commencement. scsi_debug_exit() looks a bit racy: the first thing it does is stop_all_queued() but has anything been done to stop new commands being issued? Later scsi_debug_exit() brings down the hosts. This patch is primarily a bug fix for the lk 3.17 series and the code you highlight was simply moved from being under a lock to outside that lock. I didn't want to be too creative, it's too easy to slip up and upset the management. Enhancements could be sent to your drivers-for-3.18 tree. Rob Elliott already has a few in mind, to improve performance. Removing all 'devip' NULL checks would also improve performance, and readability. Doug Gilbert -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] scsi_debug: deadlock between completions and surprise module removal
> --- a/drivers/scsi/scsi_debug.c 2014-08-26 13:24:51.646948507 -0400 > +++ b/drivers/scsi/scsi_debug.c 2014-08-30 18:04:54.589226679 -0400 > @@ -2743,6 +2743,13 @@ static int stop_queued_cmnd(struct scsi_ > if (test_bit(k, queued_in_use_bm)) { > sqcp = &queued_arr[k]; > if (cmnd == sqcp->a_cmnd) { > + devip = (struct sdebug_dev_info *) > + cmnd->device->hostdata; > + if (devip) > + atomic_dec(&devip->num_in_q); > + sqcp->a_cmnd = NULL; Why would the hostdata every be NULL here? We should never call ->slave_destroy on a device that has outstanding commands. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] scsi_debug: deadlock between completions and surprise module removal
A deadlock has been reported when the completion of SCSI commands (simulated by a timer) was surprised by a module removal. This patch removes one half of the offending locks around timer deletions. This fix is applied both to stop_all_queued() which is were the deadlock was discovered and stop_queued_cmnd() which has very similar logic. This patch should be applied both to the lk 3.17 tree and Christoph's drivers-for-3.18 tree. Tested-and-reported-by: Milan Broz Signed-off-by: Douglas Gilbert --- a/drivers/scsi/scsi_debug.c 2014-08-26 13:24:51.646948507 -0400 +++ b/drivers/scsi/scsi_debug.c 2014-08-30 18:04:54.589226679 -0400 @@ -2743,6 +2743,13 @@ static int stop_queued_cmnd(struct scsi_ if (test_bit(k, queued_in_use_bm)) { sqcp = &queued_arr[k]; if (cmnd == sqcp->a_cmnd) { +devip = (struct sdebug_dev_info *) + cmnd->device->hostdata; +if (devip) + atomic_dec(&devip->num_in_q); +sqcp->a_cmnd = NULL; +spin_unlock_irqrestore(&queued_arr_lock, + iflags); if (scsi_debug_ndelay > 0) { if (sqcp->sd_hrtp) hrtimer_cancel( @@ -2755,18 +2762,13 @@ static int stop_queued_cmnd(struct scsi_ if (sqcp->tletp) tasklet_kill(sqcp->tletp); } -__clear_bit(k, queued_in_use_bm); -devip = (struct sdebug_dev_info *) - cmnd->device->hostdata; -if (devip) - atomic_dec(&devip->num_in_q); -sqcp->a_cmnd = NULL; -break; +clear_bit(k, queued_in_use_bm); +return 1; } } } spin_unlock_irqrestore(&queued_arr_lock, iflags); - return (k < qmax) ? 1 : 0; + return 0; } /* Deletes (stops) timers or tasklets of all queued commands */ @@ -2782,6 +2784,13 @@ static void stop_all_queued(void) if (test_bit(k, queued_in_use_bm)) { sqcp = &queued_arr[k]; if (sqcp->a_cmnd) { +devip = (struct sdebug_dev_info *) + sqcp->a_cmnd->device->hostdata; +if (devip) + atomic_dec(&devip->num_in_q); +sqcp->a_cmnd = NULL; +spin_unlock_irqrestore(&queued_arr_lock, + iflags); if (scsi_debug_ndelay > 0) { if (sqcp->sd_hrtp) hrtimer_cancel( @@ -2794,12 +2803,8 @@ static void stop_all_queued(void) if (sqcp->tletp) tasklet_kill(sqcp->tletp); } -__clear_bit(k, queued_in_use_bm); -devip = (struct sdebug_dev_info *) - sqcp->a_cmnd->device->hostdata; -if (devip) - atomic_dec(&devip->num_in_q); -sqcp->a_cmnd = NULL; +clear_bit(k, queued_in_use_bm); +spin_lock_irqsave(&queued_arr_lock, iflags); } } }