On Tue, Apr 02, 2019 at 09:16:23AM -0700, Bart Van Assche wrote:
> Several SCSI transport and LLD drivers surround code that does not
> tolerate concurrent calls of .queuecommand() with scsi_target_block() /
> scsi_target_unblock(). These last two functions use
> blk_mq_quiesce_queue() / blk_mq_unquiesce_queue() for scsi-mq request
> queues to prevent concurrent .queuecommand() calls. However, that is
> not sufficient to prevent .queuecommand() calls from scsi_send_eh_cmnd().
> Hence surround the .queuecommand() call from the SCSI error handler with
> code that avoids that .queuecommand() gets called in the quiesced state.
> 
> Note: converting the .queuecommand() call in scsi_send_eh_cmnd() into
> code that calls blk_get_request() + blk_execute_rq() is not an option
> since scsi_send_eh_cmnd() must be able to make forward progress even
> if all requests have been allocated.
> 
> Cc: Christoph Hellwig <h...@lst.de>
> Cc: Ming Lei <ming....@redhat.com>
> Cc: Hannes Reinecke <h...@suse.de>
> Cc: Johannes Thumshirn <jthumsh...@suse.de>
> Signed-off-by: Bart Van Assche <bvanass...@acm.org>
> ---
>  drivers/scsi/scsi_error.c | 26 ++++++++++++++++++++++++--
>  1 file changed, 24 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index 8e9680572b9f..d516dd1b824d 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -1054,7 +1054,7 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, unsigned char *cmnd,
>       struct scsi_device *sdev = scmd->device;
>       struct Scsi_Host *shost = sdev->host;
>       DECLARE_COMPLETION_ONSTACK(done);
> -     unsigned long timeleft = timeout;
> +     unsigned long timeleft = timeout, delay;
>       struct scsi_eh_save ses;
>       const unsigned long stall_for = msecs_to_jiffies(100);
>       int rtn;
> @@ -1065,7 +1065,29 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, unsigned char *cmnd,
>  
>       scsi_log_send(scmd);
>       scmd->scsi_done = scsi_eh_done;
> -     rtn = shost->hostt->queuecommand(shost, scmd);
> +
> +     /*
> +      * Lock sdev->state_mutex to avoid that scsi_device_quiesce() can
> +      * change the SCSI device state after we have examined it and before
> +      * .queuecommand() is called.
> +      */
> +     mutex_lock(&sdev->state_mutex);
> +     while (sdev->sdev_state == SDEV_BLOCK && timeleft > 0) {
> +             mutex_unlock(&sdev->state_mutex);
> +             SCSI_LOG_ERROR_RECOVERY(5, sdev_printk(KERN_DEBUG, sdev,
> +                     "%s: state %d <> %d\n", __func__, sdev->sdev_state,
> +                     SDEV_BLOCK));
> +             delay = min(timeleft, stall_for);
> +             timeleft -= delay;
> +             msleep(jiffies_to_msecs(delay));
> +             mutex_lock(&sdev->state_mutex);
> +     }
> +     if (sdev->sdev_state != SDEV_BLOCK)
> +             rtn = shost->hostt->queuecommand(shost, scmd);
> +     else
> +             rtn = SCSI_MLQUEUE_DEVICE_BUSY;
> +     mutex_unlock(&sdev->state_mutex);
> +
>       if (rtn) {
>               if (timeleft > stall_for) {
>                       scsi_eh_restore_cmnd(scmd, &ses);
> -- 
> 2.21.0.196.g041f5ea1cf98
> 

Still not sure if it is safe to do that in the SDEV_BLOCK case.

SDEV_BLOCK can be set via the sysfs interface. What happens if there are
in-flight IOs that need EH to retry them just after someone sets 'blocked'?

People may complain that this patch causes data loss in the above test case.
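
To make the scenario concrete, here is a minimal userspace sketch of the
wait loop from the patch. It is only a model under stated assumptions: every
name in it (model_dev, model_sleep, model_send_eh_cmnd, and so on) is a
hypothetical stand-in rather than kernel API, locking is left out, and
"sleeping" just counts polling intervals. It shows that if the device stays
blocked for the whole timeout, the command is never handed to
.queuecommand() and the busy status is returned with no time left.

```c
#include <assert.h>

/* Hypothetical userspace stand-ins for sdev_state and the return codes. */
enum model_state { MODEL_RUNNING, MODEL_BLOCK };
enum model_result { MODEL_QUEUED = 0, MODEL_DEVICE_BUSY = 1 };

struct model_dev {
	enum model_state state;
	int unblock_after_polls;	/* <= 0: the device is never unblocked */
};

/* Stand-in for msleep(): one interval passes, the device may get unblocked. */
static void model_sleep(struct model_dev *dev)
{
	if (dev->unblock_after_polls > 0 && --dev->unblock_after_polls == 0)
		dev->state = MODEL_RUNNING;
}

/*
 * Mirrors the loop added by the patch: while the device is blocked and
 * time remains, consume min(timeleft, stall_for) and poll again. The
 * command is "queued" only if the device left the blocked state.
 */
static enum model_result model_send_eh_cmnd(struct model_dev *dev,
					    unsigned long *timeleft,
					    unsigned long stall_for)
{
	assert(stall_for > 0);	/* a zero interval would never make progress */

	while (dev->state == MODEL_BLOCK && *timeleft > 0) {
		unsigned long delay =
			*timeleft < stall_for ? *timeleft : stall_for;

		*timeleft -= delay;
		model_sleep(dev);
	}

	return dev->state != MODEL_BLOCK ? MODEL_QUEUED : MODEL_DEVICE_BUSY;
}
```

In this model, a device that is unblocked after a few polls still gets its
command queued with the remaining time budget, while a device left in
'blocked' exhausts the budget and the EH command is reported as busy.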


Thanks,
Ming
