Re: Deadlock in usb-storage error handling

Alan Stern Thu, 20 Mar 2014 12:48:29 -0700

On Thu, 20 Mar 2014, James Bottomley wrote:

> On Thu, 2014-03-20 at 12:34 -0400, Alan Stern wrote:
> > On Thu, 20 Mar 2014, James Bottomley wrote:
> > 
> > > OK, so I think we have three things to do
> > > 
> > >      1. Investigate SCSI and fix its abort state problem that's causing
> > >         it not to send the abort second time around
> > >      2. Fix usb-storage to fail a reset it can't do (i.e. device reset
> > >         with outstanding commands)
> > >      3. Find out why we're sending a spurious request sense.
> > > 
> > > I can look at 1 and 3 if you want to take 2.
> > 
> > It's a deal!  Thanks for your help.
> 
> And this looks to be 3: a bug in the way we attach sense data to
> commands (we shouldn't look for attached sense if the device error code
> didn't imply there would be any).
> 
> James
> 
> ---
> 
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index 771c16b..d020149 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -1157,6 +1157,15 @@ int scsi_eh_get_sense(struct list_head *work_q,
>                                            __func__));
>                       break;
>               }
> +             if (status_byte(scmd->result) != CHECK_CONDITION)
> +                     /*
> +                      * don't request sense if there's no check condition
> +                      * status because the error we're processing isn't one
> +                      * that has a sense code (and some devices get
> +                      * confused by sense requests out of the blue)
> +                      */
> +                     continue;
> +
>               SCSI_LOG_ERROR_RECOVERY(2, scmd_printk(KERN_INFO, scmd,
>                                                 "%s: requesting sense\n",
>                                                 current->comm));


I tried this patch first, because fixing the earlier bug would mask
this one.

The patch sort of worked.  But the first time I tried it, it failed in
a rather amusing way.  While the second retry was hung,
status_byte(scmd->result) _was_ equal to CHECK_CONDITION -- because that
was the result from the _first_ retry, and it had never gotten cleared!

scmd->result needs to be set to 0 before the queuecommand callback is
invoked.  I ended up adding this to your patch, and then it worked
perfectly:


Index: usb-3.14/drivers/scsi/scsi_error.c
===================================================================
--- usb-3.14.orig/drivers/scsi/scsi_error.c
+++ usb-3.14/drivers/scsi/scsi_error.c
@@ -924,6 +924,7 @@ void scsi_eh_prep_cmnd(struct scsi_cmnd
        memset(scmd->cmnd, 0, BLK_MAX_CDB);
        memset(&scmd->sdb, 0, sizeof(scmd->sdb));
        scmd->request->next_rq = NULL;
+       scmd->result = 0;
 
        if (sense_bytes) {
                scmd->sdb.length = min_t(unsigned, SCSI_SENSE_BUFFERSIZE,
Index: usb-3.14/drivers/scsi/scsi_lib.c
===================================================================
--- usb-3.14.orig/drivers/scsi/scsi_lib.c
+++ usb-3.14/drivers/scsi/scsi_lib.c
@@ -159,6 +159,7 @@ static void __scsi_queue_insert(struct s
         * lock such that the kblockd_schedule_work() call happens
         * before blk_cleanup_queue() finishes.
         */
+       cmd->result = 0;
        spin_lock_irqsave(q->queue_lock, flags);
        blk_requeue_request(q, cmd->request);
        kblockd_schedule_work(q, &device->requeue_work);


Maybe only the second one is necessary, but it seemed best to be
consistent.
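
For anyone following along, here is a toy userspace model of the
failure.  It is only a sketch, not kernel code: struct toy_cmnd and the
toy_* functions are made up for illustration, and the only things taken
from the kernel are the CHECK_CONDITION and status_byte() definitions
from include/scsi/scsi.h.  It assumes a hung command never writes to
its result field, which is what I observed.

```c
#include <stdbool.h>

/* Same values as the kernel's include/scsi/scsi.h. */
#define CHECK_CONDITION 0x01
#define status_byte(result) (((result) >> 1) & 0x7f)

/* Hypothetical stand-in for struct scsi_cmnd; only the field we need. */
struct toy_cmnd {
	int result;
};

/*
 * Hypothetical model of one attempt at a command.  A command that
 * completes stores its status in result; a command that hangs never
 * touches result at all.
 */
static void toy_issue(struct toy_cmnd *cmd, bool completes, int status,
		      bool clear_first)
{
	if (clear_first)
		cmd->result = 0;	/* models the added "result = 0" */
	if (completes)
		cmd->result = status;
	/* otherwise: hung, result keeps whatever it held before */
}

/* The test from James's patch: only fetch sense after CHECK CONDITION. */
static bool toy_wants_sense(const struct toy_cmnd *cmd)
{
	return status_byte(cmd->result) == CHECK_CONDITION;
}

/*
 * First attempt ends in CHECK CONDITION, the retry hangs.  Returns
 * whether error handling would then go on to request sense.
 */
static bool toy_sense_after_hung_retry(bool clear_first)
{
	struct toy_cmnd cmd = { .result = 0 };

	/* 0x02 is the raw SAM status byte for CHECK CONDITION. */
	toy_issue(&cmd, true, 0x02, clear_first);
	toy_issue(&cmd, false, 0, clear_first);
	return toy_wants_sense(&cmd);
}
```

Without the clearing step, toy_sense_after_hung_retry(false) still sees
the CHECK CONDITION left over from the first attempt, so the new test
lets the spurious sense request through.  With clearing enabled, the
hung retry leaves result at 0 and the sense request is skipped.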

Alan Stern

