Re: [PATCH v4 1/3] cxlflash: Base error recovery support

Benjamin Herrenschmidt Mon, 10 Aug 2015 20:44:07 -0700

On Mon, 2015-08-10 at 12:09 -0500, Matthew R. Ochs wrote:
> Introduce support for enhanced I/O error handling.
> 
> Signed-off-by: Matthew R. Ochs <mro...@linux.vnet.ibm.com>
> Signed-off-by: Manoj N. Kumar <ma...@linux.vnet.ibm.com>
> ---


So I'm not necessarily very qualified to review SCSI bits as I haven't
done anything close to the Linux SCSI code for almost a decade but I
have a couple of questions and nits...

>       wait_queue_head_t tmf_waitq;
>       bool tmf_active;
> -     u8 err_recovery_active:1;
> +     wait_queue_head_t limbo_waitq;
> +     enum cxlflash_state state;
>  };
>  
>  struct afu_cmd {
> diff --git a/drivers/scsi/cxlflash/main.c b/drivers/scsi/cxlflash/main.c
> index 76a7286..18359d4 100644
> --- a/drivers/scsi/cxlflash/main.c
> +++ b/drivers/scsi/cxlflash/main.c
> @@ -380,6 +380,18 @@ static int cxlflash_queuecommand(struct Scsi_Host *host, 
> struct scsi_cmnd *scp)
>       }
>       spin_unlock_irqrestore(&cfg->tmf_waitq.lock, lock_flags);
>  
> +     switch (cfg->state) {
> +     case STATE_LIMBO:
> +             pr_debug_ratelimited("%s: device in limbo!\n", __func__);
> +             rc = SCSI_MLQUEUE_HOST_BUSY;
> +             goto out;
> +     case STATE_FAILTERM:
> +             pr_debug_ratelimited("%s: device has failed!\n", __func__);
> +             goto error;
> +     default:
> +             break;
> +     }

I noticed that you mostly read and write that new state outside of your
tmf_waitq.lock. Is there any other lock or mutex protecting you ? In
this specific case ?

I know in the old day queuecommand was called with a host lock, is that
still the case ?

Or you just don't care about an occasional spurrious
SCSI_MLQUEUE_HOST_BUSY ?

>       cmd = cxlflash_cmd_checkout(afu);
>       if (unlikely(!cmd)) {
>               pr_err("%s: could not get a free command\n", __func__);
> @@ -428,6 +440,10 @@ static int cxlflash_queuecommand(struct Scsi_Host *host, 
> struct scsi_cmnd *scp)
>  
>  out:
>       return rc;
> +error:
> +     scp->result = (DID_NO_CONNECT << 16);
> +     scp->scsi_done(scp);
> +     return 0;
>  }
>  
>  /**
> @@ -455,9 +471,21 @@ static int cxlflash_eh_device_reset_handler(struct 
> scsi_cmnd *scp)
>                get_unaligned_be32(&((u32 *)scp->cmnd)[2]),
>                get_unaligned_be32(&((u32 *)scp->cmnd)[3]));
>  
> -     rcr = send_tmf(afu, scp, TMF_LUN_RESET);
> -     if (unlikely(rcr))
> +     switch (cfg->state) {
> +     case STATE_NORMAL:
> +             rcr = send_tmf(afu, scp, TMF_LUN_RESET);
> +             if (unlikely(rcr))
> +                     rc = FAILED;
> +             break;
> +     case STATE_LIMBO:
> +             wait_event(cfg->limbo_waitq, cfg->state != STATE_LIMBO);
> +             if (cfg->state == STATE_NORMAL)
> +                     break;
> +             /* fall through */
> +     default:
>               rc = FAILED;
> +             break;
> +     }

Same here, since you are doing wait_event, I assume no lock is held
(unless it's a mutex) and in subsequent places I am not listing.

As I said, it could well be that it all works fine but it would be worth
mentioning in this case because it's not obvious from simply reading the
code.

Cheers,
Ben.



--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v4 1/3] cxlflash: Base error recovery support

Reply via email to