On Mon, Sep 21, 2020 at 10:10:52AM +0800, Xianting Tian wrote:
> @@ -940,13 +940,6 @@ static inline void nvme_handle_cqe(struct nvme_queue 
> *nvmeq, u16 idx)
>       struct nvme_completion *cqe = &nvmeq->cqes[idx];
>       struct request *req;
>  
> -     if (unlikely(cqe->command_id >= nvmeq->q_depth)) {
> -             dev_warn(nvmeq->dev->ctrl.device,
> -                     "invalid id %d completed on queue %d\n",
> -                     cqe->command_id, le16_to_cpu(cqe->sq_id));
> -             return;
> -     }
> -
>       /*
>        * AEN requests are special as they don't time out and can
>        * survive any kind of queue freeze and often don't respond to
> @@ -960,6 +953,13 @@ static inline void nvme_handle_cqe(struct nvme_queue 
> *nvmeq, u16 idx)
>       }
>  
>       req = blk_mq_tag_to_rq(nvme_queue_tagset(nvmeq), cqe->command_id);
> +     if (unlikely(!req)) {
> +             dev_warn(nvmeq->dev->ctrl.device,
> +                     "req is null for tag %d completed on queue %d\n",
> +                     cqe->command_id, le16_to_cpu(cqe->sq_id));
> +             return;
> +     }

This is making sense now, though I think we should retain the existing
dev_warn() since it's still accurate and provides continuity for people
who are used to looking for these sorts of messages.

Your changelog is a bit much, though; I think we can say it more
succinctly. This is what I'm thinking:

  The driver registers interrupts for queues before initializing the
  tagset because it uses the number of successful request_irq() calls
  to configure the tagset parameters. This allows a race condition with
  the current tag validity check if the controller happens to produce
  an interrupt with a corrupted CQE before the tagset is initialized.

  Replace the driver's indirect tag check with the one already provided
  by the block layer.
