HI Keith, Thanks for your comments, I will submit a new patch of version 2 for the further reviewing, v2 patch will contains: 1, retain existing judgement and dev_warn; 2, add the check whether req is null(already did in this patch) 3, simplify and make the changelog succinct according to you said " This is what I'm thinking:". Is it right? Should I retain the nvme_irq crash log in changelog, mention the difference between nvmeq->q_depth and tagset queue_depth?
Thanks -----Original Message----- From: Keith Busch [mailto:kbu...@kernel.org] Sent: Monday, September 21, 2020 11:08 PM To: tianxianting (RD) <tian.xiant...@h3c.com> Cc: ax...@fb.com; h...@lst.de; s...@grimberg.me; linux-n...@lists.infradead.org; linux-kernel@vger.kernel.org Subject: Re: [PATCH] nvme: replace meaningless judgement by checking whether req is null On Mon, Sep 21, 2020 at 10:10:52AM +0800, Xianting Tian wrote: > @@ -940,13 +940,6 @@ static inline void nvme_handle_cqe(struct nvme_queue > *nvmeq, u16 idx) > struct nvme_completion *cqe = &nvmeq->cqes[idx]; > struct request *req; > > - if (unlikely(cqe->command_id >= nvmeq->q_depth)) { > - dev_warn(nvmeq->dev->ctrl.device, > - "invalid id %d completed on queue %d\n", > - cqe->command_id, le16_to_cpu(cqe->sq_id)); > - return; > - } > - > /* > * AEN requests are special as they don't time out and can > * survive any kind of queue freeze and often don't respond to @@ > -960,6 +953,13 @@ static inline void nvme_handle_cqe(struct nvme_queue > *nvmeq, u16 idx) > } > > req = blk_mq_tag_to_rq(nvme_queue_tagset(nvmeq), cqe->command_id); > + if (unlikely(!req)) { > + dev_warn(nvmeq->dev->ctrl.device, > + "req is null for tag %d completed on queue %d\n", > + cqe->command_id, le16_to_cpu(cqe->sq_id)); > + return; > + } This is making sense now, though I think we should retain the existing dev_warn() since it's still accurate and provides continuity for people who are used to looking for these sorts of messages. Your changelog is a bit much though. I think we can say it a bit more succinctly. This is what I'm thinking: The driver registers interrupts for queues before initializing the tagset because it uses the number of successful request_irq() calls to configure the tagset parameters. This allows a race condition with the current tag validity check if the controller happens to produce an interrupt with a corrupted CQE before the tagset is initialized. Replace the driver's indirect tag check with the one already provided by the block layer.