Re: [PATCH] nvme: replace meaningless judgement by checking whether req is null

2020-09-20 Thread Christoph Hellwig
Thanks, this looks much more sensible than the previous versions.


Re: [PATCH] nvme: replace meaningless judgement by checking whether req is null

2020-09-21 Thread Keith Busch
On Mon, Sep 21, 2020 at 10:10:52AM +0800, Xianting Tian wrote:
> @@ -940,13 +940,6 @@ static inline void nvme_handle_cqe(struct nvme_queue 
> *nvmeq, u16 idx)
>   struct nvme_completion *cqe = &nvmeq->cqes[idx];
>   struct request *req;
>  
> - if (unlikely(cqe->command_id >= nvmeq->q_depth)) {
> - dev_warn(nvmeq->dev->ctrl.device,
> - "invalid id %d completed on queue %d\n",
> - cqe->command_id, le16_to_cpu(cqe->sq_id));
> - return;
> - }
> -
>   /*
>* AEN requests are special as they don't time out and can
>* survive any kind of queue freeze and often don't respond to
> @@ -960,6 +953,13 @@ static inline void nvme_handle_cqe(struct nvme_queue 
> *nvmeq, u16 idx)
>   }
>  
>   req = blk_mq_tag_to_rq(nvme_queue_tagset(nvmeq), cqe->command_id);
> + if (unlikely(!req)) {
> + dev_warn(nvmeq->dev->ctrl.device,
> + "req is null for tag %d completed on queue %d\n",
> + cqe->command_id, le16_to_cpu(cqe->sq_id));
> + return;
> + }

This is making sense now, though I think we should retain the existing
dev_warn() since it's still accurate and provides continuity for people
who are used to looking for these sorts of messages.

Your changelog is a bit much though. I think we can say it a bit more
succinctly. This is what I'm thinking:

  The driver registers interrupts for queues before initializing the
  tagset because it uses the number of successful request_irq() calls
  to configure the tagset parameters. This allows a race condition with
  the current tag validity check if the controller happens to produce
  an interrupt with a corrupted CQE before the tagset is initialized.

  Replace the driver's indirect tag check with the one already provided
  by the block layer.


RE: [PATCH] nvme: replace meaningless judgement by checking whether req is null

2020-09-21 Thread Tianxianting
HI Keith,
Thanks for your comments,
I will submit a new patch of version 2 for the further reviewing,  v2 patch 
will contains:
1, retain existing judgement and dev_warn;
2, add the check whether req is null(already did in this patch)
3, simplify and make the changelog succinct according to you said " This is 
what I'm thinking:".
Is it right?
Should I retain the nvme_irq crash log in changelog, mention the difference 
between nvmeq->q_depth and tagset queue_depth? 

Thanks

-Original Message-
From: Keith Busch [mailto:kbu...@kernel.org] 
Sent: Monday, September 21, 2020 11:08 PM
To: tianxianting (RD) 
Cc: ax...@fb.com; h...@lst.de; s...@grimberg.me; 
linux-n...@lists.infradead.org; linux-kernel@vger.kernel.org
Subject: Re: [PATCH] nvme: replace meaningless judgement by checking whether 
req is null

On Mon, Sep 21, 2020 at 10:10:52AM +0800, Xianting Tian wrote:
> @@ -940,13 +940,6 @@ static inline void nvme_handle_cqe(struct nvme_queue 
> *nvmeq, u16 idx)
>   struct nvme_completion *cqe = &nvmeq->cqes[idx];
>   struct request *req;
>  
> - if (unlikely(cqe->command_id >= nvmeq->q_depth)) {
> - dev_warn(nvmeq->dev->ctrl.device,
> - "invalid id %d completed on queue %d\n",
> - cqe->command_id, le16_to_cpu(cqe->sq_id));
> - return;
> - }
> -
>   /*
>* AEN requests are special as they don't time out and can
>* survive any kind of queue freeze and often don't respond to @@ 
> -960,6 +953,13 @@ static inline void nvme_handle_cqe(struct nvme_queue 
> *nvmeq, u16 idx)
>   }
>  
>   req = blk_mq_tag_to_rq(nvme_queue_tagset(nvmeq), cqe->command_id);
> + if (unlikely(!req)) {
> + dev_warn(nvmeq->dev->ctrl.device,
> + "req is null for tag %d completed on queue %d\n",
> + cqe->command_id, le16_to_cpu(cqe->sq_id));
> + return;
> + }

This is making sense now, though I think we should retain the existing
dev_warn() since it's still accurate and provides continuity for people who are 
used to looking for these sorts of messages.

Your changelog is a bit much though. I think we can say it a bit more 
succinctly. This is what I'm thinking:

  The driver registers interrupts for queues before initializing the
  tagset because it uses the number of successful request_irq() calls
  to configure the tagset parameters. This allows a race condition with
  the current tag validity check if the controller happens to produce
  an interrupt with a corrupted CQE before the tagset is initialized.

  Replace the driver's indirect tag check with the one already provided
  by the block layer.


Re: [PATCH] nvme: replace meaningless judgement by checking whether req is null

2020-09-21 Thread Keith Busch
On Mon, Sep 21, 2020 at 03:49:09PM +, Tianxianting wrote:
> HI Keith,
> Thanks for your comments,
> I will submit a new patch of version 2 for the further reviewing,  v2 patch 
> will contains:
> 1, retain existing judgement and dev_warn;

No no, go ahead and remove the existing check just as you've done. That
check becomes redundant with the safer one you're adding, and we don't
want redundant checks in the fast path. My only suggestion is to use
the same dev_warn() in your new check.

> 2, add the check whether req is null(already did in this patch)
> 3, simplify and make the changelog succinct according to you said " This is 
> what I'm thinking:".
> Is it right?
> Should I retain the nvme_irq crash log in changelog, mention the difference 
> between nvmeq->q_depth and tagset queue_depth? 

The tagset's queue_depth is a valid point to mention as well. The
dirver's current indirect check is not necessarily in sync with the
actual tagset.
 
> Thanks
> 
> -Original Message-
> From: Keith Busch [mailto:kbu...@kernel.org] 
> Sent: Monday, September 21, 2020 11:08 PM
> To: tianxianting (RD) 
> Cc: ax...@fb.com; h...@lst.de; s...@grimberg.me; 
> linux-n...@lists.infradead.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH] nvme: replace meaningless judgement by checking whether 
> req is null
> 
> On Mon, Sep 21, 2020 at 10:10:52AM +0800, Xianting Tian wrote:
> > @@ -940,13 +940,6 @@ static inline void nvme_handle_cqe(struct nvme_queue 
> > *nvmeq, u16 idx)
> > struct nvme_completion *cqe = &nvmeq->cqes[idx];
> > struct request *req;
> >  
> > -   if (unlikely(cqe->command_id >= nvmeq->q_depth)) {
> > -   dev_warn(nvmeq->dev->ctrl.device,
> > -   "invalid id %d completed on queue %d\n",
> > -   cqe->command_id, le16_to_cpu(cqe->sq_id));
> > -   return;
> > -   }
> > -
> > /*
> >  * AEN requests are special as they don't time out and can
> >  * survive any kind of queue freeze and often don't respond to @@ 
> > -960,6 +953,13 @@ static inline void nvme_handle_cqe(struct nvme_queue 
> > *nvmeq, u16 idx)
> > }
> >  
> > req = blk_mq_tag_to_rq(nvme_queue_tagset(nvmeq), cqe->command_id);
> > +   if (unlikely(!req)) {
> > +   dev_warn(nvmeq->dev->ctrl.device,
> > +   "req is null for tag %d completed on queue %d\n",
> > +   cqe->command_id, le16_to_cpu(cqe->sq_id));
> > +   return;
> > +   }
> 
> This is making sense now, though I think we should retain the existing
> dev_warn() since it's still accurate and provides continuity for people who 
> are used to looking for these sorts of messages.
> 
> Your changelog is a bit much though. I think we can say it a bit more 
> succinctly. This is what I'm thinking:
> 
>   The driver registers interrupts for queues before initializing the
>   tagset because it uses the number of successful request_irq() calls
>   to configure the tagset parameters. This allows a race condition with
>   the current tag validity check if the controller happens to produce
>   an interrupt with a corrupted CQE before the tagset is initialized.
> 
>   Replace the driver's indirect tag check with the one already provided
>   by the block layer.


RE: [PATCH] nvme: replace meaningless judgement by checking whether req is null

2020-09-21 Thread Tianxianting
I get it now, thanks Keith:)

-Original Message-
From: Keith Busch [mailto:kbu...@kernel.org] 
Sent: Monday, September 21, 2020 11:59 PM
To: tianxianting (RD) 
Cc: ax...@fb.com; h...@lst.de; s...@grimberg.me; 
linux-n...@lists.infradead.org; linux-kernel@vger.kernel.org
Subject: Re: [PATCH] nvme: replace meaningless judgement by checking whether 
req is null

On Mon, Sep 21, 2020 at 03:49:09PM +, Tianxianting wrote:
> HI Keith,
> Thanks for your comments,
> I will submit a new patch of version 2 for the further reviewing,  v2 patch 
> will contains:
> 1, retain existing judgement and dev_warn;

No no, go ahead and remove the existing check just as you've done. That check 
becomes redundant with the safer one you're adding, and we don't want redundant 
checks in the fast path. My only suggestion is to use the same dev_warn() in 
your new check.

> 2, add the check whether req is null(already did in this patch) 3, 
> simplify and make the changelog succinct according to you said " This is what 
> I'm thinking:".
> Is it right?
> Should I retain the nvme_irq crash log in changelog, mention the difference 
> between nvmeq->q_depth and tagset queue_depth? 

The tagset's queue_depth is a valid point to mention as well. The dirver's 
current indirect check is not necessarily in sync with the actual tagset.
 
> Thanks
> 
> -Original Message-
> From: Keith Busch [mailto:kbu...@kernel.org]
> Sent: Monday, September 21, 2020 11:08 PM
> To: tianxianting (RD) 
> Cc: ax...@fb.com; h...@lst.de; s...@grimberg.me; 
> linux-n...@lists.infradead.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH] nvme: replace meaningless judgement by checking 
> whether req is null
> 
> On Mon, Sep 21, 2020 at 10:10:52AM +0800, Xianting Tian wrote:
> > @@ -940,13 +940,6 @@ static inline void nvme_handle_cqe(struct nvme_queue 
> > *nvmeq, u16 idx)
> > struct nvme_completion *cqe = &nvmeq->cqes[idx];
> > struct request *req;
> >  
> > -   if (unlikely(cqe->command_id >= nvmeq->q_depth)) {
> > -   dev_warn(nvmeq->dev->ctrl.device,
> > -   "invalid id %d completed on queue %d\n",
> > -   cqe->command_id, le16_to_cpu(cqe->sq_id));
> > -   return;
> > -   }
> > -
> > /*
> >  * AEN requests are special as they don't time out and can
> >  * survive any kind of queue freeze and often don't respond to @@
> > -960,6 +953,13 @@ static inline void nvme_handle_cqe(struct nvme_queue 
> > *nvmeq, u16 idx)
> > }
> >  
> > req = blk_mq_tag_to_rq(nvme_queue_tagset(nvmeq), cqe->command_id);
> > +   if (unlikely(!req)) {
> > +   dev_warn(nvmeq->dev->ctrl.device,
> > +   "req is null for tag %d completed on queue %d\n",
> > +   cqe->command_id, le16_to_cpu(cqe->sq_id));
> > +   return;
> > +   }
> 
> This is making sense now, though I think we should retain the existing
> dev_warn() since it's still accurate and provides continuity for people who 
> are used to looking for these sorts of messages.
> 
> Your changelog is a bit much though. I think we can say it a bit more 
> succinctly. This is what I'm thinking:
> 
>   The driver registers interrupts for queues before initializing the
>   tagset because it uses the number of successful request_irq() calls
>   to configure the tagset parameters. This allows a race condition with
>   the current tag validity check if the controller happens to produce
>   an interrupt with a corrupted CQE before the tagset is initialized.
> 
>   Replace the driver's indirect tag check with the one already provided
>   by the block layer.