Re: [PATCH v2] nvme-tcp: Check if request has started before processing it

2021-04-01 Thread Sagi Grimberg




request tag can't be zero? I forget...


Of course it can.  But the reserved tags are before the normal tags,
so 0 would be a reserved tag for nvme.


Right.


Re: [PATCH v2] nvme-tcp: Check if request has started before processing it

2021-04-01 Thread Christoph Hellwig
On Wed, Mar 31, 2021 at 03:24:49PM -0700, Sagi Grimberg wrote:
>
>>> What we can do, though, is checking the 'state' field in the tcp
>>> request, and only allow completions for commands which are in a state
>>> allowing for completions.
>>>
>>> Let's see if I can whip up a patch.
>>
>> That would be great.  BTW in the crash dump I am looking at now, it
>> looks like pdu->command_id was zero in nvme_tcp_recv_data(), and
>> blk_mq_tag_to_rq() returned a request struct that had not been used.
>> So I think we do need to check that the tag was actually allocated.
>
> request tag can't be zero? I forget...

Of course it can.  But the reserved tags are before the normal tags,
so 0 would be a reserved tag for nvme.


Re: [PATCH v2] nvme-tcp: Check if request has started before processing it

2021-03-31 Thread Sagi Grimberg




It is, but in this situation, the controller is sending a second
completion that results in a use-after-free, which makes the
transport irrelevant. Unless there is some other flow (which is
unclear
to me) that causes this which is a bug that needs to be fixed rather
than hidden with a safeguard.



The kernel should not crash regardless of any network traffic that is
sent to the system.  It should not be possible to either intentionally
of mistakenly contruct packets that will deny service in this way.


This is not specific to nvme-tcp. I can build an rdma or pci controller
that can trigger the same crash... I saw a similar patch from Hannes
implemented in the scsi level, and not the individual scsi transports..


If scsi wants this too, this could be made generic at the blk-mq level.
We just need to make something like blk_mq_tag_to_rq(), but return NULL
if the request isn't started.


Makes sense...


I would also mention, that a crash is not even the scariest issue that
we can see here, because if the request happened to be reused we are
in the silent data corruption realm...


If this does happen, I think we have to come up with some way to
mitigate it. We're not utilizing the full 16 bits of the command_id, so
maybe we can append something like a generation sequence number that can
be checked for validity.


That's actually a great idea. scsi needs unique tags so it encodes the
hwq in the upper 16 bits giving the actual tag the lower 16 bits which
is more than enough for a single queue. We can do the same with
a gencnt that will increment in both submission and completion and we
can validate against it.

This will be useful for all transports, so maintaining it in
nvme_req(rq)->genctr and introducing a helper like:
rq = nvme_find_tag(tagset, cqe->command_id)
That will filter genctr, locate the request.

Also:
nvme_validate_request_gen(rq, cqe->command_id) that would
compare against it.


And then a helper to set the command_id like:
cmd->common.command_id = nvme_request_command_id(rq)
that will both increment the genctr and build a command_id
from it.

Thoughts?


Re: [PATCH v2] nvme-tcp: Check if request has started before processing it

2021-03-31 Thread Sagi Grimberg




What we can do, though, is checking the 'state' field in the tcp
request, and only allow completions for commands which are in a state
allowing for completions.

Let's see if I can whip up a patch.


That would be great.  BTW in the crash dump I am looking at now, it
looks like pdu->command_id was zero in nvme_tcp_recv_data(), and
blk_mq_tag_to_rq() returned a request struct that had not been used.
So I think we do need to check that the tag was actually allocated.


request tag can't be zero? I forget...


Re: [PATCH v2] nvme-tcp: Check if request has started before processing it

2021-03-31 Thread Ewan D. Milne
On Wed, 2021-03-31 at 09:11 +0200, Hannes Reinecke wrote:
> On 3/31/21 1:28 AM, Keith Busch wrote:
> > On Tue, Mar 30, 2021 at 10:34:25AM -0700, Sagi Grimberg wrote:
> > > 
> > > > > It is, but in this situation, the controller is sending a
> > > > > second
> > > > > completion that results in a use-after-free, which makes the
> > > > > transport irrelevant. Unless there is some other flow (which
> > > > > is
> > > > > unclear
> > > > > to me) that causes this which is a bug that needs to be fixed
> > > > > rather
> > > > > than hidden with a safeguard.
> > > > > 
> > > > 
> > > > The kernel should not crash regardless of any network traffic
> > > > that is
> > > > sent to the system.  It should not be possible to either
> > > > intentionally
> > > > of mistakenly contruct packets that will deny service in this
> > > > way.
> > > 
> > > This is not specific to nvme-tcp. I can build an rdma or pci
> > > controller
> > > that can trigger the same crash... I saw a similar patch from
> > > Hannes
> > > implemented in the scsi level, and not the individual scsi
> > > transports..
> > 
> > If scsi wants this too, this could be made generic at the blk-mq
> > level.
> > We just need to make something like blk_mq_tag_to_rq(), but return
> > NULL
> > if the request isn't started.
> >  
> > > I would also mention, that a crash is not even the scariest issue
> > > that
> > > we can see here, because if the request happened to be reused we
> > > are
> > > in the silent data corruption realm...
> > 
> > If this does happen, I think we have to come up with some way to
> > mitigate it. We're not utilizing the full 16 bits of the
> > command_id, so
> > maybe we can append something like a generation sequence number
> > that can
> > be checked for validity.
> > 
> 
> ... which will be near impossible.
> We can protect against crashing on invalid frames.
> We can _not_ protect against maliciously crafted packets referencing
> any
> random _existing_ tag; that's what TLS is for.
> 
> What we can do, though, is checking the 'state' field in the tcp
> request, and only allow completions for commands which are in a state
> allowing for completions.
> 
> Let's see if I can whip up a patch.

That would be great.  BTW in the crash dump I am looking at now, it
looks like pdu->command_id was zero in nvme_tcp_recv_data(), and
blk_mq_tag_to_rq() returned a request struct that had not been used.
So I think we do need to check that the tag was actually allocated.

-Ewan

> 
> Cheers,
> 
> Hannes



Re: [PATCH v2] nvme-tcp: Check if request has started before processing it

2021-03-31 Thread Hannes Reinecke
On 3/31/21 1:28 AM, Keith Busch wrote:
> On Tue, Mar 30, 2021 at 10:34:25AM -0700, Sagi Grimberg wrote:
>>
 It is, but in this situation, the controller is sending a second
 completion that results in a use-after-free, which makes the
 transport irrelevant. Unless there is some other flow (which is
 unclear
 to me) that causes this which is a bug that needs to be fixed rather
 than hidden with a safeguard.

>>>
>>> The kernel should not crash regardless of any network traffic that is
>>> sent to the system.  It should not be possible to either intentionally
>>> of mistakenly contruct packets that will deny service in this way.
>>
>> This is not specific to nvme-tcp. I can build an rdma or pci controller
>> that can trigger the same crash... I saw a similar patch from Hannes
>> implemented in the scsi level, and not the individual scsi transports..
> 
> If scsi wants this too, this could be made generic at the blk-mq level.
> We just need to make something like blk_mq_tag_to_rq(), but return NULL
> if the request isn't started.
>  
>> I would also mention, that a crash is not even the scariest issue that
>> we can see here, because if the request happened to be reused we are
>> in the silent data corruption realm...
> 
> If this does happen, I think we have to come up with some way to
> mitigate it. We're not utilizing the full 16 bits of the command_id, so
> maybe we can append something like a generation sequence number that can
> be checked for validity.
> 
... which will be near impossible.
We can protect against crashing on invalid frames.
We can _not_ protect against maliciously crafted packets referencing any
random _existing_ tag; that's what TLS is for.

What we can do, though, is checking the 'state' field in the tcp
request, and only allow completions for commands which are in a state
allowing for completions.

Let's see if I can whip up a patch.

Cheers,

Hannes
-- 
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer


Re: [PATCH v2] nvme-tcp: Check if request has started before processing it

2021-03-30 Thread Keith Busch
On Tue, Mar 30, 2021 at 10:34:25AM -0700, Sagi Grimberg wrote:
> 
> > > It is, but in this situation, the controller is sending a second
> > > completion that results in a use-after-free, which makes the
> > > transport irrelevant. Unless there is some other flow (which is
> > > unclear
> > > to me) that causes this which is a bug that needs to be fixed rather
> > > than hidden with a safeguard.
> > > 
> > 
> > The kernel should not crash regardless of any network traffic that is
> > sent to the system.  It should not be possible to either intentionally
> > of mistakenly contruct packets that will deny service in this way.
> 
> This is not specific to nvme-tcp. I can build an rdma or pci controller
> that can trigger the same crash... I saw a similar patch from Hannes
> implemented in the scsi level, and not the individual scsi transports..

If scsi wants this too, this could be made generic at the blk-mq level.
We just need to make something like blk_mq_tag_to_rq(), but return NULL
if the request isn't started.
 
> I would also mention, that a crash is not even the scariest issue that
> we can see here, because if the request happened to be reused we are
> in the silent data corruption realm...

If this does happen, I think we have to come up with some way to
mitigate it. We're not utilizing the full 16 bits of the command_id, so
maybe we can append something like a generation sequence number that can
be checked for validity.


Re: [PATCH v2] nvme-tcp: Check if request has started before processing it

2021-03-30 Thread Sagi Grimberg




It is, but in this situation, the controller is sending a second
completion that results in a use-after-free, which makes the
transport irrelevant. Unless there is some other flow (which is
unclear
to me) that causes this which is a bug that needs to be fixed rather
than hidden with a safeguard.



The kernel should not crash regardless of any network traffic that is
sent to the system.  It should not be possible to either intentionally
of mistakenly contruct packets that will deny service in this way.


This is not specific to nvme-tcp. I can build an rdma or pci controller
that can trigger the same crash... I saw a similar patch from Hannes
implemented in the scsi level, and not the individual scsi transports..

I would also mention, that a crash is not even the scariest issue that
we can see here, because if the request happened to be reused we are
in the silent data corruption realm...


Re: [PATCH v2] nvme-tcp: Check if request has started before processing it

2021-03-30 Thread Ewan D. Milne
On Mon, 2021-03-15 at 10:16 -0700, Sagi Grimberg wrote:
> > Hi Sagi,
> > 
> > On Fri, Mar 05, 2021 at 11:57:30AM -0800, Sagi Grimberg wrote:
> > > Daniel, again, there is nothing specific about this to nvme-tcp,
> > > this is a safeguard against a funky controller (or a different
> > > bug that is hidden by this).
> > 
> > As far I can tell, the main difference between nvme-tcp and
> > FC/NVMe,
> > nvme-tcp has not a FW or a big driver which filter out some noise
> > from a
> > misbehaving controller. I haven't really checked the other
> > transports
> > but I wouldn't surprised they share the same properties as FC/NVMe.
> > 
> > > The same can happen in any other transport so I would suggest
> > > that if
> > > this is a safeguard we want to put in place, we should make it a
> > > generic one.
> > > 
> > > i.e. nvme_tag_to_rq() that _all_ transports call consistently.
> > 
> > Okay, I'll review all the relevant code and see what could made
> > more
> > generic and consistent.
> > 
> > Though I think nvme-tcp plays in a different league as it is
> > exposed to
> > normal networking traffic and this is a very hostile environment.
> 
> It is, but in this situation, the controller is sending a second
> completion that results in a use-after-free, which makes the
> transport irrelevant. Unless there is some other flow (which is
> unclear
> to me) that causes this which is a bug that needs to be fixed rather
> than hidden with a safeguard.
> 

The kernel should not crash regardless of any network traffic that is
sent to the system.  It should not be possible to either intentionally
of mistakenly contruct packets that will deny service in this way.

-Ewan





Re: [PATCH v2] nvme-tcp: Check if request has started before processing it

2021-03-15 Thread Sagi Grimberg




Hi Sagi,

On Fri, Mar 05, 2021 at 11:57:30AM -0800, Sagi Grimberg wrote:

Daniel, again, there is nothing specific about this to nvme-tcp,
this is a safeguard against a funky controller (or a different
bug that is hidden by this).


As far I can tell, the main difference between nvme-tcp and FC/NVMe,
nvme-tcp has not a FW or a big driver which filter out some noise from a
misbehaving controller. I haven't really checked the other transports
but I wouldn't surprised they share the same properties as FC/NVMe.


The same can happen in any other transport so I would suggest that if
this is a safeguard we want to put in place, we should make it a
generic one.

i.e. nvme_tag_to_rq() that _all_ transports call consistently.


Okay, I'll review all the relevant code and see what could made more
generic and consistent.

Though I think nvme-tcp plays in a different league as it is exposed to
normal networking traffic and this is a very hostile environment.


It is, but in this situation, the controller is sending a second
completion that results in a use-after-free, which makes the
transport irrelevant. Unless there is some other flow (which is unclear
to me) that causes this which is a bug that needs to be fixed rather
than hidden with a safeguard.


Re: [PATCH v2] nvme-tcp: Check if request has started before processing it

2021-03-11 Thread Daniel Wagner
Hi Sagi,

On Fri, Mar 05, 2021 at 11:57:30AM -0800, Sagi Grimberg wrote:
> Daniel, again, there is nothing specific about this to nvme-tcp,
> this is a safeguard against a funky controller (or a different
> bug that is hidden by this).

As far I can tell, the main difference between nvme-tcp and FC/NVMe,
nvme-tcp has not a FW or a big driver which filter out some noise from a
misbehaving controller. I haven't really checked the other transports
but I wouldn't surprised they share the same properties as FC/NVMe.

> The same can happen in any other transport so I would suggest that if
> this is a safeguard we want to put in place, we should make it a
> generic one.
> 
> i.e. nvme_tag_to_rq() that _all_ transports call consistently.

Okay, I'll review all the relevant code and see what could made more
generic and consistent.

Though I think nvme-tcp plays in a different league as it is exposed to
normal networking traffic and this is a very hostile environment.

Thanks,
Daniel


Re: [PATCH v2] nvme-tcp: Check if request has started before processing it

2021-03-05 Thread Sagi Grimberg




blk_mq_tag_to_rq() always returns a request if the tag id is in a
valid range [0...max_tags). If the target replies with a tag for which
we don't have a request but it's not started, the host will likely
corrupt data or simply crash.

Add an additional check if the a request has been started if not
reset the connection.

This addition check will not protected against an invalid tag which
maps to a request which has been started. There is nothing we can do
about this. Though it will at a least protect from crashing the host,
which generally thought to be the right thing to do.


Daniel, again, there is nothing specific about this to nvme-tcp,
this is a safeguard against a funky controller (or a different
bug that is hidden by this). The same can happen in any other
transport so I would suggest that if this is a safeguard we
want to put in place, we should make it a generic one.

i.e. nvme_tag_to_rq() that _all_ transports call consistently.


[PATCH v2] nvme-tcp: Check if request has started before processing it

2021-03-01 Thread Daniel Wagner
blk_mq_tag_to_rq() always returns a request if the tag id is in a
valid range [0...max_tags). If the target replies with a tag for which
we don't have a request but it's not started, the host will likely
corrupt data or simply crash.

Add an additional check if the a request has been started if not
reset the connection.

This addition check will not protected against an invalid tag which
maps to a request which has been started. There is nothing we can do
about this. Though it will at a least protect from crashing the host,
which generally thought to be the right thing to do.

Signed-off-by: Daniel Wagner 
---

The patch is against nmve-5.12.

I noted that nvme_tcp_process_nvme_cqe() returns EINVAL
where as the rest uses ENOENT. Looks a bit odd to me.

I've tested this with blktests.

v2:
  - moved the check into a helper to avoid code duplication
  - use nvme_reset_ctrl if request has not been started
  - added nvme_tcp_recv_ddgst() callsite

 drivers/nvme/host/tcp.c | 56 +++--
 1 file changed, 31 insertions(+), 25 deletions(-)

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 69f59d2c5799..af6f725b842b 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -479,19 +479,38 @@ static void nvme_tcp_error_recovery(struct nvme_ctrl 
*ctrl)
queue_work(nvme_reset_wq, _tcp_ctrl(ctrl)->err_work);
 }
 
-static int nvme_tcp_process_nvme_cqe(struct nvme_tcp_queue *queue,
-   struct nvme_completion *cqe)
+static bool nvme_tcp_tag_to_rq(struct nvme_tcp_queue *queue,
+   __u16 command_id, struct request **req)
 {
struct request *rq;
 
-   rq = blk_mq_tag_to_rq(nvme_tcp_tagset(queue), cqe->command_id);
+   rq = blk_mq_tag_to_rq(nvme_tcp_tagset(queue), command_id);
if (!rq) {
dev_err(queue->ctrl->ctrl.device,
"queue %d tag 0x%x not found\n",
-   nvme_tcp_queue_id(queue), cqe->command_id);
+   nvme_tcp_queue_id(queue), command_id);
nvme_tcp_error_recovery(>ctrl->ctrl);
-   return -EINVAL;
+   return false;
}
+   if (!blk_mq_request_started(rq)) {
+   dev_err(queue->ctrl->ctrl.device,
+   "queue %d received invalid tag\n",
+   nvme_tcp_queue_id(queue));
+   nvme_reset_ctrl(>ctrl->ctrl);
+   return false;
+   }
+
+   *req = rq;
+   return true;
+}
+
+static int nvme_tcp_process_nvme_cqe(struct nvme_tcp_queue *queue,
+   struct nvme_completion *cqe)
+{
+   struct request *rq;
+
+   if (!nvme_tcp_tag_to_rq(queue, cqe->command_id, ))
+   return -EINVAL;
 
if (!nvme_try_complete_req(rq, cqe->status, cqe->result))
nvme_complete_rq(rq);
@@ -505,13 +524,8 @@ static int nvme_tcp_handle_c2h_data(struct nvme_tcp_queue 
*queue,
 {
struct request *rq;
 
-   rq = blk_mq_tag_to_rq(nvme_tcp_tagset(queue), pdu->command_id);
-   if (!rq) {
-   dev_err(queue->ctrl->ctrl.device,
-   "queue %d tag %#x not found\n",
-   nvme_tcp_queue_id(queue), pdu->command_id);
+   if (!nvme_tcp_tag_to_rq(queue, pdu->command_id, ))
return -ENOENT;
-   }
 
if (!blk_rq_payload_bytes(rq)) {
dev_err(queue->ctrl->ctrl.device,
@@ -609,13 +623,8 @@ static int nvme_tcp_handle_r2t(struct nvme_tcp_queue 
*queue,
struct request *rq;
int ret;
 
-   rq = blk_mq_tag_to_rq(nvme_tcp_tagset(queue), pdu->command_id);
-   if (!rq) {
-   dev_err(queue->ctrl->ctrl.device,
-   "queue %d tag %#x not found\n",
-   nvme_tcp_queue_id(queue), pdu->command_id);
+   if (!nvme_tcp_tag_to_rq(queue, pdu->command_id, ))
return -ENOENT;
-   }
req = blk_mq_rq_to_pdu(rq);
 
ret = nvme_tcp_setup_h2c_data_pdu(req, pdu);
@@ -695,13 +704,8 @@ static int nvme_tcp_recv_data(struct nvme_tcp_queue 
*queue, struct sk_buff *skb,
struct nvme_tcp_request *req;
struct request *rq;
 
-   rq = blk_mq_tag_to_rq(nvme_tcp_tagset(queue), pdu->command_id);
-   if (!rq) {
-   dev_err(queue->ctrl->ctrl.device,
-   "queue %d tag %#x not found\n",
-   nvme_tcp_queue_id(queue), pdu->command_id);
+   if (!nvme_tcp_tag_to_rq(queue, pdu->command_id, ))
return -ENOENT;
-   }
req = blk_mq_rq_to_pdu(rq);
 
while (true) {
@@ -794,8 +798,10 @@ static int nvme_tcp_recv_ddgst(struct nvme_tcp_queue 
*queue,
}
 
if (pdu->hdr.flags & NVME_TCP_F_DATA_SUCCESS) {
-   struct request *rq = blk_mq_tag_to_rq(nvme_tcp_tagset(queue),
-   pdu->command_id);
+   struct request *rq;
+
+   if