Re: [PATCH v2] nvme-tcp: Check if request has started before processing it
request tag can't be zero? I forget... Of course it can. But the reserved tags are before the normal tags, so 0 would be a reserved tag for nvme. Right.
Re: [PATCH v2] nvme-tcp: Check if request has started before processing it
On Wed, Mar 31, 2021 at 03:24:49PM -0700, Sagi Grimberg wrote: > >>> What we can do, though, is checking the 'state' field in the tcp >>> request, and only allow completions for commands which are in a state >>> allowing for completions. >>> >>> Let's see if I can whip up a patch. >> >> That would be great. BTW in the crash dump I am looking at now, it >> looks like pdu->command_id was zero in nvme_tcp_recv_data(), and >> blk_mq_tag_to_rq() returned a request struct that had not been used. >> So I think we do need to check that the tag was actually allocated. > > request tag can't be zero? I forget... Of course it can. But the reserved tags are before the normal tags, so 0 would be a reserved tag for nvme.
Re: [PATCH v2] nvme-tcp: Check if request has started before processing it
It is, but in this situation, the controller is sending a second completion that results in a use-after-free, which makes the transport irrelevant. Unless there is some other flow (which is unclear to me) that causes this which is a bug that needs to be fixed rather than hidden with a safeguard. The kernel should not crash regardless of any network traffic that is sent to the system. It should not be possible to either intentionally of mistakenly contruct packets that will deny service in this way. This is not specific to nvme-tcp. I can build an rdma or pci controller that can trigger the same crash... I saw a similar patch from Hannes implemented in the scsi level, and not the individual scsi transports.. If scsi wants this too, this could be made generic at the blk-mq level. We just need to make something like blk_mq_tag_to_rq(), but return NULL if the request isn't started. Makes sense... I would also mention, that a crash is not even the scariest issue that we can see here, because if the request happened to be reused we are in the silent data corruption realm... If this does happen, I think we have to come up with some way to mitigate it. We're not utilizing the full 16 bits of the command_id, so maybe we can append something like a generation sequence number that can be checked for validity. That's actually a great idea. scsi needs unique tags so it encodes the hwq in the upper 16 bits giving the actual tag the lower 16 bits which is more than enough for a single queue. We can do the same with a gencnt that will increment in both submission and completion and we can validate against it. This will be useful for all transports, so maintaining it in nvme_req(rq)->genctr and introducing a helper like: rq = nvme_find_tag(tagset, cqe->command_id) That will filter genctr, locate the request. Also: nvme_validate_request_gen(rq, cqe->command_id) that would compare against it. And then a helper to set the command_id like: cmd->common.command_id = nvme_request_command_id(rq) that will both increment the genctr and build a command_id from it. Thoughts?
Re: [PATCH v2] nvme-tcp: Check if request has started before processing it
What we can do, though, is checking the 'state' field in the tcp request, and only allow completions for commands which are in a state allowing for completions. Let's see if I can whip up a patch. That would be great. BTW in the crash dump I am looking at now, it looks like pdu->command_id was zero in nvme_tcp_recv_data(), and blk_mq_tag_to_rq() returned a request struct that had not been used. So I think we do need to check that the tag was actually allocated. request tag can't be zero? I forget...
Re: [PATCH v2] nvme-tcp: Check if request has started before processing it
On Wed, 2021-03-31 at 09:11 +0200, Hannes Reinecke wrote: > On 3/31/21 1:28 AM, Keith Busch wrote: > > On Tue, Mar 30, 2021 at 10:34:25AM -0700, Sagi Grimberg wrote: > > > > > > > > It is, but in this situation, the controller is sending a > > > > > second > > > > > completion that results in a use-after-free, which makes the > > > > > transport irrelevant. Unless there is some other flow (which > > > > > is > > > > > unclear > > > > > to me) that causes this which is a bug that needs to be fixed > > > > > rather > > > > > than hidden with a safeguard. > > > > > > > > > > > > > The kernel should not crash regardless of any network traffic > > > > that is > > > > sent to the system. It should not be possible to either > > > > intentionally > > > > of mistakenly contruct packets that will deny service in this > > > > way. > > > > > > This is not specific to nvme-tcp. I can build an rdma or pci > > > controller > > > that can trigger the same crash... I saw a similar patch from > > > Hannes > > > implemented in the scsi level, and not the individual scsi > > > transports.. > > > > If scsi wants this too, this could be made generic at the blk-mq > > level. > > We just need to make something like blk_mq_tag_to_rq(), but return > > NULL > > if the request isn't started. > > > > > I would also mention, that a crash is not even the scariest issue > > > that > > > we can see here, because if the request happened to be reused we > > > are > > > in the silent data corruption realm... > > > > If this does happen, I think we have to come up with some way to > > mitigate it. We're not utilizing the full 16 bits of the > > command_id, so > > maybe we can append something like a generation sequence number > > that can > > be checked for validity. > > > > ... which will be near impossible. > We can protect against crashing on invalid frames. > We can _not_ protect against maliciously crafted packets referencing > any > random _existing_ tag; that's what TLS is for. > > What we can do, though, is checking the 'state' field in the tcp > request, and only allow completions for commands which are in a state > allowing for completions. > > Let's see if I can whip up a patch. That would be great. BTW in the crash dump I am looking at now, it looks like pdu->command_id was zero in nvme_tcp_recv_data(), and blk_mq_tag_to_rq() returned a request struct that had not been used. So I think we do need to check that the tag was actually allocated. -Ewan > > Cheers, > > Hannes
Re: [PATCH v2] nvme-tcp: Check if request has started before processing it
On 3/31/21 1:28 AM, Keith Busch wrote: > On Tue, Mar 30, 2021 at 10:34:25AM -0700, Sagi Grimberg wrote: >> It is, but in this situation, the controller is sending a second completion that results in a use-after-free, which makes the transport irrelevant. Unless there is some other flow (which is unclear to me) that causes this which is a bug that needs to be fixed rather than hidden with a safeguard. >>> >>> The kernel should not crash regardless of any network traffic that is >>> sent to the system. It should not be possible to either intentionally >>> of mistakenly contruct packets that will deny service in this way. >> >> This is not specific to nvme-tcp. I can build an rdma or pci controller >> that can trigger the same crash... I saw a similar patch from Hannes >> implemented in the scsi level, and not the individual scsi transports.. > > If scsi wants this too, this could be made generic at the blk-mq level. > We just need to make something like blk_mq_tag_to_rq(), but return NULL > if the request isn't started. > >> I would also mention, that a crash is not even the scariest issue that >> we can see here, because if the request happened to be reused we are >> in the silent data corruption realm... > > If this does happen, I think we have to come up with some way to > mitigate it. We're not utilizing the full 16 bits of the command_id, so > maybe we can append something like a generation sequence number that can > be checked for validity. > ... which will be near impossible. We can protect against crashing on invalid frames. We can _not_ protect against maliciously crafted packets referencing any random _existing_ tag; that's what TLS is for. What we can do, though, is checking the 'state' field in the tcp request, and only allow completions for commands which are in a state allowing for completions. Let's see if I can whip up a patch. Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), GF: Felix Imendörffer
Re: [PATCH v2] nvme-tcp: Check if request has started before processing it
On Tue, Mar 30, 2021 at 10:34:25AM -0700, Sagi Grimberg wrote: > > > > It is, but in this situation, the controller is sending a second > > > completion that results in a use-after-free, which makes the > > > transport irrelevant. Unless there is some other flow (which is > > > unclear > > > to me) that causes this which is a bug that needs to be fixed rather > > > than hidden with a safeguard. > > > > > > > The kernel should not crash regardless of any network traffic that is > > sent to the system. It should not be possible to either intentionally > > of mistakenly contruct packets that will deny service in this way. > > This is not specific to nvme-tcp. I can build an rdma or pci controller > that can trigger the same crash... I saw a similar patch from Hannes > implemented in the scsi level, and not the individual scsi transports.. If scsi wants this too, this could be made generic at the blk-mq level. We just need to make something like blk_mq_tag_to_rq(), but return NULL if the request isn't started. > I would also mention, that a crash is not even the scariest issue that > we can see here, because if the request happened to be reused we are > in the silent data corruption realm... If this does happen, I think we have to come up with some way to mitigate it. We're not utilizing the full 16 bits of the command_id, so maybe we can append something like a generation sequence number that can be checked for validity.
Re: [PATCH v2] nvme-tcp: Check if request has started before processing it
It is, but in this situation, the controller is sending a second completion that results in a use-after-free, which makes the transport irrelevant. Unless there is some other flow (which is unclear to me) that causes this which is a bug that needs to be fixed rather than hidden with a safeguard. The kernel should not crash regardless of any network traffic that is sent to the system. It should not be possible to either intentionally of mistakenly contruct packets that will deny service in this way. This is not specific to nvme-tcp. I can build an rdma or pci controller that can trigger the same crash... I saw a similar patch from Hannes implemented in the scsi level, and not the individual scsi transports.. I would also mention, that a crash is not even the scariest issue that we can see here, because if the request happened to be reused we are in the silent data corruption realm...
Re: [PATCH v2] nvme-tcp: Check if request has started before processing it
On Mon, 2021-03-15 at 10:16 -0700, Sagi Grimberg wrote: > > Hi Sagi, > > > > On Fri, Mar 05, 2021 at 11:57:30AM -0800, Sagi Grimberg wrote: > > > Daniel, again, there is nothing specific about this to nvme-tcp, > > > this is a safeguard against a funky controller (or a different > > > bug that is hidden by this). > > > > As far I can tell, the main difference between nvme-tcp and > > FC/NVMe, > > nvme-tcp has not a FW or a big driver which filter out some noise > > from a > > misbehaving controller. I haven't really checked the other > > transports > > but I wouldn't surprised they share the same properties as FC/NVMe. > > > > > The same can happen in any other transport so I would suggest > > > that if > > > this is a safeguard we want to put in place, we should make it a > > > generic one. > > > > > > i.e. nvme_tag_to_rq() that _all_ transports call consistently. > > > > Okay, I'll review all the relevant code and see what could made > > more > > generic and consistent. > > > > Though I think nvme-tcp plays in a different league as it is > > exposed to > > normal networking traffic and this is a very hostile environment. > > It is, but in this situation, the controller is sending a second > completion that results in a use-after-free, which makes the > transport irrelevant. Unless there is some other flow (which is > unclear > to me) that causes this which is a bug that needs to be fixed rather > than hidden with a safeguard. > The kernel should not crash regardless of any network traffic that is sent to the system. It should not be possible to either intentionally of mistakenly contruct packets that will deny service in this way. -Ewan
Re: [PATCH v2] nvme-tcp: Check if request has started before processing it
Hi Sagi, On Fri, Mar 05, 2021 at 11:57:30AM -0800, Sagi Grimberg wrote: Daniel, again, there is nothing specific about this to nvme-tcp, this is a safeguard against a funky controller (or a different bug that is hidden by this). As far I can tell, the main difference between nvme-tcp and FC/NVMe, nvme-tcp has not a FW or a big driver which filter out some noise from a misbehaving controller. I haven't really checked the other transports but I wouldn't surprised they share the same properties as FC/NVMe. The same can happen in any other transport so I would suggest that if this is a safeguard we want to put in place, we should make it a generic one. i.e. nvme_tag_to_rq() that _all_ transports call consistently. Okay, I'll review all the relevant code and see what could made more generic and consistent. Though I think nvme-tcp plays in a different league as it is exposed to normal networking traffic and this is a very hostile environment. It is, but in this situation, the controller is sending a second completion that results in a use-after-free, which makes the transport irrelevant. Unless there is some other flow (which is unclear to me) that causes this which is a bug that needs to be fixed rather than hidden with a safeguard.
Re: [PATCH v2] nvme-tcp: Check if request has started before processing it
Hi Sagi, On Fri, Mar 05, 2021 at 11:57:30AM -0800, Sagi Grimberg wrote: > Daniel, again, there is nothing specific about this to nvme-tcp, > this is a safeguard against a funky controller (or a different > bug that is hidden by this). As far I can tell, the main difference between nvme-tcp and FC/NVMe, nvme-tcp has not a FW or a big driver which filter out some noise from a misbehaving controller. I haven't really checked the other transports but I wouldn't surprised they share the same properties as FC/NVMe. > The same can happen in any other transport so I would suggest that if > this is a safeguard we want to put in place, we should make it a > generic one. > > i.e. nvme_tag_to_rq() that _all_ transports call consistently. Okay, I'll review all the relevant code and see what could made more generic and consistent. Though I think nvme-tcp plays in a different league as it is exposed to normal networking traffic and this is a very hostile environment. Thanks, Daniel
Re: [PATCH v2] nvme-tcp: Check if request has started before processing it
blk_mq_tag_to_rq() always returns a request if the tag id is in a valid range [0...max_tags). If the target replies with a tag for which we don't have a request but it's not started, the host will likely corrupt data or simply crash. Add an additional check if the a request has been started if not reset the connection. This addition check will not protected against an invalid tag which maps to a request which has been started. There is nothing we can do about this. Though it will at a least protect from crashing the host, which generally thought to be the right thing to do. Daniel, again, there is nothing specific about this to nvme-tcp, this is a safeguard against a funky controller (or a different bug that is hidden by this). The same can happen in any other transport so I would suggest that if this is a safeguard we want to put in place, we should make it a generic one. i.e. nvme_tag_to_rq() that _all_ transports call consistently.