On Tue, Mar 30, 2021 at 10:34:25AM -0700, Sagi Grimberg wrote:
>
> > > It is, but in this situation, the controller is sending a second
> > > completion that results in a use-after-free, which makes the
> > > transport irrelevant. Unless there is some other flow (which is
> > > unclear to me) that causes this which is a bug that needs to be
> > > fixed rather than hidden with a safeguard.
> > >
> >
> > The kernel should not crash regardless of any network traffic that is
> > sent to the system. It should not be possible to either intentionally
> > or mistakenly construct packets that will deny service in this way.
>
> This is not specific to nvme-tcp. I can build an rdma or pci controller
> that can trigger the same crash... I saw a similar patch from Hannes
> implemented in the scsi level, and not the individual scsi transports..

If scsi wants this too, this could be made generic at the blk-mq level.
We just need to make something like blk_mq_tag_to_rq(), but return NULL
if the request isn't started.

> I would also mention, that a crash is not even the scariest issue that
> we can see here, because if the request happened to be reused we are
> in the silent data corruption realm...

If this does happen, I think we have to come up with some way to
mitigate it. We're not utilizing the full 16 bits of the command_id, so
maybe we can append something like a generation sequence number that can
be checked for validity.
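
To make the two ideas above concrete, here is a rough userspace sketch,
not kernel code. It models a lookup that returns NULL when the request
isn't started, plus a generation counter packed into the unused upper
bits of the command_id. Everything named demo_* is hypothetical, and the
12-bit tag / 4-bit generation split is only an assumption for
illustration; the real split would depend on the queue depth.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout of the 16-bit command_id (illustrative only):
 * low 12 bits = tag, high 4 bits = per-tag generation counter. */
#define DEMO_TAG_BITS 12
#define DEMO_TAG_MASK ((1u << DEMO_TAG_BITS) - 1)
#define DEMO_GEN_MASK 0xFu
#define DEMO_NR_TAGS  (1u << DEMO_TAG_BITS)

/* Hypothetical stand-in for a request; not struct request. */
struct demo_request {
	uint8_t gen;     /* generation of the current use of this tag */
	bool    started; /* true between submission and completion */
};

static struct demo_request demo_rqs[DEMO_NR_TAGS];

/* Start a request on 'tag' and return the command_id to put on the wire. */
uint16_t demo_start(uint16_t tag)
{
	struct demo_request *rq = &demo_rqs[tag & DEMO_TAG_MASK];

	rq->gen = (rq->gen + 1) & DEMO_GEN_MASK;
	rq->started = true;
	return (uint16_t)((rq->gen << DEMO_TAG_BITS) | (tag & DEMO_TAG_MASK));
}

/* Map a command_id from a completion back to its request.
 * Returns NULL if the request isn't started or the generation is stale. */
struct demo_request *demo_find_started(uint16_t command_id)
{
	struct demo_request *rq = &demo_rqs[command_id & DEMO_TAG_MASK];
	uint8_t gen = (command_id >> DEMO_TAG_BITS) & DEMO_GEN_MASK;

	if (!rq->started || rq->gen != gen)
		return NULL;
	return rq;
}

/* Complete a request; returns false for a spurious or duplicate completion. */
bool demo_complete(uint16_t command_id)
{
	struct demo_request *rq = demo_find_started(command_id);

	if (!rq)
		return false;
	rq->started = false;
	return true;
}
```

With this, a second completion for the same command_id is rejected
because the request is no longer started, and a late completion that
arrives after the tag was reused is rejected because the generation no
longer matches. A 4-bit counter wraps after 16 reuses of a tag, so this
would narrow the window rather than close it completely.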