nvme resume is really crazy: it does not believe the device is stopped, so it tries to use high-level operations to stop it and then restart it, but it ends up reusing queue structures from before.
dv spent time in here, and I tried to figure it out also. The code is highly suspect since it isn't doing the MINIMUM, which is: at suspend time, ensure the soft state is good, and shut the hw down HARD. On resume, reset the hardware HARD in the minimum fashion, and then reconfigure it to match the soft state. It sounds so simple, but the code that exists is about 20x more complicated than this, and it doesn't make any sense to me why it is so complicated. dlg, jmatthew, can you show up?
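
To make the MINIMUM concrete, here is a rough sketch of the shape I mean. This is NOT the nvme(4) driver code and not the real NVMe register layout: `fake_hw`, `softc`, `sketch_suspend`, `sketch_resume` and `demo_suspend_resume` are all made-up names, and the "hardware" is just a struct in memory. It also assumes that ring indices restart at zero after a hard reset. The only point is the ordering: the soft state is the truth, the hw gets shut down hard on suspend, and on resume it gets reset hard and reprogrammed from the soft state without trusting anything left over from before.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NQUEUES 4

/* Hypothetical stand-in for controller registers; not the real NVMe layout. */
struct fake_hw {
	int      enabled;              /* controller enable bit */
	uint64_t queue_base[NQUEUES];  /* queue addresses programmed into hw */
	uint32_t queue_len[NQUEUES];
};

/* Soft state: what the driver wants the hardware to look like. */
struct softc {
	struct fake_hw *hw;
	int      nqueues;
	uint64_t queue_base[NQUEUES];
	uint32_t queue_len[NQUEUES];
	uint32_t head[NQUEUES], tail[NQUEUES];
	int      inflight;             /* commands not yet completed */
};

/* Hard reset: wipe everything the (simulated) hardware knows. */
static void
hw_hard_reset(struct fake_hw *hw)
{
	memset(hw, 0, sizeof(*hw));
}

/* Suspend: make sure the soft state is good, then shut the hw down hard. */
int
sketch_suspend(struct softc *sc)
{
	if (sc->inflight != 0)
		return -1;             /* caller must drain before suspending */
	hw_hard_reset(sc->hw);
	return 0;
}

/* Resume: reset hard, then reprogram the hw purely from the soft state. */
void
sketch_resume(struct softc *sc)
{
	int i;

	hw_hard_reset(sc->hw);         /* trust nothing left over from before */
	for (i = 0; i < sc->nqueues; i++) {
		/* assumption: ring indices restart at 0 after a reset */
		sc->head[i] = sc->tail[i] = 0;
		sc->hw->queue_base[i] = sc->queue_base[i];
		sc->hw->queue_len[i] = sc->queue_len[i];
	}
	sc->hw->enabled = 1;
}

/* Walk through one suspend/resume cycle; returns 0 if hw matches soft state. */
int
demo_suspend_resume(void)
{
	struct fake_hw hw;
	struct softc sc;
	int i;

	memset(&hw, 0, sizeof(hw));
	memset(&sc, 0, sizeof(sc));
	sc.hw = &hw;
	sc.nqueues = 2;
	for (i = 0; i < sc.nqueues; i++) {
		sc.queue_base[i] = 0x1000u * (uint64_t)(i + 1);
		sc.queue_len[i] = 64;
	}

	sketch_resume(&sc);            /* initial bring-up uses the same path */
	sc.head[0] = sc.tail[0] = 5;   /* some work was done and completed */

	if (sketch_suspend(&sc) != 0 || hw.enabled)
		return -1;

	/* Pretend the machine slept and left junk in the device. */
	hw.enabled = 1;
	hw.queue_base[0] = 0xdeadbeef;

	sketch_resume(&sc);
	if (!hw.enabled)
		return -1;
	for (i = 0; i < sc.nqueues; i++) {
		if (hw.queue_base[i] != sc.queue_base[i] ||
		    hw.queue_len[i] != sc.queue_len[i] ||
		    sc.head[i] != 0 || sc.tail[i] != 0)
			return -1;
	}
	return 0;
}
```

Note that first attach and resume go through the same `sketch_resume` path in this sketch, so there is no separate "restart" logic that could pick up stale queue structures.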