On 12/5/18 12:09 PM, Guenter Roeck wrote:
> On Wed, Dec 05, 2018 at 10:59:21AM -0700, Jens Axboe wrote:
> [ ... ]
>>
>>> Also, it seems to me that even with this problem fixed, blk-mq may not
>>> be ready for primetime after all. With that in mind, maybe commit
>>> d5038a13eca72 ("scsi: core: switch to scsi-mq by default") was a
>>> bit premature. Should that be reverted ?
>>
>> I have to strongly disagree with that, the timing is just unfortunate.
>> There are literally millions of machines running blk-mq/scsi-mq, and
>> this is the only hickup we've had. So I want to put this one to rest
>> once and for all, there's absolutely no reason not to continue with
>> what we've planned.
>>
>
> Guess we have to agree to disagree. In my opinion, for infrastructure
> as critical as this, a single hickup is one hickup too many. Not that
> I would describe this as hickup in the first place; I would describe
> it as major disaster.

Don't get me wrong, I don't mean to use hiccup in a diminishing fashion,
this was by all means a disaster for the ones hit by it. But if you look
at the scope of how many folks are using blk-mq/scsi-mq and have been
for years, we're really talking about a tiny tiny percentage here. This
could just as easily have happened with the old IO stack.

The bug was a freak accident, and even with full knowledge of why it
happened, I'm still having an extraordinarily hard time triggering it at
will on my test boxes. As with any disaster, it's usually a combination
of multiple things that go wrong, and this one is no different. The
folks that hit this generally hit it pretty easily, and (by far) the
majority would never hit it.

Bugs happen, whether you like it or not. They happen in file systems,
memory management, and they happen in storage. Things are continually
developed, and that sometimes introduces bugs. We do our best to ensure
that doesn't happen, but sometimes freak accidents like this happen. I
think my track record of decades of work speaks for itself there, it's
not like this is a frequent occurrence.

And if this particular issue wasn't well understood and we had instead
just reverted the offending commits, then I would agree with you. But
that's not the case.

I'm very confident in the stability, among other things, of blk-mq and
the drivers that utilize it. Most of the storage drivers are using it
today, and have been for a long time.

--
Jens Axboe