On Fri, Jul 13, 2018 at 05:58:08PM -0600, Keith Busch wrote:
> Of the two you mentioned, yours is preferable IMO. While I appreciate
> Jianchao's detailed analysis, it's hard to take a proposal seriously
> that so colourfully calls everyone else "dangerous" while advocating
> for silently losing requests on purpose.
>
> But where's the option that fixes scsi to handle hardware completions
> concurrently with arbitrary timeout software? Propping up that house of
> cards can't be the only recourse.

The important bit is that we need to fix this issue quickly. We are
past -rc5 so I'm rather concerned about anything too complicated.

I'm not even sure SCSI has a problem with multiple completions happening
at the same time, but it certainly has a problem with bypassing
blk_mq_complete_request from the EH path.

I think we can solve this properly, but I also think we are way too late
in the 4.18 cycle to fix it properly. For now I fear we'll just have to
revert the changes and try again for 4.19 or even 4.20 if we don't act
quickly enough.