On Fri, 2018-11-16 at 10:53 +0100, Hannes Reinecke wrote: > On 11/15/18 6:58 PM, Keith Busch wrote: > > The scsi timeout error handling had been directly updating the block > > layer's request state to prevent a error handling and a natural completion > > from completing the same request twice. Fix this layering violation > > by having scsi control the fate of its commands with scsi owned flags > > rather than use blk-mq's. > > > > Signed-off-by: Keith Busch <keith.bu...@intel.com> > > --- > > drivers/scsi/scsi_error.c | 22 +++++++++++----------- > > drivers/scsi/scsi_lib.c | 6 +++++- > > include/scsi/scsi_cmnd.h | 5 ++++- > > 3 files changed, 20 insertions(+), 13 deletions(-) > > > > diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c > > index dd338a8cd275..e92e088f636f 100644 > > --- a/drivers/scsi/scsi_error.c > > +++ b/drivers/scsi/scsi_error.c > > @@ -297,19 +297,19 @@ enum blk_eh_timer_return scsi_times_out(struct > > request *req) > > > > if (rtn == BLK_EH_DONE) { > > /* > > - * For blk-mq, we must set the request state to complete now > > - * before sending the request to the scsi error handler. This > > - * will prevent a use-after-free in the event the LLD manages > > - * to complete the request before the error handler finishes > > - * processing this timed out request. > > + * Set the command to complete first in order to prevent a real > > + * completion from releasing the command while error handling > > + * is using it. If the command was already completed, then the > > + * lower level driver beat the timeout handler, and it is safe > > + * to return without escalating error recovery. > > * > > - * If the request was already completed, then the LLD beat the > > - * time out handler from transferring the request to the scsi > > - * error handler. In that case we can return immediately as no > > - * further action is required. > > + * If timeout handling lost the race to a real completion, the > > + * block layer may ignore that due to a fake timeout injection, > > + * so return RESET_TIMER to allow error handling another shot > > + * at this command. > > */ > > - if (!blk_mq_mark_complete(req)) > > - return rtn; > > + if (test_and_set_bit(__SCMD_COMPLETE, &scmd->flags)) > > + return BLK_EH_RESET_TIMER; > > if (scsi_abort_command(scmd) != SUCCESS) { > > set_host_byte(scmd, DID_TIME_OUT); > > scsi_eh_scmd_add(scmd); > > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c > > index 5d83a162d03b..c1d5e4e36125 100644 > > --- a/drivers/scsi/scsi_lib.c > > +++ b/drivers/scsi/scsi_lib.c > > @@ -1635,8 +1635,11 @@ static blk_status_t scsi_mq_prep_fn(struct request > > *req) > > > > static void scsi_mq_done(struct scsi_cmnd *cmd) > > { > > + if (unlikely(test_and_set_bit(__SCMD_COMPLETE, &cmd->flags))) > > + return; > > trace_scsi_dispatch_cmd_done(cmd); > > - blk_mq_complete_request(cmd->request); > > + if (unlikely(!blk_mq_complete_request(cmd->request))) > > + clear_bit(__SCMD_COMPLETE, &cmd->flags); > > } > > > > static void scsi_mq_put_budget(struct blk_mq_hw_ctx *hctx) > > @@ -1701,6 +1704,7 @@ static blk_status_t scsi_queue_rq(struct > > blk_mq_hw_ctx *hctx, > > goto out_dec_host_busy; > > req->rq_flags |= RQF_DONTPREP; > > } else { > > + cmd->flags &= ~SCMD_COMPLETE; > > blk_mq_start_request(req); > > } > > > > Why do you use a normal assignment here (and not a clear_bit() as in the > above hunk)? > > And shouldn't you add a barrier here to broadcast the state change to > other cores?
Although I don't like it that request state information is being moved from the block layer core into the SCSI core, I think that this patch is fine. scsi_queue_rq() is only called after a request structure has been recycled so I don't think that there can be any kind of race between the code paths that use atomic instructions to manipulate the __SCMD_COMPLETE bit and scsi_queue_rq(). Bart.