On Mon, May 21, 2018 at 11:29:06PM +0000, Bart Van Assche wrote:
> On Mon, 2018-05-21 at 17:11 -0600, Keith Busch wrote:
> >     switch (ret) {
> >     case BLK_EH_HANDLED:
> > -           __blk_mq_complete_request(req);
> > -           break;
> > -   case BLK_EH_RESET_TIMER:
> >             /*
> > -            * As nothing prevents from completion happening while
> > -            * ->aborted_gstate is set, this may lead to ignored
> > -            * completions and further spurious timeouts.
> > +            * If the request is still in flight, the driver is requesting
> > +            * blk-mq complete it.
> >              */
> > -           blk_mq_rq_update_aborted_gstate(req, 0);
> > +           if (blk_mq_rq_state(req) == MQ_RQ_IN_FLIGHT)
> > +                   __blk_mq_complete_request(req);
> > +           break;
> > +   case BLK_EH_RESET_TIMER:
> >             blk_add_timer(req);
> >             break;
> >     case BLK_EH_NOT_HANDLED:
> > @@ -880,64 +782,64 @@ static void blk_mq_rq_timed_out(struct request *req, 
> > bool reserved)
> >     }
> >  }
> 
> I think the above changes can lead to concurrent calls of
> __blk_mq_complete_request() from the regular completion path and the timeout
> path. That's wrong: the __blk_mq_complete_request() caller should guarantee
> that no concurrent calls from another context to that function can occur.

Hi Bart,

This shouldn't be introducing any new concorrent calls to
__blk_mq_complete_request if they don't already exist. The timeout work
calls it only if the driver's timeout returns BLK_EH_HANDLED, which means
the driver is claiming the command is now done, but the driver didn't call
blk_mq_complete_request as indicated by the request's IN_FLIGHT state.

In order to get a second call to __blk_mq_complete_request(), then,
the driver would have to end up calling blk_mq_complete_request()
in another context. But there's nothing stopping that from happening
today, and would be bad if any driver actually did that: the request
may have been re-allocated and issued as a new command, and calling
blk_mq_complete_request() the second time introduces data corruption.

Thanks,
Keith

Reply via email to