On 3/17/2019 9:09 PM, Bart Van Assche wrote:
On 3/17/19 8:29 PM, Ming Lei wrote:
NVMe's error handler follows the typical steps for tearing down
hardware:

1) stop blk_mq hw queues
2) stop the real hw queues
3) cancel in-flight requests via
    blk_mq_tagset_busy_iter(tags, cancel_request, ...)
cancel_request():
    mark the request as aborted
    blk_mq_complete_request(req);
4) destroy real hw queues

However, there may be a race between #3 and #4, because blk_mq_complete_request()
actually completes the request asynchronously.

This patch introduces blk_mq_complete_request_sync() to fix the
above race.

Other block drivers wait until outstanding requests have completed by calling blk_cleanup_queue() before hardware queues are destroyed. Why can't the NVMe driver follow that approach?


speaking for the fabrics, not necessarily pci:

The intent of this looping, which happens immediately after an error is detected, is to terminate the outstanding requests. Otherwise, the only recourse is to wait for the I/Os to finish, which they may never do, or for their upper-level timeouts to expire and terminate them, which can mean a very long delay. And one of the commands on the admin queue (a different tag set, but handled the same way), the Async Event Reporting command, doesn't have a timeout at all, so it wouldn't necessarily clear without this looping.

-- james
