Re: [PATCHSET v5] blk-mq: reimplement timeout handling
On 01/13/2018 05:19 AM, Bart Van Assche wrote:
> Sorry but I only retrieved the blk-mq debugfs several minutes after the hang
> started so I'm not sure the state information is relevant. Anyway, I have
> attached it to this e-mail. The most remarkable part is the following:
>
> ./9ddfa913/requeue_list:9646711c {.op=READ, .state=idle, gen=0x118,
> abort_gen=0x0, .cmd_flags=, .rq_flags=SORTED|1|SOFTBARRIER|IO_STAT,
> complete=0, .tag=-1, .internal_tag=217}
>
> The hexadecimal number at the start is the request_queue pointer (I modified
> the blk-mq-debugfs code such that queues are registered with their address
> just after creation and until a name is assigned). This is a dm-mpath queue.

There seems to be something wrong with hctx->nr_active:

./sde/hctx2/cpu2/completed:2 3
./sde/hctx2/cpu2/merged:0
./sde/hctx2/cpu2/dispatched:2 3
./sde/hctx2/active:5
./sde/hctx1/cpu1/completed:2 38
./sde/hctx1/cpu1/merged:0
./sde/hctx1/cpu1/dispatched:2 38
./sde/hctx1/active:40
./sde/hctx0/cpu0/completed:20 11
./sde/hctx0/cpu0/merged:0
./sde/hctx0/cpu0/dispatched:20 11
./sde/hctx0/active:31
...
./sdc/hctx1/cpu1/completed:14 13
./sdc/hctx1/cpu1/merged:0
./sdc/hctx1/cpu1/dispatched:14 13
./sdc/hctx1/active:21
./sdc/hctx0/cpu0/completed:1 41
./sdc/hctx0/cpu0/merged:0
./sdc/hctx0/cpu0/dispatched:1 41
./sdc/hctx0/active:36

Then hctx_may_queue() returns false.

Thanks
Jianchao
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/7] blk-mq: remove REQ_ATOM_COMPLETE usages from blk-mq
Hi Tejun

Many thanks for your kind response.

On 01/09/2018 11:37 AM, Tejun Heo wrote:
> Hello,
>
> On Tue, Jan 09, 2018 at 11:08:04AM +0800, jianchao.wang wrote:
>>> But what'd prevent the completion reinitializing the request and then
>>> the actual completion path coming in and completing the request again?
>>
>> blk_mark_rq_complete() will gate and ensure there will be only one
>> __blk_mq_complete_request() to be invoked.
>
> Yeah, but then the complete flag will be cleared once completion is
> done and the request is reinitialized.

Yes, it is. What I meant is not to preserve the blk_mark_rq_complete()
and REQ_ATOM_COMPLETE usages in blk-mq, but that, as a side effect,
blk_mq_complete_request() can no longer exclude against itself. As you
said, the scenario is racy and should be fixed. :)

Thanks
Jianchao
Re: [PATCH 5/7] blk-mq: remove REQ_ATOM_COMPLETE usages from blk-mq
Hi Tejun

Many thanks for your kind response.

On 01/09/2018 01:27 AM, Tejun Heo wrote:
> Hello, Jianchao.
>
> On Fri, Dec 22, 2017 at 12:02:20PM +0800, jianchao.wang wrote:
>>> On Thu, Dec 21, 2017 at 11:56:49AM +0800, jianchao.wang wrote:
>>>> It's worrying that even though the blk_mark_rq_complete() here is
>>>> intended to synchronize with the timeout path, it indeed gives
>>>> blk_mq_complete_request() the capability to exclude against
>>
>> There could be a scenario where the driver itself stops a request
>> with blk_mq_complete_request() (or some other interface that invokes
>> it), racing with the normal completion path of the same request.
>
> But what'd prevent the completion reinitializing the request and then
> the actual completion path coming in and completing the request again?

blk_mark_rq_complete() will gate and ensure there will be only one
__blk_mq_complete_request() to be invoked.

>> For example, a reset could be triggered through sysfs on nvme-rdma.
>> Then the driver will cancel all the reqs, including in-flight ones:
>>
>> nvme_rdma_reset_ctrl_work()
>>   nvme_rdma_shutdown_ctrl()
>>     if (ctrl->ctrl.queue_count > 1) {
>>         nvme_stop_queues(&ctrl->ctrl);          // quiesce the queues
>>         blk_mq_tagset_busy_iter(&ctrl->tag_set,
>>                 nvme_cancel_request, &ctrl->ctrl); // invokes blk_mq_complete_request()
>>         nvme_rdma_destroy_io_queues(ctrl, shutdown);
>>     }
>>
>> These operations could race with the normal completion path of in-flight
>> ones. It should drain all the in-flight ones first here. But there may be
>> some other places similar to this.
>
> If there are any such places, they should be using an interface which
> is properly synchronized, like blk_abort_request(), which btw is what
> libata already does. Otherwise, it's racy with or without these
> patches.

Yes, that is it. Thanks for your kind response again.

Jianchao
Re: [PATCH 5/7] blk-mq: remove REQ_ATOM_COMPLETE usages from blk-mq
Sorry for my non-detailed description.

On 12/21/2017 09:50 PM, Tejun Heo wrote:
> Hello,
>
> On Thu, Dec 21, 2017 at 11:56:49AM +0800, jianchao.wang wrote:
>> It's worrying that even though the blk_mark_rq_complete() here is intended
>> to synchronize with the timeout path, it indeed gives
>> blk_mq_complete_request() the capability to exclude against

There could be a scenario where the driver itself stops a request with
blk_mq_complete_request() (or some other interface that invokes it),
racing with the normal completion path of the same request.

For example, a reset could be triggered through sysfs on nvme-rdma.
Then the driver will cancel all the reqs, including in-flight ones:

nvme_rdma_reset_ctrl_work()
  nvme_rdma_shutdown_ctrl()
    if (ctrl->ctrl.queue_count > 1) {
        nvme_stop_queues(&ctrl->ctrl);          // quiesce the queues
        blk_mq_tagset_busy_iter(&ctrl->tag_set,
                nvme_cancel_request, &ctrl->ctrl); // invokes blk_mq_complete_request()
        nvme_rdma_destroy_io_queues(ctrl, shutdown);
    }

These operations could race with the normal completion path of in-flight
ones. It should drain all the in-flight ones first here. But there may be
some other places similar to this.

>> itself. Maybe this capability should be preserved.
>
> Can you explain further how that'd help? The problem there is that if
> you have two competing completions, where one isn't a timeout, there's
> nothing synchronizing the reuse of the request. IOW, the losing one
> can easily complete the next recycle instance. The atomic bitops
> might feel like more protection but it's just feels.

In the above case, the request may simultaneously enter the requeue and
end paths.

Thanks
Jianchao
Re: [PATCH 5/7] blk-mq: remove REQ_ATOM_COMPLETE usages from blk-mq
Hi Tejun

On 12/16/2017 08:07 PM, Tejun Heo wrote:
> After the recent updates to use generation number and state based
> synchronization, blk-mq no longer depends on REQ_ATOM_COMPLETE except
> to avoid firing the same timeout multiple times.
>
> Remove all REQ_ATOM_COMPLETE usages and use a new rq_flags flag
> RQF_MQ_TIMEOUT_EXPIRED to avoid firing the same timeout multiple
> times. This removes atomic bitops from hot paths too.
>
> v2: Removed blk_clear_rq_complete() from blk_mq_rq_timed_out().
>
> v3: Added RQF_MQ_TIMEOUT_EXPIRED flag.
>
> Signed-off-by: Tejun Heo <t...@kernel.org>
> Cc: "jianchao.wang" <jianchao.w.w...@oracle.com>
> ---
>  block/blk-mq.c         | 18 ++++++++----------
>  block/blk-timeout.c    |  1 +
>  include/linux/blkdev.h |  2 ++
>  3 files changed, 11 insertions(+), 10 deletions(-)
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 88baa82..47e722b 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -607,14 +607,12 @@ void blk_mq_complete_request(struct request *rq)
>  	 */
>  	if (!(hctx->flags & BLK_MQ_F_BLOCKING)) {
>  		rcu_read_lock();
> -		if (blk_mq_rq_aborted_gstate(rq) != rq->gstate &&
> -		    !blk_mark_rq_complete(rq))
> +		if (blk_mq_rq_aborted_gstate(rq) != rq->gstate)
>  			__blk_mq_complete_request(rq);
>  		rcu_read_unlock();
>  	} else {
>  		srcu_idx = srcu_read_lock(hctx->queue_rq_srcu);
> -		if (blk_mq_rq_aborted_gstate(rq) != rq->gstate &&
> -		    !blk_mark_rq_complete(rq))
> +		if (blk_mq_rq_aborted_gstate(rq) != rq->gstate)
>  			__blk_mq_complete_request(rq);
>  		srcu_read_unlock(hctx->queue_rq_srcu, srcu_idx);
>  	}

It's worrying that even though the blk_mark_rq_complete() here is intended
to synchronize with the timeout path, it indeed gives
blk_mq_complete_request() the capability to exclude against itself.
Maybe this capability should be preserved.

Thanks
Jianchao