Re: [PATCHSET v5] blk-mq: reimplement timeout handling

2018-01-14 Thread jianchao.wang


On 01/13/2018 05:19 AM, Bart Van Assche wrote:
> Sorry but I only retrieved the blk-mq debugfs several minutes after the hang
> started so I'm not sure the state information is relevant. Anyway, I have
> attached it to this e-mail. The most remarkable part is the following:
> 
> ./9ddfa913/requeue_list:9646711c {.op=READ, .state=idle, gen=0x118,
> abort_gen=0x0, .cmd_flags=, .rq_flags=SORTED|1|SOFTBARRIER|IO_STAT,
> complete=0, .tag=-1, .internal_tag=217}
> 
> The hexadecimal number at the start is the request_queue pointer (I modified
> the blk-mq-debugfs code such that queues are registered with their address
> just after creation and until a name is assigned). This is a dm-mpath queue.

There seems to be something wrong with hctx->nr_active.
./sde/hctx2/cpu2/completed:2 3
./sde/hctx2/cpu2/merged:0
./sde/hctx2/cpu2/dispatched:2 3
./sde/hctx2/active:5

./sde/hctx1/cpu1/completed:2 38
./sde/hctx1/cpu1/merged:0
./sde/hctx1/cpu1/dispatched:2 38
./sde/hctx1/active:40

./sde/hctx0/cpu0/completed:20 11
./sde/hctx0/cpu0/merged:0
./sde/hctx0/cpu0/dispatched:20 11
./sde/hctx0/active:31
...
./sdc/hctx1/cpu1/completed:14 13
./sdc/hctx1/cpu1/merged:0
./sdc/hctx1/cpu1/dispatched:14 13
./sdc/hctx1/active:21

./sdc/hctx0/cpu0/completed:1 41
./sdc/hctx0/cpu0/merged:0
./sdc/hctx0/cpu0/dispatched:1 41
./sdc/hctx0/active:36

The per-CPU dispatched and completed counters match, yet the active counts stay
high, so hctx_may_queue() returns false and new requests cannot get tags.
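
For reference, hctx_may_queue() throttles shared-tag users to a fair share of
the tag depth, so an nr_active that never drops back down will starve
allocation. A paraphrased sketch of the 4.15-era block/blk-mq-tag.c logic
(treat it as a sketch, not the exact code):

/* Paraphrased sketch of hctx_may_queue() from ~4.15 block/blk-mq-tag.c. */
static inline bool hctx_may_queue(struct blk_mq_hw_ctx *hctx,
				  struct sbitmap_queue *bt)
{
	unsigned int depth, users;

	/* Only hctxs on a shared tag set that are marked active are throttled. */
	if (!hctx || !(hctx->flags & BLK_MQ_F_TAG_SHARED))
		return true;
	if (!test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
		return true;

	users = atomic_read(&hctx->tags->active_queues);
	if (!users)
		return true;

	/* Give each active user a fair share of the depth, at least 4 tags. */
	depth = max((bt->sb.depth + users - 1) / users, 4U);

	/* A leaked hctx->nr_active keeps this false and stalls allocation. */
	return atomic_read(&hctx->nr_active) < depth;
}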

Thanks
Jianchao


Re: [PATCH 5/7] blk-mq: remove REQ_ATOM_COMPLETE usages from blk-mq

2018-01-08 Thread jianchao.wang
Hi Tejun

Many thanks for your kind response.

On 01/09/2018 11:37 AM, Tejun Heo wrote:
> Hello,
> 
> On Tue, Jan 09, 2018 at 11:08:04AM +0800, jianchao.wang wrote:
>>> But what'd prevent the completion reinitializing the request and then
>>> the actual completion path coming in and completing the request again?
>>
>> blk_mark_rq_complete() gates this and ensures that only one
>> __blk_mq_complete_request() is invoked.
> 
> Yeah, but then the complete flag will be cleared once completion is
> done and the request is reinitialized.

Yes, it is. My point is not that the blk_mark_rq_complete() and
REQ_ATOM_COMPLETE usages should be kept in blk-mq, but that, as a side effect
of removing them, blk_mq_complete_request() can no longer exclude concurrent
invocations of itself.
As you said, the scenario is racy and should be fixed. :)

Thanks
Jianchao 


Re: [PATCH 5/7] blk-mq: remove REQ_ATOM_COMPLETE usages from blk-mq

2018-01-08 Thread jianchao.wang
Hi Tejun

Many thanks for your kind response.

On 01/09/2018 01:27 AM, Tejun Heo wrote:
> Hello, Jianchao.
> 
> On Fri, Dec 22, 2017 at 12:02:20PM +0800, jianchao.wang wrote:
>>> On Thu, Dec 21, 2017 at 11:56:49AM +0800, jianchao.wang wrote:
>>>> It's worrying that even though the blk_mark_rq_complete() here is
>>>> intended to synchronize with the timeout path, it also gives
>>>> blk_mq_complete_request() the capability to exclude concurrent
>>
>> There could be a scenario where the driver stops a request itself with
>> blk_mq_complete_request(), or some other interface that invokes it, and
>> this races with the normal completion path of the same request.
> 
> But what'd prevent the completion reinitializing the request and then
> the actual completion path coming in and completing the request again?
> 
blk_mark_rq_complete() gates this and ensures that only one
__blk_mq_complete_request() is invoked.
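
For reference, that gate is just an atomic test-and-set on the request; a
paraphrased sketch of the pre-patch helper in block/blk.h:

/* Paraphrased sketch of the pre-patch blk_mark_rq_complete() in block/blk.h:
 * the first caller to set REQ_ATOM_COMPLETE sees 0 and goes on to
 * __blk_mq_complete_request(); any later caller finds the bit already set. */
static inline int blk_mark_rq_complete(struct request *rq)
{
	return test_and_set_bit(REQ_ATOM_COMPLETE, &rq->atomic_flags);
}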

>> For example, a reset could be triggered through sysfs on nvme-rdma.
>> Then the driver will cancel all the requests, including in-flight ones:
>> nvme_rdma_reset_ctrl_work()
>>   nvme_rdma_shutdown_ctrl()
>> >>>>
>>	if (ctrl->ctrl.queue_count > 1) {
>>		nvme_stop_queues(&ctrl->ctrl);           // quiesce the queues
>>		blk_mq_tagset_busy_iter(&ctrl->tag_set,
>>				nvme_cancel_request, &ctrl->ctrl);  // invokes blk_mq_complete_request()
>>		nvme_rdma_destroy_io_queues(ctrl, shutdown);
>>	}
>> >>>>
>>
>> These operations could race with the normal completion path of the in-flight
>> ones. All the in-flight requests should be drained first here, but there may
>> be other places similar to this.
> 
> If there are any such places, they should be using an interface which
> is properly synchronized like blk_abort_request(), which btw is what
> libata already does.  Otherwise, it's racy with or without these
> patches.
Yes, that's exactly it.
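
For illustration, a driver-side cancel that wants to stay synchronized with the
normal completion would push the request through the timeout machinery instead
of completing it directly. A minimal sketch (my_drv_abort_inflight() is a
made-up name; blk_abort_request() is the real interface mentioned above):

#include <linux/blkdev.h>

/* Sketch only: rather than calling blk_mq_complete_request() from a driver
 * cancel path (which can race with the normal completion), expire the
 * request so it goes through the timeout path, which is already serialized
 * against normal completion -- the approach libata takes. */
static void my_drv_abort_inflight(struct request *rq)
{
	blk_abort_request(rq);	/* ends up in the driver's ->timeout() handler */
}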

Thanks again for your kind response.
Jianchao



Re: [PATCH 5/7] blk-mq: remove REQ_ATOM_COMPLETE usages from blk-mq

2017-12-21 Thread jianchao.wang
Sorry for the lack of detail in my earlier description.

On 12/21/2017 09:50 PM, Tejun Heo wrote:
> Hello,
> 
> On Thu, Dec 21, 2017 at 11:56:49AM +0800, jianchao.wang wrote:
>> It's worrying that even though the blk_mark_rq_complete() here is intended
>> to synchronize with the timeout path, it also gives blk_mq_complete_request()
>> the capability to exclude concurrent invocations of
There could be a scenario where the driver stops a request itself with
blk_mq_complete_request(), or some other interface that invokes it, and this
races with the normal completion path of the same request.
For example, a reset could be triggered through sysfs on nvme-rdma.
Then the driver will cancel all the requests, including in-flight ones:
nvme_rdma_reset_ctrl_work()
  nvme_rdma_shutdown_ctrl()
>>>>
	if (ctrl->ctrl.queue_count > 1) {
		nvme_stop_queues(&ctrl->ctrl);           // quiesce the queues
		blk_mq_tagset_busy_iter(&ctrl->tag_set,
				nvme_cancel_request, &ctrl->ctrl);  // invokes blk_mq_complete_request()
		nvme_rdma_destroy_io_queues(ctrl, shutdown);
	}
>>>>

These operations could race with the normal completion path of the in-flight
ones. All the in-flight requests should be drained first here, but there may be
other places similar to this.
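
For reference, the cancel callback used above completes the request directly; a
paraphrased sketch of nvme_cancel_request() from that era's
drivers/nvme/host/core.c (details approximate):

void nvme_cancel_request(struct request *req, void *data, bool reserved)
{
	if (!blk_mq_request_started(req))
		return;

	/* Complete the request from the cancel path; this is exactly what can
	 * race with the normal IRQ-driven completion of the same request. */
	nvme_req(req)->status = NVME_SC_ABORT_REQ;
	blk_mq_complete_request(req);
}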
 
>> itself. Maybe this capability should be preserved.
> 
> Can you explain further how that'd help?  The problem there is that if
> you have two competing completions, where one isn't a timeout, there's
> nothing synchronizing the reuse of the request.  IOW, the losing one
> can easily complete the next recycle instance.  The atomic bitops
> might feel like more protection but it's just a feeling.

In the above case, the request may simultaneously enter the requeue and end
paths.

Thanks
Jianchao


Re: [PATCH 5/7] blk-mq: remove REQ_ATOM_COMPLETE usages from blk-mq

2017-12-20 Thread jianchao.wang
Hi Tejun

On 12/16/2017 08:07 PM, Tejun Heo wrote:
> After the recent updates to use generation number and state based
> synchronization, blk-mq no longer depends on REQ_ATOM_COMPLETE except
> to avoid firing the same timeout multiple times.
> 
> Remove all REQ_ATOM_COMPLETE usages and use a new rq_flags flag
> RQF_MQ_TIMEOUT_EXPIRED to avoid firing the same timeout multiple
> times.  This removes atomic bitops from hot paths too.
> 
> v2: Removed blk_clear_rq_complete() from blk_mq_rq_timed_out().
> 
> v3: Added RQF_MQ_TIMEOUT_EXPIRED flag.
> 
> Signed-off-by: Tejun Heo <t...@kernel.org>
> Cc: "jianchao.wang" <jianchao.w.w...@oracle.com>
> ---
>  block/blk-mq.c         | 18 ++++++++----------
>  block/blk-timeout.c    |  1 +
>  include/linux/blkdev.h |  2 ++
>  3 files changed, 11 insertions(+), 10 deletions(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 88baa82..47e722b 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -607,14 +607,12 @@ void blk_mq_complete_request(struct request *rq)
>  	 */
>  	if (!(hctx->flags & BLK_MQ_F_BLOCKING)) {
>  		rcu_read_lock();
> -		if (blk_mq_rq_aborted_gstate(rq) != rq->gstate &&
> -		    !blk_mark_rq_complete(rq))
> +		if (blk_mq_rq_aborted_gstate(rq) != rq->gstate)
>  			__blk_mq_complete_request(rq);
>  		rcu_read_unlock();
>  	} else {
>  		srcu_idx = srcu_read_lock(hctx->queue_rq_srcu);
> -		if (blk_mq_rq_aborted_gstate(rq) != rq->gstate &&
> -		    !blk_mark_rq_complete(rq))
> +		if (blk_mq_rq_aborted_gstate(rq) != rq->gstate)
>  			__blk_mq_complete_request(rq);
>  		srcu_read_unlock(hctx->queue_rq_srcu, srcu_idx);
>  	}
It's worrying that even though the blk_mark_rq_complete() here is intended to
synchronize with the timeout path, it also gives blk_mq_complete_request() the
capability to exclude concurrent invocations of itself. Maybe this capability
should be preserved.

Thanks
Jianchao