Re: nvmf regression with mq-deadline

2017-02-27 Thread Jens Axboe
On Mon, Feb 27 2017, Sagi Grimberg wrote:
> 
> >Now I'm getting a NULL deref with nvme-rdma [1].
> >
> >For some reason blk_mq_tag_to_rq() is returning NULL on
> >tag 0x0 which is io queue connect.
> >
> >I'll try to see where this is coming from.
> >This does not happen with loop though...
> 
> That's because the loop driver does not rely on the
> cqe.command_id to resolve the submitted request (I'll
> fix that).
> 
> Looks like blk_mq_alloc_request_hctx was overlooked when
> the back assignment of the request to the rq_map was added...
> 
> This patch solves the issue for fabrics:
> --
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index d84c66fb37b7..9611cd9920e9 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -312,6 +312,7 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, int rw,
> ret = -EWOULDBLOCK;
> goto out_queue_exit;
> }
> +   alloc_data.hctx->tags->rqs[rq->tag] = rq;
> 
> return rq;
> --
> 
> If it's agreed with everyone I'll send a proper patch
> for this and the blk_mq_sched_setup fix?

Thanks Sagi, yes please send a proper patch for those two conditions!

-- 
Jens Axboe



Re: nvmf regression with mq-deadline

2017-02-27 Thread Sagi Grimberg

Hey Jens,

I'm hitting a regression in nvme-rdma/nvme-loop with the for-linus
branch [1]; a small script triggers it reliably.

The reason seems to be that the sched_tags does not take into account
the tag_set reserved tags.

This solves it for me, any objections on this?
--
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 98c7b061781e..46ca965fff5c 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -454,7 +454,8 @@ int blk_mq_sched_setup(struct request_queue *q)
 */
ret = 0;
queue_for_each_hw_ctx(q, hctx, i) {
-   hctx->sched_tags = blk_mq_alloc_rq_map(set, i, q->nr_requests, 0);
+   hctx->sched_tags = blk_mq_alloc_rq_map(set, i,
+   q->nr_requests, set->reserved_tags);
if (!hctx->sched_tags) {
ret = -ENOMEM;
break;
--


Now I'm getting a NULL deref with nvme-rdma [1].

For some reason blk_mq_tag_to_rq() is returning NULL on
tag 0x0 which is io queue connect.

I'll try to see where this is coming from.
This does not happen with loop though...

--
[   30.431889] nvme nvme0: creating 2 I/O queues.
[   30.465458] nvme nvme0: tag 0x0 on QP 0x84 not found
[   36.060168] BUG: unable to handle kernel NULL pointer dereference at 0030

[   36.063277] IP: bt_iter+0x31/0x50
[   36.064088] PGD 0

[   36.064088] Oops:  [#1] SMP
[   36.064088] Modules linked in: nvme_rdma nvme_fabrics nvme_core 
mlx5_ib ppdev kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul 
ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper 
cryptd i2c_piix4 joydev input_leds serio_raw parport_pc parport mac_hid 
ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp 
libiscsi sunrpc scsi_transport_iscsi autofs4 cirrus ttm drm_kms_helper 
syscopyarea sysfillrect mlx5_core sysimgblt fb_sys_fops psmouse drm 
floppy ptp pata_acpi pps_core

[   36.064088] CPU: 0 PID: 186 Comm: kworker/0:1H Not tainted 4.10.0+ #115
[   36.064088] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014

[   36.064088] Workqueue: kblockd blk_mq_timeout_work
[   36.064088] task: 95f6393a0080 task.stack: b826803ac000
[   36.064088] RIP: 0010:bt_iter+0x31/0x50
[   36.064088] RSP: 0018:b826803afda0 EFLAGS: 00010202
[   36.064088] RAX: b826803afdd0 RBX: 95f63c036800 RCX: 0001
[   36.064088] RDX: 95f635ff0798 RSI:  RDI: 95f63c036800
[   36.064088] RBP: b826803afe18 R08:  R09: 0001
[   36.064088] R10:  R11:  R12: 
[   36.064088] R13: 95f635d7c240 R14:  R15: 95f63c47ff00
[   36.064088] FS:  () GS:95f63fc0() knlGS:
[   36.064088] CS:  0010 DS:  ES:  CR0: 80050033
[   36.064088] CR2: 0030 CR3: 3c8db000 CR4: 003406f0

[   36.064088] Call Trace:
[   36.064088]  ? blk_mq_queue_tag_busy_iter+0x191/0x1d0
[   36.064088]  ? blk_mq_rq_timed_out+0x70/0x70
[   36.064088]  ? blk_mq_rq_timed_out+0x70/0x70
[   36.064088]  blk_mq_timeout_work+0xba/0x160
[   36.064088]  process_one_work+0x16b/0x480
[   36.064088]  worker_thread+0x4b/0x500
[   36.064088]  kthread+0x101/0x140
[   36.064088]  ? process_one_work+0x480/0x480
[   36.064088]  ? kthread_create_on_node+0x40/0x40
[   36.064088]  ret_from_fork+0x2c/0x40
[   36.064088] Code: 89 d0 48 8b 3a 0f b6 48 18 48 8b 97 08 01 00 00 84 
c9 75 03 03 72 04 48 8b 92 80 00 00 00 89 f6 48 8b 34 f2 48 8b 97 98 00 
00 00 <48> 39 56 30 74 06 b8 01 00 00 00 c3 55 48 8b 50 10 48 89 e5 ff

[   36.064088] RIP: bt_iter+0x31/0x50 RSP: b826803afda0
[   36.064088] CR2: 0030
[   36.064088] ---[ end trace 469df54df5f3cd87 ]---
--


Re: nvmf regression with mq-deadline

2017-02-27 Thread Sagi Grimberg



> Now I'm getting a NULL deref with nvme-rdma [1].
>
> For some reason blk_mq_tag_to_rq() is returning NULL on
> tag 0x0 which is io queue connect.
>
> I'll try to see where this is coming from.
> This does not happen with loop though...


That's because the loop driver does not rely on the
cqe.command_id to resolve the submitted request (I'll
fix that).

Looks like blk_mq_alloc_request_hctx was overlooked when
the back assignment of the request to the rq_map was added...

This patch solves the issue for fabrics:
--
diff --git a/block/blk-mq.c b/block/blk-mq.c
index d84c66fb37b7..9611cd9920e9 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -312,6 +312,7 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, int rw,

ret = -EWOULDBLOCK;
goto out_queue_exit;
}
+   alloc_data.hctx->tags->rqs[rq->tag] = rq;

return rq;
--

If it's agreed with everyone I'll send a proper patch
for this and the blk_mq_sched_setup fix?