v4.16-rc2: I/O hang with dm-rq + Kyber

2018-02-22 Thread Bart Van Assche
Hello Omar,

I/O hangs if I run the following command on top of kernel v4.16-rc2 + the
ib_srpt patch that adds RDMA/CM support:

srp-test/run_tests -c -d -r 10 -t 02-mq -e kyber

This does not happen with the deadline scheduler nor without a scheduler.
This test passed a few months ago. I have attached the output of the following
command to this e-mail:

(cd /sys/kernel/debug/block && grep -r .) | grep -v /poll_stat
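
A minimal Python 3 sketch of the same collection step, for setups where the dump is easier to gather from a script. The function name `dump_debugfs` and its `skip` parameter are hypothetical. It only approximates the pipeline above: it filters on the file path rather than on the whole output line, and it silently skips files that cannot be read.

```python
import os

def dump_debugfs(root, skip="/poll_stat"):
    """Return 'relative/path:line' strings for every non-empty line under root.

    Approximates: (cd root && grep -r .) | grep -v /poll_stat
    """
    out = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            # Skip any file whose relative path contains the skip marker.
            if skip in "/" + rel:
                continue
            try:
                with open(path, errors="replace") as f:
                    for line in f:
                        line = line.rstrip("\n")
                        if line:  # 'grep .' only prints non-empty lines
                            out.append(f"{rel}:{line}")
            except OSError:
                # Some debugfs attributes are write-only or fail on read.
                continue
    return out

# Usage (typically needs root and a mounted debugfs):
#   print("\n".join(dump_debugfs("/sys/kernel/debug/block")))
```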

Can you have a look?

Thanks,

Bart.

dm-1/sched/async_depth:48
dm-1/sched/other_tokens:depth=64
dm-1/sched/other_tokens:busy=0
dm-1/sched/other_tokens:bits_per_word=64
dm-1/sched/other_tokens:map_nr=1
dm-1/sched/other_tokens:alloc_hint={33, 1415, 1816, 307}
dm-1/sched/other_tokens:wake_batch=8
dm-1/sched/other_tokens:wake_index=0
dm-1/sched/other_tokens:ws={
dm-1/sched/other_tokens:{.wait_cnt=8, .wait=inactive},
dm-1/sched/other_tokens:{.wait_cnt=8, .wait=inactive},
dm-1/sched/other_tokens:{.wait_cnt=8, .wait=inactive},
dm-1/sched/other_tokens:{.wait_cnt=8, .wait=inactive},
dm-1/sched/other_tokens:{.wait_cnt=8, .wait=inactive},
dm-1/sched/other_tokens:{.wait_cnt=8, .wait=inactive},
dm-1/sched/other_tokens:{.wait_cnt=8, .wait=inactive},
dm-1/sched/other_tokens:{.wait_cnt=8, .wait=inactive},
dm-1/sched/other_tokens:}
dm-1/sched/other_tokens:round_robin=0
dm-1/sched/sync_write_tokens:depth=128
dm-1/sched/sync_write_tokens:busy=128
dm-1/sched/sync_write_tokens:bits_per_word=64
dm-1/sched/sync_write_tokens:map_nr=2
dm-1/sched/sync_write_tokens:alloc_hint={85, 0, 0, 90}
dm-1/sched/sync_write_tokens:wake_batch=8
dm-1/sched/sync_write_tokens:wake_index=2
dm-1/sched/sync_write_tokens:ws={
dm-1/sched/sync_write_tokens:   {.wait_cnt=1, .wait=inactive},
dm-1/sched/sync_write_tokens:   {.wait_cnt=8, .wait=inactive},
dm-1/sched/sync_write_tokens:   {.wait_cnt=8, .wait=active},
dm-1/sched/sync_write_tokens:   {.wait_cnt=8, .wait=inactive},
dm-1/sched/sync_write_tokens:   {.wait_cnt=8, .wait=inactive},
dm-1/sched/sync_write_tokens:   {.wait_cnt=8, .wait=inactive},
dm-1/sched/sync_write_tokens:   {.wait_cnt=8, .wait=inactive},
dm-1/sched/sync_write_tokens:   {.wait_cnt=8, .wait=inactive},
dm-1/sched/sync_write_tokens:}
dm-1/sched/sync_write_tokens:round_robin=0
dm-1/sched/read_tokens:depth=256
dm-1/sched/read_tokens:busy=256
dm-1/sched/read_tokens:bits_per_word=64
dm-1/sched/read_tokens:map_nr=4
dm-1/sched/read_tokens:alloc_hint={89, 0, 90, 64}
dm-1/sched/read_tokens:wake_batch=8
dm-1/sched/read_tokens:wake_index=0
dm-1/sched/read_tokens:ws={
dm-1/sched/read_tokens: {.wait_cnt=8, .wait=active},
dm-1/sched/read_tokens: {.wait_cnt=8, .wait=inactive},
dm-1/sched/read_tokens: {.wait_cnt=8, .wait=inactive},
dm-1/sched/read_tokens: {.wait_cnt=8, .wait=inactive},
dm-1/sched/read_tokens: {.wait_cnt=8, .wait=inactive},
dm-1/sched/read_tokens: {.wait_cnt=8, .wait=inactive},
dm-1/sched/read_tokens: {.wait_cnt=8, .wait=inactive},
dm-1/sched/read_tokens: {.wait_cnt=8, .wait=inactive},
dm-1/sched/read_tokens:}
dm-1/sched/read_tokens:round_robin=0
dm-1/hctx0/sched/batching:0
dm-1/hctx0/sched/cur_domain:READ
dm-1/hctx0/sched/other_waiting:0
dm-1/hctx0/sched/sync_write_waiting:1
dm-1/hctx0/sched/sync_write_rqs:8b7da875 {.op=WRITE, .cmd_flags=SYNC|META|PRIO, .rq_flags=ELVPRIV|IO_STAT, .state=idle, .gstate=0x260/0x0, .tag=-1, .internal_tag=48}
dm-1/hctx0/sched/sync_write_rqs:4b242673 {.op=WRITE, .cmd_flags=SYNC|META|PRIO, .rq_flags=ELVPRIV|IO_STAT, .state=idle, .gstate=0x1b4/0x0, .tag=-1, .internal_tag=46}
dm-1/hctx0/sched/sync_write_rqs:83108835 {.op=WRITE, .cmd_flags=SYNC|IDLE, .rq_flags=ELVPRIV|IO_STAT, .state=idle, .gstate=0x248/0x0, .tag=-1, .internal_tag=183}
dm-1/hctx0/sched/sync_write_rqs:b01459f7 {.op=WRITE, .cmd_flags=SYNC|IDLE, .rq_flags=ELVPRIV|IO_STAT, .state=idle, .gstate=0x2a4/0x0, .tag=-1, .internal_tag=54}
dm-1/hctx0/sched/sync_write_rqs:f56c232f {.op=WRITE, .cmd_flags=SYNC|IDLE, .rq_flags=ELVPRIV|IO_STAT, .state=idle, .gstate=0x23c/0x0, .tag=-1, .internal_tag=62}
dm-1/hctx0/sched/sync_write_rqs:8ccce380 {.op=WRITE, .cmd_flags=SYNC|IDLE, .rq_flags=ELVPRIV|IO_STAT, .state=idle, .gstate=0x290/0x0, .tag=-1, .internal_tag=28}
dm-1/hctx0/sched/sync_write_rqs:89435afc {.op=WRITE, .cmd_flags=SYNC|IDLE, .rq_flags=ELVPRIV|IO_STAT, .state=idle, .gstate=0x15c/0x0, .tag=-1, .internal_tag=122}
dm-1/hctx0/sched/sync_write_rqs:e683a4e2 {.op=WRITE, .cmd_flags=SYNC|IDLE, .rq_flags=ELVPRIV|IO_STAT, .state=idle, .gstate=0x268/0x0, .tag=-1, .internal_tag=127}
dm-1/hctx0/sched/sync_write_rqs:c4ada57b {.op=WRITE, .cmd_flags=SYNC|IDLE, .rq_flags=ELVPRIV|IO_STAT, .state=idle, .gstate=0x2e4/0x0, .tag=-1, .internal_tag=64}
dm-1/hctx0/sched/sync_write_rqs:ab365972 {.op=WRITE, .cmd_flags=SYNC|IDLE, .rq_flags=ELVPRIV|IO_STAT, .state=idle, .gstate=0x230/0x0, .tag=-1, .internal_tag=101}
dm-1/hctx0/sched/sync_write_rqs:c9e14c7b {.op=WRITE, .cmd_flags=SYNC|IDLE, .rq_flags=ELV

Re: v4.16-rc2: I/O hang with dm-rq + Kyber

2018-02-22 Thread Omar Sandoval
On Thu, Feb 22, 2018 at 09:10:23PM +, Bart Van Assche wrote:
> Hello Omar,
> 
> I/O hangs if I run the following command on top of kernel v4.16-rc2 + the
> ib_srpt patch that adds RDMA/CM support:
> 
> srp-test/run_tests -c -d -r 10 -t 02-mq -e kyber
> 
> This does not happen with the deadline scheduler nor without a scheduler.
> This test passed a few months ago. I have attached the output of the following
> command to this e-mail:
> 
> (cd /sys/kernel/debug/block && grep -r .) | grep -v /poll_stat
> 
> Can you have a look?
> 
> Thanks,
> 
> Bart.

Hey, Bart, thanks for the report. Can you clarify what the device
topology is w.r.t. the dm devices?


Re: v4.16-rc2: I/O hang with dm-rq + Kyber

2018-02-22 Thread Bart Van Assche
On Thu, 2018-02-22 at 14:42 -0800, Omar Sandoval wrote:
> On Thu, Feb 22, 2018 at 09:10:23PM +, Bart Van Assche wrote:
> > I/O hangs if I run the following command on top of kernel v4.16-rc2 + the
> > ib_srpt patch that adds RDMA/CM support:
> > 
> > srp-test/run_tests -c -d -r 10 -t 02-mq -e kyber
> > 
> > This does not happen with the deadline scheduler nor without a scheduler.
> > This test passed a few months ago. I have attached the output of the following
> > command to this e-mail:
> > 
> > (cd /sys/kernel/debug/block && grep -r .) | grep -v /poll_stat
> > 
> > Can you have a look?
> 
> Hey, Bart, thanks for the report. Can you clarify what the device
> topology is w.r.t. the dm devices?

Hello Omar,

The topology was nothing fancy: dm-rq in blk-mq mode was configured to use
one scsi-mq SRP-over-RoCE path. The multipath output is as follows:

# multipath -ll
mpathb (3600140572616d6469736b310) dm-0 LIO-ORG,IBLOCK
size=32M features='3 queue_if_no_path queue_mode mq' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=50 status=active
  `- 4:0:0:0 sdc 8:32 active ready running

# for d in /sys/block/{dm-0,sdc}; do echo $(basename $d): $(<$d/queue/scheduler); done
dm-0: bfq [kyber] mq-deadline none
sdc: bfq [kyber] mq-deadline none
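
In the output above, the active scheduler is the name inside square brackets. A small Python 3 sketch for pulling it out of such a line when checking many devices; the helper name `active_scheduler` is hypothetical, and it returns None when no bracketed entry is present.

```python
import re

def active_scheduler(text):
    """Return the bracketed scheduler name from a queue/scheduler string."""
    m = re.search(r"\[([^\]]+)\]", text)
    return m.group(1) if m else None

# Example with the line reported for dm-0 and sdc:
#   active_scheduler("bfq [kyber] mq-deadline none")  -> "kyber"
```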

Bart.

Re: v4.16-rc2: I/O hang with dm-rq + Kyber

2018-02-23 Thread Ming Lei
On Thu, Feb 22, 2018 at 09:10:23PM +, Bart Van Assche wrote:
> Hello Omar,
> 
> I/O hangs if I run the following command on top of kernel v4.16-rc2 + the
> ib_srpt patch that adds RDMA/CM support:
> 
> srp-test/run_tests -c -d -r 10 -t 02-mq -e kyber
> 
> This does not happen with the deadline scheduler nor without a scheduler.
> This test passed a few months ago. I have attached the output of the following
> command to this e-mail:
> 
> (cd /sys/kernel/debug/block && grep -r .) | grep -v /poll_stat
> 
> Can you have a look?

The following 2 patches fix one IO hang on kyber in my test on USB. Could
you test them and see if your case is fixed as well?

https://marc.info/?l=linux-block&m=151940022831994&w=2

BTW, judging from your attached log, this looks like the domain token leak
issue too, so it should be addressed by the above patches.
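
One way to spot this pattern in a saved dump is to look for Kyber token pools whose busy count equals their depth. The Python 3 sketch below does only that; the function name `exhausted_pools` is hypothetical, and the check is a heuristic under the assumption that a full pool alongside idle requests with .tag=-1 points at leaked tokens. A full pool by itself does not prove a leak.

```python
import re

# Matches lines such as "dm-1/sched/read_tokens:depth=256".
TOKEN_LINE = re.compile(r"^(?P<pool>\S+_tokens):(?P<key>depth|busy)=(?P<val>\d+)$")

def exhausted_pools(lines):
    """Return sorted names of token pools where busy == depth (and depth > 0)."""
    stats = {}
    for line in lines:
        m = TOKEN_LINE.match(line.strip())
        if m:
            stats.setdefault(m["pool"], {})[m["key"]] = int(m["val"])
    return sorted(
        pool
        for pool, s in stats.items()
        if s.get("depth", 0) > 0 and s.get("busy") == s["depth"]
    )

# Values copied from the attached dump:
sample = [
    "dm-1/sched/other_tokens:depth=64",
    "dm-1/sched/other_tokens:busy=0",
    "dm-1/sched/sync_write_tokens:depth=128",
    "dm-1/sched/sync_write_tokens:busy=128",
    "dm-1/sched/read_tokens:depth=256",
    "dm-1/sched/read_tokens:busy=256",
]
print(exhausted_pools(sample))
# prints ['dm-1/sched/read_tokens', 'dm-1/sched/sync_write_tokens']
```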

Thanks,
Ming


Re: v4.16-rc2: I/O hang with dm-rq + Kyber

2018-02-23 Thread Bart Van Assche
On Sat, 2018-02-24 at 00:26 +0800, Ming Lei wrote:
> The following 2 patches fix one IO hang on kyber in my test on USB. Could
> you test them and see if your case is fixed as well?
> 
>   https://marc.info/?l=linux-block&m=151940022831994&w=2

These two patches are sufficient to make my test pass.

Thanks!

Bart.