Re: [PATCHSET v5] blk-mq: reimplement timeout handling

2018-01-14 Thread Ming Lei
On Sun, Jan 14, 2018 at 11:12 PM, jianchao.wang
 wrote:
>
>
> On 01/13/2018 05:19 AM, Bart Van Assche wrote:
>> Sorry but I only retrieved the blk-mq debugfs several minutes after the hang
>> started so I'm not sure the state information is relevant. Anyway, I have 
>> attached
>> it to this e-mail. The most remarkable part is the following:
>>
>> ./9ddfa913/requeue_list:9646711c {.op=READ, .state=idle, 
>> gen=0x1
>> 18, abort_gen=0x0, .cmd_flags=, .rq_flags=SORTED|1|SOFTBARRIER|IO_STAT, 
>> complete
>> =0, .tag=-1, .internal_tag=217}
>>
>> The hexadecimal number at the start is the request_queue pointer (I modified 
>> the
>> blk-mq-debugfs code such that queues are registered with there address just 
>> after
>> creation and until a name is assigned). This is a dm-mpath queue.
>
> There seems to be something wrong in hctx->nr_active.

Then looks it is same with the issue I saw during starting multipathd, and the
following patch should fix that, if there isn't other issue.

https://marc.info/?l=linux-block=151586577400558=2

-- 
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHSET v5] blk-mq: reimplement timeout handling

2018-01-14 Thread jianchao.wang


On 01/13/2018 05:19 AM, Bart Van Assche wrote:
> Sorry but I only retrieved the blk-mq debugfs several minutes after the hang
> started so I'm not sure the state information is relevant. Anyway, I have 
> attached
> it to this e-mail. The most remarkable part is the following:
> 
> ./9ddfa913/requeue_list:9646711c {.op=READ, .state=idle, 
> gen=0x1
> 18, abort_gen=0x0, .cmd_flags=, .rq_flags=SORTED|1|SOFTBARRIER|IO_STAT, 
> complete
> =0, .tag=-1, .internal_tag=217}
> 
> The hexadecimal number at the start is the request_queue pointer (I modified 
> the
> blk-mq-debugfs code such that queues are registered with there address just 
> after
> creation and until a name is assigned). This is a dm-mpath queue.

There seems to be something wrong in hctx->nr_active.
./sde/hctx2/cpu2/completed:2 3
./sde/hctx2/cpu2/merged:0
./sde/hctx2/cpu2/dispatched:2 3
./sde/hctx2/active:5

./sde/hctx1/cpu1/completed:2 38
./sde/hctx1/cpu1/merged:0
./sde/hctx1/cpu1/dispatched:2 38
./sde/hctx1/active:40

./sde/hctx0/cpu0/completed:20 11
./sde/hctx0/cpu0/merged:0
./sde/hctx0/cpu0/dispatched:20 11
./sde/hctx0/active:31
...
./sdc/hctx1/cpu1/completed:14 13
./sdc/hctx1/cpu1/merged:0
./sdc/hctx1/cpu1/dispatched:14 13
./sdc/hctx1/active:21

./sdc/hctx0/cpu0/completed:1 41
./sdc/hctx0/cpu0/merged:0
./sdc/hctx0/cpu0/dispatched:1 41
./sdc/hctx0/active:36

Then hctx_may_queue return false.

Thanks
Jianchao
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHSET v5] blk-mq: reimplement timeout handling

2018-01-13 Thread Ming Lei
On Sat, Jan 13, 2018 at 10:45:14PM +0800, Ming Lei wrote:
> On Fri, Jan 12, 2018 at 04:55:34PM -0500, Laurence Oberman wrote:
> > On Fri, 2018-01-12 at 20:57 +, Bart Van Assche wrote:
> > > On Tue, 2018-01-09 at 08:29 -0800, Tejun Heo wrote:
> > > > Currently, blk-mq timeout path synchronizes against the usual
> > > > issue/completion path using a complex scheme involving atomic
> > > > bitflags, REQ_ATOM_*, memory barriers and subtle memory coherence
> > > > rules.  Unfortunatley, it contains quite a few holes.
> > > 
> > > Hello Tejun,
> > > 
> > > With this patch series applied I see weird hangs in blk_mq_get_tag()
> > > when I
> > > run the srp-test software. If I pull Jens' latest for-next branch and
> > > revert
> > > this patch series then the srp-test software runs successfully. Note:
> > > if you
> > > don't have InfiniBand hardware available then you will need the
> > > RDMA/CM
> > > patches for the SRP initiator and target drivers that have been
> > > posted
> > > recently on the linux-rdma mailing list to run the srp-test software.
> > > 
> > > This is how I run the srp-test software in a VM:
> > > 
> > > ./run_tests -c -d -r 10
> > > 
> > > Here is an example of what SysRq-w reported when the hang occurred:
> > > 
> > > sysrq: SysRq : Show Blocked State
> > >  taskPC stack   pid father
> > > kworker/u8:0D12864 5  2 0x8000
> > > Workqueue: events_unbound sd_probe_async [sd_mod]
> > > Call Trace:
> > > ? __schedule+0x2b4/0xbb0
> > > schedule+0x2d/0x90
> > > io_schedule+0xd/0x30
> > > blk_mq_get_tag+0x169/0x290
> > > ? finish_wait+0x80/0x80
> > > blk_mq_get_request+0x16a/0x4f0
> > > blk_mq_alloc_request+0x59/0xc0
> > > blk_get_request_flags+0x3f/0x260
> > > scsi_execute+0x33/0x1e0 [scsi_mod]
> > > read_capacity_16.part.35+0x9c/0x460 [sd_mod]
> > > sd_revalidate_disk+0x14bb/0x1cb0 [sd_mod]
> > > sd_probe_async+0xf2/0x1a0 [sd_mod]
> > > process_one_work+0x21c/0x6d0
> > > worker_thread+0x35/0x380
> > > ? process_one_work+0x6d0/0x6d0
> > > kthread+0x117/0x130
> > > ? kthread_create_worker_on_cpu+0x40/0x40
> > > ret_from_fork+0x24/0x30
> > > systemd-udevd   D13672  1048285 0x0100
> > > Call Trace:
> > > ? __schedule+0x2b4/0xbb0
> > > schedule+0x2d/0x90
> > > io_schedule+0xd/0x30
> > > generic_file_read_iter+0x32f/0x970
> > > ? page_cache_tree_insert+0x100/0x100
> > > __vfs_read+0xcc/0x120
> > > vfs_read+0x96/0x140
> > > SyS_read+0x40/0xa0
> > > do_syscall_64+0x5f/0x1b0
> > > entry_SYSCALL64_slow_path+0x25/0x25
> > > RIP: 0033:0x7f8ce6d08d11
> > > RSP: 002b:7fff96dec288 EFLAGS: 0246 ORIG_RAX:
> > > 
> > > RAX: ffda RBX: 5651de7f6e10 RCX: 7f8ce6d08d11
> > > RDX: 0040 RSI: 5651de7f6e38 RDI: 0007
> > > RBP: 5651de7ea500 R08: 7f8ce6cf1c20 R09: 5651de7f6e10
> > > R10: 006f R11: 0246 R12: 01ff
> > > R13: 01ff0040 R14: 5651de7ea550 R15: 0040
> > > systemd-udevd   D13496  1049285 0x0100
> > > Call Trace:
> > > ? __schedule+0x2b4/0xbb0
> > > schedule+0x2d/0x90
> > > io_schedule+0xd/0x30
> > > blk_mq_get_tag+0x169/0x290
> > > ? finish_wait+0x80/0x80
> > > blk_mq_get_request+0x16a/0x4f0
> > > blk_mq_make_request+0x105/0x8e0
> > > ? generic_make_request+0xd6/0x3d0
> > > generic_make_request+0x103/0x3d0
> > > ? submit_bio+0x57/0x110
> > > submit_bio+0x57/0x110
> > > mpage_readpages+0x13b/0x160
> > > ? I_BDEV+0x10/0x10
> > > ? rcu_read_lock_sched_held+0x66/0x70
> > > ? __alloc_pages_nodemask+0x2e8/0x360
> > > __do_page_cache_readahead+0x2a4/0x370
> > > ? force_page_cache_readahead+0xaf/0x110
> > > force_page_cache_readahead+0xaf/0x110
> > > generic_file_read_iter+0x743/0x970
> > > ? find_held_lock+0x2d/0x90
> > > ? _raw_spin_unlock+0x29/0x40
> > > __vfs_read+0xcc/0x120
> > > vfs_read+0x96/0x140
> > > SyS_read+0x40/0xa0
> > > do_syscall_64+0x5f/0x1b0
> > > entry_SYSCALL64_slow_path+0x25/0x25
> > > RIP: 0033:0x7f8ce6d08d11
> > > RSP: 002b:7fff96dec8b8 EFLAGS: 0246 ORIG_RAX:
> > > 
> > > RAX: ffda RBX: 7f8ce7085010 RCX: 7f8ce6d08d11
> > > RDX: 0004 RSI: 7f8ce7085038 RDI: 000f
> > > RBP: 5651de7ec840 R08:  R09: 7f8ce7085010
> > > R10: 7f8ce7085028 R11: 0246 R12: 
> > > R13: 0004 R14: 5651de7ec890 R15: 0004
> > > systemd-udevd   D13672  1055285 0x0100
> > > Call Trace:
> > > ? __schedule+0x2b4/0xbb0
> > > schedule+0x2d/0x90
> > > io_schedule+0xd/0x30
> > > blk_mq_get_tag+0x169/0x290
> > > ? finish_wait+0x80/0x80
> > > blk_mq_get_request+0x16a/0x4f0
> > > blk_mq_make_request+0x105/0x8e0
> > > ? generic_make_request+0xd6/0x3d0
> > > generic_make_request+0x103/0x3d0
> > > ? submit_bio+0x57/0x110
> > > submit_bio+0x57/0x110
> > > mpage_readpages+0x13b/0x160
> > > ? I_BDEV+0x10/0x10
> > > ? rcu_read_lock_sched_held+0x66/0x70
> > > ? 

Re: [PATCHSET v5] blk-mq: reimplement timeout handling

2018-01-13 Thread Ming Lei
On Fri, Jan 12, 2018 at 04:55:34PM -0500, Laurence Oberman wrote:
> On Fri, 2018-01-12 at 20:57 +, Bart Van Assche wrote:
> > On Tue, 2018-01-09 at 08:29 -0800, Tejun Heo wrote:
> > > Currently, blk-mq timeout path synchronizes against the usual
> > > issue/completion path using a complex scheme involving atomic
> > > bitflags, REQ_ATOM_*, memory barriers and subtle memory coherence
> > > rules.  Unfortunatley, it contains quite a few holes.
> > 
> > Hello Tejun,
> > 
> > With this patch series applied I see weird hangs in blk_mq_get_tag()
> > when I
> > run the srp-test software. If I pull Jens' latest for-next branch and
> > revert
> > this patch series then the srp-test software runs successfully. Note:
> > if you
> > don't have InfiniBand hardware available then you will need the
> > RDMA/CM
> > patches for the SRP initiator and target drivers that have been
> > posted
> > recently on the linux-rdma mailing list to run the srp-test software.
> > 
> > This is how I run the srp-test software in a VM:
> > 
> > ./run_tests -c -d -r 10
> > 
> > Here is an example of what SysRq-w reported when the hang occurred:
> > 
> > sysrq: SysRq : Show Blocked State
> >  taskPC stack   pid father
> > kworker/u8:0D12864 5  2 0x8000
> > Workqueue: events_unbound sd_probe_async [sd_mod]
> > Call Trace:
> > ? __schedule+0x2b4/0xbb0
> > schedule+0x2d/0x90
> > io_schedule+0xd/0x30
> > blk_mq_get_tag+0x169/0x290
> > ? finish_wait+0x80/0x80
> > blk_mq_get_request+0x16a/0x4f0
> > blk_mq_alloc_request+0x59/0xc0
> > blk_get_request_flags+0x3f/0x260
> > scsi_execute+0x33/0x1e0 [scsi_mod]
> > read_capacity_16.part.35+0x9c/0x460 [sd_mod]
> > sd_revalidate_disk+0x14bb/0x1cb0 [sd_mod]
> > sd_probe_async+0xf2/0x1a0 [sd_mod]
> > process_one_work+0x21c/0x6d0
> > worker_thread+0x35/0x380
> > ? process_one_work+0x6d0/0x6d0
> > kthread+0x117/0x130
> > ? kthread_create_worker_on_cpu+0x40/0x40
> > ret_from_fork+0x24/0x30
> > systemd-udevd   D13672  1048285 0x0100
> > Call Trace:
> > ? __schedule+0x2b4/0xbb0
> > schedule+0x2d/0x90
> > io_schedule+0xd/0x30
> > generic_file_read_iter+0x32f/0x970
> > ? page_cache_tree_insert+0x100/0x100
> > __vfs_read+0xcc/0x120
> > vfs_read+0x96/0x140
> > SyS_read+0x40/0xa0
> > do_syscall_64+0x5f/0x1b0
> > entry_SYSCALL64_slow_path+0x25/0x25
> > RIP: 0033:0x7f8ce6d08d11
> > RSP: 002b:7fff96dec288 EFLAGS: 0246 ORIG_RAX:
> > 
> > RAX: ffda RBX: 5651de7f6e10 RCX: 7f8ce6d08d11
> > RDX: 0040 RSI: 5651de7f6e38 RDI: 0007
> > RBP: 5651de7ea500 R08: 7f8ce6cf1c20 R09: 5651de7f6e10
> > R10: 006f R11: 0246 R12: 01ff
> > R13: 01ff0040 R14: 5651de7ea550 R15: 0040
> > systemd-udevd   D13496  1049285 0x0100
> > Call Trace:
> > ? __schedule+0x2b4/0xbb0
> > schedule+0x2d/0x90
> > io_schedule+0xd/0x30
> > blk_mq_get_tag+0x169/0x290
> > ? finish_wait+0x80/0x80
> > blk_mq_get_request+0x16a/0x4f0
> > blk_mq_make_request+0x105/0x8e0
> > ? generic_make_request+0xd6/0x3d0
> > generic_make_request+0x103/0x3d0
> > ? submit_bio+0x57/0x110
> > submit_bio+0x57/0x110
> > mpage_readpages+0x13b/0x160
> > ? I_BDEV+0x10/0x10
> > ? rcu_read_lock_sched_held+0x66/0x70
> > ? __alloc_pages_nodemask+0x2e8/0x360
> > __do_page_cache_readahead+0x2a4/0x370
> > ? force_page_cache_readahead+0xaf/0x110
> > force_page_cache_readahead+0xaf/0x110
> > generic_file_read_iter+0x743/0x970
> > ? find_held_lock+0x2d/0x90
> > ? _raw_spin_unlock+0x29/0x40
> > __vfs_read+0xcc/0x120
> > vfs_read+0x96/0x140
> > SyS_read+0x40/0xa0
> > do_syscall_64+0x5f/0x1b0
> > entry_SYSCALL64_slow_path+0x25/0x25
> > RIP: 0033:0x7f8ce6d08d11
> > RSP: 002b:7fff96dec8b8 EFLAGS: 0246 ORIG_RAX:
> > 
> > RAX: ffda RBX: 7f8ce7085010 RCX: 7f8ce6d08d11
> > RDX: 0004 RSI: 7f8ce7085038 RDI: 000f
> > RBP: 5651de7ec840 R08:  R09: 7f8ce7085010
> > R10: 7f8ce7085028 R11: 0246 R12: 
> > R13: 0004 R14: 5651de7ec890 R15: 0004
> > systemd-udevd   D13672  1055285 0x0100
> > Call Trace:
> > ? __schedule+0x2b4/0xbb0
> > schedule+0x2d/0x90
> > io_schedule+0xd/0x30
> > blk_mq_get_tag+0x169/0x290
> > ? finish_wait+0x80/0x80
> > blk_mq_get_request+0x16a/0x4f0
> > blk_mq_make_request+0x105/0x8e0
> > ? generic_make_request+0xd6/0x3d0
> > generic_make_request+0x103/0x3d0
> > ? submit_bio+0x57/0x110
> > submit_bio+0x57/0x110
> > mpage_readpages+0x13b/0x160
> > ? I_BDEV+0x10/0x10
> > ? rcu_read_lock_sched_held+0x66/0x70
> > ? __alloc_pages_nodemask+0x2e8/0x360
> > __do_page_cache_readahead+0x2a4/0x370
> > ? force_page_cache_readahead+0xaf/0x110
> > force_page_cache_readahead+0xaf/0x110
> > generic_file_read_iter+0x743/0x970
> > __vfs_read+0xcc/0x120
> > vfs_read+0x96/0x140
> > SyS_read+0x40/0xa0
> > do_syscall_64+0x5f/0x1b0
> > 

Re: [PATCHSET v5] blk-mq: reimplement timeout handling

2018-01-13 Thread Ming Lei
On Fri, Jan 12, 2018 at 04:55:34PM -0500, Laurence Oberman wrote:
> On Fri, 2018-01-12 at 20:57 +, Bart Van Assche wrote:
> > On Tue, 2018-01-09 at 08:29 -0800, Tejun Heo wrote:
> > > Currently, blk-mq timeout path synchronizes against the usual
> > > issue/completion path using a complex scheme involving atomic
> > > bitflags, REQ_ATOM_*, memory barriers and subtle memory coherence
> > > rules.  Unfortunatley, it contains quite a few holes.
> > 
> > Hello Tejun,
> > 
> > With this patch series applied I see weird hangs in blk_mq_get_tag()
> > when I
> > run the srp-test software. If I pull Jens' latest for-next branch and
> > revert
> > this patch series then the srp-test software runs successfully. Note:
> > if you
> > don't have InfiniBand hardware available then you will need the
> > RDMA/CM
> > patches for the SRP initiator and target drivers that have been
> > posted
> > recently on the linux-rdma mailing list to run the srp-test software.
> > 
> > This is how I run the srp-test software in a VM:
> > 
> > ./run_tests -c -d -r 10
> > 
> > Here is an example of what SysRq-w reported when the hang occurred:
> > 
> > sysrq: SysRq : Show Blocked State
> >  taskPC stack   pid father
> > kworker/u8:0D12864 5  2 0x8000
> > Workqueue: events_unbound sd_probe_async [sd_mod]
> > Call Trace:
> > ? __schedule+0x2b4/0xbb0
> > schedule+0x2d/0x90
> > io_schedule+0xd/0x30
> > blk_mq_get_tag+0x169/0x290
> > ? finish_wait+0x80/0x80
> > blk_mq_get_request+0x16a/0x4f0
> > blk_mq_alloc_request+0x59/0xc0
> > blk_get_request_flags+0x3f/0x260
> > scsi_execute+0x33/0x1e0 [scsi_mod]
> > read_capacity_16.part.35+0x9c/0x460 [sd_mod]
> > sd_revalidate_disk+0x14bb/0x1cb0 [sd_mod]
> > sd_probe_async+0xf2/0x1a0 [sd_mod]
> > process_one_work+0x21c/0x6d0
> > worker_thread+0x35/0x380
> > ? process_one_work+0x6d0/0x6d0
> > kthread+0x117/0x130
> > ? kthread_create_worker_on_cpu+0x40/0x40
> > ret_from_fork+0x24/0x30
> > systemd-udevd   D13672  1048285 0x0100
> > Call Trace:
> > ? __schedule+0x2b4/0xbb0
> > schedule+0x2d/0x90
> > io_schedule+0xd/0x30
> > generic_file_read_iter+0x32f/0x970
> > ? page_cache_tree_insert+0x100/0x100
> > __vfs_read+0xcc/0x120
> > vfs_read+0x96/0x140
> > SyS_read+0x40/0xa0
> > do_syscall_64+0x5f/0x1b0
> > entry_SYSCALL64_slow_path+0x25/0x25
> > RIP: 0033:0x7f8ce6d08d11
> > RSP: 002b:7fff96dec288 EFLAGS: 0246 ORIG_RAX:
> > 
> > RAX: ffda RBX: 5651de7f6e10 RCX: 7f8ce6d08d11
> > RDX: 0040 RSI: 5651de7f6e38 RDI: 0007
> > RBP: 5651de7ea500 R08: 7f8ce6cf1c20 R09: 5651de7f6e10
> > R10: 006f R11: 0246 R12: 01ff
> > R13: 01ff0040 R14: 5651de7ea550 R15: 0040
> > systemd-udevd   D13496  1049285 0x0100
> > Call Trace:
> > ? __schedule+0x2b4/0xbb0
> > schedule+0x2d/0x90
> > io_schedule+0xd/0x30
> > blk_mq_get_tag+0x169/0x290
> > ? finish_wait+0x80/0x80
> > blk_mq_get_request+0x16a/0x4f0
> > blk_mq_make_request+0x105/0x8e0
> > ? generic_make_request+0xd6/0x3d0
> > generic_make_request+0x103/0x3d0
> > ? submit_bio+0x57/0x110
> > submit_bio+0x57/0x110
> > mpage_readpages+0x13b/0x160
> > ? I_BDEV+0x10/0x10
> > ? rcu_read_lock_sched_held+0x66/0x70
> > ? __alloc_pages_nodemask+0x2e8/0x360
> > __do_page_cache_readahead+0x2a4/0x370
> > ? force_page_cache_readahead+0xaf/0x110
> > force_page_cache_readahead+0xaf/0x110
> > generic_file_read_iter+0x743/0x970
> > ? find_held_lock+0x2d/0x90
> > ? _raw_spin_unlock+0x29/0x40
> > __vfs_read+0xcc/0x120
> > vfs_read+0x96/0x140
> > SyS_read+0x40/0xa0
> > do_syscall_64+0x5f/0x1b0
> > entry_SYSCALL64_slow_path+0x25/0x25
> > RIP: 0033:0x7f8ce6d08d11
> > RSP: 002b:7fff96dec8b8 EFLAGS: 0246 ORIG_RAX:
> > 
> > RAX: ffda RBX: 7f8ce7085010 RCX: 7f8ce6d08d11
> > RDX: 0004 RSI: 7f8ce7085038 RDI: 000f
> > RBP: 5651de7ec840 R08:  R09: 7f8ce7085010
> > R10: 7f8ce7085028 R11: 0246 R12: 
> > R13: 0004 R14: 5651de7ec890 R15: 0004
> > systemd-udevd   D13672  1055285 0x0100
> > Call Trace:
> > ? __schedule+0x2b4/0xbb0
> > schedule+0x2d/0x90
> > io_schedule+0xd/0x30
> > blk_mq_get_tag+0x169/0x290
> > ? finish_wait+0x80/0x80
> > blk_mq_get_request+0x16a/0x4f0
> > blk_mq_make_request+0x105/0x8e0
> > ? generic_make_request+0xd6/0x3d0
> > generic_make_request+0x103/0x3d0
> > ? submit_bio+0x57/0x110
> > submit_bio+0x57/0x110
> > mpage_readpages+0x13b/0x160
> > ? I_BDEV+0x10/0x10
> > ? rcu_read_lock_sched_held+0x66/0x70
> > ? __alloc_pages_nodemask+0x2e8/0x360
> > __do_page_cache_readahead+0x2a4/0x370
> > ? force_page_cache_readahead+0xaf/0x110
> > force_page_cache_readahead+0xaf/0x110
> > generic_file_read_iter+0x743/0x970
> > __vfs_read+0xcc/0x120
> > vfs_read+0x96/0x140
> > SyS_read+0x40/0xa0
> > do_syscall_64+0x5f/0x1b0
> > 

Re: [PATCHSET v5] blk-mq: reimplement timeout handling

2018-01-12 Thread Jens Axboe
On 1/12/18 2:19 PM, Bart Van Assche wrote:
> On Fri, 2018-01-12 at 14:07 -0700, Jens Axboe wrote:
>> You're really not making it easy for folks to run this :-)
> 
> My hope is that the ib_srp and ib_srpt patches will be accepted upstream soon.
> As long as these are not upstream, anyone who wants to retrieve these patches
> is welcome to clone 
> https://github.com/bvanassche/linux/tree/block-scsi-for-next,
> a kernel tree with all my pending patches.
> 
>> Do you have the matching blk-mq debugfs output for the device?
> 
> Sorry but I only retrieved the blk-mq debugfs several minutes after the hang
> started so I'm not sure the state information is relevant. Anyway, I have 
> attached
> it to this e-mail. The most remarkable part is the following:
> 
> ./9ddfa913/requeue_list:9646711c {.op=READ, .state=idle, 
> gen=0x1
> 18, abort_gen=0x0, .cmd_flags=, .rq_flags=SORTED|1|SOFTBARRIER|IO_STAT, 
> complete
> =0, .tag=-1, .internal_tag=217}

Two things come to mind here:

1) We forgot to add RQF_STARTED to the debugfs bits, I just rectified
   that.

2) You are using a scheduler (which one?). The request was inserted, and
   retrieved by the driver, then requeued. After this requeue,
   apparently nothing happened. The queue should have been re-run, but
   that didn't happen. What are the queue/hctx states?

-- 
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHSET v5] blk-mq: reimplement timeout handling

2018-01-12 Thread Laurence Oberman
On Fri, 2018-01-12 at 20:57 +, Bart Van Assche wrote:
> On Tue, 2018-01-09 at 08:29 -0800, Tejun Heo wrote:
> > Currently, blk-mq timeout path synchronizes against the usual
> > issue/completion path using a complex scheme involving atomic
> > bitflags, REQ_ATOM_*, memory barriers and subtle memory coherence
> > rules.  Unfortunatley, it contains quite a few holes.
> 
> Hello Tejun,
> 
> With this patch series applied I see weird hangs in blk_mq_get_tag()
> when I
> run the srp-test software. If I pull Jens' latest for-next branch and
> revert
> this patch series then the srp-test software runs successfully. Note:
> if you
> don't have InfiniBand hardware available then you will need the
> RDMA/CM
> patches for the SRP initiator and target drivers that have been
> posted
> recently on the linux-rdma mailing list to run the srp-test software.
> 
> This is how I run the srp-test software in a VM:
> 
> ./run_tests -c -d -r 10
> 
> Here is an example of what SysRq-w reported when the hang occurred:
> 
> sysrq: SysRq : Show Blocked State
>  taskPC stack   pid father
> kworker/u8:0D12864 5  2 0x8000
> Workqueue: events_unbound sd_probe_async [sd_mod]
> Call Trace:
> ? __schedule+0x2b4/0xbb0
> schedule+0x2d/0x90
> io_schedule+0xd/0x30
> blk_mq_get_tag+0x169/0x290
> ? finish_wait+0x80/0x80
> blk_mq_get_request+0x16a/0x4f0
> blk_mq_alloc_request+0x59/0xc0
> blk_get_request_flags+0x3f/0x260
> scsi_execute+0x33/0x1e0 [scsi_mod]
> read_capacity_16.part.35+0x9c/0x460 [sd_mod]
> sd_revalidate_disk+0x14bb/0x1cb0 [sd_mod]
> sd_probe_async+0xf2/0x1a0 [sd_mod]
> process_one_work+0x21c/0x6d0
> worker_thread+0x35/0x380
> ? process_one_work+0x6d0/0x6d0
> kthread+0x117/0x130
> ? kthread_create_worker_on_cpu+0x40/0x40
> ret_from_fork+0x24/0x30
> systemd-udevd   D13672  1048285 0x0100
> Call Trace:
> ? __schedule+0x2b4/0xbb0
> schedule+0x2d/0x90
> io_schedule+0xd/0x30
> generic_file_read_iter+0x32f/0x970
> ? page_cache_tree_insert+0x100/0x100
> __vfs_read+0xcc/0x120
> vfs_read+0x96/0x140
> SyS_read+0x40/0xa0
> do_syscall_64+0x5f/0x1b0
> entry_SYSCALL64_slow_path+0x25/0x25
> RIP: 0033:0x7f8ce6d08d11
> RSP: 002b:7fff96dec288 EFLAGS: 0246 ORIG_RAX:
> 
> RAX: ffda RBX: 5651de7f6e10 RCX: 7f8ce6d08d11
> RDX: 0040 RSI: 5651de7f6e38 RDI: 0007
> RBP: 5651de7ea500 R08: 7f8ce6cf1c20 R09: 5651de7f6e10
> R10: 006f R11: 0246 R12: 01ff
> R13: 01ff0040 R14: 5651de7ea550 R15: 0040
> systemd-udevd   D13496  1049285 0x0100
> Call Trace:
> ? __schedule+0x2b4/0xbb0
> schedule+0x2d/0x90
> io_schedule+0xd/0x30
> blk_mq_get_tag+0x169/0x290
> ? finish_wait+0x80/0x80
> blk_mq_get_request+0x16a/0x4f0
> blk_mq_make_request+0x105/0x8e0
> ? generic_make_request+0xd6/0x3d0
> generic_make_request+0x103/0x3d0
> ? submit_bio+0x57/0x110
> submit_bio+0x57/0x110
> mpage_readpages+0x13b/0x160
> ? I_BDEV+0x10/0x10
> ? rcu_read_lock_sched_held+0x66/0x70
> ? __alloc_pages_nodemask+0x2e8/0x360
> __do_page_cache_readahead+0x2a4/0x370
> ? force_page_cache_readahead+0xaf/0x110
> force_page_cache_readahead+0xaf/0x110
> generic_file_read_iter+0x743/0x970
> ? find_held_lock+0x2d/0x90
> ? _raw_spin_unlock+0x29/0x40
> __vfs_read+0xcc/0x120
> vfs_read+0x96/0x140
> SyS_read+0x40/0xa0
> do_syscall_64+0x5f/0x1b0
> entry_SYSCALL64_slow_path+0x25/0x25
> RIP: 0033:0x7f8ce6d08d11
> RSP: 002b:7fff96dec8b8 EFLAGS: 0246 ORIG_RAX:
> 
> RAX: ffda RBX: 7f8ce7085010 RCX: 7f8ce6d08d11
> RDX: 0004 RSI: 7f8ce7085038 RDI: 000f
> RBP: 5651de7ec840 R08:  R09: 7f8ce7085010
> R10: 7f8ce7085028 R11: 0246 R12: 
> R13: 0004 R14: 5651de7ec890 R15: 0004
> systemd-udevd   D13672  1055285 0x0100
> Call Trace:
> ? __schedule+0x2b4/0xbb0
> schedule+0x2d/0x90
> io_schedule+0xd/0x30
> blk_mq_get_tag+0x169/0x290
> ? finish_wait+0x80/0x80
> blk_mq_get_request+0x16a/0x4f0
> blk_mq_make_request+0x105/0x8e0
> ? generic_make_request+0xd6/0x3d0
> generic_make_request+0x103/0x3d0
> ? submit_bio+0x57/0x110
> submit_bio+0x57/0x110
> mpage_readpages+0x13b/0x160
> ? I_BDEV+0x10/0x10
> ? rcu_read_lock_sched_held+0x66/0x70
> ? __alloc_pages_nodemask+0x2e8/0x360
> __do_page_cache_readahead+0x2a4/0x370
> ? force_page_cache_readahead+0xaf/0x110
> force_page_cache_readahead+0xaf/0x110
> generic_file_read_iter+0x743/0x970
> __vfs_read+0xcc/0x120
> vfs_read+0x96/0x140
> SyS_read+0x40/0xa0
> do_syscall_64+0x5f/0x1b0
> entry_SYSCALL64_slow_path+0x25/0x25
> RIP: 0033:0x7f8ce6d08d11
> RSP: 002b:7fff96dec848 EFLAGS: 0246 ORIG_RAX:
> 
> RAX: ffda RBX: 5651de7ec300 RCX: 7f8ce6d08d11
> RDX: 0100 RSI: 5651de7ec328 RDI: 000f
> RBP: 5651de7ea500 R08:  R09: 5651de7ec300
> R10: 

Re: [PATCHSET v5] blk-mq: reimplement timeout handling

2018-01-12 Thread Bart Van Assche
On Fri, 2018-01-12 at 14:07 -0700, Jens Axboe wrote:
> You're really not making it easy for folks to run this :-)

My hope is that the ib_srp and ib_srpt patches will be accepted upstream soon.
As long as these are not upstream, anyone who wants to retrieve these patches
is welcome to clone 
https://github.com/bvanassche/linux/tree/block-scsi-for-next,
a kernel tree with all my pending patches.

> Do you have the matching blk-mq debugfs output for the device?

Sorry but I only retrieved the blk-mq debugfs several minutes after the hang
started so I'm not sure the state information is relevant. Anyway, I have 
attached
it to this e-mail. The most remarkable part is the following:

./9ddfa913/requeue_list:9646711c {.op=READ, .state=idle, gen=0x1
18, abort_gen=0x0, .cmd_flags=, .rq_flags=SORTED|1|SOFTBARRIER|IO_STAT, complete
=0, .tag=-1, .internal_tag=217}

The hexadecimal number at the start is the request_queue pointer (I modified the
blk-mq-debugfs code such that queues are registered with there address just 
after
creation and until a name is assigned). This is a dm-mpath queue.

Bart../9ddfa913/hctx0/cpu3/completed:29 0
./9ddfa913/hctx0/cpu3/merged:0
./9ddfa913/hctx0/cpu3/dispatched:30 0
./9ddfa913/hctx0/cpu2/completed:0 0
./9ddfa913/hctx0/cpu2/merged:0
./9ddfa913/hctx0/cpu2/dispatched:0 0
./9ddfa913/hctx0/cpu1/completed:0 0
./9ddfa913/hctx0/cpu1/merged:0
./9ddfa913/hctx0/cpu1/dispatched:0 0
./9ddfa913/hctx0/cpu0/completed:0 0
./9ddfa913/hctx0/cpu0/merged:0
./9ddfa913/hctx0/cpu0/dispatched:0 0
./9ddfa913/hctx0/active:0
./9ddfa913/hctx0/run:92
./9ddfa913/hctx0/queued:30
./9ddfa913/hctx0/dispatched:   00
./9ddfa913/hctx0/dispatched:   197
./9ddfa913/hctx0/dispatched:   20
./9ddfa913/hctx0/dispatched:   40
./9ddfa913/hctx0/dispatched:   80
./9ddfa913/hctx0/dispatched:  160
./9ddfa913/hctx0/dispatched:  32+   0
./9ddfa913/hctx0/io_poll:considered=0
./9ddfa913/hctx0/io_poll:invoked=0
./9ddfa913/hctx0/io_poll:success=0
./9ddfa913/hctx0/sched_tags_bitmap::      
  
./9ddfa913/hctx0/sched_tags_bitmap:0010:      
0002  
./9ddfa913/hctx0/sched_tags:nr_tags=256
./9ddfa913/hctx0/sched_tags:nr_reserved_tags=0
./9ddfa913/hctx0/sched_tags:active_queues=0
./9ddfa913/hctx0/sched_tags:
./9ddfa913/hctx0/sched_tags:bitmap_tags:
./9ddfa913/hctx0/sched_tags:depth=256
./9ddfa913/hctx0/sched_tags:busy=1
./9ddfa913/hctx0/sched_tags:bits_per_word=64
./9ddfa913/hctx0/sched_tags:map_nr=4
./9ddfa913/hctx0/sched_tags:alloc_hint={45, 56, 144, 218}
./9ddfa913/hctx0/sched_tags:wake_batch=8
./9ddfa913/hctx0/sched_tags:wake_index=0
./9ddfa913/hctx0/sched_tags:ws={
./9ddfa913/hctx0/sched_tags:{.wait_cnt=8, .wait=inactive},
./9ddfa913/hctx0/sched_tags:{.wait_cnt=8, .wait=inactive},
./9ddfa913/hctx0/sched_tags:{.wait_cnt=8, .wait=inactive},
./9ddfa913/hctx0/sched_tags:{.wait_cnt=8, .wait=inactive},
./9ddfa913/hctx0/sched_tags:{.wait_cnt=8, .wait=inactive},
./9ddfa913/hctx0/sched_tags:{.wait_cnt=8, .wait=inactive},
./9ddfa913/hctx0/sched_tags:{.wait_cnt=8, .wait=inactive},
./9ddfa913/hctx0/sched_tags:{.wait_cnt=8, .wait=inactive},
./9ddfa913/hctx0/sched_tags:}
./9ddfa913/hctx0/sched_tags:round_robin=0
./9ddfa913/hctx0/tags_bitmap::       
 
./9ddfa913/hctx0/tags_bitmap:0010:       
 
./9ddfa913/hctx0/tags_bitmap:0020:       
 
./9ddfa913/hctx0/tags_bitmap:0030:       
 
./9ddfa913/hctx0/tags_bitmap:0040:       
 
./9ddfa913/hctx0/tags_bitmap:0050:       
 
./9ddfa913/hctx0/tags_bitmap:0060:       
 
./9ddfa913/hctx0/tags_bitmap:0070:       
 
./9ddfa913/hctx0/tags_bitmap:0080:       
 
./9ddfa913/hctx0/tags_bitmap:0090:       
 
./9ddfa913/hctx0/tags_bitmap:00a0:       
 
./9ddfa913/hctx0/tags_bitmap:00b0:       
 
./9ddfa913/hctx0/tags_bitmap:00c0:       
 
./9ddfa913/hctx0/tags_bitmap:00d0:       
 
./9ddfa913/hctx0/tags_bitmap:00e0:      

Re: [PATCHSET v5] blk-mq: reimplement timeout handling

2018-01-12 Thread Jens Axboe
On 1/12/18 1:57 PM, Bart Van Assche wrote:
> On Tue, 2018-01-09 at 08:29 -0800, Tejun Heo wrote:
>> Currently, blk-mq timeout path synchronizes against the usual
>> issue/completion path using a complex scheme involving atomic
>> bitflags, REQ_ATOM_*, memory barriers and subtle memory coherence
>> rules.  Unfortunatley, it contains quite a few holes.
> 
> Hello Tejun,
> 
> With this patch series applied I see weird hangs in blk_mq_get_tag() when I
> run the srp-test software. If I pull Jens' latest for-next branch and revert
> this patch series then the srp-test software runs successfully. Note: if you
> don't have InfiniBand hardware available then you will need the RDMA/CM
> patches for the SRP initiator and target drivers that have been posted
> recently on the linux-rdma mailing list to run the srp-test software.

You're really not making it easy for folks to run this :-)

> This is how I run the srp-test software in a VM:
> 
> ./run_tests -c -d -r 10
> 
> Here is an example of what SysRq-w reported when the hang occurred:
> 
> sysrq: SysRq : Show Blocked State
>  taskPC stack   pid father
> kworker/u8:0D12864 5  2 0x8000
> Workqueue: events_unbound sd_probe_async [sd_mod]
> Call Trace:
> ? __schedule+0x2b4/0xbb0
> schedule+0x2d/0x90
> io_schedule+0xd/0x30
> blk_mq_get_tag+0x169/0x290
> ? finish_wait+0x80/0x80
> blk_mq_get_request+0x16a/0x4f0
> blk_mq_alloc_request+0x59/0xc0
> blk_get_request_flags+0x3f/0x260
> scsi_execute+0x33/0x1e0 [scsi_mod]
> read_capacity_16.part.35+0x9c/0x460 [sd_mod]
> sd_revalidate_disk+0x14bb/0x1cb0 [sd_mod]
> sd_probe_async+0xf2/0x1a0 [sd_mod]
> process_one_work+0x21c/0x6d0
> worker_thread+0x35/0x380
> ? process_one_work+0x6d0/0x6d0
> kthread+0x117/0x130
> ? kthread_create_worker_on_cpu+0x40/0x40
> ret_from_fork+0x24/0x30
> systemd-udevd   D13672  1048285 0x0100
> Call Trace:
> ? __schedule+0x2b4/0xbb0
> schedule+0x2d/0x90
> io_schedule+0xd/0x30
> generic_file_read_iter+0x32f/0x970
> ? page_cache_tree_insert+0x100/0x100
> __vfs_read+0xcc/0x120
> vfs_read+0x96/0x140
> SyS_read+0x40/0xa0
> do_syscall_64+0x5f/0x1b0
> entry_SYSCALL64_slow_path+0x25/0x25

Do you have the matching blk-mq debugfs output for the device?

-- 
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHSET v5] blk-mq: reimplement timeout handling

2018-01-12 Thread Bart Van Assche
On Tue, 2018-01-09 at 08:29 -0800, Tejun Heo wrote:
> Currently, blk-mq timeout path synchronizes against the usual
> issue/completion path using a complex scheme involving atomic
> bitflags, REQ_ATOM_*, memory barriers and subtle memory coherence
> rules.  Unfortunatley, it contains quite a few holes.

Hello Tejun,

With this patch series applied I see weird hangs in blk_mq_get_tag() when I
run the srp-test software. If I pull Jens' latest for-next branch and revert
this patch series then the srp-test software runs successfully. Note: if you
don't have InfiniBand hardware available then you will need the RDMA/CM
patches for the SRP initiator and target drivers that have been posted
recently on the linux-rdma mailing list to run the srp-test software.

This is how I run the srp-test software in a VM:

./run_tests -c -d -r 10

Here is an example of what SysRq-w reported when the hang occurred:

sysrq: SysRq : Show Blocked State
 taskPC stack   pid father
kworker/u8:0D12864 5  2 0x8000
Workqueue: events_unbound sd_probe_async [sd_mod]
Call Trace:
? __schedule+0x2b4/0xbb0
schedule+0x2d/0x90
io_schedule+0xd/0x30
blk_mq_get_tag+0x169/0x290
? finish_wait+0x80/0x80
blk_mq_get_request+0x16a/0x4f0
blk_mq_alloc_request+0x59/0xc0
blk_get_request_flags+0x3f/0x260
scsi_execute+0x33/0x1e0 [scsi_mod]
read_capacity_16.part.35+0x9c/0x460 [sd_mod]
sd_revalidate_disk+0x14bb/0x1cb0 [sd_mod]
sd_probe_async+0xf2/0x1a0 [sd_mod]
process_one_work+0x21c/0x6d0
worker_thread+0x35/0x380
? process_one_work+0x6d0/0x6d0
kthread+0x117/0x130
? kthread_create_worker_on_cpu+0x40/0x40
ret_from_fork+0x24/0x30
systemd-udevd   D13672  1048285 0x0100
Call Trace:
? __schedule+0x2b4/0xbb0
schedule+0x2d/0x90
io_schedule+0xd/0x30
generic_file_read_iter+0x32f/0x970
? page_cache_tree_insert+0x100/0x100
__vfs_read+0xcc/0x120
vfs_read+0x96/0x140
SyS_read+0x40/0xa0
do_syscall_64+0x5f/0x1b0
entry_SYSCALL64_slow_path+0x25/0x25
RIP: 0033:0x7f8ce6d08d11
RSP: 002b:7fff96dec288 EFLAGS: 0246 ORIG_RAX: 
RAX: ffda RBX: 5651de7f6e10 RCX: 7f8ce6d08d11
RDX: 0040 RSI: 5651de7f6e38 RDI: 0007
RBP: 5651de7ea500 R08: 7f8ce6cf1c20 R09: 5651de7f6e10
R10: 006f R11: 0246 R12: 01ff
R13: 01ff0040 R14: 5651de7ea550 R15: 0040
systemd-udevd   D13496  1049285 0x0100
Call Trace:
? __schedule+0x2b4/0xbb0
schedule+0x2d/0x90
io_schedule+0xd/0x30
blk_mq_get_tag+0x169/0x290
? finish_wait+0x80/0x80
blk_mq_get_request+0x16a/0x4f0
blk_mq_make_request+0x105/0x8e0
? generic_make_request+0xd6/0x3d0
generic_make_request+0x103/0x3d0
? submit_bio+0x57/0x110
submit_bio+0x57/0x110
mpage_readpages+0x13b/0x160
? I_BDEV+0x10/0x10
? rcu_read_lock_sched_held+0x66/0x70
? __alloc_pages_nodemask+0x2e8/0x360
__do_page_cache_readahead+0x2a4/0x370
? force_page_cache_readahead+0xaf/0x110
force_page_cache_readahead+0xaf/0x110
generic_file_read_iter+0x743/0x970
? find_held_lock+0x2d/0x90
? _raw_spin_unlock+0x29/0x40
__vfs_read+0xcc/0x120
vfs_read+0x96/0x140
SyS_read+0x40/0xa0
do_syscall_64+0x5f/0x1b0
entry_SYSCALL64_slow_path+0x25/0x25
RIP: 0033:0x7f8ce6d08d11
RSP: 002b:7fff96dec8b8 EFLAGS: 0246 ORIG_RAX: 
RAX: ffda RBX: 7f8ce7085010 RCX: 7f8ce6d08d11
RDX: 0004 RSI: 7f8ce7085038 RDI: 000f
RBP: 5651de7ec840 R08:  R09: 7f8ce7085010
R10: 7f8ce7085028 R11: 0246 R12: 
R13: 0004 R14: 5651de7ec890 R15: 0004
systemd-udevd   D13672  1055285 0x0100
Call Trace:
? __schedule+0x2b4/0xbb0
schedule+0x2d/0x90
io_schedule+0xd/0x30
blk_mq_get_tag+0x169/0x290
? finish_wait+0x80/0x80
blk_mq_get_request+0x16a/0x4f0
blk_mq_make_request+0x105/0x8e0
? generic_make_request+0xd6/0x3d0
generic_make_request+0x103/0x3d0
? submit_bio+0x57/0x110
submit_bio+0x57/0x110
mpage_readpages+0x13b/0x160
? I_BDEV+0x10/0x10
? rcu_read_lock_sched_held+0x66/0x70
? __alloc_pages_nodemask+0x2e8/0x360
__do_page_cache_readahead+0x2a4/0x370
? force_page_cache_readahead+0xaf/0x110
force_page_cache_readahead+0xaf/0x110
generic_file_read_iter+0x743/0x970
__vfs_read+0xcc/0x120
vfs_read+0x96/0x140
SyS_read+0x40/0xa0
do_syscall_64+0x5f/0x1b0
entry_SYSCALL64_slow_path+0x25/0x25
RIP: 0033:0x7f8ce6d08d11
RSP: 002b:7fff96dec848 EFLAGS: 0246 ORIG_RAX: 
RAX: ffda RBX: 5651de7ec300 RCX: 7f8ce6d08d11
RDX: 0100 RSI: 5651de7ec328 RDI: 000f
RBP: 5651de7ea500 R08:  R09: 5651de7ec300
R10: 5651de7ec318 R11: 0246 R12: 01ffe000
R13: 01ffe100 R14: 5651de7ea550 R15: 0100

Please let me know if you need more information.

Bart.N�r��yb�X��ǧv�^�)޺{.n�+{�n�߲)w*jg����ݢj/���z�ޖ��2�ޙ&�)ߡ�a�����G���h��j:+v���w��٥

Re: [PATCHSET v5] blk-mq: reimplement timeout handling

2018-01-09 Thread Jens Axboe
On 1/9/18 9:29 AM, Tejun Heo wrote:
> Hello,
> 
> Changes from [v4]
> 
> - Comments added.  Patch description updated.
> 
> Changes from [v3]
> 
> - Rebased on top of for-4.16/block.
> 
> - Integrated Jens's hctx_[un]lock() factoring patch and refreshed the
>   patches accordingly.
> 
> - Added comment explaining the use of hctx_lock() instead of
>   rcu_read_lock() in completion path.
> 
> Changes from [v2]
> 
> - Possible extended looping around seqcount and u64_stat_sync fixed.
> 
> - Misplaced MQ_RQ_IDLE state setting fixed.
> 
> - RQF_MQ_TIMEOUT_EXPIRED added to prevent firing the same timeout
>   multiple times.
> 
> - s/queue_rq_src/srcu/ patch added.
> 
> - Other misc changes.
> 
> Changes from [v1]
> 
> - BLK_EH_RESET_TIMER handling fixed.
> 
> - s/request->gstate_seqc/request->gstate_seq/
> 
> - READ_ONCE() added to blk_mq_rq_udpate_state().
> 
> - Removed left over blk_clear_rq_complete() invocation from
>   blk_mq_rq_timed_out().
> 
> Currently, blk-mq timeout path synchronizes against the usual
> issue/completion path using a complex scheme involving atomic
> bitflags, REQ_ATOM_*, memory barriers and subtle memory coherence
> rules.  Unfortunatley, it contains quite a few holes.
> 
> It's pretty easy to make blk_mq_check_expired() terminate a later
> instance of a request.  If we induce 5 sec delay before
> time_after_eq() test in blk_mq_check_expired(), shorten the timeout to
> 2s, and issue back-to-back large IOs, blk-mq starts timing out
> requests spuriously pretty quickly.  Nothing actually timed out.  It
> just made the call on a recycle instance of a request and then
> terminated a later instance long after the original instance finished.
> The scenario isn't theoretical either.
> 
> This patchset replaces the broken synchronization mechanism with a RCU
> and generation number based one.  Please read the patch description of
> the second path for more details.

Applied for 4.16, and added a patch to silence that false positive
srcu_idx for hctx_unlock() that got reported.

This grows the request a bit, which sucks, but probably unavoidable.
There's some room for improvement with a hole or two, however.

-- 
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCHSET v5] blk-mq: reimplement timeout handling

2018-01-09 Thread Tejun Heo
Hello,

Changes from [v4]

- Comments added.  Patch description updated.

Changes from [v3]

- Rebased on top of for-4.16/block.

- Integrated Jens's hctx_[un]lock() factoring patch and refreshed the
  patches accordingly.

- Added comment explaining the use of hctx_lock() instead of
  rcu_read_lock() in completion path.

Changes from [v2]

- Possible extended looping around seqcount and u64_stat_sync fixed.

- Misplaced MQ_RQ_IDLE state setting fixed.

- RQF_MQ_TIMEOUT_EXPIRED added to prevent firing the same timeout
  multiple times.

- s/queue_rq_src/srcu/ patch added.

- Other misc changes.

Changes from [v1]

- BLK_EH_RESET_TIMER handling fixed.

- s/request->gstate_seqc/request->gstate_seq/

- READ_ONCE() added to blk_mq_rq_udpate_state().

- Removed left over blk_clear_rq_complete() invocation from
  blk_mq_rq_timed_out().

Currently, blk-mq timeout path synchronizes against the usual
issue/completion path using a complex scheme involving atomic
bitflags, REQ_ATOM_*, memory barriers and subtle memory coherence
rules.  Unfortunatley, it contains quite a few holes.

It's pretty easy to make blk_mq_check_expired() terminate a later
instance of a request.  If we induce 5 sec delay before
time_after_eq() test in blk_mq_check_expired(), shorten the timeout to
2s, and issue back-to-back large IOs, blk-mq starts timing out
requests spuriously pretty quickly.  Nothing actually timed out.  It
just made the call on a recycle instance of a request and then
terminated a later instance long after the original instance finished.
The scenario isn't theoretical either.

This patchset replaces the broken synchronization mechanism with a RCU
and generation number based one.  Please read the patch description of
the second path for more details.

This patchset contains the following eight patches.

0001-blk-mq-move-hctx-lock-unlock-into-a-helper.patch
0002-blk-mq-protect-completion-path-with-RCU.patch
0003-blk-mq-replace-timeout-synchronization-with-a-RCU-an.patch
0004-blk-mq-use-blk_mq_rq_state-instead-of-testing-REQ_AT.patch
0005-blk-mq-make-blk_abort_request-trigger-timeout-path.patch
0006-blk-mq-remove-REQ_ATOM_COMPLETE-usages-from-blk-mq.patch
0007-blk-mq-remove-REQ_ATOM_STARTED.patch
0008-blk-mq-rename-blk_mq_hw_ctx-queue_rq_srcu-to-srcu.patch

and is available in the following git branch.

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git blk-mq-timeout-v5

diffstat follows.  Thanks.

 block/blk-core.c   |2 
 block/blk-mq-debugfs.c |4 
 block/blk-mq.c |  346 +
 block/blk-mq.h |   49 ++
 block/blk-timeout.c|   16 +-
 block/blk.h|7 
 include/linux/blk-mq.h |3 
 include/linux/blkdev.h |   25 +++
 8 files changed, 294 insertions(+), 158 deletions(-)

--
tejun

[v4] http://lkml.kernel.org/r/20180108191542.379478-1...@kernel.org
[v3] http://lkml.kernel.org/r/20171216120726.517153-1...@kernel.org
[v2] http://lkml.kernel.org/r/20171010155441.753966-1...@kernel.org
[v1] http://lkml.kernel.org/r/20171209192525.982030-1...@kernel.org
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html