Re: [PATCH RFC 00/14] Add the BFQ I/O Scheduler to blk-mq

2017-03-31 Thread Paolo Valente

> Il giorno 06 mar 2017, alle ore 08:43, Markus Trippelsdorf 
>  ha scritto:
> 
> On 2017.03.04 at 17:01 +0100, Paolo Valente wrote:
>> Hi,
>> at last, here is my first patch series meant for merging. It adds BFQ
>> to blk-mq. Don't worry, in this message I won't bore you again with
>> the wonderful properties of BFQ :)
> 
> I gave BFQ a quick try. Unfortunately it hangs when I try to delete
> btrfs snapshots:
> 
> root   124  0.0  0.0  0 0 ?D07:19   0:03 
> [btrfs-cleaner]  
> root   125  0.0  0.0  0 0 ?D07:19   0:00 
> [btrfs-transacti]
> 
> [ 4372.880116] sysrq: SysRq : Show Blocked State
> [ 4372.880125]   taskPC stack   pid father
> [ 4372.880148] btrfs-cleaner   D0   124  2 0x
> [ 4372.880156] Call Trace:
> [ 4372.880166]  ? __schedule+0x160/0x7c0
> [ 4372.880174]  ? io_schedule+0x64/0xe0
> [ 4372.880179]  ? wait_on_page_bit+0x7a/0x100
> [ 4372.880183]  ? devm_memunmap+0x40/0x40
> [ 4372.880189]  ? read_extent_buffer_pages+0x25c/0x2c0
> [ 4372.880195]  ? run_one_async_done+0xc0/0xc0
> [ 4372.880200]  ? btree_read_extent_buffer_pages+0x60/0x2e0
> [ 4372.880206]  ? read_tree_block+0x2c/0x60
> [ 4372.880211]  ? read_block_for_search.isra.38+0xec/0x3a0
> [ 4372.880217]  ? btrfs_search_slot+0x214/0xbc0
> [ 4372.880221]  ? lookup_inline_extent_backref+0xfb/0x8c0
> [ 4372.880225]  ? __btrfs_free_extent.isra.74+0xe9/0xdc0
> [ 4372.880231]  ? btrfs_merge_delayed_refs+0x57/0x6e0
> [ 4372.880235]  ? __btrfs_run_delayed_refs+0x60d/0x1340
> [ 4372.880239]  ? btrfs_run_delayed_refs+0x64/0x280
> [ 4372.880243]  ? btrfs_should_end_transaction+0x3b/0xa0
> [ 4372.880247]  ? btrfs_drop_snapshot+0x3b2/0x800
> [ 4372.880251]  ? __schedule+0x168/0x7c0
> [ 4372.880254]  ? btrfs_clean_one_deleted_snapshot+0xa4/0xe0
> [ 4372.880259]  ? cleaner_kthread+0x13a/0x180
> [ 4372.880264]  ? btree_invalidatepage+0xc0/0xc0
> [ 4372.880268]  ? kthread+0x144/0x180
> [ 4372.880272]  ? kthread_flush_work_fn+0x20/0x20
> [ 4372.880277]  ? ret_from_fork+0x23/0x30
> [ 4372.880280] btrfs-transacti D0   125  2 0x
> [ 4372.880285] Call Trace:
> [ 4372.880290]  ? __schedule+0x160/0x7c0
> [ 4372.880295]  ? io_schedule+0x64/0xe0
> [ 4372.880300]  ? wait_on_page_bit_common.constprop.57+0x160/0x180
> [ 4372.880303]  ? devm_memunmap+0x40/0x40
> [ 4372.880307]  ? __filemap_fdatawait_range+0xd3/0x140
> [ 4372.880311]  ? clear_state_bit.constprop.82+0xf7/0x180
> [ 4372.880315]  ? __clear_extent_bit.constprop.79+0x138/0x3c0
> [ 4372.880319]  ? filemap_fdatawait_range+0x9/0x60
> [ 4372.880323]  ? __btrfs_wait_marked_extents.isra.18+0xc1/0x100
> [ 4372.880327]  ? btrfs_write_and_wait_marked_extents.constprop.23+0x49/0x80
> [ 4372.880331]  ? btrfs_commit_transaction+0x8e1/0xb00
> [ 4372.880334]  ? join_transaction.constprop.24+0x10/0xa0
> [ 4372.880340]  ? wake_bit_function+0x60/0x60
> [ 4372.880345]  ? transaction_kthread+0x185/0x1a0
> [ 4372.880350]  ? btrfs_cleanup_transaction+0x500/0x500
> [ 4372.880354]  ? kthread+0x144/0x180
> [ 4372.880358]  ? kthread_flush_work_fn+0x20/0x20
> [ 4372.880362]  ? ret_from_fork+0x23/0x30
> [ 4372.880367] ntpdD0   175  1 0x0004
> [ 4372.880372] Call Trace:
> [ 4372.880375]  ? __schedule+0x160/0x7c0
> [ 4372.880379]  ? schedule_preempt_disabled+0x2d/0x80
> [ 4372.880383]  ? __mutex_lock.isra.5+0x17b/0x4c0
> [ 4372.880386]  ? wait_current_trans+0x15/0xc0
> [ 4372.880391]  ? btrfs_free_path+0xe/0x20
> [ 4372.880395]  ? btrfs_pin_log_trans+0x14/0x40
> [ 4372.880400]  ? btrfs_rename2+0x28e/0x19c0
> [ 4372.880404]  ? path_init+0x187/0x3e0
> [ 4372.880407]  ? unlazy_walk+0x4b/0x100
> [ 4372.880410]  ? terminate_walk+0x8d/0x100
> [ 4372.880414]  ? filename_parentat+0x1e9/0x2c0
> [ 4372.880420]  ? __kmalloc_track_caller+0xc4/0x100
> [ 4372.880424]  ? vfs_rename+0x33f/0x7e0
> [ 4372.880428]  ? SYSC_renameat2+0x53c/0x680
> [ 4372.880433]  ? entry_SYSCALL_64_fastpath+0x13/0x94
> [ 4372.880437] fcron   D0   178  1 0x
> [ 4372.880441] Call Trace:
> [ 4372.880445]  ? __schedule+0x160/0x7c0
> [ 4372.880448]  ? schedule_preempt_disabled+0x2d/0x80
> [ 4372.880452]  ? __mutex_lock.isra.5+0x17b/0x4c0
> [ 4372.880458]  ? pagevec_lookup_tag+0x18/0x20
> [ 4372.880462]  ? btrfs_log_dentry_safe+0x4cd/0xac0
> [ 4372.880466]  ? btrfs_start_transaction+0x249/0x460
> [ 4372.880470]  ? btrfs_sync_file+0x288/0x3c0
> [ 4372.880475]  ? btrfs_file_write_iter+0x3a9/0x4e0
> [ 4372.880479]  ? vfs_write+0x26c/0x2c0
> [ 4372.880483]  ? SyS_write+0x3d/0xa0
> [ 4372.880486]  ? SyS_fchown+0x7b/0xa0
> [ 4372.880491]  ? entry_SYSCALL_64_fastpath+0x13/0x94
> [ 4372.880508] kworker/u8:8D0   759  2 0x
> [ 4372.880518] Workqueue: btrfs-submit btrfs_submit_helper
> [ 4372.880520] Call Trace:
> [ 4372.880524]  ? __schedule+0x160/0x7c0
> [ 4372.880529]  ? io_schedule+0x64/0xe0
> [ 4372.880534]  ? blk_mq_get_tag+0x212/0x320
> [ 4372.880538]  ? wake_bit_function+0x60/0x60
> [ 

Re: [PATCH RFC 00/14] Add the BFQ I/O Scheduler to blk-mq

2017-03-20 Thread Jens Axboe
On 03/18/2017 01:46 PM, Bart Van Assche wrote:
> On Sat, 2017-03-18 at 18:09 +0100, Linus Walleij wrote:
>> On Sat, Mar 18, 2017 at 11:52 AM, Paolo Valente
>>  wrote:
 Il giorno 14 mar 2017, alle ore 16:32, Bart Van Assche 
  ha scritto:
 (...) what should
 a developer do who only has access to a small subset of all the storage
 devices that are supported by the Linux kernel and hence who can not run 
 the
 benchmark against every supported storage device?
>>
>> Don't we use the community for that? We are dependent on people
>> downloading and testing our code eventually, I mean sure it's good if
>> we make some reasonable effort to test changes we do, but we are
>> only humans, and we get corrected by the experience of other humans.
> 
> Hello Linus,
> 
> Do you mean relying on the community to test other storage devices
> before or after a patch is upstream? Relying on the community to file
> bug reports after a patch is upstream would be wrong. The Linux kernel
> should not be used for experiments. As you know patches that are sent
> upstream should not introduce regressions.

I think there are two main aspects to this:

1) Stability issues
2) Performance issues

For stability issues, obviously we expect BFQ to be bug free when
merged. In practical matters, this means that it doesn't have any known
pending issues, since we obviously cannot guarantee that the code is Bug
Free in general.

From a performance perspective, using BFQ is absolutely known to
introduce regressions when used on certain types of storage. It works
well on single queue rotating devices. It'll tank your NVMe device
performance. I don't think this is necessarily a problem. By
default, BFQ will not be enabled anywhere. It's a scheduler that is
available in the system, and users can opt in if they desire to use BFQ.
I'm expecting distros to do the right thing with udev rules here.
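For reference, a distro udev rule of the kind alluded to here might look
roughly like this (an illustrative sketch only; the rule file name,
device match, and available scheduler names depend on the distribution
and kernel version):

```
# /etc/udev/rules.d/60-ioscheduler.rules (hypothetical path)
# Select bfq for rotational disks; leave non-rotational devices
# (e.g. NVMe) on the blk-mq default scheduler.
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="1", \
    ATTR{queue/scheduler}="bfq"
```

The rule keys off the `queue/rotational` sysfs attribute, so opting in
stays automatic and per-device rather than a system-wide default.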

> My primary concern about BFQ is that it is a very complicated I/O
> scheduler and also that the concepts used internally in that I/O
> scheduler are far away from the concepts we are used to when reasoning
> about I/O devices. I'm concerned that this will make the BFQ I/O
> scheduler hard to maintain.

That is also my main concern, which is why I'm trying to go through the
code and suggest areas where it can be improved. It'd be great if it was
more modular, for instance, it's somewhat cumbersome to wade through
nine thousand lines of code. It's my hope that we can improve this
aspect of it.

Understanding the actual algorithms is a separate issue. But in that
regard I do think that BFQ is more forgiving than CFQ, since there are
actual papers detailing how it works. If implemented as cleanly as
possible, we can't really make it any easier to understand. It's not a
trivial topic.

-- 
Jens Axboe



Re: [PATCH RFC 00/14] Add the BFQ I/O Scheduler to blk-mq

2017-03-19 Thread Paolo Valente

> Il giorno 18 mar 2017, alle ore 13:46, Bart Van Assche 
>  ha scritto:
> 
> On Sat, 2017-03-18 at 18:09 +0100, Linus Walleij wrote:
>> On Sat, Mar 18, 2017 at 11:52 AM, Paolo Valente
>>  wrote:
 Il giorno 14 mar 2017, alle ore 16:32, Bart Van Assche 
  ha scritto:
 (...) what should
 a developer do who only has access to a small subset of all the storage
 devices that are supported by the Linux kernel and hence who can not run 
 the
 benchmark against every supported storage device?
>> 
>> Don't we use the community for that? We are dependent on people
>> downloading and testing our code eventually, I mean sure it's good if
>> we make some reasonable effort to test changes we do, but we are
>> only humans, and we get corrected by the experience of other humans.
> 
> Hello Linus,
> 
> Do you mean relying on the community to test other storage devices before
> or after a patch is upstream? Relying on the community to file bug reports
> after a patch is upstream would be wrong. The Linux kernel should not be
> used for experiments. As you know patches that are sent upstream should
> not introduce regressions.
> 
> My primary concern about BFQ is that it is a very complicated I/O scheduler
> and also that the concepts used internally in that I/O scheduler are far
> away from the concepts we are used to when reasoning about I/O devices.

Hi Bart,
could you elaborate a little bit more on this?  To hopefully help you
highlight where the problem is, here is a summary of what the patches
introduce.

1.  BFQ engine.  This initial piece of code has been obtained mainly
by copying (verbatim) CFQ, replacing all cfq_ prefixes with bfq_,
replacing the round-robin algorithm at the heart of CFQ with WF2Q+, a
well-known and widely studied variant of the classical WFQ algorithm,
and, finally, by adapting the code around the new engine to accommodate
the latter.  In particular, budgets, measured in number of sectors, are
used instead of time slices, to achieve bandwidth fairness.

2. Support for cgroups and hierarchical scheduling.

3.  Heuristics to improve service quality and boost throughput.  These
additional pieces are introduced and documented one by one.  The most
complex are: improving responsiveness by privileging the I/O of
interactive applications, improving audio/video playback/streaming by
privileging their I/O, boosting throughput with interleaved I/O (such
as KVM I/O) by merging the queues associated with the processes doing
such I/O, and boosting throughput for applications that span several
processes.
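To make the budget idea in point 1 concrete, here is a deliberately tiny
sketch (hypothetical names, not BFQ's actual code) of weighted fair
queueing with budgets in sectors: each dispatch charges budget/weight to
the queue's virtual finish time, and the queue with the smallest finish
time is served next. Real WF2Q+ additionally applies an eligibility test
and a more careful virtual clock, both omitted here:

```python
# Toy model of budget-based fair queueing (illustrative only).
# A queue's "finish" is the virtual time at which its last charged
# budget completes; serving the smallest finish time yields bandwidth
# shares proportional to weight, independent of per-request service time.

class Queue:
    def __init__(self, name, weight):
        self.name, self.weight, self.finish = name, weight, 0.0

def dispatch(queues, rounds, budget=8):
    """Serve `rounds` budgets of `budget` sectors each; return sectors
    served per queue."""
    served = {q.name: 0 for q in queues}
    vtime = 0.0
    for _ in range(rounds):
        q = min(queues, key=lambda x: x.finish)   # smallest finish wins
        start = max(vtime, q.finish)              # can't start in the past
        q.finish = start + budget / q.weight      # charge sectors/weight
        vtime = min(x.finish for x in queues)     # crude virtual clock
        served[q.name] += budget
    return served

served = dispatch([Queue("a", 2), Queue("b", 1)], rounds=30)
# queue "a" (weight 2) ends up with twice the sectors of queue "b"
```

Because the charge is in sectors rather than time, a queue issuing slow
random I/O is not penalized in bandwidth relative to one issuing fast
sequential I/O, which is the fairness property the cover letter claims.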

Which of these contributions contain deviations from the I/O concepts
you are used to, and what are these deviations?

Thanks,
Paolo


> I'm
> concerned that this will make the BFQ I/O scheduler hard to maintain.
> 
> Bart.



Re: [PATCH RFC 00/14] Add the BFQ I/O Scheduler to blk-mq

2017-03-18 Thread Linus Walleij
On Sat, Mar 18, 2017 at 6:46 PM, Bart Van Assche
 wrote:
> On Sat, 2017-03-18 at 18:09 +0100, Linus Walleij wrote:
>> On Sat, Mar 18, 2017 at 11:52 AM, Paolo Valente
>>  wrote:
>> > > Il giorno 14 mar 2017, alle ore 16:32, Bart Van Assche 
>> > >  ha scritto:
>> > > (...) what should
>> > > a developer do who only has access to a small subset of all the storage
>> > > devices that are supported by the Linux kernel and hence who can not run 
>> > > the
>> > > benchmark against every supported storage device?
>>
>> Don't we use the community for that? We are dependent on people
>> downloading and testing our code eventually, I mean sure it's good if
>> we make some reasonable effort to test changes we do, but we are
>> only humans, and we get corrected by the experience of other humans.
>
> Hello Linus,
>
> Do you mean relying on the community to test other storage devices before
> or after a patch is upstream?

I guess they should test it when it is in linux-next?

> Relying on the community to file bug reports
> after a patch is upstream would be wrong. The Linux kernel should not be
> used for experiments. As you know patches that are sent upstream should
> not introduce regressions.

Yeah still people introduce regressions and we have 7-8 release
candidates for each kernel because of this. Humans have flaws
I guess.

> My primary concern about BFQ is that it is a very complicated I/O scheduler
> and also that the concepts used internally in that I/O scheduler are far
> away from the concepts we are used to when reasoning about I/O devices. I'm
> concerned that this will make the BFQ I/O scheduler hard to maintain.

I understand that. Let's follow all rules of thumb that make code
easy to maintain.

We have pretty broad agreement on what makes code easy to
maintain on the syntactic level, checkpatch and some manual inspection
easily gives that.

I think where we need the most brainshare is in how to make semantics
maintainable. It's no fun reading terse code and trying to figure out how
the developer writing it was thinking, so let's focus on anything we do not
understand and make it understandable; it seems Paolo is on this task,
from what I can tell.

Yours,
Linus Walleij


Re: [PATCH RFC 00/14] Add the BFQ I/O Scheduler to blk-mq

2017-03-18 Thread Bart Van Assche
On Sat, 2017-03-18 at 18:09 +0100, Linus Walleij wrote:
> On Sat, Mar 18, 2017 at 11:52 AM, Paolo Valente
>  wrote:
> > > Il giorno 14 mar 2017, alle ore 16:32, Bart Van Assche 
> > >  ha scritto:
> > > (...) what should
> > > a developer do who only has access to a small subset of all the storage
> > > devices that are supported by the Linux kernel and hence who can not run 
> > > the
> > > benchmark against every supported storage device?
> 
> Don't we use the community for that? We are dependent on people
> downloading and testing our code eventually, I mean sure it's good if
> we make some reasonable effort to test changes we do, but we are
> only humans, and we get corrected by the experience of other humans.

Hello Linus,

Do you mean relying on the community to test other storage devices before
or after a patch is upstream? Relying on the community to file bug reports
after a patch is upstream would be wrong. The Linux kernel should not be
used for experiments. As you know patches that are sent upstream should
not introduce regressions.

My primary concern about BFQ is that it is a very complicated I/O scheduler
and also that the concepts used internally in that I/O scheduler are far
away from the concepts we are used to when reasoning about I/O devices. I'm
concerned that this will make the BFQ I/O scheduler hard to maintain.

Bart.

Re: [PATCH RFC 00/14] Add the BFQ I/O Scheduler to blk-mq

2017-03-18 Thread Linus Walleij
On Sat, Mar 18, 2017 at 11:52 AM, Paolo Valente
 wrote:
>> Il giorno 14 mar 2017, alle ore 16:32, Bart Van Assche 
>>  ha scritto:

>> (...) what should
>> a developer do who only has access to a small subset of all the storage
>> devices that are supported by the Linux kernel and hence who can not run the
>> benchmark against every supported storage device?

Don't we use the community for that? We are dependent on people
downloading and testing our code eventually, I mean sure it's good if
we make some reasonable effort to test changes we do, but we are
only humans, and we get corrected by the experience of other humans.

>> Do developers who do not
>> fully understand the BFQ algorithms and who run into a performance problem
>> have any other option than trial and error for fixing such performance 
>> issues?
>
> Hi Bart,
> maybe I got your point even before, but I did not reply consistently.
> You are highlighting an important problem, which, I think, can be
> stated in more general terms: if one makes a change in any complex
> component, which, in its turn, interacts with complex I/O devices,
> then it is hard, if ever possible, to prove, that that change will
> cause no regression with any possible device, just by speculation.
> Actually, facts show that this often holds even for simple components,
> given the complexity of the environment in which they work.  Of
> course, if not only the component is complex, but who modifies it does
> not even fully understand how that component works, then regressions
> on untested devices are certainly more probable.

You are running a host of benchmarks on a host of devices, using
the fio tool that Jens devised for this kind of tests. What more can be
asked? More tests, more devices?
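For context, the kind of fio job file used in such scheduler comparisons
might look roughly like this (an illustrative sketch; the device path,
block sizes, and runtimes are placeholders, not the actual suite's
parameters):

```
; compare-sched.fio -- run with: fio compare-sched.fio
; (hypothetical job file; reads only, so non-destructive)
[global]
filename=/dev/sdb        ; device under test (placeholder)
direct=1                 ; bypass the page cache
ioengine=libaio
iodepth=16
runtime=60
time_based=1

[seq-read]
rw=read
bs=128k

[rand-read]
stonewall                ; start after seq-read finishes, not in parallel
rw=randread
bs=4k
```

Running the same jobs once per scheduler (switching via
/sys/block/<dev>/queue/scheduler between runs) is the usual way to get a
throughput and latency comparison on a given device.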

If you increase the amount of proof that is requested for any change
to any computer program not to cause unintended side effects or
regressions, you will eventually end up with the brick wall
"solve the halting problem".

Alternatively "test it forever on all systems in the world".

It eventually becomes absurd.

This actually occurred to me... in a certain mission-critical algorithm
my department was requested to "prove that this will run to completion".
I was baffled and said that what they were requesting was that I
solve the halting problem. It turned out they just wanted something like
a comprehensible test suite.

Yours,
Linus Walleij


Re: [PATCH RFC 00/14] Add the BFQ I/O Scheduler to blk-mq

2017-03-18 Thread Paolo Valente

> Il giorno 14 mar 2017, alle ore 16:32, Bart Van Assche 
>  ha scritto:
> 
> On Tue, 2017-03-14 at 16:35 +0100, Paolo Valente wrote:
>>> Il giorno 07 mar 2017, alle ore 02:00, Bart Van Assche 
>>>  ha scritto:
>>> 
>>> Additionally, the complexity of the code is huge. Just like for CFQ,
>>> sooner or later someone will run into a bug or a performance issue
>>> and will post a patch to fix it. However, the complexity of BFQ is
>>> such that a source code review alone won't be sufficient to verify
>>> whether or not such a patch negatively affects a workload or device
>>> that has not been tested by the author of the patch. This makes me
>>> wonder what process should be followed to verify future BFQ patches?
>> 
>> Third and last, a proposal: why don't we discuss this issue at LSF
>> too?  In particular, we could talk about the parts of BFQ that seem
>> more complex to understand, until they become clearer to you.  Then I
>> could try to understand what helped make them clearer, and translate
>> it into extra comments in the code or into other, more radical
>> changes.
> 
> Hello Paolo,
> 
> Sorry if my comment was not clear enough. Suppose that e.g. someone would
> like to modify the following code:
> 
> static int bfq_min_budget(struct bfq_data *bfqd)
> {
>   if (bfqd->budgets_assigned < bfq_stats_min_budgets)
>   return bfq_default_max_budget / 32;
>   else
>   return bfqd->bfq_max_budget / 32;
> }
> 
> How to predict the performance impact of any change to e.g. this function?
> It is really great that a performance benchmark is available. But what should
> a developer do who only has access to a small subset of the storage
> devices supported by the Linux kernel, and who hence cannot run the
> benchmark against every supported storage device? Do developers who do not
> fully understand the BFQ algorithms and who run into a performance problem
> have any option other than trial and error for fixing such performance issues?
> 

Hi Bart,
maybe I already got your point earlier, but I did not reply to it
properly.  You are highlighting an important problem, which, I think,
can be stated in more general terms: if one makes a change to any
complex component, which, in its turn, interacts with complex I/O
devices, then it is hard, if at all possible, to prove by reasoning
alone that that change will cause no regression on any possible
device.  Actually, experience shows that this often holds even for
simple components, given the complexity of the environment in which
they work.  Of course, if the component is not only complex but is
also modified by someone who does not fully understand how it works,
then regressions on untested devices are even more probable.

These general considerations are the motivation for my previous
proposals: reduce complexity by breaking BFQ into simpler,
independent pieces; fix or improve the documentation where needed or
useful (why don't we discuss the most obscure parts at LSF/MM?); and
use a fixed set of benchmarks to catch regressions.  Any other
proposal is more than welcome.

Thanks,
Paolo


> Thanks,
> 
> Bart.



Re: [PATCH RFC 00/14] Add the BFQ I/O Scheduler to blk-mq

2017-03-14 Thread Bart Van Assche
On Tue, 2017-03-14 at 16:35 +0100, Paolo Valente wrote:
> > Il giorno 07 mar 2017, alle ore 02:00, Bart Van Assche 
> >  ha scritto:
> > 
> > Additionally, the complexity of the code is huge. Just like for CFQ,
> > sooner or later someone will run into a bug or a performance issue
> > and will post a patch to fix it. However, the complexity of BFQ is
> > such that a source code review alone won't be sufficient to verify
> > whether or not such a patch negatively affects a workload or device
> > that has not been tested by the author of the patch. This makes me
> > wonder what process should be followed to verify future BFQ patches?
> 
> Third and last, a proposal: why don't we discuss this issue at LSF
> too?  In particular, we could talk about the parts of BFQ that seem
> more complex to understand, until they become clearer to you.  Then I
> could try to understand what helped make them clearer, and translate
> it into extra comments in the code or into other, more radical
> changes.

Hello Paolo,

Sorry if my comment was not clear enough. Suppose that e.g. someone would
like to modify the following code:

static int bfq_min_budget(struct bfq_data *bfqd)
{
	if (bfqd->budgets_assigned < bfq_stats_min_budgets)
		return bfq_default_max_budget / 32;
	else
		return bfqd->bfq_max_budget / 32;
}

How to predict the performance impact of any change to e.g. this function?
It is really great that a performance benchmark is available. But what should
a developer do who only has access to a small subset of the storage
devices supported by the Linux kernel, and who hence cannot run the
benchmark against every supported storage device? Do developers who do not
fully understand the BFQ algorithms and who run into a performance problem
have any option other than trial and error for fixing such performance issues?

Thanks,

Bart.

Re: [PATCH RFC 00/14] Add the BFQ I/O Scheduler to blk-mq

2017-03-14 Thread Jens Axboe
On 03/14/2017 09:35 AM, Paolo Valente wrote:
> First, I've developed BFQ in a sort of
> first-the-problem-then-the-solution way.  That is, each time, I have
> first implemented a benchmark that enabled me to highlight the problem
> and get all relevant statistics on it, then I have worked on BFQ to
> try to solve that problem, using the benchmark as a support.  All
> those benchmarks are in the public S suite now.  In particular, by
> running one script, and waiting at most one hour, you get graphs of
> - throughput with read/write/random/sequential workloads
> - start-up times of bash, xterm, gnome terminal and libreoffice, when
> all the above combinations of workloads are executed in the background
> - frame drop rate for the playback of a movie, again with both all the
> above combinations of workloads and the recurrent start of a bash
> shell in the background
> - kernel-task execution times (compilation, merge, ...), again with
> all the above combinations of workloads in the background
> - fairness with various combinations of weights and processes
> - throughput against interleaved I/O, with a number of readers ranging
> from 2 to 9
> 
> Every time I fix a bug, add a new feature or port BFQ to a new kernel
> version, I just run that script and compare new graphs with previous
> ones.  Any regression shows up immediately.  We already have a
> similar, working script for Android too, although covering only
> throughput, responsiveness and frame drops for the moment.  Of course,
> the coverage of these scripts is limited to only the goals for which I
> have devised and tuned BFQ so far.  But I hope that it won't be too
> hard to extend them to other important use cases (e.g., dbms).

This is great, btw, and a really nice tool set to have when evaluating
new changes.

> Second, IMO BFQ is complex also because it contains a lot of features.
> We have adopted the usual approach for handling this type of
> complexity: find clean cuts to get independent pieces, and put each
> piece in a separate file, plus one header glue file.  The pieces were:
> scheduling engine, hierarchical-scheduling support (allowing the
> engine to schedule generic nodes in the hierarchy), cgroups support.
> Yet, Tejun last year, and Jens more recently, have asked to put
> everything in one file; for other good reasons of course.  If you do
> think that turning back to multiple files may somehow help, and there
> are no strong objections from others, then I'm willing to resume this
> option and possibly find even better splits.
> 
> Third and last, a proposal: why don't we discuss this issue at LSF
> too?  In particular, we could talk about the parts of BFQ that seem
> more complex to understand, until they become clearer to you.  Then I
> could try to understand what helped make them clearer, and translate
> it into extra comments in the code or into other, more radical
> changes.

A big issue here is the lack of nicely structured code. It's one massive
file of code, 8751 lines, or almost 270K of code. It might be a lot
easier to read and understand if it was split into smaller files,
containing various parts of it.

-- 
Jens Axboe



Re: [PATCH RFC 00/14] Add the BFQ I/O Scheduler to blk-mq

2017-03-14 Thread Paolo Valente

> Il giorno 07 mar 2017, alle ore 02:00, Bart Van Assche 
>  ha scritto:
> 
> On Sat, 2017-03-04 at 17:01 +0100, Paolo Valente wrote:
>> Finally, a few details on the patchset.
>> 
>> The first two patches introduce BFQ-v0, which is more or less the
>> first version of BFQ submitted a few years ago [1].  The remaining
>> patches turn progressively BFQ-v0 into BFQ-v8r8, the current version
>> of BFQ.
> 
> Hello Paolo,
> 

Hi Bart,

> Thank you for having done the work to improve, test, fix and post the
> BFQ scheduler as a patch series. However, from what I have seen in the
> patches there is a large number of tunable constants in the code for
> which no scientific approach exists to choose an optimal value.

I'm very sorry about that: I exported those parameters over the
years just as an aid for debugging and tuning, and then I forgot to
remove them :(

They'll disappear in my next submission.

> Additionally, the complexity of the code is huge. Just like for CFQ,
> sooner or later someone will run into a bug or a performance issue
> and will post a patch to fix it. However, the complexity of BFQ is
> such that a source code review alone won't be sufficient to verify
> whether or not such a patch negatively affects a workload or device
> that has not been tested by the author of the patch. This makes me
> wonder what process should be followed to verify future BFQ patches?
> 

I've a sort of triple reply for that.

First, I've developed BFQ in a sort of
first-the-problem-then-the-solution way.  That is, each time, I have
first implemented a benchmark that enabled me to highlight the problem
and get all relevant statistics on it, then I have worked on BFQ to
try to solve that problem, using the benchmark as a support.  All
those benchmarks are in the public S suite now.  In particular, by
running one script, and waiting at most one hour, you get graphs of
- throughput with read/write/random/sequential workloads
- start-up times of bash, xterm, gnome terminal and libreoffice, when
all the above combinations of workloads are executed in the background
- frame drop rate for the playback of a movie, again with both all the
above combinations of workloads and the recurrent start of a bash
shell in the background
- kernel-task execution times (compilation, merge, ...), again with
all the above combinations of workloads in the background
- fairness with various combinations of weights and processes
- throughput against interleaved I/O, with a number of readers ranging
from 2 to 9

Every time I fix a bug, add a new feature or port BFQ to a new kernel
version, I just run that script and compare new graphs with previous
ones.  Any regression shows up immediately.  We already have a
similar, working script for Android too, although covering only
throughput, responsiveness and frame drops for the moment.  Of course,
the coverage of these scripts is limited to only the goals for which I
have devised and tuned BFQ so far.  But I hope that it won't be too
hard to extend them to other important use cases (e.g., dbms).

Second, IMO BFQ is complex also because it contains a lot of features.
We have adopted the usual approach for handling this type of
complexity: find clean cuts to get independent pieces, and put each
piece in a separate file, plus one header glue file.  The pieces were:
scheduling engine, hierarchical-scheduling support (allowing the
engine to schedule generic nodes in the hierarchy), cgroups support.
Yet, Tejun last year, and Jens more recently, have asked to put
everything in one file; for other good reasons of course.  If you do
think that turning back to multiple files may somehow help, and there
are no strong objections from others, then I'm willing to resume this
option and possibly find even better splits.

Third and last, a proposal: why don't we discuss this issue at LSF
too?  In particular, we could talk about the parts of BFQ that seem
more complex to understand, until they become clearer to you.  Then I
could try to understand what helped make them clearer, and translate
it into extra comments in the code or into other, more radical
changes.

Thanks,
Paolo

> Thanks,
> 
> Bart.



Re: [PATCH RFC 00/14] Add the BFQ I/O Scheduler to blk-mq

2017-03-14 Thread Paolo Valente

> Il giorno 07 mar 2017, alle ore 01:22, Bart Van Assche 
>  ha scritto:
> 
> On 03/04/2017 08:01 AM, Paolo Valente wrote:
>> Some patches generate WARNINGS with checkpatch.pl, but these WARNINGS
>> seem to be either unavoidable for the involved pieces of code (which
>> the patches just extend), or false positives.
> 
> The code in this series looks reasonably clean from a code style point
> of view,

Great, thanks!

> but please address all checkpatch warnings that can be
> addressed easily. A few examples of such checkpatch warnings:
> 
> ERROR: "foo * bar" should be "foo *bar"
> 

The offending line is:

*(__PTR) = (u64)__data * NSEC_PER_USEC;

Here the '*' is a multiplication, not a pointer declaration, so this
seems to be a false positive.

> WARNING: Symbolic permissions 'S_IRUGO|S_IWUSR' are not preferred.
> Consider using octal permissions '0644'.
> 

I have used symbolic permissions because I find them much easier to
remember and decode than numeric constants, and because that is what
is done in cfq-iosched.c, deadline-iosched.c and now mq-deadline.c.
But, since you share this checkpatch complaint, I will switch to
octal constants.

Thanks,
Paolo

> Thanks,
> 
> Bart.



Re: [PATCH RFC 00/14] Add the BFQ I/O Scheduler to blk-mq

2017-03-06 Thread Bart Van Assche
On 03/04/2017 08:01 AM, Paolo Valente wrote:
> Some patches generate WARNINGS with checkpatch.pl, but these WARNINGS
> seem to be either unavoidable for the involved pieces of code (which
> the patches just extend), or false positives.

The code in this series looks reasonably clean from a code style point
of view, but please address all checkpatch warnings that can be
addressed easily. A few examples of such checkpatch warnings:

ERROR: "foo * bar" should be "foo *bar"

WARNING: Symbolic permissions 'S_IRUGO|S_IWUSR' are not preferred.
Consider using octal permissions '0644'.

Thanks,

Bart.


Re: [PATCH RFC 00/14] Add the BFQ I/O Scheduler to blk-mq

2017-03-06 Thread Bart Van Assche
On Sat, 2017-03-04 at 17:01 +0100, Paolo Valente wrote:
> Finally, a few details on the patchset.
> 
> The first two patches introduce BFQ-v0, which is more or less the
> first version of BFQ submitted a few years ago [1].  The remaining
> patches turn progressively BFQ-v0 into BFQ-v8r8, the current version
> of BFQ.

Hello Paolo,

Thank you for having done the work to improve, test, fix and post the
BFQ scheduler as a patch series. However, from what I have seen in the
patches there is a large number of tunable constants in the code for
which no scientific approach exists to choose an optimal value.
Additionally, the complexity of the code is huge. Just like for CFQ,
sooner or later someone will run into a bug or a performance issue
and will post a patch to fix it. However, the complexity of BFQ is
such that a source code review alone won't be sufficient to verify
whether or not such a patch negatively affects a workload or device
that has not been tested by the author of the patch. This makes me
wonder what process should be followed to verify future BFQ patches?

Thanks,

Bart.

Re: [PATCH RFC 00/14] Add the BFQ I/O Scheduler to blk-mq

2017-03-05 Thread Markus Trippelsdorf
On 2017.03.04 at 17:01 +0100, Paolo Valente wrote:
> Hi,
> at last, here is my first patch series meant for merging. It adds BFQ
> to blk-mq. Don't worry, in this message I won't bore you again with
> the wonderful properties of BFQ :)

I gave BFQ a quick try. Unfortunately it hangs when I try to delete
btrfs snapshots:

root   124  0.0  0.0  0 0 ?D07:19   0:03 
[btrfs-cleaner]  
root   125  0.0  0.0  0 0 ?D07:19   0:00 
[btrfs-transacti]

[ 4372.880116] sysrq: SysRq : Show Blocked State
[ 4372.880125]   taskPC stack   pid father
[ 4372.880148] btrfs-cleaner   D0   124  2 0x
[ 4372.880156] Call Trace:
[ 4372.880166]  ? __schedule+0x160/0x7c0
[ 4372.880174]  ? io_schedule+0x64/0xe0
[ 4372.880179]  ? wait_on_page_bit+0x7a/0x100
[ 4372.880183]  ? devm_memunmap+0x40/0x40
[ 4372.880189]  ? read_extent_buffer_pages+0x25c/0x2c0
[ 4372.880195]  ? run_one_async_done+0xc0/0xc0
[ 4372.880200]  ? btree_read_extent_buffer_pages+0x60/0x2e0
[ 4372.880206]  ? read_tree_block+0x2c/0x60
[ 4372.880211]  ? read_block_for_search.isra.38+0xec/0x3a0
[ 4372.880217]  ? btrfs_search_slot+0x214/0xbc0
[ 4372.880221]  ? lookup_inline_extent_backref+0xfb/0x8c0
[ 4372.880225]  ? __btrfs_free_extent.isra.74+0xe9/0xdc0
[ 4372.880231]  ? btrfs_merge_delayed_refs+0x57/0x6e0
[ 4372.880235]  ? __btrfs_run_delayed_refs+0x60d/0x1340
[ 4372.880239]  ? btrfs_run_delayed_refs+0x64/0x280
[ 4372.880243]  ? btrfs_should_end_transaction+0x3b/0xa0
[ 4372.880247]  ? btrfs_drop_snapshot+0x3b2/0x800
[ 4372.880251]  ? __schedule+0x168/0x7c0
[ 4372.880254]  ? btrfs_clean_one_deleted_snapshot+0xa4/0xe0
[ 4372.880259]  ? cleaner_kthread+0x13a/0x180
[ 4372.880264]  ? btree_invalidatepage+0xc0/0xc0
[ 4372.880268]  ? kthread+0x144/0x180
[ 4372.880272]  ? kthread_flush_work_fn+0x20/0x20
[ 4372.880277]  ? ret_from_fork+0x23/0x30
[ 4372.880280] btrfs-transacti D0   125  2 0x
[ 4372.880285] Call Trace:
[ 4372.880290]  ? __schedule+0x160/0x7c0
[ 4372.880295]  ? io_schedule+0x64/0xe0
[ 4372.880300]  ? wait_on_page_bit_common.constprop.57+0x160/0x180
[ 4372.880303]  ? devm_memunmap+0x40/0x40
[ 4372.880307]  ? __filemap_fdatawait_range+0xd3/0x140
[ 4372.880311]  ? clear_state_bit.constprop.82+0xf7/0x180
[ 4372.880315]  ? __clear_extent_bit.constprop.79+0x138/0x3c0
[ 4372.880319]  ? filemap_fdatawait_range+0x9/0x60
[ 4372.880323]  ? __btrfs_wait_marked_extents.isra.18+0xc1/0x100
[ 4372.880327]  ? btrfs_write_and_wait_marked_extents.constprop.23+0x49/0x80
[ 4372.880331]  ? btrfs_commit_transaction+0x8e1/0xb00
[ 4372.880334]  ? join_transaction.constprop.24+0x10/0xa0
[ 4372.880340]  ? wake_bit_function+0x60/0x60
[ 4372.880345]  ? transaction_kthread+0x185/0x1a0
[ 4372.880350]  ? btrfs_cleanup_transaction+0x500/0x500
[ 4372.880354]  ? kthread+0x144/0x180
[ 4372.880358]  ? kthread_flush_work_fn+0x20/0x20
[ 4372.880362]  ? ret_from_fork+0x23/0x30
[ 4372.880367] ntpdD0   175  1 0x0004
[ 4372.880372] Call Trace:
[ 4372.880375]  ? __schedule+0x160/0x7c0
[ 4372.880379]  ? schedule_preempt_disabled+0x2d/0x80
[ 4372.880383]  ? __mutex_lock.isra.5+0x17b/0x4c0
[ 4372.880386]  ? wait_current_trans+0x15/0xc0
[ 4372.880391]  ? btrfs_free_path+0xe/0x20
[ 4372.880395]  ? btrfs_pin_log_trans+0x14/0x40
[ 4372.880400]  ? btrfs_rename2+0x28e/0x19c0
[ 4372.880404]  ? path_init+0x187/0x3e0
[ 4372.880407]  ? unlazy_walk+0x4b/0x100
[ 4372.880410]  ? terminate_walk+0x8d/0x100
[ 4372.880414]  ? filename_parentat+0x1e9/0x2c0
[ 4372.880420]  ? __kmalloc_track_caller+0xc4/0x100
[ 4372.880424]  ? vfs_rename+0x33f/0x7e0
[ 4372.880428]  ? SYSC_renameat2+0x53c/0x680
[ 4372.880433]  ? entry_SYSCALL_64_fastpath+0x13/0x94
[ 4372.880437] fcron   D0   178  1 0x
[ 4372.880441] Call Trace:
[ 4372.880445]  ? __schedule+0x160/0x7c0
[ 4372.880448]  ? schedule_preempt_disabled+0x2d/0x80
[ 4372.880452]  ? __mutex_lock.isra.5+0x17b/0x4c0
[ 4372.880458]  ? pagevec_lookup_tag+0x18/0x20
[ 4372.880462]  ? btrfs_log_dentry_safe+0x4cd/0xac0
[ 4372.880466]  ? btrfs_start_transaction+0x249/0x460
[ 4372.880470]  ? btrfs_sync_file+0x288/0x3c0
[ 4372.880475]  ? btrfs_file_write_iter+0x3a9/0x4e0
[ 4372.880479]  ? vfs_write+0x26c/0x2c0
[ 4372.880483]  ? SyS_write+0x3d/0xa0
[ 4372.880486]  ? SyS_fchown+0x7b/0xa0
[ 4372.880491]  ? entry_SYSCALL_64_fastpath+0x13/0x94
[ 4372.880508] kworker/u8:8D0   759  2 0x
[ 4372.880518] Workqueue: btrfs-submit btrfs_submit_helper
[ 4372.880520] Call Trace:
[ 4372.880524]  ? __schedule+0x160/0x7c0
[ 4372.880529]  ? io_schedule+0x64/0xe0
[ 4372.880534]  ? blk_mq_get_tag+0x212/0x320
[ 4372.880538]  ? wake_bit_function+0x60/0x60
[ 4372.880544]  ? __blk_mq_alloc_request+0x11/0x1c0
[ 4372.880548]  ? blk_mq_sched_get_request+0x17e/0x220
[ 4372.880553]  ? blk_sq_make_request+0xd3/0x4c0
[ 4372.880557]  ? blk_mq_sched_dispatch_requests+0x104/0x160
[ 4372.880561]  ? generic_make_request+0xc3/0x2e0
[ 4372.880564]  ? submit_bio+0x58/0x100
[ 
