Re: Mount stalls indefinitely after enabling quota groups.

2018-08-11 Thread Qu Wenruo


On 2018/8/12 12:18 PM, Dan Merillat wrote:
> On Sat, Aug 11, 2018 at 9:36 PM Qu Wenruo  wrote:
> 
>>> I'll add a new rescue subcommand, 'btrfs rescue disable-quota' for you
>>> to disable quota offline.
>>
>> Patch set (from my work mailbox), titled "[PATCH] btrfs-progs: rescue:
>> Add ability to disable quota offline".
>> Can also be fetched from github:
>> https://github.com/adam900710/btrfs-progs/tree/quota_disable
>>
>> Usage is:
>> # btrfs rescue disable-quota 
>>
>> Tested locally, it would just toggle the ON/OFF flag for quota, so the
>> modification should be minimal.
> 
> Noticed one thing while testing this, but it's not related to the
> patch so I'll keep it here.
> I still had the ,ro mounts in fstab, and while it mounted ro quickly
> *unmounting* the filesystem, even readonly,
> got hung up:
> 
> Aug 11 23:47:27 fileserver kernel: [  484.314725] INFO: task umount:5422 blocked for more than 120 seconds.
> Aug 11 23:47:27 fileserver kernel: [  484.314787]   Not tainted 4.17.14-dirty #3
> Aug 11 23:47:27 fileserver kernel: [  484.314892] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Aug 11 23:47:27 fileserver kernel: [  484.315006] umount  D 0  5422   4656 0x0080
> Aug 11 23:47:27 fileserver kernel: [  484.315122] Call Trace:
> Aug 11 23:47:27 fileserver kernel: [  484.315176]  ? __schedule+0x2c0/0x820
> Aug 11 23:47:27 fileserver kernel: [  484.315270]  ? kmem_cache_alloc+0x167/0x1b0
> Aug 11 23:47:27 fileserver kernel: [  484.315358]  schedule+0x3c/0x90
> Aug 11 23:47:27 fileserver kernel: [  484.315493]  schedule_timeout+0x1e4/0x430
> Aug 11 23:47:27 fileserver kernel: [  484.315542]  ? kmem_cache_alloc+0x167/0x1b0
> Aug 11 23:47:27 fileserver kernel: [  484.315686]  wait_for_common+0xb1/0x170
> Aug 11 23:47:27 fileserver kernel: [  484.315798]  ? wake_up_q+0x70/0x70
> Aug 11 23:47:27 fileserver kernel: [  484.315911]  btrfs_qgroup_wait_for_completion+0x5f/0x80

This call normally waits for a quota rescan to finish.
It's possible that your original "btrfs quota enable" kicked off a
rescan that hadn't finished before the umount.

But for an RO mount we shouldn't have any rescan running at all.
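
For what it's worth, the rescan state can also be checked from userspace
before the umount (assuming /mnt is the mount point):

# btrfs quota rescan -s /mnt    (show whether a rescan is in progress)
# btrfs quota rescan -w /mnt    (block until a running rescan finishes)

On a read-only mount there should be nothing to wait for, which is what
makes the hang above surprising.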

Maybe I can find some spare time to look into it.

Thanks for the report,
Qu

> Aug 11 23:47:27 fileserver kernel: [  484.316031]  close_ctree+0x27/0x2d0
> Aug 11 23:47:27 fileserver kernel: [  484.316138]  generic_shutdown_super+0x69/0x110
> Aug 11 23:47:27 fileserver kernel: [  484.316252]  kill_anon_super+0xe/0x20
> Aug 11 23:47:27 fileserver kernel: [  484.316301]  btrfs_kill_super+0x13/0x100
> Aug 11 23:47:27 fileserver kernel: [  484.316349]  deactivate_locked_super+0x39/0x70
> Aug 11 23:47:27 fileserver kernel: [  484.316399]  cleanup_mnt+0x3b/0x70
> Aug 11 23:47:27 fileserver kernel: [  484.316459]  task_work_run+0x89/0xb0
> Aug 11 23:47:27 fileserver kernel: [  484.316519]  exit_to_usermode_loop+0x8c/0x90
> Aug 11 23:47:27 fileserver kernel: [  484.316579]  do_syscall_64+0xf1/0x110
> Aug 11 23:47:27 fileserver kernel: [  484.316639]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> Is it trying to write changes to a ro mount, or is it doing a bunch of
> work that it's just going to throw away?  I ended up using sysrq-b
> after commenting out the entries in fstab.
> 
> Everything seems fine with the filesystem now.  I appreciate all the help!
> 





Re: Mount stalls indefinitely after enabling quota groups.

2018-08-11 Thread Dan Merillat
On Sat, Aug 11, 2018 at 9:36 PM Qu Wenruo  wrote:

> > I'll add a new rescue subcommand, 'btrfs rescue disable-quota' for you
> > to disable quota offline.
>
> Patch set (from my work mailbox), titled "[PATCH] btrfs-progs: rescue:
> Add ability to disable quota offline".
> Can also be fetched from github:
> https://github.com/adam900710/btrfs-progs/tree/quota_disable
>
> Usage is:
> # btrfs rescue disable-quota 
>
> Tested locally, it would just toggle the ON/OFF flag for quota, so the
> modification should be minimal.

Noticed one thing while testing this, but it's not related to the patch, so I'll keep it here.
I still had the ,ro mounts in fstab, and while it mounted ro quickly, *unmounting* the filesystem, even read-only, got hung up:

Aug 11 23:47:27 fileserver kernel: [  484.314725] INFO: task umount:5422 blocked for more than 120 seconds.
Aug 11 23:47:27 fileserver kernel: [  484.314787]   Not tainted 4.17.14-dirty #3
Aug 11 23:47:27 fileserver kernel: [  484.314892] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 11 23:47:27 fileserver kernel: [  484.315006] umount  D 0  5422   4656 0x0080
Aug 11 23:47:27 fileserver kernel: [  484.315122] Call Trace:
Aug 11 23:47:27 fileserver kernel: [  484.315176]  ? __schedule+0x2c0/0x820
Aug 11 23:47:27 fileserver kernel: [  484.315270]  ? kmem_cache_alloc+0x167/0x1b0
Aug 11 23:47:27 fileserver kernel: [  484.315358]  schedule+0x3c/0x90
Aug 11 23:47:27 fileserver kernel: [  484.315493]  schedule_timeout+0x1e4/0x430
Aug 11 23:47:27 fileserver kernel: [  484.315542]  ? kmem_cache_alloc+0x167/0x1b0
Aug 11 23:47:27 fileserver kernel: [  484.315686]  wait_for_common+0xb1/0x170
Aug 11 23:47:27 fileserver kernel: [  484.315798]  ? wake_up_q+0x70/0x70
Aug 11 23:47:27 fileserver kernel: [  484.315911]  btrfs_qgroup_wait_for_completion+0x5f/0x80
Aug 11 23:47:27 fileserver kernel: [  484.316031]  close_ctree+0x27/0x2d0
Aug 11 23:47:27 fileserver kernel: [  484.316138]  generic_shutdown_super+0x69/0x110
Aug 11 23:47:27 fileserver kernel: [  484.316252]  kill_anon_super+0xe/0x20
Aug 11 23:47:27 fileserver kernel: [  484.316301]  btrfs_kill_super+0x13/0x100
Aug 11 23:47:27 fileserver kernel: [  484.316349]  deactivate_locked_super+0x39/0x70
Aug 11 23:47:27 fileserver kernel: [  484.316399]  cleanup_mnt+0x3b/0x70
Aug 11 23:47:27 fileserver kernel: [  484.316459]  task_work_run+0x89/0xb0
Aug 11 23:47:27 fileserver kernel: [  484.316519]  exit_to_usermode_loop+0x8c/0x90
Aug 11 23:47:27 fileserver kernel: [  484.316579]  do_syscall_64+0xf1/0x110
Aug 11 23:47:27 fileserver kernel: [  484.316639]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

Is it trying to write changes to a ro mount, or is it doing a bunch of
work that it's just going to throw away?  I ended up using sysrq-b
after commenting out the entries in fstab.

Everything seems fine with the filesystem now.  I appreciate all the help!


Re: Mount stalls indefinitely after enabling quota groups.

2018-08-11 Thread Qu Wenruo


On 2018/8/12 8:30 AM, Qu Wenruo wrote:
> 
> 
On 2018/8/12 5:10 AM, Dan Merillat wrote:
>> 19 hours later, still going extremely slowly and taking longer and
>> longer for progress made.  Main symptom is the mount process is
>> spinning at 100% CPU, interspersed with btrfs-transaction spinning at
>> 100% CPU.
>> So far it's racked up 14h45m of CPU time on mount and an additional
>> 3h40m on btrfs-transaction.
>>
>> The current drop key changes every 10-15 minutes when I check it via
>> inspect-internal, so some progress is slowly being made.
>>
>> I built the kernel with ftrace to see what's going on internally, this
>> is the pattern I'm seeing:
>>
> [snip]
> 
> It looks pretty like qgroup, but too many noise.
> The pin point trace event would btrfs_find_all_roots().
> 
>>
>> Repeats indefinitely.  btrace shows basically zero activity on the
>> array while it spins, with the occasional burst when mount &
>> btrfs-transaction swap off.
>>
>> To recap the chain of events leading up to this:
>> 11TB Array got completely full and started fragmenting badly.
>> Ran bedup and it found 600gb of duplicate files that it offline-shared.
>> Reboot for unrelated reasons
> 
> 11T, with highly deduped usage is really the worst scenario case for qgroup.
> Qgroup is not really good at handle hight reflinked files, nor balance.
> When they combines, it goes worse.
> 
>> Enabled quota on all subvolumes to try to track where the new data is
>> coming from
>> Tried to balance metadata due to transaction CPU spikes
>> Force-rebooted after the array was completely lagged out.
>>
>> Now attempting to mount it RW.  Readonly works, but RW has taken well
>> over 24 hours at this point.
> 
> I'll add a new rescue subcommand, 'btrfs rescue disable-quota' for you
> to disable quota offline.

Patch set (from my work mailbox), titled "[PATCH] btrfs-progs: rescue:
Add ability to disable quota offline".
Can also be fetched from github:
https://github.com/adam900710/btrfs-progs/tree/quota_disable

Usage is:
# btrfs rescue disable-quota 

Tested locally; it just toggles the ON/OFF flag for quota, so the
modification should be minimal.
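
For a concrete example (device name taken from earlier in this thread;
the subcommand only exists on the quota_disable branch above, not yet in
any released btrfs-progs):

# btrfs rescue disable-quota /dev/bcache0    (run against the unmounted fs)
# mount /dev/bcache0 /mnt                    (should no longer hit the qgroup paths)

If quotas are wanted again later, they can be re-enabled and rebuilt
online with the usual commands:

# btrfs quota enable /mnt
# btrfs quota rescan -w /mnt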

Thanks,
Qu

> 
> Thanks,
> Qu
> 
> 





Re: Mount stalls indefinitely after enabling quota groups.

2018-08-11 Thread Qu Wenruo


On 2018/8/12 8:59 AM, Dan Merillat wrote:
> On Sat, Aug 11, 2018 at 8:30 PM Qu Wenruo  wrote:
>>
>> It looks pretty like qgroup, but too many noise.
>> The pin point trace event would btrfs_find_all_roots().
> 
> I had this half-written when you replied.
> 
> Agreed: looks like bulk of time spent resides in qgroups.  Spent some
> time with sysrq-l and ftrace:
> 
> ? __rcu_read_unlock+0x5/0x50
> ? return_to_handler+0x15/0x36
> __rcu_read_unlock+0x5/0x50
> find_extent_buffer+0x47/0x90                extent_io.c:4888
> read_block_for_search.isra.12+0xc8/0x350    ctree.c:2399
> btrfs_search_slot+0x3e7/0x9c0               ctree.c:2837
> btrfs_next_old_leaf+0x1dc/0x410             ctree.c:5702
> btrfs_next_old_item                         ctree.h:2952
> add_all_parents                             backref.c:487
> resolve_indirect_refs+0x3f7/0x7e0           backref.c:575
> find_parent_nodes+0x42d/0x1290              backref.c:1236
> ? find_parent_nodes+0x5/0x1290              backref.c:1114
> btrfs_find_all_roots_safe+0x98/0x100        backref.c:1414
> btrfs_find_all_roots+0x52/0x70              backref.c:1442
> btrfs_qgroup_trace_extent_post+0x27/0x60    qgroup.c:1503
> btrfs_qgroup_trace_leaf_items+0x104/0x130   qgroup.c:1589
> btrfs_qgroup_trace_subtree+0x26a/0x3a0      qgroup.c:1750
> do_walk_down+0x33c/0x5a0                    extent-tree.c:8883
> walk_down_tree+0xa8/0xd0                    extent-tree.c:9041
> btrfs_drop_snapshot+0x370/0x8b0             extent-tree.c:9203
> merge_reloc_roots+0xcf/0x220
> btrfs_recover_relocation+0x26d/0x400
> ? btrfs_cleanup_fs_roots+0x16a/0x180
> btrfs_remount+0x32e/0x510
> do_remount_sb+0x67/0x1e0
> do_mount+0x712/0xc90
> 
> The mount is looping in btrfs_qgroup_trace_subtree, as evidenced by
> the following ftrace filter:
> fileserver:/sys/kernel/tracing# cat set_ftrace_filter
> btrfs_qgroup_trace_extent
> btrfs_qgroup_trace_subtree

Yep, it's quota causing the hang.

> 
[snip]
> 
> So 10-13 minutes per cycle.
> 
>> 11T, with highly deduped usage is really the worst scenario case for qgroup.
>> Qgroup is not really good at handle hight reflinked files, nor balance.
>> When they combines, it goes worse.
> 
> I'm not really understanding the use-case of qgroup if it melts down
> on large systems with a shared base + individual changes.

The problem is that, for balance, btrfs does a trick of switching the
tree reloc tree with the real fs tree.
However, the tree reloc tree is not accounted to quota, while the real
fs tree does contribute to quota.

And since the owner changes as above, btrfs needs to do a full subtree rescan.
For a small subvolume that's not a problem, but for a large subvolume
quota needs to rescan thousands of tree blocks, and because of the
highly deduped files, each tree block needs extra iterations for each
deduped file.

Both factors contribute to the slow mount.

There are several workaround patches on the mailing list; one makes the
balance resume run in the background at mount time, so it won't hang the
mount.
But it still makes transactions pretty slow (writes will still be
blocked for a long time).

There is also a plan to skip the subtree rescan completely, but it needs
extra review to ensure that such a tree block switch won't change the
quota numbers.

Thanks,
Qu

> 
>> I'll add a new rescue subcommand, 'btrfs rescue disable-quota' for you
>> to disable quota offline.
> 
> Ok.  I was looking at just doing this to speed things up:
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 51b5e2da708c..c5bf937b79f0 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -8877,7 +8877,7 @@ static noinline int do_walk_down(struct btrfs_trans_handle *trans,
>                 parent = 0;
>         }
>
> -       if (need_account) {
> +       if (0) {
>                 ret = btrfs_qgroup_trace_subtree(trans, root, next,
>                                                  generation, level - 1);
>                 if (ret) {
>                         btrfs_err_rl(fs_info,
>                                      "Error %d accounting shared subtree. Quota is out of sync, rescan required.",
>                                      ret);
>                 }
> 
> 
> If I follow, this will leave me with inconsistent qgroups and a full
> rescan is required.  That seems an acceptable tradeoff, since it seems
> like the best plan going forward is to nuke the qgroups anyway.
> 
> There's still the btrfs-transaction spin, but I'm hoping that's
> related to qgroups as well.
> 
>>
>> Thanks,
>> Qu
> 
> Appreciate it.  I was going to go with my hackjob patch to avoid any
> untested rewriting - there's already an error path for "something went
> wrong updating qgroups during walk_tree" so it seemed safest to take
> advantage of it.  I'll patch either the kernel or the btrfs programs,
> whichever you think is best.
> 





Re: Mount stalls indefinitely after enabling quota groups.

2018-08-11 Thread Dan Merillat
On Sat, Aug 11, 2018 at 8:30 PM Qu Wenruo  wrote:
>
> It looks pretty like qgroup, but too many noise.
> The pin point trace event would btrfs_find_all_roots().

I had this half-written when you replied.

Agreed: it looks like the bulk of the time is spent in qgroups.  I spent
some time with sysrq-l and ftrace:

? __rcu_read_unlock+0x5/0x50
? return_to_handler+0x15/0x36
__rcu_read_unlock+0x5/0x50
find_extent_buffer+0x47/0x90                extent_io.c:4888
read_block_for_search.isra.12+0xc8/0x350    ctree.c:2399
btrfs_search_slot+0x3e7/0x9c0               ctree.c:2837
btrfs_next_old_leaf+0x1dc/0x410             ctree.c:5702
btrfs_next_old_item                         ctree.h:2952
add_all_parents                             backref.c:487
resolve_indirect_refs+0x3f7/0x7e0           backref.c:575
find_parent_nodes+0x42d/0x1290              backref.c:1236
? find_parent_nodes+0x5/0x1290              backref.c:1114
btrfs_find_all_roots_safe+0x98/0x100        backref.c:1414
btrfs_find_all_roots+0x52/0x70              backref.c:1442
btrfs_qgroup_trace_extent_post+0x27/0x60    qgroup.c:1503
btrfs_qgroup_trace_leaf_items+0x104/0x130   qgroup.c:1589
btrfs_qgroup_trace_subtree+0x26a/0x3a0      qgroup.c:1750
do_walk_down+0x33c/0x5a0                    extent-tree.c:8883
walk_down_tree+0xa8/0xd0                    extent-tree.c:9041
btrfs_drop_snapshot+0x370/0x8b0             extent-tree.c:9203
merge_reloc_roots+0xcf/0x220
btrfs_recover_relocation+0x26d/0x400
? btrfs_cleanup_fs_roots+0x16a/0x180
btrfs_remount+0x32e/0x510
do_remount_sb+0x67/0x1e0
do_mount+0x712/0xc90

The mount is looping in btrfs_qgroup_trace_subtree, as evidenced by
the following ftrace filter:
fileserver:/sys/kernel/tracing# cat set_ftrace_filter
btrfs_qgroup_trace_extent
btrfs_qgroup_trace_subtree

# cat trace
...
   mount-6803  [003]  80407.649752: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_subtree
   mount-6803  [003]  80407.649772: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_leaf_items
   mount-6803  [003]  80407.649797: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_leaf_items
   mount-6803  [003]  80407.649821: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_leaf_items
   mount-6803  [003]  80407.649846: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_leaf_items
   mount-6803  [003]  80407.701652: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_leaf_items
   mount-6803  [003]  80407.754547: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_leaf_items
   mount-6803  [003]  80407.754574: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_leaf_items
   mount-6803  [003]  80407.754598: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_leaf_items
   mount-6803  [003]  80407.754622: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_leaf_items
   mount-6803  [003]  80407.754646: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_leaf_items

... repeats 240 times

   mount-6803  [002]  80412.568804: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_leaf_items
   mount-6803  [002]  80412.568825: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_leaf_items
   mount-6803  [002]  80412.568850: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_subtree
   mount-6803  [002]  80412.568872: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_leaf_items

Looks like invocations of btrfs_qgroup_trace_subtree are taking forever:

   mount-6803  [006]  80641.627709: btrfs_qgroup_trace_subtree <-do_walk_down
   mount-6803  [003]  81433.760945: btrfs_qgroup_trace_subtree <-do_walk_down
(add do_walk_down to the trace here)
   mount-6803  [001]  82124.623557: do_walk_down <-walk_down_tree
   mount-6803  [001]  82124.623567: btrfs_qgroup_trace_subtree <-do_walk_down
   mount-6803  [006]  82695.241306: do_walk_down <-walk_down_tree
   mount-6803  [006]  82695.241316: btrfs_qgroup_trace_subtree <-do_walk_down

So 10-13 minutes per cycle.
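
For reference, the per-call latency could also be measured directly with
the function_graph tracer instead of eyeballing timestamps; a rough
sketch, assuming the same tracefs mount as above:

# cd /sys/kernel/tracing
# echo btrfs_qgroup_trace_subtree > set_graph_function
# echo function_graph > current_tracer
# cat trace_pipe

function_graph prints the duration of each traced call when it returns,
so the 10-13 minute figure should show up directly.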

> 11T, with highly deduped usage is really the worst scenario case for qgroup.
> Qgroup is not really good at handle hight reflinked files, nor balance.
> When they combines, it goes worse.

I'm not really understanding the use-case of qgroup if it melts down
on large systems with a shared base + individual changes.

> I'll add a new rescue subcommand, 'btrfs rescue disable-quota' for you
> to disable quota offline.

Ok.  I was looking at just doing this to speed things up:

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 51b5e2da708c..c5bf937b79f0 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -8877,7 +8877,7 @@ static noinline int do_walk_down(struct
btrfs_trans_handle *trans,
parent = 0;
}

-   if (need_account) {
+   if (0) {
ret 

Re: Mount stalls indefinitely after enabling quota groups.

2018-08-11 Thread Qu Wenruo


On 2018/8/12 5:10 AM, Dan Merillat wrote:
> 19 hours later, still going extremely slowly and taking longer and
> longer for progress made.  Main symptom is the mount process is
> spinning at 100% CPU, interspersed with btrfs-transaction spinning at
> 100% CPU.
> So far it's racked up 14h45m of CPU time on mount and an additional
> 3h40m on btrfs-transaction.
> 
> The current drop key changes every 10-15 minutes when I check it via
> inspect-internal, so some progress is slowly being made.
> 
> I built the kernel with ftrace to see what's going on internally, this
> is the pattern I'm seeing:
> 
[snip]

It looks pretty much like qgroup, but there's too much noise.
The pinpoint trace event would be btrfs_find_all_roots().
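
One way to narrow the trace down to just that function (assuming
function tracing is available, which your kernel apparently has) would
be something like:

# cd /sys/kernel/tracing
# echo btrfs_find_all_roots > set_ftrace_filter
# echo function > current_tracer
# cat trace_pipe

A steady stream of btrfs_find_all_roots hits during the stall would
confirm the time is going into qgroup backref walking.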

> 
> Repeats indefinitely.  btrace shows basically zero activity on the
> array while it spins, with the occasional burst when mount &
> btrfs-transaction swap off.
> 
> To recap the chain of events leading up to this:
> 11TB Array got completely full and started fragmenting badly.
> Ran bedup and it found 600gb of duplicate files that it offline-shared.
> Reboot for unrelated reasons

11T with highly deduped usage is really the worst-case scenario for qgroup.
Qgroup is not good at handling highly reflinked files, nor at balance.
When they combine, it gets worse.

> Enabled quota on all subvolumes to try to track where the new data is
> coming from
> Tried to balance metadata due to transaction CPU spikes
> Force-rebooted after the array was completely lagged out.
> 
> Now attempting to mount it RW.  Readonly works, but RW has taken well
> over 24 hours at this point.

I'll add a new rescue subcommand, 'btrfs rescue disable-quota' for you
to disable quota offline.

Thanks,
Qu






Re: Mount stalls indefinitely after enabling quota groups.

2018-08-11 Thread Dan Merillat
19 hours later, it's still going extremely slowly, and each bit of
progress takes longer and longer.  The main symptom is the mount process
spinning at 100% CPU, interspersed with btrfs-transaction spinning at
100% CPU.
So far it's racked up 14h45m of CPU time on mount and an additional
3h40m on btrfs-transaction.

The current drop key changes every 10-15 minutes when I check it via
inspect-internal, so some progress is slowly being made.

I built the kernel with ftrace to see what's going on internally, this
is the pattern I'm seeing:

   mount-6803  [002] ...1 69023.970964: btrfs_next_old_leaf
<-resolve_indirect_refs
   mount-6803  [002] ...1 69023.970965: btrfs_release_path
<-btrfs_next_old_leaf
   mount-6803  [002] ...1 69023.970965: btrfs_search_slot
<-btrfs_next_old_leaf
   mount-6803  [002] ...1 69023.970966:
btrfs_clear_path_blocking <-btrfs_search_slot
   mount-6803  [002] ...1 69023.970966:
btrfs_set_path_blocking <-btrfs_clear_path_blocking
   mount-6803  [002] ...1 69023.970967: btrfs_bin_search
<-btrfs_search_slot
   mount-6803  [002] ...1 69023.970967: btrfs_comp_cpu_keys
<-generic_bin_search.constprop.14
   mount-6803  [002] ...1 69023.970967: btrfs_get_token_64
<-read_block_for_search.isra.12
   mount-6803  [002] ...1 69023.970968: btrfs_get_token_64
<-read_block_for_search.isra.12
   mount-6803  [002] ...1 69023.970968: btrfs_node_key
<-read_block_for_search.isra.12
   mount-6803  [002] ...1 69023.970969: btrfs_buffer_uptodate
<-read_block_for_search.isra.12
   mount-6803  [002] ...1 69023.970969:
btrfs_clear_path_blocking <-btrfs_search_slot
   mount-6803  [002] ...1 69023.970970:
btrfs_set_path_blocking <-btrfs_clear_path_blocking
   mount-6803  [002] ...1 69023.970970: btrfs_bin_search
<-btrfs_search_slot
   mount-6803  [002] ...1 69023.970970: btrfs_comp_cpu_keys
<-generic_bin_search.constprop.14
   mount-6803  [002] ...1 69023.970971: btrfs_comp_cpu_keys
<-generic_bin_search.constprop.14
   mount-6803  [002] ...1 69023.970971: btrfs_comp_cpu_keys
<-generic_bin_search.constprop.14
   mount-6803  [002] ...1 69023.970972: btrfs_comp_cpu_keys
<-generic_bin_search.constprop.14
   mount-6803  [002] ...1 69023.970972: btrfs_comp_cpu_keys
<-generic_bin_search.constprop.14
   mount-6803  [002] ...1 69023.970973: btrfs_comp_cpu_keys
<-generic_bin_search.constprop.14
   mount-6803  [002] ...1 69023.970973: btrfs_comp_cpu_keys
<-generic_bin_search.constprop.14
   mount-6803  [002] ...1 69023.970973: btrfs_comp_cpu_keys
<-generic_bin_search.constprop.14
   mount-6803  [002] ...1 69023.970974: btrfs_comp_cpu_keys
<-generic_bin_search.constprop.14
   mount-6803  [002] ...1 69023.970974: btrfs_get_token_64
<-read_block_for_search.isra.12
   mount-6803  [002] ...1 69023.970975: btrfs_get_token_64
<-read_block_for_search.isra.12
   mount-6803  [002] ...1 69023.970975: btrfs_node_key
<-read_block_for_search.isra.12
   mount-6803  [002] ...1 69023.970976: btrfs_buffer_uptodate
<-read_block_for_search.isra.12
   mount-6803  [002] ...1 69023.970976:
btrfs_clear_path_blocking <-btrfs_search_slot
   mount-6803  [002] ...1 69023.970976:
btrfs_set_path_blocking <-btrfs_clear_path_blocking
   mount-6803  [002] ...1 69023.970977: btrfs_bin_search
<-btrfs_search_slot
   mount-6803  [002] ...1 69023.970977: btrfs_comp_cpu_keys
<-generic_bin_search.constprop.14
   mount-6803  [002] ...1 69023.970978: btrfs_comp_cpu_keys
<-generic_bin_search.constprop.14
   mount-6803  [002] ...1 69023.970978: btrfs_comp_cpu_keys
<-generic_bin_search.constprop.14
   mount-6803  [002] ...1 69023.970978: btrfs_comp_cpu_keys
<-generic_bin_search.constprop.14
   mount-6803  [002] ...1 69023.970979: btrfs_comp_cpu_keys
<-generic_bin_search.constprop.14
   mount-6803  [002] ...1 69023.970979: btrfs_comp_cpu_keys
<-generic_bin_search.constprop.14
   mount-6803  [002] ...1 69023.970980: btrfs_comp_cpu_keys
<-generic_bin_search.constprop.14
   mount-6803  [002] ...1 69023.970980: btrfs_comp_cpu_keys
<-generic_bin_search.constprop.14
   mount-6803  [002] ...1 69023.970980: btrfs_comp_cpu_keys
<-generic_bin_search.constprop.14
   mount-6803  [002] ...1 69023.970981: btrfs_get_token_64
<-read_block_for_search.isra.12
   mount-6803  [002] ...1 69023.970981: btrfs_get_token_64
<-read_block_for_search.isra.12
   mount-6803  [002] ...1 69023.970982: btrfs_node_key
<-read_block_for_search.isra.12
   mount-6803  [002] ...1 69023.970982: btrfs_buffer_uptodate
<-read_block_for_search.isra.12
   mount-6803  [002] ...1 69023.970983:
btrfs_clear_path_blocking <-btrfs_search_slot
   mount-6803  [002] ...1 69023.970983:
btrfs_set_path_blocking <-btrfs_clear_path_blocking
   mount-6803  [002] ...1 69023.97

Re: Mount stalls indefinitely after enabling quota groups.

2018-08-10 Thread Dan Merillat
On Fri, Aug 10, 2018 at 6:51 AM, Qu Wenruo  wrote:
>
>
> On 8/10/18 6:42 PM, Dan Merillat wrote:
>> On Fri, Aug 10, 2018 at 6:05 AM, Qu Wenruo  wrote:
>
> But considering your amount of block groups, mount itself may take some
> time (before trying to resume balance).

I'd believe it, a clean mount took 2-3 minutes normally.

btrfs check ran out of RAM eventually so I killed it and went on to
trying to mount again.

Readonly mounted pretty quickly, so I'm just letting -o remount,rw
spin for however long it needs to.  Readonly access is fine over the
weekend, and hopefully it will be done by Monday.

To be clear, what exactly am I watching with dump-tree to monitor
forward progress?

Thanks again for the help!


Re: Mount stalls indefinitely after enabling quota groups.

2018-08-10 Thread Qu Wenruo


On 8/10/18 6:42 PM, Dan Merillat wrote:
> On Fri, Aug 10, 2018 at 6:05 AM, Qu Wenruo  wrote:
> 
>>
>> Although not sure about the details, but the fs looks pretty huge.
>> Tons of subvolume and its free space cache inodes.
> 
> 11TB, 3 or so subvolumes and two snapshots I think.  Not particularly
> large for NAS.
> 
>> But only 3 tree reloc trees, unless you have tons of reflinked files
>> (off-line deduped), it shouldn't cause a lot of problem.
> 
> There's going to be a ton of reflinked files.  Both cp --reflink and
> via the wholefile dedup.
> 
> I freed up ~1/2 TB last month doing dedup.
> 
>> At least, we have some progress dropping tree reloc tree for subvolume 6482.
> 
> Is there a way to get an idea of how much work is left to be done on
> the reloc tree?

You could inspect by the level.
But each level change is a huge step forward.

>  Can I walk it
> with btrfs-inspect?

Of course you can. Using -b  could show the remaining tree.
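
Something along the lines of the following, where the bytenr is whatever
tree block you want to start from (placeholder here):

# btrfs inspect-internal dump-tree -b <tree block bytenr> /dev/bcache0

Expect a lot of output for high-level blocks, as you already saw with
the full -t TREE_RELOC dump.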

> 
> dump-tree -t TREE_RELOC is quite enormous (13+ million lines before I gave up)

No wonder, since it's kind of snapshot of your subvolume.

> 
>> If you check the dump-tree output for the following data, the "drop key"
>> should change during mount: (inspect dump-tree can be run mounted)
>> item 175 key (TREE_RELOC ROOT_ITEM 6482) itemoff 8271 itemsize 439
>> 
>> drop key (2769795 EXTENT_DATA 12665933824) level 2
>>  ^
>>
>> So for the worst case scenario, there is some way to determine whether
>> it's processing.
> 
> I'll keep an eye on that.
> 
>> And according to the level (3), which is not small for each subvolume, I
>> doubt that's the reason why it's so slow.
>>
>> BTW, for last skip_balance mount, is there any kernel message like
>> "balance: resume skipped"?
> 
> No, the only reference to balance in kern.log is a hung
> btrfs_cancel_balance from the first reboot.

That's strange.

But considering your amount of block groups, mount itself may take some
time (before trying to resume balance).

Thanks,
Qu

> 
>> Have you tried mount the fs readonly with skip_balance? And then remount
>> rw, still with skip_balance?
> 
> No, every operation takes a long time.  It's still running the btrfs
> check, although I'm
> going to cancel it and try mount -o ro,skip_balance before I go to
> sleep and see where it is tomorrow.
> 
> Thank you for taking the time to help me with this.
> 





Re: Mount stalls indefinitely after enabling quota groups.

2018-08-10 Thread Dan Merillat
On Fri, Aug 10, 2018 at 6:05 AM, Qu Wenruo  wrote:

>
> Although not sure about the details, but the fs looks pretty huge.
> Tons of subvolume and its free space cache inodes.

11TB, 3 or so subvolumes and two snapshots I think.  Not particularly
large for NAS.

> But only 3 tree reloc trees, unless you have tons of reflinked files
> (off-line deduped), it shouldn't cause a lot of problem.

There's going to be a ton of reflinked files.  Both cp --reflink and
via the wholefile dedup.

I freed up ~1/2 TB last month doing dedup.

> At least, we have some progress dropping tree reloc tree for subvolume 6482.

Is there a way to get an idea of how much work is left to be done on
the reloc tree?  Can I walk it
with btrfs-inspect?

dump-tree -t TREE_RELOC is quite enormous (13+ million lines before I gave up)

> If you check the dump-tree output for the following data, the "drop key"
> should change during mount: (inspect dump-tree can be run mounted)
> item 175 key (TREE_RELOC ROOT_ITEM 6482) itemoff 8271 itemsize 439
> 
> drop key (2769795 EXTENT_DATA 12665933824) level 2
>  ^
>
> So for the worst case scenario, there is some way to determine whether
> it's processing.

I'll keep an eye on that.

> And according to the level (3), which is not small for each subvolume, I
> doubt that's the reason why it's so slow.
>
> BTW, for last skip_balance mount, is there any kernel message like
> "balance: resume skipped"?

No, the only reference to balance in kern.log is a hung
btrfs_cancel_balance from the first reboot.

> Have you tried mount the fs readonly with skip_balance? And then remount
> rw, still with skip_balance?

No, every operation takes a long time.  It's still running the btrfs
check, although I'm
going to cancel it and try mount -o ro,skip_balance before I go to
sleep and see where it is tomorrow.

Thank you for taking the time to help me with this.


Re: Mount stalls indefinitely after enabling quota groups.

2018-08-10 Thread Qu Wenruo


On 8/10/18 5:39 PM, Dan Merillat wrote:
> On Fri, Aug 10, 2018 at 5:13 AM, Qu Wenruo  wrote:
>>
>>
>> On 8/10/18 4:47 PM, Dan Merillat wrote:
>>> Unfortunately that doesn't appear to be it, a forced restart and
>>> attempted to mount with skip_balance leads to the same thing.
>>
>> That's strange.
>>
>> Would you please provide the following output to determine whether we
>> have any balance running?
>>
>> # btrfs inspect dump-super -fFa 
[snip]

Nothing special for super dump.

> 
>>
>> # btrfs inspect dump-tree -t root 
>>
> 
> Too large to include inline, hopefully attaching works.

Although I'm not sure about the details, the fs looks pretty huge.
Tons of subvolumes and their free space cache inodes.

But there are only 3 tree reloc trees, so unless you have tons of
reflinked files (off-line deduped), it shouldn't cause a lot of problems.

At least we have some progress dropping the tree reloc tree for subvolume 6482.

If you check the dump-tree output for the following data, the "drop key"
should change during mount: (inspect dump-tree can be run mounted)
item 175 key (TREE_RELOC ROOT_ITEM 6482) itemoff 8271 itemsize 439

drop key (2769795 EXTENT_DATA 12665933824) level 2
 ^

So even in the worst-case scenario, there is some way to determine
whether it's making progress.
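
One way to watch it without dumping the whole tree would be something
like the following (device name as used elsewhere in this thread; adjust
the -A window if the drop key line falls outside it):

# btrfs inspect-internal dump-tree -t root /dev/bcache0 | grep -A20 'TREE_RELOC ROOT_ITEM 6482' | grep 'drop key'

Run it every few minutes; if the printed drop key changes, the reloc
tree drop is still moving forward.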

And according to the level (3), which is not small for each subvolume, I
doubt that's the reason why it's so slow.

BTW, for the last skip_balance mount, is there any kernel message like
"balance: resume skipped"?
Have you tried mounting the fs readonly with skip_balance, and then
remounting rw, still with skip_balance?
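
I.e. something like this, assuming /mnt as the mount point:

# mount -o ro,skip_balance /dev/bcache0 /mnt
# mount -o remount,rw,skip_balance /mnt

If the hang only shows up on the rw remount, that points at the balance
resume / reloc tree drop rather than at the mount itself.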

> 
> I think there's a balance though:
> item 178 key (BALANCE TEMPORARY_ITEM 0) itemoff 6945 itemsize 448
> temporary item objectid BALANCE offset 0
> balance status flags 6
> DATA
> profiles 0 devid 0 target 0 flags 0
> usage_min 0 usage_max 0 pstart 0 pend 0
> vstart 0 vend 0 limit_min 0 limit_max 0
> stripes_min 0 stripes_max 0
> METADATA
> profiles 0 devid 0 target 0 flags 0
> usage_min 0 usage_max 0 pstart 0 pend 0
> vstart 0 vend 0 limit_min 0 limit_max 0
> stripes_min 0 stripes_max 0
> SYSTEM
> profiles 0 devid 0 target 0 flags 0
> usage_min 0 usage_max 0 pstart 0 pend 0
> vstart 0 vend 0 limit_min 0 limit_max 0
> stripes_min 0 stripes_max 0
> 
> btrfs check is still running, it's found one thing so far:
> 
> Checking filesystem on /dev/bcache0
> UUID: 16adc029-64c5-45ff-8114-e2f5b2f2d331
> checking extents
> ref mismatch on [13135707160576 16384] extent item 0, found 1
> tree backref 13135707160576 parent 13136850550784 root 13136850550784 not found in extent tree
> backpointer mismatch on [13135707160576 16384]

Just a single error, not bad (compared to some catastrophic errors).
It may be a false alert, since it's a backref for a reloc tree, which is
not something we test that commonly.

So the fs should be OK to go; we just need to figure out how and why
skip_balance is not working.

Thanks,
Qu

> ERROR: errors found in extent allocation tree or chunk allocation
> checking free space cache
> checking fs roots
> 





Re: Mount stalls indefinitely after enabling quota groups.

2018-08-10 Thread Dan Merillat
E: Resending without the 500k attachment.

On Fri, Aug 10, 2018 at 5:13 AM, Qu Wenruo  wrote:
>
>
> On 8/10/18 4:47 PM, Dan Merillat wrote:
>> Unfortunately that doesn't appear to be it, a forced restart and
>> attempted to mount with skip_balance leads to the same thing.
>
> That's strange.
>
> Would you please provide the following output to determine whether we
> have any balance running?
>
> # btrfs inspect dump-super -fFa 

superblock: bytenr=65536, device=/dev/bcache0
---------------------------------------------------------
csum_type               0 (crc32c)
csum_size               4
csum                    0xaeff2ec3 [match]
bytenr                  65536
flags                   0x1
                        ( WRITTEN )
magic                   _BHRfS_M [match]
fsid                    16adc029-64c5-45ff-8114-e2f5b2f2d331
label                   MEDIA
generation              4584957
root                    33947648
sys_array_size          129
chunk_root_generation   4534813
root_level              1
chunk_root              13681127653376
chunk_root_level        1
log_root                0
log_root_transid        0
log_root_level          0
total_bytes             12001954226176
bytes_used              11387838865408
sectorsize              4096
nodesize                16384
leafsize (deprecated)   16384
stripesize              4096
root_dir                6
num_devices             1
compat_flags            0x0
compat_ro_flags         0x0
incompat_flags          0x169
( MIXED_BACKREF |
  COMPRESS_LZO |
  BIG_METADATA |
  EXTENDED_IREF |
  SKINNY_METADATA )
cache_generation        4584957
uuid_tree_generation    4584925
dev_item.uuid           ec51cc1f-992a-47a2-b7b2-83af026723fd
dev_item.fsid           16adc029-64c5-45ff-8114-e2f5b2f2d331 [match]
dev_item.type           0
dev_item.total_bytes    12001954226176
dev_item.bytes_used     11613258579968
dev_item.io_align       4096
dev_item.io_width       4096
dev_item.sector_size    4096
dev_item.devid          1
dev_item.dev_group      0
dev_item.seek_speed     0
dev_item.bandwidth      0
dev_item.generation     0
sys_chunk_array[2048]:
item 0 key (FIRST_CHUNK_TREE CHUNK_ITEM 13681127456768)
length 33554432 owner 2 stripe_len 65536 type SYSTEM|DUP
io_align 65536 io_width 65536 sector_size 4096
num_stripes 2 sub_stripes 1
stripe 0 devid 1 offset 353298808832
dev_uuid ec51cc1f-992a-47a2-b7b2-83af026723fd
stripe 1 devid 1 offset 353332363264
dev_uuid ec51cc1f-992a-47a2-b7b2-83af026723fd
backup_roots[4]:
backup 0:
backup_tree_root:       3666753175552   gen: 4584956    level: 1
backup_chunk_root:      13681127653376  gen: 4534813    level: 1
backup_extent_root:     3666740674560   gen: 4584956    level: 2
backup_fs_root:         0               gen: 0          level: 0
backup_dev_root:        199376896       gen: 4584935    level: 1
backup_csum_root:       3666753568768   gen: 4584956    level: 3
backup_total_bytes:     12001954226176
backup_bytes_used:      11387838865408
backup_num_devices:     1

backup 1:
backup_tree_root:       33947648        gen: 4584957    level: 1
backup_chunk_root:      13681127653376  gen: 4534813    level: 1
backup_extent_root:     33980416        gen: 4584957    level: 2
backup_fs_root:         0               gen: 0          level: 0
backup_dev_root:        34160640        gen: 4584957    level: 1
backup_csum_root:       34357248        gen: 4584957    level: 3
backup_total_bytes:     12001954226176
backup_bytes_used:      11387838865408
backup_num_devices:     1

backup 2:
backup_tree_root:       3666598461440   gen: 4584954    level: 1
backup_chunk_root:      13681127653376  gen: 4534813    level: 1
backup_extent_root:     3666595233792   gen: 4584954    level: 2
backup_fs_root:         0               gen: 0          level: 0
backup_dev_root:        199376896       gen: 4584935    level: 1
backup_csum_root:       300034304       gen: 4584954    level: 3
backup_total_bytes:     12001954226176
backup_bytes_used:      11387838898176
backup_num_devices:     1

backup 3:
backup_tree_root:       390998272       gen: 4584955    level: 1
backup_chunk_root:      13681127653376  gen: 4534813    level: 1
backup_extent_root:     390293760       gen: 4584955    level: 2
backup_fs_root:         0               gen: 0          level: 0
backup_dev_root:        199376896       gen: 4584935    level: 1
backup_csum_root:       391604480       gen: 4584955    level: 3
backup_total_bytes:     12001954226176
backup_bytes_used:      11387838881792
backup_num_devices:     1


superblock: bytenr=67108864, device=/dev/bcache0
---------------------------------------------------------
csum_type               0 (crc32c)
csum_size               4
csum                    0x0e9e060d [match]
bytenr                  67108864
flags                   0x1
                        ( WRITTEN )
magic                   _BHRfS_M [match]
fsid                    16adc029-64c5-45ff-8114-e2f5b2f2d331
label                   MEDIA
generatio

Re: Mount stalls indefinitely after enabling quota groups.

2018-08-10 Thread Qu Wenruo


On 8/10/18 4:47 PM, Dan Merillat wrote:
> Unfortunately that doesn't appear to be it, a forced restart and
> attempted to mount with skip_balance leads to the same thing.

That's strange.

Would you please provide the following output to determine whether we
have any balance running?

# btrfs inspect dump-super -fFa 

# btrfs inspect dump-tree -t root 

> 
> 20 minutes in btrfs-transactio had a large burst of reads then started
> spinning the CPU with the disk idle.
> 
> Is this recoverable? I could leave it for a day or so if it may make
> progress, but if not I'd like to start on other options.

When it's unmounted, would you please also try "btrfs check --readonly
" to see if there is anything wrong with the fs?

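For example (device name as elsewhere in this thread; --readonly only
inspects and never writes, so it's safe to run):

# btrfs check --readonly /dev/bcache0    (with the filesystem not mounted)
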
Thanks,
Qu

> 
> On Fri, Aug 10, 2018 at 3:59 AM, Qu Wenruo  wrote:
>>
>>
>> On 8/10/18 3:40 PM, Dan Merillat wrote:
>>> Kernel 4.17.9, 11tb BTRFS device (md-backed, not btrfs raid)
>>>
>>> I was testing something out and enabled quota groups and started getting
>>> 2-5 minute long pauses where a btrfs-transaction thread spun at 100%.
>>
>> Looks pretty like a running balance and quota.
>>
>> Would you please try with balance disabled (temporarily) with
>> skip_balance mount option to see if it works.
>>
>> If it works, then either try resume balance, or just cancel the balance.
>>
>> Nowadays balance is not needed routinely, especially when you still have
>> unallocated space and enabled quota.
>>
>> Thanks,
>> Qu
>>
>>>
>>> Post-reboot the mount process spinds at 100% CPU, occasinally yielding
>>> to a btrfs-transaction thread at 100% CPU.  The switchover is marked
>>> by a burst of disk activity in btrace.
>>>
>>> Btrace shows all disk activity is returning promptly - no hanging submits.
>>>
>>> Currently the mount is at 6+ hours.
>>>
>>> Suggestions on how to go about debugging this?
>>>
>>





Re: Mount stalls indefinitely after enabling quota groups.

2018-08-10 Thread Dan Merillat
Unfortunately that doesn't appear to be it; a forced restart and an
attempted mount with skip_balance lead to the same thing.

20 minutes in, btrfs-transaction had a large burst of reads, then
started spinning the CPU with the disk idle.

Is this recoverable? I could leave it for a day or so if it may make
progress, but if not I'd like to start on other options.

On Fri, Aug 10, 2018 at 3:59 AM, Qu Wenruo  wrote:
>
>
> On 8/10/18 3:40 PM, Dan Merillat wrote:
>> Kernel 4.17.9, 11tb BTRFS device (md-backed, not btrfs raid)
>>
>> I was testing something out and enabled quota groups and started getting
>> 2-5 minute long pauses where a btrfs-transaction thread spun at 100%.
>
> Looks pretty like a running balance and quota.
>
> Would you please try with balance disabled (temporarily) with
> skip_balance mount option to see if it works.
>
> If it works, then either try resume balance, or just cancel the balance.
>
> Nowadays balance is not needed routinely, especially when you still have
> unallocated space and enabled quota.
>
> Thanks,
> Qu
>
>>
>> Post-reboot the mount process spinds at 100% CPU, occasinally yielding
>> to a btrfs-transaction thread at 100% CPU.  The switchover is marked
>> by a burst of disk activity in btrace.
>>
>> Btrace shows all disk activity is returning promptly - no hanging submits.
>>
>> Currently the mount is at 6+ hours.
>>
>> Suggestions on how to go about debugging this?
>>
>


Re: Mount stalls indefinitely after enabling quota groups.

2018-08-10 Thread Qu Wenruo


On 8/10/18 3:40 PM, Dan Merillat wrote:
> Kernel 4.17.9, 11tb BTRFS device (md-backed, not btrfs raid)
> 
> I was testing something out and enabled quota groups and started getting
> 2-5 minute long pauses where a btrfs-transaction thread spun at 100%.

Looks pretty much like a running balance plus quota.

Would you please try with balance disabled (temporarily) via the
skip_balance mount option to see if it works?

If it works, then either try to resume the balance, or just cancel it.

Nowadays balance is not needed routinely, especially when you still have
unallocated space and quota enabled.
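
A possible sequence, with the device and mount point only as examples:

# mount -o skip_balance /dev/bcache0 /mnt
# btrfs balance status /mnt    (should show the interrupted balance as paused)
# btrfs balance cancel /mnt    (or: btrfs balance resume /mnt)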

Thanks,
Qu

> 
> Post-reboot the mount process spinds at 100% CPU, occasinally yielding
> to a btrfs-transaction thread at 100% CPU.  The switchover is marked
> by a burst of disk activity in btrace.
> 
> Btrace shows all disk activity is returning promptly - no hanging submits.
> 
> Currently the mount is at 6+ hours.
> 
> Suggestions on how to go about debugging this?
> 





Re: Mount stalls indefinitely after enabling quota groups.

2018-08-10 Thread Dan Merillat
[23084.426006] sysrq: SysRq : Show Blocked State
[23084.426085]   taskPC stack   pid father
[23084.426332] mount   D0  4857   4618 0x0080
[23084.426403] Call Trace:
[23084.426531]  ? __schedule+0x2c3/0x830
[23084.426628]  ? __wake_up_common+0x6f/0x120
[23084.426751]  schedule+0x2d/0x90
[23084.426871]  wait_current_trans+0x98/0xc0
[23084.426953]  ? wait_woken+0x80/0x80
[23084.427058]  start_transaction+0x2e9/0x3e0
[23084.427128]  btrfs_drop_snapshot+0x48c/0x860
[23084.427220]  merge_reloc_roots+0xca/0x210
[23084.427277]  btrfs_recover_relocation+0x290/0x420
[23084.427399]  ? btrfs_cleanup_fs_roots+0x174/0x190
[23084.427533]  open_ctree+0x2158/0x2549
[23084.427592]  ? bdi_register_va.part.2+0x10a/0x1a0
[23084.427652]  btrfs_mount_root+0x678/0x730
[23084.427709]  ? pcpu_next_unpop+0x32/0x40
[23084.427797]  ? pcpu_alloc+0x2f6/0x680
[23084.427884]  ? mount_fs+0x30/0x150
[23084.427939]  ? btrfs_decode_error+0x20/0x20
[23084.427996]  mount_fs+0x30/0x150
[23084.428054]  vfs_kern_mount.part.7+0x4f/0xf0
[23084.428111]  btrfs_mount+0x156/0x8ad
[23084.428167]  ? pcpu_block_update_hint_alloc+0x15e/0x1d0
[23084.428226]  ? pcpu_next_unpop+0x32/0x40
[23084.428282]  ? pcpu_alloc+0x2f6/0x680
[23084.428338]  ? mount_fs+0x30/0x150
[23084.428393]  mount_fs+0x30/0x150
[23084.428450]  vfs_kern_mount.part.7+0x4f/0xf0
[23084.428507]  do_mount+0x5b0/0xc60
[23084.428563]  ksys_mount+0x7b/0xd0
[23084.428618]  __x64_sys_mount+0x1c/0x20
[23084.428676]  do_syscall_64+0x55/0x110
[23084.428734]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[23084.428794] RIP: 0033:0x7efeb90daa1a
[23084.428849] RSP: 002b:7ffcc8b8fee8 EFLAGS: 0206 ORIG_RAX:
00a5
[23084.428925] RAX: ffda RBX: 55d5bef05420 RCX: 7efeb90daa1a
[23084.428987] RDX: 55d5bef05600 RSI: 55d5bef05ab0 RDI: 55d5bef05b70
[23084.429048] RBP:  R08: 55d5bef08e40 R09: 003f
[23084.429109] R10: c0ed R11: 0206 R12: 55d5bef05b70
[23084.429170] R13: 55d5bef05600 R14:  R15: 


On Fri, Aug 10, 2018 at 3:40 AM, Dan Merillat  wrote:
> Kernel 4.17.9, 11tb BTRFS device (md-backed, not btrfs raid)
>
> I was testing something out and enabled quota groups and started getting
> 2-5 minute long pauses where a btrfs-transaction thread spun at 100%.
>
> Post-reboot the mount process spinds at 100% CPU, occasinally yielding
> to a btrfs-transaction thread at 100% CPU.  The switchover is marked
> by a burst of disk activity in btrace.
>
> Btrace shows all disk activity is returning promptly - no hanging submits.
>
> Currently the mount is at 6+ hours.
>
> Suggestions on how to go about debugging this?


Mount stalls indefinitely after enabling quota groups.

2018-08-10 Thread Dan Merillat
Kernel 4.17.9, 11tb BTRFS device (md-backed, not btrfs raid)

I was testing something out and enabled quota groups and started getting
2-5 minute long pauses where a btrfs-transaction thread spun at 100%.

Post-reboot the mount process spins at 100% CPU, occasionally yielding
to a btrfs-transaction thread at 100% CPU.  The switchover is marked
by a burst of disk activity in btrace.

Btrace shows all disk activity is returning promptly - no hanging submits.

Currently the mount is at 6+ hours.

Suggestions on how to go about debugging this?