Re: Mount stalls indefinitely after enabling quota groups.
On 2018/8/12 12:18 PM, Dan Merillat wrote:
> On Sat, Aug 11, 2018 at 9:36 PM Qu Wenruo wrote:
>
>>> I'll add a new rescue subcommand, 'btrfs rescue disable-quota' for you
>>> to disable quota offline.
>>
>> Patch set (from my work mailbox), titled "[PATCH] btrfs-progs: rescue:
>> Add ability to disable quota offline".
>> Can also be fetched from github:
>> https://github.com/adam900710/btrfs-progs/tree/quota_disable
>>
>> Usage is:
>> # btrfs rescue disable-quota
>>
>> Tested locally, it would just toggle the ON/OFF flag for quota, so the
>> modification should be minimal.
>
> Noticed one thing while testing this, but it's not related to the
> patch so I'll keep it here.
> I still had the ,ro mounts in fstab, and while it mounted ro quickly,
> *unmounting* the filesystem, even read-only, got hung up:
>
> Aug 11 23:47:27 fileserver kernel: [  484.314725] INFO: task umount:5422 blocked for more than 120 seconds.
> Aug 11 23:47:27 fileserver kernel: [  484.314787]       Not tainted 4.17.14-dirty #3
> Aug 11 23:47:27 fileserver kernel: [  484.314892] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Aug 11 23:47:27 fileserver kernel: [  484.315006] umount          D    0  5422   4656 0x0080
> Aug 11 23:47:27 fileserver kernel: [  484.315122] Call Trace:
> Aug 11 23:47:27 fileserver kernel: [  484.315176]  ? __schedule+0x2c0/0x820
> Aug 11 23:47:27 fileserver kernel: [  484.315270]  ? kmem_cache_alloc+0x167/0x1b0
> Aug 11 23:47:27 fileserver kernel: [  484.315358]  schedule+0x3c/0x90
> Aug 11 23:47:27 fileserver kernel: [  484.315493]  schedule_timeout+0x1e4/0x430
> Aug 11 23:47:27 fileserver kernel: [  484.315542]  ? kmem_cache_alloc+0x167/0x1b0
> Aug 11 23:47:27 fileserver kernel: [  484.315686]  wait_for_common+0xb1/0x170
> Aug 11 23:47:27 fileserver kernel: [  484.315798]  ? wake_up_q+0x70/0x70
> Aug 11 23:47:27 fileserver kernel: [  484.315911]  btrfs_qgroup_wait_for_completion+0x5f/0x80

This normally waits for rescan.
It may be possible that your original "btrfs quota enable" kicked off a
rescan that hadn't finished before umount.

But for a RO mount we shouldn't have any rescan running.
Maybe I could find some spare time to look into it.

Thanks for the report,
Qu

> Aug 11 23:47:27 fileserver kernel: [  484.316031]  close_ctree+0x27/0x2d0
> Aug 11 23:47:27 fileserver kernel: [  484.316138]  generic_shutdown_super+0x69/0x110
> Aug 11 23:47:27 fileserver kernel: [  484.316252]  kill_anon_super+0xe/0x20
> Aug 11 23:47:27 fileserver kernel: [  484.316301]  btrfs_kill_super+0x13/0x100
> Aug 11 23:47:27 fileserver kernel: [  484.316349]  deactivate_locked_super+0x39/0x70
> Aug 11 23:47:27 fileserver kernel: [  484.316399]  cleanup_mnt+0x3b/0x70
> Aug 11 23:47:27 fileserver kernel: [  484.316459]  task_work_run+0x89/0xb0
> Aug 11 23:47:27 fileserver kernel: [  484.316519]  exit_to_usermode_loop+0x8c/0x90
> Aug 11 23:47:27 fileserver kernel: [  484.316579]  do_syscall_64+0xf1/0x110
> Aug 11 23:47:27 fileserver kernel: [  484.316639]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> Is it trying to write changes to a ro mount, or is it doing a bunch of
> work that it's just going to throw away? I ended up using sysrq-b
> after commenting out the entries in fstab.
>
> Everything seems fine with the filesystem now. I appreciate all the help!
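For reference, a quick way to rule out a pending rescan before unmounting is the quota rescan status/wait options; this is only a sketch, and the mount point below is a placeholder rather than a path taken from this thread:

  # show whether a qgroup rescan is currently running
  btrfs quota rescan -s /mnt/media

  # optionally block until any running rescan finishes, then unmount
  btrfs quota rescan -w /mnt/media
  umount /mnt/media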
Re: Mount stalls indefinitely after enabling quota groups.
On Sat, Aug 11, 2018 at 9:36 PM Qu Wenruo wrote:
>> I'll add a new rescue subcommand, 'btrfs rescue disable-quota' for you
>> to disable quota offline.
>
> Patch set (from my work mailbox), titled "[PATCH] btrfs-progs: rescue:
> Add ability to disable quota offline".
> Can also be fetched from github:
> https://github.com/adam900710/btrfs-progs/tree/quota_disable
>
> Usage is:
> # btrfs rescue disable-quota
>
> Tested locally, it would just toggle the ON/OFF flag for quota, so the
> modification should be minimal.

Noticed one thing while testing this, but it's not related to the
patch so I'll keep it here.

I still had the ,ro mounts in fstab, and while it mounted ro quickly,
*unmounting* the filesystem, even read-only, got hung up:

Aug 11 23:47:27 fileserver kernel: [  484.314725] INFO: task umount:5422 blocked for more than 120 seconds.
Aug 11 23:47:27 fileserver kernel: [  484.314787]       Not tainted 4.17.14-dirty #3
Aug 11 23:47:27 fileserver kernel: [  484.314892] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 11 23:47:27 fileserver kernel: [  484.315006] umount          D    0  5422   4656 0x0080
Aug 11 23:47:27 fileserver kernel: [  484.315122] Call Trace:
Aug 11 23:47:27 fileserver kernel: [  484.315176]  ? __schedule+0x2c0/0x820
Aug 11 23:47:27 fileserver kernel: [  484.315270]  ? kmem_cache_alloc+0x167/0x1b0
Aug 11 23:47:27 fileserver kernel: [  484.315358]  schedule+0x3c/0x90
Aug 11 23:47:27 fileserver kernel: [  484.315493]  schedule_timeout+0x1e4/0x430
Aug 11 23:47:27 fileserver kernel: [  484.315542]  ? kmem_cache_alloc+0x167/0x1b0
Aug 11 23:47:27 fileserver kernel: [  484.315686]  wait_for_common+0xb1/0x170
Aug 11 23:47:27 fileserver kernel: [  484.315798]  ? wake_up_q+0x70/0x70
Aug 11 23:47:27 fileserver kernel: [  484.315911]  btrfs_qgroup_wait_for_completion+0x5f/0x80
Aug 11 23:47:27 fileserver kernel: [  484.316031]  close_ctree+0x27/0x2d0
Aug 11 23:47:27 fileserver kernel: [  484.316138]  generic_shutdown_super+0x69/0x110
Aug 11 23:47:27 fileserver kernel: [  484.316252]  kill_anon_super+0xe/0x20
Aug 11 23:47:27 fileserver kernel: [  484.316301]  btrfs_kill_super+0x13/0x100
Aug 11 23:47:27 fileserver kernel: [  484.316349]  deactivate_locked_super+0x39/0x70
Aug 11 23:47:27 fileserver kernel: [  484.316399]  cleanup_mnt+0x3b/0x70
Aug 11 23:47:27 fileserver kernel: [  484.316459]  task_work_run+0x89/0xb0
Aug 11 23:47:27 fileserver kernel: [  484.316519]  exit_to_usermode_loop+0x8c/0x90
Aug 11 23:47:27 fileserver kernel: [  484.316579]  do_syscall_64+0xf1/0x110
Aug 11 23:47:27 fileserver kernel: [  484.316639]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

Is it trying to write changes to a ro mount, or is it doing a bunch of
work that it's just going to throw away? I ended up using sysrq-b
after commenting out the entries in fstab.

Everything seems fine with the filesystem now. I appreciate all the help!
Re: Mount stalls indefinitely after enabling quota groups.
On 2018/8/12 8:30 AM, Qu Wenruo wrote:
>
> On 2018/8/12 5:10 AM, Dan Merillat wrote:
>> 19 hours later, still going extremely slowly and taking longer and
>> longer for progress made.  Main symptom is the mount process is
>> spinning at 100% CPU, interspersed with btrfs-transaction spinning at
>> 100% CPU.
>> So far it's racked up 14h45m of CPU time on mount and an additional
>> 3h40m on btrfs-transaction.
>>
>> The current drop key changes every 10-15 minutes when I check it via
>> inspect-internal, so some progress is slowly being made.
>>
>> I built the kernel with ftrace to see what's going on internally, this
>> is the pattern I'm seeing:
>>
> [snip]
>
> It looks pretty much like qgroup, but there's too much noise.
> The pinpoint trace event would be btrfs_find_all_roots().
>
>> Repeats indefinitely.  btrace shows basically zero activity on the
>> array while it spins, with the occasional burst when mount &
>> btrfs-transaction swap off.
>>
>> To recap the chain of events leading up to this:
>> 11TB array got completely full and started fragmenting badly.
>> Ran bedup and it found 600gb of duplicate files that it offline-shared.
>> Reboot for unrelated reasons.
>
> 11T with highly deduped usage is really the worst-case scenario for qgroup.
> Qgroup is not really good at handling highly reflinked files, nor balance.
> When they combine, it gets worse.
>
>> Enabled quota on all subvolumes to try to track where the new data is
>> coming from.
>> Tried to balance metadata due to transaction CPU spikes.
>> Force-rebooted after the array was completely lagged out.
>>
>> Now attempting to mount it RW.  Readonly works, but RW has taken well
>> over 24 hours at this point.
>
> I'll add a new rescue subcommand, 'btrfs rescue disable-quota' for you
> to disable quota offline.

Patch set (from my work mailbox), titled "[PATCH] btrfs-progs: rescue:
Add ability to disable quota offline".
Can also be fetched from github:
https://github.com/adam900710/btrfs-progs/tree/quota_disable

Usage is:
# btrfs rescue disable-quota

Tested locally, it would just toggle the ON/OFF flag for quota, so the
modification should be minimal.

Thanks,
Qu

>
> Thanks,
> Qu
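For anyone following along, a minimal sketch of building that branch and running the new subcommand; the build steps are the usual btrfs-progs ones and are assumed here rather than taken from the patch mail, and the device is the one from this thread (the filesystem must be unmounted):

  # fetch the quota_disable branch of btrfs-progs
  git clone -b quota_disable https://github.com/adam900710/btrfs-progs.git
  cd btrfs-progs
  ./autogen.sh && ./configure && make

  # toggle the quota ON/OFF flag offline, then try mounting again
  ./btrfs rescue disable-quota /dev/bcache0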
Re: Mount stalls indefinitely after enabling quota groups.
On 2018/8/12 8:59 AM, Dan Merillat wrote:
> On Sat, Aug 11, 2018 at 8:30 PM Qu Wenruo wrote:
>>
>> It looks pretty much like qgroup, but there's too much noise.
>> The pinpoint trace event would be btrfs_find_all_roots().
>
> I had this half-written when you replied.
>
> Agreed: it looks like the bulk of the time spent resides in qgroups.
> Spent some time with sysrq-l and ftrace:
>
>  ? __rcu_read_unlock+0x5/0x50
>  ? return_to_handler+0x15/0x36
>  __rcu_read_unlock+0x5/0x50
>  find_extent_buffer+0x47/0x90                 extent_io.c:4888
>  read_block_for_search.isra.12+0xc8/0x350     ctree.c:2399
>  btrfs_search_slot+0x3e7/0x9c0                ctree.c:2837
>  btrfs_next_old_leaf+0x1dc/0x410              ctree.c:5702
>  btrfs_next_old_item                          ctree.h:2952
>  add_all_parents                              backref.c:487
>  resolve_indirect_refs+0x3f7/0x7e0            backref.c:575
>  find_parent_nodes+0x42d/0x1290               backref.c:1236
>  ? find_parent_nodes+0x5/0x1290               backref.c:1114
>  btrfs_find_all_roots_safe+0x98/0x100         backref.c:1414
>  btrfs_find_all_roots+0x52/0x70               backref.c:1442
>  btrfs_qgroup_trace_extent_post+0x27/0x60     qgroup.c:1503
>  btrfs_qgroup_trace_leaf_items+0x104/0x130    qgroup.c:1589
>  btrfs_qgroup_trace_subtree+0x26a/0x3a0       qgroup.c:1750
>  do_walk_down+0x33c/0x5a0                     extent-tree.c:8883
>  walk_down_tree+0xa8/0xd0                     extent-tree.c:9041
>  btrfs_drop_snapshot+0x370/0x8b0              extent-tree.c:9203
>  merge_reloc_roots+0xcf/0x220
>  btrfs_recover_relocation+0x26d/0x400
>  ? btrfs_cleanup_fs_roots+0x16a/0x180
>  btrfs_remount+0x32e/0x510
>  do_remount_sb+0x67/0x1e0
>  do_mount+0x712/0xc90
>
> The mount is looping in btrfs_qgroup_trace_subtree, as evidenced by
> the following ftrace filter:
>
> fileserver:/sys/kernel/tracing# cat set_ftrace_filter
> btrfs_qgroup_trace_extent
> btrfs_qgroup_trace_subtree

Yep, it's quota causing the hang.

> [snip]
>
> So 10-13 minutes per cycle.
>
>> 11T with highly deduped usage is really the worst-case scenario for qgroup.
>> Qgroup is not really good at handling highly reflinked files, nor balance.
>> When they combine, it gets worse.
>
> I'm not really understanding the use-case of qgroup if it melts down
> on large systems with a shared base + individual changes.

The problem is, for balance btrfs does a trick: it switches the tree
reloc tree with the real fs tree.

However, the tree reloc tree isn't accounted by quota, while the real
fs tree contributes to quota.
And because of that owner change, btrfs needs to do a full subtree rescan.

For a small subvolume it's not a problem, but for a large subvolume
quota needs to rescan thousands of tree blocks, and due to the highly
deduped files, each tree block needs extra iterations for every reflink.

Both factors contribute to the slow mount.

There are several workaround patches on the mailing list; one makes the
balance resume happen in the background at mount time, so it won't hang
the mount.
But it still makes transactions pretty slow (writes will still be
blocked for a long time).

There is also a plan to skip the subtree rescan completely, but it needs
extra review to ensure such a tree block switch won't change the quota
numbers.

Thanks,
Qu

>
>> I'll add a new rescue subcommand, 'btrfs rescue disable-quota' for you
>> to disable quota offline.
>
> Ok.  I was looking at just doing this to speed things up:
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 51b5e2da708c..c5bf937b79f0 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -8877,7 +8877,7 @@ static noinline int do_walk_down(struct btrfs_trans_handle *trans,
>                 parent = 0;
>         }
>
> -       if (need_account) {
> +       if (0) {
>                 ret = btrfs_qgroup_trace_subtree(trans, root, next,
>                                                  generation, level - 1);
>                 if (ret) {
>                         btrfs_err_rl(fs_info,
>                                      "Error %d accounting shared subtree. Quota is out of sync, rescan required.",
>                                      ret);
>                 }
>
> If I follow, this will leave me with inconsistent qgroups and a full
> rescan is required.  That seems an acceptable tradeoff, since it seems
> like the best plan going forward is to nuke the qgroups anyway.
>
> There's still the btrfs-transaction spin, but I'm hoping that's
> related to qgroups as well.
>
>> Thanks,
>> Qu
>
> Appreciate it.  I was going to go with my hackjob patch to avoid any
> untested rewriting - there's already an error path for "something went
> wrong updating qgroups during walk_tree", so it seemed safest to take
> advantage of it.  I'll patch either the kernel or the btrfs programs,
> whichever you think is best.
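If the offline disable (or the hack above) gets the filesystem mounted rw again, the qgroup side can then be dealt with from userspace; a rough sketch, with the mount point as a placeholder, and the choice between the two branches depending on whether quota is still wanted at all:

  # drop the now-inconsistent qgroups entirely
  btrfs quota disable /mnt/media

  # ...or keep quota and accept the cost of a full rescan
  btrfs quota enable /mnt/media
  btrfs quota rescan -w /mnt/media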
Re: Mount stalls indefinitely after enabling quota groups.
On Sat, Aug 11, 2018 at 8:30 PM Qu Wenruo wrote:
>
> It looks pretty much like qgroup, but there's too much noise.
> The pinpoint trace event would be btrfs_find_all_roots().

I had this half-written when you replied.

Agreed: it looks like the bulk of the time spent resides in qgroups.
Spent some time with sysrq-l and ftrace:

 ? __rcu_read_unlock+0x5/0x50
 ? return_to_handler+0x15/0x36
 __rcu_read_unlock+0x5/0x50
 find_extent_buffer+0x47/0x90                 extent_io.c:4888
 read_block_for_search.isra.12+0xc8/0x350     ctree.c:2399
 btrfs_search_slot+0x3e7/0x9c0                ctree.c:2837
 btrfs_next_old_leaf+0x1dc/0x410              ctree.c:5702
 btrfs_next_old_item                          ctree.h:2952
 add_all_parents                              backref.c:487
 resolve_indirect_refs+0x3f7/0x7e0            backref.c:575
 find_parent_nodes+0x42d/0x1290               backref.c:1236
 ? find_parent_nodes+0x5/0x1290               backref.c:1114
 btrfs_find_all_roots_safe+0x98/0x100         backref.c:1414
 btrfs_find_all_roots+0x52/0x70               backref.c:1442
 btrfs_qgroup_trace_extent_post+0x27/0x60     qgroup.c:1503
 btrfs_qgroup_trace_leaf_items+0x104/0x130    qgroup.c:1589
 btrfs_qgroup_trace_subtree+0x26a/0x3a0       qgroup.c:1750
 do_walk_down+0x33c/0x5a0                     extent-tree.c:8883
 walk_down_tree+0xa8/0xd0                     extent-tree.c:9041
 btrfs_drop_snapshot+0x370/0x8b0              extent-tree.c:9203
 merge_reloc_roots+0xcf/0x220
 btrfs_recover_relocation+0x26d/0x400
 ? btrfs_cleanup_fs_roots+0x16a/0x180
 btrfs_remount+0x32e/0x510
 do_remount_sb+0x67/0x1e0
 do_mount+0x712/0xc90

The mount is looping in btrfs_qgroup_trace_subtree, as evidenced by
the following ftrace filter:

fileserver:/sys/kernel/tracing# cat set_ftrace_filter
btrfs_qgroup_trace_extent
btrfs_qgroup_trace_subtree
# cat trace
...
 mount-6803 [003] 80407.649752: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_subtree
 mount-6803 [003] 80407.649772: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_leaf_items
 mount-6803 [003] 80407.649797: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_leaf_items
 mount-6803 [003] 80407.649821: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_leaf_items
 mount-6803 [003] 80407.649846: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_leaf_items
 mount-6803 [003] 80407.701652: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_leaf_items
 mount-6803 [003] 80407.754547: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_leaf_items
 mount-6803 [003] 80407.754574: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_leaf_items
 mount-6803 [003] 80407.754598: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_leaf_items
 mount-6803 [003] 80407.754622: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_leaf_items
 mount-6803 [003] 80407.754646: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_leaf_items
...
repeats 240 times
 mount-6803 [002] 80412.568804: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_leaf_items
 mount-6803 [002] 80412.568825: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_leaf_items
 mount-6803 [002] 80412.568850: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_subtree
 mount-6803 [002] 80412.568872: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_leaf_items

Looks like invocations of btrfs_qgroup_trace_subtree are taking forever:

 mount-6803 [006] 80641.627709: btrfs_qgroup_trace_subtree <-do_walk_down
 mount-6803 [003] 81433.760945: btrfs_qgroup_trace_subtree <-do_walk_down
(add do_walk_down to the trace here)
 mount-6803 [001] 82124.623557: do_walk_down <-walk_down_tree
 mount-6803 [001] 82124.623567: btrfs_qgroup_trace_subtree <-do_walk_down
 mount-6803 [006] 82695.241306: do_walk_down <-walk_down_tree
 mount-6803 [006] 82695.241316: btrfs_qgroup_trace_subtree <-do_walk_down

So 10-13 minutes per cycle.

> 11T with highly deduped usage is really the worst-case scenario for qgroup.
> Qgroup is not really good at handling highly reflinked files, nor balance.
> When they combine, it gets worse.

I'm not really understanding the use-case of qgroup if it melts down
on large systems with a shared base + individual changes.

> I'll add a new rescue subcommand, 'btrfs rescue disable-quota' for you
> to disable quota offline.

Ok.  I was looking at just doing this to speed things up:

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 51b5e2da708c..c5bf937b79f0 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -8877,7 +8877,7 @@ static noinline int do_walk_down(struct btrfs_trans_handle *trans,
                parent = 0;
        }

-       if (need_account) {
+       if (0) {
                ret = btrfs_qgroup_trace_subtree(trans, root, next,
                                                 generation, level - 1);
                if (ret) {
                        btrfs_err_rl(fs_info,
                                     "Error %d accounting shared subtree. Quota is out of sync, rescan required.",
                                     ret);
                }

If I follow, this will leave me with inconsistent qgroups and a full
rescan is required.  That seems an acceptable tradeoff, since it seems
like the best plan going forward is to nuke the qgroups anyway.

There's still the btrfs-transaction spin, but I'm hoping that's
related to qgroups as well.

> Thanks,
> Qu

Appreciate it.  I was going to go with my hackjob patch to avoid any
untested rewriting - there's already an error path for "something went
wrong updating qgroups during walk_tree", so it seemed safest to take
advantage of it.  I'll patch either the kernel or the btrfs programs,
whichever you think is best.
Re: Mount stalls indefinitely after enabling quota groups.
On 2018/8/12 5:10 AM, Dan Merillat wrote:
> 19 hours later, still going extremely slowly and taking longer and
> longer for progress made.  Main symptom is the mount process is
> spinning at 100% CPU, interspersed with btrfs-transaction spinning at
> 100% CPU.
> So far it's racked up 14h45m of CPU time on mount and an additional
> 3h40m on btrfs-transaction.
>
> The current drop key changes every 10-15 minutes when I check it via
> inspect-internal, so some progress is slowly being made.
>
> I built the kernel with ftrace to see what's going on internally, this
> is the pattern I'm seeing:
>
[snip]

It looks pretty much like qgroup, but there's too much noise.
The pinpoint trace event would be btrfs_find_all_roots().

>
> Repeats indefinitely.  btrace shows basically zero activity on the
> array while it spins, with the occasional burst when mount &
> btrfs-transaction swap off.
>
> To recap the chain of events leading up to this:
> 11TB array got completely full and started fragmenting badly.
> Ran bedup and it found 600gb of duplicate files that it offline-shared.
> Reboot for unrelated reasons.

11T with highly deduped usage is really the worst-case scenario for qgroup.
Qgroup is not really good at handling highly reflinked files, nor balance.
When they combine, it gets worse.

> Enabled quota on all subvolumes to try to track where the new data is
> coming from.
> Tried to balance metadata due to transaction CPU spikes.
> Force-rebooted after the array was completely lagged out.
>
> Now attempting to mount it RW.  Readonly works, but RW has taken well
> over 24 hours at this point.

I'll add a new rescue subcommand, 'btrfs rescue disable-quota' for you
to disable quota offline.

Thanks,
Qu
Re: Mount stalls indefinitely after enabling quota groups.
19 hours later, still going extremely slowly and taking longer and
longer for progress made.  Main symptom is the mount process is
spinning at 100% CPU, interspersed with btrfs-transaction spinning at
100% CPU.
So far it's racked up 14h45m of CPU time on mount and an additional
3h40m on btrfs-transaction.

The current drop key changes every 10-15 minutes when I check it via
inspect-internal, so some progress is slowly being made.

I built the kernel with ftrace to see what's going on internally, this
is the pattern I'm seeing:

 mount-6803 [002] ...1 69023.970964: btrfs_next_old_leaf <-resolve_indirect_refs
 mount-6803 [002] ...1 69023.970965: btrfs_release_path <-btrfs_next_old_leaf
 mount-6803 [002] ...1 69023.970965: btrfs_search_slot <-btrfs_next_old_leaf
 mount-6803 [002] ...1 69023.970966: btrfs_clear_path_blocking <-btrfs_search_slot
 mount-6803 [002] ...1 69023.970966: btrfs_set_path_blocking <-btrfs_clear_path_blocking
 mount-6803 [002] ...1 69023.970967: btrfs_bin_search <-btrfs_search_slot
 mount-6803 [002] ...1 69023.970967: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
 mount-6803 [002] ...1 69023.970967: btrfs_get_token_64 <-read_block_for_search.isra.12
 mount-6803 [002] ...1 69023.970968: btrfs_get_token_64 <-read_block_for_search.isra.12
 mount-6803 [002] ...1 69023.970968: btrfs_node_key <-read_block_for_search.isra.12
 mount-6803 [002] ...1 69023.970969: btrfs_buffer_uptodate <-read_block_for_search.isra.12
 mount-6803 [002] ...1 69023.970969: btrfs_clear_path_blocking <-btrfs_search_slot
 mount-6803 [002] ...1 69023.970970: btrfs_set_path_blocking <-btrfs_clear_path_blocking
 mount-6803 [002] ...1 69023.970970: btrfs_bin_search <-btrfs_search_slot
 mount-6803 [002] ...1 69023.970970: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
 mount-6803 [002] ...1 69023.970971: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
 mount-6803 [002] ...1 69023.970971: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
 mount-6803 [002] ...1 69023.970972: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
 mount-6803 [002] ...1 69023.970972: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
 mount-6803 [002] ...1 69023.970973: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
 mount-6803 [002] ...1 69023.970973: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
 mount-6803 [002] ...1 69023.970973: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
 mount-6803 [002] ...1 69023.970974: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
 mount-6803 [002] ...1 69023.970974: btrfs_get_token_64 <-read_block_for_search.isra.12
 mount-6803 [002] ...1 69023.970975: btrfs_get_token_64 <-read_block_for_search.isra.12
 mount-6803 [002] ...1 69023.970975: btrfs_node_key <-read_block_for_search.isra.12
 mount-6803 [002] ...1 69023.970976: btrfs_buffer_uptodate <-read_block_for_search.isra.12
 mount-6803 [002] ...1 69023.970976: btrfs_clear_path_blocking <-btrfs_search_slot
 mount-6803 [002] ...1 69023.970976: btrfs_set_path_blocking <-btrfs_clear_path_blocking
 mount-6803 [002] ...1 69023.970977: btrfs_bin_search <-btrfs_search_slot
 mount-6803 [002] ...1 69023.970977: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
 mount-6803 [002] ...1 69023.970978: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
 mount-6803 [002] ...1 69023.970978: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
 mount-6803 [002] ...1 69023.970978: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
 mount-6803 [002] ...1 69023.970979: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
 mount-6803 [002] ...1 69023.970979: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
 mount-6803 [002] ...1 69023.970980: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
 mount-6803 [002] ...1 69023.970980: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
 mount-6803 [002] ...1 69023.970980: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
 mount-6803 [002] ...1 69023.970981: btrfs_get_token_64 <-read_block_for_search.isra.12
 mount-6803 [002] ...1 69023.970981: btrfs_get_token_64 <-read_block_for_search.isra.12
 mount-6803 [002] ...1 69023.970982: btrfs_node_key <-read_block_for_search.isra.12
 mount-6803 [002] ...1 69023.970982: btrfs_buffer_uptodate <-read_block_for_search.isra.12
 mount-6803 [002] ...1 69023.970983: btrfs_clear_path_blocking <-btrfs_search_slot
 mount-6803 [002] ...1 69023.970983: btrfs_set_path_blocking <-btrfs_clear_path_blocking
 mount-6803 [002] ...1 69023.97
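For context, the function-tracer setup that produces this kind of output looks roughly like the following; the exact filter list is an assumption (later messages show at least btrfs_qgroup_trace_extent, btrfs_qgroup_trace_subtree and do_walk_down being traced), and it requires a kernel built with CONFIG_FUNCTION_TRACER:

  cd /sys/kernel/tracing
  # trace only btrfs functions to cut down the noise
  echo 'btrfs_*' > set_ftrace_filter
  echo function > current_tracer
  echo 1 > tracing_on
  # stream entries while the mount process spins
  head -n 100 trace_pipe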
Re: Mount stalls indefinitely after enabling quota groups.
On Fri, Aug 10, 2018 at 6:51 AM, Qu Wenruo wrote:
>
> On 8/10/18 6:42 PM, Dan Merillat wrote:
>> On Fri, Aug 10, 2018 at 6:05 AM, Qu Wenruo wrote:
>
> But considering your amount of block groups, mount itself may take some
> time (before trying to resume balance).

I'd believe it, a clean mount normally took 2-3 minutes.

btrfs check ran out of RAM eventually, so I killed it and went on to
trying to mount again.  Readonly mounted pretty quickly, so I'm just
letting -o remount,rw spin for however long it needs to.  Readonly
access is fine over the weekend, and hopefully it will be done by
Monday.

To be clear, what exactly am I watching with dump-tree to monitor
forward progress?

Thanks again for the help!
Re: Mount stalls indefinitely after enabling quota groups.
On 8/10/18 6:42 PM, Dan Merillat wrote:
> On Fri, Aug 10, 2018 at 6:05 AM, Qu Wenruo wrote:
>
>> Although I'm not sure about the details, the fs looks pretty huge.
>> Tons of subvolumes and their free space cache inodes.
>
> 11TB, 3 or so subvolumes and two snapshots I think.  Not particularly
> large for NAS.
>
>> But only 3 tree reloc trees; unless you have tons of reflinked files
>> (off-line deduped), it shouldn't cause a lot of problems.
>
> There's going to be a ton of reflinked files.  Both cp --reflink and
> via the wholefile dedup.
>
> I freed up ~1/2 TB last month doing dedup.
>
>> At least we have some progress dropping the tree reloc tree for subvolume 6482.
>
> Is there a way to get an idea of how much work is left to be done on
> the reloc tree?

You could inspect it by the level.  But each level change is a huge
step forward.

> Can I walk it with btrfs-inspect?

Of course you can.  Using -b could show the remaining tree.

> dump-tree -t TREE_RELOC is quite enormous (13+ million lines before I gave up)

No wonder, since it's kind of a snapshot of your subvolume.

>> If you check the dump-tree output for the following data, the "drop key"
>> should change during mount: (inspect dump-tree can be run mounted)
>>
>> item 175 key (TREE_RELOC ROOT_ITEM 6482) itemoff 8271 itemsize 439
>>         drop key (2769795 EXTENT_DATA 12665933824) level 2
>>                   ^
>>
>> So for the worst-case scenario, there is some way to determine whether
>> it's making progress.
>
> I'll keep an eye on that.
>
>> And according to the level (3), which is not small for each subvolume, I
>> doubt that's the reason why it's so slow.
>>
>> BTW, for the last skip_balance mount, is there any kernel message like
>> "balance: resume skipped"?
>
> No, the only reference to balance in kern.log is a hung
> btrfs_cancel_balance from the first reboot.

That's strange.

But considering your amount of block groups, mount itself may take some
time (before trying to resume balance).

Thanks,
Qu

>
>> Have you tried mounting the fs readonly with skip_balance, and then
>> remounting rw, still with skip_balance?
>
> No, every operation takes a long time.  It's still running the btrfs
> check, although I'm going to cancel it and try mount -o ro,skip_balance
> before I go to sleep and see where it is tomorrow.
>
> Thank you for taking the time to help me with this.
Re: Mount stalls indefinitely after enabling quota groups.
On Fri, Aug 10, 2018 at 6:05 AM, Qu Wenruo wrote:
>
> Although I'm not sure about the details, the fs looks pretty huge.
> Tons of subvolumes and their free space cache inodes.

11TB, 3 or so subvolumes and two snapshots I think.  Not particularly
large for NAS.

> But only 3 tree reloc trees; unless you have tons of reflinked files
> (off-line deduped), it shouldn't cause a lot of problems.

There's going to be a ton of reflinked files.  Both cp --reflink and
via the wholefile dedup.

I freed up ~1/2 TB last month doing dedup.

> At least we have some progress dropping the tree reloc tree for subvolume 6482.

Is there a way to get an idea of how much work is left to be done on
the reloc tree?  Can I walk it with btrfs-inspect?

dump-tree -t TREE_RELOC is quite enormous (13+ million lines before I gave up)

> If you check the dump-tree output for the following data, the "drop key"
> should change during mount: (inspect dump-tree can be run mounted)
>
> item 175 key (TREE_RELOC ROOT_ITEM 6482) itemoff 8271 itemsize 439
>         drop key (2769795 EXTENT_DATA 12665933824) level 2
>                   ^
>
> So for the worst-case scenario, there is some way to determine whether
> it's making progress.

I'll keep an eye on that.

> And according to the level (3), which is not small for each subvolume, I
> doubt that's the reason why it's so slow.
>
> BTW, for the last skip_balance mount, is there any kernel message like
> "balance: resume skipped"?

No, the only reference to balance in kern.log is a hung
btrfs_cancel_balance from the first reboot.

> Have you tried mounting the fs readonly with skip_balance, and then
> remounting rw, still with skip_balance?

No, every operation takes a long time.  It's still running the btrfs
check, although I'm going to cancel it and try mount -o ro,skip_balance
before I go to sleep and see where it is tomorrow.

Thank you for taking the time to help me with this.
Re: Mount stalls indefinitely after enabling quota groups.
On 8/10/18 5:39 PM, Dan Merillat wrote:
> On Fri, Aug 10, 2018 at 5:13 AM, Qu Wenruo wrote:
>>
>> On 8/10/18 4:47 PM, Dan Merillat wrote:
>>> Unfortunately that doesn't appear to be it, a forced restart and
>>> attempted to mount with skip_balance leads to the same thing.
>>
>> That's strange.
>>
>> Would you please provide the following output to determine whether we
>> have any balance running?
>>
>> # btrfs inspect dump-super -fFa
[snip]

Nothing special for the super dump.

>>
>> # btrfs inspect dump-tree -t root
>
> Too large to include inline, hopefully attaching works.

Although I'm not sure about the details, the fs looks pretty huge.
Tons of subvolumes and their free space cache inodes.

But only 3 tree reloc trees; unless you have tons of reflinked files
(off-line deduped), it shouldn't cause a lot of problems.

At least we have some progress dropping the tree reloc tree for subvolume 6482.

If you check the dump-tree output for the following data, the "drop key"
should change during mount: (inspect dump-tree can be run mounted)

item 175 key (TREE_RELOC ROOT_ITEM 6482) itemoff 8271 itemsize 439
        drop key (2769795 EXTENT_DATA 12665933824) level 2
                  ^

So for the worst-case scenario, there is some way to determine whether
it's making progress.

And according to the level (3), which is not small for each subvolume, I
doubt that's the reason why it's so slow.

BTW, for the last skip_balance mount, is there any kernel message like
"balance: resume skipped"?

Have you tried mounting the fs readonly with skip_balance, and then
remounting rw, still with skip_balance?

> I think there's a balance though:
>
> item 178 key (BALANCE TEMPORARY_ITEM 0) itemoff 6945 itemsize 448
>         temporary item objectid BALANCE offset 0
>         balance status flags 6
>         DATA
>         profiles 0 devid 0 target 0 flags 0
>         usage_min 0 usage_max 0 pstart 0 pend 0
>         vstart 0 vend 0 limit_min 0 limit_max 0
>         stripes_min 0 stripes_max 0
>         METADATA
>         profiles 0 devid 0 target 0 flags 0
>         usage_min 0 usage_max 0 pstart 0 pend 0
>         vstart 0 vend 0 limit_min 0 limit_max 0
>         stripes_min 0 stripes_max 0
>         SYSTEM
>         profiles 0 devid 0 target 0 flags 0
>         usage_min 0 usage_max 0 pstart 0 pend 0
>         vstart 0 vend 0 limit_min 0 limit_max 0
>         stripes_min 0 stripes_max 0
>
> btrfs check is still running, it's found one thing so far:
>
> Checking filesystem on /dev/bcache0
> UUID: 16adc029-64c5-45ff-8114-e2f5b2f2d331
> checking extents
> ref mismatch on [13135707160576 16384] extent item 0, found 1
> tree backref 13135707160576 parent 13136850550784 root 13136850550784 not found in extent tree
> backpointer mismatch on [13135707160576 16384]

Just a single error, not bad (compared to some catastrophic errors).
It may be a false alert since it's a backref for a reloc tree, which is
not that common for us to test.

So the fs should be OK to go; we just need to figure out how and why
skip_balance is not working.

Thanks,
Qu

> ERROR: errors found in extent allocation tree or chunk allocation
> checking free space cache
> checking fs roots
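Since the drop key is the progress indicator here, one way to poll it on the still-mounted filesystem is a periodic dump-tree grep; this is only a sketch (the grep pattern is an assumption based on the output format shown above), using the device from this thread:

  # re-run every few minutes and compare: the drop key should advance
  # as the tree reloc tree is dropped
  btrfs inspect-internal dump-tree -t root /dev/bcache0 | grep 'drop key'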
Re: Mount stalls indefinitely after enabling quota groups.
E: Resending without the 500k attachment.

On Fri, Aug 10, 2018 at 5:13 AM, Qu Wenruo wrote:
>
> On 8/10/18 4:47 PM, Dan Merillat wrote:
>> Unfortunately that doesn't appear to be it, a forced restart and
>> attempted to mount with skip_balance leads to the same thing.
>
> That's strange.
>
> Would you please provide the following output to determine whether we
> have any balance running?
>
> # btrfs inspect dump-super -fFa

superblock: bytenr=65536, device=/dev/bcache0
---------------------------------------------------------
csum_type               0 (crc32c)
csum_size               4
csum                    0xaeff2ec3 [match]
bytenr                  65536
flags                   0x1 ( WRITTEN )
magic                   _BHRfS_M [match]
fsid                    16adc029-64c5-45ff-8114-e2f5b2f2d331
label                   MEDIA
generation              4584957
root                    33947648
sys_array_size          129
chunk_root_generation   4534813
root_level              1
chunk_root              13681127653376
chunk_root_level        1
log_root                0
log_root_transid        0
log_root_level          0
total_bytes             12001954226176
bytes_used              11387838865408
sectorsize              4096
nodesize                16384
leafsize (deprecated)   16384
stripesize              4096
root_dir                6
num_devices             1
compat_flags            0x0
compat_ro_flags         0x0
incompat_flags          0x169 ( MIXED_BACKREF | COMPRESS_LZO | BIG_METADATA | EXTENDED_IREF | SKINNY_METADATA )
cache_generation        4584957
uuid_tree_generation    4584925
dev_item.uuid           ec51cc1f-992a-47a2-b7b2-83af026723fd
dev_item.fsid           16adc029-64c5-45ff-8114-e2f5b2f2d331 [match]
dev_item.type           0
dev_item.total_bytes    12001954226176
dev_item.bytes_used     11613258579968
dev_item.io_align       4096
dev_item.io_width       4096
dev_item.sector_size    4096
dev_item.devid          1
dev_item.dev_group      0
dev_item.seek_speed     0
dev_item.bandwidth      0
dev_item.generation     0
sys_chunk_array[2048]:
        item 0 key (FIRST_CHUNK_TREE CHUNK_ITEM 13681127456768)
                length 33554432 owner 2 stripe_len 65536 type SYSTEM|DUP
                io_align 65536 io_width 65536 sector_size 4096
                num_stripes 2 sub_stripes 1
                        stripe 0 devid 1 offset 353298808832
                        dev_uuid ec51cc1f-992a-47a2-b7b2-83af026723fd
                        stripe 1 devid 1 offset 353332363264
                        dev_uuid ec51cc1f-992a-47a2-b7b2-83af026723fd
backup_roots[4]:
        backup 0:
                backup_tree_root:       3666753175552   gen: 4584956    level: 1
                backup_chunk_root:      13681127653376  gen: 4534813    level: 1
                backup_extent_root:     3666740674560   gen: 4584956    level: 2
                backup_fs_root:         0       gen: 0  level: 0
                backup_dev_root:        199376896       gen: 4584935    level: 1
                backup_csum_root:       3666753568768   gen: 4584956    level: 3
                backup_total_bytes:     12001954226176
                backup_bytes_used:      11387838865408
                backup_num_devices:     1

        backup 1:
                backup_tree_root:       33947648        gen: 4584957    level: 1
                backup_chunk_root:      13681127653376  gen: 4534813    level: 1
                backup_extent_root:     33980416        gen: 4584957    level: 2
                backup_fs_root:         0       gen: 0  level: 0
                backup_dev_root:        34160640        gen: 4584957    level: 1
                backup_csum_root:       34357248        gen: 4584957    level: 3
                backup_total_bytes:     12001954226176
                backup_bytes_used:      11387838865408
                backup_num_devices:     1

        backup 2:
                backup_tree_root:       3666598461440   gen: 4584954    level: 1
                backup_chunk_root:      13681127653376  gen: 4534813    level: 1
                backup_extent_root:     3666595233792   gen: 4584954    level: 2
                backup_fs_root:         0       gen: 0  level: 0
                backup_dev_root:        199376896       gen: 4584935    level: 1
                backup_csum_root:       300034304       gen: 4584954    level: 3
                backup_total_bytes:     12001954226176
                backup_bytes_used:      11387838898176
                backup_num_devices:     1

        backup 3:
                backup_tree_root:       390998272       gen: 4584955    level: 1
                backup_chunk_root:      13681127653376  gen: 4534813    level: 1
                backup_extent_root:     390293760       gen: 4584955    level: 2
                backup_fs_root:         0       gen: 0  level: 0
                backup_dev_root:        199376896       gen: 4584935    level: 1
                backup_csum_root:       391604480       gen: 4584955    level: 3
                backup_total_bytes:     12001954226176
                backup_bytes_used:      11387838881792
                backup_num_devices:     1

superblock: bytenr=67108864, device=/dev/bcache0
---------------------------------------------------------
csum_type               0 (crc32c)
csum_size               4
csum                    0x0e9e060d [match]
bytenr                  67108864
flags                   0x1 ( WRITTEN )
magic                   _BHRfS_M [match]
fsid                    16adc029-64c5-45ff-8114-e2f5b2f2d331
label                   MEDIA
generatio
Re: Mount stalls indefinitely after enabling quota groups.
On 8/10/18 4:47 PM, Dan Merillat wrote:
> Unfortunately that doesn't appear to be it, a forced restart and
> attempted to mount with skip_balance leads to the same thing.

That's strange.

Would you please provide the following output to determine whether we
have any balance running?

# btrfs inspect dump-super -fFa
# btrfs inspect dump-tree -t root

>
> 20 minutes in, btrfs-transaction had a large burst of reads then started
> spinning the CPU with the disk idle.
>
> Is this recoverable? I could leave it for a day or so if it may make
> progress, but if not I'd like to start on other options.

When umounted, would you please also try "btrfs check --readonly" to
see if there is anything wrong with the fs?

Thanks,
Qu

>
> On Fri, Aug 10, 2018 at 3:59 AM, Qu Wenruo wrote:
>>
>> On 8/10/18 3:40 PM, Dan Merillat wrote:
>>> Kernel 4.17.9, 11tb BTRFS device (md-backed, not btrfs raid)
>>>
>>> I was testing something out and enabled quota groups and started getting
>>> 2-5 minute long pauses where a btrfs-transaction thread spun at 100%.
>>
>> Looks pretty much like a running balance plus quota.
>>
>> Would you please try with balance temporarily disabled, using the
>> skip_balance mount option, to see if it works?
>>
>> If it works, then either try resuming the balance, or just cancel it.
>>
>> Nowadays balance is not needed routinely, especially when you still have
>> unallocated space and quota enabled.
>>
>> Thanks,
>> Qu
>>
>>>
>>> Post-reboot the mount process spins at 100% CPU, occasionally yielding
>>> to a btrfs-transaction thread at 100% CPU.  The switchover is marked
>>> by a burst of disk activity in btrace.
>>>
>>> Btrace shows all disk activity is returning promptly - no hanging submits.
>>>
>>> Currently the mount is at 6+ hours.
>>>
>>> Suggestions on how to go about debugging this?
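For anyone reproducing this, the requested output can be captured to files while the filesystem is unmounted; a sketch only, with arbitrary output file names and the device taken from this thread:

  btrfs inspect-internal dump-super -fFa /dev/bcache0 > dump-super.txt
  btrfs inspect-internal dump-tree -t root /dev/bcache0 > dump-tree-root.txt
  btrfs check --readonly /dev/bcache0 2>&1 | tee check.log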
Re: Mount stalls indefinitely after enabling quota groups.
Unfortunately that doesn't appear to be it, a forced restart and
attempted to mount with skip_balance leads to the same thing.

20 minutes in, btrfs-transaction had a large burst of reads then started
spinning the CPU with the disk idle.

Is this recoverable?  I could leave it for a day or so if it may make
progress, but if not I'd like to start on other options.

On Fri, Aug 10, 2018 at 3:59 AM, Qu Wenruo wrote:
>
> On 8/10/18 3:40 PM, Dan Merillat wrote:
>> Kernel 4.17.9, 11tb BTRFS device (md-backed, not btrfs raid)
>>
>> I was testing something out and enabled quota groups and started getting
>> 2-5 minute long pauses where a btrfs-transaction thread spun at 100%.
>
> Looks pretty much like a running balance plus quota.
>
> Would you please try with balance temporarily disabled, using the
> skip_balance mount option, to see if it works?
>
> If it works, then either try resuming the balance, or just cancel it.
>
> Nowadays balance is not needed routinely, especially when you still have
> unallocated space and quota enabled.
>
> Thanks,
> Qu
>
>> Post-reboot the mount process spins at 100% CPU, occasionally yielding
>> to a btrfs-transaction thread at 100% CPU.  The switchover is marked
>> by a burst of disk activity in btrace.
>>
>> Btrace shows all disk activity is returning promptly - no hanging submits.
>>
>> Currently the mount is at 6+ hours.
>>
>> Suggestions on how to go about debugging this?
Re: Mount stalls indefinitely after enabling quota groups.
On 8/10/18 3:40 PM, Dan Merillat wrote:
> Kernel 4.17.9, 11tb BTRFS device (md-backed, not btrfs raid)
>
> I was testing something out and enabled quota groups and started getting
> 2-5 minute long pauses where a btrfs-transaction thread spun at 100%.

Looks pretty much like a running balance plus quota.

Would you please try with balance temporarily disabled, using the
skip_balance mount option, to see if it works?

If it works, then either try resuming the balance, or just cancel it.

Nowadays balance is not needed routinely, especially when you still have
unallocated space and quota enabled.

Thanks,
Qu

> Post-reboot the mount process spins at 100% CPU, occasionally yielding
> to a btrfs-transaction thread at 100% CPU.  The switchover is marked
> by a burst of disk activity in btrace.
>
> Btrace shows all disk activity is returning promptly - no hanging submits.
>
> Currently the mount is at 6+ hours.
>
> Suggestions on how to go about debugging this?
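A minimal sketch of the sequence being suggested here; the mount point is a placeholder rather than a path from this thread, and which of the last two commands applies depends on whether the interrupted balance is still wanted:

  # mount with balance resume disabled
  mount -o skip_balance /dev/bcache0 /mnt/media

  # later: either resume the paused balance...
  btrfs balance resume /mnt/media
  # ...or cancel it outright
  btrfs balance cancel /mnt/media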
Re: Mount stalls indefinitely after enabling quota groups.
[23084.426006] sysrq: SysRq : Show Blocked State
[23084.426085]   task                        PC stack   pid father
[23084.426332] mount           D    0  4857   4618 0x0080
[23084.426403] Call Trace:
[23084.426531]  ? __schedule+0x2c3/0x830
[23084.426628]  ? __wake_up_common+0x6f/0x120
[23084.426751]  schedule+0x2d/0x90
[23084.426871]  wait_current_trans+0x98/0xc0
[23084.426953]  ? wait_woken+0x80/0x80
[23084.427058]  start_transaction+0x2e9/0x3e0
[23084.427128]  btrfs_drop_snapshot+0x48c/0x860
[23084.427220]  merge_reloc_roots+0xca/0x210
[23084.427277]  btrfs_recover_relocation+0x290/0x420
[23084.427399]  ? btrfs_cleanup_fs_roots+0x174/0x190
[23084.427533]  open_ctree+0x2158/0x2549
[23084.427592]  ? bdi_register_va.part.2+0x10a/0x1a0
[23084.427652]  btrfs_mount_root+0x678/0x730
[23084.427709]  ? pcpu_next_unpop+0x32/0x40
[23084.427797]  ? pcpu_alloc+0x2f6/0x680
[23084.427884]  ? mount_fs+0x30/0x150
[23084.427939]  ? btrfs_decode_error+0x20/0x20
[23084.427996]  mount_fs+0x30/0x150
[23084.428054]  vfs_kern_mount.part.7+0x4f/0xf0
[23084.428111]  btrfs_mount+0x156/0x8ad
[23084.428167]  ? pcpu_block_update_hint_alloc+0x15e/0x1d0
[23084.428226]  ? pcpu_next_unpop+0x32/0x40
[23084.428282]  ? pcpu_alloc+0x2f6/0x680
[23084.428338]  ? mount_fs+0x30/0x150
[23084.428393]  mount_fs+0x30/0x150
[23084.428450]  vfs_kern_mount.part.7+0x4f/0xf0
[23084.428507]  do_mount+0x5b0/0xc60
[23084.428563]  ksys_mount+0x7b/0xd0
[23084.428618]  __x64_sys_mount+0x1c/0x20
[23084.428676]  do_syscall_64+0x55/0x110
[23084.428734]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[23084.428794] RIP: 0033:0x7efeb90daa1a
[23084.428849] RSP: 002b:7ffcc8b8fee8 EFLAGS: 0206 ORIG_RAX: 00a5
[23084.428925] RAX: ffda RBX: 55d5bef05420 RCX: 7efeb90daa1a
[23084.428987] RDX: 55d5bef05600 RSI: 55d5bef05ab0 RDI: 55d5bef05b70
[23084.429048] RBP: R08: 55d5bef08e40 R09: 003f
[23084.429109] R10: c0ed R11: 0206 R12: 55d5bef05b70
[23084.429170] R13: 55d5bef05600 R14: R15:

On Fri, Aug 10, 2018 at 3:40 AM, Dan Merillat wrote:
> Kernel 4.17.9, 11tb BTRFS device (md-backed, not btrfs raid)
>
> I was testing something out and enabled quota groups and started getting
> 2-5 minute long pauses where a btrfs-transaction thread spun at 100%.
>
> Post-reboot the mount process spins at 100% CPU, occasionally yielding
> to a btrfs-transaction thread at 100% CPU.  The switchover is marked
> by a burst of disk activity in btrace.
>
> Btrace shows all disk activity is returning promptly - no hanging submits.
>
> Currently the mount is at 6+ hours.
>
> Suggestions on how to go about debugging this?
Mount stalls indefinitely after enabling quota groups.
Kernel 4.17.9, 11tb BTRFS device (md-backed, not btrfs raid)

I was testing something out and enabled quota groups and started getting
2-5 minute long pauses where a btrfs-transaction thread spun at 100%.

Post-reboot the mount process spins at 100% CPU, occasionally yielding
to a btrfs-transaction thread at 100% CPU.  The switchover is marked
by a burst of disk activity in btrace.

Btrace shows all disk activity is returning promptly - no hanging submits.

Currently the mount is at 6+ hours.

Suggestions on how to go about debugging this?
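One low-impact way to see where mount and btrfs-transaction are actually stuck, and roughly how the blocked-state dump and stack traces further up this thread were gathered; the exact commands are a sketch (sysrq must be enabled, and reading /proc/<pid>/stack needs root):

  # dump all blocked (D-state) tasks to the kernel log
  echo w > /proc/sysrq-trigger
  dmesg | tail -n 100

  # or grab the kernel stack of the spinning mount process directly
  cat /proc/$(pgrep -x mount)/stack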