Re: Mount stalls indefinitely after enabling quota groups.
On Sat, Aug 11, 2018 at 9:36 PM Qu Wenruo wrote:
> > I'll add a new rescue subcommand, 'btrfs rescue disable-quota' for you
> > to disable quota offline.
>
> Patch set (from my work mailbox), titled "[PATCH] btrfs-progs: rescue:
> Add ability to disable quota offline".
> Can also be fetched from github:
> https://github.com/adam900710/btrfs-progs/tree/quota_disable
>
> Usage is:
> # btrfs rescue disable-quota
>
> Tested locally, it would just toggle the ON/OFF flag for quota, so the
> modification should be minimal.

Noticed one thing while testing this, but it's not related to the patch so I'll keep it here. I still had the ,ro mounts in fstab, and while it mounted ro quickly, *unmounting* the filesystem, even readonly, got hung up:

Aug 11 23:47:27 fileserver kernel: [  484.314725] INFO: task umount:5422 blocked for more than 120 seconds.
Aug 11 23:47:27 fileserver kernel: [  484.314787]       Not tainted 4.17.14-dirty #3
Aug 11 23:47:27 fileserver kernel: [  484.314892] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 11 23:47:27 fileserver kernel: [  484.315006] umount          D    0  5422   4656 0x0080
Aug 11 23:47:27 fileserver kernel: [  484.315122] Call Trace:
Aug 11 23:47:27 fileserver kernel: [  484.315176]  ? __schedule+0x2c0/0x820
Aug 11 23:47:27 fileserver kernel: [  484.315270]  ? kmem_cache_alloc+0x167/0x1b0
Aug 11 23:47:27 fileserver kernel: [  484.315358]  schedule+0x3c/0x90
Aug 11 23:47:27 fileserver kernel: [  484.315493]  schedule_timeout+0x1e4/0x430
Aug 11 23:47:27 fileserver kernel: [  484.315542]  ? kmem_cache_alloc+0x167/0x1b0
Aug 11 23:47:27 fileserver kernel: [  484.315686]  wait_for_common+0xb1/0x170
Aug 11 23:47:27 fileserver kernel: [  484.315798]  ? wake_up_q+0x70/0x70
Aug 11 23:47:27 fileserver kernel: [  484.315911]  btrfs_qgroup_wait_for_completion+0x5f/0x80
Aug 11 23:47:27 fileserver kernel: [  484.316031]  close_ctree+0x27/0x2d0
Aug 11 23:47:27 fileserver kernel: [  484.316138]  generic_shutdown_super+0x69/0x110
Aug 11 23:47:27 fileserver kernel: [  484.316252]  kill_anon_super+0xe/0x20
Aug 11 23:47:27 fileserver kernel: [  484.316301]  btrfs_kill_super+0x13/0x100
Aug 11 23:47:27 fileserver kernel: [  484.316349]  deactivate_locked_super+0x39/0x70
Aug 11 23:47:27 fileserver kernel: [  484.316399]  cleanup_mnt+0x3b/0x70
Aug 11 23:47:27 fileserver kernel: [  484.316459]  task_work_run+0x89/0xb0
Aug 11 23:47:27 fileserver kernel: [  484.316519]  exit_to_usermode_loop+0x8c/0x90
Aug 11 23:47:27 fileserver kernel: [  484.316579]  do_syscall_64+0xf1/0x110
Aug 11 23:47:27 fileserver kernel: [  484.316639]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

Is it trying to write changes to a ro mount, or is it doing a bunch of work that it's just going to throw away?

I ended up using sysrq-b after commenting out the entries in fstab. Everything seems fine with the filesystem now. I appreciate all the help!
Re: [PATCH] btrfs-progs: rescue: Add ability to disable quota offline
On Sat, Aug 11, 2018 at 9:34 PM Qu Wenruo wrote:
> Provide an offline tool to disable quota.
>
> For kernel which skip_balance doesn't work, there is no way to disable
> quota on huge fs with balance, as quota will cause balance to hang for a
> long long time for each tree block switch.
>
> So add an offline rescue tool to disable quota.
>
> Reported-by: Dan Merillat
> Signed-off-by: Qu Wenruo

That fixed it, thanks.

Tested-By: Dan Merillat
Re: Mount stalls indefinitely after enabling quota groups.
On Sat, Aug 11, 2018 at 8:30 PM Qu Wenruo wrote:
> It looks pretty like qgroup, but too many noise.
> The pin point trace event would btrfs_find_all_roots().

I had this half-written when you replied. Agreed: it looks like the bulk of the time is spent in qgroups. Spent some time with sysrq-l and ftrace:

 ? __rcu_read_unlock+0x5/0x50
 ? return_to_handler+0x15/0x36
 __rcu_read_unlock+0x5/0x50
 find_extent_buffer+0x47/0x90 extent_io.c:4888
 read_block_for_search.isra.12+0xc8/0x350 ctree.c:2399
 btrfs_search_slot+0x3e7/0x9c0 ctree.c:2837
 btrfs_next_old_leaf+0x1dc/0x410 ctree.c:5702
 btrfs_next_old_item ctree.h:2952
 add_all_parents backref.c:487
 resolve_indirect_refs+0x3f7/0x7e0 backref.c:575
 find_parent_nodes+0x42d/0x1290 backref.c:1236
 ? find_parent_nodes+0x5/0x1290 backref.c:1114
 btrfs_find_all_roots_safe+0x98/0x100 backref.c:1414
 btrfs_find_all_roots+0x52/0x70 backref.c:1442
 btrfs_qgroup_trace_extent_post+0x27/0x60 qgroup.c:1503
 btrfs_qgroup_trace_leaf_items+0x104/0x130 qgroup.c:1589
 btrfs_qgroup_trace_subtree+0x26a/0x3a0 qgroup.c:1750
 do_walk_down+0x33c/0x5a0 extent-tree.c:8883
 walk_down_tree+0xa8/0xd0 extent-tree.c:9041
 btrfs_drop_snapshot+0x370/0x8b0 extent-tree.c:9203
 merge_reloc_roots+0xcf/0x220
 btrfs_recover_relocation+0x26d/0x400
 ? btrfs_cleanup_fs_roots+0x16a/0x180
 btrfs_remount+0x32e/0x510
 do_remount_sb+0x67/0x1e0
 do_mount+0x712/0xc90

The mount is looping in btrfs_qgroup_trace_subtree, as evidenced by the following ftrace filter:

fileserver:/sys/kernel/tracing# cat set_ftrace_filter
btrfs_qgroup_trace_extent
btrfs_qgroup_trace_subtree

# cat trace
...
mount-6803 [003] 80407.649752: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_subtree
mount-6803 [003] 80407.649772: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_leaf_items
mount-6803 [003] 80407.649797: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_leaf_items
mount-6803 [003] 80407.649821: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_leaf_items
mount-6803 [003] 80407.649846: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_leaf_items
mount-6803 [003] 80407.701652: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_leaf_items
mount-6803 [003] 80407.754547: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_leaf_items
mount-6803 [003] 80407.754574: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_leaf_items
mount-6803 [003] 80407.754598: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_leaf_items
mount-6803 [003] 80407.754622: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_leaf_items
mount-6803 [003] 80407.754646: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_leaf_items
... repeats 240 times
mount-6803 [002] 80412.568804: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_leaf_items
mount-6803 [002] 80412.568825: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_leaf_items
mount-6803 [002] 80412.568850: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_subtree
mount-6803 [002] 80412.568872: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_leaf_items

Looks like invocations of btrfs_qgroup_trace_subtree are taking forever:

mount-6803 [006] 80641.627709: btrfs_qgroup_trace_subtree <-do_walk_down
mount-6803 [003] 81433.760945: btrfs_qgroup_trace_subtree <-do_walk_down

(add do_walk_down to the trace here)

mount-6803 [001] 82124.623557: do_walk_down <-walk_down_tree
mount-6803 [001] 82124.623567: btrfs_qgroup_trace_subtree <-do_walk_down
mount-6803 [006] 82695.241306: do_walk_down <-walk_down_tree
mount-6803 [006] 82695.241316: btrfs_qgroup_trace_subtree <-do_walk_down

So 10-13 minutes per cycle.

> 11T, with highly deduped usage is really the worst scenario case for qgroup.
> Qgroup is not really good at handle hight reflinked files, nor balance.
> When they combines, it goes worse.

I'm not really understanding the use-case of qgroup if it melts down on large systems with a shared base + individual changes.

> I'll add a new rescue subcommand, 'btrfs rescue disable-quota' for you
> to disable quota offline.

Ok. I was looking at just doing this to speed things up:

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 51b5e2da708c..c5bf937b79f0 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -8877,7 +8877,7 @@ static noinline int do_walk_down(struct btrfs_trans_handle *trans,
 			parent = 0;
 		}

-		if (need_account) {
+		if (0) {
 			ret
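For reference, the ftrace filtering used earlier in this message amounts to roughly the following (a sketch; run as root, and on older setups tracefs may be mounted at /sys/kernel/debug/tracing instead):

```shell
cd /sys/kernel/tracing

# Limit the function tracer to just the qgroup functions of interest.
# The names must exist in available_filter_functions on the running kernel.
echo btrfs_qgroup_trace_extent   > set_ftrace_filter
echo btrfs_qgroup_trace_subtree >> set_ftrace_filter
echo do_walk_down               >> set_ftrace_filter

echo function > current_tracer
echo 1 > tracing_on

# Stream matching calls as they happen; the gaps between timestamps
# show how long each btrfs_qgroup_trace_subtree invocation takes.
cat trace_pipe
```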
Re: Mount stalls indefinitely after enabling quota groups.
19 hours later, it's still going extremely slowly, and each bit of progress is taking longer and longer. The main symptom is the mount process spinning at 100% CPU, interspersed with btrfs-transaction spinning at 100% CPU. So far it's racked up 14h45m of CPU time on mount and an additional 3h40m on btrfs-transaction.

The current drop key changes every 10-15 minutes when I check it via inspect-internal, so some progress is slowly being made.

I built the kernel with ftrace to see what's going on internally; this is the pattern I'm seeing:

mount-6803 [002] ...1 69023.970964: btrfs_next_old_leaf <-resolve_indirect_refs
mount-6803 [002] ...1 69023.970965: btrfs_release_path <-btrfs_next_old_leaf
mount-6803 [002] ...1 69023.970965: btrfs_search_slot <-btrfs_next_old_leaf
mount-6803 [002] ...1 69023.970966: btrfs_clear_path_blocking <-btrfs_search_slot
mount-6803 [002] ...1 69023.970966: btrfs_set_path_blocking <-btrfs_clear_path_blocking
mount-6803 [002] ...1 69023.970967: btrfs_bin_search <-btrfs_search_slot
mount-6803 [002] ...1 69023.970967: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
mount-6803 [002] ...1 69023.970967: btrfs_get_token_64 <-read_block_for_search.isra.12
mount-6803 [002] ...1 69023.970968: btrfs_get_token_64 <-read_block_for_search.isra.12
mount-6803 [002] ...1 69023.970968: btrfs_node_key <-read_block_for_search.isra.12
mount-6803 [002] ...1 69023.970969: btrfs_buffer_uptodate <-read_block_for_search.isra.12
mount-6803 [002] ...1 69023.970969: btrfs_clear_path_blocking <-btrfs_search_slot
mount-6803 [002] ...1 69023.970970: btrfs_set_path_blocking <-btrfs_clear_path_blocking
mount-6803 [002] ...1 69023.970970: btrfs_bin_search <-btrfs_search_slot
mount-6803 [002] ...1 69023.970970: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
mount-6803 [002] ...1 69023.970971: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
mount-6803 [002] ...1 69023.970971: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
mount-6803 [002] ...1 69023.970972: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
mount-6803 [002] ...1 69023.970972: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
mount-6803 [002] ...1 69023.970973: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
mount-6803 [002] ...1 69023.970973: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
mount-6803 [002] ...1 69023.970973: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
mount-6803 [002] ...1 69023.970974: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
mount-6803 [002] ...1 69023.970974: btrfs_get_token_64 <-read_block_for_search.isra.12
mount-6803 [002] ...1 69023.970975: btrfs_get_token_64 <-read_block_for_search.isra.12
mount-6803 [002] ...1 69023.970975: btrfs_node_key <-read_block_for_search.isra.12
mount-6803 [002] ...1 69023.970976: btrfs_buffer_uptodate <-read_block_for_search.isra.12
mount-6803 [002] ...1 69023.970976: btrfs_clear_path_blocking <-btrfs_search_slot
mount-6803 [002] ...1 69023.970976: btrfs_set_path_blocking <-btrfs_clear_path_blocking
mount-6803 [002] ...1 69023.970977: btrfs_bin_search <-btrfs_search_slot
mount-6803 [002] ...1 69023.970977: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
mount-6803 [002] ...1 69023.970978: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
mount-6803 [002] ...1 69023.970978: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
mount-6803 [002] ...1 69023.970978: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
mount-6803 [002] ...1 69023.970979: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
mount-6803 [002] ...1 69023.970979: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
mount-6803 [002] ...1 69023.970980: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
mount-6803 [002] ...1 69023.970980: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
mount-6803 [002] ...1 69023.970980: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
mount-6803 [002] ...1 69023.970981: btrfs_get_token_64 <-read_block_for_search.isra.12
mount-6803 [002] ...1 69023.970981: btrfs_get_token_64 <-read_block_for_search.isra.12
mount-6803 [002] ...1 69023.970982: btrfs_node_key <-read_block_for_search.isra.12
mount-6803 [002] ...1 69023.970982: btrfs_buffer_uptodate <-read_block_for_search.isra.12
mount-6803 [002] ...1 69023.970983: btrfs_clear_path_blocking <-btrfs_search_slot
mount-6803 [002] ...1 69023.970983: btrfs_set_path_blocking <-btrfs_clear_path_blocking
mount-6803 [002] ...1
Re: Mount stalls indefinitely after enabling quota groups.
On Fri, Aug 10, 2018 at 6:51 AM, Qu Wenruo wrote:
>
> On 8/10/18 6:42 PM, Dan Merillat wrote:
>> On Fri, Aug 10, 2018 at 6:05 AM, Qu Wenruo wrote:
>
> But considering your amount of block groups, mount itself may take some
> time (before trying to resume balance).

I'd believe it; a clean mount normally took 2-3 minutes.

btrfs check eventually ran out of RAM, so I killed it and went back to trying to mount. Readonly mounted pretty quickly, so I'm just letting -o remount,rw spin for however long it needs. Readonly access is fine over the weekend, and hopefully it will be done by Monday.

To be clear, what exactly am I watching with dump-tree to monitor forward progress?

Thanks again for the help!
Re: Mount stalls indefinitely after enabling quota groups.
On Fri, Aug 10, 2018 at 6:05 AM, Qu Wenruo wrote:
> Although not sure about the details, but the fs looks pretty huge.
> Tons of subvolume and its free space cache inodes.

11TB, 3 or so subvolumes and two snapshots, I think. Not particularly large for a NAS.

> But only 3 tree reloc trees, unless you have tons of reflinked files
> (off-line deduped), it shouldn't cause a lot of problem.

There's going to be a ton of reflinked files, both from cp --reflink and via the wholefile dedup. I freed up ~1/2 TB last month doing dedup.

> At least, we have some progress dropping tree reloc tree for subvolume 6482.

Is there a way to get an idea of how much work is left to be done on the reloc tree? Can I walk it with btrfs-inspect? dump-tree -t TREE_RELOC is quite enormous (13+ million lines before I gave up).

> If you check the dump-tree output for the following data, the "drop key"
> should change during mount: (inspect dump-tree can be run mounted)
>
>         item 175 key (TREE_RELOC ROOT_ITEM 6482) itemoff 8271 itemsize 439
>                 drop key (2769795 EXTENT_DATA 12665933824) level 2
>                           ^
>
> So for the worst case scenario, there is some way to determine whether
> it's processing.

I'll keep an eye on that.

> And according to the level (3), which is not small for each subvolume, I
> doubt that's the reason why it's so slow.
>
> BTW, for last skip_balance mount, is there any kernel message like
> "balance: resume skipped"?

No, the only reference to balance in kern.log is a hung btrfs_cancel_balance from the first reboot.

> Have you tried mount the fs readonly with skip_balance? And then remount
> rw, still with skip_balance?

No, every operation takes a long time. It's still running btrfs check, although I'm going to cancel that and try mount -o ro,skip_balance before I go to sleep and see where it is tomorrow.

Thank you for taking the time to help me with this.
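The monitoring being suggested above can be sketched as a simple polling loop (the device path, tree id 1 for the root tree, and the ten-minute interval are assumptions; adjust for your setup):

```shell
# Print every relocation drop key in the root tree once every 10 minutes.
# If the keys change between runs, the drop is still making progress.
# dump-tree can be run against a mounted filesystem.
while true; do
    date
    btrfs inspect-internal dump-tree -t 1 /dev/bcache0 | grep 'drop key'
    sleep 600
done
```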
Re: Mount stalls indefinitely after enabling quota groups.
E: Resending without the 500k attachment.

On Fri, Aug 10, 2018 at 5:13 AM, Qu Wenruo wrote:
>
> On 8/10/18 4:47 PM, Dan Merillat wrote:
>> Unfortunately that doesn't appear to be it, a forced restart and
>> attempted to mount with skip_balance leads to the same thing.
>
> That's strange.
>
> Would you please provide the following output to determine whether we
> have any balance running?
>
> # btrfs inspect dump-super -fFa

superblock: bytenr=65536, device=/dev/bcache0
---------------------------------------------------------
csum_type               0 (crc32c)
csum_size               4
csum                    0xaeff2ec3 [match]
bytenr                  65536
flags                   0x1
                        ( WRITTEN )
magic                   _BHRfS_M [match]
fsid                    16adc029-64c5-45ff-8114-e2f5b2f2d331
label                   MEDIA
generation              4584957
root                    33947648
sys_array_size          129
chunk_root_generation   4534813
root_level              1
chunk_root              13681127653376
chunk_root_level        1
log_root                0
log_root_transid        0
log_root_level          0
total_bytes             12001954226176
bytes_used              11387838865408
sectorsize              4096
nodesize                16384
leafsize (deprecated)   16384
stripesize              4096
root_dir                6
num_devices             1
compat_flags            0x0
compat_ro_flags         0x0
incompat_flags          0x169
                        ( MIXED_BACKREF | COMPRESS_LZO | BIG_METADATA | EXTENDED_IREF | SKINNY_METADATA )
cache_generation        4584957
uuid_tree_generation    4584925
dev_item.uuid           ec51cc1f-992a-47a2-b7b2-83af026723fd
dev_item.fsid           16adc029-64c5-45ff-8114-e2f5b2f2d331 [match]
dev_item.type           0
dev_item.total_bytes    12001954226176
dev_item.bytes_used     11613258579968
dev_item.io_align       4096
dev_item.io_width       4096
dev_item.sector_size    4096
dev_item.devid          1
dev_item.dev_group      0
dev_item.seek_speed     0
dev_item.bandwidth      0
dev_item.generation     0
sys_chunk_array[2048]:
        item 0 key (FIRST_CHUNK_TREE CHUNK_ITEM 13681127456768)
                length 33554432 owner 2 stripe_len 65536 type SYSTEM|DUP
                io_align 65536 io_width 65536 sector_size 4096
                num_stripes 2 sub_stripes 1
                        stripe 0 devid 1 offset 353298808832
                        dev_uuid ec51cc1f-992a-47a2-b7b2-83af026723fd
                        stripe 1 devid 1 offset 353332363264
                        dev_uuid ec51cc1f-992a-47a2-b7b2-83af026723fd
backup_roots[4]:
        backup 0:
                backup_tree_root:       3666753175552   gen: 4584956    level: 1
                backup_chunk_root:      13681127653376  gen: 4534813    level: 1
                backup_extent_root:     3666740674560   gen: 4584956    level: 2
                backup_fs_root:         0       gen: 0  level: 0
                backup_dev_root:        199376896       gen: 4584935    level: 1
                backup_csum_root:       3666753568768   gen: 4584956    level: 3
                backup_total_bytes:     12001954226176
                backup_bytes_used:      11387838865408
                backup_num_devices:     1

        backup 1:
                backup_tree_root:       33947648        gen: 4584957    level: 1
                backup_chunk_root:      13681127653376  gen: 4534813    level: 1
                backup_extent_root:     33980416        gen: 4584957    level: 2
                backup_fs_root:         0       gen: 0  level: 0
                backup_dev_root:        34160640        gen: 4584957    level: 1
                backup_csum_root:       34357248        gen: 4584957    level: 3
                backup_total_bytes:     12001954226176
                backup_bytes_used:      11387838865408
                backup_num_devices:     1

        backup 2:
                backup_tree_root:       3666598461440   gen: 4584954    level: 1
                backup_chunk_root:      13681127653376  gen: 4534813    level: 1
                backup_extent_root:     3666595233792   gen: 4584954    level: 2
                backup_fs_root:         0       gen: 0  level: 0
                backup_dev_root:        199376896       gen: 4584935    level: 1
                backup_csum_root:       300034304       gen: 4584954    level: 3
                backup_total_bytes:     12001954226176
                backup_bytes_used:      11387838898176
                backup_num_devices:     1

        backup 3:
                backup_tree_root:       390998272       gen: 4584955    level: 1
                backup_chunk_root:      13681127653376  gen: 4534813    level: 1
                backup_extent_root:     390293760       gen: 4584955    level: 2
                backup_fs_root:         0       gen: 0  level: 0
                backup_dev_root:        199376896       gen: 4584935    level: 1
                backup_csum_root:       391604480       gen: 4584955    level: 3
                backup_total_bytes:     12001954226176
                backup_bytes_used:      11387838881792
                backup_num_devices:     1

superblock: bytenr=67108864, device=/dev/bcache0
---------------------------------------------------------
csum_type               0 (crc32c)
csum_size               4
csum                    0x0e9e060d [match]
bytenr                  67108864
flags                   0x1
                        ( WRITTEN )
magic                   _BHRfS_M [match]
fsid                    16adc029-64c5-45f
Re: Mount stalls indefinitely after enabling quota groups.
Unfortunately that doesn't appear to be it; a forced restart and an attempted mount with skip_balance leads to the same thing. 20 minutes in, btrfs-transaction had a large burst of reads, then started spinning the CPU with the disk idle.

Is this recoverable? I could leave it for a day or so if it may make progress, but if not I'd like to start on other options.

On Fri, Aug 10, 2018 at 3:59 AM, Qu Wenruo wrote:
>
> On 8/10/18 3:40 PM, Dan Merillat wrote:
>> Kernel 4.17.9, 11tb BTRFS device (md-backed, not btrfs raid)
>>
>> I was testing something out and enabled quota groups and started getting
>> 2-5 minute long pauses where a btrfs-transaction thread spun at 100%.
>
> Looks pretty like a running balance and quota.
>
> Would you please try with balance disabled (temporarily) with
> skip_balance mount option to see if it works.
>
> If it works, then either try resume balance, or just cancel the balance.
>
> Nowadays balance is not needed routinely, especially when you still have
> unallocated space and enabled quota.
>
> Thanks,
> Qu
>
>> Post-reboot the mount process spins at 100% CPU, occasionally yielding
>> to a btrfs-transaction thread at 100% CPU. The switchover is marked
>> by a burst of disk activity in btrace.
>>
>> Btrace shows all disk activity is returning promptly - no hanging submits.
>>
>> Currently the mount is at 6+ hours.
>>
>> Suggestions on how to go about debugging this?
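The mount sequence being attempted here looks roughly like the following (device and mountpoint are example placeholders):

```shell
# Mount read-only with balance resume disabled, then flip to read-write
# while still skipping the balance resume.
mount -o ro,skip_balance /dev/bcache0 /mnt/media
mount -o remount,rw,skip_balance /mnt/media
```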
Re: Mount stalls indefinitely after enabling quota groups.
[23084.426006] sysrq: SysRq : Show Blocked State
[23084.426085]   task                        PC stack   pid father
[23084.426332] mount           D    0  4857   4618 0x0080
[23084.426403] Call Trace:
[23084.426531]  ? __schedule+0x2c3/0x830
[23084.426628]  ? __wake_up_common+0x6f/0x120
[23084.426751]  schedule+0x2d/0x90
[23084.426871]  wait_current_trans+0x98/0xc0
[23084.426953]  ? wait_woken+0x80/0x80
[23084.427058]  start_transaction+0x2e9/0x3e0
[23084.427128]  btrfs_drop_snapshot+0x48c/0x860
[23084.427220]  merge_reloc_roots+0xca/0x210
[23084.427277]  btrfs_recover_relocation+0x290/0x420
[23084.427399]  ? btrfs_cleanup_fs_roots+0x174/0x190
[23084.427533]  open_ctree+0x2158/0x2549
[23084.427592]  ? bdi_register_va.part.2+0x10a/0x1a0
[23084.427652]  btrfs_mount_root+0x678/0x730
[23084.427709]  ? pcpu_next_unpop+0x32/0x40
[23084.427797]  ? pcpu_alloc+0x2f6/0x680
[23084.427884]  ? mount_fs+0x30/0x150
[23084.427939]  ? btrfs_decode_error+0x20/0x20
[23084.427996]  mount_fs+0x30/0x150
[23084.428054]  vfs_kern_mount.part.7+0x4f/0xf0
[23084.428111]  btrfs_mount+0x156/0x8ad
[23084.428167]  ? pcpu_block_update_hint_alloc+0x15e/0x1d0
[23084.428226]  ? pcpu_next_unpop+0x32/0x40
[23084.428282]  ? pcpu_alloc+0x2f6/0x680
[23084.428338]  ? mount_fs+0x30/0x150
[23084.428393]  mount_fs+0x30/0x150
[23084.428450]  vfs_kern_mount.part.7+0x4f/0xf0
[23084.428507]  do_mount+0x5b0/0xc60
[23084.428563]  ksys_mount+0x7b/0xd0
[23084.428618]  __x64_sys_mount+0x1c/0x20
[23084.428676]  do_syscall_64+0x55/0x110
[23084.428734]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[23084.428794] RIP: 0033:0x7efeb90daa1a
[23084.428849] RSP: 002b:7ffcc8b8fee8 EFLAGS: 0206 ORIG_RAX: 00a5
[23084.428925] RAX: ffda RBX: 55d5bef05420 RCX: 7efeb90daa1a
[23084.428987] RDX: 55d5bef05600 RSI: 55d5bef05ab0 RDI: 55d5bef05b70
[23084.429048] RBP: R08: 55d5bef08e40 R09: 003f
[23084.429109] R10: c0ed R11: 0206 R12: 55d5bef05b70
[23084.429170] R13: 55d5bef05600 R14: R15:

On Fri, Aug 10, 2018 at 3:40 AM, Dan Merillat wrote:
> Kernel 4.17.9, 11tb BTRFS device (md-backed, not btrfs raid)
>
> I was testing something out and enabled quota groups and started getting
> 2-5 minute long pauses where a btrfs-transaction thread spun at 100%.
>
> Post-reboot the mount process spins at 100% CPU, occasionally yielding
> to a btrfs-transaction thread at 100% CPU. The switchover is marked
> by a burst of disk activity in btrace.
>
> Btrace shows all disk activity is returning promptly - no hanging submits.
>
> Currently the mount is at 6+ hours.
>
> Suggestions on how to go about debugging this?
Mount stalls indefinitely after enabling quota groups.
Kernel 4.17.9, 11TB BTRFS device (md-backed, not btrfs raid)

I was testing something out and enabled quota groups, and started getting 2-5 minute long pauses where a btrfs-transaction thread spun at 100%.

Post-reboot, the mount process spins at 100% CPU, occasionally yielding to a btrfs-transaction thread at 100% CPU. The switchover is marked by a burst of disk activity in btrace.

Btrace shows all disk activity is returning promptly - no hanging submits.

Currently the mount is at 6+ hours.

Suggestions on how to go about debugging this?
Re: mount time for big filesystems
On Fri, Sep 1, 2017 at 11:20 AM, Austin S. Hemmelgarn wrote:
> No, that's not what I'm talking about. You always get one bcache device per
> backing device, but multiple bcache devices can use the same physical cache
> device (that is, backing devices map 1:1 to bcache devices, but cache
> devices can map 1:N to bcache devices). So, in other words, the layout I'm
> suggesting looks like this:
>
> This is actually simpler to manage for multiple reasons, and will avoid
> wasting space on the cache device because of random choices made by BTRFS
> when deciding where to read data.

Be careful with bcache - if you lose the SSD while it has dirty data on it, your entire FS is gone. I ended up contributing a number of patches to the recovery tools while digging my array out from that.

Even if only a single file is dirty, the new metadata tree will exist only on the cache device, which doesn't honor barriers when writing back to the underlying storage. That means the backing device is likely to have a root pointing at a metadata tree that's no longer there. The recovery method is finding an older root that has a complete tree, and recovery-walking the entire FS from that.

I don't know if dm-cache honors write barriers from the cache to the backing storage, but I would still recommend using them both in write-through mode, not write-back.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
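For concreteness, bcache exposes the cache mode per backing device in sysfs; switching to write-through looks like this (the bcache0 device name is an example):

```shell
# Show the current mode; the active one is shown in brackets, e.g.
#   writethrough [writeback] writearound none
cat /sys/block/bcache0/bcache/cache_mode

# Switch to writethrough so writes always reach the backing device
# before completing, keeping it consistent if the SSD dies.
echo writethrough > /sys/block/bcache0/bcache/cache_mode
```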
btrfs-transaction spins forever on -next-20160818
I tried out -next to test the mm fixes, and immediately upon mounting my array (11TB, 98% full at the time) the btrfs-transaction thread for it spun at 100% CPU. It acted like read-only, write-discarding media - deleted files reappeared after a reboot every time. I'm not sure about writes, since it's running the crashplan backup target service - that resynchronizes, but I don't know enough about it to see if it complained about writes vanishing.

I tried multiple reboots before going back to 4.6.7, where everything worked properly. 4.7 BTRFS works as well, but I was hitting the mm bug that OOMs improperly under high IO loads.

The topology is 4x4TB drives in md-raid5, bcache'd with a 256GB SSD, and btrfs on the bcache0 block device.

(apologies for the quote, only way to convince Thunderbird to not mangle log files)

> Aug 19 04:53:22 fileserver kernel: [  605.152050] INFO: task kworker/u4:1:22 blocked for more than 120 seconds.
> Aug 19 04:53:22 fileserver kernel: [  605.152097]       Not tainted 4.8.0-rc2-next-20160818 #15
> Aug 19 04:53:22 fileserver kernel: [  605.152138] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Aug 19 04:53:22 fileserver kernel: [  605.152175] kworker/u4:1    D 88022666bad8     022      2 0x
> Aug 19 04:53:22 fileserver kernel: [  605.152286] Workqueue: btrfs-submit btrfs_submit_helper
> Aug 19 04:53:22 fileserver kernel: [  605.152357]  88022666bad8 81a65800 880179694040 88022100
> Aug 19 04:53:22 fileserver kernel: [  605.152521]  88022666bae0 88022666c000 880220510ac0 88022666bb10
> Aug 19 04:53:22 fileserver kernel: [  605.152688]  880220510ad8 88022051 88022666baf0 81888bfa
> Aug 19 04:53:22 fileserver kernel: [  605.152851] Call Trace:
> Aug 19 04:53:22 fileserver kernel: [  605.152892]  [] schedule+0x3a/0x90
> Aug 19 04:53:22 fileserver kernel: [  605.152929]  [] rwsem_down_read_failed+0xe5/0x120
> Aug 19 04:53:22 fileserver kernel: [  605.152970]  [] call_rwsem_down_read_failed+0x18/0x30
> Aug 19 04:53:22 fileserver kernel: [  605.153006]  [] down_read+0x12/0x30
> Aug 19 04:53:22 fileserver kernel: [  605.153047]  [] cached_dev_make_request+0x65e/0xd90
> Aug 19 04:53:22 fileserver kernel: [  605.153083]  [] generic_make_request+0xdd/0x190
> Aug 19 04:53:22 fileserver kernel: [  605.153122]  [] submit_bio+0x75/0x140
> Aug 19 04:53:22 fileserver kernel: [  605.153159]  [] ? mempool_free+0x2d/0x90
> Aug 19 04:53:22 fileserver kernel: [  605.153202]  [] ? preempt_count_sub+0x51/0x80
> Aug 19 04:53:22 fileserver kernel: [  605.153240]  [] run_scheduled_bios+0x258/0x580
> Aug 19 04:53:22 fileserver kernel: [  605.153283]  [] ? end_bio_extent_readpage+0x202/0x5b0
> Aug 19 04:53:22 fileserver kernel: [  605.153322]  [] pending_bios_fn+0x10/0x20
> Aug 19 04:53:22 fileserver kernel: [  605.153363]  [] btrfs_scrubparity_helper+0x77/0x340
> Aug 19 04:53:22 fileserver kernel: [  605.153403]  [] btrfs_submit_helper+0x9/0x10
> Aug 19 04:53:22 fileserver kernel: [  605.153446]  [] process_one_work+0x1e0/0x480
> Aug 19 04:53:22 fileserver kernel: [  605.153484]  [] worker_thread+0x43/0x4e0
> Aug 19 04:53:22 fileserver kernel: [  605.153526]  [] ? process_one_work+0x480/0x480
> Aug 19 04:53:22 fileserver kernel: [  605.153564]  [] kthread+0xc4/0xe0
> Aug 19 04:53:22 fileserver kernel: [  605.153607]  [] ret_from_fork+0x1f/0x40
> Aug 19 04:53:22 fileserver kernel: [  605.153644]  [] ? kthread_worker_fn+0x110/0x110
> Aug 19 04:53:22 fileserver kernel: [  605.153701] INFO: task bcache_writebac:972 blocked for more than 120 seconds.
> Aug 19 04:53:22 fileserver kernel: [  605.153740]       Not tainted 4.8.0-rc2-next-20160818 #15
> Aug 19 04:53:22 fileserver kernel: [  605.153780] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Aug 19 04:53:22 fileserver kernel: [  605.153822] bcache_writebac D 880220bc3cd8     0   972      2 0x
> Aug 19 04:53:22 fileserver kernel: [  605.153929]  880220bc3cd8 880220bc3cb0 81083200 8802248642c0
> Aug 19 04:53:22 fileserver kernel: [  605.154100]  88022051 880220bc4000 880220510ac0 880220510ac0
> Aug 19 04:53:22 fileserver kernel: [  605.154274]  880220510ad8 0001 880220bc3cf0 81888bfa
> Aug 19 04:53:22 fileserver kernel: [  605.154443] Call Trace:
> Aug 19 04:53:22 fileserver kernel: [  605.154484]  [] ? finish_task_switch+0x180/0x1d0
> Aug 19 04:53:22 fileserver kernel: [  605.154522]  [] schedule+0x3a/0x90
> Aug 19 04:53:22 fileserver kernel: [  605.154563]  [] rwsem_down_write_failed+0x109/0x280
> Aug 19 04:53:22 fileserver kernel: [  605.154602]  [] call_rwsem_down_write_failed+0x17/0x30
> Aug 19 04:53:22 fileserver kernel: [  605.154644]  [] ? schedule+0x44/0x90
> Aug 19 04:53:22 fileserver kernel: [  605.154681]  []
WARNING during btrfs send, kernel 4.1-rc1
Send and receive from the same machine, from a read-only mount to a freshly formatted fs. Aside from the warning everything appears to be working correctly, but since this is the latest btrfs code it needed reporting. I'll probably have another opportunity to test this again, since I'm blowing up my array repeatedly trying to track down a bcache bug.

The line number translates to:

	/*
	 * This is done when we lookup the root, it should already be complete
	 * by the time we get here.
	 */
	WARN_ON(send_root->orphan_cleanup_state != ORPHAN_CLEANUP_DONE);

[  267.379126] [ cut here ]
[  267.379202] WARNING: CPU: 1 PID: 4423 at fs/btrfs/send.c:5699 btrfs_ioctl_send+0x9d/0xe47()
[  267.379297] Modules linked in: binfmt_misc tun nbd rpcsec_gss_krb5 sit ip6table_filter ip6_tables ipt_REJECT nf_reject_ipv4 xt_conntrack xt_multiport iptable_filter xt_length xt_mark iptable_mangle iptable_raw ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_nat xt_tcpudp iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables powernow_k8 pcspkr serio_raw k8temp i2c_piix4 i2c_core rtc_cmos wmi acpi_cpufreq netconsole it87 hwmon_vid ecryptfs ide_pci_generic firewire_ohci firewire_core sata_promise atiixp ide_core e100 pata_acpi ohci_pci sg ohci_hcd ehci_pci ehci_hcd
[  267.381752] CPU: 1 PID: 4423 Comm: btrfs Not tainted 4.1.0-rc1 #1
[  267.381813] Hardware name: Gigabyte Technology Co., Ltd. GA-MA78GPM-DS2H/GA-MA78GPM-DS2H, BIOS F1 06/03/2008
[  267.381880]  0009 8801f613fc18 8169d2be 8000
[  267.382112]  8801f613fc58 81045577 8800bbd9a000
[  267.382342]  81306f46 8800c51ee42c 7ffc4b080bc0 880224699100
[  267.382566] Call Trace:
[  267.382630]  [8169d2be] dump_stack+0x4f/0x7b
[  267.382686]  [81045577] warn_slowpath_common+0x9c/0xb6
[  267.382747]  [81306f46] ? btrfs_ioctl_send+0x9d/0xe47
[  267.382809]  [81045625] warn_slowpath_null+0x15/0x17
[  267.382869]  [81306f46] btrfs_ioctl_send+0x9d/0xe47
[  267.382931]  [816a3c10] ? _raw_spin_unlock_irq+0x17/0x29
[  267.382994]  [816a0d66] ? __schedule+0x6df/0x90e
[  267.383057]  [812d7691] btrfs_ioctl+0x18a/0x2436
[  267.383119]  [81068ccd] ? sched_move_task+0x185/0x194
[  267.383183]  [8137d28c] ? find_next_bit+0x15/0x1b
[  267.383244]  [8106b1ba] ? __enqueue_entity+0x67/0x69
[  267.383306]  [8106d4ab] ? enqueue_task_fair+0xc00/0xcda
[  267.383367]  [810697cc] ? sched_clock_cpu+0x67/0xbc
[  267.383431]  [81322201] ? avc_has_perm+0x96/0xf8
[  267.383495]  [81153d51] do_vfs_ioctl+0x372/0x420
[  267.383558]  [8115bc80] ? __fget+0x6b/0x76
[  267.383619]  [81153e54] SyS_ioctl+0x55/0x7a
[  267.383680]  [816a419b] system_call_fastpath+0x16/0x6e
[  267.383741] ---[ end trace 01d1110aa9307411 ]---
Re: [PATCH 2/3] btrfs-progs: separate the overwrite check.
It's a good question. If path_name is absolute, the file descriptor is ignored. I used -1 (EBADF) instead of AT_FDCWD there so if a non-absolute path gets in there it errors out instead of attempting to use a relative path off the current directory. I'm not entirely sure it's the best way, so if anyone else has ideas let me know.

On Fri, Apr 24, 2015 at 11:24 AM, David Sterba dste...@suse.cz wrote:

On Thu, Apr 23, 2015 at 12:51:33PM -0400, Dan Merillat wrote:
+/* returns:
+ * 0 if the file exists and should be skipped.
+ * 1 if the file does NOT exist
+ * 2 if the file exists but is OK to overwrite
+ */
+
+static int overwrite_ok(const char * path)
+{
+	static int warn = 0;
+	struct stat st;
+	int ret;
+
+	/* don't be fooled by symlinks */
+	ret = fstatat(-1, path_name, &st, AT_SYMLINK_NOFOLLOW);

Is the file descriptor -1 correct? Previously, stat was used, which uses AT_FDCWD for the dirfd, which is -100. -1 could be interpreted as a bad file descriptor (EBADF).
Re: [PATCH 0/3] btrfs-progs: restore symlinks
At this point I'm done - after writing the symlink patch I restored everything of importance off my array to a scratch disk, wiped the array and am in the process of copying everything back. I'll keep an eye on this thread if changes need to be made to my patches, but hopefully I won't be needing btrfs restore for a few more years!

On Fri, Apr 24, 2015 at 12:38 AM, Duncan 1i5t5.dun...@cox.net wrote:

Dan Merillat posted on Thu, 23 Apr 2015 12:47:29 -0400 as excerpted:

Hopefully this is sufficiently paranoid, tested with PATH_MAX length symlinks, existing files, insufficient permissions, dangling symlinks. I think I got the coding style correct this time, I'll fix and resend if not. Includes a trivial fix from my metadata patch, the documentation got lost in the merge.

Thanks for all this. I've only had to use restore once and hopefully won't be using it again in the near future, but having it restore the metadata and symlinks as well would surely have made the experience easier. There's a lot of people going to benefit from these patches over time as btrfs gains usage and the inevitable breakage happens to some of those filesystems. =:^/

-- Duncan - List replies preferred. No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman
[PATCH 2/3] btrfs-progs: separate the overwrite check.
Symlink restore needs this, but the cut & paste became too complicated. Simplify everything.

Signed-off-by: Dan Merillat dan.meril...@gmail.com
---
 cmds-restore.c | 53 ++---
 1 file changed, 34 insertions(+), 19 deletions(-)

diff --git a/cmds-restore.c b/cmds-restore.c
index e877548..8869f2a 100644
--- a/cmds-restore.c
+++ b/cmds-restore.c
@@ -781,6 +781,37 @@ out:
 	return ret;
 }
 
+/* returns:
+ * 0 if the file exists and should be skipped.
+ * 1 if the file does NOT exist
+ * 2 if the file exists but is OK to overwrite
+ */
+
+static int overwrite_ok(const char * path)
+{
+	static int warn = 0;
+	struct stat st;
+	int ret;
+
+	/* don't be fooled by symlinks */
+	ret = fstatat(-1, path_name, &st, AT_SYMLINK_NOFOLLOW);
+
+	if (!ret) {
+		if (overwrite)
+			return 2;
+
+		if (verbose || !warn)
+			printf("Skipping existing file "
+				"%s\n", path);
+		if (!warn)
+			printf("If you wish to overwrite use "
+				"the -o option to overwrite\n");
+		warn = 1;
+		return 0;
+	}
+	return 1;
+}
+
 static int search_dir(struct btrfs_root *root, struct btrfs_key *key,
 		const char *output_rootdir, const char *in_dir,
 		const regex_t *mreg)
@@ -897,25 +928,9 @@ static int search_dir(struct btrfs_root *root, struct btrfs_key *key,
 		 * files, no symlinks or anything else.
 		 */
 		if (type == BTRFS_FT_REG_FILE) {
-			if (!overwrite) {
-				static int warn = 0;
-				struct stat st;
-
-				ret = stat(path_name, &st);
-				if (!ret) {
-					loops = 0;
-					if (verbose || !warn)
-						printf("Skipping existing file "
-							"%s\n", path_name);
-					if (warn)
-						goto next;
-					printf("If you wish to overwrite use "
-						"the -o option to overwrite\n");
-					warn = 1;
-					goto next;
-				}
-				ret = 0;
-			}
+			if (!overwrite_ok(path_name))
+				goto next;
+
 			if (verbose)
 				printf("Restoring %s\n", path_name);
 			if (dry_run)
-- 
2.1.4
[PATCH 0/3] btrfs-progs: restore symlinks
Hopefully this is sufficiently paranoid, tested with PATH_MAX length symlinks, existing files, insufficient permissions, dangling symlinks. I think I got the coding style correct this time, I'll fix and resend if not. Includes a trivial fix from my metadata patch, the documentation got lost in the merge.
[PATCH 1/3] btrfs-progs: restore: document metadata restore.
This was lost in the cleanup of 71a559

Signed-off-by: Dan Merillat dan.meril...@gmail.com
---
 Documentation/btrfs-restore.asciidoc | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/Documentation/btrfs-restore.asciidoc b/Documentation/btrfs-restore.asciidoc
index 20fc366..89e0c87 100644
--- a/Documentation/btrfs-restore.asciidoc
+++ b/Documentation/btrfs-restore.asciidoc
@@ -29,6 +29,9 @@ get snapshots, btrfs restore skips snapshots in default.
 -x::
 get extended attributes.
 
+-m|--metadata::
+restore owner, mode and times.
+
 -v::
 verbose.
-- 
2.1.4
[PATCH 3/3] btrfs-progs: optionally restore symlinks.
Restore symlinks, optionally with owner/times.

Signed-off-by: Dan Merillat dan.meril...@gmail.com
---
 Documentation/btrfs-restore.asciidoc | 3 +
 cmds-restore.c | 140 ++-
 2 files changed, 140 insertions(+), 3 deletions(-)

diff --git a/Documentation/btrfs-restore.asciidoc b/Documentation/btrfs-restore.asciidoc
index 89e0c87..06a0498 100644
--- a/Documentation/btrfs-restore.asciidoc
+++ b/Documentation/btrfs-restore.asciidoc
@@ -32,6 +32,9 @@ get extended attributes.
 -m|--metadata::
 restore owner, mode and times.
 
+-S|--symlinks::
+restore symbolic links as well as normal files.
+
 -v::
 verbose.
 
diff --git a/cmds-restore.c b/cmds-restore.c
index 8869f2a..c7a3e96 100644
--- a/cmds-restore.c
+++ b/cmds-restore.c
@@ -45,9 +45,11 @@
 static char fs_name[PATH_MAX];
 static char path_name[PATH_MAX];
+static char symlink_target[PATH_MAX];
 static int get_snaps = 0;
 static int verbose = 0;
 static int restore_metadata = 0;
+static int restore_symlinks = 0;
 static int ignore_errors = 0;
 static int overwrite = 0;
 static int get_xattrs = 0;
@@ -812,6 +814,125 @@ static int overwrite_ok(const char * path)
 	return 1;
 }
 
+static int copy_symlink(struct btrfs_root *root, struct btrfs_key *key,
+			const char *file)
+{
+	struct btrfs_path *path;
+	struct extent_buffer *leaf;
+	struct btrfs_file_extent_item *extent_item;
+	struct btrfs_inode_item *inode_item;
+	u32 len;
+	int ret;
+
+	ret = overwrite_ok(path_name);
+	if (ret == 0)
+		return 0; // skip this file.
+
+	if (ret == 2) { // symlink() can't overwrite, so unlink first.
+		ret = unlink(path_name);
+		if (ret) {
+			fprintf(stderr, "failed to unlink '%s' for overwrite\n",
+				path_name);
+			return ret;
+		}
+	}
+
+	key->type = BTRFS_EXTENT_DATA_KEY;
+	key->offset = 0;
+
+	path = btrfs_alloc_path();
+	if (!path)
+		return -ENOMEM;
+
+	ret = btrfs_search_slot(NULL, root, key, path, 0, 0);
+	if (ret < 0)
+		goto out;
+
+	leaf = path->nodes[0];
+	if (!leaf) {
+		fprintf(stderr, "Error getting leaf for symlink '%s'\n", file);
+		ret = -1;
+		goto out;
+	}
+
+	extent_item = btrfs_item_ptr(leaf, path->slots[0],
+			struct btrfs_file_extent_item);
+
+	len = btrfs_file_extent_inline_item_len(leaf,
+			btrfs_item_nr(path->slots[0]));
+	if (len >= PATH_MAX) {
+		fprintf(stderr, "Symlink '%s' target length %d is longer than PATH_MAX\n",
+				fs_name, len);
+		ret = -1;
+		goto out;
+	}
+
+	u32 name_offset = (unsigned long) extent_item
+			+ offsetof(struct btrfs_file_extent_item, disk_bytenr);
+	read_extent_buffer(leaf, symlink_target, name_offset, len);
+
+	symlink_target[len] = 0;
+
+	if (!dry_run) {
+		ret = symlink(symlink_target, path_name);
+		if (ret < 0) {
+			fprintf(stderr, "Failed to restore symlink '%s': %s\n",
					path_name, strerror(errno));
+			goto out;
+		}
+	}
+	printf("SYMLINK: '%s' => '%s'\n", path_name, symlink_target);
+
+	ret = 0;
+	if (!restore_metadata)
+		goto out;
+
+	/* Symlink metadata operates differently than files/directories,
+	 * so do our own work here.
+	 */
+	key->type = BTRFS_INODE_ITEM_KEY;
+	key->offset = 0;
+
+	btrfs_release_path(path);
+
+	ret = btrfs_lookup_inode(NULL, root, path, key, 0);
+	if (ret) {
+		fprintf(stderr, "Failed to lookup inode for '%s'\n", file);
+		goto out;
+	}
+
+	inode_item = btrfs_item_ptr(path->nodes[0], path->slots[0],
+			struct btrfs_inode_item);
+
+	ret = fchownat(-1, file, btrfs_inode_uid(path->nodes[0], inode_item),
+			btrfs_inode_gid(path->nodes[0], inode_item),
+			AT_SYMLINK_NOFOLLOW);
+	if (ret) {
+		fprintf(stderr, "Failed to change owner: %s\n",
+				strerror(errno));
+		goto out;
+	}
+
+	struct btrfs_timespec *bts;
+	struct timespec times[2];
+
+	bts = btrfs_inode_atime(inode_item);
+	times[0].tv_sec = btrfs_timespec_sec(path->nodes[0], bts);
+	times[0].tv_nsec = btrfs_timespec_nsec(path->nodes[0], bts);
+
+	bts = btrfs_inode_mtime(inode_item);
+	times[1].tv_sec = btrfs_timespec_sec(path->nodes[0], bts);
+	times[1].tv_nsec = btrfs_timespec_nsec(path->nodes[0], bts);
+
+	ret = utimensat(-1, file, times, AT_SYMLINK_NOFOLLOW
Re: [PATCH v2 1/1] btrfs-progs: optionally restore metadata
On Wed, Apr 22, 2015 at 12:53 PM, David Sterba dste...@suse.cz wrote:

Applied, thanks. In future patches, please stick to the coding style used in progs ([1]), I've fixed spacing around =, comments and moved declarations before the statements.

[1] https://www.kernel.org/doc/Documentation/CodingStyle

I'll try to clean it up more next time around.

@@ -1168,10 +1275,12 @@ int cmd_restore(int argc, char **argv)
 	static const struct option long_options[] = {
 		{ "path-regex", 1, NULL, 256},
 		{ "dry-run", 0, NULL, 'D'},
+		{ "metadata", 0, NULL, 'm'},
+		{ "debug-regex", 0, NULL, 257},

This was unused and I've removed it.

That's cruft and I thought I removed it from the patch, sorry.

Got your symlink code, I'll look at that today.
[PATCH v2 1/1] btrfs-progs: optionally restore metadata
As long as the inode is intact, the file metadata can be restored. Directory data is restored at the end of search_dir. Errors are checked and returned, unless ignore_errors is requested.

Signed-off-by: Dan Merillat dan.meril...@gmail.com
---
 Documentation/btrfs-restore.txt | 3 ++
 cmds-restore.c | 114 +++-
 2 files changed, 116 insertions(+), 1 deletion(-)

diff --git a/Documentation/btrfs-restore.txt b/Documentation/btrfs-restore.txt
index 20fc366..a4e4d37 100644
--- a/Documentation/btrfs-restore.txt
+++ b/Documentation/btrfs-restore.txt
@@ -29,6 +29,9 @@ get snapshots, btrfs restore skips snapshots in default.
 -x::
 get extended attributes.
 
+-m|--metadata::
+set owner, permissions, access time and modify time.
+
 -v::
 verbose.
 
diff --git a/cmds-restore.c b/cmds-restore.c
index d2fc951..e95018f 100644
--- a/cmds-restore.c
+++ b/cmds-restore.c
@@ -48,6 +48,7 @@ static char fs_name[4096];
 static char path_name[4096];
 static int get_snaps = 0;
 static int verbose = 0;
+static int restore_metadata = 0;
 static int ignore_errors = 0;
 static int overwrite = 0;
 static int get_xattrs = 0;
@@ -547,6 +548,57 @@ out:
 	return ret;
 }
 
+static int copy_metadata(struct btrfs_root *root, int fd,
+		struct btrfs_key *key)
+{
+	struct btrfs_path *path;
+	struct btrfs_inode_item *inode_item;
+	int ret;
+
+	path = btrfs_alloc_path();
+	if (!path) {
+		fprintf(stderr, "Ran out of memory\n");
+		return -ENOMEM;
+	}
+
+	ret = btrfs_lookup_inode(NULL, root, path, key, 0);
+	if (ret == 0) {
+
+		inode_item = btrfs_item_ptr(path->nodes[0], path->slots[0],
+				struct btrfs_inode_item);
+
+		ret=fchown(fd, btrfs_inode_uid(path->nodes[0], inode_item),
+				btrfs_inode_gid(path->nodes[0], inode_item));
+		if (ret) {
+			fprintf(stderr, "Failed to change owner: %s\n", strerror(errno));
+			goto out;
+		}
+		ret=fchmod(fd, btrfs_inode_mode(path->nodes[0], inode_item));
+		if (ret) {
+			fprintf(stderr, "Failed to change mode: %s\n", strerror(errno));
+			goto out;
+		}
+		struct btrfs_timespec *bts;
+		struct timespec times[2];
+
+		bts = btrfs_inode_atime(inode_item);
+		times[0].tv_sec=btrfs_timespec_sec(path->nodes[0], bts);
+		times[0].tv_nsec=btrfs_timespec_nsec(path->nodes[0], bts);
+
+		bts = btrfs_inode_mtime(inode_item);
+		times[1].tv_sec=btrfs_timespec_sec(path->nodes[0], bts);
+		times[1].tv_nsec=btrfs_timespec_nsec(path->nodes[0], bts);
+
+		ret=futimens(fd, times);
+		if (ret) {
+			fprintf(stderr, "Failed to set times: %s\n", strerror(errno));
+			goto out;
+		}
+	}
+out:
+	btrfs_release_path(path);
+	return ret;
+}
 
 static int copy_file(struct btrfs_root *root, int fd, struct btrfs_key *key,
 		const char *file)
@@ -555,6 +607,7 @@ static int copy_file(struct btrfs_root *root, int fd, struct btrfs_key *key,
 	struct btrfs_path *path;
 	struct btrfs_file_extent_item *fi;
 	struct btrfs_inode_item *inode_item;
+	struct btrfs_timespec *bts;
 	struct btrfs_key found_key;
 	int ret;
 	int extent_type;
@@ -567,12 +620,41 @@ static int copy_file(struct btrfs_root *root, int fd, struct btrfs_key *key,
 		fprintf(stderr, "Ran out of memory\n");
 		return -ENOMEM;
 	}
+	struct timespec times[2];
+	int times_ok=0;
 
 	ret = btrfs_lookup_inode(NULL, root, path, key, 0);
 	if (ret == 0) {
 		inode_item = btrfs_item_ptr(path->nodes[0], path->slots[0],
 				struct btrfs_inode_item);
 		found_size = btrfs_inode_size(path->nodes[0], inode_item);
+
+		if (restore_metadata) {
+			/* change the ownership and mode now, set times when
+			 * copyout is finished */
+
+			ret=fchown(fd, btrfs_inode_uid(path->nodes[0], inode_item),
+					btrfs_inode_gid(path->nodes[0], inode_item));
+			if (ret && !ignore_errors) {
+				btrfs_release_path(path);
+				return ret;
+			}
+
+			ret=fchmod(fd, btrfs_inode_mode(path->nodes[0], inode_item));
+			if (ret && !ignore_errors) {
+				btrfs_release_path(path);
+				return ret;
+			}
+
+			bts = btrfs_inode_atime(inode_item
[PATCH v2 0/1] btrfs-progs: optionally restore metadata
Changes since v1:

* Documented in the manpage
* Added to usage() for btrfs restore
* Made it an optional flag (-m/--restore-metadata)
* Use endian-safe macros to access the on-disk data.
* Restore the proper mtime instead of atime twice.
* Restore owner and mode
* Restore metadata for directories as well as plain files.
* Since it's now explicitly requested, errors are fatal unless ignore_errors is requested.

I tested this on the array I'm restoring, it looks sane to me.

Thanks to Noah Massey for the patch review, and Duncan for the prompt to add owner/permissions to the patch.

Symlinks and hardlinks are beyond the scope of these changes, I'll look into them if this looks good to everyone.
Re: [PATCH] btrfs-progs: have restore set atime/mtime
On Fri, Apr 17, 2015 at 7:54 AM, Noah Massey noah.mas...@gmail.com wrote:

On Thu, Apr 16, 2015 at 7:33 PM, Dan Merillat dan.meril...@gmail.com wrote:

The inode is already found, use the data and make restore friendlier.

Signed-off-by: Dan Merillat dan.meril...@gmail.com
---
 cmds-restore.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/cmds-restore.c b/cmds-restore.c
index d2fc951..95ac487 100644
--- a/cmds-restore.c
+++ b/cmds-restore.c
@@ -567,12 +567,22 @@ static int copy_file(struct btrfs_root *root, int fd, struct btrfs_key *key,
 		fprintf(stderr, "Ran out of memory\n");
 		return -ENOMEM;
 	}
+	struct timespec times[2];
+	int times_ok=0;
 
 	ret = btrfs_lookup_inode(NULL, root, path, key, 0);
 	if (ret == 0) {
 		inode_item = btrfs_item_ptr(path->nodes[0], path->slots[0],
 				struct btrfs_inode_item);
 		found_size = btrfs_inode_size(path->nodes[0], inode_item);
+		struct btrfs_timespec bts;
+		read_eb_member(path->nodes[0], inode_item, struct btrfs_inode_item, atime, &bts);
+		times[0].tv_sec=bts.sec;
+		times[0].tv_nsec=bts.nsec;
+		read_eb_member(path->nodes[0], inode_item, struct btrfs_inode_item, atime, &bts);

I think you mean 'mtime' here

I absolutely do, whoops.

This is probably a good time to mention how much I dislike the fake pointers being used everywhere, coupled with the partially-implemented macro magic to get fields out of them. Is there a good reason why btrfs_item_ptr isn't just a type-pun, with the understanding that you'll need to copy it to keep it?

+	if (times_ok)
+		futimens(fd, times);

return value isn't checked here.

What could we do with the error if it occurred? Restoring times is a nice bonus if it works, but if it gets lost while the data was restored successfully, that shouldn't be an error condition. I can add a comment to that effect to make it clearer why it's being ignored though, or perhaps something like a warn_once if the filesystem being restored to doesn't support changing the times.

On the subject of errors - is it possible for read_eb_member to fail the way I'm using it? It's defined, but never used anywhere else, so I have nothing to compare it to. My feeling is that if btrfs_item_ptr works the data in the structure returned is going to be there, but I'm not sure.
[PATCH] btrfs-progs: have restore set atime/mtime
The inode is already found, use the data and make restore friendlier.

Signed-off-by: Dan Merillat dan.meril...@gmail.com
---
 cmds-restore.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/cmds-restore.c b/cmds-restore.c
index d2fc951..95ac487 100644
--- a/cmds-restore.c
+++ b/cmds-restore.c
@@ -567,12 +567,22 @@ static int copy_file(struct btrfs_root *root, int fd, struct btrfs_key *key,
 		fprintf(stderr, "Ran out of memory\n");
 		return -ENOMEM;
 	}
+	struct timespec times[2];
+	int times_ok=0;
 
 	ret = btrfs_lookup_inode(NULL, root, path, key, 0);
 	if (ret == 0) {
 		inode_item = btrfs_item_ptr(path->nodes[0], path->slots[0],
 				struct btrfs_inode_item);
 		found_size = btrfs_inode_size(path->nodes[0], inode_item);
+		struct btrfs_timespec bts;
+		read_eb_member(path->nodes[0], inode_item, struct btrfs_inode_item, atime, &bts);
+		times[0].tv_sec=bts.sec;
+		times[0].tv_nsec=bts.nsec;
+		read_eb_member(path->nodes[0], inode_item, struct btrfs_inode_item, atime, &bts);
+		times[1].tv_sec=bts.sec;
+		times[1].tv_nsec=bts.nsec;
+		times_ok=1;
 	}
 	btrfs_release_path(path);
@@ -680,6 +690,8 @@ set_size:
 		if (ret)
 			return ret;
 	}
+	if (times_ok)
+		futimens(fd, times);
 	return 0;
 }
-- 
2.1.4
Re: [PATCH] btrfs-progs: have restore set atime/mtime
That's not a bad idea. In my case it was all owned by the same user (media storage) so the only thing of interest was the timestamps. I can whip up a patch to do that as well.

On Thu, Apr 16, 2015 at 9:09 PM, Duncan 1i5t5.dun...@cox.net wrote:

Dan Merillat posted on Thu, 16 Apr 2015 19:33:46 -0400 as excerpted:

The inode is already found, use the data and make restore friendlier.

Unless things have changed recently, restore doesn't even restore user/group ownership, let alone permissions. IOW, atime/mtime are the least of the problem (particularly if people are running noatime as is recommended, unless you really need it for some reason). It simply creates the files it restores as the owner/group it is run as (normally root), using standard umask rules, I believe. So if you're going to have it start restoring metadata at all, might as well have it do ownership/perms too, if it can. Otherwise atime/mtime are hardly worth bothering with.

-- Duncan - List replies preferred. No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman
Re: [PATCH] btrfs-progs: have restore set atime/mtime
I think thunderbird ate that patch, sorry. I didn't make it conditional - there's really no reason to not restore the information. I was actually surprised that it didn't restore before this patch. If it looks good I'll resend without the word-wrapping.
Re: Recovering BTRFS from bcache failure.
On Tue, Apr 7, 2015 at 11:40 PM, Dan Merillat dan.meril...@gmail.com wrote:

Bcache failures are nasty, because they leave a mix of old and new data on the disk. In this case, there was very little dirty data, but of course the tree roots were dirty and out-of-sync.

fileserver:/usr/src/btrfs-progs# ./btrfs --version
Btrfs v3.18.2

kernel version 3.18

[  572.573566] BTRFS info (device bcache0): enabling auto recovery
[  572.573619] BTRFS info (device bcache0): disk space caching is enabled
[  574.266055] BTRFS (device bcache0): parent transid verify failed on 7567956930560 wanted 613690 found 613681
[  574.276952] BTRFS (device bcache0): parent transid verify failed on 7567956930560 wanted 613690 found 613681
[  574.277008] BTRFS: failed to read tree root on bcache0
[  574.277187] BTRFS (device bcache0): parent transid verify failed on 7567956930560 wanted 613690 found 613681
[  574.277356] BTRFS (device bcache0): parent transid verify failed on 7567956930560 wanted 613690 found 613681
[  574.277398] BTRFS: failed to read tree root on bcache0
[  574.285955] BTRFS (device bcache0): parent transid verify failed on 7567965720576 wanted 613689 found 613694
[  574.298741] BTRFS (device bcache0): parent transid verify failed on 7567965720576 wanted 613689 found 610499
[  574.298804] BTRFS: failed to read tree root on bcache0
[  575.047079] BTRFS (device bcache0): bad tree block start 0 7567954464768
[  575.111495] BTRFS (device bcache0): parent transid verify failed on 7567954464768 wanted 613688 found 613685
[  575.111559] BTRFS: failed to read tree root on bcache0
[  575.121749] BTRFS (device bcache0): bad tree block start 0 7567954214912
[  575.131803] BTRFS (device bcache0): parent transid verify failed on 7567954214912 wanted 613687 found 613680
[  575.131866] BTRFS: failed to read tree root on bcache0
[  575.180101] BTRFS: open_ctree failed

all the btrfs tools throw up their hands with similar errors:

fileserver:/usr/src/btrfs-progs# btrfs restore /dev/bcache0 -l
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
Ignoring transid failure
Couldn't setup extent tree
Couldn't setup device tree
Could not open root, trying backup super
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
Ignoring transid failure
Couldn't setup extent tree
Couldn't setup device tree
Could not open root, trying backup super
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
Ignoring transid failure
Couldn't setup extent tree
Couldn't setup device tree
Could not open root, trying backup super

fileserver:/usr/src/btrfs-progs# ./btrfsck --repair /dev/bcache0 --init-extent-tree
enabling repair mode
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
Ignoring transid failure
Couldn't setup extent tree
Couldn't setup device tree
Couldn't open file system

Annoyingly:

# ./btrfs-image -c9 -t4 -s -w /dev/bcache0 /tmp/test.out
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
Ignoring transid failure
Couldn't setup extent tree
Open ctree failed
create failed (Success)

So I can't even send an image for people to look at.

CCing some more people on this one, while this filesystem isn't important I'd like to know that restore from backup isn't the only option for BTRFS corruption. All of the tools simply throw up their hands and bail when confronted with this filesystem, even btrfs-image.
Re: Recovering BTRFS from bcache failure.
It's a known bug with bcache and enabling discard: it was discarding sections containing data it still wanted. After a reboot bcache refused to accept the cache data, and of course it was dirty because I'm frankly too stupid to breathe sometimes. So yes, it's a bcache issue, but that's unresolvable. I'm trying to rescue the btrfs data that it trashed.

On Wed, Apr 8, 2015 at 2:27 PM, Cameron Berkenpas c...@neo-zeon.de wrote:

Hello,

I had some luck in the past with btrfs restore using the -r option. I don't recall how I determined the roots... Maybe I tried random numbers? I was able to recover nearly all of my data from a bcache related crash from over a year ago.

What kind of bcache failure did you see? I've been doing some testing recently and ran into 2 bcache failures. With both of these failures, I had a 'bad btree header at bucket' error message (which is entirely different from the crash I had over a year back). I'm currently trying a different SSD to see if that alleviates the issue. The error makes me think that it's a bcache specific issue that's unrelated to btrfs or possibly (in my case) an issue with the previous SSD. Did you encounter this same error?

With my 2 most recent crashes, I didn't try to recover very hard (or even try 'btrfs recover' at all) as I've been taking daily backups. I did try btrfsck, and not only would it fail, it would segfault.

-Cameron

On 04/08/2015 11:07 AM, Dan Merillat wrote:

Any ideas on where to start with this? I did flush the cache out to disk before I made changes to the bcache configuration, so there shouldn't be anything completely missing, just some bits of stale metadata. If I can get the tools to take the closest match and run with it it would probably recover nearly everything. At worst, is there a way to scan the metadata blocks and rebuild from found extent-trees?
On Tue, Apr 7, 2015 at 11:40 PM, Dan Merillat dan.meril...@gmail.com wrote:
[original report and logs quoted in full; snipped]
Re: Recovering BTRFS from bcache failure.
Sorry I pressed send before I finished my thoughts. btrfs restore gets nowhere with any options. btrfs-recover says the superblocks are fine, and chunk recover does nothing after a few hours of reading. Everything else bails out with the errors I listed above. On Wed, Apr 8, 2015 at 2:36 PM, Dan Merillat dan.meril...@gmail.com wrote: It's a known bug with bcache and enabling discard, it was discarding sections containing data it wanted. After a reboot bcache refused to accept the cache data, and of course it was dirty because I'm frankly too stupid to breathe sometimes. So yes, it's a bcache issue, but that's unresolvable. I'm trying to rescue the btrfs data that it trashed. On Wed, Apr 8, 2015 at 2:27 PM, Cameron Berkenpas c...@neo-zeon.de wrote: Hello, I had some luck in the past with btrfs restore using the -r option. I don't recall how I determined the roots... Maybe I tried random numbers? I was able to recover nearly all of my data from a bcache related crash from over a year ago. What kind of bcache failure did you see? I've been doing some testing recently and ran into 2 bcache failures. With both of these failures, I had a ' bad btree header at bucket' error message (which is entirely different from the crash I had over a year back). I'm currently trying a different SSD to see if that alleviates the issue. The error makes me think that it's a bcache specific issue that's unrelated to btrfs or possibly (in my case) an issue with the previous SSD. Did you encounter this same error? With my 2 most recent crashes, I didn't try to recover very hard (or even try 'btrfs recover; at all) as I've been taking daily backups. I did try btrfsck, and not only would it fail, it would segfault. -Cameron On 04/08/2015 11:07 AM, Dan Merillat wrote: Any ideas on where to start with this? I did flush the cache out to disk before I made changes to the bcache configuration, so there shouldn't be anything completely missing, just some bits of stale metadata. 
If I can get the tools to take the closest match and run with it, it would probably recover nearly everything. At worst, is there a way to scan the metadata blocks and rebuild from found extent-trees?

On Tue, Apr 7, 2015 at 11:40 PM, Dan Merillat dan.meril...@gmail.com wrote:

Bcache failures are nasty, because they leave a mix of old and new data on the disk. In this case, there was very little dirty data, but of course the tree roots were dirty and out-of-sync.

fileserver:/usr/src/btrfs-progs# ./btrfs --version
Btrfs v3.18.2

kernel version 3.18

[  572.573566] BTRFS info (device bcache0): enabling auto recovery
[  572.573619] BTRFS info (device bcache0): disk space caching is enabled
[  574.266055] BTRFS (device bcache0): parent transid verify failed on 7567956930560 wanted 613690 found 613681
[  574.276952] BTRFS (device bcache0): parent transid verify failed on 7567956930560 wanted 613690 found 613681
[  574.277008] BTRFS: failed to read tree root on bcache0
[  574.277187] BTRFS (device bcache0): parent transid verify failed on 7567956930560 wanted 613690 found 613681
[  574.277356] BTRFS (device bcache0): parent transid verify failed on 7567956930560 wanted 613690 found 613681
[  574.277398] BTRFS: failed to read tree root on bcache0
[  574.285955] BTRFS (device bcache0): parent transid verify failed on 7567965720576 wanted 613689 found 613694
[  574.298741] BTRFS (device bcache0): parent transid verify failed on 7567965720576 wanted 613689 found 610499
[  574.298804] BTRFS: failed to read tree root on bcache0
[  575.047079] BTRFS (device bcache0): bad tree block start 0 7567954464768
[  575.111495] BTRFS (device bcache0): parent transid verify failed on 7567954464768 wanted 613688 found 613685
[  575.111559] BTRFS: failed to read tree root on bcache0
[  575.121749] BTRFS (device bcache0): bad tree block start 0 7567954214912
[  575.131803] BTRFS (device bcache0): parent transid verify failed on 7567954214912 wanted 613687 found 613680
[  575.131866] BTRFS: failed to read tree root on bcache0
[  575.180101] BTRFS: open_ctree failed

All the btrfs tools throw up their hands with similar errors:

fileserver:/usr/src/btrfs-progs# btrfs restore /dev/bcache0 -l
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
Ignoring transid failure
Couldn't setup extent tree
Couldn't setup device tree
Could not open root, trying backup super
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
Ignoring
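A hedged sketch of the alternate-root approach Cameron mentions elsewhere in the thread: btrfs-find-root scans the device for older tree-root generations, and btrfs restore -t points restore at one of them (-D makes it a dry run). The device path, bytenr, and destination below are placeholders, and the helper only assembles the command line rather than touching any device:

```shell
# Workflow sketch (assumed device/bytenr/destination, not from a real run):
#   1. 'btrfs-find-root /dev/bcache0' prints candidate root bytenrs and their
#      generations; pick one with a generation close to the wanted transid.
#   2. Feed it to 'btrfs restore -t <bytenr> -D' for a dry-run file listing.
build_restore_cmd() {
  dev="$1"      # block device holding the damaged filesystem (placeholder)
  bytenr="$2"   # candidate tree-root byte offset from btrfs-find-root output
  dest="$3"     # directory to restore files into
  echo "btrfs restore -t $bytenr -D -v $dev $dest"
}

build_restore_cmd /dev/bcache0 7567954464768 /mnt/recovery
```

Only once the dry-run listing looks sane would you drop -D and restore for real.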
Re: Recovering BTRFS from bcache failure.
Any ideas on where to start with this? I did flush the cache out to disk before I made changes to the bcache configuration, so there shouldn't be anything completely missing, just some bits of stale metadata. If I can get the tools to take the closest match and run with it, it would probably recover nearly everything. At worst, is there a way to scan the metadata blocks and rebuild from found extent-trees?

On Tue, Apr 7, 2015 at 11:40 PM, Dan Merillat dan.meril...@gmail.com wrote:

Bcache failures are nasty, because they leave a mix of old and new data on the disk. In this case, there was very little dirty data, but of course the tree roots were dirty and out-of-sync.

fileserver:/usr/src/btrfs-progs# ./btrfs --version
Btrfs v3.18.2

kernel version 3.18

[  572.573566] BTRFS info (device bcache0): enabling auto recovery
[  572.573619] BTRFS info (device bcache0): disk space caching is enabled
[  574.266055] BTRFS (device bcache0): parent transid verify failed on 7567956930560 wanted 613690 found 613681
[  574.276952] BTRFS (device bcache0): parent transid verify failed on 7567956930560 wanted 613690 found 613681
[  574.277008] BTRFS: failed to read tree root on bcache0
[  574.277187] BTRFS (device bcache0): parent transid verify failed on 7567956930560 wanted 613690 found 613681
[  574.277356] BTRFS (device bcache0): parent transid verify failed on 7567956930560 wanted 613690 found 613681
[  574.277398] BTRFS: failed to read tree root on bcache0
[  574.285955] BTRFS (device bcache0): parent transid verify failed on 7567965720576 wanted 613689 found 613694
[  574.298741] BTRFS (device bcache0): parent transid verify failed on 7567965720576 wanted 613689 found 610499
[  574.298804] BTRFS: failed to read tree root on bcache0
[  575.047079] BTRFS (device bcache0): bad tree block start 0 7567954464768
[  575.111495] BTRFS (device bcache0): parent transid verify failed on 7567954464768 wanted 613688 found 613685
[  575.111559] BTRFS: failed to read tree root on bcache0
[  575.121749] BTRFS (device bcache0): bad tree block start 0 7567954214912
[  575.131803] BTRFS (device bcache0): parent transid verify failed on 7567954214912 wanted 613687 found 613680
[  575.131866] BTRFS: failed to read tree root on bcache0
[  575.180101] BTRFS: open_ctree failed

All the btrfs tools throw up their hands with similar errors:

fileserver:/usr/src/btrfs-progs# btrfs restore /dev/bcache0 -l
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
Ignoring transid failure
Couldn't setup extent tree
Couldn't setup device tree
Could not open root, trying backup super
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
Ignoring transid failure
Couldn't setup extent tree
Couldn't setup device tree
Could not open root, trying backup super
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
Ignoring transid failure
Couldn't setup extent tree
Couldn't setup device tree
Could not open root, trying backup super

fileserver:/usr/src/btrfs-progs# ./btrfsck --repair /dev/bcache0 --init-extent-tree
enabling repair mode
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
Ignoring transid failure
Couldn't setup extent tree
Couldn't setup device tree
Couldn't open file system

Annoyingly:

# ./btrfs-image -c9 -t4 -s -w /dev/bcache0 /tmp/test.out
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
Ignoring transid failure
Couldn't setup extent tree
Open ctree failed
create failed (Success)

So I can't even send an image for people to look at.

-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Recovering BTRFS from bcache failure.
Bcache failures are nasty, because they leave a mix of old and new data on the disk. In this case, there was very little dirty data, but of course the tree roots were dirty and out-of-sync.

fileserver:/usr/src/btrfs-progs# ./btrfs --version
Btrfs v3.18.2

kernel version 3.18

[  572.573566] BTRFS info (device bcache0): enabling auto recovery
[  572.573619] BTRFS info (device bcache0): disk space caching is enabled
[  574.266055] BTRFS (device bcache0): parent transid verify failed on 7567956930560 wanted 613690 found 613681
[  574.276952] BTRFS (device bcache0): parent transid verify failed on 7567956930560 wanted 613690 found 613681
[  574.277008] BTRFS: failed to read tree root on bcache0
[  574.277187] BTRFS (device bcache0): parent transid verify failed on 7567956930560 wanted 613690 found 613681
[  574.277356] BTRFS (device bcache0): parent transid verify failed on 7567956930560 wanted 613690 found 613681
[  574.277398] BTRFS: failed to read tree root on bcache0
[  574.285955] BTRFS (device bcache0): parent transid verify failed on 7567965720576 wanted 613689 found 613694
[  574.298741] BTRFS (device bcache0): parent transid verify failed on 7567965720576 wanted 613689 found 610499
[  574.298804] BTRFS: failed to read tree root on bcache0
[  575.047079] BTRFS (device bcache0): bad tree block start 0 7567954464768
[  575.111495] BTRFS (device bcache0): parent transid verify failed on 7567954464768 wanted 613688 found 613685
[  575.111559] BTRFS: failed to read tree root on bcache0
[  575.121749] BTRFS (device bcache0): bad tree block start 0 7567954214912
[  575.131803] BTRFS (device bcache0): parent transid verify failed on 7567954214912 wanted 613687 found 613680
[  575.131866] BTRFS: failed to read tree root on bcache0
[  575.180101] BTRFS: open_ctree failed

All the btrfs tools throw up their hands with similar errors:

fileserver:/usr/src/btrfs-progs# btrfs restore /dev/bcache0 -l
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
Ignoring transid failure
Couldn't setup extent tree
Couldn't setup device tree
Could not open root, trying backup super
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
Ignoring transid failure
Couldn't setup extent tree
Couldn't setup device tree
Could not open root, trying backup super
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
Ignoring transid failure
Couldn't setup extent tree
Couldn't setup device tree
Could not open root, trying backup super

fileserver:/usr/src/btrfs-progs# ./btrfsck --repair /dev/bcache0 --init-extent-tree
enabling repair mode
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
Ignoring transid failure
Couldn't setup extent tree
Couldn't setup device tree
Couldn't open file system

Annoyingly:

# ./btrfs-image -c9 -t4 -s -w /dev/bcache0 /tmp/test.out
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
Ignoring transid failure
Couldn't setup extent tree
Open ctree failed
create failed (Success)

So I can't even send an image for people to look at.
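Before any repair attempt on a filesystem in this state, it can be useful to capture the superblock, including the backup_roots array that restore's "trying backup super" path walks. A minimal sketch, assuming the inspect-internal spelling from current btrfs-progs (contemporary releases shipped the same dump as btrfs-show-super) and a placeholder device; it only prints the command rather than running it:

```shell
# Placeholder device; -f additionally prints the backup_roots array recorded
# in the super. Dumping the super is read-only and safe on a damaged fs.
DEV=/dev/bcache0
echo "btrfs inspect-internal dump-super -f $DEV"
```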
Re: Btrfs raid1 array has issues with rtorrent usage pattern.
On Thu, Oct 30, 2014 at 3:50 AM, Koen Kooi k...@dominion.thruhere.net wrote:

Dan Merillat schreef op 30-10-14 04:17:

It's specifically BTRFS related; I was able to reproduce it on a bare drive (no lvm, no md, no bcache). It's not bad RAM; I was able to reproduce it on multiple machines running either 3.17 or late RCs. I've tested 3.18-rc2 for about 2 hours now and can't get any failures, so that's good. If anyone else can reproduce this, it'll probably need to be sent to 3.17-stable.

3.17.2 has a lot of btrfs backports queued[1] already, could you see if the fix for your problem is already present?

Sorry about all the top-posting; I dislike the way gmail makes it the default. Yes, the patches queued for 3.17.2 appear to have fixed it. I didn't have time to run a bisection to see where it broke between .16 and .17, though.
Re: Btrfs raid1 array has issues with rtorrent usage pattern.
I'm in the middle of debugging the exact same thing. 3.17.0: rtorrent dies with SIGBUS. I've done some debugging; the sequence is something like this:

open a new file
fallocate() to the final size
mmap() all (or a portion) of the file
write to the region
run SHA1 on that mmap'd region to validate the chunk
crash, eventually. Generally not at the same point.

Reading that file (cat file > /dev/null) returns -EIO. Looking up the process maps, the SIGBUS appears to be happening in the middle of a mapped region of a pre-allocated file, i.e. where it shouldn't be. I'm not completely ruling out an rtorrent bug, but it appears sane to me.

Weirder: old files, that have been around a while, work just fine for seeding. I've re-hashed my entire collection without an error. Seeing this on both inherit-COW and no-inherit-COW files, and the filesystem is not using compression. The interesting part is that going back and attempting to read the files later, they sometimes don't throw an IO error. Absolutely nothing in dmesg.

Working on a testcase that triggers it reliably, but no luck so far. I thought I had bad RAM, but two people upgrading to 3.17 and seeing the same bug at around the same time can't be a coincidence. I rebooted to 3.17 on the 25th; the first new download was on the 28th and that failed. Working on a testcase for it that's more reproducible than "go grab torrent files with rtorrent".

On Tue, Oct 28, 2014 at 12:49 PM, Alec Blayne a...@tevsa.net wrote:

Hi, it seems that when using rtorrent to download into a btrfs system, it leads to the creation of files that fail to read properly. For instance, I get rtorrent to crash, but if I try to rsync the file it was writing into someplace else, rsync also fails with the message "can't map file $file: Input/Output error (5)". If I give it time, eventually the file gets into a good state and I can rsync it somewhere else (as long as rtorrent doesn't keep writing into it). This doesn't happen using ext4 on the same system.

No btrfs errors, or any other errors, show up in any log. Scrubbing or balancing don't turn up any issues. I've tried using a subvolume mounted with nodatacow and/or flushoncommit, which didn't help. I'm not using quotas, and at some point I had a single snapshot that I deleted. The filesystem was originally created recently (on a 3.16.4+ kernel).

Here's what the array looks like:

Label: 'data'  uuid: ffe83a3d-f4ba-46b7-8424-4ec3380cb811
        Total devices 4 FS bytes used 3.14TiB
        devid 4 size 2.73TiB used 2.36TiB path /dev/sdd1
        devid 5 size 1.82TiB used 1.45TiB path /dev/sdc1
        devid 6 size 1.82TiB used 1.45TiB path /dev/sdb1
        devid 7 size 1.82TiB used 1.45TiB path /dev/sda1
Btrfs v3.17

Data, RAID1: total=3.34TiB, used=3.13TiB
System, RAID1: total=32.00MiB, used=512.00KiB
Metadata, RAID1: total=10.00GiB, used=7.31GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

On linux 3.17.1:
Linux 3.17.1-gentoo-r1 #3 SMP PREEMPT Tue Oct 28 02:43:11 WET 2014 x86_64 AMD Athlon(tm) 5350 APU with Radeon(tm) R3 AuthenticAMD GNU/Linux

I'm utterly puzzled and clueless at how to dig into this issue.
Re: Btrfs raid1 array has issues with rtorrent usage pattern.
The following code reliably throws a SIGBUS in the memset, and cat testfile > /dev/null returns an IO error. I've sometimes gotten as high as iteration 900 before a SIGBUS, so don't assume a single clean run means it's OK.

linux 3.17.0, SATA -> MD(raid5) -> bcache (ssd) -> btrfs. Working on eliminating more variables.

#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

#define MB (1024ull * 1024)
#define GB (1024ull * MB)
#define TEST_SIZE (4096)

int main()
{
	int fd;
	srandom(1024);
	fd = open("testfile", O_RDWR|O_CREAT, 0600);
	posix_fallocate(fd, 0, TEST_SIZE * MB);
	uint8_t *map = 0;
	int i;

	for (i = 0; i < 1000; i++) {
		size_t location = (random() % (TEST_SIZE - 1)) * MB;
		map = (uint8_t *) mmap(map, MB, PROT_READ|PROT_WRITE, MAP_SHARED, fd, location);
		printf("%d: writing at %04zd mb\n", i, location);
		memset(map, 0x5a, 1 * MB);
		msync(map, 1 * MB, MS_ASYNC);
		munmap(map, MB);
	}
	return 0;
}

On Wed, Oct 29, 2014 at 5:50 PM, Dan Merillat dan.meril...@gmail.com wrote:

I'm in the middle of debugging the exact same thing. 3.17.0: rtorrent dies with SIGBUS. I've done some debugging; the sequence is something like this:

open a new file
fallocate() to the final size
mmap() all (or a portion) of the file
write to the region
run SHA1 on that mmap'd region to validate the chunk
crash, eventually. Generally not at the same point.

Reading that file (cat file > /dev/null) returns -EIO. Looking up the process maps, the SIGBUS appears to be happening in the middle of a mapped region of a pre-allocated file, i.e. where it shouldn't be. I'm not completely ruling out an rtorrent bug, but it appears sane to me.

Weirder: old files, that have been around a while, work just fine for seeding. I've re-hashed my entire collection without an error. Seeing this on both inherit-COW and no-inherit-COW files, and the filesystem is not using compression. The interesting part is that going back and attempting to read the files later, they sometimes don't throw an IO error. Absolutely nothing in dmesg.

Working on a testcase that triggers it reliably, but no luck so far. I thought I had bad RAM, but two people upgrading to 3.17 and seeing the same bug at around the same time can't be a coincidence. I rebooted to 3.17 on the 25th; the first new download was on the 28th and that failed. Working on a testcase for it that's more reproducible than "go grab torrent files with rtorrent".

On Tue, Oct 28, 2014 at 12:49 PM, Alec Blayne a...@tevsa.net wrote:

Hi, it seems that when using rtorrent to download into a btrfs system, it leads to the creation of files that fail to read properly. For instance, I get rtorrent to crash, but if I try to rsync the file it was writing into someplace else, rsync also fails with the message "can't map file $file: Input/Output error (5)". If I give it time, eventually the file gets into a good state and I can rsync it somewhere else (as long as rtorrent doesn't keep writing into it). This doesn't happen using ext4 on the same system.

No btrfs errors, or any other errors, show up in any log. Scrubbing or balancing don't turn up any issues. I've tried using a subvolume mounted with nodatacow and/or flushoncommit, which didn't help. I'm not using quotas, and at some point I had a single snapshot that I deleted. The filesystem was originally created recently (on a 3.16.4+ kernel).

Here's what the array looks like:

Label: 'data'  uuid: ffe83a3d-f4ba-46b7-8424-4ec3380cb811
        Total devices 4 FS bytes used 3.14TiB
        devid 4 size 2.73TiB used 2.36TiB path /dev/sdd1
        devid 5 size 1.82TiB used 1.45TiB path /dev/sdc1
        devid 6 size 1.82TiB used 1.45TiB path /dev/sdb1
        devid 7 size 1.82TiB used 1.45TiB path /dev/sda1
Btrfs v3.17

Data, RAID1: total=3.34TiB, used=3.13TiB
System, RAID1: total=32.00MiB, used=512.00KiB
Metadata, RAID1: total=10.00GiB, used=7.31GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

On linux 3.17.1:
Linux 3.17.1-gentoo-r1 #3 SMP PREEMPT Tue Oct 28 02:43:11 WET 2014 x86_64 AMD Athlon(tm) 5350 APU with Radeon(tm) R3 AuthenticAMD GNU/Linux

I'm utterly puzzled and clueless at how to dig into this issue.
Re: Btrfs raid1 array has issues with rtorrent usage pattern.
It's specifically BTRFS related; I was able to reproduce it on a bare drive (no lvm, no md, no bcache). It's not bad RAM; I was able to reproduce it on multiple machines running either 3.17 or late RCs. I've tested 3.18-rc2 for about 2 hours now and can't get any failures, so that's good. If anyone else can reproduce this, it'll probably need to be sent to 3.17-stable.

On Wed, Oct 29, 2014 at 7:24 PM, Alec Blayne a...@tevsa.net wrote:

Really nice to know it's already getting handled :) I'm already downgrading to 3.16.6 now that I know I won't have that issue. I was already planning to because of the read-only snapshots issue. Thank you and good luck debugging!

On 29-10-2014 21:50, Dan Merillat wrote:

I'm in the middle of debugging the exact same thing. 3.17.0: rtorrent dies with SIGBUS. I've done some debugging; the sequence is something like this:

open a new file
fallocate() to the final size
mmap() all (or a portion) of the file
write to the region
run SHA1 on that mmap'd region to validate the chunk
crash, eventually. Generally not at the same point.

Reading that file (cat file > /dev/null) returns -EIO. Looking up the process maps, the SIGBUS appears to be happening in the middle of a mapped region of a pre-allocated file, i.e. where it shouldn't be. I'm not completely ruling out an rtorrent bug, but it appears sane to me.

Weirder: old files, that have been around a while, work just fine for seeding. I've re-hashed my entire collection without an error. Seeing this on both inherit-COW and no-inherit-COW files, and the filesystem is not using compression. The interesting part is that going back and attempting to read the files later, they sometimes don't throw an IO error. Absolutely nothing in dmesg.

Working on a testcase that triggers it reliably, but no luck so far. I thought I had bad RAM, but two people upgrading to 3.17 and seeing the same bug at around the same time can't be a coincidence. I rebooted to 3.17 on the 25th; the first new download was on the 28th and that failed. Working on a testcase for it that's more reproducible than "go grab torrent files with rtorrent".

On Tue, Oct 28, 2014 at 12:49 PM, Alec Blayne a...@tevsa.net wrote:

Hi, it seems that when using rtorrent to download into a btrfs system, it leads to the creation of files that fail to read properly. For instance, I get rtorrent to crash, but if I try to rsync the file it was writing into someplace else, rsync also fails with the message "can't map file $file: Input/Output error (5)". If I give it time, eventually the file gets into a good state and I can rsync it somewhere else (as long as rtorrent doesn't keep writing into it). This doesn't happen using ext4 on the same system.

No btrfs errors, or any other errors, show up in any log. Scrubbing or balancing don't turn up any issues. I've tried using a subvolume mounted with nodatacow and/or flushoncommit, which didn't help. I'm not using quotas, and at some point I had a single snapshot that I deleted. The filesystem was originally created recently (on a 3.16.4+ kernel).

Here's what the array looks like:

Label: 'data'  uuid: ffe83a3d-f4ba-46b7-8424-4ec3380cb811
        Total devices 4 FS bytes used 3.14TiB
        devid 4 size 2.73TiB used 2.36TiB path /dev/sdd1
        devid 5 size 1.82TiB used 1.45TiB path /dev/sdc1
        devid 6 size 1.82TiB used 1.45TiB path /dev/sdb1
        devid 7 size 1.82TiB used 1.45TiB path /dev/sda1
Btrfs v3.17

Data, RAID1: total=3.34TiB, used=3.13TiB
System, RAID1: total=32.00MiB, used=512.00KiB
Metadata, RAID1: total=10.00GiB, used=7.31GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

On linux 3.17.1:
Linux 3.17.1-gentoo-r1 #3 SMP PREEMPT Tue Oct 28 02:43:11 WET 2014 x86_64 AMD Athlon(tm) 5350 APU with Radeon(tm) R3 AuthenticAMD GNU/Linux

I'm utterly puzzled and clueless at how to dig into this issue.
Re: 3.16 Managed to ENOSPC with 80% used
On Wed, Sep 24, 2014 at 6:23 PM, Holger Hoffstätte holger.hoffstae...@googlemail.com wrote:

Basically it's been data-allocation happy, since I haven't deleted 53GB at any point. Unfortunately, none of the chunks are at 0% usage, so a balance -dusage=0 finds nothing to drop.

Also try -musage=0..10, just for fun.

Tried a few of them. When it's completely wedged, balance with any usage above zero won't work, because it needs one allocatable group to move to. I'm not sure if it was needing a new data chunk to merge partials into, or if it thought it needed more metadata space to write out the changes. (Metadata was also only 75% used.)

Is this recoverable, or do I need to copy to another disk and back?

Another neat trick that will free up space is to convert to single metadata: -mconvert=single -f (to force). A subsequent balance with -musage=0..10 will likely free up quite some space.

Deleting files or dropping snapshots is difficult when it's wedged as well: a lot of disk activity (journal thrash?) and no persistent progress; a reboot brings the deleted files back. I eventually managed to empty a single data chunk, and after that it was a trivial recovery.

That particular workload seems to cause the block allocator to go on a spending spree; you're not the first to see this.

I could see normal-user usage patterns getting ignored, but these are the usage patterns of the people working on BTRFS. Maybe they need to remove their balance cronjobs for a while. :)
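The usage-filter dance discussed above can be scripted: start at 0 (freeing wholly-empty chunks needs no scratch space) and raise the threshold step by step. A sketch that only prints the commands it would run; the mount point is a placeholder, and as noted above, on a fully wedged filesystem even low thresholds need at least one free chunk to relocate into:

```shell
MNT=/mnt/data   # placeholder mount point

# Escalate the data-chunk usage filter: low thresholds relocate nearly-empty
# chunks cheaply; raise the threshold only after the previous pass completes.
for pct in 0 5 10 25 50; do
  echo "btrfs balance start -dusage=$pct $MNT"
done
```

The same loop with -musage applies Holger's metadata suggestion.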
3.16 Managed to ENOSPC with 80% used
Any idea how to recover? I can't cut-paste, but it's:

Total devices 1 FS bytes used 176.22GiB
size 233.59GiB used 233.59GiB

Basically it's been data-allocation happy, since I haven't deleted 53GB at any point. Unfortunately, none of the chunks are at 0% usage, so a balance -dusage=0 finds nothing to drop. Attempting a balance with -dusage=25 instantly dies with ENOSPC, since 100% of space is allocated.

Is this recoverable, or do I need to copy to another disk and back? This is a really unfortunate failure mode for BTRFS. Usually I catch it before I get to exactly 100% used and can use a balance to get it back into shape. What causes it to keep allocating data blocks when it's got so much free space? The workload is pretty standard (for devs, at least): git and kernel builds, and git and android builds.
Rapid memory exhaustion during normal operation
I'm trying to track this down. It started happening without changing the kernel in use, so probably a corrupted filesystem. The symptoms are that all memory is suddenly used, by no apparent source. The OOM killer is invoked on every task and still can't free up enough memory to continue. When it goes wrong it's extremely rapid: the system goes from stable to dead in less than 30 seconds.

Tested 3.9.0, 3.12.0, 3.12.8. Limited testing on 3.13 shows what I think is the same problem, but I need to double-check that it's not a different issue. It blows up the exact same way on a real kernel or in UML. All sorts of things can trigger it: defrag, random writes to files. Balance and scrub don't, and a readonly mount doesn't.

I can reproduce this trivially: mount the filesystem read-write and perform some activity. It only takes a few minutes. The other btrfs filesystems on the same machine don't show similar problems. Unfortunately, the output of btrfs-image -c9 is 75gb, much more than I can reasonably share. I've got a reliable reproducer in UML using UML-COW to always start with the same base image: defrag a file with 33,000 extents and the system explodes within a minute.

Here's the OOM report; the formatting is a bit off due to being delivered via netconsole. Swap was disabled on this run, but it makes no difference. I get insta-OOM issues out of the blue with very little memory swapped out.
[ 1184.871419] parent transid verify failed on 8049834639360 wanted 1736567 found 1734749
[ 1184.879873] parent transid verify failed on 8049834639360 wanted 1736567 found 1734749
[ 1184.894932] parent transid verify failed on 8049834639360 wanted 1736567 found 1734749
[ 1184.898207] parent transid verify failed on 8049834639360 wanted 1736567 found 1734749
[ 1184.902116] parent transid verify failed on 8049834639360 wanted 1736567 found 1734749
[ 1184.902454] parent transid verify failed on 8049834639360 wanted 1736567 found 1734749
[ 1184.90] parent transid verify failed on 8049834639360 wanted 1736567 found 1734749
[ 1184.903588] parent transid verify failed on 8049834639360 wanted 1736567 found 1734749
[ 1184.904592] parent transid verify failed on 8049834639360 wanted 1736567 found 1734749
[ 1184.904839] parent transid verify failed on 8049834639360 wanted 1736567 found 1734749
[ 1192.113082] verify_parent_transid: 16 callbacks suppressed
[ 1192.113166] parent transid verify failed on 8049835315200 wanted 1736567 found 1736533
[ 1192.113269] parent transid verify failed on 8049835315200 wanted 1736567 found 1736533
[ 1192.176637] parent transid verify failed on 8049835315200 wanted 1736567 found 1736533
[ 1192.178119] parent transid verify failed on 8049835315200 wanted 1736567 found 1736533
[ 1192.203369] parent transid verify failed on 8049835315200 wanted 1736567 found 1736533
[ 1192.203503] parent transid verify failed on 8049835315200 wanted 1736567 found 1736533
[ 1192.204112] parent transid verify failed on 8049835315200 wanted 1736567 found 1736533
[ 1192.205324] parent transid verify failed on 8049835315200 wanted 1736567 found 1736533
[ 1192.814465] parent transid verify failed on 8049835315200 wanted 1736567 found 1736533
[ 1192.817226] parent transid verify failed on 8049835315200 wanted 1736567 found 1736533
[ 1219.366168] ntpd invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
[ 1219.366270] CPU: 1 PID: 5479 Comm: ntpd Not tainted 3.12.8-00848-g97f15f1 #2
[ 1219.366324] Hardware name: Gigabyte Technology Co., Ltd. GA-MA78GPM-DS2H/GA-MA78GPM-DS2H, BIOS F1 06/03/2008
[ 1219.366402] 8800c02339a8 815ccf3b 3f51a67e
[ 1219.366632] 8800c557ae40 8800c0233a48 815c8551 0100
[ 1219.366861] 0001 8800c02339e8 815d4f46 000ef3e4
[ 1219.367086] Call Trace:
[ 1219.367155] [815ccf3b] dump_stack+0x50/0x85
[ 1219.367262] [815c8551] dump_header.isra.14+0x6d/0x1b5
[ 1219.367322] [815d4f46] ? sub_preempt_count+0x33/0x46
[ 1219.367390] [815d1b9d] ? _raw_spin_unlock_irqrestore+0x2b/0x48
[ 1219.367448] [8132849a] ? ___ratelimit+0xda/0xf8
[ 1219.367514] [810cf773] oom_kill_process+0x70/0x303
[ 1219.367614] [81041930] ? has_capability_noaudit+0x12/0x16
[ 1219.367672] [810cfe91] out_of_memory+0x314/0x347
[ 1219.367734] [810d3ee3] __alloc_pages_nodemask+0x629/0x7c8
[ 1219.367798] [811052db] alloc_pages_current+0xb2/0xbb
[ 1219.367852] [810cd36e] __page_cache_alloc+0xb/0xd
[ 1219.367915] [810ceb9a] filemap_fault+0x249/0x362
[ 1219.367973] [810eb378] __do_fault+0xa7/0x418
[ 1219.368071] [815d1b9d] ? _raw_spin_unlock_irqrestore+0x2b/0x48
[ 1219.368130] [810606c4] ? get_parent_ip+0xe/0x3e
[ 1219.368184] [810eed47] handle_mm_fault+0x2b4/0x907
[ 1219.368239] [815d1a93] ? _raw_spin_unlock_irq+0x17/0x32
[ 1219.368297] [815d4dc4]
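For blowups this fast, one cheap diagnostic is to sample kernel memory counters to a log while reproducing; if the leak is in kernel metadata handling, it shows up as Slab/SUnreclaim growth rather than any process's RSS. A minimal sketch using standard /proc/meminfo fields (MemAvailable appeared in 3.14, so on the kernels above that line simply won't match; the log path is arbitrary):

```shell
# Print the /proc/meminfo counters that reveal runaway kernel allocations.
# Run in a loop from another terminal while triggering the bug, e.g.:
#   while sleep 1; do date; sample_mem; done >> /tmp/mem-trace.log
sample_mem() {
  grep -E '^(MemFree|MemAvailable|Slab|SUnreclaim):' /proc/meminfo
}
sample_mem
```

The last few samples before the OOM then show whether free memory drained into slab or simply vanished from accounting.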
filesystem stuck RO after losing a device
First off: this was just junk data, and is all readable in degraded mode anyway.

Label: 'ROOT' uuid: cc80d150-af98-4af4-bc68-c8df352bda4f
	Total devices 2 FS bytes used 138.00GB
	devid 1 size 232.79GB used 189.04GB path /dev/sdc2
	devid 3 size 232.89GB used 14.06GB path /dev/sdb

The filesystem was created in 3.6 or so, and abandoned when I moved to an SSD as my main root. Playing around with it, I added a raw disk to it and did some IO, but then wanted that disk back. Due to the automatic upgrade to 'dup' when adding a second device, I couldn't do a btrfs dev delete, so I ended up just unmounting it and reformatting /dev/sdb as a backup for my SSD. Given the 'dup' profile, I should be able to just blow away the stub of sdb and continue using sdc, but I can't figure out any way to get it to allow that.

$ uname -a
Linux wolf 3.8.0-rc5-dan #3 SMP PREEMPT Tue Jan 29 00:55:14 EST 2013 x86_64 GNU/Linux
(I forgot to remove the extraversion; git is clean 3.8-rc5)

$ sudo mount -o degraded /dev/sdc2 /mnt/t2
mount: wrong fs type, bad option, bad superblock on /dev/sdc2,
$ dmesg | tail
[1648243.075565] device label ROOT devid 1 transid 15051 /dev/sdc2
[1648243.076531] btrfs: allowing degraded mounts
[1648243.076539] btrfs: disk space caching is enabled
[1648243.891735] Btrfs: too many missing devices, writeable mount is not allowed
[1648243.898122] btrfs: open_ctree failed

$ sudo mount -o degraded,ro /dev/sdc2 /mnt/t2
$ dmesg | tail
[1648331.898660] device label ROOT devid 1 transid 15051 /dev/sdc2
[1648331.900371] btrfs: allowing degraded mounts
[1648331.900380] btrfs: disk space caching is enabled

$ sudo btrfs dev del missing /mnt/t2
ERROR: error removing the device 'missing' - Read-only file system
$ sudo btrfs dev add /dev/loop0 /mnt/t2
ERROR: error adding the device '/dev/loop0' - Read-only file system
$ sudo umount /mnt/t2

$ sudo ./btrfsck --repair /dev/sdc2
[sudo] password for harik:
enabling repair mode
ERROR: device scan failed '/dev/sdb' - Device or resource busy
ERROR: device scan failed '/dev/sdb' - Device or resource busy
Check tree block failed, want=211559927808, have=0
Check tree block failed, want=211559927808, have=0
Check tree block failed, want=211644346368, have=3611932269563901032
Check tree block failed, want=211644346368, have=3611932269563901032
Check tree block failed, want=211559563264, have=70368744177680
Check tree block failed, want=211559563264, have=70368744177680
Check tree block failed, want=211641229312, have=2308722807962755443
Check tree block failed, want=211641229312, have=2308722807962755443
Check tree block failed, want=211640909824, have=651398145056990559
Check tree block failed, want=211640909824, have=651398145056990559
Checking filesystem on /dev/sdc2
UUID: cc80d150-af98-4af4-bc68-c8df352bda4f
checking extents
Check tree block failed, want=212375867392, have=3431074926722403215
[thousands of these]
Check tree block failed, want=211559571456, have=15880152022637367237
checking root refs
btrfsck: extent-tree.c:2553: btrfs_reserve_extent: Assertion `!(ret)' failed.

btrfsck is from btrfs-progs master, g7854c8b66.

So I can't mount RW because I only have one active disk, I can't add a new one, and I can't remove the missing disk. This seems somewhat awkward: if you're using a two-disk btrfs and a drive dies, how do you replace it and recover?
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: filesystem stuck RO after losing a device
On Fri, Apr 5, 2013 at 7:43 PM, Dan Merillat dan.meril...@gmail.com wrote:
> First off: this was just junk data, and is all readable in degraded mode
> anyway.
>
> Label: 'ROOT' uuid: cc80d150-af98-4af4-bc68-c8df352bda4f
> 	Total devices 2 FS bytes used 138.00GB
> 	devid 1 size 232.79GB used 189.04GB path /dev/sdc2
> 	devid 3 size 232.89GB used 14.06GB path /dev/sdb
>
> The filesystem was created in 3.6 or so, and abandoned when I moved to
> an SSD as my main root. Playing around with it, I added a raw disk to it
> and did some IO, but then wanted that disk back. Due to the automatic
> upgrade to 'dup' when adding a second device, I couldn't do a btrfs dev
> delete, so I ended up just unmounting it and reformatting /dev/sdb as a
> backup for my SSD. Given the 'dup' profile, I should be able to just
> blow away the stub of sdb and continue using sdc, but I can't figure out
> any way to get it to allow that.

$ btrfs fi df /mnt/t2
Data: total=173.01GB, used=136.76GB
System, DUP: total=40.00MB, used=32.00KB
System: total=4.00MB, used=0.00
Metadata, RAID1: total=14.00GB, used=504.24MB
Metadata, DUP: total=1.00GB, used=756.14MB
Metadata: total=8.00MB, used=0.00

So the problem is that I ended up with a DUP profile instead of RAID1. Is there any way to force it to mount RW and update that? (Or update it offline?)

That's a usability issue, actually: adding a disk to a single-device filesystem means you can't fail gracefully unless you know to run a balance with -dconvert=raid1 -mconvert=raid1. Which I know now, after looking up why this failed. Unfortunately, I can't recover from this.
Segregate metadata to SSD?
Is it possible to weight the allocations of data/system/metadata so that data goes on large, slow drives while system/metadata goes on a fast SSD? I don't have exact numbers, but I'd guess the vast majority of seeks during operation are lookups of tiny bits of data, while data reads/writes are done in much larger chunks. Obviously a database load would be a different balance, but for most systems it would seem to be a substantial improvement.

Data: total=5625880576k (5.24TB), used=5455806964k (5.08TB)
System, DUP: total=32768k (32.00MB), used=724k (724.00KB)
System: total=4096k (4.00MB), used=0k (0.00)
Metadata, DUP: total=117291008k (111.86GB), used=13509540k (12.88GB)

Out of my nearly 6TB setup I could trivially accelerate the whole thing with a 128GB SSD. On a side note, that's nearly a 10:1 metadata overallocation, and I've never had more than 3 snapshots at a given time (current, rollback1, rollback2); I think it grew that large during a rebalance. Were it not for that, I could get away with a tiny 64GB SSD.

pretty_sizes was too granular to use in monitoring scripts, so:

diff --git a/cmds-filesystem.c b/cmds-filesystem.c
index b1457de..dc5fea6 100644
--- a/cmds-filesystem.c
+++ b/cmds-filesystem.c
@@ -145,8 +145,9 @@ static int cmd_df(int argc, char **argv)
 		total_bytes = pretty_sizes(sargs->spaces[i].total_bytes);
 		used_bytes = pretty_sizes(sargs->spaces[i].used_bytes);
-		printf("%s: total=%s, used=%s\n", description, total_bytes,
-		       used_bytes);
+		printf("%s: total=%ldk (%s), used=%ldk (%s)\n", description,
+		       sargs->spaces[i].total_bytes/1024, total_bytes,
+		       sargs->spaces[i].used_bytes/1024, used_bytes);
 	}
 	free(sargs);
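[Editor's note: since the point of the patch is feeding monitoring scripts, here is a sketch of what such a consumer might look like. `parse_df` is a hypothetical helper written for illustration (Python rather than C, for brevity); the line format it assumes is the `total=<N>k (<pretty>), used=<N>k (<pretty>)` shape produced by the printf in the diff above.]

```python
import re

# One output line of the patched `btrfs fi df`, e.g.
#   "Metadata, DUP: total=117291008k (111.86GB), used=13509540k (12.88GB)"
LINE = re.compile(r"^(?P<label>[^:]+): total=(?P<total>\d+)k .*used=(?P<used>\d+)k")

def parse_df(text):
    """Return {label: (total_kb, used_kb)} from patched btrfs-fi-df output."""
    spaces = {}
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:
            spaces[m.group("label")] = (int(m.group("total")),
                                        int(m.group("used")))
    return spaces

# The numbers quoted in the post above.
SAMPLE = """\
Data: total=5625880576k (5.24TB), used=5455806964k (5.08TB)
System, DUP: total=32768k (32.00MB), used=724k (724.00KB)
System: total=4096k (4.00MB), used=0k (0.00)
Metadata, DUP: total=117291008k (111.86GB), used=13509540k (12.88GB)
"""
```

A monitoring script can then alert on, say, the metadata allocated/used ratio, which for the sample above works out to roughly the "nearly 10:1" the post mentions.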
Re: [GIT PULL] Btrfs pull request
On Sun, Nov 6, 2011 at 1:38 PM, Chris Mason chris.ma...@oracle.com wrote:
> Hi everyone,
>
> This pull request is pretty beefy, it ended up merging a number of long
> running projects and cleanup queues. I've got btrfs patches in the new
> kernel.org btrfs repo. There are two different branches with the same
> changes. for-linus is against 3.1 and has also been tested against
> Linus' tree as of yesterday.

[91795.123286] device label ROOT devid 1 transid 3331 /dev/sdi2
[91795.123538] btrfs: open_ctree failed

FS created on 3.1 (x64), mounted once on 3.2-rc1 (i386), got that when I tried to mount on 3.1 (x64) again. Format change in 3.2, or 32/64-bit compatibility issues?
Re: [GIT PULL] Btrfs pull request
On Tue, Nov 8, 2011 at 3:17 PM, Chris Mason chris.ma...@oracle.com wrote:
> On Tue, Nov 08, 2011 at 01:27:28PM -0500, Chris Mason wrote:
>> On Tue, Nov 08, 2011 at 12:55:40PM -0500, Dan Merillat wrote:
>>> On Sun, Nov 6, 2011 at 1:38 PM, Chris Mason chris.ma...@oracle.com wrote:
>>>> Hi everyone,
>>>>
>>>> This pull request is pretty beefy, it ended up merging a number of
>>>> long running projects and cleanup queues. I've got btrfs patches in
>>>> the new kernel.org btrfs repo. There are two different branches with
>>>> the same changes. for-linus is against 3.1 and has also been tested
>>>> against Linus' tree as of yesterday.
>>>
>>> [91795.123286] device label ROOT devid 1 transid 3331 /dev/sdi2
>>> [91795.123538] btrfs: open_ctree failed
>>>
>>> FS created on 3.1 (x64), mounted once on 3.2-rc1 (i386), got that
>>> when I tried to mount on 3.1 (x64) again. Format change in 3.2, or
>>> 32/64-bit compatibility issues?
>>
>> I'm trying to reproduce right now but I did many bounces between 3.2
>> and 3.1 code before releasing. I didn't try jumping between 32 and 64
>> bit. Are there any other messages in dmesg? Could you please see what
>> btrfs-debug-tree says?
>
> Ok, so I spun the wheel going between 32 and 64 and 3.1 and 3.2. I'm
> not having trouble with basic tests. So, we'll have to dig in and see
> why the open is failing. btrfsck or btrfs-debug-tree will help.

This is on a USB device; however, I had used the filesystem quite a bit on the 64bit machine before moving it to the 32bit 3.2 box.
It's still mountable on the 32bit box, even when I get the open_ctree failure on 3.1:

[140865.425067] device label ROOT devid 1 transid 3436 /dev/sdi2
[140865.426291] btrfs: open_ctree failed

harik@fileserver:~/src/3.0/3.2-rc1$ sudo btrfsck /dev/sdi2
[sudo] password for harik:
found 3105894400 bytes used err is 0
total csum bytes: 2916272
total tree bytes: 119631872
total fs tree bytes: 109928448
btree space waste bytes: 33045213
file data blocks allocated: 5391962112
 referenced 2984988672
Btrfs v0.19

btrfs-debug-tree output: http://dl.dropbox.com/u/1071112/btrfs-debug-tree.sdi2.bz2

The exact kernel that won't mount is Linus' 3.1 plus these two commits:

Author: David Sterba dste...@suse.cz
Date: Wed Aug 3 11:08:02 2011 -0700
    btrfs: allow cross-subvolume file clone

Author: Li Zefan l...@cn.fujitsu.com
Date: Fri Sep 2 15:56:25 2011 +0800
    Btrfs: fix defragmentation regression
Re: [PATCH 1/4] Btrfs: fix defragmentation regression
On Fri, Sep 2, 2011 at 4:42 AM, Christoph Hellwig h...@infradead.org wrote:
> On Fri, Sep 02, 2011 at 03:56:25PM +0800, Li Zefan wrote:
>> There's an off-by-one bug:
>>
>> # create a file with lots of 4K file extents
>> # btrfs fi defrag /mnt/file
>> # sync
>> # filefrag -v /mnt/file
>> Filesystem type is: 9123683e
>> File size of /mnt/file is 1228800 (300 blocks, blocksize 4096)
>> ext logical physical expected length flags
>>   0       0     3372              64
>>   1      64     3136     3435      1
>>   2      65     3436     3136     64
>>   3     129     3201     3499      1
>>   4     130     3500     3201     64
>>   5     194     3266     3563      1
>>   6     195     3564     3266     64
>>   7     259     3331     3627      1
>>   8     260     3628     3331     40 eof
>>
>> After this patch:
>
> Can you please create an xfstests testcase for this?

Did this fix get lost? I don't see it in git, and defragmenting a file still results in 10x as many fragments as it started with. (3.1-rc9)
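[Editor's note: to quantify a layout like the one in Li Zefan's report, the fragment count can be derived from filefrag-style (logical, physical, length) tuples. `count_fragments` is a hypothetical helper sketched here for illustration, not a filefrag feature; the extent list is the post-defrag layout quoted above.]

```python
def count_fragments(extents):
    """Count physically discontiguous runs in a list of
    (logical_block, physical_block, length) extents, in logical order.
    An extent starts a new fragment when it does not begin exactly
    where the previous extent ended on disk."""
    fragments = 0
    prev_end = None
    for _logical, physical, length in extents:
        if physical != prev_end:
            fragments += 1
        prev_end = physical + length
    return fragments

# The post-defrag layout from the report above: 64-block extents
# interleaved with stray 1-block extents, the off-by-one signature.
DEFRAGGED = [
    (0, 3372, 64), (64, 3136, 1), (65, 3436, 64),
    (129, 3201, 1), (130, 3500, 64), (194, 3266, 1),
    (195, 3564, 64), (259, 3331, 1), (260, 3628, 40),
]
```

Every one of the nine extents above starts a new fragment: each stray 1-block extent breaks the run, and the following 64-block extent is adjacent to the wrong neighbour.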
Re: fixing slow sync(2)
On Sat, Oct 8, 2011 at 11:35 AM, Josef Bacik jo...@redhat.com wrote:
> I think I fixed this, try my git tree
> git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-work.git

I wanted to Ack this as well: 3.1-rc4 was completely unusable when Firefox was running (30+ second pauses to read directories, btrfs threads constantly running, even the mouse was jerky due to the load). Built from your tree (fa5cf66) and everything works like it should again. No load, fast response to IO requests.
Re: [PATCH] xfstests 255: add a seek_data/seek_hole tester
On Tue, Aug 30, 2011 at 11:29 PM, Dave Chinner da...@fromorbit.com wrote:
> On Tue, Aug 30, 2011 at 06:17:02PM -0700, Sunil Mushran wrote:
>> Instead we should let the fs weigh the cost of providing accurate
>> information with the possible gain in performance.
>>
>> Data: A range in a file that could contain something other than nulls.
>> If in doubt, it is data.
>>
>> Hole: A range in a file that only contains nulls.
>
> And that's -exactly- the ambiguous, vague definition that has raised
> all these questions in the first place. I was in doubt about whether
> unwritten extents can be considered a hole, and by your definition that
> means it should be data. But Andreas seems to be in no doubt it should
> be considered a hole.

That's fine, though. Different filesystems have different abilities to recognize a data hole; FAT can't do it at all. Perhaps the requirements would be better stated in reverse: if the filesystem knows that a read() will return nulls (for whatever reason, based on its internal knowledge), it can report a hole. If it can't guarantee that, it's data.

It's an absolute requirement that SEEK_DATA never miss data. SEEK_HOLE working is a nicety that userspace would appreciate; remember that the consumer here is cp(1), using it to skip empty portions of files and create sparse destination files.
Re: processes stuck in llseek
> Here it is.
> http://marc.info/?l=linux-btrfs&m=131176036219732&w=2

That was it, thanks. Confirmed fixed.
Re: [RFC, crash][PATCH] btrfs: allow cross-subvolume file clone
On Tue, Aug 9, 2011 at 1:50 PM, David Sterba d...@jikos.cz wrote:
> On Thu, Aug 04, 2011 at 09:19:26AM +0800, Miao Xie wrote:
>>> the patch has been applied on top of current linus which contains
>>> patches from both pull requests (ed8f37370d83).
>>
>> I think it is because the caller didn't reserve enough space. Could
>> you try to apply the following patch? It might fix this bug.
>>
>> [PATCH v2] Btrfs: reserve enough space for file clone
>> http://marc.info/?l=linux-btrfs&m=131192686626576&w=2
>>
>> Thanks!
>
> Yes, it does not crash anymore. Trees reflinked successfully, md5sums
> verified.

This isn't a cross-subvolume problem; I hit the same bug trying to reflink a pile of files within the same subvolume. I applied the above patch and retried, and it worked correctly.