Hello everyone, Apologies if this is the wrong place, this is my first time posting here.
I just had a crash on one of my Btrfs filesystems, that happened to prevent it from mounting afterwards. I'd like to help preventing it from occuring again, or at least to write it there so that other people know about it. Before anybody asks: backups of important files are fine and working. Also of note is that this filesystem is mostly a small (3TB RAID1) archival filesystem that seldom sees a lot of heavy writes. A small summary of the events as I understand them from various logs and graphs : * At 00:00 Snapper kicked in and did its thing on the crashed filesystem. * At 00:00, another Btrfs filesystem started being replicated (send/receive) on the crashed filesystem. * Around the same time, approx. 00:00, a few automatic archival processes started writing heavily to the disk, which was normal and usually lasts for a few minutes. * At 00:01, when the transaction finally comitted (from another process) the entire kernel just locked right away with an unhandled page request. The crash is summarized in the first log file, along with a prior warning that could be related. When I noticed this later, I rebooted and tried to mount, which resulted in the second log file. Since the error appeared to be log-related and I didn't care (much) about it, I used `btrfs recover zero-log` on the filesystem. Once I did that, the filesystem was able to be mounted again and seemed to work fine... aside from writing at the root. Writing files anywhere else worked fine, but writing a file at the root caused the error in the second log file and put the filesystem in read-only mode. The third log file contains the output of btrfsck on that filesystem. Since there were few errors I decided to try `btrfs check --repair` : the filesystem seems to be working properly again. A scrub is in progress, I'll report back if anything is wrong. Here is some system info : # uname -a Linux miyu 4.9.6-1-ARCH #1 SMP PREEMPT Thu Jan 26 09:22:26 CET 2017 x86_64 GNU/Linux # btrfs --version btrfs-progs v4.9 # btrfs fi show Label: 'root' uuid: e1a079fa-bccf-4864-ac1d-0690281867e3 Total devices 1 FS bytes used 24.88GiB devid 1 size 55.90GiB used 55.90GiB path /dev/sda Label: 'tank' uuid: 837bd543-818b-4361-a5ac-8e56a4279ff9 Total devices 2 FS bytes used 1.81TiB devid 1 size 2.73TiB used 1.85TiB path /dev/sdb devid 2 size 2.73TiB used 1.85TiB path /dev/sdc Now, I have a few questions : 1) As I understand it, the log is expendable in case of an error like this, and the goal of Btrfs (or any journaled filesystem) is that the rest of the filesystem is fine. This wasn't the case here, could this be a bug? 2) Do the various recovery commands (btrfs check, btrfs recover) need to be run for all the disks in the filesystem, or just one of them? 3) What do you recommend I do with this filesystem? Should I destroy it and recreate a new one then copy all data? Do not hesitate if you want more information, I can definitely provide this. Cheers, François-Xavier Thomas
------------[ cut here ]------------ WARNING: CPU: 3 PID: 6290 at fs/btrfs/transaction.c:320 record_root_in_trans+0x10c/0x120 [btrfs] Modules linked in: mcryptd sha256_ssse3 sha256_generic hmac drbg ansi_cprng seqiv xfrm6_mode_tunnel xfrm4_mode_tunnel ctr gcm unix_diag sctp_diag sctp libcrc32c crc32c_generic dccp_d ip6t_rt hci_uart mei snd_soc_rt5645 btbcm shpchp snd_hda_core snd_soc_rt5640 snd_soc_rl6231 snd_soc_sst_match btqca parport_pc btintel snd_hwdep thermal fan nf_conntrack_ipv6 nf_def btrfs hid_logitech_hidpp xor hid_logitech_dj usbhid hid raid6_pq sd_mod serio_raw atkbd libps2 ahci libahci crc32c_intel libata xhci_pci xhci_hcd scsi_mod usbcore usb_common i8042 s CPU: 3 PID: 6290 Comm: mbsync Not tainted 4.9.6-1-ARCH #1 Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Q2900-ITX, BIOS P1.00 07/24/2014 ffffc900039efb90 ffffffff81305440 0000000000000000 0000000000000000 ffffc900039efbd0 ffffffff8107eb0b 0000014000000202 ffff880137a0a800 ffff880137c8ddc0 0000000000000000 ffff880137c8ddc0 0000000000000001 Call Trace: [<ffffffff81305440>] dump_stack+0x63/0x83 [<ffffffff8107eb0b>] __warn+0xcb/0xf0 [<ffffffff8107ec3d>] warn_slowpath_null+0x1d/0x20 [<ffffffffa040c6dc>] record_root_in_trans+0x10c/0x120 [btrfs] [<ffffffffa040d8c6>] btrfs_record_root_in_trans+0x46/0x70 [btrfs] [<ffffffffa040fafa>] start_transaction+0x9a/0x4a0 [btrfs] [<ffffffffa040ff18>] btrfs_start_transaction+0x18/0x20 [btrfs] [<ffffffffa042090a>] btrfs_create+0x4a/0x200 [btrfs] [<ffffffff8122aa9d>] ? d_splice_alias+0x12d/0x400 [<ffffffff81220fe1>] path_openat+0xa81/0x1170 [<ffffffff812228a1>] do_filp_open+0x91/0x100 [<ffffffff81230912>] ? __alloc_fd+0xb2/0x160 [<ffffffff81210a97>] do_sys_open+0x147/0x210 [<ffffffff81210b7e>] SyS_open+0x1e/0x20 [<ffffffff8160a737>] entry_SYSCALL_64_fastpath+0x1a/0xa9 ---[ end trace 61d798956f211f19 ]--- About 1 minute later: ===================== BUG: unable to handle kernel paging request at 00000000002a61bd IP: [<ffffffff811b2f7f>] list_lru_del+0xdf/0x130 PGD 125002067 PUD 1215ab067 PMD 0 Oops: 0002 [#1] PREEMPT SMP Modules linked in: mcryptd sha256_ssse3 sha256_generic hmac drbg ansi_cprng seqiv xfrm6_mode_tunnel xfrm4_mode_tunnel ctr gcm unix_diag sctp_diag sctp libcrc32c crc32c_generic dccp_d ip6t_rt hci_uart mei snd_soc_rt5645 btbcm shpchp snd_hda_core snd_soc_rt5640 snd_soc_rl6231 snd_soc_sst_match btqca parport_pc btintel snd_hwdep thermal fan nf_conntrack_ipv6 nf_def btrfs hid_logitech_hidpp xor hid_logitech_dj usbhid hid raid6_pq sd_mod serio_raw atkbd libps2 ahci libahci crc32c_intel libata xhci_pci xhci_hcd scsi_mod usbcore usb_common i8042 s CPU: 0 PID: 6290 Comm: mbsync Tainted: G W 4.9.6-1-ARCH #1 Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Q2900-ITX, BIOS P1.00 07/24/2014 task: ffff8800a5bbce00 task.stack: ffffc900039ec000 RIP: 0010:[<ffffffff811b2f7f>] [<ffffffff811b2f7f>] list_lru_del+0xdf/0x130 RSP: 0018:ffffc900039ef9d0 EFLAGS: 00010046 RAX: ffff88013a04fe48 RBX: ffff88003711a008 RCX: 00000000002a61b5 RDX: 0000000000000000 RSI: ffff88003711a008 RDI: ffff88013a04fe40 RBP: ffffc900039ef9f0 R08: 0000000000000018 R09: ffff88003711a0f8 R10: 0000000000000040 R11: 0000000000000001 R12: ffff88013a04fe40 R13: ffff8800b711a008 R14: 0000000000000000 R15: 0000000000000000 FS: 00007f4dc1d77b40(0000) GS:ffff88013fc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000002a61bd CR3: 000000012172d000 CR4: 00000000001006f0 Stack: ffff880137f7e968 ffffea0004dc9380 0000000000000000 ffffc900039efaa0 ffffc900039efa38 ffffffff8117fc67 ffff880037119ff0 ffff88003711a0f8 0000000094edafa0 ffffea0004dc9380 ffff880137f7e980 0000000000000000 Call Trace: [<ffffffff8117fc67>] page_cache_tree_insert+0xb7/0xf0 [<ffffffff8117fd75>] __add_to_page_cache_locked+0xd5/0x260 [<ffffffff8117ff6e>] add_to_page_cache_lru+0x4e/0xe0 [<ffffffff81181779>] pagecache_get_page+0x129/0x290 [<ffffffffa04354a3>] alloc_extent_buffer+0x113/0x480 [btrfs] [<ffffffffa0406670>] read_tree_block+0x20/0x60 [btrfs] [<ffffffffa0479c53>] find_parent_nodes+0x553/0x14b0 [btrfs] [<ffffffffa047ac6e>] __btrfs_find_all_roots+0xbe/0x130 [btrfs] [<ffffffffa047ad55>] btrfs_find_all_roots+0x55/0x70 [btrfs] [<ffffffffa047ea78>] btrfs_qgroup_prepare_account_extents+0x58/0xa0 [btrfs] [<ffffffffa040f344>] btrfs_commit_transaction.part.12+0x3e4/0xa90 [btrfs] [<ffffffff810c4200>] ? wake_atomic_t_function+0x60/0x60 [<ffffffffa040fa2b>] btrfs_commit_transaction+0x3b/0x70 [btrfs] [<ffffffffa0428304>] btrfs_sync_file+0x304/0x3f0 [btrfs] [<ffffffff8124608b>] vfs_fsync_range+0x4b/0xb0 [<ffffffff8124614d>] do_fsync+0x3d/0x70 [<ffffffff81246403>] SyS_fdatasync+0x13/0x20 [<ffffffff8160a737>] entry_SYSCALL_64_fastpath+0x1a/0xa9 Code: 4c 0f 45 ea 49 8b 45 38 48 85 c0 74 b5 48 63 90 e8 02 00 00 49 8d 44 24 08 85 d2 78 04 48 8b 04 d1 48 8b 0b 48 8b 53 08 4c 89 e7 <48> 89 51 08 48 89 0a 48 89 1b 48 89 5b 08 48 RIP [<ffffffff811b2f7f>] list_lru_del+0xdf/0x130
# btrfs check --progress /dev/sdc Checking filesystem on /dev/sdc UUID: 837bd543-818b-4361-a5ac-8e56a4279ff9 ref mismatch on [1628179906560 16384] extent item 0, found 1 Backref 1628179906560 parent 5 root 5 not found in extent tree backpointer mismatch on [1628179906560 16384] ref mismatch on [1677446758400 16384] extent item 1, found 0 Backref 1677446758400 root 5 not referenced back 0x2e5ed30 Incorrect global backref count on 1677446758400 found 1 wanted 0 backpointer mismatch on [1677446758400 16384] owner ref check failed [1677446758400 16384] checking free space cache [o] root 5 root dir 256 not found checking fs roots [.] found 1994179276935 bytes used err is 1 total csum bytes: 1941207864 total tree bytes: 5601869824 total fs tree bytes: 2855731200 total extent tree bytes: 598835200 btree space waste bytes: 942703215 file data blocks allocated: 10862958895104 referenced 2598971158528 btrfs check --progress /dev/sdc 632.31s user 154.81s system 36% cpu 35:48.47 total
------------[ cut here ]------------ WARNING: CPU: 0 PID: 354 at fs/btrfs/extent-tree.c:6952 __btrfs_free_extent.isra.33+0x7d6/0xca0 [btrfs] Modules linked in: arc4 iTCO_wdt iTCO_vendor_support iwlmvm intel_rapl snd_hda_codec_hdmi mac80211 intel_soc_dts_iosf intel_powerclamp coretemp ppdev iwlwifi kvm_intel snd_hda_codec_ rfkill_gpio snd_timer ipt_REJECT nf_reject_ipv4 snd rfkill crc16 tpm_tis evdev tpm_tis_core i2c_designware_platform xt_comment spi_pxa2xx_platform tpm mac_hid i2c_designware_core so syscopyarea sysfillrect sysimgblt fb_sys_fops drm CPU: 0 PID: 354 Comm: mount Not tainted 4.9.6-1-ARCH #1 Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Q2900-ITX, BIOS P1.00 07/24/2014 ffffc9000094b740 ffffffff81305440 0000000000000000 0000000000000000 ffffc9000094b780 ffffffff8107eb0b 00001b28a03dfb9c 0000017b1715c000 00000000fffffffe ffff880137d101e0 ffff880138b73d90 0000000000000000 Call Trace: [<ffffffff81305440>] dump_stack+0x63/0x83 [<ffffffff8107eb0b>] __warn+0xcb/0xf0 [<ffffffff8107ec3d>] warn_slowpath_null+0x1d/0x20 [<ffffffffa03e4e46>] __btrfs_free_extent.isra.33+0x7d6/0xca0 [btrfs] [<ffffffffa03df381>] ? block_group_cache_tree_search+0xc1/0xe0 [btrfs] [<ffffffffa03e4301>] ? update_block_group.isra.32+0xd1/0x440 [btrfs] [<ffffffffa0457c83>] ? btrfs_merge_delayed_refs+0x93/0x600 [btrfs] [<ffffffffa03e9724>] __btrfs_run_delayed_refs+0xa74/0x13a0 [btrfs] [<ffffffffa03ed195>] btrfs_run_delayed_refs+0xa5/0x2b0 [btrfs] [<ffffffffa0404a19>] btrfs_commit_transaction+0x29/0x70 [btrfs] [<ffffffffa044e68c>] btrfs_recover_log_trees+0x3dc/0x460 [btrfs] [<ffffffffa0448f90>] ? replay_one_dir_item+0x190/0x190 [btrfs] [<ffffffffa04014da>] open_ctree+0x239a/0x2490 [btrfs] [<ffffffffa03d369a>] btrfs_mount+0xdea/0xf50 [btrfs] [<ffffffff8131b08d>] ? find_next_zero_bit+0x1d/0x20 [<ffffffff811a9260>] ? pcpu_next_unpop+0x40/0x50 [<ffffffff812163a8>] mount_fs+0x38/0x150 [<ffffffff8123280e>] ? alloc_vfsmnt+0x19e/0x230 [<ffffffff81232907>] vfs_kern_mount+0x67/0x110 [<ffffffffa03d2a67>] btrfs_mount+0x1b7/0xf50 [btrfs] [<ffffffff8131b08d>] ? find_next_zero_bit+0x1d/0x20 [<ffffffff812163a8>] mount_fs+0x38/0x150 [<ffffffff8123280e>] ? alloc_vfsmnt+0x19e/0x230 [<ffffffff81232907>] vfs_kern_mount+0x67/0x110 [<ffffffff8123505d>] do_mount+0x1dd/0xc60 [<ffffffff8120e734>] ? __check_object_size+0x54/0x1d6 [<ffffffff81235e13>] SyS_mount+0x83/0xd0 [<ffffffff8160a737>] entry_SYSCALL_64_fastpath+0x1a/0xa9 ---[ end trace 0fd58644b4d50cdb ]--- BTRFS info (device sdc): leaf 1627857436672 total ptrs 108 free space 7769 item 0 key (1628178333696 168 16384) itemoff 16232 itemsize 51 extent refs 1 gen 328741 flags 2 tree block key (18446744073709551606 128 2176232226816) level 0 tree block backref root 7 ...more items... BTRFS error (device sdc): unable to find ref byte nr 1628179906560 parent 0 root 5 owner 2 offset 0 ------------[ cut here ]------------ WARNING: CPU: 0 PID: 354 at fs/btrfs/extent-tree.c:6958 __btrfs_free_extent.isra.33+0x995/0xca0 [btrfs] BTRFS: Transaction aborted (error -2) Modules linked in: arc4 iTCO_wdt iTCO_vendor_support iwlmvm intel_rapl snd_hda_codec_hdmi mac80211 intel_soc_dts_iosf intel_powerclamp coretemp ppdev iwlwifi kvm_intel snd_hda_codec_ rfkill_gpio snd_timer ipt_REJECT nf_reject_ipv4 snd rfkill crc16 tpm_tis evdev tpm_tis_core i2c_designware_platform xt_comment spi_pxa2xx_platform tpm mac_hid i2c_designware_core so syscopyarea sysfillrect sysimgblt fb_sys_fops drm CPU: 0 PID: 354 Comm: mount Tainted: G W 4.9.6-1-ARCH #1 Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Q2900-ITX, BIOS P1.00 07/24/2014 ffffc9000094b6e8 ffffffff81305440 ffffc9000094b738 0000000000000000 ffffc9000094b728 ffffffff8107eb0b 00001b2e00000002 0000017b1715c000 00000000fffffffe ffff880137d101e0 ffff880138b73d90 0000000000000000 Call Trace: [<ffffffff81305440>] dump_stack+0x63/0x83 [<ffffffff8107eb0b>] __warn+0xcb/0xf0 [<ffffffff8107eb8f>] warn_slowpath_fmt+0x5f/0x80 [<ffffffffa03e5005>] __btrfs_free_extent.isra.33+0x995/0xca0 [btrfs] [<ffffffffa03df381>] ? block_group_cache_tree_search+0xc1/0xe0 [btrfs] [<ffffffffa03e4301>] ? update_block_group.isra.32+0xd1/0x440 [btrfs] [<ffffffffa0457c83>] ? btrfs_merge_delayed_refs+0x93/0x600 [btrfs] [<ffffffffa03e9724>] __btrfs_run_delayed_refs+0xa74/0x13a0 [btrfs] [<ffffffffa03ed195>] btrfs_run_delayed_refs+0xa5/0x2b0 [btrfs] [<ffffffffa0404a19>] btrfs_commit_transaction+0x29/0x70 [btrfs] [<ffffffffa044e68c>] btrfs_recover_log_trees+0x3dc/0x460 [btrfs] [<ffffffffa0448f90>] ? replay_one_dir_item+0x190/0x190 [btrfs] [<ffffffffa04014da>] open_ctree+0x239a/0x2490 [btrfs] [<ffffffffa03d369a>] btrfs_mount+0xdea/0xf50 [btrfs] [<ffffffff8131b08d>] ? find_next_zero_bit+0x1d/0x20 [<ffffffff811a9260>] ? pcpu_next_unpop+0x40/0x50 [<ffffffff812163a8>] mount_fs+0x38/0x150 [<ffffffff8123280e>] ? alloc_vfsmnt+0x19e/0x230 [<ffffffff81232907>] vfs_kern_mount+0x67/0x110 [<ffffffffa03d2a67>] btrfs_mount+0x1b7/0xf50 [btrfs] [<ffffffff8131b08d>] ? find_next_zero_bit+0x1d/0x20 [<ffffffff812163a8>] mount_fs+0x38/0x150 [<ffffffff8123280e>] ? alloc_vfsmnt+0x19e/0x230 [<ffffffff81232907>] vfs_kern_mount+0x67/0x110 [<ffffffff8123505d>] do_mount+0x1dd/0xc60 [<ffffffff8120e734>] ? __check_object_size+0x54/0x1d6 [<ffffffff81235e13>] SyS_mount+0x83/0xd0 [<ffffffff8160a737>] entry_SYSCALL_64_fastpath+0x1a/0xa9 ---[ end trace 0fd58644b4d50cdc ]--- BTRFS: error (device sdc) in __btrfs_free_extent:6958: errno=-2 No such entry BTRFS: error (device sdc) in btrfs_run_delayed_refs:2967: errno=-2 No such entry BTRFS: error (device sdc) in btrfs_replay_log:2506: errno=-2 No such entry (Failed to recover log tree) BTRFS error (device sdc): cleaner transaction attach returned -30 BTRFS error (device sdc): open_ctree failed