Hi, I have a home server on Linux Arch (kernel 3.11.2) that uses multi-device btrfs on root filesystem.
Until recently it worked completely fine. And yesterday I rebooted it and the machine did not wake up. I booted from a USB (kernel 3.10) and tried to mount the filesystem. Here is OOPs I see [ 41.676217] device fsid 25e6a6fa-fe1f-4be5-a638-eeac948f8c21 devid 8 transid 164237 /dev/sda [ 41.684161] btrfs: disk space caching is enabled [ 67.266742] BTRFS error (device sdd3): block group 1141416919040 has wrong amount of free space [ 67.266796] BTRFS error (device sdd3): failed to load free space cache for block group 1141416919040 [ 68.126102] ------------[ cut here ]------------ [ 68.126138] kernel BUG at fs/btrfs/inode.c:873! [ 68.126164] invalid opcode: 0000 [#1] PREEMPT SMP [ 68.126203] Modules linked in: intel_powerclamp coretemp kvm_intel kvm crc32_pclmul ghash_clmulni_intel cryptd iTCO_wdt iTCO_vendor_support ppdev microcode snd_hda_codec_hdmi psmouse snd_hda_codec_realtek serio_raw i2c_i801 snd_hda_intel pcspkr snd_hda_codec lpc_ich snd_hwdep parport_pc parport snd_pcm mperf snd_page_alloc snd_timer snd mei_me soundcore evdev mei processor nfs lockd sunrpc fscache ext4 crc16 mbcache jbd2 dm_snapshot dm_mod squashfs loop isofs btrfs raid6_pq libcrc32c zlib_deflate xor hid_generic usbhid hid usb_storage sd_mod i915 intel_agp intel_gtt ahci libahci crc32c_intel i2c_algo_bit xhci_hcd libata ehci_pci ehci_hcd scsi_mod atl1c drm_kms_helper usbcore usb_common drm i2c_core button video [ 68.126754] CPU: 1 PID: 386 Comm: mount Not tainted 3.10.10-1-ARCH #1 [ 68.126787] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./H61M/U3S3, BIOS P2.20 07/30/2012 [ 68.126834] task: ffff880118869950 ti: ffff88011377e000 task.ti: ffff88011377e000 [ 68.126871] RIP: 0010:[<ffffffffa0471223>] [<ffffffffa0471223>] __cow_file_range+0x3e3/0x460 [btrfs] [ 68.126933] RSP: 0018:ffff88011377f328 EFLAGS: 00010206 [ 68.126961] RAX: 00000000000004d2 RBX: 0000000000000000 RCX: 0000000000001000 [ 68.126996] RDX: 00000000000004d2 RSI: ffff88001f438608 RDI: ffff880115eb3000 [ 68.127032] RBP: ffff88011377f3c8 R08: 0000000000000000 R09: 000000000003ffff [ 68.127068] R10: 0000000000040000 R11: 0000000000000000 R12: 0000000000000000 [ 68.127103] R13: ffff880115f88630 R14: ffff88001f438608 R15: 0000000000000000 [ 68.127140] FS: 00007fac17768780(0000) GS:ffff88011f300000(0000) knlGS:0000000000000000 [ 68.127180] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 68.127209] CR2: 00007f518d994000 CR3: 0000000117ab4000 CR4: 00000000000407e0 [ 68.127246] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 68.127281] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 68.127317] Stack: [ 68.127331] 00000109ffe26000 ffff880115f88c60 ffff88001f438428 000000000003ffff [ 68.127381] ffff88011700c010 ffffea0003231b40 ffff880115eb3000 f600000109ffd870 [ 68.127430] ffffffffa0482f29 ffff880118a31000 ffff880115f88638 ffff88001f438448 [ 68.127480] Call Trace: [ 68.127508] [<ffffffffa0482f29>] ? release_extent_buffer+0xa9/0xd0 [btrfs] [ 68.127553] [<ffffffffa048862f>] ? free_extent_buffer+0x4f/0xa0 [btrfs] [ 68.127598] [<ffffffffa04716d6>] run_delalloc_nocow+0x436/0xaf0 [btrfs] [ 68.127641] [<ffffffffa0472180>] run_delalloc_range+0x320/0x390 [btrfs] [ 68.127685] [<ffffffffa04854c1>] ? find_lock_delalloc_range.constprop.44+0x1d1/0x1f0 [btrfs] [ 68.127735] [<ffffffffa0487044>] __extent_writepage+0x354/0x7b0 [btrfs] [ 68.127772] [<ffffffff81122645>] ? find_get_pages_tag+0x105/0x180 [ 68.127813] [<ffffffffa0487722>] extent_write_cache_pages.isra.32.constprop.48+0x282/0x3e0 [btrfs] [ 68.127867] [<ffffffffa0487b7d>] extent_writepages+0x4d/0x70 [btrfs] [ 68.127909] [<ffffffffa046e080>] ? can_nocow_odirect+0x2f0/0x2f0 [btrfs] [ 68.127951] [<ffffffffa046cf28>] btrfs_writepages+0x28/0x30 [btrfs] [ 68.127985] [<ffffffff8112e28e>] do_writepages+0x1e/0x40 [ 68.128014] [<ffffffff81123669>] __filemap_fdatawrite_range+0x59/0x60 [ 68.128048] [<ffffffff81123733>] filemap_fdatawrite_range+0x13/0x20 [ 68.128090] [<ffffffffa0481c99>] btrfs_wait_ordered_range+0x49/0x110 [btrfs] [ 68.128135] [<ffffffffa04a64c0>] __btrfs_write_out_cache+0x6d0/0x8f0 [btrfs] [ 68.128180] [<ffffffffa04a774d>] btrfs_write_out_cache+0x8d/0xe0 [btrfs] [ 68.128224] [<ffffffffa0459983>] btrfs_write_dirty_block_groups+0x533/0x620 [btrfs] [ 68.128271] [<ffffffffa04676e2>] commit_cowonly_roots+0x172/0x260 [btrfs] [ 68.128314] [<ffffffffa04695ad>] btrfs_commit_transaction+0x5bd/0xaf0 [btrfs] [ 68.128353] [<ffffffff8107b460>] ? wake_up_bit+0x30/0x30 [ 68.128391] [<ffffffffa04a4edd>] btrfs_recover_log_trees+0x3bd/0x490 [btrfs] [ 68.128434] [<ffffffffa04a3270>] ? replay_one_dir_item+0xf0/0xf0 [btrfs] [ 68.128477] [<ffffffffa0466689>] open_ctree+0x17b9/0x1e80 [btrfs] [ 68.128513] [<ffffffff813555d3>] ? proc_comm_connector+0x33/0x120 [ 68.128551] [<ffffffffa043f456>] btrfs_mount+0x636/0x830 [btrfs] [ 68.128584] [<ffffffff81141cd2>] ? pcpu_alloc+0x7d2/0x9e0 [ 68.128616] [<ffffffff8118f5d9>] mount_fs+0x39/0x1b0 [ 68.128643] [<ffffffff81141ef0>] ? __alloc_percpu+0x10/0x20 [ 68.128676] [<ffffffff811a8c27>] vfs_kern_mount+0x67/0x100 [ 68.128706] [<ffffffff811ab24e>] do_mount+0x23e/0xa20 [ 68.128737] [<ffffffff8113d73b>] ? strndup_user+0x4b/0xf0 [ 68.128766] [<ffffffff811abab3>] SyS_mount+0x83/0xc0 [ 68.128795] [<ffffffff814cfe1d>] system_call_fastpath+0x1a/0x1f [ 68.128826] Code: 8b 7d 90 4c 89 f6 e8 ad 9e 00 00 e9 dc fc ff ff 48 85 d2 74 40 80 be 30 fe ff ff 84 48 89 d0 74 34 48 83 f8 01 0f 84 87 fc ff ff <0f> 0b 48 8b 75 a8 48 8b 7d 90 41 89 c0 b9 9b 03 00 00 48 c7 c2 [ 68.129145] RIP [<ffffffffa0471223>] __cow_file_range+0x3e3/0x460 [btrfs] [ 68.129192] RSP <ffff88011377f328> [ 68.129230] ---[ end trace 7992880786c40076 ]--- Hm.... it looks like it crashed when it tries to restore logs. Ok, I ran 'btrfschk /dev/sda' and here is its output: [ 181.281546] device fsid 25e6a6fa-fe1f-4be5-a638-eeac948f8c21 devid 8 transid 164237 /dev/sda [ 181.318148] device fsid 25e6a6fa-fe1f-4be5-a638-eeac948f8c21 devid 3 transid 164237 /dev/sdb [ 181.408490] device fsid 25e6a6fa-fe1f-4be5-a638-eeac948f8c21 devid 2 transid 164237 /dev/sdc1 [ 181.763300] device fsid 25e6a6fa-fe1f-4be5-a638-eeac948f8c21 devid 9 transid 164237 /dev/sdd3 [ 181.782414] device fsid 25e6a6fa-fe1f-4be5-a638-eeac948f8c21 devid 8 transid 164237 /dev/sda [ 181.784634] device fsid 25e6a6fa-fe1f-4be5-a638-eeac948f8c21 devid 3 transid 164237 /dev/sdb [ 181.788715] device fsid 25e6a6fa-fe1f-4be5-a638-eeac948f8c21 devid 2 transid 164237 /dev/sdc1 [ 181.803161] device fsid 25e6a6fa-fe1f-4be5-a638-eeac948f8c21 devid 9 transid 164237 /dev/sdd3 [ 337.445525] systemd-journald[160]: Failed to write entry, ignoring: Argument list too long Then I tried to mount the filesystem again and it stuck. I see several processes in UNINTERRUPTIBLE state: [ 828.034908] SysRq : Show Blocked State [ 828.036150] task PC stack pid father [ 828.037418] btrfs-transacti D ffff880118a311e8 0 407 2 0x00000000 [ 828.038709] ffff880115051dc8 0000000000000046 0000000000014340 ffff880115051fd8 [ 828.040030] ffff880115051fd8 0000000000014340 ffff88011886f620 ffff88011886f620 [ 828.041364] 000000000000082f ffff880115051d30 ffffffff8109d141 ffff880118868048 [ 828.042724] Call Trace: [ 828.044071] [<ffffffff8109d141>] ? cpuacct_charge+0x61/0x70 [ 828.045445] [<ffffffff8108dd88>] ? __enqueue_entity+0x78/0x80 [ 828.046829] [<ffffffff810920f6>] ? enqueue_entity+0x286/0xa20 [ 828.048227] [<ffffffff81065bbb>] ? lock_timer_base.isra.35+0x2b/0x50 [ 828.049648] [<ffffffff814c6f09>] schedule+0x29/0x70 [ 828.051076] [<ffffffffa0468695>] wait_current_trans.isra.14+0xa5/0xf0 [btrfs] [ 828.052515] [<ffffffff8107b460>] ? wake_up_bit+0x30/0x30 [ 828.053970] [<ffffffffa0469e18>] start_transaction+0x338/0x530 [btrfs] [ 828.055453] [<ffffffffa046a0c7>] btrfs_attach_transaction+0x17/0x20 [btrfs] [ 828.056943] [<ffffffffa0460ca1>] transaction_kthread+0x141/0x230 [btrfs] [ 828.058455] [<ffffffffa0460b60>] ? free_fs_root+0x90/0x90 [btrfs] [ 828.059989] [<ffffffff8107a670>] kthread+0xc0/0xd0 [ 828.061508] [<ffffffff8107a5b0>] ? kthread_create_on_node+0x120/0x120 [ 828.063036] [<ffffffff814cfd6c>] ret_from_fork+0x7c/0xb0 [ 828.064568] [<ffffffff8107a5b0>] ? kthread_create_on_node+0x120/0x120 [ 828.066109] mount D ffff880118a31878 0 427 343 0x00000000 [ 828.067680] ffff880117bdbc18 0000000000000086 0000000000014340 ffff880117bdbfd8 [ 828.069278] ffff880117bdbfd8 0000000000014340 ffff880118facbf0 ffff8801169cb800 [ 828.070869] ffff88011f214340 ffffffff81085aa9 ffff8801169cb800 ffff88011f214340 [ 828.072430] Call Trace: [ 828.073966] [<ffffffff81085aa9>] ? finish_task_switch+0x49/0xe0 [ 828.075524] [<ffffffff814c6996>] ? __schedule+0x3f6/0x940 [ 828.077093] [<ffffffff81071194>] ? wake_up_worker+0x24/0x30 [ 828.078671] [<ffffffff814c5a34>] ? __mutex_lock_slowpath+0x284/0x3b0 [ 828.080275] [<ffffffff814c6f09>] schedule+0x29/0x70 [ 828.081867] [<ffffffff814c7d95>] rwsem_down_write_failed+0xf5/0x1c3 [ 828.083473] [<ffffffffa043a000>] ? 0xffffffffa0439fff [ 828.085083] [<ffffffff81279e33>] call_rwsem_down_write_failed+0x13/0x20 [ 828.086717] [<ffffffff814c5ec4>] ? down_write+0x24/0x26 [ 828.088366] [<ffffffff8118dffe>] grab_super+0x2e/0xa0 [ 828.090021] [<ffffffff8118e6f0>] sget+0x320/0x580 [ 828.091677] [<ffffffffa043e130>] ? btrfs_parse_early_options+0x190/0x190 [btrfs] [ 828.093368] [<ffffffff814c57ae>] ? mutex_unlock+0xe/0x10 [ 828.095068] [<ffffffffa043f228>] btrfs_mount+0x408/0x830 [btrfs] [ 828.096786] [<ffffffff81141cd2>] ? pcpu_alloc+0x7d2/0x9e0 [ 828.098516] [<ffffffff8118f5d9>] mount_fs+0x39/0x1b0 [ 828.100267] [<ffffffff81141ef0>] ? __alloc_percpu+0x10/0x20 [ 828.102017] [<ffffffff811a8c27>] vfs_kern_mount+0x67/0x100 [ 828.103775] [<ffffffff811ab24e>] do_mount+0x23e/0xa20 [ 828.105547] [<ffffffff8113d73b>] ? strndup_user+0x4b/0xf0 [ 828.107312] [<ffffffff811abab3>] SyS_mount+0x83/0xc0 [ 828.109096] [<ffffffff814cfe1d>] system_call_fastpath+0x1a/0x1f The only thing that I did recently is defrag /var/log/journal files (journalctl is very slow because of btrfs COW). Something like this http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg24878.html How to fix this problem? And restore the data... -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html