Hi,

I have a home server on Linux Arch (kernel 3.11.2) that uses
multi-device btrfs on root filesystem.

Until recently it worked completely fine. And yesterday I rebooted it
and the machine did not wake up.

I booted from a USB (kernel 3.10) and tried to mount the filesystem.
Here is OOPs I see

[   41.676217] device fsid 25e6a6fa-fe1f-4be5-a638-eeac948f8c21 devid
8 transid 164237 /dev/sda
[   41.684161] btrfs: disk space caching is enabled
[   67.266742] BTRFS error (device sdd3): block group 1141416919040
has wrong amount of free space
[   67.266796] BTRFS error (device sdd3): failed to load free space
cache for block group 1141416919040
[   68.126102] ------------[ cut here ]------------
[   68.126138] kernel BUG at fs/btrfs/inode.c:873!
[   68.126164] invalid opcode: 0000 [#1] PREEMPT SMP
[   68.126203] Modules linked in: intel_powerclamp coretemp kvm_intel
kvm crc32_pclmul ghash_clmulni_intel cryptd iTCO_wdt
iTCO_vendor_support ppdev microcode snd_hda_codec_hdmi psmouse
snd_hda_codec_realtek serio_raw i2c_i801 snd_hda_intel pcspkr
snd_hda_codec lpc_ich snd_hwdep parport_pc parport snd_pcm mperf
snd_page_alloc snd_timer snd mei_me soundcore evdev mei processor nfs
lockd sunrpc fscache ext4 crc16 mbcache jbd2 dm_snapshot dm_mod
squashfs loop isofs btrfs raid6_pq libcrc32c zlib_deflate xor
hid_generic usbhid hid usb_storage sd_mod i915 intel_agp intel_gtt
ahci libahci crc32c_intel i2c_algo_bit xhci_hcd libata ehci_pci
ehci_hcd scsi_mod atl1c drm_kms_helper usbcore usb_common drm i2c_core
button video
[   68.126754] CPU: 1 PID: 386 Comm: mount Not tainted 3.10.10-1-ARCH #1
[   68.126787] Hardware name: To Be Filled By O.E.M. To Be Filled By
O.E.M./H61M/U3S3, BIOS P2.20 07/30/2012
[   68.126834] task: ffff880118869950 ti: ffff88011377e000 task.ti:
ffff88011377e000
[   68.126871] RIP: 0010:[<ffffffffa0471223>]  [<ffffffffa0471223>]
__cow_file_range+0x3e3/0x460 [btrfs]
[   68.126933] RSP: 0018:ffff88011377f328  EFLAGS: 00010206
[   68.126961] RAX: 00000000000004d2 RBX: 0000000000000000 RCX: 0000000000001000
[   68.126996] RDX: 00000000000004d2 RSI: ffff88001f438608 RDI: ffff880115eb3000
[   68.127032] RBP: ffff88011377f3c8 R08: 0000000000000000 R09: 000000000003ffff
[   68.127068] R10: 0000000000040000 R11: 0000000000000000 R12: 0000000000000000
[   68.127103] R13: ffff880115f88630 R14: ffff88001f438608 R15: 0000000000000000
[   68.127140] FS:  00007fac17768780(0000) GS:ffff88011f300000(0000)
knlGS:0000000000000000
[   68.127180] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   68.127209] CR2: 00007f518d994000 CR3: 0000000117ab4000 CR4: 00000000000407e0
[   68.127246] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   68.127281] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[   68.127317] Stack:
[   68.127331]  00000109ffe26000 ffff880115f88c60 ffff88001f438428
000000000003ffff
[   68.127381]  ffff88011700c010 ffffea0003231b40 ffff880115eb3000
f600000109ffd870
[   68.127430]  ffffffffa0482f29 ffff880118a31000 ffff880115f88638
ffff88001f438448
[   68.127480] Call Trace:
[   68.127508]  [<ffffffffa0482f29>] ? release_extent_buffer+0xa9/0xd0 [btrfs]
[   68.127553]  [<ffffffffa048862f>] ? free_extent_buffer+0x4f/0xa0 [btrfs]
[   68.127598]  [<ffffffffa04716d6>] run_delalloc_nocow+0x436/0xaf0 [btrfs]
[   68.127641]  [<ffffffffa0472180>] run_delalloc_range+0x320/0x390 [btrfs]
[   68.127685]  [<ffffffffa04854c1>] ?
find_lock_delalloc_range.constprop.44+0x1d1/0x1f0 [btrfs]
[   68.127735]  [<ffffffffa0487044>] __extent_writepage+0x354/0x7b0 [btrfs]
[   68.127772]  [<ffffffff81122645>] ? find_get_pages_tag+0x105/0x180
[   68.127813]  [<ffffffffa0487722>]
extent_write_cache_pages.isra.32.constprop.48+0x282/0x3e0 [btrfs]
[   68.127867]  [<ffffffffa0487b7d>] extent_writepages+0x4d/0x70 [btrfs]
[   68.127909]  [<ffffffffa046e080>] ? can_nocow_odirect+0x2f0/0x2f0 [btrfs]
[   68.127951]  [<ffffffffa046cf28>] btrfs_writepages+0x28/0x30 [btrfs]
[   68.127985]  [<ffffffff8112e28e>] do_writepages+0x1e/0x40
[   68.128014]  [<ffffffff81123669>] __filemap_fdatawrite_range+0x59/0x60
[   68.128048]  [<ffffffff81123733>] filemap_fdatawrite_range+0x13/0x20
[   68.128090]  [<ffffffffa0481c99>] btrfs_wait_ordered_range+0x49/0x110 [btrfs]
[   68.128135]  [<ffffffffa04a64c0>] __btrfs_write_out_cache+0x6d0/0x8f0 [btrfs]
[   68.128180]  [<ffffffffa04a774d>] btrfs_write_out_cache+0x8d/0xe0 [btrfs]
[   68.128224]  [<ffffffffa0459983>]
btrfs_write_dirty_block_groups+0x533/0x620 [btrfs]
[   68.128271]  [<ffffffffa04676e2>] commit_cowonly_roots+0x172/0x260 [btrfs]
[   68.128314]  [<ffffffffa04695ad>]
btrfs_commit_transaction+0x5bd/0xaf0 [btrfs]
[   68.128353]  [<ffffffff8107b460>] ? wake_up_bit+0x30/0x30
[   68.128391]  [<ffffffffa04a4edd>] btrfs_recover_log_trees+0x3bd/0x490 [btrfs]
[   68.128434]  [<ffffffffa04a3270>] ? replay_one_dir_item+0xf0/0xf0 [btrfs]
[   68.128477]  [<ffffffffa0466689>] open_ctree+0x17b9/0x1e80 [btrfs]
[   68.128513]  [<ffffffff813555d3>] ? proc_comm_connector+0x33/0x120
[   68.128551]  [<ffffffffa043f456>] btrfs_mount+0x636/0x830 [btrfs]
[   68.128584]  [<ffffffff81141cd2>] ? pcpu_alloc+0x7d2/0x9e0
[   68.128616]  [<ffffffff8118f5d9>] mount_fs+0x39/0x1b0
[   68.128643]  [<ffffffff81141ef0>] ? __alloc_percpu+0x10/0x20
[   68.128676]  [<ffffffff811a8c27>] vfs_kern_mount+0x67/0x100
[   68.128706]  [<ffffffff811ab24e>] do_mount+0x23e/0xa20
[   68.128737]  [<ffffffff8113d73b>] ? strndup_user+0x4b/0xf0
[   68.128766]  [<ffffffff811abab3>] SyS_mount+0x83/0xc0
[   68.128795]  [<ffffffff814cfe1d>] system_call_fastpath+0x1a/0x1f
[   68.128826] Code: 8b 7d 90 4c 89 f6 e8 ad 9e 00 00 e9 dc fc ff ff
48 85 d2 74 40 80 be 30 fe ff ff 84 48 89 d0 74 34 48 83 f8 01 0f 84
87 fc ff ff <0f> 0b 48 8b 75 a8 48 8b 7d 90 41 89 c0 b9 9b 03 00 00 48
c7 c2
[   68.129145] RIP  [<ffffffffa0471223>] __cow_file_range+0x3e3/0x460 [btrfs]
[   68.129192]  RSP <ffff88011377f328>
[   68.129230] ---[ end trace 7992880786c40076 ]---



Hm.... it looks like it crashed when it tries to restore logs. Ok, I
ran 'btrfschk /dev/sda' and here is its output:

[  181.281546] device fsid 25e6a6fa-fe1f-4be5-a638-eeac948f8c21 devid
8 transid 164237 /dev/sda
[  181.318148] device fsid 25e6a6fa-fe1f-4be5-a638-eeac948f8c21 devid
3 transid 164237 /dev/sdb
[  181.408490] device fsid 25e6a6fa-fe1f-4be5-a638-eeac948f8c21 devid
2 transid 164237 /dev/sdc1
[  181.763300] device fsid 25e6a6fa-fe1f-4be5-a638-eeac948f8c21 devid
9 transid 164237 /dev/sdd3
[  181.782414] device fsid 25e6a6fa-fe1f-4be5-a638-eeac948f8c21 devid
8 transid 164237 /dev/sda
[  181.784634] device fsid 25e6a6fa-fe1f-4be5-a638-eeac948f8c21 devid
3 transid 164237 /dev/sdb
[  181.788715] device fsid 25e6a6fa-fe1f-4be5-a638-eeac948f8c21 devid
2 transid 164237 /dev/sdc1
[  181.803161] device fsid 25e6a6fa-fe1f-4be5-a638-eeac948f8c21 devid
9 transid 164237 /dev/sdd3
[  337.445525] systemd-journald[160]: Failed to write entry, ignoring:
Argument list too long


Then I tried to mount the filesystem again and it stuck. I see several
processes in UNINTERRUPTIBLE state:

[  828.034908] SysRq : Show Blocked State
[  828.036150]   task                        PC stack   pid father
[  828.037418] btrfs-transacti D ffff880118a311e8     0   407      2 0x00000000
[  828.038709]  ffff880115051dc8 0000000000000046 0000000000014340
ffff880115051fd8
[  828.040030]  ffff880115051fd8 0000000000014340 ffff88011886f620
ffff88011886f620
[  828.041364]  000000000000082f ffff880115051d30 ffffffff8109d141
ffff880118868048
[  828.042724] Call Trace:
[  828.044071]  [<ffffffff8109d141>] ? cpuacct_charge+0x61/0x70
[  828.045445]  [<ffffffff8108dd88>] ? __enqueue_entity+0x78/0x80
[  828.046829]  [<ffffffff810920f6>] ? enqueue_entity+0x286/0xa20
[  828.048227]  [<ffffffff81065bbb>] ? lock_timer_base.isra.35+0x2b/0x50
[  828.049648]  [<ffffffff814c6f09>] schedule+0x29/0x70
[  828.051076]  [<ffffffffa0468695>]
wait_current_trans.isra.14+0xa5/0xf0 [btrfs]
[  828.052515]  [<ffffffff8107b460>] ? wake_up_bit+0x30/0x30
[  828.053970]  [<ffffffffa0469e18>] start_transaction+0x338/0x530 [btrfs]
[  828.055453]  [<ffffffffa046a0c7>] btrfs_attach_transaction+0x17/0x20 [btrfs]
[  828.056943]  [<ffffffffa0460ca1>] transaction_kthread+0x141/0x230 [btrfs]
[  828.058455]  [<ffffffffa0460b60>] ? free_fs_root+0x90/0x90 [btrfs]
[  828.059989]  [<ffffffff8107a670>] kthread+0xc0/0xd0
[  828.061508]  [<ffffffff8107a5b0>] ? kthread_create_on_node+0x120/0x120
[  828.063036]  [<ffffffff814cfd6c>] ret_from_fork+0x7c/0xb0
[  828.064568]  [<ffffffff8107a5b0>] ? kthread_create_on_node+0x120/0x120
[  828.066109] mount           D ffff880118a31878     0   427    343 0x00000000
[  828.067680]  ffff880117bdbc18 0000000000000086 0000000000014340
ffff880117bdbfd8
[  828.069278]  ffff880117bdbfd8 0000000000014340 ffff880118facbf0
ffff8801169cb800
[  828.070869]  ffff88011f214340 ffffffff81085aa9 ffff8801169cb800
ffff88011f214340
[  828.072430] Call Trace:
[  828.073966]  [<ffffffff81085aa9>] ? finish_task_switch+0x49/0xe0
[  828.075524]  [<ffffffff814c6996>] ? __schedule+0x3f6/0x940
[  828.077093]  [<ffffffff81071194>] ? wake_up_worker+0x24/0x30
[  828.078671]  [<ffffffff814c5a34>] ? __mutex_lock_slowpath+0x284/0x3b0
[  828.080275]  [<ffffffff814c6f09>] schedule+0x29/0x70
[  828.081867]  [<ffffffff814c7d95>] rwsem_down_write_failed+0xf5/0x1c3
[  828.083473]  [<ffffffffa043a000>] ? 0xffffffffa0439fff
[  828.085083]  [<ffffffff81279e33>] call_rwsem_down_write_failed+0x13/0x20
[  828.086717]  [<ffffffff814c5ec4>] ? down_write+0x24/0x26
[  828.088366]  [<ffffffff8118dffe>] grab_super+0x2e/0xa0
[  828.090021]  [<ffffffff8118e6f0>] sget+0x320/0x580
[  828.091677]  [<ffffffffa043e130>] ?
btrfs_parse_early_options+0x190/0x190 [btrfs]
[  828.093368]  [<ffffffff814c57ae>] ? mutex_unlock+0xe/0x10
[  828.095068]  [<ffffffffa043f228>] btrfs_mount+0x408/0x830 [btrfs]
[  828.096786]  [<ffffffff81141cd2>] ? pcpu_alloc+0x7d2/0x9e0
[  828.098516]  [<ffffffff8118f5d9>] mount_fs+0x39/0x1b0
[  828.100267]  [<ffffffff81141ef0>] ? __alloc_percpu+0x10/0x20
[  828.102017]  [<ffffffff811a8c27>] vfs_kern_mount+0x67/0x100
[  828.103775]  [<ffffffff811ab24e>] do_mount+0x23e/0xa20
[  828.105547]  [<ffffffff8113d73b>] ? strndup_user+0x4b/0xf0
[  828.107312]  [<ffffffff811abab3>] SyS_mount+0x83/0xc0
[  828.109096]  [<ffffffff814cfe1d>] system_call_fastpath+0x1a/0x1f

The only thing that I did recently is defrag /var/log/journal files
(journalctl is very slow because of btrfs COW). Something like this
http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg24878.html

How to fix this problem? And restore the data...
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to