Hello,

This may be the same problem as "btrfs lockup".

I have had two systems using btrfs for several years. One is my home
desktop. It has root+home on an ext4 fs on a PCI SSD, and "big stuff" on
a btrfs using two hard disks in RAID1 configuration:

root@pccross:/export# uname -a
Linux pccross 4.7.0-rc2-custom #2 SMP Sat Jun 11 01:13:59 MSK 2016 x86_64 x86_64 x86_64 GNU/Linux
# -- Was earlier 4.x version when the problem happened
root@pccross:/export# btrfs --version
btrfs-progs v4.4
root@pccross:/export# btrfs fi show
Label: 'export'  uuid: c94c3ef6-394e-4441-8992-d7033332bdff
	Total devices 2 FS bytes used 1.26TiB
	devid    1 size 3.64TiB used 1.26TiB path /dev/sda
	devid    2 size 3.64TiB used 1.26TiB path /dev/sdb

root@pccross:/export# btrfs fi df /export
Data, RAID1: total=1.26TiB, used=1.25TiB
System, RAID1: total=32.00MiB, used=208.00KiB
Metadata, RAID1: total=5.00GiB, used=3.82GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

A month ago, I moved a directory containing a few GB from home (ext4) to
btrfs with the `mv` command. The command took some minutes and eventually
finished without error. After some hours, a cron job that uses files on
btrfs did not run. I logged in to investigate and realized that its
process was in 'D' state, and any command that I tried that would use
btrfs (ls, ...) would enter 'D' state and stay there indefinitely. There
was nothing interesting (that I remember) in dmesg.

Reboot did not help and indeed could not complete, because some of the
startup jobs use files on btrfs, and they hang. I rebooted without
mounting btrfs and ran `btrfsck`. It found and fixed some inconsistencies
(no log, sorry), and I could mount. Since then everything works, except
that the directory that I moved disappeared altogether (I had a backup,
so I could restore it).

No debugging material is left, so this is just for background.

=====

Enter the second system.
It is a rented physical server in a datacenter with two hard disks,
joined into a single root btrfs (/dev/sd[ab]1 are swap partitions):

root@dehost:~# uname -a
Linux dehost 3.13.0-91-generic #138-Ubuntu SMP Fri Jun 24 17:00:34 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
root@dehost:~# btrfs --version
Btrfs v3.12
root@dehost:~# btrfs fi show
Label: none  uuid: 67a2708c-f039-4783-a699-6f6be0dac318
	Total devices 2 FS bytes used 442.58GiB
	devid    1 size 2.72TiB used 444.04GiB path /dev/sda2
	devid    2 size 2.72TiB used 444.03GiB path /dev/sdb2

Btrfs v3.12
root@dehost:~# btrfs fi df /
Data, RAID1: total=440.00GiB, used=439.51GiB
System, RAID1: total=32.00MiB, used=72.00KiB
System, single: total=4.00MiB, used=0.00
Metadata, RAID1: total=4.00GiB, used=3.07GiB

A week ago, the system started to become unresponsive every day. The
kernel still works (it responds to ping), but no processes can start.
Looking at the logs after reboot, I noticed that activity stops some time
after the start of the backup cron job that covers a set of directories
(/etc, /home, /var/mail and some more). I disabled the backup job, and in
the several days since then the system has not hung.

=====

My question to the developers: what can I do to (1) recover the
filesystem while it is mounted (I can use a recovery netboot system and
run `btrfs check` as the last resort), and (2) provide any useful
debugging information to the developers?

Thank you,

Eugene