Hi all,
I am trying to mount a btrfs filesystem, and I have been told on
freenode/#btrfs to try the mailing list for more precise advice.
My setup is as follows: one Debian server (stretch, 4.2.0-1-amd64 #1 SMP
Debian 4.2.3-2 (2015-10-14) x86_64) with 5 disks in a btrfs RAID6 volume.
root@nas:~# btrfs --version
btrfs-progs v4.1.2
root@nas:~# btrfs fi show
Label: none  uuid: 0d56cb74-65f9-4f4e-9c51-74ea286f3f79
        Total devices 5 FS bytes used 2.66TiB
        devid    3 size 931.51GiB used 320.29GiB path /dev/sda1
        devid    5 size 2.73TiB used 1.76TiB path /dev/sdb1
        devid    6 size 2.73TiB used 1.76TiB path /dev/sde1
        devid    7 size 2.73TiB used 1.76TiB path /dev/sdf1
        devid    8 size 2.73TiB used 609.50GiB path /dev/sdd1

btrfs-progs v4.1.2
root@nas:~# btrfs fi df /raid
Data, RAID6: total=2.66TiB, used=2.66TiB
System, RAID6: total=64.00MiB, used=416.00KiB
Metadata, RAID6: total=10.00GiB, used=8.13GiB
GlobalReserve, single: total=512.00MiB, used=544.00KiB
As you can guess, devid 8 is pretty new, and devid 3 is to be removed: I
already issued a "btrfs dev del" on it (and the data was balanced onto devid
8), but I had to interrupt it (I am not proud of that, but the server had to
be shut down).
I have had a few hitches with this volume recently: the "btrfs dev del" was
interrupted, and one of the SATA cables was defective (I only noticed that
recently; it has since been replaced). I also ran a scrub which did not have
enough time to complete, and I cannot say for sure that it is not waiting to
resume...
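For reference, the scrub state can be queried without remounting. A minimal
sketch, assuming the /raid mount point used elsewhere in this mail (the
command/mountpoint guards are mine, so it is a no-op anywhere else):

```shell
# Check whether the interrupted scrub is finished, aborted, or resumable.
# Assumes the filesystem is mounted (read-only is fine) at /raid; guarded
# so it does nothing on machines without btrfs or without that mount.
if command -v btrfs >/dev/null 2>&1 && mountpoint -q /raid; then
    btrfs scrub status /raid
    # Once the filesystem mounts read-write again, the scrub could be
    # resumed explicitly with: btrfs scrub resume /raid
fi
checked=yes
```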
The volume has several snapshots (about 20 or 30; I activated snapper one
month ago), and I just enabled qgroups in order to monitor snapshot disk
usage. I know that the qgroup quotas are not up to date yet: because of the
bad cable, the latest "btrfs dev del" started to hit errors, and I could do
nothing but kill the server. If I remember correctly, the disk with the bad
cable was devid 5 (/dev/sdb1).
Note that I enabled qgroups while the "btrfs dev del" was running.
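Whether a qgroup rescan is still pending can be checked with "btrfs quota
rescan -s", which only reports status. A sketch under the same assumptions
as above (mount point /raid; the guards are mine):

```shell
# Report the status of any running or pending qgroup rescan; the -s flag
# only queries, so it should be safe even on this wounded filesystem.
# Guarded to be a no-op where btrfs or the /raid mount is absent.
if command -v btrfs >/dev/null 2>&1 && mountpoint -q /raid; then
    btrfs quota rescan -s /raid
fi
checked=yes
```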
Now I can mount the volume read-only with no problem (so my backups are
up to date!).
If I try to mount it read-write, I get the following errors in dmesg:
Steps:
  mount -o ro,recover,nospace_cache,clear_cache,skip_balance            (t=28s in dmesg)
  mount -o remount,rw,recover,nospace_cache,clear_cache,skip_balance    (t=83s)
dmesg:
[   28.920176] BTRFS info (device sda1): enabling auto recovery
[   28.920183] BTRFS info (device sda1): force clearing of disk cache
[   28.920187] BTRFS info (device sda1): disabling disk space caching
[   28.920190] BTRFS: has skinny extents
[   29.551170] BTRFS: bdev /dev/sdb1 errs: wr 0, rd 0, flush 0, corrupt 3434, gen 0
[   83.483314] BTRFS info (device sda1): disabling disk space caching
[   83.483323] BTRFS info (device sda1): enabling auto recovery
[   83.483326] BTRFS info (device sda1): enabling auto recovery
[  360.188189] INFO: task btrfs-transacti:1100 blocked for more than 120 seconds.
[  360.188209]       Tainted: G        W       4.2.0-1-amd64 #1
[  360.188214] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  360.188221] btrfs-transacti D ffffffff8109a1c0     0  1100      2 0x00000000
[  360.188226]  ffff8800d857eec0 0000000000000046 ffff8800cb947e20 ffff88011abecd40
[  360.188229]  0000000000000246 ffff8800cb948000 ffff880119dc8000 ffff8800cb947e20
[  360.188231]  ffff88011ae83d58 ffff8800d8d72d48 ffff8800d8d72da8 ffffffff8154acaf
[  360.188233] Call Trace:
[  360.188239]  [<ffffffff8154acaf>] ? schedule+0x2f/0x70
[  360.188265]  [<ffffffffa022c0bf>] ? btrfs_commit_transaction+0x3ef/0xa90 [btrfs]
[  360.188269]  [<ffffffff810a9ad0>] ? wait_woken+0x80/0x80
[  360.188281]  [<ffffffffa0227654>] ? transaction_kthread+0x224/0x240 [btrfs]
[  360.188293]  [<ffffffffa0227430>] ? btrfs_cleanup_transaction+0x510/0x510 [btrfs]
[  360.188296]  [<ffffffff8108aa41>] ? kthread+0xc1/0xe0
[  360.188298]  [<ffffffff8108a980>] ? kthread_create_on_node+0x170/0x170
[  360.188301]  [<ffffffff8154ea1f>] ? ret_from_fork+0x3f/0x70
[  360.188303]  [<ffffffff8108a980>] ? kthread_create_on_node+0x170/0x170
[  480.188185] INFO: task btrfs-transacti:1100 blocked for more than 120 seconds.
[...]
As you can see, the first hung-task warning appears at boot+360s (6 minutes).
It then repeats every two minutes and stops at boot+28 minutes. However, the
"mount" process was still active; I killed it (in order to try something
else) more than 3 hours later, and no message appeared after the last one
(t+28m).
I also tried mounting read-only, unmounting, then mounting read-write (with
the same options), but with no success: I got the same message and backtrace
at boot+4 minutes.
Following advice on #btrfs, I am currently running "btrfs check --readonly",
but it is taking a pretty long time. Whether or not the check ends up fixing
the problem, the backtrace may be of interest to you...
I will update the mailing list with the btrfs check result.
Goulou.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html