RAID5: btrfs rescue chunk-recover segfaults.

Simon Waid Mon, 23 Jan 2017 05:16:21 -0800

Dear all,

I have a btrfs RAID5 array that has become unmountable. When I try to
mount it, dmesg contains the following:


[ 5686.334384] BTRFS info (device sdb): disk space caching is enabled
[ 5688.377244] BTRFS info (device sdb): bdev /dev/sdb errs: wr 2517, rd 77, flush 0, corrupt 0, gen 0
[ 5688.377254] BTRFS info (device sdb): bdev /dev/sdc errs: wr 0, rd 0, flush 0, corrupt 10, gen 0
[ 5688.377261] BTRFS info (device sdb): bdev /dev/sdd1 errs: wr 0, rd 0, flush 0, corrupt 5, gen 0
[ 5688.377268] BTRFS info (device sdb): bdev /dev/sde errs: wr 21, rd 8807, flush 0, corrupt 0, gen 0
[ 5688.744249] BTRFS error (device sdb): parent transid verify failed on 16227387371520 wanted 88711 found 88395
[ 5689.533817] BTRFS error (device sdb): parent transid verify failed on 16227388260352 wanted 88711 found 88395
[ 5689.609355] BTRFS error (device sdb): parent transid verify failed on 16227415158784 wanted 88711 found 88397
[ 5689.627715] BTRFS error (device sdb): parent transid verify failed on 16227415158784 wanted 88711 found 88397
[ 5689.627731] BTRFS error (device sdb): failed to read block groups: -5
[ 5689.675017] BTRFS error (device sdb): open_ctree failed

I tried to recover from the problem using:

btrfs rescue chunk-recover -v /dev/sdb

The command runs for a few minutes and then segfaults. I ran it under
gdb; this is the backtrace:

Starting program: btrfs-progs/btrfs rescue chunk-recover -v /dev/sdb
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
All Devices:
     Device: id = 4, name = /dev/sde
     Device: id = 1, name = /dev/sdd1
     Device: id = 2, name = /dev/sdc
     Device: id = 3, name = /dev/sdb

[New Thread 0x7ffff6f6e700 (LWP 8155)]
[New Thread 0x7ffff676d700 (LWP 8156)]
[New Thread 0x7ffff5f6c700 (LWP 8157)]
[New Thread 0x7ffff576b700 (LWP 8158)]
Scanning: 24603734016 in dev0, 32581337088 in dev1, 37911248896 in dev2, 32217350144 in dev3
Thread 2 "btrfs" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff6f6e700 (LWP 8155)]
btrfs_new_device_extent_record (leaf=leaf@entry=0x7ffff00008c0, key=key@entry=0x7ffff6f6dc90, slot=slot@entry=12)
     at cmds-check.c:6656
6656        rec->chunk_objecteid =
(gdb) backtrace
#0  btrfs_new_device_extent_record (leaf=leaf@entry=0x7ffff00008c0, key=key@entry=0x7ffff6f6dc90, slot=slot@entry=12)
     at cmds-check.c:6656
#1  0x00000000004370d2 in process_device_extent_item (slot=12, key=0x7ffff6f6dc90, leaf=0x7ffff00008c0,
     devext_cache=0x7fffffffe410) at chunk-recover.c:332
#2  extract_metadata_record (rc=rc@entry=0x7fffffffe3c0, leaf=leaf@entry=0x7ffff00008c0) at chunk-recover.c:727
#3  0x000000000043759b in scan_one_device (dev_scan_struct=0x6ae420) at chunk-recover.c:807
#4  0x00007ffff733f6ba in start_thread (arg=0x7ffff6f6e700) at pthread_create.c:333
#5  0x00007ffff707582d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Information about the system:

uname -a: Linux 4.10.0-041000rc4-generic #201701152031 SMP Mon Jan 16 01:33:39 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
btrfs-progs --version: v4.9 (from git://git.kernel.org/pub/scm/linux/kernel/git/kdave/btrfs-progs.git)
sudo btrfs fi show
Label: none  uuid: a27cc0cf-1665-43ba-8c63-bf236d31fcd2
     Total devices 4 FS bytes used 6.51TiB
     devid    1 size 2.73TiB used 2.73TiB path /dev/sdd1
     devid    2 size 7.28TiB used 2.73TiB path /dev/sdc
     devid    3 size 3.64TiB used 3.56TiB path /dev/sdb
     devid    4 size 1.82TiB used 1.46TiB path /dev/sde
btrfs fi df won't work because the filesystem is not mountable.

Any help would be appreciated!

Best regards,
Simon


PS: I'd also like to mention how the raid array became unmountable.

The system I was running at that time was:
Kernel: 4.8.0-34 generic #36~16.04.1 Ubuntu SMP
btrfs-progs --version: v4.4

- I issued a replace command on disk 2. During the replace, disk 4 was
disconnected. I noticed it and rebooted the system just a few seconds
after the event. After the reboot, the replace resumed and eventually
finished. However, dmesg showed errors like: parent transid verify
failed on 16227387371520 wanted 88711 found 88395.

- I issued a resize command on the new drive to make its additional
space available: btrfs resize 2:max, which completed without errors.

- I issued a balance without any filters, hoping it would correct the
"parent transid verify failed" errors. The balance started normally.
However, after about one hour, I noticed that no I/O was happening and
lots of errors appeared in dmesg. I tried to reboot, but the command had
no effect, so I disconnected the PC from the power supply.

I have attached the dmesg for the resize and balance operations.


