On 2014-09-16 16:57, Chris Murphy wrote:
> 
> On Sep 16, 2014, at 8:40 AM, Austin S Hemmelgarn <ahferro...@gmail.com> wrote:
> 
>> Based on the kernel messages, the primary issue is log corruption, and
>> in theory btrfs-zero-log should fix it.
> 
> Can you provide a complete dmesg somewhere for this initial failure, just for 
> reference? I'm curious what this indication looks like compared to other 
> problems.
> 
Okay, I can't really get a 'complete' dmesg, because the system panics 
on the mount failure (the filesystem in question is the system's root 
filesystem), the system has no serial ports, and I didn't think to 
build in support for console on ttyUSB0.  I can however get what the 
recovery environment (locally compiled based on buildroot) shows when I 
try to mount the filesystem:
[   30.871036] BTRFS: device label gentoo devid 1 transid 160615 /dev/sda3
[   30.875225] BTRFS info (device sda3): disk space caching is enabled
[   30.917091] BTRFS: detected SSD devices, enabling SSD mode
[   30.920536] BTRFS: bad tree block start 0 130402254848
[   30.924018] BTRFS: bad tree block start 0 130402254848
[   30.926234] BTRFS: failed to read log tree
[   30.953055] BTRFS: open_ctree failed
>>  The actual issue however, is
>> that the primary superblock appears to be pointing at a corrupted root
>> tree, which causes pretty much everything that does anything other than
>> just read the sb to fail.  The first backup sb does point to a good
>> tree, but only btrfs check and btrfs restore have any option to ignore
>> the first sb and use one of the backups instead.
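A quick way to see whether the superblock copies disagree is to compare a few fields from each one. This is only a sketch: sb_fields is a name I made up, and I'm assuming the generation, root and log_root field names as btrfs-show-super prints them.

```shell
# sb_fields: hypothetical filter that keeps only the tree-pointer lines
# from btrfs-show-super output (assumed field names: generation, root, log_root).
sb_fields() {
    grep -E '^(generation|root|log_root)[[:space:]]'
}

# Read-only usage against superblock copies 0 and 1 (device name is a placeholder):
# for i in 0 1; do btrfs-show-super -i"$i" /dev/sda3 | sb_fields; done
```

If copy 0 and copy 1 show different root or generation values, that would be consistent with the primary pointing at a different (here, bad) root tree.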
> 
> Maybe use wipefs -a on this volume, which removes the magic from only the 
> first superblock by default (you can specify another location). And then try 
> btrfs-show-super -F which "dumps" supers with bad magic.
> 
Thanks for the suggestion, I hadn't thought of that...
> I just tried this:
> # wipefs -a /dev/sdb
> /dev/sdb: 8 bytes were erased at offset 0x00010040 (btrfs): 5f 42 48 52 66 53 
> 5f 4d
> # btrfs-show-super -F /dev/sdb
> superblock: bytenr=65536, device=/dev/sdb
> ---------------------------------------------------------
> csum                  0x5c1196d7 [DON'T MATCH]
> bytenr                        65536
> flags                 0x1
> magic                 ........ [DON'T MATCH]
> […]
> # btrfs-show-super -i1 /dev/sdb
> superblock: bytenr=67108864, device=/dev/sdb
> ---------------------------------------------------------
> csum                  0xfc70be19 [match]
> bytenr                        67108864
> flags                 0x1
> magic                 _BHRfS_M [match]
> 
> So the mirror is definitely there and valid.
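The same check can be roughed out with plain dd, which helps in a recovery environment without a current btrfs-progs. A minimal sketch, assuming the magic sits 0x40 bytes into each superblock (which matches the 0x00010040 offset wipefs reported above); sb_magic is a made-up helper name and it only reads from the device.

```shell
# sb_magic: hypothetical helper that prints the 8-byte magic of the
# superblock starting at byte offset $2 on device or image $1. Read-only.
sb_magic() {
    dev=$1
    off=$2
    # The magic is assumed to live 64 (0x40) bytes into the superblock.
    dd if="$dev" bs=1 skip=$((off + 64)) count=8 2>/dev/null
}

# Usage for copies 0 and 1, using the bytenr values shown above
# (device name is a placeholder):
# for off in 65536 67108864; do sb_magic /dev/sdb "$off"; echo; done
```

A healthy copy should print _BHRfS_M; a wiped one prints nothing readable.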
> # btrfs rescue super-recover -yv /dev/sdb
> No valid Btrfs found on /dev/sdb
> Usage or syntax errors
> 
> Not expected at all, man page says "Recover bad superblocks from good 
> copies." There's a good copy, it's not being found by btrfs rescue 
> super-recover. Seems like a bug.
> 
> 
> # btrfs check /dev/sdb
> No valid Btrfs found on /dev/sdb
> Couldn't open file system
> 
> # btrfs check -s1 /dev/sdb
> using SB copy 1, bytenr 67108864
> Checking filesystem on /dev/sdb
> UUID: 9acf13de-5b98-4f28-9992-533e4a99d348
> [snip]
> OK it finds it, maybe a --repair will fix the bad first one?
> # btrfs check -s1 --repair /dev/sdb
> using SB copy 1, bytenr 67108864
> enabling repair mode
> Checking filesystem on /dev/sdb
> UUID: 9acf13de-5b98-4f28-9992-533e4a99d348
> [snip]
> No indication of repair
> # btrfs check /dev/sdb
> No valid Btrfs found on /dev/sdb
> Couldn't open file system
> # btrfs check /dev/sdb
> No valid Btrfs found on /dev/sdb
> Couldn't open file system
> [root@f21v ~]# btrfs-show-super -F /dev/sdb
> superblock: bytenr=65536, device=/dev/sdb
> ---------------------------------------------------------
> csum                  0x5c1196d7 [DON'T MATCH]
> bytenr                        65536
> flags                 0x1
> magic                 ........ [DON'T MATCH]
> 
> 
> Still not fixed. Maybe I needed to corrupt something else in the superblock 
> other than the magic and this behavior is intentional, otherwise wipefs -a, 
> followed by btrfsck would resurrect an intentionally wiped btrfs fs, 
> potentially wiping out some newer file system in the process.
> 
...though maybe it's a good thing I didn't.
> 
> 
>> I'm fine using dd to replace the primary sb with one of the
>> backups, but don't know the exact parameters that would be needed.
> 
> Here's an idea:
> 
> # btrfs-show-super /dev/sdb
> superblock: bytenr=65536, device=/dev/sdb
> ---------------------------------------------------------
> csum                  0x92aa51ab [match]
> [snip]
> So I know what I'm looking for starts at LBA 65536/512
> 
> # dd if=/dev/sdb skip=128 count=4 2>/dev/null | hexdump -C
> 00000000  92 aa 51 ab 00 00 00 00  00 00 00 00 00 00 00 00  |..Q.............|
> [snip]
> 
> And as it turns out the csum is right at the beginning, 4 bytes. So use bs of 
> 4 bytes, seek 65536/4, count of 1. This should zero just 4 bytes starting at 
> 65536 bytes in.
> 
> # dd if=/dev/zero of=/dev/sdb bs=4 seek=16384 count=1
> 
> Checked it with the earlier skip=128 command and it looks like everything 
> else is intact.
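For anyone wanting to rehearse this before touching a real disk, the offset arithmetic and the 4-byte write can be tried on a scratch image file. This is a sketch, not a recovery procedure: zero_csum is a hypothetical name, and conv=notrunc is added because dd would otherwise truncate a regular file after the write (on a block device that doesn't matter).

```shell
# 65536 bytes expressed in the two block sizes used above.
sb_off=65536
echo "skip=$((sb_off / 512))"   # 512-byte blocks for the hexdump read
echo "seek=$((sb_off / 4))"     # 4-byte blocks for the csum write

# zero_csum: hypothetical helper that zeroes the 4 csum bytes at the
# start of the superblock at byte offset $2 in image file $1.
zero_csum() {
    img=$1
    off=$2
    dd if=/dev/zero of="$img" bs=4 seek=$((off / 4)) count=1 conv=notrunc 2>/dev/null
}

# Usage on a scratch copy only (file name is a placeholder):
# zero_csum scratch.img 65536
```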
> 
> # btrfs-show-super -F /dev/sdb
> superblock: bytenr=65536, device=/dev/sdb
> ---------------------------------------------------------
> csum                  0x00000000 [DON'T MATCH]
> bytenr                        65536
> flags                 0x1
> magic                 _BHRfS_M [match]
> [snip]
> OK so the csum is bad, the magic is good. Now see if btrfs rescue 
> super-recover does anything
> # btrfs rescue super-recover /dev/sdb
> Make sure this is a btrfs disk otherwise the tool will destroy other fs, Are 
> you sure? [y/N]: Y
> Recovered bad superblocks successful
> *** Error in `btrfs': corrupted double-linked list: 0x0000000002289e40 ***
> ======= Backtrace: =========
> /lib64/libc.so.6(+0x7a77e)[0x7f388663977e]
> /lib64/libc.so.6(+0x80b03)[0x7f388663fb03]
> /lib64/libc.so.6(+0x81c88)[0x7f3886640c88]
> /lib64/libc.so.6(cfree+0x4c)[0x7f38866456ec]
> btrfs[0x425ec6]
> btrfs[0x406902]
> /lib64/libc.so.6(__libc_start_main+0xf0)[0x7f38865df0e0]
> btrfs[0x406a04]
> ======= Memory m
> [snip]
> 
> kaboom!
> 
> But was it really successful?
> # btrfs-show-super -F /dev/sdb
> superblock: bytenr=65536, device=/dev/sdb
> ---------------------------------------------------------
> csum                  0x92aa51ab [match]
> [snip]
> Looks fixed. And it mounts.
> 
> NOW, I didn't actually have my first superblock pointing to a corrupt root 
> tree. So it's possible that while the csum was fixed in my case, that the 
> subsequent crash has not properly copied all good parts of superblock1 to 
> superblock0. *shrug*
> 
> And since it crashes, looks like I found a bug.
> 
>> I'm using btrfs-progs 3.16 and
>> kernel 3.16.1.
> 
> So did I for all of the above.
> 
> 
Since posting this, I realized that the recovery environment I'm working from 
actually has btrfs-progs 3.14.1 and kernel 3.14.5.  I need to make a point of 
updating that once I get the system working again.

I've also discovered, when trying to use btrfs restore to copy the data out to 
a different system, that restore from 3.14.1 apparently chokes on filesystems 
that have lzo compression turned on.  It reports errors while trying to 
decompress compressed files, and I know for a fact that none of those files 
were even open, let alone being written to, when the system crashed.  I don't 
know if this is a known bug, or whether it still happens with btrfs-progs 
3.16, but I figured I'd mention it because I haven't seen anything about it 
anywhere.

Interestingly, I also didn't get the crash you saw above with btrfs rescue 
super-recover, so that might be a regression in btrfs-progs 3.16.

Thanks for all the help.
