On Sep 16, 2014, at 8:40 AM, Austin S Hemmelgarn <ahferro...@gmail.com> wrote:

> Based on the kernel messages, the primary issue is log corruption, and
> in theory btrfs-zero-log should fix it.

Can you provide a complete dmesg somewhere for this initial failure, just for 
reference? I'm curious what this indication looks like compared to other 
problems.

>  The actual issue however, is
> that the primary superblock appears to be pointing at a corrupted root
> tree, which causes pretty much everything that does anything other than
> just read the sb to fail.  The first backup sb does point to a good
> tree, but only btrfs check and btrfs restore have any option to ignore
> the first sb and use one of the backups instead.

Maybe use wipefs -a on this volume, which removes the magic from only the first 
superblock by default (you can specify another location). And then try 
btrfs-show-super -F which "dumps" supers with bad magic.

I just tried this:
# wipefs -a /dev/sdb
/dev/sdb: 8 bytes were erased at offset 0x00010040 (btrfs): 5f 42 48 52 66 53 
5f 4d
# btrfs-show-super -F /dev/sdb
superblock: bytenr=65536, device=/dev/sdb
---------------------------------------------------------
csum                    0x5c1196d7 [DON'T MATCH]
bytenr                  65536
flags                   0x1
magic                   ........ [DON'T MATCH]
[…]
# btrfs-show-super -i1 /dev/sdb
superblock: bytenr=67108864, device=/dev/sdb
---------------------------------------------------------
csum                    0xfc70be19 [match]
bytenr                  67108864
flags                   0x1
magic                   _BHRfS_M [match]

So the mirror is definitely there and valid.
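
For reference, the two bytenr values above (65536 and 67108864) fit the usual btrfs layout as I understand it: the primary superblock sits at 64KiB and copy N at 16KiB shifted left by 12*N. A small sketch of that arithmetic follows. The helper name sb_offset is made up for illustration, and the third copy's offset is not shown anywhere in this thread, so treat it as an assumption.

```shell
#!/bin/sh
# Byte offset of btrfs superblock copy N (0 = primary).
# sb_offset is a hypothetical helper, not part of btrfs-progs.
sb_offset() {
    if [ "$1" -eq 0 ]; then
        echo $((64 * 1024))                  # primary: 64KiB
    else
        echo $(( (16 * 1024) << (12 * $1) )) # mirrors: 16KiB << (12 * N)
    fi
}

for i in 0 1 2; do
    echo "copy $i: bytenr=$(sb_offset "$i")"
done
```

Copies 0 and 1 come out at the same bytenr values that btrfs-show-super and btrfs check -s1 printed above.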
# btrfs rescue super-recover -yv /dev/sdb
No valid Btrfs found on /dev/sdb
Usage or syntax errors

Not expected at all. The man page says "Recover bad superblocks from good 
copies." There's a good copy, but btrfs rescue super-recover isn't finding it. 
Seems like a bug.


# btrfs check /dev/sdb
No valid Btrfs found on /dev/sdb
Couldn't open file system

# btrfs check -s1 /dev/sdb
using SB copy 1, bytenr 67108864
Checking filesystem on /dev/sdb
UUID: 9acf13de-5b98-4f28-9992-533e4a99d348
[snip]
OK it finds it, maybe a --repair will fix the bad first one?
# btrfs check -s1 --repair /dev/sdb
using SB copy 1, bytenr 67108864
enabling repair mode
Checking filesystem on /dev/sdb
UUID: 9acf13de-5b98-4f28-9992-533e4a99d348
[snip]
No indication of repair
# btrfs check /dev/sdb
No valid Btrfs found on /dev/sdb
Couldn't open file system
# btrfs-show-super -F /dev/sdb
superblock: bytenr=65536, device=/dev/sdb
---------------------------------------------------------
csum                    0x5c1196d7 [DON'T MATCH]
bytenr                  65536
flags                   0x1
magic                   ........ [DON'T MATCH]


Still not fixed. Maybe I needed to corrupt something in the superblock other 
than the magic, and this behavior is intentional. Otherwise wipefs -a followed 
by btrfsck would resurrect an intentionally wiped btrfs fs, potentially wiping 
out some newer file system in the process.



> I'm fine using dd to replace the primary sb with one of the
> backups, but don't know the exact parameters that would be needed.

Here's an idea:

# btrfs-show-super /dev/sdb
superblock: bytenr=65536, device=/dev/sdb
---------------------------------------------------------
csum                    0x92aa51ab [match]
[snip]
So I know what I'm looking for starts at LBA 65536/512 = 128.
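
The same kind of arithmetic also locates the magic: wipefs reported it at 0x00010040, which is 64 bytes past the start of the superblock. Here is a rough sketch of both calculations, run against a scratch file rather than a real device. The scratch image and the planted magic string are only there for illustration.

```shell
#!/bin/sh
sb_bytenr=65536                      # primary superblock, per btrfs-show-super
sector=512                           # dd's default block size
lba=$((sb_bytenr / sector))          # 128
magic_off=$((0x10040 - sb_bytenr))   # wipefs offset relative to the superblock

img=$(mktemp)                        # scratch file standing in for the device
dd if=/dev/zero of="$img" bs=4096 count=17 2>/dev/null

# Plant the magic where wipefs found it (illustration only).
printf '_BHRfS_M' | dd of="$img" bs=1 seek=$((sb_bytenr + magic_off)) \
    conv=notrunc 2>/dev/null

# Read the first sector of the superblock by LBA, then pick out the field.
magic=$(dd if="$img" skip=$lba count=1 2>/dev/null |
        dd bs=1 skip=$magic_off count=8 2>/dev/null)

echo "lba=$lba magic_off=$magic_off magic=$magic"
rm -f "$img"
```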

# dd if=/dev/sdb skip=128 count=4 2>/dev/null | hexdump -C
00000000  92 aa 51 ab 00 00 00 00  00 00 00 00 00 00 00 00  |..Q.............|
[snip]

And as it turns out, the csum is right at the beginning, 4 bytes. So use a bs 
of 4 bytes, a seek of 65536/4 = 16384, and a count of 1. This should zero just 
the 4 bytes starting 65536 bytes in.

# dd if=/dev/zero of=/dev/sdb bs=4 seek=16384 count=1

Checked it with the earlier skip=128 command and it looks like everything else 
is intact.
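
For anyone who wants to try the bs/seek arithmetic without touching a real disk, here is a minimal sketch against a scratch file. The file is a stand-in I made up for illustration. Note that conv=notrunc is needed for a regular file, since dd would otherwise truncate it at the seek point; a block device has no such problem.

```shell
#!/bin/sh
# Scratch file stands in for /dev/sdb; nothing real is touched.
img=$(mktemp)

# 17 blocks of 4096 bytes, filled with 'A' so zeroed bytes stand out.
dd if=/dev/zero bs=4096 count=17 2>/dev/null | tr '\0' 'A' > "$img"

sb_bytenr=65536
bs=4
seek=$((sb_bytenr / bs))      # 16384, as in the dd command above

# Zero exactly one 4-byte block at the superblock offset.
dd if=/dev/zero of="$img" bs=$bs seek=$seek count=1 conv=notrunc 2>/dev/null

# Dump 8 bytes from that offset: 4 zeroed, then 4 untouched 'A' (0x41) bytes.
result=$(dd if="$img" bs=1 skip=$sb_bytenr count=8 2>/dev/null |
         od -An -tx1 | tr -d ' \n')
echo "$result"
rm -f "$img"
```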

# btrfs-show-super -F /dev/sdb
superblock: bytenr=65536, device=/dev/sdb
---------------------------------------------------------
csum                    0x00000000 [DON'T MATCH]
bytenr                  65536
flags                   0x1
magic                   _BHRfS_M [match]
[snip]
OK, so the csum is bad and the magic is good. Now see if btrfs rescue 
super-recover does anything:
# btrfs rescue super-recover /dev/sdb
Make sure this is a btrfs disk otherwise the tool will destroy other fs, Are 
you sure? [y/N]: Y
Recovered bad superblocks successful
*** Error in `btrfs': corrupted double-linked list: 0x0000000002289e40 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x7a77e)[0x7f388663977e]
/lib64/libc.so.6(+0x80b03)[0x7f388663fb03]
/lib64/libc.so.6(+0x81c88)[0x7f3886640c88]
/lib64/libc.so.6(cfree+0x4c)[0x7f38866456ec]
btrfs[0x425ec6]
btrfs[0x406902]
/lib64/libc.so.6(__libc_start_main+0xf0)[0x7f38865df0e0]
btrfs[0x406a04]
======= Memory m
[snip]

kaboom!

But was it really successful?
# btrfs-show-super -F /dev/sdb
superblock: bytenr=65536, device=/dev/sdb
---------------------------------------------------------
csum                    0x92aa51ab [match]
[snip]
Looks fixed. And it mounts.

NOW, I didn't actually have my first superblock pointing to a corrupt root 
tree. So it's possible that while the csum was fixed in my case, the crash 
prevented some of the good parts of superblock 1 from being copied to 
superblock 0. *shrug*

And since it crashes, looks like I found a bug.

> I'm using btrfs-progs 3.16 and
> kernel 3.16.1.

So did I for all of the above.


Chris Murphy
