I'm not entirely sure what went wrong.  Three possibilities seem most likely, 
and they're listed below.
For reference, here are supplemental materials split out into their own 
pastebins:
* btrfs-debug-tree -R log http://pastebin.com/7ePy9sin
* dmesg log http://pastebin.com/s1sdJRyd
(btrfs tools are git head)
Mounting with "recovery,ro" is no use.
I've also taken a metadata dump with btrfs-image, though it completed with 
errors, so the dump may be incomplete.  It's also 5 GB, but I'm more than 
willing to make it publicly downloadable if it would help the cause.

************** 1
Firstly, I have a raid1 (and, as I'll explain, partially raid10) array of 8 raw 
drives.  A couple of the drives experience a controller error every once in a 
while.  So it /may/ be the case that the hardware itself caused this problem, 
but I find that less likely than the other two possibilities below.  (However, 
in part 3's log there is some mention of sdf giving IO errors...)

************** 2
A couple of months ago I was doing a balance, trying to convert from raid10 to 
raid1.  At the time, it was on the 3.6 kernel.

I kept getting ENOSPC errors (even with plenty of space), so I went from doing 
a soft conversion to a hard one.  Of course, in the process my server was 
hard-rebooted by accident.  Once it was back online, I ran btrfsck, which 
showed a bunch of extent vs. csum problems; I used --repair to attempt to deal 
with them.

Though I can't recall the problems exactly, I do remember that it triggered an 
odd check regarding csums existing for extents that were freed.
The commit which introduced this printf was 
https://git.kernel.org/cgit/linux/kernel/git/mason/btrfs-progs.git/commit/?id=580ccf9e2ef4607f5b67b531190e7842c4b2b0db

Since then, every once in a while I would do another balance (sometimes soft, 
sometimes hard) in an attempt to complete the conversion -- to no avail, but 
seemingly to no harm.

************** 3
Now, 2 weeks ago I (foolishly) thought I'd try the new skinny extents feature 
(mistaking it as available in 3.9) in order to see if it might alleviate the 
issues I've had with trying to finish that conversion.  I enabled it via 
btrfstune, but quickly noted that my 3.9 kernel wouldn't mount the filesystem 
anymore (because of the incompatible feature).

However, nothing had changed on disk except the flag (given I wasn't running 
3.10).  So I looked into clearing that flag, but btrfstune provided me no 
recourse.  So I did something very dangerous and foolish:  I went into 
btrfstune.c and changed the setting of the flag to clear the flag instead, then 
reran it.  I mounted again, fingers crossed, and lo and behold, it was fine!
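
A minimal sketch of what a set-versus-clear change like this amounts to, for 
anyone unfamiliar with how incompat flags work.  This is not the btrfstune.c 
code: the struct and the two helpers are hypothetical stand-ins, and the bit 
value (1ULL << 8) is what I believe the skinny-metadata incompat bit to be, so 
check it against your own ctree.h before relying on it.

```c
#include <stdint.h>

/* Assumed value of the skinny-metadata incompat bit; verify in ctree.h. */
#define FEATURE_INCOMPAT_SKINNY_METADATA (1ULL << 8)

/* Hypothetical stand-in for the superblock's 64-bit incompat flags field. */
struct fake_super {
    uint64_t incompat_flags;
};

/* Setting a feature: OR the bit in, leaving all other bits untouched. */
static void set_incompat(struct fake_super *sb, uint64_t flag)
{
    sb->incompat_flags |= flag;
}

/* Clearing a feature: AND with the inverted bit, leaving the rest untouched. */
static void clear_incompat(struct fake_super *sb, uint64_t flag)
{
    sb->incompat_flags &= ~flag;
}
```

Note that clearing the bit only changes what the superblock advertises.  It 
does nothing to convert any metadata that might already have been written in 
the new format.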

Unfortunately, after some use, the filesystem failed and went read-only.  
That's when I got scared and decided it was time to stop trying to fix things 
myself (of course, far too late).

The actual log is at http://pastebin.com/s1sdJRyd
On line 85 you can see where I tried to mount it.
Line 87 is where I remounted after my btrfstune hack.