Ermanno Baschiera posted on Mon, 27 Apr 2015 15:39:14 +0200 as excerpted:

> I have a 3-disk file system configured in RAID1, created with Ubuntu
> 13.10 (if I recall correctly). Last Friday I upgraded my system from
> Ubuntu 14.10 (kernel 3.16.0) to 15.04 (kernel 3.19.0). Then I started to
> notice some malfunctions (errors in cron scripts, my time machine asking
> to perform a full backup, high load, etc.). On Saturday I rebooted the
> system and it came up read-only. I tried to reboot it and it didn't boot
> anymore, stuck at mounting the disks.
> So I booted with a live Ubuntu 15.04, which could not mount my disks,
> even with "-o recovery". Then I switched to Fedora beta with kernel
> 4.0.0-0.rc5. I did a "btrfs check" and got a lot of "parent transid
> verify failed on 8328801964032 wanted 1568448 found 1561133".
> Reading the docs and Stack Exchange, I decided to try a "btrfs restore"
> to back up my data. Not having a spare disk, and the file system being
> RAID1, I decided to use one of the 3 disks as the target for the
> restore. I formatted it as EXT4 and tried the restore. The process
> stopped after one minute, ending with errors.
> Then I tried "btrfs-zero-log" on the file system, but I noticed that
> running it multiple times gave the same messages each time, making me
> think it wasn't fixing anything.
> So I ran a "btrfs rescue chunk-recover". After that, I was still not
> able to mount the file system (with parameters -o recovery,degraded,ro).
> I'm not sure what to do now. Can someone give me some advice?
> My possible next steps are (if I understand correctly):
> - try "btrfs rescue super-recover"
> - try "btrfs check --repair"

Sysadmin's backup rule of thumb:  If the data is valuable to you, it's 
backed up.  If it's not backed up, by definition, you consider it less 
valuable to you than the time and money you're saving by not backing it 
up, or it WOULD be backed up.  No exceptions.

And the corollary:  A backup is not a backup until you have tested your 
ability to actually use it.  An untested "will-be backup" is therefore 
not yet a backup, because the backup job isn't complete until the backup 
is tested usable.

Given that btrfs isn't yet fully stable and mature, those rules apply to 
it even more than they apply to other, more stable and mature filesystems.

So... no problem.  If you have a backup, restore from it and be happy.  
If you don't, as seems to be the case, then by definition, you considered 
the time and money saved by not doing that backup more valuable than the 
data, and you still have that time and money you saved, so again, no 
problem.


OK, so you unfortunately may have learned that the hard way...  Lesson 
learned, is there any hope?

Actually, yes, and you were on the right track with restore; you just 
haven't gone far enough with it yet, having used only its defaults, 
which, as you've seen, don't always work.  But with a strong dose of 
patience, some rather fine-point effort, and some luck... hopefully... 
=:^)

The idea is to use btrfs-find-root along with the advanced btrfs restore 
options to find an older root commit (btrfs' copy-on-write nature means 
there are generally quite a few older generations still on the device(s)) 
that contains as much of the data you're trying to save as possible.  
There's a writeup on the wiki about it, but last I checked, it was rather 
outdated.  Still, you should be able to use it as a start, and with some 
trial and error...

https://btrfs.wiki.kernel.org/index.php/Restore

Basically, your above efforts stopped at the "really lucky" stage.  
Obviously you aren't that lucky, so you gotta do the "advanced usage" 
stuff.

A few hints that I found helpful the last time I had to use it.[1]

* Use current btrfs-progs for the best chance at successful restoration.  
As of a few days ago, that was v3.19.1, the version I'm referring to in 
the points below.

* "Generation" and "transid" (transaction ID) are the same thing.  
Fortunately the page actually makes this a bit more explicit than it used 
to, as this key to understanding the output, which also makes it worth 
repeating, just in case.

* Where the page says pick the tree root with the largest set of 
filesystem trees, use restore's -l option to see those trees.  (The page 
doesn't say how to see the set, just to use the largest set.)
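
For example, something like this, where the device name is purely 
illustrative (pointing restore at any one device of the filesystem 
should do):

  # list the filesystem trees reachable from the current tree root
  btrfs restore -l /dev/sdb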

* Use btrfs-show-super to list what the filesystem thinks is the current 
transid/generation, and btrfs-find-root to find older candidate 
transids.  
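
For instance, something along these lines (device again illustrative):

  # what the superblock records as the current generation
  btrfs-show-super /dev/sdb | grep generation

  # scan the device for older tree roots; candidates are printed
  # with their bytenr and generation
  btrfs-find-root /dev/sdb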

* Feed the bytenrs (byte numbers) from find-root to restore using the -t 
option (as the page mentions), first with -l to see if it gives you a 
full list of filesystem trees, then with -D (dry run, which didn't exist 
when the page was written) to see if you get a good list of files.
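
Putting those together, one pass with a candidate root might look 
something like this, where <bytenr> is a byte number reported by 
btrfs-find-root and the device/target paths are illustrative:

  # does this older root still show a full set of filesystem trees?
  btrfs restore -t <bytenr> -l /dev/sdb

  # dry run: list what restore thinks it can recover from that root
  # (the list is long, so page it or redirect it to a file)
  btrfs restore -t <bytenr> -D /dev/sdb /mnt/target 2>&1 | less

  # looks good?  run it for real, writing to the separate target fs
  btrfs restore -t <bytenr> /dev/sdb /mnt/target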

* Restore's -D (dry run) can be used to see what it thinks it can 
restore.  It's a file list, so it will likely be long.  You thus might 
want to redirect it to a file or pipe it to a pager (as in the example 
above) for further examination.

* In directories with lots of files, restore can loop enough that it 
thinks it's not making progress, and will prompt you whether to 
continue.  You'll obviously want to continue if you want all the files 
in that dir restored.  (Back when I ran it, it simply gave up, and I had 
to run it repeatedly, getting more files each time, to get them all.)

* Restore currently only restores file data, not metadata like dates, 
ownership/permissions, etc, and not symlinks.  Files are written as 
owned by the user and group (probably root:root) you're running restore 
as, using the current umask.  When I ran restore, since I had a stale 
backup as well, I whipped up a script to compare against it: where a 
file existed in the backup too, the script used the backup copy as a 
reference to reset ownership/perms.  That left only the files new 
enough not to be in the backup to deal with, and there were relatively 
few of those.  I had to recreate the symlinks manually.
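
In case it helps, a minimal sketch of that sort of fixup, assuming the 
stale backup is mounted at /backup and the restored tree at /restored 
(both paths hypothetical; GNU chown/chmod --reference do the real work, 
and the loop is naive about exotic filenames):

  #!/bin/sh
  # for each restored file that also exists in the backup, copy
  # owner/group and permissions over from the backup copy
  cd /restored || exit 1
  find . -type f | while IFS= read -r f; do
      if [ -e "/backup/$f" ]; then
          chown --reference="/backup/$f" "$f"
          chmod --reference="/backup/$f" "$f"
      fi
  done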

There are also very new (less than a week old) patches on the list that 
let restore optionally restore ownership/perms/symlinks, too.  Depending 
on what you're restoring, it may be well worth your time to rebuild 
btrfs-progs with these patches applied, letting you avoid the fixups I 
had to do when I used restore.
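
If you go that route, the usual shape is something like this (the patch 
mbox filename is hypothetical; save the actual patches from the list):

  git clone \
    git://git.kernel.org/pub/scm/linux/kernel/git/kdave/btrfs-progs.git
  cd btrfs-progs
  git am /path/to/restore-metadata-patches.mbox
  ./autogen.sh && ./configure && make   # recent progs use autotools
  ./btrfs restore ...   # run the freshly built binary in place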



Given enough patience and the technical literacy to piece things together 
from the outdated page, the above hints, and the output as you get it, 
chances are reasonably good that you'll be able to successfully restore 
most of your files.  Btrfs' COW nature makes the techniques restore uses 
surprisingly effective, but it does take a bit of reading between the 
lines to figure things out, and nerves of steel while you're working on 
it.  The exception would be a filesystem that's simply so heavily damaged 
there's just not enough of the trees, of /any/ generation, left to make 
sense of things.

---
[1] FWIW, I had a backup, but it wasn't as current as I wanted, and it 
turned out restore gave me newer copies of many files than my stale 
backup had.  In keeping with the above rule, the data was valuable 
enough to me to back it up, but obviously not valuable enough to 
consistently update that backup...  If I'd lost everything from the 
backup on, I'd have been not exactly happy, but I'd have considered it 
fair for the backup time/energy/money invested.  Restore thus simply 
let me get a better deal than I actually deserved... which happens 
often enough that I'm obviously willing to play the odds...

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
