Re: Fixing recursive fault and parent transid verify failed
On Wed, Dec 09, 2015 at 10:19:41AM +, Duncan wrote:
> Alistair Grant posted on Wed, 09 Dec 2015 09:38:47 +1100 as excerpted:
>
> > On Tue, Dec 08, 2015 at 03:25:14PM +, Duncan wrote:
> >
> > Thanks again Duncan for your assistance.
> >
> > I plugged the ext4 drive I planned to use for the recovery into the
> > machine and immediately got a couple of errors, which makes me wonder
> > whether there isn't a hardware problem with the machine somewhere.
> >
> > So I decided to move to another machine to do the recovery.
>
> Ouch! That can happen, and if you moved the ext4 drive to a different
> machine and it was fine there, then it's not the drive.
>
> But you didn't say what kind of errors or if you checked SMART, or even
> how it was plugged in (USB or SATA-direct or...). So I guess you have
> that side of things under control. (If not, there's some here who know
> quite a bit about that sort of thing...)

Yep, I'm familiar enough with smartmontools, etc. to (hopefully) figure
this out on my own.

> > So I'm now recovering on Arch Linux 4.1.13-1 with btrfs-progs v4.3.1
> > (the latest version from archlinuxarm.org).
> >
> > Attempting:
> >
> > sudo btrfs restore -S -m -v /dev/sdb /mnt/btrfs-recover/ 2>&1 | tee
> > btrfs-recover.log
> >
> > only recovered 53 of the more than 106,000 files that should be
> > available.
> >
> > The log is available at:
> >
> > https://www.dropbox.com/s/p8bi6b8b27s9mhv/btrfs-recover.log?dl=0
> >
> > I did attempt btrfs-find-root, but couldn't make sense of the output:
> >
> > https://www.dropbox.com/s/qm3h2f7c6puvd4j/btrfs-find-root.log?dl=0
>
> Yeah, btrfs-find-root's output deciphering takes a bit of knowledge.
> Between what I had said and the wiki, I was hoping you could make sense
> of things without further help, but...
>
> ...

It turns out that a drive from a separate filesystem was dying and
causing all the weird behaviour on the original machine.
Having two failures at the same time (drive physical failure and btrfs
filesystem corruption) was a bit too much for me, so I aborted the btrfs
restore attempts, bought a replacement drive and just went back to the
backups (for both failures). Unfortunately, I now won't be able to
determine whether there was any connection between the failures or not.

So while I didn't get to practice my restore skills, the good news is
that it is all back up and running without any problems (yet :-)).

Thank you very much for the description and detailed set of steps for
using btrfs-find-root and restore. While I didn't get to use them this
time, I've added links to the mailing list archive in my btrfs wiki user
page so I can find my way back (and if others search for restore and
find-root they may also benefit from your effort).

Thanks again,
Alistair
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Fixing recursive fault and parent transid verify failed
Alistair Grant posted on Wed, 09 Dec 2015 09:38:47 +1100 as excerpted:

> On Tue, Dec 08, 2015 at 03:25:14PM +, Duncan wrote:
>> Alistair Grant posted on Tue, 08 Dec 2015 06:55:04 +1100 as excerpted:
>>
>> > On Mon, Dec 07, 2015 at 01:48:47PM +, Duncan wrote:
>> >> Alistair Grant posted on Mon, 07 Dec 2015 21:02:56 +1100 as
>> >> excerpted:
>> >>
>> >> > I think I'll try the btrfs restore as a learning exercise
>> >>
>> >> Trying btrfs restore is an excellent idea. It'll make things far
>> >> easier if you have to use it for real some day.
>
> Thanks again Duncan for your assistance.
>
> I plugged the ext4 drive I planned to use for the recovery into the
> machine and immediately got a couple of errors, which makes me wonder
> whether there isn't a hardware problem with the machine somewhere.
>
> So I decided to move to another machine to do the recovery.

Ouch! That can happen, and if you moved the ext4 drive to a different
machine and it was fine there, then it's not the drive.

But you didn't say what kind of errors or if you checked SMART, or even
how it was plugged in (USB or SATA-direct or...). So I guess you have
that side of things under control. (If not, there's some here who know
quite a bit about that sort of thing...)

> So I'm now recovering on Arch Linux 4.1.13-1 with btrfs-progs v4.3.1
> (the latest version from archlinuxarm.org).
>
> Attempting:
>
> sudo btrfs restore -S -m -v /dev/sdb /mnt/btrfs-recover/ 2>&1 | tee
> btrfs-recover.log
>
> only recovered 53 of the more than 106,000 files that should be
> available.
>
> The log is available at:
>
> https://www.dropbox.com/s/p8bi6b8b27s9mhv/btrfs-recover.log?dl=0
>
> I did attempt btrfs-find-root, but couldn't make sense of the output:
>
> https://www.dropbox.com/s/qm3h2f7c6puvd4j/btrfs-find-root.log?dl=0

Yeah, btrfs-find-root's output deciphering takes a bit of knowledge.
Between what I had said and the wiki, I was hoping you could make sense
of things without further help, but...
Well, at least this gets you some practice before you are desperate. =:^)

FWIW, I was really hoping that it would find generation/transid 2308,
since that's what it was finding on those errors, but that seems to be
too far back.

OK, here's the thing about transaction IDs aka transids aka generations.

Normally, it's a monotonically increasing number, representing the
transaction/commit count at that point.

Taking a step back, btrfs organizes things as a tree of trees, with each
change cascading up (down?) the tree to its root, and then to the master
tree's root. Between this and btrfs' copy-on-write nature, this means
the filesystem is atomic. If the system crashes at any point, either
the latest changes are committed and the master root reflects them, or
the master root points to the previous consistent state of all the
subtrees, which is still in place due to copy-on-write and the fact that
the changes hadn't cascaded all the way up the trees to the master root
yet.

And each time the master root is updated, the generation aka transid is
incremented by one. So 3503 is the current generation (see the
"superblock thinks" bit), 3502 the one before that, 3501 the one before
that...

The superblocks record the current transid and point (by address, aka
bytenr) to that master root. But, because btrfs is copy-on-write, older
copies of the master root (and the other roots it points to) tend to
hang around for a while. Which is where btrfs-find-root comes along, as
it's designed to find all those old roots, listing them by bytenr and
generation/transid.

In your case, while generation 3361 is current, there's a list going
back to generation 2497 with only a few (just eyeballing it) missing,
then 2326, and pretty much nothing before that but the REALLY early
generations 2 and 3, which are likely a nearly empty filesystem.

OK, that explains the generations/transids.
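[Editor's aside: the generation-to-bytenr mapping can be pulled out of
btrfs-find-root's output mechanically. The sketch below parses a
made-up sample; the real output format varies between btrfs-progs
versions, so treat the sed pattern as an assumption to adapt, not a
ready-made tool.]

```shell
#!/bin/sh
# Hypothetical sample of btrfs-find-root output. Real output differs
# between btrfs-progs versions; this parsing is only a sketch.
cat > find-root-sample.log <<'EOF'
Well block 20971520(gen: 3503 level: 1) seems good, but generation doesn't match
Well block 4194304(gen: 3502 level: 1) seems good, but generation doesn't match
Well block 29360128(gen: 2497 level: 0) seems good, but generation doesn't match
EOF

# Extract "generation bytenr" pairs and sort newest generation first;
# the bytenr on the top line is the candidate for restore's -t option.
sed -n 's/^Well block \([0-9]*\)(gen: \([0-9]*\).*/\2 \1/p' \
    find-root-sample.log | sort -rn
```

Reading it this way keeps the human-facing number (the generation) next
to the machine-facing one (the bytenr) that restore actually needs.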
There's also levels, which I don't clearly understand myself; definitely
not well enough to try to explain, tho I could make some WAGs, but
that'd just confuse things if they're equally wildly wrong. But it
turns out that levels aren't in practice something you normally need to
worry much about anyway, so ignoring them seems to work fine.

Then there's bytenrs, the block addresses. These are more or less
randomly large numbers, from an admin perspective, but they're very
important numbers, because this is the number you feed to restore's -t
option, that tells it which tree root to use. Put a different way,
humans read the generation aka transid numbers; btrfs reads the block
numbers. So what we do is find a generation number that looks
reasonable, and get its corresponding block number, to feed to
restore -t.

OK, knowing that, you can perhaps make a bit more sense of what those
transid verify failed messages are all about. As I said, the current
generation is 3503. Apparently, there's a problem in a subtree, however,
where the
Re: Fixing recursive fault and parent transid verify failed
Alistair Grant posted on Tue, 08 Dec 2015 06:55:04 +1100 as excerpted:

> On Mon, Dec 07, 2015 at 01:48:47PM +, Duncan wrote:
>> Alistair Grant posted on Mon, 07 Dec 2015 21:02:56 +1100 as excerpted:
>>
>> > I think I'll try the btrfs restore as a learning exercise, and to
>> > check the contents of my backup (I don't trust my memory, so
>> > something could have changed since the last backup).
>>
>> Trying btrfs restore is an excellent idea. It'll make things far
>> easier if you have to use it for real some day.
>>
>> Note that while I see your kernel is reasonably current (4.2 series),
>> I don't know what btrfs-progs ubuntu ships. There have been some
>> marked improvements to restore somewhat recently; checking the wiki
>> btrfs-progs release-changelog list says 4.0 brought optional metadata
>> restore, 4.0.1 added --symlinks, and 4.2.3 fixed a symlink path check
>> off-by-one error. (And don't use 4.1.1 as its mkfs.btrfs is broken
>> and produces invalid filesystems.) So you'll want at least progs 4.0
>> to get the optional metadata restoration, and 4.2.3 to get full
>> symlinks restoration support.
>
> Ubuntu 15.10 comes with btrfs-progs v4.0. It looks like it is easy
> enough to compile and install the latest version from
> git://git.kernel.org/pub/scm/linux/kernel/git/kdave/btrfs-progs.git so
> I'll do that.
>
> Should I stick to 4.2.3 or use the latest 4.3.1?

I generally use the latest myself, but recommend as a general guideline
that at minimum, a userspace version series matching that of your kernel
be used. If the usual kernel recommendations are followed (stay within
two kernel series of either current or LTS, so presently 4.2 or 4.3 for
current, or 3.18 or 4.1 for LTS), that will keep userspace reasonably
current as well, and the userspace of a particular version was being
developed concurrently with the kernel of the same series, so they're
relatively in sync. So with a 4.2 kernel, I'd suggest at least a 4.2
userspace.
If you want the latest, as I generally do, and are willing to put up
with occasional bleeding edge bugs like that broken mkfs.btrfs in 4.1.1,
by all means use the latest; but otherwise, the general
same-series-as-your-kernel guideline is quite acceptable.

The exception would be if you're trying to fix or recover from a broken
filesystem, in which case the very latest tends to have the best chance
at fixing things, since it has fixes for (or lacking that, at least
detection of) the latest round of discovered bugs, that older versions
will lack.

While btrfs restore does fall into the recover-from-broken category, we
know from the changelogs that nothing specific has gone into it since
the mentioned 4.2.3 symlink off-by-one fix, so while I would recommend
at least that since you are going to be working with restore, there's no
urgent need for 4.3.0 or 4.3.1 if you're more comfortable with the older
version.

(In fact, while I knew I was on 4.3.something, I just had to run btrfs
version, to check whether it was 4.3 or 4.3.1, myself. FWIW, it was
4.3.1.)

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
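[Editor's aside: the "userspace at least matches the kernel series"
guideline can be checked in one short script. This is only a sketch;
the `btrfs version` output format shown in the comment is the one the
thread itself mentions, and the tool may not be installed at all.]

```shell
#!/bin/sh
# Rough check of the kernel-vs-btrfs-progs series guideline. Parses
# the major.minor series out of `uname -r`; the btrfs-progs side is
# only printed if the btrfs tool is actually installed.
kernel_series=$(uname -r | cut -d. -f1-2)
echo "kernel series: $kernel_series"

if command -v btrfs >/dev/null 2>&1; then
    btrfs version    # e.g. "btrfs-progs v4.3.1"
else
    echo "btrfs-progs not installed; aim for at least series $kernel_series"
fi
```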
Re: Fixing recursive fault and parent transid verify failed
On Tue, Dec 08, 2015 at 03:25:14PM +, Duncan wrote:
> Alistair Grant posted on Tue, 08 Dec 2015 06:55:04 +1100 as excerpted:
>
> > On Mon, Dec 07, 2015 at 01:48:47PM +, Duncan wrote:
> >> Alistair Grant posted on Mon, 07 Dec 2015 21:02:56 +1100 as
> >> excerpted:
> >>
> >> > I think I'll try the btrfs restore as a learning exercise, and to
> >> > check the contents of my backup (I don't trust my memory, so
> >> > something could have changed since the last backup).
> >>
> >> Trying btrfs restore is an excellent idea. It'll make things far
> >> easier if you have to use it for real some day.
> >>
> >> Note that while I see your kernel is reasonably current (4.2
> >> series), I don't know what btrfs-progs ubuntu ships. There have
> >> been some marked improvements to restore somewhat recently;
> >> checking the wiki btrfs-progs release-changelog list says 4.0
> >> brought optional metadata restore, 4.0.1 added --symlinks, and
> >> 4.2.3 fixed a symlink path check off-by-one error. (And don't use
> >> 4.1.1 as its mkfs.btrfs is broken and produces invalid
> >> filesystems.) So you'll want at least progs 4.0 to get the
> >> optional metadata restoration, and 4.2.3 to get full symlinks
> >> restoration support.
> >>
> >> ...

Thanks again Duncan for your assistance.

I plugged the ext4 drive I planned to use for the recovery into the
machine and immediately got a couple of errors, which makes me wonder
whether there isn't a hardware problem with the machine somewhere.

So I decided to move to another machine to do the recovery.

So I'm now recovering on Arch Linux 4.1.13-1 with btrfs-progs v4.3.1
(the latest version from archlinuxarm.org).

Attempting:

sudo btrfs restore -S -m -v /dev/sdb /mnt/btrfs-recover/ 2>&1 | tee
btrfs-recover.log

only recovered 53 of the more than 106,000 files that should be
available.
The log is available at:

https://www.dropbox.com/s/p8bi6b8b27s9mhv/btrfs-recover.log?dl=0

I did attempt btrfs-find-root, but couldn't make sense of the output:

https://www.dropbox.com/s/qm3h2f7c6puvd4j/btrfs-find-root.log?dl=0

Simply mounting the drive, then re-mounting it read-only, and rsync'ing
the files to the backup drive recovered 97,974 files before crashing.
If anyone is interested, I've uploaded a photo of the console to:

https://www.dropbox.com/s/xbrp6hiah9y6i7s/rsync%20crash.jpg?dl=0

I'm currently running a hashdeep audit between the recovered files and
the backup to see how the recovery went.

If you'd like me to try any other tests, I'll keep the damaged
filesystem for at least the next day or so.

Thanks again for all your assistance,
Alistair
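[Editor's aside: for readers without hashdeep installed, the same kind
of audit can be approximated with coreutils md5sum in check mode. The
directory names below are purely for demonstration; with real data they
would be the backup and the rsync'd recovery tree.]

```shell
#!/bin/sh
# Approximation of a hashdeep-style audit using only coreutils. Builds
# a manifest from one tree and verifies the other against it. The two
# tiny trees here are stand-ins built just so the sketch is runnable.
mkdir -p files-backup files-recovered
echo "hello" > files-backup/doc.txt
cp files-backup/doc.txt files-recovered/doc.txt

# Manifest of the backup, with paths relative to the tree root.
(cd files-backup && find . -type f -exec md5sum {} +) > manifest.md5

# Audit the recovered tree against the manifest; mismatches and
# missing files are reported per path.
(cd files-recovered && md5sum -c ../manifest.md5)
```

hashdeep additionally reports files present in the audited tree but
absent from the manifest; this sketch only checks the manifest's side.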
Re: Fixing recursive fault and parent transid verify failed
Alistair Grant posted on Mon, 07 Dec 2015 21:02:56 +1100 as excerpted:

> I think I'll try the btrfs restore as a learning exercise, and to
> check the contents of my backup (I don't trust my memory, so something
> could have changed since the last backup).

Trying btrfs restore is an excellent idea. It'll make things far easier
if you have to use it for real some day.

Note that while I see your kernel is reasonably current (4.2 series), I
don't know what btrfs-progs ubuntu ships. There have been some marked
improvements to restore somewhat recently; checking the wiki btrfs-progs
release-changelog list says 4.0 brought optional metadata restore, 4.0.1
added --symlinks, and 4.2.3 fixed a symlink path check off-by-one error.
(And don't use 4.1.1 as its mkfs.btrfs is broken and produces invalid
filesystems.) So you'll want at least progs 4.0 to get the optional
metadata restoration, and 4.2.3 to get full symlinks restoration
support.

> Does btrfs restore require the path to be on a btrfs filesystem? I've
> got an existing ext4 drive with enough free space to do the restore,
> so would prefer to use it than have to buy another drive.

Restoring to ext4 should be fine.

Btrfs restore writes files as would an ordinary application; that's the
reason metadata restoration is optional (otherwise it uses normal file
change and mod times, with files written as the running user, root,
using umask-based file perms, all exactly the same as if it were a
normal file-writing application), so it will restore to any normal
filesystem. The filesystem it's restoring /from/ of course must be
btrfs... and unmounted, since it's designed to be used when mounting is
broken, but it writes files normally, so can write them to any
filesystem.

FWIW, I restored to my reiserfs-based media partition (still on spinning
rust, my btrfs are all on ssd) here, since that's where I had the room
to work with.
> My plan is:
>
> * btrfs restore /dev/sdX /path/to/ext4/restorepoint
> ** Where /dev/sdX is one of the two drives that were part of the raid1
>    filesystem
> * hashdeep audit the restored drive and backup
> * delete the existing corrupted btrfs filesystem and recreate
> * rsync the merged filesystem (from backup and restore)
>   on to the new filesystem
>
> Any comments or suggestions are welcome.

Looks very reasonable, here. There's a restore page on the wiki with
more information than the btrfs-restore manpage, describing how to use
it with btrfs-find-root if necessary, etc.

https://btrfs.wiki.kernel.org/index.php/Restore

Some details on the page are a bit dated; it doesn't cover the dry-run,
list-roots, metadata and symlink options, for instance, and these can be
very helpful, but the general idea remains the same.

The general idea is to use btrfs-find-root to get a listing of available
root generations (if restore can't find a working root from the
superblocks, or you want to try restoring an earlier root), then feed
the corresponding bytenr to restore's -t option.

Note that generation and transid refer to the same thing, a normally
increasing number, so higher generations are newer. The wiki page makes
this much clearer than it used to, but the old wording anyway was
confusing to me until I figured that out.

Where the wiki page talks about root object-ids, those are the various
subtrees; low numbers are the base trees, 256+ are subvolumes/snapshots.
Note that restore's list-roots option lists these for the given bytenr
as well.

So you try restore with list-roots (-l) to see what it gives you, try
btrfs-find-root if not satisfied, to find older generations and get
their bytenrs to plug into restore with -t, and then confirm specific
generation bytenrs with list-roots again.

Once you have a good generation/bytenr candidate, try a dry-run (-D) to
see if you get a list of files it's trying to restore that looks
reasonable.
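[Editor's aside: the sequence just described can be sketched as shell
commands. DEV, DEST, and BYTENR are placeholders; since the real
commands need an actual (and broken) unmounted btrfs device, this
sketch only prints what it would run rather than running it.]

```shell
#!/bin/sh
# Sketch of the restore workflow: list roots, find older roots, confirm
# a candidate, dry-run, then restore for real. DEV/DEST/BYTENR are
# placeholders; nothing here touches a real device.
DEV=/dev/sdX
DEST=/mnt/btrfs-recover
BYTENR=4194304   # hypothetical bytenr picked from btrfs-find-root output

run() { echo "would run: $*"; }

run btrfs restore -l "$DEV"                          # roots via superblock
run btrfs-find-root "$DEV"                           # older generations + bytenrs
run btrfs restore -t "$BYTENR" -l "$DEV"             # confirm candidate's object-ids
run btrfs restore -t "$BYTENR" -D "$DEV" "$DEST"     # dry run: list files only
run btrfs restore -t "$BYTENR" -m -S "$DEV" "$DEST"  # full restore, metadata + symlinks
```

All of the options shown (-l, -t, -D, -m, -S) are the ones named in the
thread itself; drop -t entirely if the superblock's own root works.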
If the dry-run goes well, you can try the full restore, not forgetting
the metadata and symlinks options (-m, -S, respectively), if desired.
From there you can continue with your plan as above.

One more bonus hint. Since you'll be doing a new mkfs.btrfs, it's a
good time to review active features and decide which ones you might
wish to activate (or not, if you're concerned about old-kernel
compatibility). Additionally, before repopulating your new filesystem,
you may want to review mount options, particularly autodefrag if
appropriate, and compression if desired, so they take effect from the
very first file created on the new filesystem. =:^)

FWIW, in the past I usually did an immediate post-mkfs.btrfs mount and
balance with -dusage=0 -musage=0 to get rid of the single-mode chunk
artifacts from the mkfs.btrfs as well, but with a new enough mkfs.btrfs
you may be able to avoid that now, as -progs 4.2 was supposed to
eliminate those single-mode mkfs.btrfs artifacts on multi-device
filesystems. I've just not done any fresh mkfs.btrfs since then so
haven't had a
Re: Fixing recursive fault and parent transid verify failed
Alistair Grant posted on Mon, 07 Dec 2015 12:57:15 +1100 as excerpted:

> I've run btrfs scrub and btrfsck on the drives, with the output
> included below. Based on what I've found on the web, I assume that a
> btrfs-zero-log is required.
>
> * Is this the recommended path?

[Just replying to a couple more minor points, here.]

Absolutely not. btrfs-zero-log isn't the tool you need here.

About the btrfs log...

Unlike most journaling filesystems, btrfs is designed to be atomic and
consistent at commit time (every 30 seconds by default) and doesn't log
normal filesystem activity at all. The only thing logged is fsyncs,
allowing them to deliver on their file-written-to-hardware guarantees
without forcing an entire atomic filesystem sync, which would trigger a
normal atomic commit and thus is a far heavier-weight process. IOW, all
it does is log and speed up fsyncs. The filesystem is designed to be
atomically consistent at commit time, with or without the log, with the
only thing missing if the log isn't replayed being the last few seconds
of fsyncs since the last atomic commit.

So the btrfs log is very limited in scope and will in many cases be
entirely empty, if there were no fsyncs after the last atomic filesystem
commit; again, every 30 seconds by default, so in human terms at least,
not a lot of time.

About btrfs log replay...

The kernel, meanwhile, is designed to replay the log automatically at
mount time. If the mount is successful, the log has by definition been
replayed successfully, and zeroing it wouldn't have done much of
anything but possibly lose you a few seconds worth of fsyncs.

Since you are able to run scrub, which requires a writable mount, the
mount is definitely successful, which means btrfs-zero-log is the wrong
tool for the job, since it addresses a problem you obviously don't have.

> * Is there a way to find out which files will be affected by the loss
>   of the transactions?
I'm interpreting that question in the context of the transid
wanted/found listings in your linked logs, since it no longer makes
sense in the context of btrfs-zero-log, given the information above.

I believe so, but the most direct method requires manual use of
btrfs-debug and similar tools, looking up addresses and tracing down the
files to which they belong. Of course that's if the addresses trace to
actual files at all. If they trace to metadata instead of data, then
it's not normally files, but the metadata (including checksums and very
small files of only a few KiB) about files, instead. Of course if it's
metadata the problem's worse, as a single bad metadata block can affect
multiple actual files.

The more indirect way would be to use btrfs restore with the -t option,
feeding it the root address associated with the transid found (with that
association traced via btrfs-find-root), to restore the files from the
filesystem as it existed at that point, to some other mounted
filesystem, also using the restore metadata option. You could then do,
for instance, a diff of the listing (or possibly a per-file checksum,
say md5sum, of both versions) between your current backup (or current
mounted filesystem, since you can still mount it) and the restored
version, which would be the files at the time of that transaction-id,
and see which ones changed. Those, of course, would be the affected
files. =:^]

> I do have a backup of the drive (which I believe is completely up to
> date; the btrfs volume is used for archiving media and documents, and
> single-person use of git repositories, i.e. only very light writing
> and reading).

Of course either one of the above is going to be quite some work, and if
you have a current backup, simply restoring it is likely to be far
easier, unless of course you're interested in practicing your recovery
technique or the like, certainly not a valueless endeavor, if you have
the time and patience for it.
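[Editor's aside: the per-file checksum comparison described above can be
done with find, md5sum and diff. The two directories below are tiny
stand-ins built just so the sketch runs; with real data they would be
the current backup and the restore target.]

```shell
#!/bin/sh
# Sketch of the per-file-checksum comparison: find which files differ
# between a restored tree and the current backup. The trees here are
# demonstration stand-ins, one file unchanged and one changed.
mkdir -p backup restored
echo "unchanged" > backup/a.txt
cp backup/a.txt restored/a.txt
echo "old" > backup/b.txt
echo "new" > restored/b.txt

# Checksum every file in a tree, with paths relative to the tree root.
checksums() { (cd "$1" && find . -type f -exec md5sum {} + | sort -k2); }

checksums backup   > backup.sums
checksums restored > restored.sums

# Lines that differ name the affected files (diff exits 1 when they do).
diff backup.sums restored.sums > changed.txt || true
cat changed.txt
```

Only b.txt shows up in the diff; those differing paths are exactly the
"affected files" the question asks about.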
The *GOOD* thing is that you *DO* have a current backup. Far *FAR* too
many people we see posting here are unfortunately finding out the hard
way that their actions, or more precisely lack thereof, in failing to do
backups, put the lie to any claims that they actually valued the data.

As any good sysadmin can tell you, often from unhappy lessons such as
this, if it's not backed up, by definition, your actions are placing its
value at less than the time and resources necessary to do that backup
(modified of course by the risk factor of actually needing it, thus
taking care of the Nth-level backups, some of which are off-site, if the
data is really /that/ valuable, while also covering the throw-away data
that's so trivial as to not justify even the effort of a single level of
backup).

So hurray for you! =:^)

(FWIW, I personally have backups of most stuff here, often several
levels, tho I don't always keep them current. But should I be forced to
resort to them, I'm prepared to lose the intervening updates, as I
Re: Fixing recursive fault and parent transid verify failed
On Mon, Dec 07, 2015 at 08:25:01AM +, Duncan wrote:
> Alistair Grant posted on Mon, 07 Dec 2015 12:57:15 +1100 as excerpted:
>
> > I've run btrfs scrub and btrfsck on the drives, with the output
> > included below. Based on what I've found on the web, I assume that
> > a btrfs-zero-log is required.
> >
> > * Is this the recommended path?
>
> [Just replying to a couple more minor points, here.]
>
> Absolutely not. btrfs-zero-log isn't the tool you need here.
>
> About the btrfs log...
>
> Unlike most journaling filesystems, btrfs is designed to be atomic
> and consistent at commit time (every 30 seconds by default) and
> doesn't log normal filesystem activity at all. The only thing logged
> is fsyncs, allowing them to deliver on their file-written-to-hardware
> guarantees, without forcing the entire atomic filesystem sync, which
> would trigger a normal atomic commit and thus is a far heavier-weight
> process. IOW, all it does is log and speed up fsyncs. The filesystem
> is designed to be atomically consistent at commit time, with or
> without the log, with the only thing missing if the log isn't
> replayed being the last few seconds of fsyncs since the last atomic
> commit.
>
> So the btrfs log is very limited in scope and will in many cases be
> entirely empty, if there were no fsyncs after the last atomic
> filesystem commit; again, every 30 seconds by default, so in human
> terms at least, not a lot of time.
>
> About btrfs log replay...
>
> The kernel, meanwhile, is designed to replay the log automatically at
> mount time. If the mount is successful, the log has by definition
> been replayed successfully, and zeroing it wouldn't have done much of
> anything but possibly lose you a few seconds worth of fsyncs.
>
> Since you are able to run scrub, which requires a writable mount, the
> mount is definitely successful, which means btrfs-zero-log is the
> wrong tool for the job, since it addresses a problem you obviously
> don't have.
OK, thanks for the detailed explanation (here and below, so I don't have
to repeat myself). The reason I thought it might be required was that
the parent transid failed errors were found even after a reboot (and
obviously remounting the filesystem) and without any user activity.

> > * Is there a way to find out which files will be affected by the
> >   loss of the transactions?
>
> I'm interpreting that question in the context of the transid
> wanted/found listings in your linked logs, since it no longer makes
> sense in the context of btrfs-zero-log, given the information above.
>
> I believe so, but the most direct method requires manual use of
> btrfs-debug and similar tools, looking up addresses and tracing down
> the files to which they belong. Of course that's if the addresses
> trace to actual files at all. If they trace to metadata instead of
> data, then it's not normally files, but the metadata (including
> checksums and very small files of only a few KiB) about files,
> instead. Of course if it's metadata the problem's worse, as a single
> bad metadata block can affect multiple actual files.
>
> The more indirect way would be to use btrfs restore with the -t
> option, feeding it the root address associated with the transid found
> (with that association traced via btrfs-find-root), to restore the
> files from the filesystem as it existed at that point, to some other
> mounted filesystem, also using the restore metadata option. You
> could then do, for instance, a diff of the listing (or possibly a
> per-file checksum, say md5sum, of both versions) between your current
> backup (or current mounted filesystem, since you can still mount it)
> and the restored version, which would be the files at the time of
> that transaction-id, and see which ones changed. Those of course
> would be the affected files.
> =:^]

I think I'll try the btrfs restore as a learning exercise, and to check
the contents of my backup (I don't trust my memory, so something could
have changed since the last backup).

Does btrfs restore require the path to be on a btrfs filesystem? I've
got an existing ext4 drive with enough free space to do the restore, so
would prefer to use it than have to buy another drive.

My plan is:

* btrfs restore /dev/sdX /path/to/ext4/restorepoint
** Where /dev/sdX is one of the two drives that were part of the raid1
   filesystem
* hashdeep audit the restored drive and backup
* delete the existing corrupted btrfs filesystem and recreate
* rsync the merged filesystem (from backup and restore)
  on to the new filesystem

Any comments or suggestions are welcome.

> > I do have a backup of the drive (which I believe is completely up
> > to date; the btrfs volume is used for archiving media and
> > documents, and single-person use of git repositories, i.e. only
> > very light writing and reading).
>
> Of course either one of the above is going to be quite some work, and
> if you have a current backup, simply
Re: Fixing recursive fault and parent transid verify failed
On Mon, Dec 07, 2015 at 01:48:47PM +, Duncan wrote:
> Alistair Grant posted on Mon, 07 Dec 2015 21:02:56 +1100 as excerpted:
>
> > I think I'll try the btrfs restore as a learning exercise, and to
> > check the contents of my backup (I don't trust my memory, so
> > something could have changed since the last backup).
>
> Trying btrfs restore is an excellent idea. It'll make things far
> easier if you have to use it for real some day.
>
> Note that while I see your kernel is reasonably current (4.2 series),
> I don't know what btrfs-progs ubuntu ships. There have been some
> marked improvements to restore somewhat recently; checking the wiki
> btrfs-progs release-changelog list says 4.0 brought optional metadata
> restore, 4.0.1 added --symlinks, and 4.2.3 fixed a symlink path check
> off-by-one error. (And don't use 4.1.1 as its mkfs.btrfs is broken
> and produces invalid filesystems.) So you'll want at least progs 4.0
> to get the optional metadata restoration, and 4.2.3 to get full
> symlinks restoration support.

Ubuntu 15.10 comes with btrfs-progs v4.0. It looks like it is easy
enough to compile and install the latest version from
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/btrfs-progs.git so
I'll do that.

Should I stick to 4.2.3 or use the latest 4.3.1?

> > Does btrfs restore require the path to be on a btrfs filesystem?
> > I've got an existing ext4 drive with enough free space to do the
> > restore, so would prefer to use it than have to buy another drive.
>
> Restoring to ext4 should be fine.
>
> Btrfs restore writes files as would an ordinary application, the
> reason metadata restoration is optional (otherwise it uses normal
> file change and mod times, with files written as the running user,
> root, using umask-based file perms, all exactly the same as if it
> were a normal file-writing application), so it will restore to any
> normal filesystem. The filesystem it's restoring /from/ of course
> must be btrfs...
> unmounted, since it's designed to be used when mounting is broken,
> but it writes files normally, so can write them to any filesystem.
>
> FWIW, I restored to my reiserfs-based media partition (still on
> spinning rust, my btrfs are all on ssd) here, since that's where I
> had the room to work with.

Thanks for the confirmation.

> > My plan is:
> >
> > * btrfs restore /dev/sdX /path/to/ext4/restorepoint
> > ** Where /dev/sdX is one of the two drives that were part of the
> >    raid1 filesystem
> > * hashdeep audit the restored drive and backup
> > * delete the existing corrupted btrfs filesystem and recreate
> > * rsync the merged filesystem (from backup and restore)
> >   on to the new filesystem
> >
> > Any comments or suggestions are welcome.
>
> Looks very reasonable, here. There's a restore page on the wiki with
> more information than the btrfs-restore manpage, describing how to
> use it with btrfs-find-root if necessary, etc.
>
> https://btrfs.wiki.kernel.org/index.php/Restore

I'd seen this, but it isn't explicit about the target filesystem
support. I should try and update the page a bit.

> Some details on the page are a bit dated; it doesn't cover the
> dry-run, list-roots, metadata and symlink options, for instance, and
> these can be very helpful, but the general idea remains the same.
>
> The general idea is to use btrfs-find-root to get a listing of
> available root generations (if restore can't find a working root from
> the superblocks or you want to try restoring an earlier root), then
> feed the corresponding bytenr to restore's -t option.
>
> Note that generation and transid refer to the same thing, a normally
> increasing number, so higher generations are newer. The wiki page
> makes this much clearer than it used to, but the old wording anyway
> was confusing to me until I figured that out.
>
> Where the wiki page talks about root object-ids, those are the
> various subtrees; low numbers are the base trees, 256+ are
> subvolumes/snapshots.
> Note that restore's list-roots option lists these for the given bytenr
> as well.
>
> So you try restore with list-roots (-l) to see what it gives you, try
> btrfs-find-root if not satisfied, to find older generations and get
> their bytenrs to plug into restore with -t, and then confirm specific
> generation bytenrs with list-roots again.
>
> Once you have a good generation/bytenr candidate, try a dry-run (-D) to
> see if you get a list of files it's trying to restore that looks
> reasonable.
>
> If the dry-run goes well, you can try the full restore, not forgetting
> the metadata and symlinks options (-m and -S, respectively), if desired.
>
> From there you can continue with your plan as above.
>
> One more bonus hint: since you'll be doing a new mkfs.btrfs, it's a good
> time to review active features and decide which ones you might wish to
> activate (or not, if you're concerned about old-kernel compatibility).
> Additionally, before
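[Editorial note: deciphering btrfs-find-root output, as discussed above,
mostly comes down to sorting the candidate roots by generation. A minimal
sketch follows; the two "Well block" sample lines and the exact output
format are assumptions based on btrfs-progs 4.x, not taken from
Alistair's logs.]

```shell
# Sort btrfs-find-root candidates newest-first.  The "Well block" lines
# below are a made-up sample in roughly the format btrfs-progs 4.x
# prints; on a real disk, replace the here-document with the output of:
#   btrfs-find-root /dev/sdX
candidates=$(
  cat <<'EOF' | sed -n 's/.*block \([0-9]*\)(gen: \([0-9]*\).*/\2 \1/p' | sort -rn
Well block 4198400(gen: 1432 level: 1) seems good, but generation/level doesn't match, want gen: 1434 level: 1
Well block 4194304(gen: 1431 level: 1) seems good, but generation/level doesn't match, want gen: 1434 level: 1
EOF
)
# Higher generation == newer tree.  Take the bytenr (second column) of
# the newest candidate and confirm it with a dry run:
#   btrfs restore -t <bytenr> -D /dev/sdX /tmp/anywhere
echo "$candidates"
```

With the sample above this prints "1432 4198400" first, i.e. the newest
generation and the bytenr to feed to restore's -t option.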
Re: Fixing recursive fault and parent transid verify failed
On 12/07/2015 02:57 PM, Alistair Grant wrote as excerpted:
> Fixing recursive fault, but reboot is needed

For the record: I saw the same message (including a hard lockup) when
doing a balance on a single-disk btrfs. Besides that, the fs works
flawlessly (~60GB; usage: no snapshots, ~15 lxc containers, low-load
databases, a few mails, a couple of web servers). As this is a production
machine, I rebooted it rather than investigating, but the error is
reproducible if that would be of interest.

> I've run btrfs scrub and btrfsck on the drives, with the output
> included below. Based on what I've found on the web, I assume that a
> btrfs-zero-log is required.
>
> * Is this the recommended path?
> * Is there a way to find out which files will be affected by the loss
>   of the transactions?
>
> Kernel: Ubuntu 4.2.0-19-generic (which is based on mainline 4.2.6)

I used Debian Backports 4.2.6.

Cheers,
Lukas
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Fixing recursive fault and parent transid verify failed
Hi,

(Resending, as it looks like the first attempt didn't get through,
probably because it was too large, so the logs are now in Dropbox.)

I have a btrfs volume which is raid1 across two spinning rust disks, each
2TB. When trying to access some files from another machine using sshfs,
the server machine has crashed twice, resulting in a hard lock-up, i.e. a
power-off was required to restart the machine.

There are no crash dumps in /var/log/syslog, or anything that looks like
an associated error message to me; however, on the second occasion I was
able to see the following message flash up on the console (in addition to
some stack dumps):

Fixing recursive fault, but reboot is needed

I've run btrfs scrub and btrfsck on the drives, with the output included
below. Based on what I've found on the web, I assume that a
btrfs-zero-log is required.

* Is this the recommended path?
* Is there a way to find out which files will be affected by the loss of
  the transactions?

I do have a backup of the drive (which I believe is completely up to
date; the btrfs volume is used for archiving media and documents, and
single-person use of git repositories, i.e. only very light writing and
reading).
Some basic details:

OS: Ubuntu 15.10
Kernel: Ubuntu 4.2.0-19-generic (which is based on mainline 4.2.6)

> sudo btrfs fi df /srv/d2root
Data, RAID1: total=250.00GiB, used=248.86GiB
Data, single: total=8.00MiB, used=0.00B
System, RAID1: total=8.00MiB, used=64.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, RAID1: total=1.00GiB, used=466.77MiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=160.00MiB, used=0.00B

> sudo btrfs fi usage /srv/d2root
Overall:
    Device size:         3.64TiB
    Device allocated:    502.04GiB
    Device unallocated:  3.15TiB
    Device missing:      0.00B
    Used:                498.62GiB
    Free (estimated):    1.58TiB  (min: 1.58TiB)
    Data ratio:          2.00
    Metadata ratio:      1.99
    Global reserve:      160.00MiB  (used: 0.00B)

Data,single: Size:8.00MiB, Used:0.00B
    /dev/sdc    8.00MiB

Data,RAID1: Size:250.00GiB, Used:248.86GiB
    /dev/sdb  250.00GiB
    /dev/sdc  250.00GiB

Metadata,single: Size:8.00MiB, Used:0.00B
    /dev/sdc    8.00MiB

Metadata,RAID1: Size:1.00GiB, Used:466.77MiB
    /dev/sdb    1.00GiB
    /dev/sdc    1.00GiB

System,single: Size:4.00MiB, Used:0.00B
    /dev/sdc    4.00MiB

System,RAID1: Size:8.00MiB, Used:64.00KiB
    /dev/sdb    8.00MiB
    /dev/sdc    8.00MiB

Unallocated:
    /dev/sdb    1.57TiB
    /dev/sdc    1.57TiB

btrfs scrub output:
https://www.dropbox.com/s/blqvopa1lhkghe5/scrub.log?dl=0

btrfsck sdb output:
https://www.dropbox.com/s/hw6w6cupuu1rny4/btrfsck.sdb.log?dl=0

btrfsck sdc output:
https://www.dropbox.com/s/mijz492mjr76p8z/btrfsck.sdc.log?dl=0

Thanks very much,
Alistair