posted on Thu, 23 Apr 2015 19:30:56 +0200 as excerpted:

> Hello,
>
> I had a 3-disk raid5 system with btrfs installed. Unfortunately one of
> the disks crashed. Now I cannot mount the filesystem any more, not even
> with the degraded option. I suspect the failed disk has a hardware
> failure. I think part of the problem might be that I configured the
> system to have not only the data and metadata but also the system data
> in raid config. Is there any chance that I might get my data back from
> the filesystem?
>
> Currently the system does not boot any more. It was a Debian testing
> system with btrfs version 3.17. The kernel was originally
> 3.16.0-4-amd64, but now I also have 3.19.0-trunk-amd64 installed.
>
> When I run btrfs fi show, I get an error message:
>
>   Check tree block failed, want=<big number>, have=<another big number>
>   read block failed check_tree_block
>   Couldn't read chunk root
>   warning, device 2 is missing
>
> Sorry, I cannot copy/paste as the machine does not boot anymore.
>
> Can anyone give me some help or explain what other kind of info you
> need? Thanks.
Full recovery support for btrfs raid5 is very *VERY* new. Kernel 3.19 was the first version that was supposed to have it at all, and due to that newness it can be expected to be buggy, so you should really be on 4.0, and be prepared to upgrade kernels pretty quickly for a few releases until the raid56 mode support matures a bit.

Before 3.19, the normal raid56 runtime was there, but recovery support wasn't complete, so in effect you were running a slow raid0, with no working protection against device failure at all (parity was calculated and written, the runtime side, but the code to actually use it for recovery was incomplete).

So first off, for btrfs raid5 recovery, forget kernels previous to 3.19 and preferably use 4.0.

Second, use a similarly current userspace. I'm not actually sure of the userspace raid5 status, but 3.17 is certainly not current, and given the newness of raid5 recovery support I'd strongly recommend 3.19 or 3.19.1 (current as of two days ago, at least) userspace as well, just to be sure.

Beyond that... I'm running raid1 mode here and have only followed raid56 mode development at a distance, so my help will be limited. However...

Third, I'm not sure whether the wiki (https://btrfs.wiki.kernel.org) has been well updated for raid56 or not, but the user-level guy with the most testing and experience with it (pre-full-recovery-support, at least) is Marc MERLIN, and there should be a link from the wiki's raid56 discussion to his blog, which has FAR more detail, altho as I said, some of it may be a bit dated now if he hasn't updated. That's likely to be some of the best help you can get.

Fourth... those "big numbers" you mentioned are probably generation aka transaction-id numbers. The generation/transid is a monotonically increasing number, bumped every time the root block is updated, which is every 30 seconds (by default) if anything has changed on the btrfs. So on a btrfs that has been active for any length of time, yes, it'll be a "big number". But because it's monotonically increasing, the difference between the want and have values gives you a hint at how bad the situation is. If want is only a bit higher, the generations are fairly close and the chances of recovery are reasonably good: want=123456 against have=123450, say, is only six commits behind, a few minutes of changes at most. If want is a LOT higher, you may well still be able to recover, but the number of files that may revert to old copies is higher. If want is LOWER than have, you probably hit the bug from a couple kernels ago that was resetting the generation; that's an entirely different situation with its own recovery scenario.

Fifth, on the wiki there's a (somewhat dated, last I looked) writeup on using btrfs-find-root and btrfs restore to recover files from an unmounted filesystem, writing them to some other location as it finds them. btrfs restore doesn't write anything to the damaged btrfs, so unlike other tools it has no chance of making the damage worse. You can use it to pull files off the filesystem if you don't have a current backup (which, given that before 3.19 btrfs raid5 was effectively btrfs raid0, you certainly should have had if you placed any value on the data at all, but unfortunately, people often learn the importance of backups the hard way). The general idea is that you find a good generation using btrfs-find-root, then feed that to restore if the current generation isn't usable, to get as current a valid version of your files as possible. *BUT*, I'm not entirely sure of btrfs restore's ability to work with raid5, that being so new.
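Putting those last points together, a rough sketch of the workflow, hedged as above since I'm on raid1 here and haven't tested any of this on raid5 myself. Device names, the <bytenr>, and the recovery path are placeholders; substitute your own, and send the recovered files to a *different* filesystem:

  # check what you're actually running: you want kernel 3.19+
  # (preferably 4.0) and btrfs-progs 3.19+
  uname -r
  btrfs --version

  # try a degraded read-only mount first; if it works, just copy
  # everything off to fresh storage
  mount -o degraded,ro /dev/sdX /mnt

  # if that fails, list candidate tree roots and their generations
  btrfs-find-root /dev/sdX

  # feed the bytenr of a good-looking generation to restore;
  # -D is a dry-run, -v is verbose, -t takes the bytenr that
  # btrfs-find-root reported
  btrfs restore -D -v -t <bytenr> /dev/sdX /recovery/
  btrfs restore -v -t <bytenr> /dev/sdX /recovery/

Again, whether the raid5 reassembly side of restore behaves on a degraded array is exactly the part I can't vouch for, so treat this as a starting point, not a recipe.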
Hopefully it works and you're good, but...

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman