Chris Murphy posted on Sun, 08 Feb 2015 15:43:35 -0700 as excerpted:

> Confusing is that sdd1, sdi1, sdg1 have gen 0 and also have corruptions
> reported, just not anywhere near as many as sdc1. So I don't know what
> problems you have with your hardware, but they're not restricted to just
> one or two drives. Generation 0 makes no sense to me.
There was a btrfs bug around kernel 3.17 (or was it 3.18?) that reset the generation count. Several people posted weird generation outputs within a few days, and then a quick bug fix went into the new kernel and into stable for the previous kernel (which IIRC was the only one affected), almost before we'd figured out that the reports were connected and it was a definite bug!

The good news is that the devs are getting MUCH faster at fixing these things. The first widespread bug, from a few kernels ago, took just over a full kernel cycle to trace down and fix; the next one was fixed in the same kernel cycle; and this one was traced and fixed so fast that, even though we had a number of reports, the devs found and fixed it at about the same time most list users became aware there was a bug in the first place!

The bad news is that because it was found and fixed so fast, you evidently weren't aware of it at all, or you'd have recognized it in these zero-gen reports; and while I was aware of it, I have fewer details than on the other bugs.

Meanwhile, more good news: there's a fix, and btrfs check from the latest btrfs-progs 3.18.x should have no problem fixing it, on an otherwise normal btrfs, anyway. But the bad news is that I don't know whether it'll cope with one device missing and another failing, as seems to be the case here.

So, to the problem at hand, and mostly addressing constantine (the OP) now...

The lack of a good backup in this case is... unfortunate. I'd be surprised if there are any list regulars who trust the "still stabilizing" btrfs without backups for any data they value, yet.
As any good sysadmin knows (and by that I mean anyone administering systems, professional or not; IOW, anyone making this sort of decision is a sysadmin), if the data's not backed up, then by definition it's considered of low enough value that it's not worth the backup hassle, and the admin is prepared to do without it as a fair price for avoiding that hassle. (And a good sysadmin knows a backup isn't complete until it's been tested to whatever level they're comfortable restoring from.)

That's the general rule, and it applies even to stable and mature filesystems. Because btrfs is still maturing and isn't yet fully stable, the rule applies to btrfs more than ever: if it's not backed up, by definition, you don't care about losing the data.

Interpreted strictly, that means that since there wasn't a backup here, the data is, by the actions of the person responsible, worth less than the hassle of backing it up, which by definition means its loss is no big deal, whatever claims there may be to the contrary. That being the case, in the strict sense, call it a loss and move on to something more important.

But back here in reality... well, let's just try to retrieve what can be retrieved.

As Chris Murphy has said, that means backing up what you can now. If at all possible, leave the existing filesystem as it is while you do that (and preferably mount it read-only for now, to minimize the chance of anything else going bad on it in the meantime). That means... if you have to get out the credit card to buy enough storage for a backup... well, it's that or a very high risk of losing even more of the data by taking another drive out of the array to use for the backup. This is serious rock-and-hard-place time now.

Whatever fails to back up should show in the log. Hopefully it's not too much and not too valuable. If you can do without it, great. Otherwise, again as Chris suggested, try btrfs restore on the unmounted filesystem... and hope.
Because btrfs restore, used properly, can recover earlier states of the files in question, you may be able to get back otherwise unrecoverable files that way, even if you can't get the current version. Personally, I'd try both of those before the checksum reset he suggested, since that writes to the damaged filesystem, which of course means possibly losing more if things go wrong. Only if neither a normal backup of what's possible nor restore can retrieve critical files would I even bother with the checksum zeroing and further rescue efforts.

Meanwhile, when you're done and everything's settled down again, regardless of whether you stick with btrfs or revert to a more mature filesystem, for the love of your data, PLEASE back up anything you consider important. Even if you don't regularly update those backups, a backup of most of it, even one missing the last month or six of work, still gives you SOMEWHERE to start! Because if it's not backed up, as you've found out, it really does mean you consider the risk of its loss to be of lower value than the time/hassle/media-money you'd spend backing it up. But I doubt I have to tell you that... now!

Meanwhile, even if it's all lost, horrible as that would be, NOT all is lost, because it's a lesson most people don't have to learn twice, though quite a few do end up learning it once the hard way, as most admins will tell you if they're being honest! So look at it this way: you'll be a much better sysadmin from now on, having learned this lesson!

And regardless of how bad the loss might be, it's only data. You didn't lose your house in a fire; your doctor didn't just tell you you'll never walk again after an accident; and you're not looking at prison for driving head-on, drunk, into a bus full of people.
Put in perspective against the TRULY important things in life, you're still in pretty good shape even if you lose all the data. And it actually looks like you have a good chance of recovering most of it, so you're in even BETTER shape! =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman