Re: unable to fixup (regular) error

Duncan Mon, 26 Nov 2018 23:21:02 -0800

Alexander Fieroch posted on Mon, 26 Nov 2018 11:23:00 +0100 as excerpted:

> Am 26.11.18 um 09:13 schrieb Qu Wenruo:
>> The corruption itself looks like some disk error, not some btrfs error
>> like transid error.
> 
> You're right! SMART shows an increased reallocated sector count for one
> hard disk. Sorry, I forgot to check this first...
> 
> I'll try to salvage my data...


FWIW, as a general note about raid0, for when you update your layout...

Because raid0 is less reliable than a single device (failure of any one 
device in the raid0 is likely to take the whole array out, and failure of 
any one of N devices is more likely than failure of any specific single 
device), admins generally consider it useful only for "throw-away" data, 
that is, data that can be lost without issue.  That's either because it 
really /is/ throw-away (internet cache being a common example), or 
because it's considered a throw-away copy of the "real" data stored 
elsewhere.  In the latter case, either the "real" copy is the working 
copy and the raid0 is simply a faster cache of it, or the raid0 is the 
working copy but is backed up frequently enough that if it goes, it 
won't take anything of value with it.  Read that as: the effort to 
replace any lost data will be reasonably trivial, likely only a few 
minutes or hours of work, at worst perhaps a day's worth, depending on 
how many people's work is involved and how much their time is 
considered to be worth.
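To put rough numbers on the "any one of N" point: assuming each device 
fails independently with the same probability p over some period (a 
simplifying assumption; drives from the same batch often fail together), 
the chance that at least one of N devices fails, and with no redundancy 
takes the raid0 with it, is 1 - (1 - p)^N.  A minimal Python sketch of 
that formula, where p = 0.05 is a purely illustrative figure, not 
measured data:

```python
def raid0_loss_probability(p: float, n: int) -> float:
    """Probability that at least one of n devices fails, assuming each
    fails independently with probability p (illustrative model only).
    raid0 has no redundancy, so this is also the chance of losing it."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    # Complement of "every device survives": 1 - (1 - p)^n
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    p = 0.05  # hypothetical per-device failure probability, not real data
    for n in (1, 2, 4):
        print(f"{n} device(s): {raid0_loss_probability(p, n):.4f}")
```

With those made-up numbers it prints 0.0500, 0.0975 and 0.1855: two 
devices nearly double the chance of losing the array, and four make it 
roughly 3.7 times as likely as with a single device.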

So if it's raid0, you shouldn't need to worry about recovering what's on 
it, and probably shouldn't even be running btrfs check on it at all, as 
that's likely to be more trouble and take more time than the throw-away 
data on it is worth.  If something goes wrong with a raid0, just declare 
it lost, blow it away and recreate it fresh, restoring from the "real" 
copy if necessary.  For an admin, really with any data but particularly 
with raid0, it's more a matter of when it'll die than if.

If that's inappropriate given the value of the data and the status of 
the backups/real copies, then you should really reconsider whether raid0 
of any sort is appropriate, because it almost certainly is not.


For btrfs, what you might try instead of raid0 is raid1 metadata at 
least, with raid0 or single-mode data if there isn't room enough to do 
raid1 data as well.  The raid1 metadata would very likely have saved the 
filesystem in this case.  Some files might still be lost, depending on 
where the damage is, but btrfs would use the second copy of the metadata 
on the good device to fill in for, and attempt to repair, any metadata 
damage on the bad device (if the bad device is actively getting worse, 
that might be a losing battle), giving you a far better chance of saving 
the filesystem as a whole.
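For a new filesystem the profiles are chosen at mkfs time; for an 
existing, mounted one, a balance with convert filters changes them in 
place.  The sketch below only builds the command lines (it runs 
nothing), using the mkfs.btrfs -m/-d profile options and the btrfs 
balance -mconvert/-dconvert filters.  The helper names and the 
device/mount paths (/dev/sdX, /dev/sdY, /mnt/pool) are hypothetical 
placeholders.  Check the btrfs-progs man pages for your version before 
running anything, and remember that mkfs.btrfs wipes the devices.

```python
import shlex

# Subset of btrfs block-group profiles; check your btrfs-progs version.
KNOWN_PROFILES = {"single", "dup", "raid0", "raid1", "raid10", "raid5", "raid6"}


def _check_profile(profile: str) -> None:
    if profile not in KNOWN_PROFILES:
        raise ValueError(f"unknown btrfs profile: {profile}")


def mkfs_command(devices, metadata="raid1", data="raid0"):
    """Hypothetical helper: build (not run) an mkfs.btrfs command line
    with separate metadata (-m) and data (-d) profiles."""
    _check_profile(metadata)
    _check_profile(data)
    if "raid1" in (metadata, data) and len(devices) < 2:
        raise ValueError("raid1 needs at least two devices")
    return ["mkfs.btrfs", "-m", metadata, "-d", data, *devices]


def convert_command(mountpoint, metadata="raid1", data=None):
    """Hypothetical helper: build (not run) a balance command that converts
    an existing filesystem's metadata, and optionally data, profile."""
    _check_profile(metadata)
    args = ["btrfs", "balance", "start", f"-mconvert={metadata}"]
    if data is not None:
        _check_profile(data)
        args.append(f"-dconvert={data}")
    args.append(mountpoint)
    return args


if __name__ == "__main__":
    print(shlex.join(mkfs_command(["/dev/sdX", "/dev/sdY"])))
    print(shlex.join(convert_command("/mnt/pool", data="single")))
```

The first call gives "mkfs.btrfs -m raid1 -d raid0 /dev/sdX /dev/sdY". 
Building the argument list rather than executing it keeps the 
destructive mkfs step a deliberate, separate action.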

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
