Austin S. Hemmelgarn wrote:
On 2018-03-09 11:02, Paul Richards wrote:
Hello there,
I have a 3 disk btrfs RAID 1 filesystem, with a single failed drive.
Before I attempt any recovery I’d like to ask what is the recommended
approach? (The wiki docs suggest consulting here before attempting
recovery[1].)
The system is powered down currently and a replacement drive is being
delivered soon.
Should I use “replace”, or “add” and “delete”?
Once replaced should I rebalance and/or scrub?
I believe that the recovery may involve mounting in degraded mode. If
I do this, how do I later get out of degraded mode? Or, if it's
automatic, how do I determine when I'm out of degraded mode?
It won't automatically mount degraded; you either have to explicitly ask
it to, or you have to have an option to do so in your default mount
options for the volume in /etc/fstab (which is dangerous for multiple
reasons).
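For reference, the explicit version looks like this (the device path and
mount point are made-up examples, not from the original message):

    # one-off degraded mount, asked for explicitly
    mount -o degraded /dev/sdc /mnt/data

    # the dangerous fstab variant mentioned above would look like:
    # UUID=<volume-uuid>  /mnt/data  btrfs  defaults,degraded  0  0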
Now, as to what the best way to go about this is, there are three things
to consider:
1. Is the failed disk still usable enough that you can get good data off
of it in a reasonable amount of time? If you're replacing the disk
because of a lot of failed sectors, you can probably still get data off
of it, whereas after something like a head crash it isn't worth trying
to get data back.
2. Do you have enough room in the system itself to add another disk
without removing one?
3. Is the replacement disk at least as big as the failed disk?
If the answer to all three is yes, then just put in the new disk, mount
the volume normally (you don't need to mount it degraded if the failed
disk is working this well), and use `btrfs replace` to move the data.
This is the most efficient option in terms of time, and it is also
generally the safest (I personally always over-spec drive bays in
systems we build where I work specifically so that this approach can be
used).
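A rough sketch of that case (all device names and the mount point are
assumptions for illustration):

    # /dev/sdb is the failing-but-readable disk, /dev/sde its replacement
    mount /dev/sda /mnt/data
    btrfs replace start /dev/sdb /dev/sde /mnt/data
    btrfs replace status /mnt/data    # shows progress until the copy completes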
If the answer to the third question is no, put in the new disk (removing
the failed one first if the answer to the second question is no), mount
the volume (degraded if the answer to either of the first two questions
is no, normally otherwise), then add the new disk to the volume with
`btrfs device add` and remove the old one with `btrfs device delete`
(using the 'missing' option if you had to remove the failed disk). This
is needed because the replace operation requires the new device to be at
least as big as the old one.
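In command form (again, the device names are illustrative; drop the
-o degraded if nothing had to be pulled out of the system):

    mount -o degraded /dev/sda /mnt/data
    btrfs device add /dev/sde /mnt/data
    btrfs device delete /dev/sdb /mnt/data   # or: btrfs device delete missing /mnt/data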
If the answer to either one or two is no but the answer to three is yes,
pull out the failed disk, put in a new one, mount the volume degraded,
and use `btrfs replace` as well (you will need to specify the device ID
for the now missing failed disk, which you can find by calling `btrfs
filesystem show` on the volume). In the event that the replace
operation refuses to run in this case, instead add the new disk to the
volume with `btrfs device add` and then run `btrfs device delete
missing` on the volume.
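Something along these lines, where devid 2 and the device names are
assumptions for the example:

    mount -o degraded /dev/sdc /mnt/data
    btrfs filesystem show /mnt/data          # note the devid of the missing disk, say 2
    btrfs replace start 2 /dev/sde /mnt/data

    # fallback if replace refuses to run against the missing device:
    btrfs device add /dev/sde /mnt/data
    btrfs device delete missing /mnt/data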
If you follow any of the above procedures, you don't need to balance
(the replace operation is equivalent to a block-level copy and will
result in data being distributed exactly as it was before, while the
delete operation is a special type of balance), and you generally don't
need to scrub the volume either (though it may still be a good idea).
As far as getting back out of degraded mode, you can just remount the
volume, though I would generally suggest rebooting.
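The remount step is just a clean unmount and mount without the degraded
option (paths illustrative):

    umount /mnt/data
    mount /dev/sda /mnt/data    # no -o degraded this time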
Note that there are three other possible approaches to consider as well:
1. If you can't immediately get a new disk _and_ all the data will fit
on the other two disks, use `btrfs device delete` to remove the failed
disk anyway, and run with just the two until you can get a new disk.
This is far safer than running the volume degraded until you get a new
disk, and it is the only case in which you realistically should delete a
device before adding the new one. Make sure to balance the volume after
adding the new device (see the first sketch after this list).
2. Depending on the situation, it may be faster to just recreate the
whole volume from scratch using a backup than it is to try to repair it.
This is actually the absolute safest method of handling this
situation, as it makes sure that nothing from the old volume with the
failed disk causes problems in the future.
3. If you don't have a backup, but do have some temporary storage that
will fit all the data from the volume, you can use `btrfs restore` to
extract files from the old volume to temporary storage, recreate the
volume, and copy the data back in from the temporary storage (see the
second sketch after this list).
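For option 1, the shrink-now, grow-later sequence looks roughly like
this (device names are made up; 'missing' applies if the volume is
mounted degraded):

    btrfs device delete /dev/sdb /mnt/data   # or: btrfs device delete missing /mnt/data
    # ... run on two disks until the replacement arrives, then:
    btrfs device add /dev/sde /mnt/data
    btrfs balance start /mnt/data            # spread existing data across all three disks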
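For option 3, a hedged sketch, assuming /mnt/scratch has enough space
and the volume is rebuilt from /dev/sdc, /dev/sdd, and /dev/sde:

    btrfs restore /dev/sdc /mnt/scratch      # pull readable files off the old volume
    mkfs.btrfs -f -m raid1 -d raid1 /dev/sdc /dev/sdd /dev/sde
    mount /dev/sdc /mnt/data
    cp -a /mnt/scratch/. /mnt/data/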
I did a quick scan of the wiki just to see, but I did not find any good
information about how to recover a RAID-like set when it is degraded.
Information about how to recover, and which profiles can be recovered
from, would be good to have (with examples) in a separate how-to on the
wiki.