On 2016-11-29 14:03, Lionel Bouton wrote:
Hi,

On 29/11/2016 at 18:20, Florian Lindner wrote:
[...]

* Any other advice? ;-)

Don't rely on RAID too much... The degraded mode is unstable even for
RAID10: you can corrupt data simply by writing to a degraded RAID10. I
could reliably reproduce this on a 6-device RAID10 BTRFS filesystem
with a missing device. It affected even a 4.8.4 kernel, where our
PostgreSQL clusters got frequent write errors (on the fs itself, but not
on the 5 working devices) and managed to corrupt their data. Have
backups; you probably will need them.

With BTRFS RAID, if you have a failing device, replace it early (monitor
the devices, and don't wait for them to fail outright if you get transient
errors or see worrying SMART values). If you have a failed device, don't
actively use the filesystem in degraded mode. Replace it, or delete/add,
before writing to the filesystem again.
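For concreteness, a sketch of that workflow as shell helpers; the device paths, devid, and mount point here are made-up placeholders, and the helpers only print the commands (running them needs root and a real degraded filesystem):

```shell
#!/bin/sh
# Hypothetical sketch of the "replace, or delete/add" workflow described
# above. Prints the commands rather than executing them.

plan_replace() {
    # $1 = old device (a path, or the btrfs devid if it's already missing),
    # $2 = new device, $3 = mount point. -B keeps replace in the foreground.
    printf 'btrfs replace start -B %s %s %s\n' "$1" "$2" "$3"
}

plan_add_delete() {
    # Fallback for kernels without "btrfs replace": add the new device
    # first, then delete the missing one, so the filesystem spends as
    # little time as possible being written while short a device.
    # $1 = new device, $2 = mount point.
    printf 'btrfs device add %s %s\n' "$1" "$2"
    printf 'btrfs device delete missing %s\n' "$2"
}

plan_replace /dev/sdb /dev/sdf /mnt/data
plan_add_delete /dev/sdf /mnt/data
```

If the old device has already dropped out, you'd mount with -o degraded first and pass the devid (from btrfs filesystem show) instead of a path.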
This is an excellent point I didn't think of. If you don't have some way to monitor things, don't trust RAID (not just the BTRFS raid modes, but any RAID-like system in general). The only reason I'm willing to trust it is that I have really good monitoring set up (SMART status on the disks, daily scrubs, hourly event counter checks on the FS, watching for changes to filesystem flags, plus a couple of other things), which will e-mail me the moment something starts to go bad (and I've jumped through hoops to get the mailing to work under almost any circumstances, as long as userspace still exists and has network access).
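An hourly error-counter check like the one described above can be sketched roughly as follows; the mount point and mail address are placeholders, not from my actual setup ("btrfs device stats" prints one counter per line, e.g. "[/dev/sda].write_io_errs   0"):

```shell
#!/bin/sh
# Hypothetical cron job: mail the admin if any btrfs per-device error
# counter is nonzero. MOUNT and MAILTO are illustrative placeholders.

sum_errors() {
    # Sum the counter column of "btrfs device stats" output;
    # prints 0 for empty input.
    awk '{ total += $2 } END { print total + 0 }'
}

MOUNT=/mnt/data
MAILTO=admin@example.com

if command -v btrfs >/dev/null 2>&1; then
    errors=$(btrfs device stats "$MOUNT" 2>/dev/null | sum_errors)
    if [ "${errors:-0}" -gt 0 ]; then
        btrfs device stats "$MOUNT" | mail -s "btrfs errors on $MOUNT" "$MAILTO"
    fi
fi
```

A real setup would also reset or remember previous counter values (btrfs device stats -z zeroes them) so you get mailed about new errors rather than the same ones every hour.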

I can confirm though that things work well with BTRFS raid1 mode for at least the following:
 * Basic, mostly static network services (DHCP server, DNS relay, web server serving static content, a very low volume postfix installation, etc).
 * Moderate disk usage with very sequential access patterns (BOINC applications in my case, but almost anything that replaces files or appends in reasonably sized chunks semi-regularly falls into this).
 * Infrequent typical usage for software builds (I run Gentoo, so system updates mean building software, and I've never had any issues with this (at least, no issues because of BTRFS)).
 * Bulk sequential streaming of data (stuff like multimedia recordings).

In all cases except the last (which I've only had limited recent experience with), I've had BTRFS raid1 mode filesystems survive just fine through:
 * 3 bad PSUs (the common symptom being filesystem and storage device errors, traced down to the disks, at rates proportional to the overall load on the system)
 * 7 different storage devices going bad (1 catastrophic mechanical failure, 1 connector failure (a poor soldering job on the connector), 2 disk controller failures, and 3 media failures)
 * 2 intermittently bad storage controllers
 * 100+ kernel panics/crashes
All with no lasting data corruption (there was corruption, but BTRFS safely caught and repaired all of it, and it actually helped me diagnose two of the bad PSUs and one of the bad storage controllers). 90% of the reason it survived all this, though, is the monitoring I have in place, which let me track down exactly what was wrong and fix it before it became a real problem.
