On 2017-01-19 13:23, Roman Mamedov wrote:
On Thu, 19 Jan 2017 17:39:37 +0100
"Alejandro R. Mosteo" <alejan...@mosteo.com> wrote:
I was wondering, from a data-safety point of view, if there is any
difference between using dup or making a raid1 from two partitions on
the same disk. This is with an eye to having some protection against
the typical aging HDD that starts to develop bad sectors.
RAID1 will write slower compared to DUP, as any optimization to make RAID1
devices work in parallel turns into a total performance disaster for you here:
you end up trying to write to both partitions at the same time, turning
all linear writes into random ones, which are about two orders of magnitude
slower than linear writes on spinning hard drives. DUP shouldn't have this
issue, but it will still be half the speed of single, since you are writing
everything twice.
As of right now, there will actually be near-zero impact on write
performance (or at least, far less than the theoretical 50%), because
there really isn't any optimization to speak of in the multi-device
code. That will hopefully change over time, but it's not likely to
happen any time soon, since nobody appears to be working on
multi-device write performance.
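Roman's linear-versus-random point is still worth putting rough numbers on,
though, since that's the gap you'd be exposed to if the multi-device code
ever did start parallelizing writes. A back-of-the-envelope sketch in
Python; the throughput, IOPS, and I/O-size figures are assumed ballpark
values for a typical desktop drive, not measurements:

    # Why RAID1 across two partitions of one spinning disk hurts: the
    # head has to seek between the partitions, so sequential writes
    # degrade to random ones. All figures are assumed ballpark values
    # for a typical desktop HDD, not measurements.

    SEQ_MB_S    = 150.0   # assumed sequential write throughput
    RANDOM_IOPS = 100.0   # assumed seek-bound random-write IOPS
    IO_SIZE_KB  = 16.0    # assumed size of each random write

    random_mb_s = RANDOM_IOPS * IO_SIZE_KB / 1024.0   # ~1.6 MB/s
    print(f"sequential: {SEQ_MB_S:.0f} MB/s")
    print(f"random:     {random_mb_s:.1f} MB/s "
          f"(~{SEQ_MB_S / random_mb_s:.0f}x slower)")
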
You could consider DUP data for when a disk is already known to be getting bad
sectors from time to time -- but then it's a fringe exercise to try to keep
using such a disk in the first place. Yeah, with DUP data and DUP metadata you
can likely get some more life out of such a disk as throwaway storage space for
non-essential data, at half capacity, but is it worth the effort, given that
it's likely to keep failing progressively worse over time?
In all other cases the performance and storage-space penalties of DUP within a
single device are far too great (and the gained redundancy too low) compared
to a proper setup of single-profile data + backups, or a RAID5/6 system (not
Btrfs-based) + backups.
That really depends on your usage. In my case, I run DUP data on single
disks regularly. I still do backups of course, but the performance
matters far less to me (especially in the cases where I'm using NVMe
SSDs, which have performance measured in thousands of MB/s for both
reads and writes) than the ability to recover from transient data
corruption without needing to go to a backup.
As long as /home and any other write-heavy directories are on a separate
partition, I would actually advocate using DUP data on your root
filesystem if you can afford the space, simply because it's a whole lot
easier to recover other data if the root filesystem still works. Most
of the root filesystem, except some stuff under /var, follows a WORM
(write once, read many) access pattern, and even the stuff in /var that
doesn't is usually not performance-critical, so the write performance
penalty won't have anywhere near as much impact on how well the system
runs as you might think.
There's also the fact that you're writing more metadata than data most
of the time unless you're dealing with really big files, and metadata
already defaults to DUP (unless you are on an SSD), so the performance
hit isn't 50%; it's actually a bit more than half the ratio of data
writes to metadata writes.
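To make that last point concrete, here's a minimal sketch of the
arithmetic in Python; the data-to-metadata split is an assumed,
metadata-heavy example rather than a measured workload:

    # Effective cost of switching data from single to DUP when metadata
    # is already DUP. The data:metadata split below is an assumed,
    # metadata-heavy example (lots of small files); measure your own
    # workload for real numbers.

    data = 1.0   # assumed units of data written
    meta = 2.0   # assumed units of metadata written

    single_data = data + 2 * meta        # single data + DUP metadata = 5
    dup_data    = 2 * data + 2 * meta    # DUP data + DUP metadata    = 6

    print(f"extra device writes: {dup_data / single_data - 1:.0%}")        # 20%
    print(f"write throughput vs. single data: {single_data / dup_data:.0%}")  # 83%
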
On a related note, I see this caveat about dup in the manpage:
"For example, a SSD drive can remap the blocks internally to a single
copy thus deduplicating them. This negates the purpose of increased
redunancy (sic) and just wastes space"
That ability is vastly overestimated in the man page. There is no miracle
content-addressable storage system working at 500 MB/sec all within a cheap
little controller on SSDs. Most of what it can likely do is compress simple
stuff, such as runs of zeroes or other repeating byte sequences.
Most SSDs that do in-line compression don't implement it in firmware;
they implement it in hardware, and even DEFLATE can hit 500 MB/second
if properly implemented in hardware. The firmware may control how the
hardware works, but it's usually the hardware doing the heavy lifting
in that case, and getting a good ASIC made that can hit the required
performance point for a reasonable compression algorithm like LZ4 or
Snappy is insanely cheap once you've gotten past the VLSI work.
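The "simple stuff compresses trivially" point is easy to demonstrate in
software, which gives some idea of how cheap it is even before an ASIC
gets involved. A small Python sketch, with zlib standing in for whatever
the drive actually implements:

    import os
    import zlib

    # Repeating byte patterns collapse to almost nothing, while random
    # (or already compressed/encrypted) data doesn't shrink at all.
    # zlib is only a stand-in here for whatever the drive's firmware or
    # ASIC actually implements.

    zeroes = b"\x00" * (1 << 20)        # 1 MiB of zeroes
    noise  = os.urandom(1 << 20)        # 1 MiB of incompressible data

    print(len(zlib.compress(zeroes)))   # on the order of 1 KB
    print(len(zlib.compress(noise)))    # slightly over 1 MiB
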
And DUP mode is still useful on SSDs: in cases where one copy gets corrupted
in flight due to a bad controller, RAM, or cable, you can then restore that
block from the copy with a good CRC.
The only window of time during which bad RAM could result in only one
copy of a block being bad is after the first copy is written but before
the second is, which is usually an insanely small amount of time. As
far as the cabling goes, the window for errors resulting in a single bad
copy of a block is pretty much the same as for RAM, and if either is
persistently bad, you're more likely to lose data for other reasons.
That said, I do still feel that DUP mode has value on SSDs. The
primary arguments against it are:
1. It wears out the SSD faster.
2. The blocks are likely to end up in the same erase block, and
therefore there will be no benefit.
The first argument is accurate, but not usually an issue for most
people. Average life expectancy for a decent SSD is well over 10 years,
which is more than twice the usual life expectancy for a consumer hard
drive. Putting it in further perspective, my 575GB SSDs have been
running essentially 24/7 for the past year and a half (13112 hours
powered on now), and have seen just short of 25.7TB of writes over that
time. This equates to roughly 2GB/hour, which is well within typical
desktop usage, and it means they've seen more than 44.5 times their
total capacity in writes. Despite this, the wear-out indicators all
show that I can still expect at least 9 more years of run-time on these.
Normalizing that, I'm likely to see between 8 and 12 years of life out
of them. Equivalent stats for the HDDs I used to use (NAS-rated
Seagate drives) gave me a roughly 3-5 year life expectancy, less than
half that of the SSDs. In both cases, however, you're talking well
beyond the typical life expectancy of anything short of a server or a
tightly embedded system, and worrying about a 4-year versus 8-year life
expectancy on your storage device is kind of pointless when you need to
upgrade the rest of the system in 3 years.
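For anyone who wants to redo that arithmetic against their own drive's
SMART data, it boils down to this; the capacity, hours, and write totals
are the figures quoted above, while the endurance rating is an assumed
example value, not a number from my drives:

    # Redoing the wear arithmetic above. Capacity, powered-on hours, and
    # total writes are the figures quoted in the text; the TBW endurance
    # rating is an assumed example value, not a spec from my drives.

    capacity_gb = 575.0
    hours_on    = 13112.0
    written_tb  = 25.7
    tbw_rating  = 150.0   # assumed total-bytes-written endurance, in TB

    print(f"{written_tb * 1000 / hours_on:.1f} GB/hour")        # ~2.0
    print(f"{written_tb * 1000 / capacity_gb:.1f}x capacity")   # ~44.7
    years_so_far = hours_on / (24 * 365)
    years_total  = years_so_far * tbw_rating / written_tb
    print(f"projected life at this rate: ~{years_total:.0f} years")  # ~9
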
As far as the second argument goes, it is partially correct, but it
ignores an important factor that many people who don't do hardware
design (and some who do) don't often consider. The close temporal
proximity of the writes for each copy is likely to mean they end up in
the same erase block on the SSD (especially if the SSD has a large write
cache). However, that doesn't mean that one copy getting corrupted due
to device failure is guaranteed to corrupt the other. The reason for
this is exactly the same reason that single-word errors in RAM are
exponentially more common than losing a whole chip or the whole memory
module: the primary error source is environmental noise (EMI, cosmic
rays, quantum interference, background radiation, etc.), not outright
hardware failure. In other words, you're far more likely to lose a
single cell (which usually holds no more than a single byte in the MLC
flash used in most modern SSDs) in the erase block than the whole erase
block. In that event, you obviously only get corruption in the
particular filesystem block that that cell was storing data for.
There's also a third argument against using DUP on SSDs, however:
the SSD already does most of the data integrity work itself.
This is only true of good SSDs, but many do have some degree of
built-in erasure coding in the firmware which can handle losing large
chunks of an erase block and still return the data safely. This is part
of the reason you almost never see nice power-of-two sizes for flash
storage despite the flash chips themselves being made that way (the
other part is the spare blocks). Depending on the degree of protection
provided by this erasure coding, it can actually cancel out my rebuttal
of argument 2 above. In all practicality though, for it to be a valid
counter-argument you have to trust the SSD manufacturer to have
implemented things properly, and most people who care enough about data
integrity to use BTRFS for that reason are not likely to trust the
storage device that much.
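For what it's worth, that "no nice power-of-two sizes" observation is
just over-provisioning arithmetic; a quick sketch with assumed example
figures for a hypothetical drive:

    # Why SSD capacities aren't powers of two: part of the raw flash is
    # held back for spare blocks and ECC / erasure-coding overhead. The
    # figures below are assumed examples for a hypothetical drive, not
    # the specs of any real product.

    raw_flash_gib = 512                      # power-of-two raw NAND
    advertised_gb = 480                      # what the label says

    raw_gb   = raw_flash_gib * 2**30 / 1e9   # ~549.8 GB
    reserved = raw_gb - advertised_gb        # ~69.8 GB
    print(f"reserved: {reserved:.0f} GB ({reserved / raw_gb:.0%} of raw flash)")
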