On 13/12/2010 15:17, cwillu wrote:

In a few weeks parts for my new computer will be arriving. The storage
will be a 128GB SSD. A few weeks after that I will order three large
disks for a RAID array. I understand that BTRFS RAID 5 support will be
available shortly. What is the best possible way for me to get the
highest performance out of this setup? I know of the option to optimize
for SSDs.

BTRFS is hardly the best option for SSDs. I typically use ext4
without a journal on SSDs, or ext2 if that is not available.
Journalling causes more writes to hit the disk, which wears out
flash faster. Plus, SSDs typically have much slower writes than
reads, so avoiding writes is a good thing.

Gordan, what you wrote is so wrong I don't even know where to begin.

You'd better google a bit on the subject (ssd, and btrfs on ssd) as much
is written about it already.

I suggest you back your opinion up with some hard data before making such
statements. Here's a quick test: make an ext2 fs and a btrfs fs on two similar
disk partitions (any disk; for the sake of the experiment it doesn't have to
be an SSD), then check vmstat -d to get a baseline. Then put the kernel
sources on each of them, do a full build, then make clean, and check
vmstat -d again. See how many writes (sectors) hit the disk with ext2 and
how many with btrfs. You'll find that there were many more writes with
btrfs. You can't go faster when doing more. Journaling is expensive.
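
For anyone who wants to repeat the test above without reading the
vmstat -d columns by eye, here is a rough sketch that reads the same
counters from /proc/diskstats. The helper names and the device names
(sdb1, sdb2) are made up for illustration; substitute the partitions
that actually hold the two filesystems.

```python
def sectors_written(diskstats_text, device):
    """Return the cumulative 'sectors written' counter for one device.

    Each /proc/diskstats line is laid out as:
    major minor name reads reads_merged sectors_read ms_reading
    writes writes_merged sectors_written ...
    so the counter of interest is the tenth whitespace-separated field.
    """
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[2] == device:
            return int(fields[9])
    raise ValueError(f"device {device!r} not found")


def read_diskstats(path="/proc/diskstats"):
    """Take a snapshot of the kernel's per-device I/O counters."""
    with open(path) as f:
        return f.read()


def write_delta(before_text, after_text, device):
    """Sectors written to `device` between two snapshots."""
    return (sectors_written(after_text, device)
            - sectors_written(before_text, device))


# Hypothetical usage (device names are assumptions):
#   before = read_diskstats()
#   ... unpack the kernel sources, build, make clean, then run sync ...
#   after = read_diskstats()
#   print("ext2 :", write_delta(before, after, "sdb1"))
#   print("btrfs:", write_delta(before, after, "sdb2"))
```

Run sync before taking the second snapshot, otherwise writes still
sitting in the page cache will not be counted.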

Of course.  But that applies to rotating media as well (where the
seeks involved hurt much more), and has little if anything to do with
why you would use btrfs instead of ext2.

Indeed - btrfs is about features, most specifically the checksumming that allows smart recovery from disk media failure. But on flash, write volumes are something that shouldn't be ignored.

Good ssd drives (by which I mean anything but consumer flash as it
exists on sd cards and usb sticks) have very good wear leveling, good
enough that you could overwrite the same logical sector billions of
times before you'd experience any failure due to wear.

It comes down to volumes even in the best case scenario. A _very_ good SSD (e.g. Intel) might get write amplification down to about 1.2:1, but more typical figures are in the region of 10-20:1. Every write that can be avoided, should be avoided.
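
To put rough numbers on that, here is a back-of-the-envelope sketch. The 128GB capacity is the OP's drive and the amplification figures are the ones above; the erase-cycle rating is a placeholder assumption, not a figure for any particular drive, and the model assumes perfect wear leveling, which real drives will not achieve.

```python
def flash_bytes_written(host_bytes, write_amplification):
    """Amount physically written to flash for a given amount of host writes."""
    return host_bytes * write_amplification


def host_writes_until_worn(capacity, pe_cycles, write_amplification):
    """Host writes (same unit as `capacity`) before the rated erase cycles
    are used up, assuming wear is spread perfectly over the whole device."""
    return capacity * pe_cycles / write_amplification


CAPACITY_GB = 128          # the OP's SSD
ASSUMED_PE_CYCLES = 5000   # placeholder assumption, not a datasheet value

for wa in (1.2, 10, 20):
    flash = flash_bytes_written(10, wa)
    budget = host_writes_until_worn(CAPACITY_GB, ASSUMED_PE_CYCLES, wa)
    print(f"WA {wa}:1  10 GB of host writes -> {flash:.0f} GB on flash, "
          f"host write budget ~{budget:.0f} GB")
```

Under these assumptions, going from 1.2:1 to 20:1 shrinks the host write budget by a factor of roughly 17, which is why the volume of writes matters even with good wear leveling.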

The issues
with cheaper ssd drives (which I distinguish from things like sd
cards) are uniformly performance degradation due to crappy garbage
collection and lack of trim support to compensate.  A journal is _not_
a problem here.

The journal doesn't help. It can cause more than a 50% overhead on metadata-heavy operations.

On crappy flash, yes, you want to avoid a journal, mainly because the
wear leveling for a given sector only occurs over a fixed small
number of erase blocks, resulting in a filesystem that you can burn
out quite easily — I have a small pile of sd cards on my desk that I
sent to such a fate.  Even here there is reason to use btrfs.  The
journaling performed is much less strenuous than ext3/4:  it's
basically just a version stamp, as opposed to actually journaling the
metadata involved.  The actual metadata writes, being copy-on-write,
provide pretty much the best case for crappy flash, as cow inherently
wear-levels over the entire device (ssd_spread).  To say nothing of
checksums and duplicated metadata, allowing you to actually determine
if you're running into corrupted metadata, and often recover from it
transparently.  Ext2's behavior in this respect is less than ideal.

I'm not disputing that, but the OP was talking about using the SSD as a cache for a slower disk subsystem. That is likely to waste the SSD pretty quickly purely by volume of writes, regardless of how good the wear leveling is. That may be fine on a setup where the SSD is treated as disposable throw-away cache item that doesn't lose you data when it goes wrong, but what was being discussed isn't an expensive enterprise grade setup that behaves that way.

Gordan