On 2016-07-22 12:06, Sanidhya Solanki wrote:
On Fri, 22 Jul 2016 10:58:59 -0400
"Austin S. Hemmelgarn" <ahferro...@gmail.com> wrote:

On 2016-07-22 09:42, Sanidhya Solanki wrote:
+*stripesize=<number>*;;
+Specifies the new stripe size for a filesystem instance. Multiple BTRFS
+filesystems mounted in parallel with varying stripe sizes are supported; the
+only limitation is that the stripe size passed to balance in this option must
+be a multiple of 512 bytes, greater than 512 bytes, and no larger than
+16 KiB. These limitations exist in the user's best interest, as sizes that are
+too large or too small lead to performance degradation on modern devices.
+
+It is recommended that the user try various sizes to find the one that best
+suits the performance requirements of the system. This option renders the RAID
+instance incompatible with previous kernel versions, because this operation is
+implemented through FS metadata.
+
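Concretely, the limits described in that hunk amount to a check along these lines (just a sketch with made-up names, not the actual patch code):

#include <stdbool.h>
#include <stdint.h>

#define SZ_512	512ULL
#define SZ_16K	(16ULL * 1024)

/* Hypothetical validation matching the documented limits: a multiple of
 * 512 bytes, greater than 512 bytes, and at most 16 KiB. */
static bool stripesize_is_valid(uint64_t stripesize)
{
	if (stripesize <= SZ_512)
		return false;
	if (stripesize % SZ_512)
		return false;
	if (stripesize > SZ_16K)
		return false;
	return true;
}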
I'm actually somewhat curious to see numbers for sizes larger than 16k.
In most cases, that cutoff will probably fall either above or below the
point at which performance actually starts suffering.  On a set of fast SSDs,
it's almost certainly below the turnover point (I can't give an
opinion on BTRFS, but for DM-RAID, the point at which performance starts
degrading significantly is actually 64k on the SSDs I use), while on a
set of traditional hard drives, it may be as low as 4k (yes, I have
actually seen systems where this is the case).  I think that we should
warn about sizes larger than 16k, not refuse to use them, especially
because the point of optimal performance will shift when we get proper
I/O parallelization.  Or, better yet, warn about changing this at all,
and assume that if the user continues they know what they're doing.
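To make that concrete, what I have in mind is roughly warn-and-proceed rather than a hard error; a sketch with made-up names, not the real option parsing:

#include <stdint.h>
#include <stdio.h>

#define SZ_512	512ULL
#define SZ_16K	(16ULL * 1024)

/* Sketch: only reject values that are outright invalid (not a multiple of
 * 512, or not larger than 512); merely warn about sizes above 16 KiB so
 * that users who have benchmarked their hardware are not blocked. */
static int check_stripesize(uint64_t stripesize)
{
	if (stripesize <= SZ_512 || stripesize % SZ_512) {
		fprintf(stderr,
			"ERROR: stripe size must be a multiple of 512 bytes and larger than 512 bytes\n");
		return -1;
	}
	if (stripesize > SZ_16K)
		fprintf(stderr,
			"WARNING: stripe sizes above 16 KiB may hurt performance on some devices; make sure you have benchmarked this\n");
	return 0;
}

The exact wording doesn't matter much; the point is that only genuinely invalid values get rejected, and everything else is the user's call.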

I agree with you, from a limited point of view. Your considerations are
relevant to a broader, more general set of circumstances.

My consideration is the worst-case scenario, particularly on SSDs, where,
say, you pick 8 KiB or 16 KiB, write out all your data, then delete a
block, which will have to be read-erased-rewritten at a multi-page level,
with pages usually 4 KiB in size.
I don't know what SSDs you've been looking at, but the erase block size on all of the modern NAND MLC-based SSDs I've seen is between 1 and 8 megabytes, so this would lead to at most a single erase block being rewritten. Even most of the NAND SLC-based SSDs I've seen have at least a 64k erase block. Overall, the only case where this is reasonably going to lead to a multi-page rewrite is when the filesystem isn't properly aligned, which is not a likely situation for most people.
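To put a number on the alignment point: with a typical multi-megabyte erase block, a 16 KiB write only straddles two erase blocks when it starts within 16 KiB of an erase-block boundary. A small illustration (the 4 MiB erase block size here is an assumption, purely for the arithmetic):

#include <stdint.h>
#include <stdio.h>

/* Illustrative arithmetic: how many erase blocks does a write of `len`
 * bytes starting at `offset` touch?  The sizes below are assumptions. */
static uint64_t erase_blocks_touched(uint64_t offset, uint64_t len,
				     uint64_t erase_block)
{
	uint64_t first = offset / erase_block;
	uint64_t last = (offset + len - 1) / erase_block;

	return last - first + 1;
}

int main(void)
{
	const uint64_t erase_block = 4ULL * 1024 * 1024;	/* 4 MiB */
	const uint64_t stripe = 16ULL * 1024;			/* 16 KiB */

	/* Aligned case: the whole stripe sits inside one erase block. */
	printf("aligned:   %llu erase block(s)\n",
	       (unsigned long long)erase_blocks_touched(0, stripe, erase_block));

	/* Misaligned case: starting 8 KiB before an erase-block boundary. */
	printf("straddles: %llu erase block(s)\n",
	       (unsigned long long)erase_blocks_touched(erase_block - 8 * 1024,
							stripe, erase_block));
	return 0;
}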

On HDDs, this will make the problem of fragmentation even worse. There, I
would only recommend setting the stripe block size to the device block size
(usually 4 KiB native, 512 B emulated), but this is just me focusing on the
worst-case scenario.
And yet, software RAID implementations do fine with larger stripe sizes. On my home server, I'm using BTRFS in RAID1 mode on top of LVM-managed DM-RAID0 volumes, and I have actually gone through and tested every power-of-two stripe size for the DM-RAID volumes in this configuration, from 1k up to 64k. I get peak performance with a 16k stripe size, and performance actually falls off faster at smaller sizes than it does at larger ones (at least within the range I checked). I've seen similar results on all the server systems I manage for work as well, so it's not just consumer hard drives that behave like this.
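For anyone who wants to reproduce that kind of test: rebuild the DM-RAID volume with each candidate stripe size and time the same workload on top of it. A minimal direct-I/O write-timing loop looks roughly like this (a rough sketch; the path and sizes are placeholders, and a proper tool such as fio is the better choice for real measurements):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Rough sketch: time COUNT random direct writes of BLOCK bytes to a test
 * file on the volume under test.  /mnt/test/bench is a placeholder path. */
#define BLOCK	(16 * 1024)
#define COUNT	4096

int main(void)
{
	struct timespec start, end;
	void *buf;
	double secs;
	int fd, i;

	if (posix_memalign(&buf, 4096, BLOCK))
		return 1;
	memset(buf, 0xab, BLOCK);

	fd = open("/mnt/test/bench", O_CREAT | O_WRONLY | O_DIRECT, 0600);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (i = 0; i < COUNT; i++) {
		off_t off = (off_t)(rand() % COUNT) * BLOCK;

		if (pwrite(fd, buf, BLOCK, off) != BLOCK) {
			perror("pwrite");
			return 1;
		}
	}
	fsync(fd);
	clock_gettime(CLOCK_MONOTONIC, &end);

	secs = (end.tv_sec - start.tv_sec) + (end.tv_nsec - start.tv_nsec) / 1e9;
	printf("%.2f MiB/s\n", COUNT * (double)BLOCK / (1024 * 1024) / secs);

	close(fd);
	free(buf);
	return 0;
}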

Maybe I will add these warnings in a follow-on patch, if others agree
with these statements and concerns.
The other part of my issue with this, which I forgot to state, is that two types of people are likely to use this feature:
1. Those who actually care about performance and are willing to test multiple configurations to find an optimal one.
2. Those who claim to care about performance, but either just twiddle things randomly or blindly follow advice from others without really knowing for certain what they're doing.
The only people settings like this actually help to a reasonable degree are in the first group. Putting an upper limit on the stripe size caters to protecting the second group (who shouldn't be using this to begin with) at the expense of the first group. This doesn't affect data safety (or at least, it shouldn't); it only impacts performance, and the system is still usable even if this is set poorly, so the value of trying to make it resistant to stupid users is not all that great.

Additionally, unless you have numbers to back up 16k being the practical maximum on most devices, it's really just an arbitrary number, which is something that should be avoided in management tools.