Hey all, I'm going to warn you beforehand: this message is a technical and academic discussion of the inner workings of md-RAID and file systems. If you haven't had your morning coffee or don't want a headache, please stop reading now. :)
If you're still here: I've been trying to work out the optimal chunk size, stripe width, and stride for a 6TB RAID-5 array I'm building. For hardware, I've got 4x 1.5TB Samsung SATA2 drives, which I'm going to run under Linux md in a RAID-5 configuration. The primary use for this box is HD video and DVD storage, so for argument's sake let's say that of the usable 4.5TB, 4TB is for large files of 8GB and up. I also plan on using either ext4 or xfs.

One last thing to get out of the way is the meaning of all the block sizes. Unfortunately, people tend to use “block size” to mean many different things, so to prevent confusion I'm going to use the following:

Stride – number of bytes written to a disk before moving to the next disk in the array.
Stripe width – stride size * number of data disks in the array (3 in my case).
Chunk size – the file system “block size” (its allocation unit), not to be confused with bytes-per-inode.
Page size – the Linux kernel's cache page size, almost always 4KB on x86 hardware.

Now comes the fun part: picking the correct values for creating the array and the file system. The arguments here are fairly academic and very specific to the intended use. (For concreteness, I've sketched the commands I'm leaning towards at the bottom of this mail.)

Typically, most people go for “position” optimization by picking an FS chunk size that matches the RAID stripe width. By matching the array, you reduce the number of read/write operations needed to access each file. While this works in theory, you can't guarantee that the stripe is written perfectly across the array, and unless your chunk size matches your page size, the operation isn't atomic anyway.

The other method is “transfer” optimization, where you make the FS chunk sizes smaller, ensuring that files are spread across the array. The theory here is that reading a file from more than one drive at a time will increase transfer performance. This, however, increases the number of read/write operations needed for a file of the same size compared with larger chunks.

Things get even more fun when LVM is thrown into the mix, since LVM creates a physical volume that contains logical volumes. The FS is then put on the LV, so trying to align the FS directly to the array no longer makes sense. You can, however, set the metadata size for the PV so that its data area is aligned with the array; the assumption then is that the FS should be aligned to the PV instead.

While this may all seem like a bit much, getting it right can mean an extra 30-50MB/s or more from the array. So, has anyone done this type of optimization? I'd really rather not spend a week (or weeks) testing different values, as 6TB arrays can take several hours to build.

Cheers,
sV
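
P.S. For concreteness, here's roughly what I'm leaning towards for the array and file system. This is a sketch, not something I've benchmarked: the device names are placeholders and the 512KB md chunk is just an assumption to make the arithmetic concrete. (Note that mdadm and mke2fs use “chunk” and “stride” in their own sense, which doesn't quite match my definitions above.)

    # create the array with a 512KB per-disk chunk (mdadm's --chunk is in KB)
    mdadm --create /dev/md0 --level=5 --raid-devices=4 --chunk=512 /dev/sd[bcde]1

    # ext4: stride = 512KB chunk / 4KB block = 128 blocks,
    #       stripe-width = 128 * 3 data disks = 384 blocks
    # (some mke2fs versions spell it stripe_width)
    mkfs.ext4 -b 4096 -E stride=128,stripe-width=384 /dev/md0

    # or xfs: stripe unit = md chunk, stripe width = number of data disks
    mkfs.xfs -d su=512k,sw=3 /dev/md0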
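
And if LVM ends up in the mix, something along these lines should keep the PV's data area aligned to a full stripe (3 * 512KB = 1536KB under the same assumption; --dataalignment needs a reasonably recent LVM2, otherwise the older trick is to pick a --metadatasize that rounds the data start up to a stripe boundary). The VG/LV names are just placeholders:

    # align the start of the PV data area to one full stripe
    pvcreate --dataalignment 1536k /dev/md0
    vgcreate vg_media /dev/md0
    lvcreate -n lv_video -L 4T vg_media

    # the FS on the LV then gets the same stripe geometry
    mkfs.xfs -d su=512k,sw=3 /dev/vg_media/lv_video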