Hey all, I'm going to warn you beforehand: this message is a technical and academic discussion of the inner workings of md-RAID and file systems. If you haven't had your morning coffee or don't want a headache, please stop reading now. :)
If you're still here: I've been trying to work out the optimal chunk size, stripe width, and stride for a 6TB RAID-5 array I'm building. For hardware, I've got 4x 1.5TB Samsung SATA2 drives, which I'm going to run under Linux md in a RAID-5 configuration. The primary use for this box is HD video and DVD storage, so for argument's sake let's say that of the usable 4.5TB, 4TB is for large files of 8GB and up. I also plan on using either ext4 or xfs.

One last thing to get out of the way is the meaning of all the block sizes. Unfortunately, people tend to use “block size” to mean many different things, so to prevent confusion I'm going to use the following:

Stride – number of bytes written to a disk before moving to the next disk in the array.
Stripe width – stride size * number of data disks in the array (3 in my case).
Chunk size – the file system “block size” (its allocation unit), not to be confused with bytes-per-inode.
Page size – the Linux kernel's cache page size, almost always 4KB on x86 hardware.

Now comes the fun part: picking the correct values for creating the array and the file system. The arguments here are fairly academic and very specific to the intended use. (For concreteness, I've sketched the commands I'm leaning towards at the bottom of this mail.)

Typically, most people go for “position” optimization by picking an FS chunk size that matches the RAID stripe width. By matching the array, you reduce the number of read/write operations needed to access each file. While this works in theory, you can't guarantee that the stripe is written perfectly across the array, and unless your chunk size matches your page size, the operation isn't atomic anyway.

The other method is “transfer” optimization, where you make the FS chunk sizes smaller, ensuring that files are spread across the array. The theory here is that reading a file from more than one drive at a time will increase transfer performance. This, however, increases the number of read/write operations needed for a file of the same size compared with larger chunks.

Things get even more fun when LVM is thrown into the mix, since LVM creates a physical volume that contains logical volumes. The FS is then put on the LV, so trying to align the FS directly to the array no longer makes sense. You can, however, set the metadata size for the PV so that its data area is aligned with the array; the assumption then is that the FS should be aligned to the PV instead.

While this may all seem like a bit much, getting it right can mean an extra 30-50MB/s or more from the array. So, has anyone done this type of optimization? I'd really rather not spend a week (or weeks) testing different values, as 6TB arrays can take several hours to build.

Cheers,
sV
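
P.S. For concreteness, here's roughly what I'm leaning towards for the array and file system. This is a sketch, not something I've benchmarked: the device names are placeholders and the 512KB md chunk is just an assumption to make the arithmetic concrete. (Note that mdadm and mke2fs use “chunk” and “stride” in their own sense, which doesn't quite match my definitions above.)

    # create the array with a 512KB per-disk chunk (mdadm's --chunk is in KB)
    mdadm --create /dev/md0 --level=5 --raid-devices=4 --chunk=512 /dev/sd[bcde]1

    # ext4: stride = 512KB chunk / 4KB block = 128 blocks,
    #       stripe-width = 128 * 3 data disks = 384 blocks
    # (some mke2fs versions spell it stripe_width)
    mkfs.ext4 -b 4096 -E stride=128,stripe-width=384 /dev/md0

    # or xfs: stripe unit = md chunk, stripe width = number of data disks
    mkfs.xfs -d su=512k,sw=3 /dev/md0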
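
And if LVM ends up in the mix, something along these lines should keep the PV's data area aligned to a full stripe (3 * 512KB = 1536KB under the same assumption; --dataalignment needs a reasonably recent LVM2, otherwise the older trick is to pick a --metadatasize that rounds the data start up to a stripe boundary). The VG/LV names are just placeholders:

    # align the start of the PV data area to one full stripe
    pvcreate --dataalignment 1536k /dev/md0
    vgcreate vg_media /dev/md0
    lvcreate -n lv_video -L 4T vg_media

    # the FS on the LV then gets the same stripe geometry
    mkfs.xfs -d su=512k,sw=3 /dev/vg_media/lv_video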