On 2/17/09 11:52 PM, "Rajesh Kumar Mallah" <mallah.raj...@gmail.com> wrote:
> the raid10 volume was benchmarked again, taking into consideration the above points.
>
> Effect of ReadAhead Settings: disabled, 256 (default), 512, 1024
>
> xfs_ra0      414741 ,  66144
> xfs_ra256    403647 , 545026     (all tests on sda6)
> xfs_ra512    411357 , 564769
> xfs_ra1024   404392 , 431168
>
> looks like 512 was the best setting for this controller

Try 4096 or 8192 (or, just to see, 32768); with a sufficient readahead value you
should get numbers very close to a raw partition with xfs.  It is controller
dependent for sure, but I usually see a "small peak" in performance at 512 or
1024, followed by a dip, then a larger peak and plateau somewhere near
(# of drives * the small peak).  The higher quality the controller, the less
you need to fiddle with this.

I use a script that runs fio benchmarks with the following profiles, at
readahead values from 128 to 65536 (a rough sketch of that loop is at the very
end of this message).  The single-reader STR test peaks at a smaller readahead
value than the concurrent-reader ones (2 or 8 concurrent sequential readers),
and the mixed random/sequential read loads become more biased toward sequential
transfer (and thus higher overall throughput in bytes/sec) with larger
readahead values.  The choice between the cfq and deadline schedulers, however,
affects the priority of random vs. sequential reads more than readahead does --
cfq favors random access because it divides I/O by time slice.  The FIO
profiles I use for benchmarking are at the end of this message.

> Considering these two figures
>
> xfs25   350661 , 474481   (/dev/sda7)
> 25xfs   404291 , 547672   (/dev/sda6)
>
> looks like the beginning of the drives are 15% faster than the ending
> sections.  considering this, is it worth creating a special tablespace at
> the beginning of the drives?

For SAS drives, it's typically a ~15% to 25% degradation (the last 5% is
definitely slow).  For SATA 3.5" drives, the last 5% has about half the STR of
the front.  Graphs about halfway down this page show what it looks like for a
typical SATA drive:
http://www.tomshardware.com/reviews/Seagate-Barracuda-1-5-TB,2032-5.html
And a couple of figures for some SAS drives here:
http://www.storagereview.com/ST973451SS.sr?page=0%2C1

> > If testing STR, you will also want to tune the block device read ahead
> > value (example: /sbin/blockdev --getra /dev/sda6).  This has very large
> > impact on sequential transfer performance (and no impact on random access).
> > How large of an impact depends quite a bit on what kernel you're on, since
> > the readahead code has been getting better over time and requires less
> > tuning.  But it still defaults out of the box to settings more optimal for
> > a single drive than for RAID.
> > For SAS, try 256 or 512 * the number of effective spindles (spindles * 0.5
> > for RAID 10).  For SATA, try 1024 or 2048 * the number of effective
> > spindles.  The value is in blocks (512 bytes).  There is documentation on
> > the blockdev command, and here is a little write-up I found with a couple
> > web searches:
> > http://portal.itauth.com/2007/11/20/howto-linux-double-your-disk-read-performance-single-command
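For example (device name is just for illustration; the value is in 512-byte
sectors, so 8192 = 4MB):

  # check the current readahead setting
  /sbin/blockdev --getra /dev/sda6
  # set readahead to 8192 sectors (4MB), then re-run the sequential read tests
  /sbin/blockdev --setra 8192 /dev/sda6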
FIO benchmark profile examples (long, posting here for the archives):

*Read benchmarks, sequential:

[read-seq]
; one sequential reader reading one 64g file
rw=read
size=64g
directory=/data/test
fadvise_hint=0
blocksize=8k
direct=0
ioengine=sync
iodepth=1
numjobs=1
nrfiles=1
runtime=1m
group_reporting=1
exec_prerun=echo 3 > /proc/sys/vm/drop_caches

[read-seq]
; two sequential readers, each concurrently reading a 32g file, for a total of 64g max
rw=read
size=32g
directory=/data/test
fadvise_hint=0
blocksize=8k
direct=0
ioengine=sync
iodepth=1
numjobs=2
nrfiles=1
runtime=1m
group_reporting=1
exec_prerun=echo 3 > /proc/sys/vm/drop_caches

[read-seq]
; eight sequential readers, each concurrently reading an 8g file, for a total of 64g max
rw=read
size=8g
directory=/data/test
fadvise_hint=0
blocksize=8k
direct=0
ioengine=sync
iodepth=1
numjobs=8
nrfiles=1
runtime=1m
group_reporting=1
exec_prerun=echo 3 > /proc/sys/vm/drop_caches

*Read benchmarks, random 8k reads:

[read-rand]
; random access on 2g file by a single reader, best case scenario.
rw=randread
size=2g
directory=/data/test
fadvise_hint=0
blocksize=8k
direct=0
ioengine=sync
iodepth=1
numjobs=1
nrfiles=1
group_reporting=1
runtime=1m
exec_prerun=echo 3 > /proc/sys/vm/drop_caches

[read-rand]
; 8 concurrent random readers, each to its own 1g file
rw=randread
size=1g
directory=/data/test
fadvise_hint=0
blocksize=8k
direct=0
ioengine=sync
iodepth=1
numjobs=8
nrfiles=1
group_reporting=1
runtime=1m
exec_prerun=echo 3 > /proc/sys/vm/drop_caches

*Mixed load:

[global]
; one random reader concurrently with one sequential reader.
directory=/data/test
fadvise_hint=0
blocksize=8k
direct=0
ioengine=sync
iodepth=1
runtime=1m
exec_prerun=echo 3 > /proc/sys/vm/drop_caches

[seq-read]
rw=read
size=64g
numjobs=1
nrfiles=1

[read-rand]
rw=randread
size=1g
numjobs=1
nrfiles=1

[global]
; four sequential readers concurrent with four random readers
directory=/data/test
fadvise_hint=0
blocksize=8k
direct=0
ioengine=sync
iodepth=1
runtime=1m
group_reporting=1
exec_prerun=echo 3 > /proc/sys/vm/drop_caches

[read-seq]
rw=read
size=8g
numjobs=4
nrfiles=1

[read-rand]
rw=randread
size=1g
numjobs=4
nrfiles=1

*Write tests:

[write-seq]
rw=write
size=32g
directory=/data/test
fadvise_hint=0
blocksize=8k
direct=0
ioengine=sync
iodepth=1
numjobs=1
nrfiles=1
runtime=1m
group_reporting=1
end_fsync=1

[write-rand]
rw=randwrite
size=32g
directory=/data/test
fadvise_hint=0
blocksize=8k
direct=0
ioengine=sync
; overwrite=1 is MANDATORY for xfs, otherwise the writes are sparse random
; writes and can slow performance to near zero.  Postgres only does random
; re-writes, never sparse random writes.
overwrite=1
iodepth=1
numjobs=1
nrfiles=1
group_reporting=1
runtime=1m
end_fsync=1
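And for reference, a minimal sketch of the wrapper loop mentioned above (not
the actual script; the device and job file names are placeholders to adjust
for your setup):

  #!/bin/sh
  # Sweep the block device readahead value and run one fio job file per setting.
  DEV=/dev/sda6        # device under test (placeholder)
  for RA in 128 256 512 1024 2048 4096 8192 16384 32768 65536; do
      /sbin/blockdev --setra $RA $DEV
      echo 3 > /proc/sys/vm/drop_caches      # start each run with a cold page cache
      fio --output=read-seq.ra$RA.out read-seq.fio
  done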