>From: Mikael Carneholm <[EMAIL PROTECTED]>
>Sent: Jul 16, 2006 6:52 PM
>To: [email protected]
>Subject: [PERFORM] RAID stripe size question
>
>I have finally gotten my hands on the MSA1500 that we ordered some time
>ago. It has 28 x 10K 146Gb drives,
>
Unless I'm missing something, the only FC or SCSI HDs of ~147GB capacity are
15K, not 10K (unless they are older drives?).
I'm not just being pedantic. The correct, let alone optimal, answer to your
question depends on your exact HW characteristics as well as your SW config and
your usage pattern.
15Krpm HDs will have average access times of 5-6ms; 10Krpm ones, 7-8ms.
Most modern HDs in this class will do ~60MB/s on inner tracks, ~75MB/s on
average, and ~90MB/s on outer tracks.
If you are doing OLTP-like things, you are more sensitive to latency than most
and should use the absolute lowest-latency HDs available within your budget.
The current best case for latency is 15Krpm FC HDs.
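As a back-of-the-envelope check on those access times (the seek figures below
are assumed typical values, not the specs of any particular drive): average
rotational latency is half a revolution, so

    # Average access time ~= average seek + average rotational latency.
    # Rotational latency averages half a revolution.
    def avg_access_ms(rpm, avg_seek_ms):
        half_rev_ms = (60.0 / rpm) * 1000.0 / 2.0  # ms for half a revolution
        return avg_seek_ms + half_rev_ms

    print(avg_access_ms(15000, 3.5))  # ~5.5ms, in the 5-6ms range above
    print(avg_access_ms(10000, 4.5))  # ~7.5ms, in the 7-8ms range above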
>currently grouped as 10 (for wal) + 18 (for data). There's only one controller
>(an emulex), but I hope
>performance won't suffer too much from that. Raid level is 0+1,
>filesystem is ext3.
>
I strongly suspect having only 1 controller is an I/O choke point w/ 28 HDs.
The 28 HDs set up as above in two RAID 10's give ~75MB/s * 5 mirror pairs =
~375MB/s for the 10-disk set and ~75MB/s * 9 = ~675MB/s for the 18-disk set.
If both sets are to run at peak average speed, the Emulex would have to be able
to handle ~1050MB/s on average.
It is doubtful the 1 Emulex can do this.
In order to handle this level of bandwidth, a RAID controller must aggregate
multiple FC, SCSI, or SATA streams as well as do any RAID 5 checksumming etc.
that is required.
Very, very few RAID controllers can do >= 1GB/s.
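Putting rough numbers on that (a quick sketch; the per-drive figure is the
~75MB/s average from above, and the controller limit is just an assumed
example, not a measured Emulex spec):

    # Aggregate sequential bandwidth of the two RAID 10 sets vs. one controller.
    per_drive_mb_s = 75             # assumed avg sustained MB/s per drive
    wal_pairs, data_pairs = 5, 9    # mirror pairs in the 10-disk and 18-disk sets

    wal_bw  = per_drive_mb_s * wal_pairs    # ~375 MB/s
    data_bw = per_drive_mb_s * data_pairs   # ~675 MB/s
    demand  = wal_bw + data_bw              # ~1050 MB/s

    controller_limit = 800  # assumed real-world MB/s for one controller + bus
    print(f"demand ~{demand}MB/s vs ~{controller_limit}MB/s available")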
One thing that helps greatly with bursty IO patterns is to up your battery-
backed RAID cache as high as you possibly can. Even multiple GBs of BBC can be
worth it. Another reason to have multiple controllers ;-)
Then there is the question of the BW of the bus that the controller is plugged
into.
~800MB/s is the real-world max to be gotten from a 64b 133MHz PCI-X channel.
PCI-E channels are usually good for 1/10 their rated speed in bps as Bps.
So a PCI-Ex4 10Gbps bus can be counted on for 1GBps, PCI-Ex8 for 2GBps, etc.
At present I know of no RAID controller that can by itself saturate a PCI-Ex4
or greater bus.
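The rule of thumb in code form (the 2.5Gbps lane rate is the first-generation
PCI-E figure; treat the divide-by-10 as the rough planning number it is):

    # 8b/10b encoding means only 8 of every 10 bits on the wire carry data,
    # so rated Gbps / 10 ~= usable GB/s (before protocol overhead).
    def usable_gb_s(lanes, gbps_per_lane=2.5):
        return lanes * gbps_per_lane / 10.0

    for lanes in (4, 8, 16):
        print(f"PCI-Ex{lanes}: ~{usable_gb_s(lanes):.0f}GB/s usable")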
...and we haven't even touched on OS, SW, and usage pattern issues.
Bottom line is that the IO chain is only as fast as its slowest component.
>Now to the interesting part: would it make sense to use different stripe
>sizes on the separate disk arrays?
>
The short answer is Yes.
WALs are basically appends that are written in bursts of your chosen log chunk
size and that are almost never read afterwards. Big DB pages and big RAID
stripes make sense for WALs.
Tables with OLTP-like characteristics need smaller DB pages and stripes to
minimize latency issues (although locality of reference can make the optimum
stripe size larger).
Tables with Data Mining-like characteristics usually work best with larger DB
page sizes and RAID stripe sizes.
OS and FS overhead can make things more complicated. So can DB layout and
access pattern issues.
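A toy sketch of why one stripe size rarely fits both workloads (the chunk
sizes and the 1MB WAL burst are illustrative numbers only, not
recommendations):

    # Full stripe width = chunk (stripe unit) size * number of data disks.
    # Compare that to each workload's natural IO unit: WAL appends arrive in
    # large sequential bursts, OLTP touches individual 8KB pages.
    def stripe_width_kb(chunk_kb, data_disks):
        return chunk_kb * data_disks

    wal_burst_kb, oltp_page_kb = 1024, 8  # illustrative sizes only

    for chunk_kb in (16, 64, 256):
        wal_stripe  = stripe_width_kb(chunk_kb, 5)  # 10-disk RAID 10 -> 5 data disks
        data_stripe = stripe_width_kb(chunk_kb, 9)  # 18-disk RAID 10 -> 9 data disks
        print(f"{chunk_kb}KB chunk: WAL stripe {wal_stripe}KB "
              f"(burst covers {wal_burst_kb / wal_stripe:.1f} stripes), "
              f"data stripe {data_stripe}KB "
              f"(one page is {100.0 * oltp_page_kb / data_stripe:.1f}% of a stripe)")

A large WAL burst keeps all spindles of its set streaming regardless of chunk
size, while an 8KB OLTP page is a tiny fraction of any stripe, which is part of
why the two arrays warrant different stripe sizes.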
Side note: a 10 HD RAID 10 seems a bit much for WAL. Do you really need
~375MB/s of IO on average to your WAL more than you need that IO capacity for
other tables?
If WAL IO needs to be very high, I'd suggest getting an SSD or SSD-like device
that fits your budget and having said device async-mirror to HD.
Bottom line is to optimize your RAID stripe sizes =after= you optimize your OS,
FS, and pg design for best IO for your usage pattern(s).
Hope this helps,
Ron