I've been following these threads on sw raid over hw raid, etc., with
some curiosity.  I also did testing with a Mylex DAC1164P, in my case
using 8 IBM Ultrastar 18ZX drives (10000rpm).

I get the following bonnie results on that system, just using hw raid,
for sequential input, sequential output, and random seeks, using the
default 8k/64k stripe/segment size, and write-back cache (32MB):

                  input      output     random
                MB/s  %cpu MB/s  %cpu   /s   %cpu

1drive-jbod     19.45 16.3 17.99 16.4 153.90 4.0
raid0           48.49 42.1 25.48 23.1 431.00 7.4
raid01          53.23 41.4 21.22 19.0 313.10 9.5
raid5           52.47 39.3 21.35 19.8 365.60 11.2
raid5-degraded  20.23 15.5 21.86 20.3 277.90 7.8

That was using 3 busses, but the numbers for raid0 and raid5 weren't
very different when I first tried them with all 8 drives on 1 bus.

For comparison, here are my sw raid numbers using 8 much cheaper
ATA drives (Seagate ST317242A) and Ultra-33 controllers
(1 drive per controller, 16k chunksize):

                  input      output     random
                MB/s  %cpu MB/s  %cpu   /s   %cpu

1drive          15.40 12.6 14.03 11.9 101.50 2.9
raid0-low       48.73 49.7 36.77 33.8 242.60 7.0
raid0-high      61.57 64.4 37.18 33.7 227.70 7.5
raid1-8drives   20.63 20.1  4.26  4.0 180.90 6.3
raid1-2drives   15.31 12.7 14.36 13.1 103.80 3.3
raid10          27.11 30.4 18.42 17.2 191.60 6.8
raid5           40.43 38.6 30.32 26.8 209.10 6.3
raid5-degraded  33.46 31.0 31.40 28.1 164.20 4.9

On all platforms, I see quite a bit of variation from run to run,
so my numbers are optimistic: I run several times (usually 10),
and take the best results.  I could (and sometimes do) compute
mean and standard deviation, but I find taking the best results to be
most "useful" for analysis and comparison.

I also skewed the numbers towards best results in another way: I generally
report results only for the outer cylinders (the low-numbered cylinders),
though I also test the inner cylinders for comparison.  The difference
is much less pronounced for the IBM SCSI drives than for the Seagate
ATA drives.  I do this for the simple reason that I want to know the
maximum performance; I also want the minimum, so I check that too.
There's an anomaly in the software raid0 numbers, where the low cylinders
seem to perform worse than the high cylinders, which is the opposite
of what happens in all other cases.   Any ideas on this one?  It's very
repeatable for me, and disturbing since it involves the very highest
throughput numbers I've measured.  If there's enough interest, I can
present more low vs. high cylinder results, but this message is long
enough as it is.

I hacked my copy of bonnie in two ways: to skip the per-char tests,
which I'm not interested in, and to call fdatasync() after each write
test (but that change didn't really make much difference).  I use
an awk script to pick the best results from multiple runs, and to
convert KB/s to MB/s.
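
Roughly, the picker looks like this; it's a sketch, not my actual
script, so the field positions are an assumption about the hacked
bonnie's output format, and each metric's best is taken independently
across runs:

    # best.awk: keep the best result of several bonnie runs.  Assumes one
    # run per line, fields: out_KB/s out_%cpu in_KB/s in_%cpu seeks/s seeks_%cpu
    { if ($1 > best_out)  { best_out  = $1; out_cpu  = $2 }
      if ($3 > best_in)   { best_in   = $3; in_cpu   = $4 }
      if ($5 > best_seek) { best_seek = $5; seek_cpu = $6 } }
    END {
        # MB = 1024 KB here; adjust if you prefer powers of ten
        printf "input  %6.2f MB/s %4.1f %%cpu\n", best_in  / 1024, in_cpu
        printf "output %6.2f MB/s %4.1f %%cpu\n", best_out / 1024, out_cpu
        printf "random %6.2f /s   %4.1f %%cpu\n", best_seek, seek_cpu
    }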

The sw raid measurements here were taken on a P3/450 system, and the
hw raid was done on a P2/400 system.  I compared both processors
on sw raid5, and found maybe 1% difference in performance.
The machines were otherwise identical, with 512 MB of RAM.
I used uniprocessor mode, rather than SMP mode, to improve repeatability
(and because the uniform-ide patches cause hangs for me in SMP mode).

I don't know why your sw over hw raid numbers are so poor by comparison.
Did you try plain hw raid, as above?

My hypothesis is that the Mylex itself, or the kernel + driver,
is limited in the number of requests/s it can handle.

I speculate that sequential throughput is not very important to the
companies developing and buying raid systems.  Rather, database
transaction workloads are what drive this industry, and those workloads
are dominated by random I/O.  In that case, what you want more than
anything else is a whole lot of disks, because the aggregate random I/O
rate goes up with the number of spindles.  The fact that my random
results in bonnie don't scale up anywhere near linearly with the number
of drives is a weakness in this theory.  Perhaps the current Linux
kernel/driver structure is not well suited to tons of random I/Os.
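
To put a number on "anywhere near linear": 8 drives at the single-drive
rate of about 154 seeks/s would ideally give about 1230/s, yet my hw
raid0 measured only 431/s.  A trivial back-of-the-envelope check, using
the hw numbers from the table above:

    awk 'BEGIN { drives = 8; single = 153.9; measured = 431.0
                 ideal = drives * single
                 printf "ideal %.0f/s, measured %.0f/s, %.0f%% of linear\n",
                        ideal, measured, 100 * measured / ideal }'
    # prints: ideal 1231/s, measured 431/s, 35% of linear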

Still, I consider the lack of higher random I/O rates to be an unsolved mystery.

Jan Edler
NEC Research Institute
