Here is a potential performance problem uncovered
by a set of benchmark tests:
two SCSI adaptors do not speed up RAID 0/1 perceptibly
in comparison to the performance of a single SCSI adaptor system
(nor does a dual-CPU system, for that matter).

Let me explain. This is lengthy, as it contains the benchmark results, etc.

A certain Mr. Sagai did a comparison of disk system performance on
Linux using various combinations of
IDE and SCSI disks, building RAID 0 and RAID 1 arrays
with the raid tools, and tested the configurations on
single- and dual-CPU systems.
He published the benchmark results in a Japanese monthly magazine,
"Software Design", in two consecutive issues,
which came out in July and August this year.
The testing was very exhaustive in terms of
the various configurations set up, and as a Linux user at home
I would like to thank him for his time.

One conclusion in his magazine article, based on the benchmark results,
was that there was NO discernible speedup when
TWO (2) SCSI host adaptors, each with one SCSI disk, were used
to set up the RAID array, in comparison to the performance
observed with only ONE (1) SCSI host adaptor with two SCSI disks on
the same bus.

I thought that the conclusion was rather COUNTER-INTUITIVE.
For the last 15 years or so, it has been a rule of thumb to
add a SCSI host adapter and balance the load across buses to
speed up disk subsystem throughput. I have seen magazine articles where
people added SCSI host adaptors for better performance.

So I said to myself something was fishy here and
set out to find out what might be the cause of this
counter-intuitive result.

Rereading the articles (the piece ran in two issues; he did very
thorough testing, and it must have been a lot of work),
I thought I detected a possible weakness in the performance
measurement.

He used Bonnie (and some other tests) to measure system throughput,
but in the case of Bonnie, he didn't specify a
large enough temporary file size. A small test file size may mean
that the file buffers in memory can cushion the I/O requests, so the
test may not exercise the disks very much.
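
As an illustration (the scratch directory and machine label below are my
own assumptions, not from the article): Bonnie's -s option sets the test
file size in megabytes, and with 128MB of RAM the file should be several
times larger than memory so the buffer cache cannot absorb the I/O, e.g.

   # run Bonnie with a 512MB test file on the RAID filesystem
   Bonnie -d /mnt/raid/tmp -s 512 -m raid0-test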

So I wrote him an e-mail and mentioned my
surprise at the counter-intuitive result.
I suggested that increasing the test file size
might result in a very different Bonnie benchmark, and that
his tentative conclusion might have to be changed.
Mr. Sagai promptly wrote back and told me that
he himself was surprised and disappointed by the result, and
promised he would run the tests again using a larger file size
and ask the magazine to carry an additional article reporting the
new test results. This was in August.

One month later, he wrote back and told me the new article
would be forthcoming in mid-October.
And more importantly, again there was NO discernible speedup (!).
Bonnie and iozone were used.
He was kind enough to let me use his preliminary results before
publication to investigate, er.., rather to ask the veterans of
linux-scsi (and linux-kernel)
to investigate the cause of the problem.

I am reporting his results for your information and analysis in this
e-mail.

- BTW, I don't know whether Bonnie is a good performance measurement tool
   today, although it certainly shows that striping makes the disk system
   faster on Linux, and at that level it is useful.
  (I have already run this discussion by a friend of mine who discovered
   the software striping speed gain, etc. He doesn't have much to offer
   about this counter-intuitive result, though.
   BTW, this friend of mine found that software RAID on a fast CPU
   beats hardware RAID under light load, and as a matter of fact, in
   all the test configurations he threw at it! Obviously a 400 MHz CPU can
   do the processing faster than the sub-100 MHz i960 and other CPUs on the
   hardware RAID cards.)

- Seriously, I vaguely recall that there was a discussion on linux-scsi (?)
   about reduced parallelism in the SCSI subsystem (or whatever),
   possibly due to some serialization bottleneck or something.
   If I recall correctly, Linus himself jumped into the discussion and
   mentioned that this was one reason why he didn't like the current SCSI
   system.

   This reduced parallelism (?) or something like it might be the cause
   of the less-than-optimal result Mr. Sagai observed.

Since the benchmark results were obtained for RAID 0/1 on both
single- and dual-CPU setups, I think that the chance of there being
a REAL performance bottleneck in the kernel is high.

Maybe we have found a new performance challenge like
the ones that emerged after the Mindcraft benchmark incident.
(Or we may find other explanations.)

[BTW, Mr. Sagai himself seems to think that
SMP ought to add a performance boost at least
in certain configurations. Come to think of it, why not?
However, Mr. Sagai mentioned that the monitoring tool xosview showed
that the CPU (?) seemed to be
accessing the disks one at a time during testing, even where there were
two adapters. I don't know whether this was to be expected or not.
For me the surprise was that adding a SCSI adaptor to balance
the load ought to have increased the performance at least a little bit,
but it didn't seem to do so by any noticeable margin.]


So for your analysis, here is the gist of the revised short benchmark
article.

---------------------------------------------------
Here is Mr. Sagai's system configuration.

Kernel: 2.2.10 + raid0134-19990724-2.2.10.gz patch
M/B: Abit BP6
MEMORY: 128MB SDRAM CL=2
SCSI: I/O Data SC-UPCI (Symbios 53C875) x 2
HDD: IBM DHEA 34330 (Linux kernel, etc. lives here)
     IBM DCAS 34330UW (4.3GB) (RAID disk 1)
     IBM DDRS 34560UW (4.5GB) (RAID disk 2)
VIDEO: MGA G100 AGP 8MB
NETWORK: ELECOM Laneed LD-10/100 AN(DEC DC21140)

(Right. He had to use similar but
different SCSI disks for RAID. The older model was sold out
when he tried to buy another one, and thus he had to use the newer
model that was available. Remember, he did the test in his spare
time.)
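
(As a side note, on a 2.2 kernel the attached devices, and which host
adaptor (scsi0, scsi1, ...) each one hangs off, can be checked with:

   cat /proc/scsi/scsi

which in the two-adaptor setup should show each RAID disk on a different
host.)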

His test method.

He set up Software RAID using
two SCSI disks.
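
For readers who have not used the 0.90-style raid tools, a minimal sketch
of a RAID 0 configuration across two disks on separate controllers might
look roughly like the following. The device names, chunk size, and mount
point are my own assumptions, not Mr. Sagai's actual settings:

   # /etc/raidtab (hypothetical example)
   raiddev /dev/md0
       raid-level            0
       nr-raid-disks         2
       persistent-superblock 1
       chunk-size            32
       device                /dev/sda1    # disk on the first SC-UPCI
       raid-disk             0
       device                /dev/sdb1    # disk on the second SC-UPCI
       raid-disk             1

   # then, as root:
   mkraid /dev/md0
   mke2fs /dev/md0
   mount /dev/md0 /mnt/raid

(The RAID 1 case would use raid-level 1 instead.)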

He benchmarked the following configurations:

   RAID 0 (striping) and RAID 1 (mirroring),

for both single- and dual-CPU setups.

So there are 4 combinations.

He ran Bonnie (with a 512MB test file) and iozone
to measure the performance.
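
(For reference, an iozone run of roughly this shape might look like the
sketch below; the flags assume a 3.x-era iozone, and the file path is my
own guess. Older iozone versions simply take the size in megabytes as a
positional argument, e.g. "iozone 512".)

   # 512MB file, 64KB records, write (-i 0) and read (-i 1) tests
   iozone -s 512m -r 64k -i 0 -i 1 -f /mnt/raid/iozone.tmp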

The results are attached.

From the results, he thinks that
two SCSI adaptors (and SMP) do NOT seem to
offer a perceptible performance gain. The same conclusion as before.

One possibility he mentioned was that the benchmark
test programs themselves are not quite
a match for today's hardware resources, and that
multithreaded versions of the benchmark programs would probably
fare better.
(CI's comment: all I can think of is that the kernel ought to
make better use of the added bandwidth offered by two SCSI adapters.)
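
(If someone wants to approximate a multithreaded load without new tools,
one crude sketch, with the paths again my own assumption, is simply to
run two Bonnie instances concurrently and see whether the second
adaptor's bandwidth shows up:

   # two concurrent sequential streams on the RAID filesystem
   Bonnie -d /mnt/raid/dir1 -s 512 &
   Bonnie -d /mnt/raid/dir2 -s 512 &
   wait
)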


(1) BONNIE results

(1-a) Standalone disk (no RAID)

(He must have used a Japanized version of Bonnie.
The header lines are presented in Japanese characters.
I have translated them by hand below, so they may look slightly
different from the original Bonnie output.)

                 ---------Sequential Write---------    --Sequential Read--    -Random-
                  -Char-      -Block-     -Rewrite-      -Char-      -Block-    --Seek--
Machine      MB  K/sec %CPU  K/sec %CPU  K/sec %CPU    K/sec %CPU  K/sec %CPU   /sec %CPU
DCAS        512   5474 70.4   7317  9.8   2876  7.2     5701 68.4   7734  7.1   67.9  1.0
DDRS        512   6792 87.3   9825 12.5   4473 11.5     7413 89.4  12715 11.0   94.3  1.3


(1-b) RAID 0 / single CPU case.

The case of two (2) SCSI adaptors (SC UPCI x 2) and
the case of a single SCSI adaptor (SC UPCI x 1) are presented.

                 ---------Sequential Write---------    --Sequential Read--    -Random-
                  -Char-      -Block-     -Rewrite-      -Char-      -Block-    --Seek--
Machine      MB  K/sec %CPU  K/sec %CPU  K/sec %CPU    K/sec %CPU  K/sec %CPU   /sec %CPU
SC UPCIx2   512   7586 95.5  16702 18.8   4871 10.5     6533 77.5  15604 14.0  104.4  0.9
SC UPCIx1   512   7547 95.2  17350 20.0   4914 10.7     6683 79.5  15543 12.2  108.9  1.0

(1-c) RAID 0 / dual CPU case.

The case of two (2) SCSI adaptors (SC UPCI x 2) and
the case of a single SCSI adaptor (SC UPCI x 1) are presented.


                 ---------Sequential Write---------    --Sequential Read--    -Random-
                  -Char-      -Block-     -Rewrite-      -Char-      -Block-    --Seek--
Machine      MB  K/sec %CPU  K/sec %CPU  K/sec %CPU    K/sec %CPU  K/sec %CPU   /sec %CPU
SC UPCIx2   512   7502 96.0  16650 20.6   4915 12.4     6455 77.5  15554 14.5   99.5  1.5
SC UPCIx1   512   7399 95.6  15918 21.9   5018 13.2     6612 79.7  15491 15.3  101.9  1.6

(1-d) RAID 1 / single CPU case.

The case of two (2) SCSI adaptors (SC UPCI x 2) and
the case of a single SCSI adaptor (SC UPCI x 1) are presented.


                 ---------Sequential Write---------    --Sequential Read--    -Random-
                  -Char-      -Block-     -Rewrite-      -Char-      -Block-    --Seek--
Machine      MB  K/sec %CPU  K/sec %CPU  K/sec %CPU    K/sec %CPU  K/sec %CPU   /sec %CPU
SC UPCIx2   512   4851 62.2   6301  7.1   2506  5.6     5675 68.4   6304  5.5   80.7  1.2
SC UPCIx1   512   4948 63.7   6539  7.5   2485  5.6     5699 69.1   6299  5.6   80.6  1.1

(1-e) RAID 1 / dual CPU case.

The case of two (2) SCSI adaptors (SC UPCI x 2) and
the case of a single SCSI adaptor (SC UPCI x 1) are presented.


                 ---------Sequential Write---------    --Sequential Read--    -Random-
                  -Char-      -Block-     -Rewrite-      -Char-      -Block-    --Seek--
Machine      MB  K/sec %CPU  K/sec %CPU  K/sec %CPU    K/sec %CPU  K/sec %CPU   /sec %CPU
SC UPCIx2   512   4753 62.8   6627  8.9   2513  6.9     5457 66.9   6310  6.5   81.0  0.9
SC UPCIx1   512   4985 66.3   6639  9.5   2497  7.3     5447 66.5   6263  6.9   80.5  1.5

(The above were the Bonnie results.)
----
(Below are the iozone test results.)

(2) IOZONE test results

(2-a) Standalone Disk.

512MB file     write        read      (throughput)
DCAS          8.3 MB/s     7.2 MB/s
DDRS          9.8 MB/s    11.4 MB/s

(2-b) RAID 0 / single CPU

(Sorry, I forgot to ask, but the two rows presumably follow the same
pattern as the Bonnie tables: the upper row is the two-adaptor case,
the lower the single-adaptor case.)
512MB file     write        read      (throughput)
             17.3 MB/s    14.3 MB/s
             17.8 MB/s    14.4 MB/s

(2-c) RAID 0 / dual CPU

(Sorry, I forgot to ask, but the two rows presumably follow the same
pattern as the Bonnie tables: the upper row is the two-adaptor case,
the lower the single-adaptor case.)

512MB file     write        read      (throughput)
             17.3 MB/s    14.4 MB/s
             16.5 MB/s    13.9 MB/s

(2-d) RAID 1 / Single CPU

(Sorry, I forgot to ask, but the two rows presumably follow the same
pattern as the Bonnie tables: the upper row is the two-adaptor case,
the lower the single-adaptor case.)
512MB file     write        read      (throughput)
              6.7 MB/s     5.9 MB/s
              6.9 MB/s     5.9 MB/s

(2-e) RAID 1 / dual CPU

(Sorry, I forgot to ask, but the two rows presumably follow the same
pattern as the Bonnie tables: the upper row is the two-adaptor case,
the lower the single-adaptor case.)

512MB file     write        read      (throughput)
              6.3 MB/s     6.4 MB/s
              6.9 MB/s     5.9 MB/s

(The above are the iozone benchmark results.)



That is all.
I suspect there is an artificial serialization bottleneck somewhere
that keeps the kernel from taking advantage of the added bandwidth of
the two SCSI buses on separate controllers...
(Of course, it is possible some other bottleneck is already in effect.
Maybe at this load level
there is indeed no difference.)

Since I thought there might be kernel issues, I have included
linux-kernel in this post (but I am not on that list).
Probably the follow-up should go
to linux-scsi only?

Chiaki Ishikawa

A happy Linux user at home




