Seth Vidal writes:
> I have an odd question. Where I work we will, in the next year, be in a
> position to have to process about a terabyte or more of data. The data is
> probably going to be shipped on tapes to us but then it needs to be read
> from disks and analyzed. The process is segmentable so it's reasonable to
> be able to break it down into 2-4 sections for processing so arguably only
> 500gb per machine will be needed. I'd like to get the fastest possible
> access rates from a single machine to the data. Ideally 90MB/s+
>
> So we're considering the following:
>
> Dual Processor P3 something.
> ~1gb ram.
> multiple 75gb ultra 160 drives - probably ibm's 10krpm drives
> Adaptec's best 160 controller that is supported by linux.
>
> The data does not have to be redundant or stable - since it can be
> restored from tape at almost any time.
>
> so I'd like to put this in a software raid 0 array for the speed.
>
> So my questions are these:
> Is 90MB/s a reasonable speed to be able to achieve in a raid0 array
> across say 5-8 drives?
> What controllers/drives should I be looking at?

Here are actual benchmarks from one of my systems.

dbench (first column is the number of clients):
2 Throughput 123.637 MB/sec (NB=154.546 MB/sec 1236.37 MBit/sec)
4 Throughput 109.7 MB/sec (NB=137.126 MB/sec 1097 MBit/sec)
32 Throughput 77.7743 MB/sec (NB=97.2178 MB/sec 777.743 MBit/sec)
64 Throughput 64.3793 MB/sec (NB=80.4741 MB/sec 643.793 MBit/sec)

Bonnie:
              -------Sequential Output-------- ---Sequential Input-- --Random--
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
         2000  9585 99.1 51312 26.0 28675 45.3  9224 94.8 81720 73.3 512.2  4.2

That is with a Dell 4400: 2 x 600 MHz Pentium III Coppermine CPUs
with 256K cache, 1GB RAM, one 64-bit 66MHz PCI bus (and one 33MHz
PCI bus). The disk subsystem is a built-in Adaptec 7899 dual-channel
Ultra160 controller (160MB/s per channel) with 8 Quantum ATLAS IV 9GB
SCA disks attached to one channel. The benchmarks above were done on
an ext2 filesystem with 4KB blocksize and stride 16, created on a
7-way stripe of the above disks using software RAID (0.9 on kernel
2.2.x) with 64KB chunksize.
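
The stride of 16 follows from the chunk and block sizes quoted above:
it is the number of filesystem blocks that fit in one RAID chunk. A
minimal sketch of that arithmetic (the function name ext2_stride is
hypothetical, used only for illustration):

```python
def ext2_stride(chunk_kib: int, block_kib: int) -> int:
    """Return the number of filesystem blocks per RAID chunk.

    Hypothetical helper: it only reproduces the arithmetic behind the
    'stride 16' figure in the text (64KB chunk / 4KB block).
    """
    if block_kib <= 0 or chunk_kib % block_kib != 0:
        raise ValueError("chunk size must be a positive multiple of the block size")
    return chunk_kib // block_kib


# Values taken from the text: 64KB chunksize, 4KB ext2 blocksize.
stride = ext2_stride(chunk_kib=64, block_kib=4)
print(f"stride = {stride}")
```

The resulting value is what gets handed to mke2fs when the filesystem
is created; the exact option spelling depends on the e2fsprogs version,
so check the man page for the one you have.
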
If you use both SCSI channels and 36GB disks (let alone 75GB ones),
you'll get 480GB of disk without even needing to plug in another
SCSI card. With larger disks or another SCSI card or two you could
go larger/faster.
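
To set the Bonnie block figures against the 90MB/s asked for in the
question, here is a rough conversion sketch. It assumes Bonnie's
"K/sec" means 1024 bytes per second and that 1 MB = 1024 K; both are
assumptions, and the names below are made up for illustration.

```python
TARGET_MB_S = 90.0  # target rate from the original question

# Block-level results copied from the Bonnie table, in K/sec.
BONNIE_K_PER_SEC = {
    "block write": 51312,
    "rewrite": 28675,
    "block read": 81720,
}


def k_to_mb(k_per_sec: float, k_per_mb: int = 1024) -> float:
    """Convert K/sec to MB/sec (assumes 1 MB = 1024 K)."""
    return k_per_sec / k_per_mb


for name, rate in BONNIE_K_PER_SEC.items():
    mb = k_to_mb(rate)
    print(f"{name:12s} {mb:6.1f} MB/s  ({mb / TARGET_MB_S:.0%} of target)")
```

Under those assumptions the sequential block read on this 7-disk,
single-channel stripe comes out a little under 80MB/s.
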
--Malcolm
--
Malcolm Beattie <[EMAIL PROTECTED]>
Unix Systems Programmer
Oxford University Computing Services