On Apr 10, 2009, at 9:40 AM, Stas Oskin wrote:

Hi.


Depends. What hardware? How much hardware? Is the cluster under load? What does your I/O load look like? As a rule of thumb, you can probably
expect very close to hardware speed.


Standard Xeon dual cpu, quad core servers, 4 GB RAM.
The DataNodes also do some processing, with usual loads of about 4 (out of
the recommended 8). The I/O load is linear; there are almost no write or read
peaks.


Interesting -- machines are fairly RAM-poor for data processing ... I guess your tasks must be fairly efficient.

By close to hardware speed, you mean results very near the results I get via
iozone?

Depends on what kind of I/O you do. Are you going to be using MapReduce and co-locating jobs and data? If so, it's possible to get close to those speeds, provided your job is I/O-bound and reads straight through each chunk. If you have multiple disks mounted individually, you'll need as many concurrent read streams as disks to keep them all busy. If you're going to do I/O that's not through MapReduce, you'll probably be bound by the network interface.
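
As a rough back-of-envelope sketch of that rule of thumb (not a measurement, and not anything Hadoop provides): the function name, its parameters, and the numbers in the example calls are all hypothetical. Plug in your own per-disk iozone figure and NIC speed. It assumes each stream reads from one disk at a time and ignores CPU, replication, and contention.

```python
def estimate_read_mb_s(per_disk_mb_s, num_disks, num_streams, nic_mb_s, data_local):
    """Rough upper bound on one node's read throughput, in MB/s.

    All inputs are assumptions you supply, e.g. per_disk_mb_s from iozone.
    """
    # Assumption: one stream keeps at most one disk busy, so extra streams
    # beyond the disk count add nothing and too few streams leave disks idle.
    disk_bound = per_disk_mb_s * min(num_disks, num_streams)
    if data_local:
        # Co-located job and data: only the disks limit the read.
        return disk_bound
    # Non-local read: data also crosses the NIC, so the slower path wins.
    return min(disk_bound, nic_mb_s)


# Hypothetical node: 4 disks at 60 MB/s each, 1 Gbit/s NIC (125 MB/s at best).
print(estimate_read_mb_s(60, 4, 4, 125, data_local=True))   # 240
print(estimate_read_mb_s(60, 4, 1, 125, data_local=True))   # 60
print(estimate_read_mb_s(60, 4, 4, 125, data_local=False))  # 125
```

With those made-up numbers, a single stream reaches only one disk's worth of throughput, and a non-local reader tops out at the network interface no matter how many disks sit behind it.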

Brian
