Re: Poor IO performance on a 10 node cluster.

Ted Dunning Wed, 01 Jun 2011 21:06:34 -0700

It is also worth using dd to verify your raw disk speeds.
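
If dd is not handy, a rough stand-in is to time a large sequential read yourself. The Python sketch below only approximates what dd would report. The function name and the 1 MiB read size are arbitrary choices for illustration, and the path is whatever large file you point it at. The page cache will inflate the number unless the file is much larger than RAM or was not read recently.

```python
import time

def sequential_read_mb_per_s(path, block_size=1024 * 1024):
    """Time one sequential pass over `path`; return MB/s (1 MB = 10^6 bytes)."""
    total = 0
    start = time.perf_counter()
    # buffering=0 skips Python's own buffer; the OS page cache still applies.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    # Guard against a zero interval on a tiny or empty file.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return total / elapsed / 1e6
```

Run it once per disk against a file that lives on that disk, and compare the result with the roughly 100MB/s a healthy disk should manage.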

Also, expressing disk transfer rates in bytes per second makes it a bit
easier for most of the disk people I know to figure out what is large or
small.
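
For reference, the Gbit/s figures in the quoted message convert to bytes per second as follows. This is a small Python sketch assuming decimal units (1 Gbit = 10^9 bits, 1 MB = 10^6 bytes); the helper name is made up for illustration.

```python
def gbit_to_mb_per_s(gbit_per_s):
    """Convert Gbit/s to MB/s using decimal units."""
    return gbit_per_s * 1e9 / 8 / 1e6

print(gbit_to_mb_per_s(1.0))   # per-disk figure from the quoted message -> 125.0
print(gbit_to_mb_per_s(3.5))   # measured aggregate -> 437.5
print(gbit_to_mb_per_s(20.0))  # expected aggregate -> 2500.0
```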

Each of these disks should do about 100MB/s when driven well.  Hadoop
does OK, but not nearly full capacity, so I would expect more like
40-50MB/s/disk.  Also, if one of your disks is doing double duty you may
have some extra cost.  40 x 2 x 10 = 800MB per second.  You are doing 100GB
/ 220 seconds = about 0.45 GB/s, which isn't so terribly bad.  It is definitely
less than the theoretically possible 100 x 2 x 10 = 2GB/second and I would
expect you could tune this up a little, but not a massive amount.
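
The arithmetic above as a small Python sketch, so you can plug in your own numbers. The variable and function names are illustrative only. The 40 and 100 MB/s per-disk rates are the rough figures from this message, not measurements.

```python
nodes = 10
disks_per_node = 2

def aggregate_mb_per_s(per_disk_mb_per_s):
    """Cluster-wide rate if every disk sustains the given per-disk rate."""
    return per_disk_mb_per_s * disks_per_node * nodes

realistic = aggregate_mb_per_s(40)     # 800 MB/s
theoretical = aggregate_mb_per_s(100)  # 2000 MB/s
measured = 100 * 1000 / 220            # 100 GB in 220 s, about 455 MB/s

print(f"measured is {measured / realistic:.0%} of the realistic estimate")
print(f"measured is {measured / theoretical:.0%} of the theoretical peak")
```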

In general, blade servers do not make good Hadoop nodes, precisely because
the I/O performance tends to be low when you only have a few spindles.

One other reason that this might be a bit below expectations is that your
files may not be well distributed on your cluster.  Can you say what you
used to upload the files?

On Wed, Jun 1, 2011 at 5:56 PM, hadoopman <hadoop...@gmail.com> wrote:

> Some things which helped us include setting your vm.swappiness to 0 and
> mounting your disks with noatime,nodiratime options.
>
> Also make sure your disks aren't set up with RAID (JBOD is recommended).
>
> You might want to run terasort as you tweak your environment.  It's very
> helpful when checking if a change helped (or hurt) your cluster.
>
> Hope that helps a bit.
>
> On 05/30/2011 06:27 AM, Gyuribácsi wrote:
>
>>
>> Hi,
>>
>> I have a 10 node cluster (IBM blade servers, 48GB RAM, 2x500GB Disk, 16 HT
>> cores).
>>
>> I've uploaded 10 files to HDFS. Each file is 10GB. I used the streaming
>> jar
>> with 'wc -l' as mapper and 'cat' as reducer.
>>
>> I use 64MB block size and the default replication (3).
>>
>> The wc on the 100 GB took about 220 seconds, which translates to about
>> 3.5 Gbit/sec processing speed. One disk can do sequential reads at
>> 1 Gbit/sec, so I would expect something around 20 Gbit/sec (minus some
>> overhead), and I'm getting only 3.5.
>>
>> Is my expectation valid?
>>
>> I checked the jobtracker and it seems all nodes are working, each reading
>> the right blocks. I have not played with the number of mappers and
>> reducers yet. It seems the number of mappers is the same as the number of
>> blocks and the number of reducers is 20 (there are 20 disks). This looks
>> OK to me.
>>
>> We also did an experiment with TestDFSIO with similar results. Aggregated
>> read io speed is around 3.5Gbit/sec. It is just too far from my
>> expectation:(
>>
>> Please help!
>>
>> Thank you,
>> Gyorgy
>>
>>
>
>
