M/R performance is known to be better when using just a bunch of disks (JBOD) 
instead of RAID.

From your setup, it looks like your single datanode must be running hot on I/O 
activity.

The parameter dfs.datanode.handler.count only controls the number of datanode 
threads serving IPC requests.
These are NOT used for actual block transfer. Try increasing 
dfs.datanode.max.xcievers instead.
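
As a rough sketch of what that change might look like, assuming the property
is set in hdfs-site.xml on the datanode and that the datanode is restarted
afterwards. The value 4096 is only an illustrative assumption, not a tuned
recommendation for this cluster; pick a value based on how many concurrent
block readers and writers the single datanode has to serve.

```xml
<!-- hdfs-site.xml on the datanode (sketch; the value is an assumption) -->
<property>
  <!-- Upper bound on concurrent block transfer threads in the datanode.
       The misspelling "xcievers" is the actual property name. -->
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>
```

Each transfer thread uses some memory, so on a machine with only 2GB it is
probably worth raising the limit in steps and watching for swapping.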

You can then run the I/O benchmarks to measure the I/O throughput:
hadoop jar $HADOOP_INSTALL/hadoop-*-test.jar TestDFSIO -write -nrFiles 10 -fileSize 1000

-...@nkur

On 3/30/10 12:46 PM, "Ed Mazur" <ma...@cs.umass.edu> wrote:

Hi,

I have a 12 node cluster where instead of running a DN on each compute
node, I'm running just one DN backed by a large RAID (with a
dfs.replication of 1). The compute node storage is limited, so the
idea behind this was to free up more space for intermediate job data.
So the cluster has that one node with the DN, a master node with the
JT/NN, and 10 compute nodes each with a TT. I am running 0.20.1+169.68
from Cloudera.

The problem is that MR job performance is now worse than when using a
traditional HDFS setup. A job that took 76 minutes before now takes
169 minutes. I've used this single DN setup before on a
similarly-sized cluster without any problems, so what can I do to find
the bottleneck?

-Loading data into HDFS was fast, under 30 minutes to load ~240GB, so
I'm thinking this is a DN <-> map task communication problem.

-With a traditional HDFS setup, map tasks were taking 10-30 seconds,
but they now take 45-90 seconds or more.

-I grep'd the DN logs to find how long the size 67633152 HDFS reads
(map inputs) were taking. With the central DN, the reads were an order
of magnitude slower than with traditional HDFS (e.g. 82008147000 vs.
8238455000).

-I tried increasing dfs.datanode.handler.count to 10, but this didn't
seem to have any effect.

-Could low memory be an issue? The machine the DN is running on only
has 2GB and there is less than 100MB free without the DN running. I
haven't observed any swapping going on though.

-I looked at netstat during a job. I wasn't too sure what to look for,
but I didn't see any substantial send/receive buffering.

I've tried everything I can think of, so I'd really appreciate any tips. Thanks.

Ed
