I set dfs.datanode.max.xcievers to 4096, but this didn't seem to have any effect on performance.
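For reference, these are the two datanode settings I've touched so far in conf/hdfs-site.xml (values are just what I tried, taken from the suggestions in this thread; I don't know if they're appropriate for a single-DN setup):

```xml
<!-- conf/hdfs-site.xml on the (single) datanode -->
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>
<property>
  <name>dfs.datanode.handler.count</name>
  <value>10</value>
</property>
```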
Here are some benchmarks (not sure what typical values are):

----- TestDFSIO ----- : write
           Date & time: Tue Mar 30 04:53:18 EDT 2010
       Number of files: 10
Total MBytes processed: 10000
     Throughput mb/sec: 23.41355598064167
Average IO rate mb/sec: 25.179018020629883
 IO rate std deviation: 7.022948102609891
    Test exec time sec: 74.437

----- TestDFSIO ----- : read
           Date & time: Tue Mar 30 05:02:01 EDT 2010
       Number of files: 10
Total MBytes processed: 10000
     Throughput mb/sec: 10.735545929349373
Average IO rate mb/sec: 10.741226196289062
 IO rate std deviation: 0.24872891783558398
    Test exec time sec: 119.561

----- TestDFSIO ----- : write
           Date & time: Tue Mar 30 05:09:59 EDT 2010
       Number of files: 40
Total MBytes processed: 40000
     Throughput mb/sec: 3.3887489806219473
Average IO rate mb/sec: 5.173769950866699
 IO rate std deviation: 6.293246618896401
    Test exec time sec: 360.765

----- TestDFSIO ----- : read
           Date & time: Tue Mar 30 05:18:20 EDT 2010
       Number of files: 40
Total MBytes processed: 40000
     Throughput mb/sec: 2.345990558443698
Average IO rate mb/sec: 2.3469674587249756
 IO rate std deviation: 0.04731737036312141
    Test exec time sec: 477.568

I also ran the benchmarks with 40 files because I have 10 compute nodes with mapred.tasktracker.map.tasks.maximum set to 4, so the cluster runs up to 40 map tasks at once. Performance degrades quite a bit going from 10 files to 40.

I then set mapred.tasktracker.map.tasks.maximum to 1 and ran an MR job. That brought map completion times back down to the expected 15-30 seconds, but did not change the overall running time. Does this just mean that the RAID can't keep up with 10*4 = 40 parallel requests, but can keep up with 10*1 = 10? And if so, is there anything I can do to change this? I know this isn't how HDFS is meant to be used, but this single DN/RAID setup has worked for me in the past on a similarly-sized cluster.

Ed

On Tue, Mar 30, 2010 at 4:29 AM, Ankur C.
Goel <gan...@yahoo-inc.com> wrote:
>
> M/R performance is known to be better when using just a bunch of disks
> (JBOD) instead of RAID.
>
> From your setup it looks like your single datanode must be running hot
> on I/O activity.
>
> The parameter dfs.datanode.handler.count only controls the number of
> datanode threads serving IPC requests. These are NOT used for actual
> block transfer. Try upping dfs.datanode.max.xcievers.
>
> You can then run the I/O benchmarks to measure the I/O throughput -
> hadoop jar $HADOOP_INSTALL/hadoop-*-test.jar TestDFSIO -write -nrFiles 10 -fileSize 1000
>
> -...@nkur
>
> On 3/30/10 12:46 PM, "Ed Mazur" <ma...@cs.umass.edu> wrote:
>
> Hi,
>
> I have a 12 node cluster where instead of running a DN on each compute
> node, I'm running just one DN backed by a large RAID (with a
> dfs.replication of 1). The compute node storage is limited, so the
> idea behind this was to free up more space for intermediate job data.
> So the cluster has that one node with the DN, a master node with the
> JT/NN, and 10 compute nodes each with a TT. I am running 0.20.1+169.68
> from Cloudera.
>
> The problem is that MR job performance is now worse than with a
> traditional HDFS setup. A job that took 76 minutes before now takes
> 169 minutes. I've used this single DN setup before on a
> similarly-sized cluster without any problems, so what can I do to find
> the bottleneck?
>
> - Loading data into HDFS was fast, under 30 minutes to load ~240GB, so
> I'm thinking this is a DN <-> map task communication problem.
>
> - With a traditional HDFS setup, map tasks were taking 10-30 seconds,
> but they now take 45-90 seconds or more.
>
> - I grep'd the DN logs to find how long the size-67633152 HDFS reads
> (map inputs) were taking. With the central DN, the reads were an order
> of magnitude slower than with traditional HDFS (e.g. 82008147000 vs.
> 8238455000).
>
> - I tried increasing dfs.datanode.handler.count to 10, but this didn't
> seem to have any effect.
>
> - Could low memory be an issue? The machine the DN is running on only
> has 2GB, and there is less than 100MB free without the DN running. I
> haven't observed any swapping going on, though.
>
> - I looked at netstat during a job. I wasn't too sure what to look for,
> but I didn't see any substantial send/receive buffering.
>
> I've tried everything I can think of, so I'd really appreciate any tips.
> Thanks.
>
> Ed
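P.S. A quick sanity check on the read numbers at the top of my mail (my own arithmetic, not TestDFSIO output): dividing "Total MBytes processed" by "Test exec time sec" gives a rough figure for the aggregate rate the single DN sustained.

```python
# Rough aggregate throughput for the two TestDFSIO read runs:
# total MBytes processed / test exec time. This ignores per-task
# startup skew, so it's only a ballpark figure.

def aggregate_mb_per_sec(total_mb, exec_time_sec):
    return total_mb / exec_time_sec

read_10 = aggregate_mb_per_sec(10000, 119.561)  # 10 concurrent readers
read_40 = aggregate_mb_per_sec(40000, 477.568)  # 40 concurrent readers

print(round(read_10, 1), round(read_40, 1))  # prints: 83.6 83.8
```

Both runs come out at roughly 84 MB/s in aggregate, which seems consistent with the array simply being saturated: adding readers just splits the same total bandwidth 40 ways instead of 10.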