Have you read http://wiki.apache.org/hadoop/PerformanceTuning ? I'm sure your clients could use some write buffer magic, and compression always helps.
WRT the 66% reducing, it's because of how the reducer works, see http://hadoop.apache.org/common/docs/r0.20.0/mapred_tutorial.html#Reducer
It has 3 phases: shuffle, sort, reduce. Shuffle runs in parallel with the mappers, then it sorts (this could be very short, depending on your reducer's input), and then it reduces (runs the code you wrote).

Hope that helps!

J-D

On Sun, Dec 13, 2009 at 10:10 AM, Something Something <[email protected]> wrote:
> Hello,
>
> I wrote a couple of MapReduce programs that take about *32 minutes* to
> complete on my local machine in pseudo-distributed mode. The input file
> has about 1 million rows.
>
> I created a cluster on EC2 that has 10 instances running Hadoop/HBase & 5
> instances running ZooKeeper. The 10 Hadoop/HBase machines were c1.xlarge &
> 5 ZooKeeper machines were c1.medium - so fairly powerful machines. But the
> same job took over *1.5 hours to complete!* I agree that my code needs some
> improvement and I am looking into that, but honestly, I am still comparing
> apples to apples, right?
>
> Before starting the job I made sure that all HBase instances were live by
> using the status command.
>
> status 'simple' (showed 11 live servers.)
>
> What setup should I use to see performance improvements.. meaning.. how many
> instances? Should I start just Hadoop on 10 instances, HBase on 5, and
> ZooKeeper on 5?
>
> Any help in improving performance will be greatly appreciated. Thanks.
>
> - Ajay
>
> PS: One thing I have noticed is that it goes to 66% very fast and then
> slows down from there..
>
> 09/12/13 00:59:17 INFO mapred.JobClient: map 0% reduce 0%
> 09/12/13 01:02:22 INFO mapred.JobClient: map 100% reduce 0%
> 09/12/13 01:02:34 INFO mapred.JobClient: map 100% reduce 66%
> 09/12/13 01:03:25 INFO mapred.JobClient: map 100% reduce 67%
> 09/12/13 01:05:49 INFO mapred.JobClient: map 100% reduce 68%
> 09/12/13 01:08:10 INFO mapred.JobClient: map 100% reduce 69%
> 09/12/13 01:10:31 INFO mapred.JobClient: map 100% reduce 70%
> 09/12/13 01:12:55 INFO mapred.JobClient: map 100% reduce 71%
> 09/12/13 01:15:20 INFO mapred.JobClient: map 100% reduce 72%
> 09/12/13 01:17:41 INFO mapred.JobClient: map 100% reduce 73%
> 09/12/13 01:20:02 INFO mapred.JobClient: map 100% reduce 74%
> 09/12/13 01:22:24 INFO mapred.JobClient: map 100% reduce 75%
> 09/12/13 01:24:47 INFO mapred.JobClient: map 100% reduce 76%
> 09/12/13 01:27:05 INFO mapred.JobClient: map 100% reduce 77%
> 09/12/13 01:29:24 INFO mapred.JobClient: map 100% reduce 78%
> 09/12/13 01:31:45 INFO mapred.JobClient: map 100% reduce 79%
> 09/12/13 01:34:06 INFO mapred.JobClient: map 100% reduce 80%
> 09/12/13 01:36:28 INFO mapred.JobClient: map 100% reduce 81%
> 09/12/13 01:38:42 INFO mapred.JobClient: map 100% reduce 82%
>
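
To put rough numbers on the 66% plateau, here is a small Python sketch. It is not Hadoop code: reduce_progress, parse and the equal-thirds weighting are my own illustration, assuming shuffle, sort and reduce each count for one third of the reported reduce percentage. The two log lines are copied from the JobClient output quoted above.

```python
from datetime import datetime

def reduce_progress(shuffle, sort, reduce_):
    """Reported reduce percentage, given each phase's completion in [0.0, 1.0].

    Assumption: the three reducer phases are weighted equally.
    """
    return (shuffle + sort + reduce_) / 3.0 * 100.0

# First and last "reduce" readings after the maps finished (from the quoted log).
LOG = [
    "09/12/13 01:02:34 INFO mapred.JobClient: map 100% reduce 66%",
    "09/12/13 01:38:42 INFO mapred.JobClient: map 100% reduce 82%",
]

def parse(line):
    """Return (timestamp, reduce percent) for one JobClient progress line."""
    parts = line.split()
    stamp = datetime.strptime(parts[0] + " " + parts[1], "%y/%m/%d %H:%M:%S")
    return stamp, int(parts[-1].rstrip("%"))

(t0, p0), (t1, p1) = parse(LOG[0]), parse(LOG[1])

# Average wall-clock seconds spent per reported percentage point.
seconds_per_point = (t1 - t0).total_seconds() / (p1 - p0)

# Approximate share of the user-written reduce code that has run at the
# last reading; the log only shows whole percentages, so this is coarse.
user_code_done = (p1 - reduce_progress(1.0, 1.0, 0.0)) / (100.0 / 3.0)

print("shuffle + sort done, reduce not started: %.1f%%" % reduce_progress(1.0, 1.0, 0.0))
print("seconds per percentage point: %.1f" % seconds_per_point)
print("user reduce code done: about %.0f%%" % (user_code_done * 100))
```

Under that assumption, everything past roughly 66% is time spent in the reduce code itself, so the slow part of the job is whatever the reducers do, not the shuffle or the sort.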
