Hello,

I wrote a couple of MapReduce programs that take about **32 minutes** to
complete on my local machine in pseudo-distributed mode. The input file
has about 1 million rows.

I created a cluster on EC2 with 10 instances running Hadoop/HBase and 5
instances running ZooKeeper. The 10 Hadoop/HBase machines were c1.xlarge and
the 5 ZooKeeper machines were c1.medium, so these are fairly powerful machines.
But the same job took over *1.5 hours* to complete! I agree that my code needs
some improvement, and I am looking into that, but honestly, I am still
comparing apples to apples, right?

Before starting the job, I made sure that all HBase instances were live by
running the status command in the HBase shell:

status 'simple'  (showed 11 live servers)

What setup should I use to see a performance improvement? In particular, how
many instances should I use? Should I run only Hadoop on 10 instances, HBase
on 5, and ZooKeeper on 5?

Any help in improving performance will be greatly appreciated.  Thanks.

- Ajay

PS: One thing I have noticed is that the reduce progress jumps to 66% very
quickly and then slows down from there:

09/12/13 00:59:17 INFO mapred.JobClient:  map 0% reduce 0%
09/12/13 01:02:22 INFO mapred.JobClient:  map 100% reduce 0%
09/12/13 01:02:34 INFO mapred.JobClient:  map 100% reduce 66%
09/12/13 01:03:25 INFO mapred.JobClient:  map 100% reduce 67%
09/12/13 01:05:49 INFO mapred.JobClient:  map 100% reduce 68%
09/12/13 01:08:10 INFO mapred.JobClient:  map 100% reduce 69%
09/12/13 01:10:31 INFO mapred.JobClient:  map 100% reduce 70%
09/12/13 01:12:55 INFO mapred.JobClient:  map 100% reduce 71%
09/12/13 01:15:20 INFO mapred.JobClient:  map 100% reduce 72%
09/12/13 01:17:41 INFO mapred.JobClient:  map 100% reduce 73%
09/12/13 01:20:02 INFO mapred.JobClient:  map 100% reduce 74%
09/12/13 01:22:24 INFO mapred.JobClient:  map 100% reduce 75%
09/12/13 01:24:47 INFO mapred.JobClient:  map 100% reduce 76%
09/12/13 01:27:05 INFO mapred.JobClient:  map 100% reduce 77%
09/12/13 01:29:24 INFO mapred.JobClient:  map 100% reduce 78%
09/12/13 01:31:45 INFO mapred.JobClient:  map 100% reduce 79%
09/12/13 01:34:06 INFO mapred.JobClient:  map 100% reduce 80%
09/12/13 01:36:28 INFO mapred.JobClient:  map 100% reduce 81%
09/12/13 01:38:42 INFO mapred.JobClient:  map 100% reduce 82%
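
To put numbers on the slowdown, here is a small Python sketch that parses the
JobClient lines above and measures the time between progress updates. It only
assumes the yy/MM/dd HH:mm:ss prefix shown in this log; the function names
(parse, seconds_per_step) are my own, not part of Hadoop. As far as I
understand, Hadoop splits a reduce task's reported progress into three equal
parts (copy, sort, and the reduce function itself), so if that holds here, the
stretch after 66% would be time spent in the reduce phase proper rather than
in the shuffle.

```python
from datetime import datetime

# JobClient progress lines from the log above, copied verbatim.
LOG = """\
09/12/13 00:59:17 INFO mapred.JobClient:  map 0% reduce 0%
09/12/13 01:02:22 INFO mapred.JobClient:  map 100% reduce 0%
09/12/13 01:02:34 INFO mapred.JobClient:  map 100% reduce 66%
09/12/13 01:03:25 INFO mapred.JobClient:  map 100% reduce 67%
09/12/13 01:05:49 INFO mapred.JobClient:  map 100% reduce 68%
09/12/13 01:08:10 INFO mapred.JobClient:  map 100% reduce 69%
09/12/13 01:10:31 INFO mapred.JobClient:  map 100% reduce 70%
09/12/13 01:12:55 INFO mapred.JobClient:  map 100% reduce 71%
09/12/13 01:15:20 INFO mapred.JobClient:  map 100% reduce 72%
09/12/13 01:17:41 INFO mapred.JobClient:  map 100% reduce 73%
09/12/13 01:20:02 INFO mapred.JobClient:  map 100% reduce 74%
09/12/13 01:22:24 INFO mapred.JobClient:  map 100% reduce 75%
09/12/13 01:24:47 INFO mapred.JobClient:  map 100% reduce 76%
09/12/13 01:27:05 INFO mapred.JobClient:  map 100% reduce 77%
09/12/13 01:29:24 INFO mapred.JobClient:  map 100% reduce 78%
09/12/13 01:31:45 INFO mapred.JobClient:  map 100% reduce 79%
09/12/13 01:34:06 INFO mapred.JobClient:  map 100% reduce 80%
09/12/13 01:36:28 INFO mapred.JobClient:  map 100% reduce 81%
09/12/13 01:38:42 INFO mapred.JobClient:  map 100% reduce 82%
"""


def parse(log):
    """Turn each progress line into a (timestamp, reduce_percent) tuple."""
    points = []
    for line in log.splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # skip blank or unexpected lines
        # Assumes the yy/MM/dd HH:mm:ss prefix seen in this log.
        ts = datetime.strptime(fields[0] + " " + fields[1], "%y/%m/%d %H:%M:%S")
        reduce_pct = int(fields[-1].rstrip("%"))  # last field, e.g. "66%"
        points.append((ts, reduce_pct))
    return points


def seconds_per_step(points):
    """Seconds elapsed between each pair of consecutive progress lines."""
    return [(b[0] - a[0]).total_seconds() for a, b in zip(points, points[1:])]


if __name__ == "__main__":
    points = parse(LOG)
    # Map phase: from the "map 0%" line to the "map 100%" line.
    print(f"map 0% -> 100%: {seconds_per_step(points[:2])[0]:.0f} s")
    tail = [p for p in points if p[1] >= 66]
    steps = seconds_per_step(tail)
    for (_, pct), secs in zip(tail[1:], steps):
        print(f"reduce -> {pct}%: {secs:.0f} s")
    later = steps[1:]  # skip the first 66% -> 67% step
    print(f"67% -> 82%: {sum(later) / len(later):.0f} s per percent on average")
```

Run against the posted log, this shows the map phase finishing in about three
minutes (185 s), the step from 66% to 67% taking 51 s, and every later 1% of
reduce progress taking between 134 and 145 seconds (about 141 s on average).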
