Hello, I wrote a couple of MapReduce programs that take about **32 minutes** to complete on my local machine in pseudo-distributed mode. The input file has about 1 million rows.
I created a cluster on EC2 with 10 instances running Hadoop/HBase and 5 instances running ZooKeeper. The 10 Hadoop/HBase machines were c1.xlarge and the 5 ZooKeeper machines were c1.medium, so fairly powerful machines. But the same job took over *1.5 hours* to complete!

I agree that my code needs some improvement and I am looking into that, but honestly, I am still comparing apples to apples, right? Before starting the job I made sure that all HBase instances were live by using the status command: status 'simple' showed 11 live servers.

What setup should I use to see a performance improvement? That is, how many instances do I need, and how should the services be split across them? Should I start just Hadoop on 10 instances, HBase on 5, and ZooKeeper on 5?

Any help in improving performance will be greatly appreciated.

Thanks.
- Ajay

PS: One thing I have noticed is that the reduce goes to 66% very fast and then slows down from there:

09/12/13 00:59:17 INFO mapred.JobClient: map 0% reduce 0%
09/12/13 01:02:22 INFO mapred.JobClient: map 100% reduce 0%
09/12/13 01:02:34 INFO mapred.JobClient: map 100% reduce 66%
09/12/13 01:03:25 INFO mapred.JobClient: map 100% reduce 67%
09/12/13 01:05:49 INFO mapred.JobClient: map 100% reduce 68%
09/12/13 01:08:10 INFO mapred.JobClient: map 100% reduce 69%
09/12/13 01:10:31 INFO mapred.JobClient: map 100% reduce 70%
09/12/13 01:12:55 INFO mapred.JobClient: map 100% reduce 71%
09/12/13 01:15:20 INFO mapred.JobClient: map 100% reduce 72%
09/12/13 01:17:41 INFO mapred.JobClient: map 100% reduce 73%
09/12/13 01:20:02 INFO mapred.JobClient: map 100% reduce 74%
09/12/13 01:22:24 INFO mapred.JobClient: map 100% reduce 75%
09/12/13 01:24:47 INFO mapred.JobClient: map 100% reduce 76%
09/12/13 01:27:05 INFO mapred.JobClient: map 100% reduce 77%
09/12/13 01:29:24 INFO mapred.JobClient: map 100% reduce 78%
09/12/13 01:31:45 INFO mapred.JobClient: map 100% reduce 79%
09/12/13 01:34:06 INFO mapred.JobClient: map 100% reduce 80%
09/12/13 01:36:28 INFO mapred.JobClient: map 100% reduce 81%
09/12/13 01:38:42 INFO mapred.JobClient: map 100% reduce 82%
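
To put a number on that slowdown, here is a rough Python sketch that parses the JobClient progress lines above and works out how many seconds each reduce percentage point takes once it is past 66%. The helper names (`parse_progress`, `seconds_per_reduce_percent`) are made up for illustration, and the timestamp format is assumed to be yy/MM/dd HH:mm:ss; it is not part of the job itself.

```python
import re
from datetime import datetime

# Progress lines copied from the JobClient output above.
LOG = """\
09/12/13 00:59:17 INFO mapred.JobClient: map 0% reduce 0%
09/12/13 01:02:22 INFO mapred.JobClient: map 100% reduce 0%
09/12/13 01:02:34 INFO mapred.JobClient: map 100% reduce 66%
09/12/13 01:03:25 INFO mapred.JobClient: map 100% reduce 67%
09/12/13 01:05:49 INFO mapred.JobClient: map 100% reduce 68%
09/12/13 01:08:10 INFO mapred.JobClient: map 100% reduce 69%
09/12/13 01:10:31 INFO mapred.JobClient: map 100% reduce 70%
09/12/13 01:12:55 INFO mapred.JobClient: map 100% reduce 71%
09/12/13 01:15:20 INFO mapred.JobClient: map 100% reduce 72%
09/12/13 01:17:41 INFO mapred.JobClient: map 100% reduce 73%
09/12/13 01:20:02 INFO mapred.JobClient: map 100% reduce 74%
09/12/13 01:22:24 INFO mapred.JobClient: map 100% reduce 75%
09/12/13 01:24:47 INFO mapred.JobClient: map 100% reduce 76%
09/12/13 01:27:05 INFO mapred.JobClient: map 100% reduce 77%
09/12/13 01:29:24 INFO mapred.JobClient: map 100% reduce 78%
09/12/13 01:31:45 INFO mapred.JobClient: map 100% reduce 79%
09/12/13 01:34:06 INFO mapred.JobClient: map 100% reduce 80%
09/12/13 01:36:28 INFO mapred.JobClient: map 100% reduce 81%
09/12/13 01:38:42 INFO mapred.JobClient: map 100% reduce 82%
"""

LINE_RE = re.compile(
    r"(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) INFO mapred\.JobClient:"
    r"\s+map (\d+)% reduce (\d+)%"
)


def parse_progress(text):
    """Return a list of (timestamp, map_pct, reduce_pct) tuples."""
    points = []
    for m in LINE_RE.finditer(text):
        # Assumption: timestamps are yy/MM/dd HH:mm:ss.
        ts = datetime.strptime(m.group(1), "%y/%m/%d %H:%M:%S")
        points.append((ts, int(m.group(2)), int(m.group(3))))
    return points


def seconds_per_reduce_percent(points, start_pct=67):
    """Average seconds per reduce percentage point from start_pct onward."""
    window = [(ts, r) for ts, _, r in points if r >= start_pct]
    (t0, r0), (t1, r1) = window[0], window[-1]
    return (t1 - t0).total_seconds() / (r1 - r0)


if __name__ == "__main__":
    pts = parse_progress(LOG)
    rate = seconds_per_reduce_percent(pts)
    last_pct = pts[-1][2]
    remaining_min = rate * (100 - last_pct) / 60
    print(f"{rate:.0f} s per reduce percent after 66%")
    print(f"about {remaining_min:.0f} more minutes to reach 100% at this rate")
```

If I understand the progress reporting correctly, the first two thirds of the reduce percentage cover the copy and sort phases, so the steady crawl after 66% would be time spent in the reduce function itself rather than in shuffling data between nodes.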
