Have you read http://wiki.apache.org/hadoop/PerformanceTuning ? I'm
sure your clients could use some write buffer magic, and compression
always helps.

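WRT the reduce slowing down at 66%, it's because of how the reducer works:
http://hadoop.apache.org/common/docs/r0.20.0/mapred_tutorial.html#Reducer
As you can see, it has 3 phases: shuffle, sort, reduce, and each one
counts for a third of the reported progress. Shuffle runs in parallel
with the mappers, then it sorts (could be very short, depends on your
reducer's input), and then it reduces (runs the code you wrote). So
66% means the shuffle and sort are done, and the slow part is your own
reduce code.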
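
To make the three phases concrete, here is a rough Python sketch of how
a reduce percentage maps onto them. It assumes each phase is weighted
as one third of a reduce task's progress; the function names are made
up for illustration and are not part of Hadoop.

```python
# Hypothetical helpers, not a Hadoop API: they only illustrate the
# assumption that shuffle, sort and reduce each count for one third
# of a reduce task's reported progress.

THIRD = 100.0 / 3


def reduce_phase(task_progress_pct):
    """Return the phase a reduce task is in for a progress value of 0-100."""
    if not 0 <= task_progress_pct <= 100:
        raise ValueError("progress must be between 0 and 100")
    if task_progress_pct < THIRD:
        return "shuffle"
    if task_progress_pct < 2 * THIRD:
        return "sort"
    return "reduce"


def user_code_fraction_done(task_progress_pct):
    """Return the fraction (0.0-1.0) of the final reduce phase completed."""
    return max(0.0, (task_progress_pct - 2 * THIRD) / THIRD)


if __name__ == "__main__":
    for pct in (10, 50, 67, 82):
        print(pct, reduce_phase(pct), round(user_code_fraction_done(pct), 2))
```

The JobClient line shows progress for the whole job rather than for one
task, so treat this as an approximation: with "reduce 82%" and all
reducers moving at about the same pace, the reduce code itself would be
a bit under half done.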

Hope that helps!

J-D

On Sun, Dec 13, 2009 at 10:10 AM, Something Something
<[email protected]> wrote:
> Hello,
>
> I wrote a couple of MapReduce programs that take about **32 minutes** to
> complete on my local machine in a pseudo-distributed mode.  The input file
> has about 1 million rows.
>
> I created a cluster on EC2 that has 10 instances running Hadoop/Hbase & 5
> instances running ZooKeeper.  The 10 Hadoop/HBase machines were c1.xlarge &
> 5 ZooKeeper machines were c1.medium - so fairly powerful machines.  But the
> same job took over *1.5 hours to complete!*  I agree that my code needs some
> improvement and I am looking into that, but honestly, I am still comparing
> apples to apples, right?
>
> Before starting the job I made sure that all HBase instances were live by
> using the status command.
>
> status 'simple'  (showed 11 live servers.)
>
> What setup should I use to see performance improvements.. meaning.. how many
> instances?  Should I start just Hadoop on 10 instances, Hbase on 5, and
> zookeeper on 5?
>
> Any help in improving performance will be greatly appreciated.  Thanks.
>
> - Ajay
>
> PS:  One thing I have noticed is that it goes to 66% very fast and then
> slows down from there..
>
> 09/12/13 00:59:17 INFO mapred.JobClient:  map 0% reduce 0%
> 09/12/13 01:02:22 INFO mapred.JobClient:  map 100% reduce 0%
> 09/12/13 01:02:34 INFO mapred.JobClient:  map 100% reduce 66%
> 09/12/13 01:03:25 INFO mapred.JobClient:  map 100% reduce 67%
> 09/12/13 01:05:49 INFO mapred.JobClient:  map 100% reduce 68%
> 09/12/13 01:08:10 INFO mapred.JobClient:  map 100% reduce 69%
> 09/12/13 01:10:31 INFO mapred.JobClient:  map 100% reduce 70%
> 09/12/13 01:12:55 INFO mapred.JobClient:  map 100% reduce 71%
> 09/12/13 01:15:20 INFO mapred.JobClient:  map 100% reduce 72%
> 09/12/13 01:17:41 INFO mapred.JobClient:  map 100% reduce 73%
> 09/12/13 01:20:02 INFO mapred.JobClient:  map 100% reduce 74%
> 09/12/13 01:22:24 INFO mapred.JobClient:  map 100% reduce 75%
> 09/12/13 01:24:47 INFO mapred.JobClient:  map 100% reduce 76%
> 09/12/13 01:27:05 INFO mapred.JobClient:  map 100% reduce 77%
> 09/12/13 01:29:24 INFO mapred.JobClient:  map 100% reduce 78%
> 09/12/13 01:31:45 INFO mapred.JobClient:  map 100% reduce 79%
> 09/12/13 01:34:06 INFO mapred.JobClient:  map 100% reduce 80%
> 09/12/13 01:36:28 INFO mapred.JobClient:  map 100% reduce 81%
> 09/12/13 01:38:42 INFO mapred.JobClient:  map 100% reduce 82%
>
