Since you are running Hadoop in pseudo-distributed mode, it is possible that just 1 reduce task will bring better performance; on a single machine, extra map and reduce tasks mostly add JVM startup and scheduling overhead rather than real parallelism. How much this matters will depend on your input's size and content.
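A quick way to check (reusing the commands from your message; -m 2 -r 1 is just an illustrative choice, and output_r1 is any fresh output directory):

time -p bin/hadoop jar hadoop-0.18.3-examples.jar wordcount -m 2 -r 1 sample.txt output_r1

If that run beats your 2/4/8-reducer runs, task startup and coordination overhead is likely what is dominating your job times.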
2009/3/5 Sandy <snickerdoodl...@gmail.com>

> Hello all,
>
> For the sake of benchmarking, I ran the standard hadoop wordcount example
> on an input file using 2, 4, and 8 mappers and reducers for my job.
> In other words, I do:
>
> time -p bin/hadoop jar hadoop-0.18.3-examples.jar wordcount -m 2 -r 2 sample.txt output
> time -p bin/hadoop jar hadoop-0.18.3-examples.jar wordcount -m 4 -r 4 sample.txt output2
> time -p bin/hadoop jar hadoop-0.18.3-examples.jar wordcount -m 8 -r 8 sample.txt output3
>
> Strangely enough, this increase in mappers and reducers results in
> slower running times!
> - On 2 mappers and reducers it ran for 40 seconds
> - On 4 mappers and reducers it ran for 60 seconds
> - On 8 mappers and reducers it ran for 90 seconds!
>
> Please note that the "sample.txt" file is identical in each of these runs.
>
> I have the following questions:
> - Shouldn't wordcount get -faster- with additional mappers and reducers,
> instead of slower?
> - If it does get faster for other people, why does it become slower for me?
>
> I am running hadoop in pseudo-distributed mode on a single 64-bit Mac Pro
> with 2 quad-core processors, 16 GB of RAM, and 4 1TB HDs.
>
> I would greatly appreciate it if someone could explain this behavior to me,
> and tell me if I'm running this wrong. How can I change my settings (if at
> all) to get wordcount running faster when I increase the number of maps
> and reduces?
>
> Thanks,
> -SM