I'm trying to make sense of the results, but running it like this is working at least a little better:
map4 reduce1
map4 reduce2
map4 reduce4
map4 reduce8

I tried keeping the reduces constant while varying the maps; this results in an increase in running time. When I tried keeping the maps constant and varying the reduces, I got something better, though when it hit something like map4 reduce4, the running time shoots up, even though previously it had been decreasing.

This has been very helpful... though I am very curious: is the reason one worked better than the other a function of the input only? Or what about pseudo-distributed mode makes one way work better than the other?

Thanks again!
-SM

On Thu, Mar 5, 2009 at 9:04 PM, haizhou zhao <random...@gmail.com> wrote:

> As I mentioned above, you should at least try like this:
> map2 reduce1
> map4 reduce1
> map8 reduce1
>
> map4 reduce1
> map4 reduce2
> map4 reduce4
>
> instead of:
> map2 reduce2
> map4 reduce4
> map8 reduce8
>
> 2009/3/6 Sandy <snickerdoodl...@gmail.com>
>
> > I was trying to control the maximum number of tasks per tasktracker by
> > using the mapred.tasktracker.tasks.maximum parameter.
> >
> > I am interpreting your comment to mean that maybe this parameter is
> > malformed and should read:
> > mapred.tasktracker.map.tasks.maximum = 8
> > mapred.tasktracker.reduce.tasks.maximum = 8
> >
> > I did that, and reran on a 428MB input, and got the same results as
> > before. I also ran it on a 3.3G dataset, and got the same pattern.
> >
> > I am still trying to run it on a 20GB input. This should confirm if the
> > filesystem cache thing is true.
> >
> > -SM
> >
> > On Thu, Mar 5, 2009 at 12:22 PM, Sandy <snickerdoodl...@gmail.com> wrote:
> >
> > > Arun,
> > >
> > > How can I check the number of slots per tasktracker? Which parameter
> > > controls that?
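For reference, the per-tasktracker slot limits discussed above are set in the cluster configuration rather than on the command line. A sketch of what that could look like in `conf/hadoop-site.xml` for Hadoop 0.18.x (the values of 8 are just examples matching the numbers in this thread, not recommendations):

```xml
<!-- conf/hadoop-site.xml (Hadoop 0.18.x); example values only -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>8</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>8</value>
</property>
```

The tasktracker reads these at startup, so it needs to be restarted for a change to take effect.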
> > >
> > > Thanks,
> > > -SM
> > >
> > > On Thu, Mar 5, 2009 at 12:14 PM, Arun C Murthy <a...@yahoo-inc.com> wrote:
> > >
> > >> I assume you have only 2 map and 2 reduce slots per tasktracker - which
> > >> totals to 2 maps/reduces for your cluster. This means with more
> > >> maps/reduces they are serialized to 2 at a time.
> > >>
> > >> Also, the -m is only a hint to the JobTracker; you might see less/more
> > >> than the number of maps you have specified on the command line.
> > >> The -r, however, is followed faithfully.
> > >>
> > >> Arun
> > >>
> > >> On Mar 4, 2009, at 2:46 PM, Sandy wrote:
> > >>
> > >>> Hello all,
> > >>>
> > >>> For the sake of benchmarking, I ran the standard hadoop wordcount
> > >>> example on an input file using 2, 4, and 8 mappers and reducers for
> > >>> my job. In other words, I do:
> > >>>
> > >>> time -p bin/hadoop jar hadoop-0.18.3-examples.jar wordcount -m 2 -r 2 sample.txt output
> > >>> time -p bin/hadoop jar hadoop-0.18.3-examples.jar wordcount -m 4 -r 4 sample.txt output2
> > >>> time -p bin/hadoop jar hadoop-0.18.3-examples.jar wordcount -m 8 -r 8 sample.txt output3
> > >>>
> > >>> Strangely enough, this increase in mappers and reducers results in
> > >>> slower running times!
> > >>> - On 2 mappers and reducers it ran for 40 seconds
> > >>> - On 4 mappers and reducers it ran for 60 seconds
> > >>> - On 8 mappers and reducers it ran for 90 seconds!
> > >>>
> > >>> Please note that the "sample.txt" file is identical in each of these
> > >>> runs.
> > >>>
> > >>> I have the following questions:
> > >>> - Shouldn't wordcount get -faster- with additional mappers and
> > >>> reducers, instead of slower?
> > >>> - If it does get faster for other people, why does it become slower
> > >>> for me?
> > >>> I am running hadoop in pseudo-distributed mode on a single 64-bit
> > >>> Mac Pro with 2 quad-core processors, 16 GB of RAM, and 4 1TB HDs.
> > >>>
> > >>> I would greatly appreciate it if someone could explain this behavior
> > >>> to me, and tell me if I'm running this wrong. How can I change my
> > >>> settings (if at all) to get wordcount running faster when I increase
> > >>> the number of maps and reduces?
> > >>>
> > >>> Thanks,
> > >>> -SM
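Arun's point about slots can be illustrated with a toy model: if the tasktracker only has `n_slots` slots, extra tasks run in serialized "waves", and each extra task adds scheduling/JVM-startup overhead, so requesting more tasks than slots can make the job slower. The task time and overhead figures below are invented to show the shape of the curve, not measurements from this thread:

```python
import math

def wave_model(n_tasks, n_slots, task_time, overhead_per_task):
    """Rough job-time estimate when tasks are serialized into waves of
    at most n_slots concurrent tasks, with fixed per-task overhead."""
    waves = math.ceil(n_tasks / n_slots)          # how many rounds of execution
    return waves * task_time + n_tasks * overhead_per_task

# With only 2 slots, raising -m past 2 adds waves and overhead:
for m in (2, 4, 8):
    print(m, wave_model(m, n_slots=2, task_time=15.0, overhead_per_task=5.0))
```

Under these (assumed) numbers the estimate grows monotonically with the task count, which matches the 40s/60s/90s trend reported above: past the slot limit, extra tasks buy no parallelism, only overhead.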