Hi Tim, I assume those are single-proc machines? I got 649 secs on 70GB of data for our 7-node cluster (~11 mins), but we have dual quad-core Nehalems (2.26GHz).
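[For comparison, the per-node sort throughput implied by the figures in this thread can be checked with a quick awk one-liner. Numbers are taken from the messages below; MB is 1024^2 bytes, and dividing by datanode count is only an approximation. This works out to roughly 15.8, 4.7 and 4.8 MB/s/node, consistent with Todd's "4-5 MB/sec/node" estimate for the smaller clusters.]

```shell
# Implied sort throughput per node: (data size in MB) / (sort time in s) / (node count)
awk 'BEGIN {
  printf "7-node Nehalem:  %.1f MB/s/node\n", 70*1024/649/7   # 70GB in 649s on 7 nodes
  printf "9-DN Dell R300:  %.1f MB/s/node\n", 90*1024/2176/9  # 90GB in 2176s on 9 DN
  printf "4-node cluster:  %.1f MB/s/node\n", 40*1024/2136/4  # 40GB in 2136s on 4 nodes
}'
```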
On Thu, Oct 15, 2009 at 11:34 AM, tim robertson <timrobertson...@gmail.com> wrote:
> Hi Usman,
>
> So on my 10-node cluster (9 DN) with 4 maps and 4 reduces (I plan on
> high-memory jobs so picked 4 only)
> [9 DN of Dell R300: 2.83GHz quad-core (2x6MB cache), 8GB RAM and
> 2x500GB SATA drives]
>
> Using your template for stats, I get the following with no tuning:
>
> GENERATE RANDOM DATA
> Wrote out 90GB of random binary data:
> Map output records=9198009
> The job took 350 seconds (approximately 6 minutes).
>
> SORT RANDOM GENERATED DATA
> Map output records=9197821
> Reduce input records=9197821
> The job took 2176 seconds (approximately 36 minutes).
>
> So pretty similar to your initial benchmark. I will tune it a bit
> tomorrow and rerun.
>
> If you spent time tuning your cluster and it was successful, please
> can you share your config?
>
> Cheers,
> Tim
>
> On Thu, Oct 15, 2009 at 11:32 AM, Usman Waheed <usm...@opera.com> wrote:
> > Hi Todd,
> >
> > Some changes have been applied to the cluster based on the documentation
> > (URL) you noted below, like file descriptor settings and
> > io.file.buffer.size. I will check out the other settings which I
> > haven't applied yet.
> >
> > My map/reduce slot settings from hadoop-site.xml and hadoop-default.xml
> > on all nodes in the cluster:
> >
> > hadoop-site.xml
> > mapred.tasktracker.task.maximum = 2
> > mapred.tasktracker.map.tasks.maximum = 8
> > mapred.tasktracker.reduce.tasks.maximum = 8
> >
> > hadoop-default.xml
> > mapred.map.tasks = 2
> > mapred.reduce.tasks = 1
> >
> > Thanks,
> > Usman
> >
> >> This seems a bit slow for that setup (4-5 MB/sec/node sorting). Have
> >> you changed the configurations at all? There are some notes on this
> >> blog post that might help your performance a bit:
> >>
> >> http://www.cloudera.com/blog/2009/03/30/configuration-parameters-what-can-you-just-ignore/
> >>
> >> How many map and reduce slots did you configure for the daemons?
> >> If you have Ganglia installed you can usually get a good idea of
> >> whether you're using your resources well by looking at the graphs
> >> while running a job like this sort.
> >>
> >> -Todd
> >>
> >> On Wed, Oct 14, 2009 at 4:04 AM, Usman Waheed <usm...@opera.com> wrote:
> >>> Here are the results I got from my 4-node cluster (correction: I
> >>> noted 5 earlier). One of the 4 nodes is both a namenode and a datanode.
> >>>
> >>> GENERATE RANDOM DATA
> >>> Wrote out 40GB of random binary data:
> >>> Map output records=4088301
> >>> The job took 358 seconds (approximately 6 minutes).
> >>>
> >>> SORT RANDOM GENERATED DATA
> >>> Map output records=4088301
> >>> Reduce input records=4088301
> >>> The job took 2136 seconds (approximately 35 minutes).
> >>>
> >>> VALIDATION OF SORTED DATA
> >>> The job took 183 seconds.
> >>> SUCCESS! Validated the MapReduce framework's 'sort' successfully.
> >>>
> >>> It would be interesting to see what performance numbers others with a
> >>> similar setup have obtained.
> >>>
> >>> Thanks,
> >>> Usman
> >>>
> >>>> I am setting up a new cluster of 10 nodes of 2.83GHz quad-core
> >>>> (2x6MB cache), 8GB RAM and 2x500GB drives, and will do the same
> >>>> soon. Got some issues though so it won't start up...
> >>>>
> >>>> Tim
> >>>>
> >>>> On Wed, Oct 14, 2009 at 11:36 AM, Usman Waheed <usm...@opera.com> wrote:
> >>>>> Thanks Tim, I will check it out and post my results for comments.
> >>>>> -Usman
> >>>>>
> >>>>>> Might it be worth running the http://wiki.apache.org/hadoop/Sort
> >>>>>> benchmark and posting your results for comment?
> >>>>>> Tim
> >>>>>>
> >>>>>> On Wed, Oct 14, 2009 at 10:48 AM, Usman Waheed <usm...@opera.com> wrote:
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> Is there a way to tell what kind of performance numbers one can
> >>>>>>> expect out of a cluster given a certain set of specs?
> >>>>>>>
> >>>>>>> For example, I have 5 nodes in my cluster that all have the
> >>>>>>> following hardware configuration:
> >>>>>>> quad-core 2.0GHz, 8GB RAM, 4x1TB disks, all on the same rack.
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Usman
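[For reference, the three jobs reported in this thread (generate, sort, validate) are the standard Sort benchmark from the wiki page Tim links. On a 0.20-era Hadoop install they are typically invoked roughly as below; the exact jar names and default data volume vary by version, so treat the paths and the "10GB per node" figure as assumptions to check against your own install.]

```shell
# Write random binary data into /rand (RandomWriter; by default
# roughly 10GB per node, matching the 90GB/9-DN and 40GB/4-node runs above)
bin/hadoop jar hadoop-*-examples.jar randomwriter /rand

# Sort it into /rand-sort
bin/hadoop jar hadoop-*-examples.jar sort /rand /rand-sort

# Validate that the sorted output is correct
bin/hadoop jar hadoop-*-test.jar testmapredsort \
    -sortInput /rand -sortOutput /rand-sort
```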