This seems a bit slow for that setup (4-5 MB/sec/node sorting). Have you changed the configuration at all? There are some notes in this blog post that might help your performance a bit:
http://www.cloudera.com/blog/2009/03/30/configuration-parameters-what-can-you-just-ignore/

How many map and reduce slots did you configure for the daemons? If you have Ganglia installed you can usually get a good idea of whether you're using your resources well by looking at the graphs while running a job like this sort.

-Todd

On Wed, Oct 14, 2009 at 4:04 AM, Usman Waheed <usm...@opera.com> wrote:

> Here are the results i got from my 4 node cluster (correction i noted 5
> earlier). One of my nodes out of the 4 is a namenode+datanode both.
>
> GENERATE RANDOM DATA
> Wrote out 40GB of random binary data:
> Map output records=4088301
> The job took 358 seconds. (approximately: 6 minutes).
>
> SORT RANDOM GENERATED DATA
> Map output records=4088301
> Reduce input records=4088301
> The job took 2136 seconds. (approximately: 35 minutes).
>
> VALIDATION OF SORTED DATA
> The job took 183 seconds.
> SUCCESS! Validated the MapReduce framework's 'sort' successfully.
>
> It would be interesting to see what performance numbers others with a
> similar setup have obtained.
>
> Thanks,
> Usman
>
>> I am setting up a new cluster of 10 nodes of 2.83G Quadcore (2x6MB
>> cache), 8G RAM and 2x500G drives, and will do the same soon. Got some
>> issues though so it won't start up...
>>
>> Tim
>>
>>
>> On Wed, Oct 14, 2009 at 11:36 AM, Usman Waheed <usm...@opera.com> wrote:
>>
>>> Thanks Tim, i will check it out and post my results for comments.
>>> -Usman
>>>
>>>> Might it be worth running the http://wiki.apache.org/hadoop/Sort and
>>>> posting your results for comment?
>>>>
>>>> Tim
>>>>
>>>>
>>>> On Wed, Oct 14, 2009 at 10:48 AM, Usman Waheed <usm...@opera.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Is there a way to tell what kind of performance numbers one can expect
>>>>> out of their cluster given a certain set of specs.
>>>>>
>>>>> For example i have 5 nodes in my cluster that all have the following
>>>>> hardware configuration(s):
>>>>> Quad Core 2.0GHz, 8GB RAM, 4x1TB disks and are all on the same rack.
>>>>>
>>>>> Thanks,
>>>>> Usman
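
For anyone wanting to check the 4-5 MB/sec/node estimate at the top of this message, it can be reproduced from the numbers in the quoted sort results. This is only a rough sketch: it assumes 1GB = 1024MB and that all 4 nodes (including the combined namenode+datanode) run sort tasks, neither of which is stated in the thread. The variable names are made up for illustration.

```python
# Figures taken from the quoted sort results.
data_mb = 40 * 1024    # 40GB of random data (assumption: 1GB = 1024MB)
sort_seconds = 2136    # wall-clock time reported for the sort job
nodes = 4              # assumption: all 4 nodes take part in the sort

# Aggregate throughput of the whole cluster, then the share per node.
cluster_mb_per_sec = data_mb / sort_seconds
node_mb_per_sec = cluster_mb_per_sec / nodes

print(f"cluster:  {cluster_mb_per_sec:.1f} MB/sec")   # 19.2 MB/sec
print(f"per node: {node_mb_per_sec:.1f} MB/sec")      # 4.8 MB/sec
```

Using 1GB = 1000MB instead gives about 4.7 MB/sec/node, so the estimate stays in the 4-5 range either way.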