Hi Tim,

Thanks very much for sharing the info.
I will certainly share my configuration settings after applying some tuning
at my end, and will let you know the results on this list.

Thanks,
Usman


Hi Usman,

So on my 10-node cluster (9 DNs), with 4 map and 4 reduce slots (I plan
on running high-memory jobs, so I picked only 4):
[9 DN of Dell R300: 2.83G Quadcore (2x6MB cache), 8G RAM and 2x500G SATA drives]

Using your template for stats, I get the following with no tuning:

GENERATE RANDOM DATA
Wrote out 90GB of random binary data:
Map output records=9198009
The job took 350 seconds. (approximately: 6 minutes)

SORT RANDOM GENERATED DATA
Map output records= 9197821
Reduce input records=9197821
The job took 2176 seconds. (approximately: 36 minutes).

So pretty similar to your initial benchmark.  I will tune it a bit
tomorrow and rerun.

If you've spent time tuning your cluster and it was successful, could
you please share your config?

Cheers,
Tim





On Thu, Oct 15, 2009 at 11:32 AM, Usman Waheed <usm...@opera.com> wrote:
Hi Todd,

Some changes have been applied to the cluster based on the documentation
(URL) you noted below, such as the file descriptor limits and
io.file.buffer.size. I will check out the other settings that I haven't
applied yet.
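As a reference, an io.file.buffer.size override in hadoop-site.xml looks
roughly like this (a sketch; 65536 is a commonly suggested value, not
necessarily the one I applied):

```xml
<!-- hadoop-site.xml (sketch): value is illustrative -->
<property>
  <name>io.file.buffer.size</name>
  <!-- read/write buffer used for SequenceFiles; the stock default is 4096 -->
  <value>65536</value>
</property>
```

The file descriptor limit itself is an OS-level setting (e.g. raising
`nofile` in /etc/security/limits.conf for the user running the daemons),
not a Hadoop config property.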

Here are my map/reduce slot settings from hadoop-site.xml and
hadoop-default.xml on all nodes in the cluster:

hadoop-site.xml:
mapred.tasktracker.task.maximum = 2
mapred.tasktracker.map.tasks.maximum = 8
mapred.tasktracker.reduce.tasks.maximum = 8

hadoop-default.xml:
mapred.map.tasks = 2
mapred.reduce.tasks = 1
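Spelled out in hadoop-site.xml property syntax, the slot overrides above
would look roughly like this (a sketch of the same values, nothing new):

```xml
<!-- hadoop-site.xml (sketch): per-TaskTracker slot limits -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>8</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>8</value>
</property>
```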

Thanks,
Usman


This seems a bit slow for that setup (4-5 MB/sec/node sorting). Have
you changed the configurations at all? There are some notes on this
blog post that might help your performance a bit:


http://www.cloudera.com/blog/2009/03/30/configuration-parameters-what-can-you-just-ignore/
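As a rough check, the per-node sort rate mentioned above can be derived
from the 40 GB / 2136-second run on 4 nodes (a back-of-envelope sketch
that ignores replication and shuffle overhead):

```python
# Back-of-envelope per-node sort throughput from the numbers in this thread.
data_gb = 40      # total random data sorted
seconds = 2136    # wall-clock time of the sort job
nodes = 4         # nodes doing the work

mb_per_sec_per_node = data_gb * 1024 / seconds / nodes
print(f"{mb_per_sec_per_node:.1f} MB/sec/node")  # roughly 4.8
```

which lands squarely in the 4-5 MB/sec/node range quoted above.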

How many map and reduce slots did you configure for the daemons? If
you have Ganglia installed you can usually get a good idea of whether
you're using your resources well by looking at the graphs while
running a job like this sort.

-Todd

On Wed, Oct 14, 2009 at 4:04 AM, Usman Waheed <usm...@opera.com> wrote:

Here are the results I got from my 4-node cluster (correction: I noted 5
earlier). One of the 4 nodes is both a namenode and a datanode.

GENERATE RANDOM DATA
Wrote out 40GB of random binary data:
Map output records=4088301
The job took 358 seconds. (approximately: 6 minutes).

SORT RANDOM GENERATED DATA
Map output records=4088301
Reduce input records=4088301
The job took 2136 seconds. (approximately: 35 minutes).

VALIDATION OF SORTED DATA
The job took 183 seconds.
SUCCESS! Validated the MapReduce framework's 'sort' successfully.

It would be interesting to see what performance numbers others with a
similar setup have obtained.

Thanks,
Usman


I am setting up a new cluster of 10 nodes of 2.83G Quadcore (2x6MB
cache), 8G RAM and 2x500G drives, and will do the same soon.  I've got
some issues though, so it won't start up...

Tim


On Wed, Oct 14, 2009 at 11:36 AM, Usman Waheed <usm...@opera.com> wrote:


Thanks Tim, I will check it out and post my results for comments.
-Usman


Might it be worth running the sort benchmark
(http://wiki.apache.org/hadoop/Sort) and posting your results for comment?
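For reference, the benchmark on that wiki page is driven by commands along
these lines (a sketch; the exact jar names depend on your Hadoop version,
and `rand`/`rand-sort` are placeholder HDFS paths):

```shell
# Generate random binary data (by default roughly 10 GB per TaskTracker node)
bin/hadoop jar hadoop-*-examples.jar randomwriter rand

# Sort the generated data
bin/hadoop jar hadoop-*-examples.jar sort rand rand-sort

# Validate that the output is correctly sorted
bin/hadoop jar hadoop-*-test.jar testmapredsort \
    -sortInput rand -sortOutput rand-sort
```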

Tim


On Wed, Oct 14, 2009 at 10:48 AM, Usman Waheed <usm...@opera.com>
wrote:



Hi,

Is there a way to tell what kind of performance numbers one can expect
out of a cluster, given a certain set of specs?

For example, I have 5 nodes in my cluster that all have the following
hardware configuration:
Quad Core 2.0GHz, 8GB RAM, 4x1TB disks and are all on the same rack.

Thanks,
Usman