Hi Tim, I assume those are single-proc machines? I got 649 secs on 70GB of data for our 7-node cluster (~11 mins), but we have dual quad-core Nehalems (2.26GHz).
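[For comparison, the per-node sort throughput implied by the figures in this thread can be checked with a quick awk one-liner. Numbers are taken from the messages below; MB is 1024^2 bytes, and dividing by datanode count is only an approximation. This works out to roughly 15.8, 4.7 and 4.8 MB/s/node, consistent with Todd's "4-5 MB/sec/node" estimate for the smaller clusters.]

```shell
# Implied sort throughput per node: (data size in MB) / (sort time in s) / (node count)
awk 'BEGIN {
  printf "7-node Nehalem:  %.1f MB/s/node\n", 70*1024/649/7   # 70GB in 649s on 7 nodes
  printf "9-DN Dell R300:  %.1f MB/s/node\n", 90*1024/2176/9  # 90GB in 2176s on 9 DN
  printf "4-node cluster:  %.1f MB/s/node\n", 40*1024/2136/4  # 40GB in 2136s on 4 nodes
}'
```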
On Thu, Oct 15, 2009 at 11:34 AM, tim robertson <timrobertson...@gmail.com> wrote:
> Hi Usman,
>
> So on my 10-node cluster (9 DN) with 4 maps and 4 reduces (I plan on
> high-memory jobs so picked 4 only)
> [9 DN of Dell R300: 2.83GHz quad-core (2x6MB cache), 8GB RAM and
> 2x500GB SATA drives]
>
> Using your template for stats, I get the following with no tuning:
>
> GENERATE RANDOM DATA
> Wrote out 90GB of random binary data:
> Map output records=9198009
> The job took 350 seconds (approximately 6 minutes).
>
> SORT RANDOM GENERATED DATA
> Map output records=9197821
> Reduce input records=9197821
> The job took 2176 seconds (approximately 36 minutes).
>
> So pretty similar to your initial benchmark. I will tune it a bit
> tomorrow and rerun.
>
> If you spent time tuning your cluster and it was successful, please
> can you share your config?
>
> Cheers,
> Tim
>
> On Thu, Oct 15, 2009 at 11:32 AM, Usman Waheed <usm...@opera.com> wrote:
> > Hi Todd,
> >
> > Some changes have been applied to the cluster based on the documentation
> > (URL) you noted below, like file descriptor settings and
> > io.file.buffer.size. I will check out the other settings which I
> > haven't applied yet.
> >
> > My map/reduce slot settings from hadoop-site.xml and hadoop-default.xml
> > on all nodes in the cluster:
> >
> > hadoop-site.xml
> > mapred.tasktracker.task.maximum = 2
> > mapred.tasktracker.map.tasks.maximum = 8
> > mapred.tasktracker.reduce.tasks.maximum = 8
> >
> > hadoop-default.xml
> > mapred.map.tasks = 2
> > mapred.reduce.tasks = 1
> >
> > Thanks,
> > Usman
> >
> >> This seems a bit slow for that setup (4-5 MB/sec/node sorting). Have
> >> you changed the configurations at all? There are some notes on this
> >> blog post that might help your performance a bit:
> >>
> >> http://www.cloudera.com/blog/2009/03/30/configuration-parameters-what-can-you-just-ignore/
> >>
> >> How many map and reduce slots did you configure for the daemons?
> >> If you have Ganglia installed you can usually get a good idea of
> >> whether you're using your resources well by looking at the graphs
> >> while running a job like this sort.
> >>
> >> -Todd
> >>
> >> On Wed, Oct 14, 2009 at 4:04 AM, Usman Waheed <usm...@opera.com> wrote:
> >>> Here are the results I got from my 4-node cluster (correction: I
> >>> noted 5 earlier). One of the 4 nodes is both a namenode and a datanode.
> >>>
> >>> GENERATE RANDOM DATA
> >>> Wrote out 40GB of random binary data:
> >>> Map output records=4088301
> >>> The job took 358 seconds (approximately 6 minutes).
> >>>
> >>> SORT RANDOM GENERATED DATA
> >>> Map output records=4088301
> >>> Reduce input records=4088301
> >>> The job took 2136 seconds (approximately 35 minutes).
> >>>
> >>> VALIDATION OF SORTED DATA
> >>> The job took 183 seconds.
> >>> SUCCESS! Validated the MapReduce framework's 'sort' successfully.
> >>>
> >>> It would be interesting to see what performance numbers others with a
> >>> similar setup have obtained.
> >>>
> >>> Thanks,
> >>> Usman
> >>>
> >>>> I am setting up a new cluster of 10 nodes of 2.83GHz quad-core
> >>>> (2x6MB cache), 8GB RAM and 2x500GB drives, and will do the same
> >>>> soon. Got some issues though so it won't start up...
> >>>>
> >>>> Tim
> >>>>
> >>>> On Wed, Oct 14, 2009 at 11:36 AM, Usman Waheed <usm...@opera.com> wrote:
> >>>>> Thanks Tim, I will check it out and post my results for comments.
> >>>>> -Usman
> >>>>>
> >>>>>> Might it be worth running the http://wiki.apache.org/hadoop/Sort
> >>>>>> benchmark and posting your results for comment?
> >>>>>> Tim
> >>>>>>
> >>>>>> On Wed, Oct 14, 2009 at 10:48 AM, Usman Waheed <usm...@opera.com> wrote:
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> Is there a way to tell what kind of performance numbers one can
> >>>>>>> expect out of a cluster given a certain set of specs?
> >>>>>>>
> >>>>>>> For example, I have 5 nodes in my cluster that all have the
> >>>>>>> following hardware configuration:
> >>>>>>> quad-core 2.0GHz, 8GB RAM, 4x1TB disks, all on the same rack.
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Usman
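[For reference, the three jobs reported in this thread (generate, sort, validate) are the standard Sort benchmark from the wiki page Tim links. On a 0.20-era Hadoop install they are typically invoked roughly as below; the exact jar names and default data volume vary by version, so treat the paths and the "10GB per node" figure as assumptions to check against your own install.]

```shell
# Write random binary data into /rand (RandomWriter; by default
# roughly 10GB per node, matching the 90GB/9-DN and 40GB/4-node runs above)
bin/hadoop jar hadoop-*-examples.jar randomwriter /rand

# Sort it into /rand-sort
bin/hadoop jar hadoop-*-examples.jar sort /rand /rand-sort

# Validate that the sorted output is correct
bin/hadoop jar hadoop-*-test.jar testmapredsort \
    -sortInput /rand -sortOutput /rand-sort
```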