Hi Allen/Michel,

First, thanks a lot for your reply.

I assumed that the 1.7GB of RAM would be the bottleneck in my environment;
that's why I am trying to change it now.

I shut down the 4 datanodes with 1.7GB RAM (Amazon EC2 small instances) and
replaced them with 2 datanodes with 7.5GB RAM (Amazon EC2 large instances).

Is it OK that the datanodes are 64-bit while the namenode is still 32-bit?
Based on the new hardware I'm using, are there any suggestions regarding the
Hadoop configuration parameters?
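For reference, this is roughly what I was planning to try in mapred-site.xml
on the new 7.5GB datanodes. The values are only my own guesses, not something
I found in the CDH3 docs, so please correct me if they look wrong:

  <!-- mapred-site.xml: my guesses for a 7.5GB / 2-core large instance -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>   <!-- assumed: ~2 map slots per core -->
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx1024m</value>   <!-- assumed: 1GB heap per task child -->
  </property>
  <property>
    <name>io.sort.mb</name>
    <value>256</value>   <!-- assumed: larger sort buffer to reduce spills -->
  </property>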

One more thing, you asked: "Are your tasks spilling?"
How can I check whether my tasks are spilling?
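From what I understand, I can look at the job counters (in the JobTracker web
UI, or from the command line) and compare "Spilled Records" with "Map output
records"; if the spilled count is much larger, the map outputs are being
spilled to disk more than once. Is something like this the right way to check
it? (The group and counter names below are what I believe CDH3 uses; correct
me if I'm wrong.)

  hadoop job -counter <job-id> 'org.apache.hadoop.mapred.Task$Counter' SPILLED_RECORDS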

Thanks.

Avi



> -----Original Message-----
> From: Allen Wittenauer [mailto:a...@apache.org]
> Sent: Monday, August 22, 2011 7:06 AM
> To: common-user@hadoop.apache.org
>
> Subject: Re: Hadoop cluster optimization
>
>
> On Aug 21, 2011, at 7:17 PM, Michel Segel wrote:
>
> > Avi,
> > First, why a 32-bit OS?
> > You have a 64-bit processor with 4 cores; hyper-threaded, it looks like
> > 8 CPUs.
>
>        With only 1.7GB of memory, there likely isn't much of a reason to use
> a 64-bit OS. The machines (as you point out) are already tight on memory;
> 64-bit is only going to make it worse.
>
> >>
> >> 1.7 GB memory
> >> 1 Intel(R) Xeon(R) CPU E5507 @ 2.27GHz
> >> Ubuntu Server 10.10, 32-bit platform
> >> Cloudera CDH3 manual Hadoop installation
> >> (for those who are familiar with Amazon Web Services, I am talking about
> >> Small EC2 instances/servers)
> >>
> >> Total job run time is ~15 minutes (~50 files/blocks/map tasks of up to
> >> 250 MB each, and 10 reduce tasks).
> >>
> >> Based on the above information, can anyone recommend a best-practice
> >> configuration?
>
>        How many spindles?  Are your tasks spilling?
>
>
> >> Do you think that, when dealing with such a small cluster and processing
> >> such a small amount of data, it is even possible to optimize jobs so they
> >> would run much faster?
>
>        Most of the time, performance issues are with the algorithm, not
> Hadoop.
>
>
>