Re: performance not great, or did I miss something?

2008-08-11 Thread James Graham (Greywolf)

Thus spake Allen Wittenauer:

> On 8/8/08 1:25 PM, James Graham (Greywolf) [EMAIL PROTECTED] wrote:
>>  226GB of available disk space on each one;
>>  4 processors (2 x dualcore)
>>  8GB of RAM each.
>
> Some simple stuff:
>
> (Assuming SATA):
> Are you using AHCI?
> Do you have the write cache enabled?


I will investigate this...



> Is the topologyProgram providing proper results?


The whowhat, now?
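
For context, the topologyProgram is Hadoop's rack-awareness script: in releases of this era it is configured through the `topology.script.file.name` property, is invoked with one or more host names or IP addresses as arguments, and is expected to print one rack path per argument. Below is a minimal sketch in Python. The host names (`node03` through `node20`) and the rack names `/switch1` and `/switch2` are hypothetical, chosen only to mirror the two-switch layout described in the original post; if Hadoop passes IP addresses instead of names, the table would need to be keyed by address.

```python
#!/usr/bin/env python3
"""Hypothetical rack-topology script: prints one rack path per host argument."""
import sys

DEFAULT_RACK = "/default-rack"  # fallback for hosts that are not in the table

# Assumed naming scheme: node03..node10 on one switch, node11..node20 on the other.
RACKS = {}
for n in range(3, 11):
    RACKS["node%02d" % n] = "/switch1"
for n in range(11, 21):
    RACKS["node%02d" % n] = "/switch2"


def rack_for(host):
    """Return the rack path for a host name, ignoring any domain suffix."""
    short_name = host.split(".")[0].lower()
    return RACKS.get(short_name, DEFAULT_RACK)


if __name__ == "__main__":
    # One rack path is printed for each host given on the command line.
    print(" ".join(rack_for(h) for h in sys.argv[1:]))
```

Running the script by hand with a few host names is a quick way to confirm that it returns the expected rack for each one.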


> Is DNS performing as expected? Is it fast?


DNS seems appropriately configured...
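
One rough way to go beyond "seems appropriately configured" is to time forward and reverse lookups from a worker node. The sketch below uses only Python's standard `socket` module. The host list is a placeholder, and the 0.1-second threshold is an arbitrary assumption rather than anything Hadoop requires.

```python
import socket
import time


def time_lookup(name, resolver=socket.gethostbyname):
    """Return (result, seconds) for one lookup; result is None if it fails."""
    start = time.perf_counter()
    try:
        result = resolver(name)
    except OSError:  # covers socket.gaierror and socket.herror
        result = None
    return result, time.perf_counter() - start


def check_hosts(hosts, slow_after=0.1):
    """Print forward and reverse lookup times for each host name."""
    for host in hosts:
        addr, forward_secs = time_lookup(host)
        line = "%s -> %s (%.3fs)" % (host, addr, forward_secs)
        slowest = forward_secs
        if addr is not None:
            reverse, reverse_secs = time_lookup(addr, socket.gethostbyaddr)
            reverse_name = reverse[0] if reverse else None
            line += ", reverse -> %s (%.3fs)" % (reverse_name, reverse_secs)
            slowest = max(slowest, reverse_secs)
        print(line + ("  SLOW" if slowest > slow_after else ""))


if __name__ == "__main__":
    check_hosts(["localhost"])  # replace with the cluster's real host names
```

Lookups that fail, or reverse names that do not match the forward name, are worth chasing down before tuning anything else.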


> How many tasks per node?


Four each of map and reduce, I think.


> How much heap does your name node have? Is it going into garbage collection
> or swapping?


Maybe GC; no swapping (our systems do not have swap allocated).
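
With no swap allocated, every JVM on a worker has to fit in physical RAM, so it may be worth adding up the worst case. The sketch below does that arithmetic for 8GB nodes running four map and four reduce tasks. The per-task heap sizes, the daemon heaps, and the non-heap overhead factor are placeholder assumptions to be replaced with the cluster's actual settings.

```python
def worst_case_memory_mb(map_slots, reduce_slots, task_heap_mb,
                         daemon_heaps_mb=(1000, 1000), overhead=1.25):
    """Estimate peak JVM memory on one worker, in MB.

    daemon_heaps_mb: assumed heaps for the DataNode and TaskTracker daemons.
    overhead: assumed multiplier for non-heap JVM memory (stacks, code, buffers).
    """
    task_total = (map_slots + reduce_slots) * task_heap_mb
    return (task_total + sum(daemon_heaps_mb)) * overhead


if __name__ == "__main__":
    ram_mb = 8 * 1024  # 8GB of RAM per node
    for heap in (200, 512, 1024):  # candidate per-task heap sizes (assumed)
        need = worst_case_memory_mb(4, 4, heap)
        verdict = "fits" if need <= ram_mb else "exceeds RAM"
        print("task heap %4d MB -> about %5d MB needed (%s)" % (heap, need, verdict))
```

If the estimate exceeds physical RAM, fewer task slots or a smaller per-task heap would be the first things to try.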

--
James Graham (Greywolf)   |
650.930.1138|925.768.4053 *
[EMAIL PROTECTED] |
Check out what people are saying about SearchMe! -- click below
http://www.searchme.com/stack/109aa


performance not great, or did I miss something?

2008-08-08 Thread James Graham (Greywolf)

Greetings,

I'm very, very new to this (as you could probably tell from my other postings).

I have 20 nodes available as a cluster, less one as the namenode and one as
the jobtracker (unless I can use them too).  Specs are:

226GB of available disk space on each one;
4 processors (2 x dualcore)
8GB of RAM each.

The RandomWriter takes just over 17 minutes to complete;
the Sorter takes three to four hours or more to complete
on only about half a terabyte of data.

This is certainly not the speed or power I had been led to expect from
Hadoop, so I am guessing I have some things tuned wrong (actually, I'm
certain some are tuned wrong, as I'm seeing processes die from lack of
memory during the reduce phase...).

Given the above hardware specs, what should I expect as a theoretical maximum
throughput?  Machines 3-10 are on one 1GbE switch, machines 11-20 are on a
second 1GbE switch, and the two are connected by a shared 1GbE upstream
(another switch).
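
As a rough back-of-the-envelope bound for the sort, consider only the shuffle traffic that has to cross the 1GbE link between the two groups of machines. The sketch assumes 8 workers on one side and 10 on the other, 500 GB of data spread evenly, every byte of map output equally likely to go to any worker, and a link that carries its nominal 125 MB/s in each direction. All of these are simplifying assumptions, so the result is a lower limit on shuffle time, not a prediction.

```python
def cross_link_seconds(data_bytes, nodes_a, nodes_b, link_bytes_per_sec):
    """Minimum time for a uniform shuffle to cross a full-duplex inter-switch link."""
    total = nodes_a + nodes_b
    # Fraction of the data that starts on side A and must end up on side B.
    # The B-to-A fraction is identical and uses the other direction of the link.
    fraction_one_way = (nodes_a / total) * (nodes_b / total)
    return data_bytes * fraction_one_way / link_bytes_per_sec


if __name__ == "__main__":
    data = 500e9      # "about a half terabyte", taken here as 500 GB
    link = 1e9 / 8    # nominal 1GbE ceiling: 125 MB/s per direction
    secs = cross_link_seconds(data, nodes_a=8, nodes_b=10, link_bytes_per_sec=link)
    print("shuffle needs at least %.1f minutes on the uplink" % (secs / 60))
```

Under these assumptions the uplink alone accounts for roughly 16 minutes, which suggests a run of several hours has other causes as well. Replication of the sort output would add more cross-switch traffic that this estimate ignores.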





Re: performance not great, or did I miss something?

2008-08-08 Thread Allen Wittenauer
On 8/8/08 1:25 PM, James Graham (Greywolf) [EMAIL PROTECTED] wrote:
>  226GB of available disk space on each one;
>  4 processors (2 x dualcore)
>  8GB of RAM each.

Some simple stuff:

(Assuming SATA):
Are you using AHCI?
Do you have the write cache enabled?

Is the topologyProgram providing proper results?
Is DNS performing as expected? Is it fast?
How many tasks per node?
How much heap does your name node have?  Is it going into garbage collection
or swapping?