What does ganglia show for load and network?

You should also be able to see GC stats (count and time), which might help as well.

FYI,
running
> hadoop-ec2 proxy <cluster-name>

will both set up a SOCKS tunnel and list the available URLs you can cut/paste into your browser. One of the URLs is for the Ganglia interface.

On Apr 11, 2008, at 2:01 PM, Nate Carlson wrote:
On Wed, 9 Apr 2008, Chris K Wensel wrote:
make sure all nodes are running in the same 'availability zone', 
http://developer.amazonwebservices.com/connect/entry.jspa?externalID=1347

check!

and that you are using the new xen kernels.
http://developer.amazonwebservices.com/connect/entry.jspa?externalID=1353&categoryID=101
http://developer.amazonwebservices.com/connect/entry.jspa?externalID=1354&categoryID=101

check!

also, make sure each node is addressing its peers via the ec2 private addresses, not the public ones.

check!
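
One way to double-check the private-address point is to resolve each peer name as the node itself sees it and confirm the result is not a public address. The sketch below is only an illustration and is not part of the ec2 scripts: the helper names (`is_private_address`, `check_peers`) and the hostnames in the usage comment are made up, and Python's `is_private` also counts loopback and link-local ranges as non-public, so treat a `True` as "not a public address" rather than proof of an EC2 internal address.

```python
import ipaddress
import socket


def is_private_address(addr: str) -> bool:
    """Return True if addr is in a non-public range (RFC 1918, loopback, etc.)."""
    return ipaddress.ip_address(addr).is_private


def check_peers(hostnames, resolve=socket.gethostbyname):
    """Resolve each hostname and report (address, is_private) per host.

    `resolve` is injectable so the check can be exercised without DNS.
    """
    results = {}
    for host in hostnames:
        addr = resolve(host)
        results[host] = (addr, is_private_address(addr))
    return results


# Usage (hypothetical hostnames; substitute the names your nodes actually use):
#
#   for host, (addr, private) in check_peers(["node-a", "node-b"]).items():
#       print(host, addr, "private" if private else "PUBLIC")
```

Run it on a node inside the cluster, since a name may resolve differently from outside EC2. Any peer reported as PUBLIC is worth a closer look.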

there is a patch in jira for the ec2/contrib scripts that addresses these issues.
https://issues.apache.org/jira/browse/HADOOP-2410

if you use those scripts, you will be able to see a ganglia display showing utilization on the machines. 8/7 map/reducers sounds like a lot.

Reduced - I dropped it to 3/2 for testing.

I am using these scripts now, and am still seeing very poor performance on EC2 compared to my development environment. ;(

I'll be capturing some more extensive stats over the weekend, and see if I can glean anything useful...

------------------------------------------------------------------------
| nate carlson | [EMAIL PROTECTED] | http://www.natecarlson.com |
| depriving some poor village of its idiot since 1981 |
------------------------------------------------------------------------

Chris K Wensel
[EMAIL PROTECTED]
http://chris.wensel.net/
http://www.cascading.org/



