What does ganglia show for load and network?
You should also be able to see GC stats (count and time), which might
help as well.
fyi,
running
> hadoop-ec2 proxy <cluster-name>
will both set up a SOCKS tunnel and list the available URLs you can cut and
paste into your browser. One of the URLs is for the Ganglia interface.
On Apr 11, 2008, at 2:01 PM, Nate Carlson wrote:
On Wed, 9 Apr 2008, Chris K Wensel wrote:
make sure all nodes are running in the same 'availability zone',
http://developer.amazonwebservices.com/connect/entry.jspa?externalID=1347
check!
and that you are using the new xen kernels.
http://developer.amazonwebservices.com/connect/entry.jspa?externalID=1353&categoryID=101
http://developer.amazonwebservices.com/connect/entry.jspa?externalID=1354&categoryID=101
check!
also, make sure each node is addressing its peers via the ec2
private addresses, not the public ones.
check!
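
A rough sketch of that last check, in Python: given a list of peer IP
addresses, flag any that fall outside the private ranges. The peer list
below is made up for illustration, and how you collect it (slaves file,
job configuration, resolved hostnames) is an assumption left out of the
sketch.

```python
import ipaddress


def public_peers(peers):
    """Return the peer addresses that are NOT in a private range.

    Each entry must be a literal IP address; hostnames would need to be
    resolved first (ip_address raises ValueError on anything else).
    """
    return [p for p in peers if not ipaddress.ip_address(p).is_private]


# Hypothetical peer list, for illustration only.
peers = ["10.251.27.4", "10.252.110.9", "67.202.5.18"]

# Anything printed here is a peer being addressed by a public address.
print(public_peers(peers))
```

An empty list means every peer in the list is being addressed privately.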
there is a patch in JIRA for the ec2/contrib scripts that addresses
these issues.
https://issues.apache.org/jira/browse/HADOOP-2410
if you use those scripts, you will be able to see a ganglia display
showing utilization on the machines. 8/7 map/reducers sounds like
a lot.
Reduced - I dropped it to 3/2 for testing.
I am using these scripts now, and am still seeing very poor
performance on EC2 compared to my development environment. ;(
I'll be capturing some more extensive stats over the weekend, and
see if I can glean anything useful...
------------------------------------------------------------------------
| nate carlson | [EMAIL PROTECTED] | http://www.natecarlson.com |
| depriving some poor village of its idiot since 1981 |
------------------------------------------------------------------------
Chris K Wensel
[EMAIL PROTECTED]
http://chris.wensel.net/
http://www.cascading.org/