Hi James

Were you able to start all the nodes in the same 'availability zone'? Are you using the new AMI kernels?

If you are using the contrib/ec2 scripts, you might upgrade (just the scripts) to
http://svn.apache.org/viewvc/hadoop/core/branches/branch-0.17/src/contrib/ec2/

These support the new kernels and availability zones. My transient errors went away when upgrading.

The functional changes are documented here:
http://wiki.apache.org/hadoop/AmazonEC2

FYI, you will need to build your own images (via the create-image command) with whatever version of Hadoop you are comfortable with. This will also get you a Ganglia install...

ckw

On May 7, 2008, at 1:29 PM, James Moore wrote:

What is this bit of the log trying to tell me, and what sorts of
things should I be looking at to make sure it doesn't happen?

I don't think the network has any basic configuration issues - I can
telnet from the machine creating this log to the destination - telnet
10.252.222.239 50010 works fine when I ssh in to the box with this
error.
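
The telnet test above can be scripted so it is easy to repeat against every datanode. What follows is a minimal Python sketch, assuming Python 3 is available on the node; `can_connect` is a hypothetical helper name, not part of Hadoop. It only shows that the port accepts a TCP connection. The "Read timed out" in the log below is a read timeout, which suggests the failure happens after the connection is already open, so a passing check does not rule that error out.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection performs the TCP handshake and nothing more;
        # no data is read from or written to the datanode.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and connect timeouts.
        return False
```

For example, `can_connect("10.252.222.239", 50010)` repeats the telnet check for the address in the log.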

2008-05-07 13:20:31,194 INFO org.apache.hadoop.dfs.DFSClient:
Exception in createBlockOutputStream java.net.SocketTimeoutException:
Read timed out
2008-05-07 13:20:31,194 INFO org.apache.hadoop.dfs.DFSClient:
Abandoning block blk_-5476242061384228962
2008-05-07 13:20:31,196 INFO org.apache.hadoop.dfs.DFSClient: Waiting
to find target node: 10.252.222.239:50010

I'm seeing a fair number of these.  My reduces finally complete, but
there are usually a couple at the end that take longer than I think
they should, and they frequently have these sorts of errors.
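
To see whether these errors cluster on a few datanodes, the log can be tallied per target node. This is a rough Python sketch that assumes the message wording shown in the excerpt above; `summarize_dfsclient_errors` is a hypothetical helper, and the patterns may need adjusting for other Hadoop versions.

```python
import re
from collections import Counter

# Patterns follow the wording of the log excerpt; \s+ tolerates wrapped lines.
TIMEOUT_RE = re.compile(r"Exception\s+in\s+createBlockOutputStream")
TARGET_RE = re.compile(r"Waiting\s+to\s+find\s+target\s+node:\s+(\S+)")


def summarize_dfsclient_errors(log_text):
    """Return (count of createBlockOutputStream exceptions, Counter of target nodes)."""
    timeouts = len(TIMEOUT_RE.findall(log_text))
    targets = Counter(TARGET_RE.findall(log_text))
    return timeouts, targets
```

Passing the contents of a task log to `summarize_dfsclient_errors` returns the number of exceptions and a per-address count; one address that dominates the count would point at a single slow or unreachable node.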

I'm running 20 machines on ec2 right now, with hadoop version 0.16.4.
--
James Moore | [EMAIL PROTECTED]
blog.restphone.com

Chris K Wensel
[EMAIL PROTECTED]
http://chris.wensel.net/
http://www.cascading.org/



