Hi James
Were you able to start all the nodes in the same 'availability zone'?
Are you using the new AMI kernels?
If you are using the contrib/ec2 scripts, you might upgrade (just the
scripts) to the branch-0.17 versions:
http://svn.apache.org/viewvc/hadoop/core/branches/branch-0.17/src/contrib/ec2/
These support the new kernels and availability zones. My transient
errors went away after upgrading.
The functional changes are documented here:
http://wiki.apache.org/hadoop/AmazonEC2
FYI, you will need to build your own images (via the create-image
command) with whatever version of Hadoop you are comfortable with.
This will also get you a Ganglia install...
ckw
On May 7, 2008, at 1:29 PM, James Moore wrote:
What is this bit of the log trying to tell me, and what sorts of
things should I be looking at to make sure it doesn't happen?
I don't think the network has any basic configuration issues - I can
telnet from the machine creating this log to the destination - telnet
10.252.222.239 50010 works fine when I ssh in to the box with this
error.
2008-05-07 13:20:31,194 INFO org.apache.hadoop.dfs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
2008-05-07 13:20:31,194 INFO org.apache.hadoop.dfs.DFSClient: Abandoning block blk_-5476242061384228962
2008-05-07 13:20:31,196 INFO org.apache.hadoop.dfs.DFSClient: Waiting to find target node: 10.252.222.239:50010
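The telnet check described above could be scripted so it runs against every node rather than one at a time. This is only a rough sketch: the function names and the 5-second timeout are made up for illustration, and port 50010 is taken from the log line. A successful connect only shows that the port accepts TCP connections; it would not by itself rule out a read timing out later on an open connection.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def unreachable(hosts, port=50010, timeout=5.0):
    """Return the hosts (hypothetical list supplied by the caller) that refused or timed out."""
    return [h for h in hosts if not can_connect(h, port, timeout)]
```

For example, unreachable(["10.252.222.239"]) would return an empty list if that node accepts connections on 50010 from the machine running the check.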
I'm seeing a fair number of these. My reduces finally complete, but
there are usually a couple at the end that take longer than I think
they should, and they frequently have these sorts of errors.
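To see whether these errors cluster on a few nodes, one option is to tally the "Waiting to find target node" lines per address. The sketch below assumes the message format shown in the log excerpt above; the function name and the regular expression are illustrative, not part of Hadoop.

```python
import re
from collections import Counter

# Matches the DFSClient message from the excerpt and captures "ip:port".
# \s+ between words tolerates lines that were wrapped by a mail client.
TARGET_RE = re.compile(
    r"Waiting\s+to\s+find\s+target\s+node:\s+(\d{1,3}(?:\.\d{1,3}){3}:\d+)"
)


def count_target_nodes(log_text):
    """Count how often each datanode address appears in 'target node' messages."""
    return Counter(TARGET_RE.findall(log_text))
```

Calling count_target_nodes(open(path).read()).most_common() on a task log (path being whatever log file is under inspection) would list the addresses that show up most often.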
I'm running 20 machines on ec2 right now, with hadoop version 0.16.4.
--
James Moore | [EMAIL PROTECTED]
blog.restphone.com
Chris K Wensel
[EMAIL PROTECTED]
http://chris.wensel.net/
http://www.cascading.org/