Re: No route to host prevents from storing files to HDFS

2009-04-23 Thread jason hadoop
I believe the datanode is the same physical machine as the namenode, if I
understand this problem correctly.
(Which really puts paid to our suggestions about traceroute and firewalls.)

I have one question: is the IP address consistent? I think in one of the
thread mails it was stated that the IP address sometimes changes.
That may be because the DNS lookup to the primary server timed out and the
secondary returned a different address, or some other floating DNS oddity,
and that could be part of the problem.
We had problems with transient DNS failures at one point on one of our
larger clusters, and just hardcoded the IP addresses after that.
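
One way to do the hardcoding is to pin every node's name in /etc/hosts so
lookups never touch DNS - a minimal sketch, with made-up hostnames:

% cat >> /etc/hosts <<'EOF'
192.168.253.20   hdfs-master    # NameNode + DataNode
192.168.253.32   hdfs-client    # client workstation
EOF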

On Wed, Apr 22, 2009 at 8:03 PM, Raghu Angadi rang...@yahoo-inc.com wrote:

 Stas Oskin wrote:


  Tried in step 3 to telnet both the 50010 and the 8010 ports of the
 problematic datanode - both worked.


 Shouldn't you be testing connecting _from_ the datanode? The error you
 posted is while this DN is trying to connect to another DN.

 Raghu.


  I agree there is indeed an interesting problem :). Question is how it can
 be solved.

 Thanks.





-- 
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422


Re: No route to host prevents from storing files to HDFS

2009-04-23 Thread Stas Oskin
Hi.

Shouldn't you be testing connecting _from_ the datanode? The error you
 posted is while this DN is trying to connect to another DN.



You might be onto something here indeed:

1) Telnet to 192.168.253.20 8020 / 192.168.253.20 50010 works
2) Telnet to localhost 8020 / localhost 50010 doesn't work
3) Telnet to 127.0.0.1 8020 / 127.0.0.1 50010 doesn't work

In the last 2 cases I get:
Trying 127.0.0.1...
telnet: connect to address 127.0.0.1: Connection refused
telnet: Unable to connect to remote host: Connection refused

Could it be related?
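
A quick way to see which local addresses the daemons are actually bound to
(a sketch, assuming a Linux box with net-tools installed):

% netstat -tlnp | grep -E ':(8020|50010)'

If the listening sockets show 192.168.253.20 rather than 0.0.0.0, the
loopback refusals above would be expected - a daemon bound only to the
address named in fs.default.name will refuse connections to 127.0.0.1.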

Regards.


Re: No route to host prevents from storing files to HDFS

2009-04-23 Thread Stas Oskin
Hi.

I have one question: is the IP address consistent? I think in one of the
 thread mails it was stated that the IP address sometimes changes.


Same static IPs for all servers.

By the way, I have fs.default.name defined as an IP address - could it be
somehow related?

I read that there were some issues with this, but it ran fine for me - that
is, until the power crash.

Regards.


Re: No route to host prevents from storing files to HDFS

2009-04-23 Thread Steve Loughran

Stas Oskin wrote:

Hi.

2009/4/23 Matt Massie m...@cloudera.com


Just for clarity: are you using any type of virtualization (e.g. vmware,
xen) or just running the DataNode java process on the same machine?

What is fs.default.name set to in your hadoop-site.xml?




 This machine has OpenVZ installed indeed, but all the applications run
within the host node, meaning all Java processes are running within the same
machine.


Maybe, but there will still be at least one virtual network adapter on 
the host. Try turning them off.




The fs.default.name is:
hdfs://192.168.253.20:8020


what happens if you switch to hostnames over IP addresses?


Re: No route to host prevents from storing files to HDFS

2009-04-23 Thread Stas Oskin
Hi.

Maybe, but there will still be at least one virtual network adapter on the
 host. Try turning them off.


Nope, still throws No route to host exceptions.

I have another IP address defined on this machine - 192.168.253.21, for the
same network adapter.

Any idea if it has any impact?




 The fs.default.name is:
 hdfs://192.168.253.20:8020


 what happens if you switch to hostnames over IP addresses?


Actually, I never tried this, but the point is that HDFS worked just fine
with this before.

Regards.


Re: No route to host prevents from storing files to HDFS

2009-04-23 Thread Stas Oskin
Hi.

2009/4/23 Matt Massie m...@cloudera.com

 Just for clarity: are you using any type of virtualization (e.g. vmware,
 xen) or just running the DataNode java process on the same machine?

 What is fs.default.name set to in your hadoop-site.xml?



 This machine has OpenVZ installed indeed, but all the applications run
within the host node, meaning all Java processes are running within the same
machine.

The fs.default.name is:
hdfs://192.168.253.20:8020
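
For completeness, the value can be checked straight from the config (a
sketch; assumes the stock conf/ layout and the 0.18-era property name):

% grep -A1 'fs.default.name' conf/hadoop-site.xml
  <name>fs.default.name</name>
  <value>hdfs://192.168.253.20:8020</value>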

Thanks.


Re: No route to host prevents from storing files to HDFS

2009-04-23 Thread jason hadoop
Can you give us your network topology?
I see at least 3 IP addresses:
192.168.253.20, 192.168.253.32 and 192.168.253.21

In particular, please send: the fs.default.name which you have provided, the
hadoop-site.xml for each machine, the slaves file (with IP address mappings
if needed), the output of netstat -a -n -t -p | grep java (hopefully you run
Linux), and the output of jps for each machine.

That should let us see what servers are binding to what ports on what
machines, and what your cluster thinks should be happening.

Also iptables -L for each machine as an afterthought - just for paranoia's
sake.
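
Run on each machine, something like this would gather all of it (a sketch;
the conf/ paths assume a stock 0.18 layout):

% cat conf/hadoop-site.xml conf/slaves
% netstat -a -n -t -p | grep java
% jps
% /sbin/iptables -L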

On Thu, Apr 23, 2009 at 2:45 AM, Stas Oskin stas.os...@gmail.com wrote:

 Hi.

 Maybe, but there will still be at least one virtual network adapter on the
  host. Try turning them off.


 Nope, still throws No route to host exceptions.

 I have another IP address defined on this machine - 192.168.253.21, for the
 same network adapter.

 Any idea if it has any impact?


 
 
  The fs.default.name is:
  hdfs://192.168.253.20:8020
 
 
  what happens if you switch to hostnames over IP addresses?


 Actually, I never tried this, but the point is that HDFS worked just fine
 with this before.

 Regards.




-- 
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422


Re: No route to host prevents from storing files to HDFS

2009-04-23 Thread Stas Oskin
Just to clarify one point - the iptables were running on the 2nd DataNode,
which I didn't check (as I was sure the problem was on the NameNode/DataNode
machine), and on the NameNode/DataNode itself. But I can't understand what
launched them, or when, as I checked multiple times and nothing was running
before. Moreover, they were disabled on start-up, so they shouldn't have
come up in the first place.

Regards.

2009/4/23 Stas Oskin stas.os...@gmail.com

 Hi.


 Also iptables -L for each machine as an afterthought - just for paranoia's
 sake


 Well, I started preparing all the information you requested, but when I got
 to this stage I found out there were INDEED iptables running on 2 servers
 out of 3.

 The strangest thing is that I don't recall enabling them at all. Perhaps
 some 3rd party software has enabled them?

 In any case, all seems to be working now.

 Thanks to everybody who helped - I will be sure to check iptables on all
 the cluster machines from now on :).

 Regards.



Re: No route to host prevents from storing files to HDFS

2009-04-22 Thread Stas Oskin
Hi.


2009/4/22 jason hadoop jason.had...@gmail.com

 Most likely that machine is affected by some firewall somewhere that
 prevents traffic on port 50075. The no route to host is a strong indicator,
 particularly if the Datanode registered with the namenode.



Yes, this was my first thought as well. But there is no firewall, and the
port can be connected via netcat from any other machine.

Any other idea?

Thanks.


Re: No route to host prevents from storing files to HDFS

2009-04-22 Thread jason hadoop
The no route to host message means one of two things: either there is no
actual route (which would usually have generated a different error), or some
firewall is sending back a no route message.

I have seen the no route to host problem several times, and it is usually
because there is a firewall in place that no one is expecting to be there.

In the following, IP and PORT are the IP address and port from the failure
message in your log file. The server machine is the machine that has IP as
an address, and the remote machine is the machine that the connection is
failing on.

The way to diagnose this explicitly is:
1) on the server machine that should be accepting connections on the port,
run telnet localhost PORT and telnet IP PORT. You should get a connection;
if not, then the server is not binding the port.
2) on the remote machine, verify that you can communicate with the server
machine via normal tools such as ssh, ping, or traceroute, using the IP
address from the error message in your log file.
3) on the remote machine, run telnet IP PORT. If (1) and (2) succeeded and
(3) does not, then there is something blocking packets for the port range in
question. If (3) does succeed, then there is probably some interesting
problem.
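
The same sequence as shell commands (a sketch; substitute the real IP and
PORT from the failure message):

# on the server machine
% telnet localhost PORT
% telnet IP PORT
# on the remote machine
% ping IP
% traceroute IP
% telnet IP PORT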



On Wed, Apr 22, 2009 at 7:31 AM, Stas Oskin stas.os...@gmail.com wrote:

 Hi.

 No route to host generally means machines have routing problems. Machine A
  doesn't know how to route packets to Machine B. Reboot everything, router
  first, see if it goes away. Otherwise, now is the time to learn to debug
  routing problems. traceroute is the best starting place


 I used traceroute to check whether the problematic node is accessible by
 other machines. It just works - all except HDFS, that is.

 Any way to check what causes this exception?

 Regards.




-- 
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422


Re: No route to host prevents from storing files to HDFS

2009-04-22 Thread Raghu Angadi


There is some mismatch here... What is the expected IP address of this
machine (or does it have multiple interfaces, and are they properly routed)?
Looking at the Receiving Block message, the DN thinks its address is
192.168.253.20, but the NN thinks it is .253.32 (and the client is able to
connect using .253.32).


If you want to find the destination IP that this DN is unable to connect
to, you can check the client's log for this block number.
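
Concretely, something along these lines (a sketch - the block id and log
path are placeholders to fill in from your own output):

% grep 'blk_NNNNNNNNNN' /path/to/client.log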


Stas Oskin wrote:

Hi.


2009/4/22 jason hadoop jason.had...@gmail.com


Most likely that machine is affected by some firewall somewhere that
prevents traffic on port 50075. The no route to host is a strong indicator,
particularly if the Datanode registered with the namenode.




Yes, this was my first thought as well. But there is no firewall, and the
port can be connected via netcat from any other machine.

Any other idea?

Thanks.





Re: No route to host prevents from storing files to HDFS

2009-04-22 Thread Stas Oskin
Hi.

There is some mismatch here... What is the expected IP address of this
 machine (or does it have multiple interfaces, and are they properly routed)?
 Looking at the Receiving Block message, the DN thinks its address is
 192.168.253.20, but the NN thinks it is .253.32 (and the client is able to
 connect using .253.32).

 If you want to find the destination ip that this DN is unable to connect
 to, you can check client's log for this block number.



Hmm, .253.32 is the client workstation (it has only our test application
with core-hadoop.jar + configs).

The expected address of the DataNode should be 192.168.253.20.

According to what I've seen, the problem is in the DataNode itself - it just
throws the DatanodeRegistration error every so often:


2009-04-23 00:05:05,961 INFO org.apache.hadoop.dfs.DataNode: Receiving block blk_7209884038924026671_8033 src: /192.168.253.32:42932 dest: /192.168.253.32:50010
2009-04-23 00:05:05,962 INFO org.apache.hadoop.dfs.DataNode: writeBlock blk_7209884038924026671_8033 received exception java.net.NoRouteToHostException: No route to host
2009-04-23 00:05:05,962 ERROR org.apache.hadoop.dfs.DataNode: DatanodeRegistration(192.168.253.20:50010, storageID=DS-1790181121-127.0.0.1-50010-1239123237447, infoPort=50075, ipcPort=50020):DataXceiver: java.net.NoRouteToHostException: No route to host
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
        at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:402)
        at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1255)
        at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:1092)
        at java.lang.Thread.run(Thread.java:619)

Regards.


Re: No route to host prevents from storing files to HDFS

2009-04-22 Thread Stas Oskin
Hi.

The way to diagnose this explicitly is:
 1) on the server machine that should be accepting connections on the port,
 run telnet localhost PORT and telnet IP PORT. You should get a connection;
 if not, then the server is not binding the port.
 2) on the remote machine, verify that you can communicate with the server
 machine via normal tools such as ssh, ping, or traceroute, using the IP
 address from the error message in your log file.
 3) on the remote machine, run telnet IP PORT. If (1) and (2) succeeded and
 (3) does not, then there is something blocking packets for the port range
 in question. If (3) does succeed, then there is probably some interesting
 problem.


 Tried in step 3 to telnet both the 50010 and the 8010 ports of the
problematic datanode - both worked.

I agree there is indeed an interesting problem :). Question is how it can be
solved.

Thanks.


Re: No route to host prevents from storing files to HDFS

2009-04-22 Thread Matt Massie
Stas-

Is it possible to paste the output from the following command on both your
DataNode and NameNode?

% route -v -n

-Matt


On Wed, Apr 22, 2009 at 4:36 PM, Stas Oskin stas.os...@gmail.com wrote:

 Hi.

 The way to diagnose this explicitly is:
  1) on the server machine that should be accepting connections on the port,
  run telnet localhost PORT and telnet IP PORT. You should get a connection;
  if not, then the server is not binding the port.
  2) on the remote machine, verify that you can communicate with the server
  machine via normal tools such as ssh, ping, or traceroute, using the IP
  address from the error message in your log file.
  3) on the remote machine, run telnet IP PORT. If (1) and (2) succeeded and
  (3) does not, then there is something blocking packets for the port range
  in question. If (3) does succeed, then there is probably some interesting
  problem.
 

  Tried in step 3 to telnet both the 50010 and the 8010 ports of the
 problematic datanode - both worked.

 I agree there is indeed an interesting problem :). Question is how it can
 be solved.

 Thanks.



Re: No route to host prevents from storing files to HDFS

2009-04-22 Thread Stas Oskin
Hi.

Is it possible to paste the output from the following command on both your
 DataNode and NameNode?

 % route -v -n


Sure, here it is:

Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.253.0   0.0.0.0         255.255.255.0   U     0      0        0 eth0
169.254.0.0     0.0.0.0         255.255.0.0     U     0      0        0 eth0
0.0.0.0         192.168.253.1   0.0.0.0         UG    0      0        0 eth0


As you might recall, the problematic data node runs on the same server as
the NameNode.

Regards.


Re: No route to host prevents from storing files to HDFS

2009-04-22 Thread Matt Massie
Just for clarity: are you using any type of virtualization (e.g. vmware,
xen) or just running the DataNode java process on the same machine?

What is fs.default.name set to in your hadoop-site.xml?

-Matt


On Wed, Apr 22, 2009 at 5:22 PM, Stas Oskin stas.os...@gmail.com wrote:

 Hi.

 Is it possible to paste the output from the following command on both your
  DataNode and NameNode?
 
  % route -v -n
 

 Sure, here it is:

 Kernel IP routing table
 Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
 192.168.253.0   0.0.0.0         255.255.255.0   U     0      0        0 eth0
 169.254.0.0     0.0.0.0         255.255.0.0     U     0      0        0 eth0
 0.0.0.0         192.168.253.1   0.0.0.0         UG    0      0        0 eth0


 As you might recall, the problematic data node runs on the same server as
 the NameNode.

 Regards.



Re: No route to host prevents from storing files to HDFS

2009-04-22 Thread jason hadoop
I wonder if this is an obscure case of running out of file descriptors. I
would expect a different message out of the JVM, though.
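
Easy to rule out, at least - a sketch, assuming Linux with /proc mounted:

% jps                         # note the DataNode pid
% ulimit -n                   # descriptor limit in this shell
% ls /proc/PID/fd | wc -l     # descriptors the DataNode currently holds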

On Wed, Apr 22, 2009 at 5:34 PM, Matt Massie m...@cloudera.com wrote:

 Just for clarity: are you using any type of virtualization (e.g. vmware,
 xen) or just running the DataNode java process on the same machine?

 What is fs.default.name set to in your hadoop-site.xml?

 -Matt


 On Wed, Apr 22, 2009 at 5:22 PM, Stas Oskin stas.os...@gmail.com wrote:

  Hi.
 
  Is it possible to paste the output from the following command on both
 your
   DataNode and NameNode?
  
   % route -v -n
  
 
  Sure, here it is:
 
  Kernel IP routing table
  Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
  192.168.253.0   0.0.0.0         255.255.255.0   U     0      0        0 eth0
  169.254.0.0     0.0.0.0         255.255.0.0     U     0      0        0 eth0
  0.0.0.0         192.168.253.1   0.0.0.0         UG    0      0        0 eth0
 
 
  As you might recall, the problematic data node runs on the same server as
  the NameNode.
 
  Regards.
 




-- 
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422


Re: No route to host prevents from storing files to HDFS

2009-04-22 Thread Raghu Angadi

Stas Oskin wrote:


 Tried in step 3 to telnet both the 50010 and the 8010 ports of the
problematic datanode - both worked.


Shouldn't you be testing connecting _from_ the datanode? The error you
posted is while this DN is trying to connect to another DN.


Raghu.


I agree there is indeed an interesting problem :). Question is how it can be
solved.

Thanks.





Re: No route to host prevents from storing files to HDFS

2009-04-21 Thread Stas Oskin
Hi again.

Other tools, like the balancer, or web browsing from the namenode, don't
work either.
This is because other nodes complain about not reaching the offending node
as well.

I even tried netcat'ing the IP/port from another node - and it successfully
connected.
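
For reference, the netcat check was along these lines (a sketch; -z is the
connect-and-close probe on netcat builds that support it):

% nc -v -z 192.168.253.20 50010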

Any advice on this No route to host error?

2009/4/21 Stas Oskin stas.os...@gmail.com

 Hi.

 I have quite a strange issue, where one of my datanodes rejects any blocks
 with error messages.

 I looked in the datanode logs, and found the following error:

 2009-04-21 16:59:19,092 ERROR org.apache.hadoop.dfs.DataNode: DatanodeRegistration(192.168.253.20:50010, storageID=DS-1790181121-127.0.0.1-50010-1239123237447, infoPort=50075, ipcPort=50020):DataXceiver: java.net.NoRouteToHostException: No route to host
         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
         at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
         at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:402)
         at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1255)
         at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:1092)
         at java.lang.Thread.run(Thread.java:619)

 2009-04-21 16:59:31,047 INFO org.apache.hadoop.dfs.DataNode: Receiving block blk_1882546734729403703_7805 src: /192.168.253.32:54917 dest: /192.168.253.32:50010
 2009-04-21 16:59:31,048 INFO org.apache.hadoop.dfs.DataNode: writeBlock blk_1882546734729403703_7805 received exception java.net.NoRouteToHostException: No route to host


 Several facts:

 1) I use the stable 0.18.3
 2) It worked correctly before, until an overall power crash brought down
 all the machines.
 3) This datanode is located on the same machine as the NameNode and the
 SecondaryNameNode.
 4) I can ping the machine from itself - no error messages.

 Any idea what should be done to resolve it?

 Thanks in advance.



Re: No route to host prevents from storing files to HDFS

2009-04-21 Thread Philip Zeyliger
Very naively looking at the code, the exception you see is happening in the
write path, on the way to sending a copy of your data to a second data
node.  One data node is pipelining the data to another, and that connection
is failing.  The fact that DatanodeRegistration is mentioned in the
exception is a red herring: that's merely the text that the datanode prints
for every exception that's thrown during a server response.  It's
frustrating that the exception message doesn't actually mention what host
it's trying to connect to.

Some quick avenues for debugging:

It sounds like you've identified a specific data node that isn't behaving.
Is the exception that you've pasted in coming from that DataNode or from
another?

Can you tell if the DataNode is listening on the right ports?  You might try
sudo netstat -pl | grep java and check to see that the DataNode is
listening on 50010 (I believe that's the default).

You might also try strace on the process that's showing the no route to
host error, to catch the failing system call.
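
A sketch of the strace approach (EHOSTUNREACH is the errno behind Java's
NoRouteToHostException; PID is the DataNode's process id):

% strace -f -e trace=network -p PID 2>&1 | grep -i hostunreach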

You could, of course, instrument the code to do a try/catch around the
relevant block in DataNode.java, to find out what host/port the connection
is failing on.