Re: No route to host prevents from storing files to HDFS
I believe the datanode is the same physical machine as the namenode, if I understand this problem correctly - which rather puts paid to our suggestions about traceroute and firewalls. I have one question: is the IP address consistent? I think in one of the thread mails it was stated that the IP address sometimes changes. That may be because the DNS lookup to the primary server timed out and the secondary returned a different address, or some other floating DNS oddity, and that could be part of the problem. We had problems with transient DNS failures at one point on one of our larger clusters, and just hardcoded the IP addresses after that.

On Wed, Apr 22, 2009 at 8:03 PM, Raghu Angadi rang...@yahoo-inc.com wrote: Stas Oskin wrote: Tried in step 3 to telnet both the 50010 and the 8010 ports of the problematic datanode - both worked. Shouldn't you be testing connecting _from_ the datanode? The error you posted is while this DN is trying to connect to another DN. Raghu. I agree there is indeed an interesting problem :). Question is how it can be solved. Thanks.

-- Alpha Chapters of my book on Hadoop are available http://www.apress.com/book/view/9781430219422
Re: No route to host prevents from storing files to HDFS
Hi. Shouldn't you be testing connecting _from_ the datanode? The error you posted is while this DN is trying to connect to another DN. You might be onto something here indeed:
1) Telnet to 192.168.253.20 8020 / 192.168.253.20 50010 works
2) Telnet to localhost 8020 / localhost 50010 doesn't work
3) Telnet to 127.0.0.1 8020 / 127.0.0.1 50010 doesn't work
In the last two cases I get:
Trying 127.0.0.1...
telnet: connect to address 127.0.0.1: Connection refused
telnet: Unable to connect to remote host: Connection refused
Could it be related? Regards.
Re: No route to host prevents from storing files to HDFS
Hi. I have one question, is the ip address consistent, I think in one of the thread mails, it was stated that the ip address sometimes changes. Same static IPs for all servers. By the way, I have fs.default.name defined as an IP address - could it be somehow related? I read that there were some issues with this, but it ran fine for me - that is, until the power crash. Regards.
Re: No route to host prevents from storing files to HDFS
Stas Oskin wrote: Hi. 2009/4/23 Matt Massie m...@cloudera.com Just for clarity: are you using any type of virtualization (e.g. vmware, xen) or just running the DataNode java process on the same machine? What is fs.default.name set to in your hadoop-site.xml? This machine has OpenVZ installed indeed, but all the applications run within the host node, meaning all Java processes are running within the same machine. Maybe, but there will still be at least one virtual network adapter on the host. Try turning them off. The fs.default.name is: hdfs://192.168.253.20:8020 What happens if you switch to hostnames over IP addresses?
Re: No route to host prevents from storing files to HDFS
Hi. Maybe, but there will still be at least one virtual network adapter on the host. Try turning them off. Nope, still throws No route to host exceptions. I have another IP address defined on this machine - 192.168.253.21, for the same network adapter. Any idea if it has impact? The fs.default.name is: hdfs://192.168.253.20:8020 What happens if you switch to hostnames over IP addresses? Actually, I never tried this, but the point is that the HDFS worked just fine with this before. Regards.
Re: No route to host prevents from storing files to HDFS
Hi. 2009/4/23 Matt Massie m...@cloudera.com Just for clarity: are you using any type of virtualization (e.g. vmware, xen) or just running the DataNode java process on the same machine? What is fs.default.name set to in your hadoop-site.xml? This machine has OpenVZ installed indeed, but all the applications run within the host node, meaning all Java processes are running within the same machine. The fs.default.name is: hdfs://192.168.253.20:8020 Thanks.
Re: No route to host prevents from storing files to HDFS
Can you give us your network topology? I see at least 3 IP addresses: 192.168.253.20, 192.168.253.32 and 192.168.253.21. In particular: the fs.default.name which you have provided, the hadoop-site.xml for each machine, the slaves file (with IP address mappings if needed), a netstat -a -n -t -p | grep java (hopefully you run Linux), and the output of jps for each machine. That should let us see what servers are binding to what ports on what machines, and what your cluster thinks should be happening. Also, iptables -L for each machine as an afterthought - just for paranoia's sake.

On Thu, Apr 23, 2009 at 2:45 AM, Stas Oskin stas.os...@gmail.com wrote: Hi. Maybe, but there will still be at least one virtual network adapter on the host. Try turning them off. Nope, still throws No route to host exceptions. I have another IP address defined on this machine - 192.168.253.21, for the same network adapter. Any idea if it has impact? The fs.default.name is: hdfs://192.168.253.20:8020 What happens if you switch to hostnames over IP addresses? Actually, I never tried this, but the point is that the HDFS worked just fine with this before. Regards.
Re: No route to host prevents from storing files to HDFS
Just to clarify one point - the iptables were running on the 2nd DataNode, which I didn't check (as I was sure the problem was in the NameNode/DataNode), and on the NameNode/DataNode itself. But I can't understand what launched them, or when, as I checked multiple times and nothing was running before. Moreover, they were disabled on start-up, so they shouldn't have come up in the first place. Regards.

2009/4/23 Stas Oskin stas.os...@gmail.com Hi. Also iptables -L for each machine as an afterthought - just for paranoia's sake Well, I started preparing all the information you requested, but when I got to this stage - I found out there were INDEED iptables running on 2 servers out of 3. The strangest thing is that I don't recall enabling them at all. Perhaps some 3rd party software has enabled them? In any case, all seems to be working now. Thanks to everybody that helped - I will be sure to check iptables on all the cluster machines from now on :). Regards.
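A quick way to spot this culprit in future is to scan each machine's ruleset for REJECT or DROP rules: on Red Hat style distributions the stock firewall ends in a REJECT rule with icmp-host-prohibited, which is exactly what Java surfaces as NoRouteToHostException. A minimal sketch, scanning an iptables-save style dump (the sample rules below are made up for illustration; on a real machine you would generate the dump with iptables-save as root):

```shell
#!/bin/sh
# Hypothetical sample dump for illustration; in real use run:
#   iptables-save > /tmp/iptables_dump.txt   (as root, on each node)
cat > /tmp/iptables_dump.txt <<'EOF'
-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
-A INPUT -p tcp --dport 22 -j ACCEPT
-A INPUT -j REJECT --reject-with icmp-host-prohibited
EOF

# Print any rule that rejects or drops traffic, with its line number.
# An icmp-host-prohibited REJECT here is the classic source of
# "No route to host" errors seen by HDFS clients and datanodes.
grep -nE -- '-j (REJECT|DROP)' /tmp/iptables_dump.txt
```

Running the same grep over a dump from every node in the slaves file would have surfaced the two hidden rulesets immediately.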
Re: No route to host prevents from storing files to HDFS
Hi. 2009/4/22 jason hadoop jason.had...@gmail.com Most likely that machine is affected by some firewall somewhere that prevents traffic on port 50075. The no route to host is a strong indicator, particularly if the DataNode registered with the namenode. Yes, this was my first thought as well. But there is no firewall, and the port can be connected to via netcat from any other machine. Any other idea? Thanks.
Re: No route to host prevents from storing files to HDFS
The no route to host message means one of two things: either there is no actual route (which would usually have generated a different error), or some firewall is sending back a no-route ICMP message. I have seen the no route to host problem several times, and it is usually because there is a firewall in place that no one is expecting to be there. In the following, IP and PORT are the IP address and port from the failure message in your log file; the server machine is the machine that has IP as an address, and the remote machine is the machine that the connection is failing on.

The way to diagnose this explicitly is:
1) On the server machine that should be accepting connections on the port, run telnet localhost PORT and telnet IP PORT. You should get a connection; if not, the server is not binding the port.
2) On the remote machine, verify that you can communicate with the server machine via normal tools such as ssh and/or ping and/or traceroute, using the IP address from the error message in your log file.
3) On the remote machine, run telnet IP PORT.
If (1) and (2) succeed and (3) does not, then there is something blocking packets for the port range in question. If (3) does succeed, then there is probably some interesting problem.

On Wed, Apr 22, 2009 at 7:31 AM, Stas Oskin stas.os...@gmail.com wrote: Hi. No route to host generally means machines have routing problems. Machine A doesn't know how to route packets to Machine B. Reboot everything, router first, see if it goes away. Otherwise, now is the time to learn to debug routing problems. traceroute is the best starting place I used traceroute to check whether the problematic node is accessible by other machines. It just works - all except HDFS, that is. Any way to check what causes this exception? Regards.
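The telnet steps above can also be sketched as a small connectivity probe. This is a hedged illustration only: 50010 is the default DataNode transfer port, and the 192.168.253.20 address is just the value from this thread, so substitute your own IP and port.

```python
import socket

def can_connect(host, port, timeout=3.0):
    """Attempt a TCP connect; return (ok, error_message)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, None
    except OSError as e:
        return False, str(e)

if __name__ == "__main__":
    # Step 1 is run on the server itself (localhost and its own IP);
    # step 3 runs the same probe from the remote machine.
    for host, port in [("localhost", 50010), ("192.168.253.20", 50010)]:
        ok, err = can_connect(host, port)
        print("%s:%d -> %s" % (host, port,
                               "open" if ok else "blocked (%s)" % err))
```

Running it on both ends and comparing the results pinpoints which leg of the path is failing, just as the manual telnet procedure does.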
Re: No route to host prevents from storing files to HDFS
There is some mismatch here.. what is the expected IP address of this machine (or does it have multiple interfaces that are properly routed)? Looking at the Receiving Block message, the DN thinks its address is 192.168.253.20, but the NN thinks it is .253.32 (and the client is able to connect using .253.32). If you want to find the destination IP that this DN is unable to connect to, you can check the client's log for this block number. Stas Oskin wrote: Hi. 2009/4/22 jason hadoop jason.had...@gmail.com Most likely that machine is affected by some firewall somewhere that prevents traffic on port 50075. The no route to host is a strong indicator, particularly if the DataNode registered with the namenode. Yes, this was my first thought as well. But there is no firewall, and the port can be connected to via netcat from any other machine. Any other idea? Thanks.
Re: No route to host prevents from storing files to HDFS
Hi. There is some mismatch here.. what is the expected ip address of this machine (or does it have multiple interfaces and properly routed)? Looking at the Receiving Block message DN thinks its address is 192.168.253.20 but NN thinks it is 253.32 (and client is able to connect using 253.32). If you want to find the destination ip that this DN is unable to connect to, you can check client's log for this block number. Hmm, .253.32 is the client workstation (it has only our test application with core-hadoop.jar + configs). The expected address of the DataNode should be 192.168.253.20. According to what I've seen, the problem is in the DataNode itself - it just throws the DatanodeRegistration error every so often:

2009-04-23 00:05:05,961 INFO org.apache.hadoop.dfs.DataNode: Receiving block blk_7209884038924026671_8033 src: /192.168.253.32:42932 dest: /192.168.253.32:50010
2009-04-23 00:05:05,962 INFO org.apache.hadoop.dfs.DataNode: writeBlock blk_7209884038924026671_8033 received exception java.net.NoRouteToHostException: No route to host
2009-04-23 00:05:05,962 ERROR org.apache.hadoop.dfs.DataNode: DatanodeRegistration(192.168.253.20:50010, storageID=DS-1790181121-127.0.0.1-50010-1239123237447, infoPort=50075, ipcPort=50020):DataXceiver: java.net.NoRouteToHostException: No route to host
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:402)
    at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1255)
    at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:1092)
    at java.lang.Thread.run(Thread.java:619)

Regards.
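One way to see which of a multi-homed host's addresses the kernel will actually pick toward a given peer (relevant to the DN/NN address mismatch discussed above) is the getsockname trick. A sketch, with the peer addresses taken from this thread purely as placeholders:

```python
import socket

def local_ip_for(peer_ip, peer_port=50010):
    """Ask the kernel's routing table which local address would be used
    to reach peer_ip. A UDP connect() sends no packets; it only performs
    the route lookup and binds the socket accordingly."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.connect((peer_ip, peer_port))
        return s.getsockname()[0]

if __name__ == "__main__":
    # Addresses from this thread, as placeholders; substitute your own.
    for peer in ("192.168.253.32", "127.0.0.1"):
        try:
            print(peer, "->", local_ip_for(peer))
        except OSError as e:
            print(peer, "-> no route:", e)
```

Running this on the DataNode host with the NameNode's and the client's addresses as peers would show which of 192.168.253.20 / .253.21 the node presents on each path, and whether the two secondary addresses are confusing the registration.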
Re: No route to host prevents from storing files to HDFS
Hi. The way to diagnose this explicitly is: 1) on the server machine that should be accepting connections on the port, telnet localhost PORT, and telnet IP PORT - you should get a connection; if not, then the server is not binding the port. 2) on the remote machine, verify that you can communicate with the server machine via normal tools such as ssh and/or ping and/or traceroute, using the IP address from the error message in your log file. 3) on the remote machine, run telnet IP PORT. If (1) and (2) succeeded and (3) does not, then there is something blocking packets for the port range in question. If (3) does succeed, then there is probably some interesting problem. Tried in step 3 to telnet both the 50010 and the 8010 ports of the problematic datanode - both worked. I agree there is indeed an interesting problem :). Question is how it can be solved. Thanks.
Re: No route to host prevents from storing files to HDFS
Stas- Is it possible to paste the output from the following command on both your DataNode and NameNode? % route -v -n -Matt On Wed, Apr 22, 2009 at 4:36 PM, Stas Oskin stas.os...@gmail.com wrote: Hi. The way to diagnose this explicitly is: 1) on the server machine that should be accepting connections on the port, telnet localhost PORT, and telnet IP PORT - you should get a connection; if not, then the server is not binding the port. 2) on the remote machine, verify that you can communicate with the server machine via normal tools such as ssh and/or ping and/or traceroute, using the IP address from the error message in your log file. 3) on the remote machine, run telnet IP PORT. If (1) and (2) succeeded and (3) does not, then there is something blocking packets for the port range in question. If (3) does succeed, then there is probably some interesting problem. Tried in step 3 to telnet both the 50010 and the 8010 ports of the problematic datanode - both worked. I agree there is indeed an interesting problem :). Question is how it can be solved. Thanks.
Re: No route to host prevents from storing files to HDFS
Hi. Is it possible to paste the output from the following command on both your DataNode and NameNode? % route -v -n Sure, here it is:

Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.253.0   0.0.0.0         255.255.255.0   U     0      0        0 eth0
169.254.0.0     0.0.0.0         255.255.0.0     U     0      0        0 eth0
0.0.0.0         192.168.253.1   0.0.0.0         UG    0      0        0 eth0

As you might recall, the problematic data node runs on the same server as the NameNode. Regards.
Re: No route to host prevents from storing files to HDFS
Just for clarity: are you using any type of virtualization (e.g. vmware, xen) or just running the DataNode java process on the same machine? What is fs.default.name set to in your hadoop-site.xml? -Matt

On Wed, Apr 22, 2009 at 5:22 PM, Stas Oskin stas.os...@gmail.com wrote: Hi. Is it possible to paste the output from the following command on both your DataNode and NameNode? % route -v -n Sure, here it is:

Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.253.0   0.0.0.0         255.255.255.0   U     0      0        0 eth0
169.254.0.0     0.0.0.0         255.255.0.0     U     0      0        0 eth0
0.0.0.0         192.168.253.1   0.0.0.0         UG    0      0        0 eth0

As you might recall, the problematic data node runs on the same server as the NameNode. Regards.
Re: No route to host prevents from storing files to HDFS
I wonder if this is an obscure case of running out of file descriptors. I would expect a different message from the JVM in that case, though.

On Wed, Apr 22, 2009 at 5:34 PM, Matt Massie m...@cloudera.com wrote: Just for clarity: are you using any type of virtualization (e.g. vmware, xen) or just running the DataNode java process on the same machine? What is fs.default.name set to in your hadoop-site.xml? -Matt On Wed, Apr 22, 2009 at 5:22 PM, Stas Oskin stas.os...@gmail.com wrote: Hi. Is it possible to paste the output from the following command on both your DataNode and NameNode? % route -v -n Sure, here it is:

Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.253.0   0.0.0.0         255.255.255.0   U     0      0        0 eth0
169.254.0.0     0.0.0.0         255.255.0.0     U     0      0        0 eth0
0.0.0.0         192.168.253.1   0.0.0.0         UG    0      0        0 eth0

As you might recall, the problematic data node runs on the same server as the NameNode. Regards.
Re: No route to host prevents from storing files to HDFS
Stas Oskin wrote: Tried in step 3 to telnet both the 50010 and the 8010 ports of the problematic datanode - both worked. Shouldn't you be testing connecting _from_ the datanode? The error you posted is while this DN is trying to connect to another DN. Raghu. I agree there is indeed an interesting problem :). Question is how it can be solved. Thanks.
Re: No route to host prevents from storing files to HDFS
Hi again. Other tools, like the balancer or web browsing from the namenode, don't work either, because other nodes complain about not reaching the offending node as well. I even tried netcat'ing the IP/port from another node - and it successfully connected. Any advice on this No route to host error?

2009/4/21 Stas Oskin stas.os...@gmail.com Hi. I have quite a strange issue, where one of the datanodes that I have rejects any blocks with error messages. I looked in the datanode logs, and found the following error:

2009-04-21 16:59:19,092 ERROR org.apache.hadoop.dfs.DataNode: DatanodeRegistration(192.168.253.20:50010, storageID=DS-1790181121-127.0.0.1-50010-1239123237447, infoPort=50075, ipcPort=50020):DataXceiver: java.net.NoRouteToHostException: No route to host
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:402)
    at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1255)
    at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:1092)
    at java.lang.Thread.run(Thread.java:619)
2009-04-21 16:59:31,047 INFO org.apache.hadoop.dfs.DataNode: Receiving block blk_1882546734729403703_7805 src: /192.168.253.32:54917 dest: /192.168.253.32:50010
2009-04-21 16:59:31,048 INFO org.apache.hadoop.dfs.DataNode: writeBlock blk_1882546734729403703_7805 received exception java.net.NoRouteToHostException: No route to host

Several facts:
1) I use the stable 0.18.3
2) It worked correctly before, until an overall power crash brought down all the machines.
3) This datanode is located on the same machine as the NameNode and the SecondaryNameNode.
4) I can ping the machine from itself - no error messages.

Any idea what should be done to resolve it? Thanks in advance.
Re: No route to host prevents from storing files to HDFS
Very naively looking at the code, the exception you see is happening in the write path, on the way to sending a copy of your data to a second data node. One data node is pipelining the data to another, and that connection is failing. The fact that DatanodeRegistration is mentioned in the exception is a red herring: that's merely the text the datanode prints for every exception thrown during a server response. It's frustrating that the exception message doesn't actually mention what host it's trying to connect to. Some quick avenues for debugging: It sounds like you've identified a specific data node that isn't behaving. Is the exception that you've pasted coming from that DataNode or from another? Can you tell if the DataNode is listening on the right ports? You might try sudo netstat -pl | grep java and check that the DataNode is listening on 50010 (I believe that's the default). You might also try strace on the process that's showing the no route to host error, to catch the failing system call. You could, of course, instrument the code to do a try/catch around the relevant block in DataNode.java, to find out what host/port the connection is failing on.
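As a supplement to the netstat/strace suggestions above, the errno of a plain connect attempt already distinguishes the failure modes: ECONNREFUSED means the host answered but nothing is listening, while EHOSTUNREACH is what Java reports as NoRouteToHostException and typically means a routing problem or a firewall REJECT with icmp-host-prohibited. A sketch (the loopback address and port below are placeholders; point it at the failing IP/PORT from your log):

```python
import errno
import socket

def probe(host, port, timeout=3.0):
    """Connect and report the failure mode symbolically."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except socket.timeout:
        return "timed out (packets silently dropped?)"
    except OSError as e:
        name = errno.errorcode.get(e.errno, str(e.errno))
        hints = {
            "ECONNREFUSED": "host reachable, nothing listening on the port",
            "EHOSTUNREACH": "no route to host (routing or firewall REJECT)",
            "ENETUNREACH": "no route to network",
        }
        return "%s: %s" % (name, hints.get(name, e.strerror))

if __name__ == "__main__":
    # Placeholder target; substitute the IP/PORT from the log message.
    print(probe("127.0.0.1", 50010))
```

An EHOSTUNREACH result here, on a path where ping and traceroute work, points strongly at an ICMP-reject firewall rule rather than a real routing gap.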