Hey Edson,

Unfortunately I'm not sure what's going on here - for whatever reason, the kernel isn't allowing Java NIO to use epoll, so the Hadoop IPC framework can't set up its connections. I don't think this is a Hadoop-specific bug.
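You can reproduce the failure outside Hadoop with a plain NIO selector check; a minimal sketch (class name is just illustrative):

```java
import java.io.IOException;
import java.nio.channels.Selector;

public class EpollCheck {
    public static void main(String[] args) {
        try {
            // On Linux, Selector.open() creates an epoll-based selector under
            // the hood; "Function not implemented" here would reproduce the
            // epollCreate failure from the DataNode stack trace.
            Selector sel = Selector.open();
            System.out.println("selector provider: " + sel.provider().getClass().getName());
            sel.close();
        } catch (IOException e) {
            System.out.println("Selector.open() failed: " + e.getMessage());
        }
    }
}
```

If this small program fails the same way, the problem is between the JVM and the kernel, not in Hadoop.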
Does this issue occur on all of the nodes?

-Todd

On Mon, Mar 29, 2010 at 2:26 PM, Edson Ramiro <erlfi...@gmail.com> wrote:
> I'm not involved with the Debian community :(
>
> ram...@h02:~/hadoop$ cat /proc/sys/fs/epoll/max_user_watches
> 3373957
>
> and the Java is not OpenJDK. The version is:
>
> ram...@lcpad:/usr/lib/jvm/java-6-sun$ java -version
> java version "1.6.0_17"
> Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
> Java HotSpot(TM) 64-Bit Server VM (build 14.3-b01, mixed mode)
>
> Edson Ramiro
>
> On 29 March 2010 17:14, Todd Lipcon <t...@cloudera.com> wrote:
> > Hi Edson,
> >
> > It looks like for some reason your kernel does not have epoll enabled. It's
> > very strange, since your kernel is very recent (in fact, bleeding edge!)
> >
> > Can you check the contents of /proc/sys/fs/epoll/max_user_watches?
> >
> > Are you involved with the Debian community? This sounds like a general Java
> > bug. Can you also please verify that you're using the Sun JVM and not
> > OpenJDK (the Debian folks like OpenJDK, but it has subtle issues with
> > Hadoop). You'll have to add a non-free repository and install sun-java6-jdk.
> >
> > -Todd
> >
> > On Mon, Mar 29, 2010 at 1:05 PM, Edson Ramiro <erlfi...@gmail.com> wrote:
> > > I'm using
> > >
> > > Linux h02 2.6.32.9 #2 SMP Sat Mar 6 19:09:13 BRT 2010 x86_64 GNU/Linux
> > >
> > > ram...@h02:~/hadoop$ cat /etc/debian_version
> > > squeeze/sid
> > >
> > > Thanks for the reply.
> > >
> > > Edson Ramiro
> > >
> > > On 29 March 2010 16:56, Todd Lipcon <t...@cloudera.com> wrote:
> > > > Hi Edson,
> > > >
> > > > What operating system are you on? What kernel version?
> > > >
> > > > Thanks
> > > > -Todd
> > > >
> > > > On Mon, Mar 29, 2010 at 12:01 PM, Edson Ramiro <erlfi...@gmail.com> wrote:
> > > > > Hi all,
> > > > >
> > > > > I'm trying to install Hadoop on a cluster, but I'm getting this error.
> > > > >
> > > > > I'm using java version "1.6.0_17" and hadoop-0.20.1+169.56.tar.gz
> > > > > from Cloudera.
> > > > >
> > > > > It's running on an NFS home directory shared between the nodes and
> > > > > the masters.
> > > > >
> > > > > The NameNode works well, but all the nodes fail when they try to
> > > > > connect to it.
> > > > >
> > > > > Any idea?
> > > > >
> > > > > Thanks in advance.
> > > > >
> > > > > ==> logs/hadoop-ramiro-datanode-a05.log <==
> > > > > 2010-03-29 15:56:00,168 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: lcpad/192.168.1.51:9000. Already tried 0 time(s).
> > > > > 2010-03-29 15:56:01,172 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: lcpad/192.168.1.51:9000. Already tried 1 time(s).
> > > > > 2010-03-29 15:56:02,176 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: lcpad/192.168.1.51:9000. Already tried 2 time(s).
> > > > > 2010-03-29 15:56:03,180 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: lcpad/192.168.1.51:9000. Already tried 3 time(s).
> > > > > 2010-03-29 15:56:04,184 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: lcpad/192.168.1.51:9000. Already tried 4 time(s).
> > > > > 2010-03-29 15:56:05,188 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: lcpad/192.168.1.51:9000. Already tried 5 time(s).
> > > > > 2010-03-29 15:56:06,192 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: lcpad/192.168.1.51:9000. Already tried 6 time(s).
> > > > > 2010-03-29 15:56:07,196 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: lcpad/192.168.1.51:9000. Already tried 7 time(s).
> > > > > 2010-03-29 15:56:08,200 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: lcpad/192.168.1.51:9000. Already tried 8 time(s).
> > > > > 2010-03-29 15:56:09,204 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: lcpad/192.168.1.51:9000. Already tried 9 time(s).
> > > > > 2010-03-29 15:56:09,204 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Call to lcpad/192.168.1.51:9000 failed on local exception: java.io.IOException: Function not implemented
> > > > >     at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
> > > > >     at org.apache.hadoop.ipc.Client.call(Client.java:743)
> > > > >     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
> > > > >     at $Proxy4.getProtocolVersion(Unknown Source)
> > > > >     at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
> > > > >     at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:346)
> > > > >     at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:383)
> > > > >     at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:314)
> > > > >     at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:291)
> > > > >     at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:278)
> > > > >     at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:225)
> > > > >     at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1309)
> > > > >     at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1264)
> > > > >     at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1272)
> > > > >     at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1394)
> > > > > Caused by: java.io.IOException: Function not implemented
> > > > >     at sun.nio.ch.EPollArrayWrapper.epollCreate(Native Method)
> > > > >     at sun.nio.ch.EPollArrayWrapper.<init>(EPollArrayWrapper.java:68)
> > > > >     at sun.nio.ch.EPollSelectorImpl.<init>(EPollSelectorImpl.java:52)
> > > > >     at sun.nio.ch.EPollSelectorProvider.openSelector(EPollSelectorProvider.java:18)
> > > > >     at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.get(SocketIOWithTimeout.java:407)
> > > > >     at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:322)
> > > > >     at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:203)
> > > > >     at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:407)
> > > > >     at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:304)
> > > > >     at org.apache.hadoop.ipc.Client$Connection.access$1700(Client.java:176)
> > > > >     at org.apache.hadoop.ipc.Client.getConnection(Client.java:860)
> > > > >     at org.apache.hadoop.ipc.Client.call(Client.java:720)
> > > > >     ... 13 more
> > > > >
> > > > > Edson Ramiro
> > > >
> > > > --
> > > > Todd Lipcon
> > > > Software Engineer, Cloudera
> >
> > --
> > Todd Lipcon
> > Software Engineer, Cloudera

--
Todd Lipcon
Software Engineer, Cloudera