Hello,
We have installed Hadoop 0.19.1 on an 8-node cluster for an initial test. We are planning to store and process approx. 400 TB of data with Hadoop and HBase. The Hadoop NameNode is on one of the machines, called s3, and all of the nodes (s2, s3, s4, s5, s6, s7, s8, s9) are DataNodes. All 8 nodes are Sun machines running SunOS 5.10 with Java 6, and Java 6 on Solaris supports -d64. We have set the user who owns the hadoop software to a limit of 4096 file descriptors open at a time.

  java -version
  java version "1.6.0_10"
  Java(TM) SE Runtime Environment (build 1.6.0_10-b33)
  Java HotSpot(TM) Server VM (build 11.0-b15, mixed mode)

  uname -a
  SunOS s3 5.10 Generic_138889-08 i86pc i386 i86pc

The cluster starts fine with a more or less default configuration, and all data nodes are detected. The non-default settings are:

  <property>
    <name>fs.default.name</name>
    <value>hdfs://s3:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>s4:9001</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value><list of data directories></value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/export/home/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>1024</value>
  </property>
  <property>
    <name>hadoop.logfile.size</name>
    <value>1000000</value>
    <description>The max size of each log file</description>
  </property>
  <property>
    <name>hadoop.logfile.count</name>
    <value>10</value>
    <description>The max number of log files</description>
  </property>

We want to make sure that Hadoop works as fast as possible (and that HBase will too), so we want to run it in 64-bit mode. However, when specifying -d64 in hadoop-env.sh (export HADOOP_OPTS="-d64"), the cluster does not start anymore.
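To be precise, the first line below is the only change I made, together with the sanity check I would expect to confirm that 64-bit Java is available on each node. The per-daemon lines at the end are an alternative I have considered but not tried, with variable names quoted from memory from conf/hadoop-env.sh, so please treat this as a sketch rather than a verified recipe:

  # conf/hadoop-env.sh -- the only non-default line at the moment
  export HADOOP_OPTS="-d64"

  # Sanity check on every node: the 64-bit libraries are installed and the
  # version banner mentions a 64-Bit VM (output omitted here).
  java -d64 -version

  # Untested alternative: pass -d64 per daemon rather than globally, via the
  # per-daemon variables that hadoop-env.sh already defines (names from memory).
  # export HADOOP_NAMENODE_OPTS="-d64 $HADOOP_NAMENODE_OPTS"
  # export HADOOP_DATANODE_OPTS="-d64 $HADOOP_DATANODE_OPTS"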
I extracted the exceptions from the different log files as follows.

- On any of the datanodes which do not host the namenode (i.e. in the logs of the datanodes on s2 and s4-s9):

  2009-05-08 18:21:46,590 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: s3/xx.xx.xx.xx:9000. Already tried 0 time(s).
  2009-05-08 18:21:47,600 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: s3/xx.xx.xx.xx:9000. Already tried 1 time(s).
  2009-05-08 18:21:48,610 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: s3/xx.xx.xx.xx:9000. Already tried 2 time(s).
  2009-05-08 18:21:49,620 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: s3/xx.xx.xx.xx:9000. Already tried 3 time(s).
  2009-05-08 18:21:50,630 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: s3/xx.xx.xx.xx:9000. Already tried 4 time(s).
  2009-05-08 18:21:51,640 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: s3/xx.xx.xx.xx:9000. Already tried 5 time(s).
  2009-05-08 18:21:52,650 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: s3/xx.xx.xx.xx:9000. Already tried 6 time(s).
  2009-05-08 18:21:53,660 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: s3/xx.xx.xx.xx:9000. Already tried 7 time(s).
  2009-05-08 18:21:54,670 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: s3/xx.xx.xx.xx:9000. Already tried 8 time(s).
  2009-05-08 18:21:55,680 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: s3/xx.xx.xx.xx:9000. Already tried 9 time(s).
  2009-05-08 18:21:55,683 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Call to s3/xx.xx.xx.xx:9000 failed on local exception: java.io.IOException: Invalid argument
          at org.apache.hadoop.ipc.Client.wrapException(Client.java:732)
          at org.apache.hadoop.ipc.Client.call(Client.java:700)
          at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
          at $Proxy4.getProtocolVersion(Unknown Source)
          at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:348)
          at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:335)
          at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:372)
          at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:309)
          at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:286)
          at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:258)
          at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:205)
          at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1238)
          at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1193)
          at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1201)
          at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1323)
  Caused by: java.io.IOException: Invalid argument
          at sun.nio.ch.DevPollArrayWrapper.poll0(Native Method)
          at sun.nio.ch.DevPollArrayWrapper.poll(DevPollArrayWrapper.java:164)
          at sun.nio.ch.DevPollSelectorImpl.doSelect(DevPollSelectorImpl.java:68)
          at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
          at sun.nio.ch.SelectorImpl.selectNow(SelectorImpl.java:88)
          at sun.nio.ch.Util.releaseTemporarySelector(Util.java:135)
          at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:118)
          at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:300)
          at org.apache.hadoop.ipc.Client$Connection.access$1700(Client.java:177)
          at org.apache.hadoop.ipc.Client.getConnection(Client.java:801)
          at org.apache.hadoop.ipc.Client.call(Client.java:686)
          ... 13 more
- On the datanode which is located on the same machine as the namenode, there is constant activity, so this datanode does not shut down automatically like the other ones (its log keeps growing, ignoring my maxsize setting), and the exception it constantly spits out is:

  2009-05-08 18:21:45,753 INFO org.apache.hadoop.net.SocketIOWithTimeout: Unexpected Exception while clearing selector : java.io.IOException: Invalid argument
          at sun.nio.ch.DevPollArrayWrapper.poll0(Native Method)
          at sun.nio.ch.DevPollArrayWrapper.poll(DevPollArrayWrapper.java:164)
          at sun.nio.ch.DevPollSelectorImpl.doSelect(DevPollSelectorImpl.java:68)
          at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
          at sun.nio.ch.SelectorImpl.selectNow(SelectorImpl.java:88)
          at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:290)
          at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:155)
          at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:150)
          at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:123)
          at java.io.FilterInputStream.read(FilterInputStream.java:116)
          at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:272)
          at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
          at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
          at java.io.DataInputStream.readInt(DataInputStream.java:370)
          at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:494)
          at org.apache.hadoop.ipc.Client$Connection.run(Client.java:439)

- In the namenode log there is again constant activity. Initially it gives loads of:

  2009-05-08 18:21:47,747 WARN org.apache.hadoop.ipc.Server: Exception in Responder
  java.io.IOException: Invalid argument
          at sun.nio.ch.DevPollArrayWrapper.poll0(Native Method)
          at sun.nio.ch.DevPollArrayWrapper.poll(DevPollArrayWrapper.java:164)
          at sun.nio.ch.DevPollSelectorImpl.doSelect(DevPollSelectorImpl.java:68)
          at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
          at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
          at org.apache.hadoop.ipc.Server$Responder.run(Server.java:458)

and then it moves on to:

  2009-05-08 18:21:48,500 INFO org.apache.hadoop.net.SocketIOWithTimeout: Unexpected Exception while clearing selector : java.io.IOException: Invalid argument
          at sun.nio.ch.DevPollArrayWrapper.poll0(Native Method)
          at sun.nio.ch.DevPollArrayWrapper.poll(DevPollArrayWrapper.java:164)
          at sun.nio.ch.DevPollSelectorImpl.doSelect(DevPollSelectorImpl.java:68)
          at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
          at sun.nio.ch.SelectorImpl.selectNow(SelectorImpl.java:88)
          at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:290)
          at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:155)
          at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:150)
          at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:123)
          at java.io.FilterInputStream.read(FilterInputStream.java:116)
          at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:272)
          at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
          at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
          at java.io.DataInputStream.readInt(DataInputStream.java:370)
          at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:494)
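Every one of these traces bottoms out in the /dev/poll selector (sun.nio.ch.DevPollArrayWrapper.poll0) failing with "Invalid argument" (EINVAL). Two things I am considering, but have not yet tried, are attaching truss to one of the failing daemons to see exactly which system call returns EINVAL, and forcing the JVM to use the plain poll(2)-based selector so that /dev/poll is bypassed. Both lines below are untested sketches; in particular I am not certain that the SelectorProvider property is honoured by this particular JVM build.

  # Attach truss to the running DataNode JVM (pgrep -f matches the full command
  # line); failing system calls should show up in the output as Err#22 (EINVAL).
  truss -f -p `pgrep -f datanode.DataNode` 2>&1 | grep EINVAL

  # Untested workaround idea for hadoop-env.sh: stay in 64-bit mode but name the
  # poll-based selector provider explicitly so the /dev/poll implementation is
  # not used at all.
  export HADOOP_OPTS="-d64 -Djava.nio.channels.spi.SelectorProvider=sun.nio.ch.PollSelectorProvider"

If the poll-based provider made the errors go away, that would at least confirm it is the /dev/poll path that breaks under -d64 here.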
Does anybody have any idea what it is that I am doing wrong?

How else can I instruct Hadoop to start in 64-bit mode, other than using HADOOP_OPTS="-d64"?

Also, a minor issue: why do you think the logfile maxsize is being ignored? I keep getting these huge log files, which are difficult to manage.

Many thanks,

Alexandra Alecu
Institute of Astronomy
University of Cambridge