Hello, 

We have installed Hadoop 0.19.1 on an 8-node cluster for an initial test. We
are planning to store and process approx. 400 TB of data with Hadoop and
HBase.

The Hadoop NameNode runs on one of the machines, called s3, and all eight
nodes are DataNodes (s2, s3, s4, s5, s6, s7, s8, s9). All eight are Sun
machines running SunOS 5.10 with Java 6; Java 6 on Solaris supports the
-d64 flag.

We have set the user who owns the Hadoop software to a limit of 4096 open
file descriptors.
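For reference, this was done roughly as follows; the exact profile file is
an assumption, the point is that the limit is raised in the hadoop user's
login shell before any daemons start:

```shell
# In the hadoop user's login profile (e.g. ~/.profile) -- raise the
# soft limit on open file descriptors for every process it starts.
ulimit -n 4096
```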

$ java -version
java version "1.6.0_10"
Java(TM) SE Runtime Environment (build 1.6.0_10-b33)
Java HotSpot(TM) Server VM (build 11.0-b15, mixed mode)
$ uname -a
SunOS s3 5.10 Generic_138889-08 i86pc i386 i86pc

The cluster starts fine with a more or less default configuration, and all
data nodes are detected. The relevant parts of our hadoop-site.xml:
  <property>
    <name>fs.default.name</name>
    <value>hdfs://s3:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>s4:9001</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value><list of data directories></value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/export/home/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>1024</value>
  </property>
  <property>
    <name>hadoop.logfile.size</name>
    <value>1000000</value>
    <description>The max size of each log file</description>
  </property>
  <property>
    <name>hadoop.logfile.count</name>
    <value>10</value>
    <description>The max number of log files</description>
  </property>


We want to make sure Hadoop runs as fast as possible (and that HBase will
too), so we want to run it in 64-bit mode.

However, when we specify -d64 in hadoop-env.sh (export HADOOP_OPTS="-d64"),
the cluster no longer starts.
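For reference, the exact change in conf/hadoop-env.sh (the stock file also
offers per-daemon variables such as HADOOP_NAMENODE_OPTS, which would be an
alternative place for the flag if it should only apply to some daemons):

```shell
# conf/hadoop-env.sh -- extra JVM options passed to every Hadoop daemon
# and client. -d64 asks the Solaris JVM to select the 64-bit data model.
export HADOOP_OPTS="-d64"
```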

I extracted the exceptions from the various log files as follows:

- In the logs of the datanodes that do not host the namenode (i.e. s2 and
s4-s9):

2009-05-08 18:21:46,590 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: s3/xx.xx.xx.xx:9000. Already tried 0 time(s).
2009-05-08 18:21:47,600 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: s3/xx.xx.xx.xx:9000. Already tried 1 time(s).
2009-05-08 18:21:48,610 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: s3/xx.xx.xx.xx:9000. Already tried 2 time(s).
2009-05-08 18:21:49,620 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: s3/xx.xx.xx.xx:9000. Already tried 3 time(s).
2009-05-08 18:21:50,630 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: s3/xx.xx.xx.xx:9000. Already tried 4 time(s).
2009-05-08 18:21:51,640 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: s3/xx.xx.xx.xx:9000. Already tried 5 time(s).
2009-05-08 18:21:52,650 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: s3/xx.xx.xx.xx:9000. Already tried 6 time(s).
2009-05-08 18:21:53,660 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: s3/xx.xx.xx.xx:9000. Already tried 7 time(s).
2009-05-08 18:21:54,670 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: s3/xx.xx.xx.xx:9000. Already tried 8 time(s).
2009-05-08 18:21:55,680 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: s3/xx.xx.xx.xx:9000. Already tried 9 time(s).
2009-05-08 18:21:55,683 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Call to s3/xx.xx.xx.xx:9000 failed on local exception: java.io.IOException: Invalid argument
       at org.apache.hadoop.ipc.Client.wrapException(Client.java:732)
       at org.apache.hadoop.ipc.Client.call(Client.java:700)
       at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
       at $Proxy4.getProtocolVersion(Unknown Source)
       at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:348)
       at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:335)
       at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:372)
       at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:309)
       at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:286)
       at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:258)
       at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:205)
       at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1238)
       at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1193)
       at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1201)
       at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1323)
Caused by: java.io.IOException: Invalid argument
       at sun.nio.ch.DevPollArrayWrapper.poll0(Native Method)
       at sun.nio.ch.DevPollArrayWrapper.poll(DevPollArrayWrapper.java:164)
       at sun.nio.ch.DevPollSelectorImpl.doSelect(DevPollSelectorImpl.java:68)
       at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
       at sun.nio.ch.SelectorImpl.selectNow(SelectorImpl.java:88)
       at sun.nio.ch.Util.releaseTemporarySelector(Util.java:135)
       at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:118)
       at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:300)
       at org.apache.hadoop.ipc.Client$Connection.access$1700(Client.java:177)
       at org.apache.hadoop.ipc.Client.getConnection(Client.java:801)
       at org.apache.hadoop.ipc.Client.call(Client.java:686)
       ... 13 more

- The datanode colocated with the namenode shows constant activity, so
unlike the others it does not shut itself down; its log keeps growing
(ignoring my max-size setting, incidentally), and the exception it
constantly emits is:

2009-05-08 18:21:45,753 INFO org.apache.hadoop.net.SocketIOWithTimeout: Unexpected Exception while clearing selector : java.io.IOException: Invalid argument
       at sun.nio.ch.DevPollArrayWrapper.poll0(Native Method)
       at sun.nio.ch.DevPollArrayWrapper.poll(DevPollArrayWrapper.java:164)
       at sun.nio.ch.DevPollSelectorImpl.doSelect(DevPollSelectorImpl.java:68)
       at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
       at sun.nio.ch.SelectorImpl.selectNow(SelectorImpl.java:88)
       at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:290)
       at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:155)
       at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:150)
       at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:123)
       at java.io.FilterInputStream.read(FilterInputStream.java:116)
       at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:272)
       at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
       at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
       at java.io.DataInputStream.readInt(DataInputStream.java:370)
       at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:494)
       at org.apache.hadoop.ipc.Client$Connection.run(Client.java:439)

- In the namenode log there is again constant activity. Initially it emits
many instances of:
2009-05-08 18:21:47,747 WARN org.apache.hadoop.ipc.Server: Exception in Responder java.io.IOException: Invalid argument
       at sun.nio.ch.DevPollArrayWrapper.poll0(Native Method)
       at sun.nio.ch.DevPollArrayWrapper.poll(DevPollArrayWrapper.java:164)
       at sun.nio.ch.DevPollSelectorImpl.doSelect(DevPollSelectorImpl.java:68)
       at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
       at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
       at org.apache.hadoop.ipc.Server$Responder.run(Server.java:458)

and then it moves on to:

2009-05-08 18:21:48,500 INFO org.apache.hadoop.net.SocketIOWithTimeout: Unexpected Exception while clearing selector : java.io.IOException: Invalid argument
       at sun.nio.ch.DevPollArrayWrapper.poll0(Native Method)
       at sun.nio.ch.DevPollArrayWrapper.poll(DevPollArrayWrapper.java:164)
       at sun.nio.ch.DevPollSelectorImpl.doSelect(DevPollSelectorImpl.java:68)
       at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
       at sun.nio.ch.SelectorImpl.selectNow(SelectorImpl.java:88)
       at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:290)
       at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:155)
       at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:150)
       at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:123)
       at java.io.FilterInputStream.read(FilterInputStream.java:116)
       at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:272)
       at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
       at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
       at java.io.DataInputStream.readInt(DataInputStream.java:370)
       at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:494)

Does anybody have an idea of what I am doing wrong? Alternatively, is there
another way to instruct Hadoop to start in 64-bit mode, other than setting
HADOOP_OPTS="-d64"?
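In case it helps anyone reproduce this, here is the trivial check I use to
confirm which data model a given JVM invocation actually selected
(sun.arch.data.model is a HotSpot-specific property, so treat this as a
sketch rather than something portable):

```java
// Prints the data model the running JVM selected ("32" or "64" on
// HotSpot) along with the CPU architecture it reports.
public class DataModelCheck {
    public static void main(String[] args) {
        System.out.println("sun.arch.data.model = "
                + System.getProperty("sun.arch.data.model"));
        System.out.println("os.arch             = "
                + System.getProperty("os.arch"));
    }
}
```

Running it as `java -d64 DataModelCheck` on our Solaris boxes is how I
verify that the flag actually takes effect.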

Also, a minor issue: why do you think the logfile max size is ignored?
I keep getting huge log files which are difficult to manage.
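My current guess (unverified) is that the daemon logs are governed by
conf/log4j.properties rather than by the hadoop.logfile.* properties above.
If so, size-based rolling would need something like the following in
log4j.properties; the variable names are assumed from the stock file, and
the whole fragment is a sketch, not a tested configuration:

```
# conf/log4j.properties -- switch the daemon root logger to a size-based
# RollingFileAppender instead of the default daily rolling appender.
log4j.rootLogger=INFO,RFA
log4j.appender.RFA=org.apache.log4j.RollingFileAppender
log4j.appender.RFA.File=${hadoop.log.dir}/${hadoop.log.file}
log4j.appender.RFA.MaxFileSize=1MB
log4j.appender.RFA.MaxBackupIndex=10
log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
```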

Many thanks,
Alexandra Alecu.
Institute of Astronomy
University of Cambridge

-- 
View this message in context: 
http://www.nabble.com/Hadoop-0.19.1-with--d64-option-on-Solaris-5.10-and-java6-doesn%27t-start-with-exception-%22java.io.IOException%3A-Invalid-argument%22-tp23451755p23451755.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
