Hi All
when I use nutch-1.0 fetching clusters datanode exception:
2009-04-04 14:00:27,858 ERROR datanode.DataNode -
DatanodeRegistration(123.15.51.84:50010,
storageID=DS-1424448517-123.15.51.84-50010-1238866110775, infoPort=50075,
ipcPort=50020):DataXceiver
java.io.EOFException: while trying to read 65557 bytes
at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:264)
at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:308)
at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:372)
at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:524)
at
org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:357)
at
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
at java.lang.Thread.run(Thread.java:619)
2009-04-04 15:13:59,245 ERROR datanode.DataNode -
DatanodeRegistration(123.15.51.84:50010,
storageID=DS-1424448517-123.15.51.84-50010-1238866110775, infoPort=50075,
ipcPort=50020):DataXceiver
java.io.EOFException: while trying to read 65557 bytes
at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:264)
at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:308)
at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:372)
at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:524)
at
org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:357)
at
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
at java.lang.Thread.run(Thread.java:619)
and my config
master
ubuntu3
slaves
ubuntu6
ubuntu7
urllist.txt
http://www.163.com
crawl-urlfilter.txt
# accept hosts in MY.DOMAIN.NAME
+^http://([a-z0-9]*\.)*163.com/
hadoop-env.sh
# Set Hadoop-specific environment variables here.
# The only required environment variable is JAVA_HOME. All others are
# optional. When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.
# The java implementation to use. Required.
export JAVA_HOME=/home/hadoop/jdk6
export HADOOP_HOME=/home/hadoop/search
# The maximum amount of heap to use, in MB. Default is 1000.
# export HADOOP_HEAPSIZE=2000
# Extra Java runtime options. Empty by default.
# export HADOOP_OPTS=-server
# Extra ssh options. Default: '-o ConnectTimeout=1 -o
SendEnv=HADOOP_CONF_DIR'.
# export HADOOP_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HADOOP_CONF_DIR"
# Where log files are stored. $HADOOP_HOME/logs by default.
export HADOOP_LOG_DIR=${HADOOP_HOME}/logs
# File naming remote slave hosts. $HADOOP_HOME/conf/slaves by default.
export HADOOP_SLAVES=${HADOOP_HOME}/conf/slaves
# host:path where hadoop code should be rsync'd from. Unset by default.
# export HADOOP_MASTER=master:/home/$USER/src/hadoop
# The directory where pid files are stored. /tmp by default.
# export HADOOP_PID_DIR=/var/hadoop/pids
# A string representing this instance of hadoop. $USER by default.
# export HADOOP_IDENT_STRING=$USER
hadoop-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://ubuntu3:9000/</value>
<description>
The name of the default file system. Either the literal string
"local" or a host:port for NDFS.
</description>
</property>
<property>
<name>mapred.job.tracker</name>
<value>hdfs://ubuntu3:9001/</value>
<description>
The host and port that the MapReduce job tracker runs at. If
"local", then jobs are run in-process as a single map and
reduce task.
</description>
</property>
<property>
<name>mapred.map.tasks</name>
<value>2</value>
<description>
define mapred.map tasks to be number of slave hosts
</description>
</property>
<property>
<name>mapred.reduce.tasks</name>
<value>2</value>
<description>
define mapred.reduce tasks to be number of slave hosts
</description>
</property>
<property>
<name>dfs.name.dir</name>
<value>/home/hadoop/filesystem/name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/hadoop/filesystem/data</value>
</property>
<property>
<name>mapred.system.dir</name>
<value>/home/hadoop/filesystem/mapreduce/system</value>
</property>
<property>
<name>mapred.local.dir</name>
<value>/home/hadoop/filesystem/mapreduce/local</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>mapred.child.java.opts</name>
<value>-Xmx1024m</value>
</property>
<property>
<name>dfs.datanode.socket.write.timeout</name>
<value>0</value>
</property>
<property>
<name>dfs.datanode.max.xcievers</name>
<value>8192</value>
</property>
<property>
<name>dfs.datanode.handler.count</name>
<value>10</value>
</property>
</configuration>
nutch-site.xml
configuration>
<property>
<name>http.robots.agents</name>
<value>www.baidu.com</value>
</property>
<property>
<name>http.agent.name</name>
<value>www.baidu.com</value>
</property>
<property>
<name>http.agent.url</name>
<value>www.baidu.com</value>
</property>
</configuration>
and my limit open files set to 65535
--
View this message in context:
http://www.nabble.com/nutch-1.0-datanode-exception-when-fetching-tp22901052p22901052.html
Sent from the Nutch - User mailing list archive at Nabble.com.