Hi all,
I am using hadoop-0.18.0-core.jar and nutch-2008-08-18_04-01-55.jar, running Hadoop with one namenode and 4 slaves. My hadoop-site.xml is attached below; I did not change hadoop-default.xml.
When the data in the segments is large, this kind of error occurs:

java.io.IOException: Could not obtain block: blk_-2634319951074439134_1129 file=/user/root/crawl_debug/segments/20080825053518/content/part-00002/data
        at org.apache.hadoop.dfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1462)
        at org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1312)
        at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1417)
        at java.io.DataInputStream.readFully(DataInputStream.java:178)
        at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:64)
        at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:102)
        at org.apache.hadoop.io.SequenceFile$Reader.readBuffer(SequenceFile.java:1646)
        at org.apache.hadoop.io.SequenceFile$Reader.seekToCurrentValue(SequenceFile.java:1712)
        at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1787)
        at org.apache.hadoop.mapred.SequenceFileRecordReader.getCurrentValue(SequenceFileRecordReader.java:104)
        at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:79)
        at org.apache.hadoop.mapred.join.WrappedRecordReader.next(WrappedRecordReader.java:112)
        at org.apache.hadoop.mapred.join.WrappedRecordReader.accept(WrappedRecordReader.java:130)
        at org.apache.hadoop.mapred.join.CompositeRecordReader.fillJoinCollector(CompositeRecordReader.java:398)
        at org.apache.hadoop.mapred.join.JoinRecordReader.next(JoinRecordReader.java:56)
        at org.apache.hadoop.mapred.join.JoinRecordReader.next(JoinRecordReader.java:33)
        at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:165)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:45)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)

How can I correct this? Thanks.
Xu
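In case it helps with diagnosis, running fsck on the affected file (path taken from the stack trace above) should show whether HDFS itself reports the block as missing or under-replicated; this is only a check, not a fix:

    bin/hadoop fsck /user/root/crawl_debug/segments/20080825053518/content/part-00002/data -files -blocks -locations

If fsck reports the file as healthy, the read failure is presumably a datanode-side problem (e.g. a connection or thread limit) rather than actual data loss.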
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

<property>
  <name>mapred.map.tasks</name>
  <value>41</value>
  <description>The default number of map tasks per job. Typically set
  to a prime several times greater than number of available hosts.
  Ignored when mapred.job.tracker is "local".
  </description>
</property>

<property>
  <name>mapred.reduce.tasks</name>
  <value>8</value>
  <description>The default number of reduce tasks per job. Typically set
  to a prime close to the number of available hosts.
  Ignored when mapred.job.tracker is "local".
  </description>
</property>

<property>
  <name>hadoop.tmp.dir</name>
  <value>/mnt/nutch</value>
</property>

<property>
  <name>fs.default.name</name>
  <value>hdfs://namenode:50001/</value>
</property>

<property>
  <name>mapred.job.tracker</name>
  <value>namenode:50002</value>
</property>

<property>
  <name>tasktracker.http.threads</name>
  <value>80</value>
</property>

<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>2</value>
</property>

<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>
</property>

<property>
  <name>mapred.output.compress</name>
  <value>true</value>
</property>

<property>
  <name>mapred.output.compression.type</name>
  <value>BLOCK</value>
</property>

<property>
  <name>dfs.client.block.write.retries</name>
  <value>3</value>
</property>

</configuration>
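One setting not present in the config above is dfs.datanode.max.xcievers, the per-datanode limit on concurrent block transfer threads (default 256 in hadoop-default.xml, if I read it right). I am not sure it applies here, but jobs that keep many SequenceFiles open are said to hit it. A sketch of the property, in case raising it is worth trying:

    <property>
      <name>dfs.datanode.max.xcievers</name>
      <!-- upper bound on concurrent block transfer threads per datanode;
           the value below is only an example, not a tested setting -->
      <value>1024</value>
    </property>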