Hi all,
I am using hadoop-0.18.0-core.jar and nutch-2008-08-18_04-01-55.jar,
running Hadoop on one namenode and four slaves.
Attached is my hadoop-site.xml; I did not change hadoop-default.xml.

When the data in the segments is large, this kind of error occurs:

java.io.IOException: Could not obtain block: blk_-2634319951074439134_1129 file=/user/root/crawl_debug/segments/20080825053518/content/part-00002/data
        at org.apache.hadoop.dfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1462)
        at org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1312)
        at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1417)
        at java.io.DataInputStream.readFully(DataInputStream.java:178)
        at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:64)
        at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:102)
        at org.apache.hadoop.io.SequenceFile$Reader.readBuffer(SequenceFile.java:1646)
        at org.apache.hadoop.io.SequenceFile$Reader.seekToCurrentValue(SequenceFile.java:1712)
        at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1787)
        at org.apache.hadoop.mapred.SequenceFileRecordReader.getCurrentValue(SequenceFileRecordReader.java:104)
        at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:79)
        at org.apache.hadoop.mapred.join.WrappedRecordReader.next(WrappedRecordReader.java:112)
        at org.apache.hadoop.mapred.join.WrappedRecordReader.accept(WrappedRecordReader.java:130)
        at org.apache.hadoop.mapred.join.CompositeRecordReader.fillJoinCollector(CompositeRecordReader.java:398)
        at org.apache.hadoop.mapred.join.JoinRecordReader.next(JoinRecordReader.java:56)
        at org.apache.hadoop.mapred.join.JoinRecordReader.next(JoinRecordReader.java:33)
        at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:165)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:45)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)
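
So far I have only verified that the file exists in HDFS. I guess the next step is to check whether the namenode actually considers that block missing or corrupt; I believe fsck can show this, though I am not sure it is the best way to diagnose:

        bin/hadoop fsck /user/root/crawl_debug -files -blocks -locations

If fsck reports missing or under-replicated blocks, I suppose the datanodes holding the replicas were overloaded or went down during the job.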


How can I correct this?
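
One guess on my side: maybe the datanodes run out of transceiver threads when many map tasks read the large segments at once, so the client cannot obtain the block from any replica. I was thinking of adding something like the following to hadoop-site.xml, but I am not sure the property name or the value is right for 0.18, so please correct me:

<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>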
Thanks.
Xu

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>


<property>
  <name>mapred.map.tasks</name>
  <value>41</value>
  <description>The default number of map tasks per job.  Typically set
  to a prime several times greater than number of available hosts.
  Ignored when mapred.job.tracker is "local".  
  </description>
</property>

<property>
  <name>mapred.reduce.tasks</name>
  <value>8</value>
  <description>The default number of reduce tasks per job.  Typically set
  to a prime close to the number of available hosts.  Ignored when
  mapred.job.tracker is "local".
  </description>
</property>

<property>
  <name>hadoop.tmp.dir</name>
  <value>/mnt/nutch</value>
</property>

<property>
  <name>fs.default.name</name>
  <value>hdfs://namenode:50001/</value>
</property>

<property>
  <name>mapred.job.tracker</name>
  <value>namenode:50002</value>
</property>

<property>
  <name>tasktracker.http.threads</name>
  <value>80</value>
</property>

<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>2</value>
</property>

<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>
</property>

<property>
  <name>mapred.output.compress</name>
  <value>true</value>
</property>

<property>
  <name>mapred.output.compression.type</name>
  <value>BLOCK</value>
</property>

<property>
  <name>dfs.client.block.write.retries</name>
  <value>3</value>
</property>



</configuration>
