When I use Nutch 1.0 to fetch data into Hadoop, I run a cluster of 1 master and 10 slave nodes, each with 4 GB of memory and a 1 TB hard disk, all on Ubuntu. My configuration is:

master:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Autogenerated by Cloudera's Configurator for Hadoop 0.1.0 on Fri May 15 06:49:30 2009 -->
<configuration>
<property> <name>dfs.block.size</name> <value>134217728</value> <final>true</final> </property>
<property> <name>dfs.data.dir</name> <value>/data/filesystem/data</value> <final>true</final> </property>
<property> <name>dfs.datanode.du.reserved</name> <value>1073741824</value> <final>true</final> </property>
<property> <name>dfs.datanode.handler.count</name> <value>3</value> <final>true</final> </property>
<property> <name>dfs.name.dir</name> <value>/data/filesystem/namenode</value> <final>true</final> </property>
<property> <name>dfs.namenode.handler.count</name> <value>5</value> <final>true</final> </property>
<property> <name>dfs.permissions</name> <value>True</value> <final>true</final> </property>
<property> <name>dfs.replication</name> <value>3</value> </property>
<property> <name>fs.checkpoint.dir</name> <value>/data/filesystem/secondary-nn</value> <final>true</final> </property>
<property> <name>fs.default.name</name> <value>hdfs://ubuntu76:9000</value> </property>
<property> <name>fs.trash.interval</name> <value>1440</value> <final>true</final> </property>
<property> <name>hadoop.tmp.dir</name> <value>/tmp/hadoop-${user.name}</value> <final>true</final> </property>
<property> <name>io.file.buffer.size</name> <value>65536</value> </property>
<property> <name>mapred.child.java.opts</name> <value>-Xmx1945m</value> </property>
<property> <name>mapred.child.ulimit</name> <value>3983360</value> <final>true</final> </property>
<property> <name>mapred.job.tracker</name> <value>ubuntu76:9001</value> </property>
<property> <name>mapred.job.tracker.handler.count</name> <value>5</value> <final>true</final> </property>
<property> <name>mapred.local.dir</name> <value>${hadoop.tmp.dir}/mapred/local</value> <final>true</final> </property>
<property> <name>mapred.map.tasks.speculative.execution</name> <value>true</value> </property>
<property> <name>mapred.reduce.parallel.copies</name> <value>10</value> </property>
<property> <name>mapred.reduce.tasks</name> <value>10</value> </property>
<property> <name>mapred.reduce.tasks.speculative.execution</name> <value>false</value> </property>
<property> <name>mapred.tasktracker.map.tasks.maximum</name> <value>1</value> <final>true</final> </property>
<property> <name>mapred.tasktracker.reduce.tasks.maximum</name> <value>1</value> <final>true</final> </property>
<property> <name>tasktracker.http.threads</name> <value>12</value> <final>true</final> </property>
</configuration>

slave:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Autogenerated by Cloudera's Configurator for Hadoop 0.1.0 on Fri May 15 06:49:29 2009 -->
<configuration>
<property> <name>dfs.block.size</name> <value>134217728</value> <final>true</final> </property>
<property> <name>dfs.data.dir</name> <value>/data/filesystem/data</value> <final>true</final> </property>
<property> <name>dfs.datanode.du.reserved</name> <value>1073741824</value> <final>true</final> </property>
<property> <name>dfs.datanode.handler.count</name> <value>3</value> <final>true</final> </property>
<property> <name>dfs.name.dir</name> <value>/data/filesystem/namenode</value> <final>true</final> </property>
<property> <name>dfs.namenode.handler.count</name> <value>5</value> <final>true</final> </property>
<property> <name>dfs.permissions</name> <value>True</value> <final>true</final> </property>
<property> <name>dfs.replication</name> <value>3</value> </property>
<property> <name>fs.checkpoint.dir</name> <value>/data/filesystem/secondary-nn</value> <final>true</final> </property>
<property> <name>fs.default.name</name> <value>hdfs://ubuntu76:9000</value> </property>
<property> <name>fs.trash.interval</name> <value>1440</value> <final>true</final> </property>
<property> <name>hadoop.tmp.dir</name> <value>/tmp/hadoop-${user.name}</value> <final>true</final> </property>
<property> <name>io.file.buffer.size</name> <value>65536</value> </property>
<property> <name>mapred.child.java.opts</name> <value>-Xmx1945m</value> </property>
<property> <name>mapred.child.ulimit</name> <value>3983360</value> <final>true</final> </property>
<property> <name>mapred.job.tracker</name> <value>ubuntu76:9001</value> </property>
<property> <name>mapred.job.tracker.handler.count</name> <value>5</value> <final>true</final> </property>
<property> <name>mapred.local.dir</name> <value>/data/filesystem/mapred/local</value> <final>true</final> </property>
<property> <name>mapred.map.tasks.speculative.execution</name> <value>true</value> </property>
<property> <name>mapred.reduce.parallel.copies</name> <value>10</value> </property>
<property> <name>mapred.reduce.tasks</name> <value>10</value> </property>
<property> <name>mapred.reduce.tasks.speculative.execution</name> <value>false</value> </property>
<property> <name>mapred.tasktracker.map.tasks.maximum</name> <value>1</value> <final>true</final> </property>
<property> <name>mapred.tasktracker.reduce.tasks.maximum</name> <value>1</value> <final>true</final> </property>
<property> <name>tasktracker.http.threads</name> <value>12</value> <final>true</final> </property>
</configuration>

But when I execute the command "bin/nutch crawl urls -dir crawled -depth 3", the datanode log files contain exceptions like these:

2009-05-15 21:04:10,328 ERROR datanode.DataNode - DatanodeRegistration(113.45.58.77:50010, storageID=DS-1293122987-113.45.58.77-50010-1242435813708, infoPort=50075, ipcPort=50020):DataXceiver
org.apache.hadoop.hdfs.server.datanode.BlockAlreadyExistsException: Block blk_5260319812111246094_1002 is valid, and cannot be written to.
        at org.apache.hadoop.hdfs.server.datanode.FSDataset.writeToBlock(FSDataset.java:975)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:97)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:259)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
        at java.lang.Thread.run(Thread.java:619)

2009-05-15 21:04:39,750 WARN datanode.DataNode - DatanodeRegistration(113.45.58.77:50010, storageID=DS-1293122987-113.45.58.77-50010-1242435813708, infoPort=50075, ipcPort=50020):Failed to transfer blk_9220934476097358434_1017 to 113.45.58.78:50010 got java.net.SocketException: Original Exception : java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
        at sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:418)
        at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:519)
        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:199)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:313)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:400)
        at org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer.run(DataNode.java:1108)
        at java.lang.Thread.run(Thread.java:619)
Caused by: java.io.IOException: Connection reset by peer
        ... 8 more
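For completeness, the full sequence I run is roughly the following. Only the crawl command is exactly as typed above; the start-up and seed-upload steps follow the standard Nutch 1.0 tutorial and I am reproducing them from memory, with "urls" being the local directory that holds my seed list:

    # on the master (ubuntu76), from the Nutch/Hadoop installation directory
    bin/start-all.sh                      # start the HDFS and MapReduce daemons

    # copy the local seed-URL directory into HDFS
    bin/hadoop dfs -put urls urls

    # the crawl that produces the datanode exceptions shown above
    bin/nutch crawl urls -dir crawled -depth 3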
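And in case the memory sizing is relevant, my own reading of the values in the config above (my arithmetic, not something reported by Hadoop, and assuming mapred.child.ulimit is in KB) is:

    dfs.block.size            134217728 bytes = 128 MB HDFS block size
    dfs.datanode.du.reserved  1073741824 bytes = 1 GB reserved per datanode volume
    io.file.buffer.size       65536 bytes = 64 KB
    mapred.child.java.opts    -Xmx1945m, i.e. about 1.9 GB of heap per child JVM
    mapred.child.ulimit       3983360 KB, i.e. about 3.8 GB virtual memory per child
    mapred.tasktracker.map.tasks.maximum = 1 and reduce.tasks.maximum = 1, so each
    slave runs at most 2 child JVMs at once, i.e. up to 2 x 1945 MB = 3890 MB of
    heap on a 4 GB node.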