Hi Jameson,

My first instinct is that you have an incomplete patch series for hdfs append, and that's what caused your problem. There were many bug fixes along the way for hadoop-0.20-append and maybe you've missed some in your manually patched build.
-Todd

On Mon, Feb 14, 2011 at 5:49 AM, Jameson Li <[email protected]> wrote:
> Hi,
>
> My hadoop version is based on the hadoop 0.20.2 release, patched with
> HADOOP-4675,5745,MAPREDUCE-1070,551,1089 (support
> ganglia31,fairscheduler preemption,hdfs append), and patched with
> HADOOP-6099,HDFS-278,Patches-from-Dhruba-Borthakur,HDFS-200 (support
> scribe).
>
> Last Friday I found that some of my test hadoop cluster nodes' time
> is not in the normal state; they are some number of hours beyond the
> normal time.
> So I ran the following command and added it as a crontab job:
> /usr/bin/rdate -s time-b.nist.gov
>
> And then my hadoop cluster namenode crashed after I restarted the
> namenode.
> And I don't know whether it is related to modifying the time.
>
> The error log:
> 2011-02-12 18:44:46,603 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Total number of blocks = 196
> 2011-02-12 18:44:46,603 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of invalid blocks = 0
> 2011-02-12 18:44:46,603 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of under-replicated blocks = 29
> 2011-02-12 18:44:46,603 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of over-replicated blocks = 41
> 2011-02-12 18:44:46,603 INFO org.apache.hadoop.hdfs.StateChange: STATE* Leaving safe mode after 69 secs.
> 2011-02-12 18:44:46,603 INFO org.apache.hadoop.hdfs.StateChange: STATE* Safe mode is OFF.
> 2011-02-12 18:44:46,603 INFO org.apache.hadoop.hdfs.StateChange: STATE* Network topology has 1 racks and 5 datanodes
> 2011-02-12 18:44:46,603 INFO org.apache.hadoop.hdfs.StateChange: STATE* UnderReplicatedBlocks has 29 blocks
> 2011-02-12 18:44:46,886 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask 192.168.1.14:50010 to replicate blk_-8806907658071633346_1750 to datanode(s) 192.168.1.83:50010
> 2011-02-12 18:44:46,887 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask 192.168.1.83:50010 to replicate blk_-7689075547598626554_1800 to datanode(s) 192.168.1.10:50010
> 2011-02-12 18:44:46,887 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask 192.168.1.84:50010 to replicate blk_-7587424527299099175_1717 to datanode(s) 192.168.1.10:50010
> 2011-02-12 18:44:46,887 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask 192.168.1.84:50010 to replicate blk_-6925943363757944243_1909 to datanode(s) 192.168.1.13:50010
> 2011-02-12 18:44:46,888 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask 192.168.1.14:50010 to replicate blk_-6835423500788375545_1928 to datanode(s) 192.168.1.10:50010
> 2011-02-12 18:44:46,888 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask 192.168.1.83:50010 to replicate blk_-6477488774631498652_1742 to datanode(s) 192.168.1.84:50010
> 2011-02-12 18:44:46,889 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: ReplicationMonitor thread received Runtime exception. java.lang.IllegalStateException: generationStamp (=1) == GenerationStamp.WILDCARD_STAMP
> java.lang.IllegalStateException: generationStamp (=1) == GenerationStamp.WILDCARD_STAMP
>   at org.apache.hadoop.hdfs.protocol.Block.validateGenerationStamp(Block.java:148)
>   at org.apache.hadoop.hdfs.protocol.Block.compareTo(Block.java:156)
>   at org.apache.hadoop.hdfs.protocol.Block.compareTo(Block.java:30)
>   at java.util.TreeMap.put(TreeMap.java:545)
>   at java.util.TreeSet.add(TreeSet.java:238)
>   at org.apache.hadoop.hdfs.server.namenode.DatanodeDescriptor.addBlocksToBeInvalidated(DatanodeDescriptor.java:284)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.invalidateWorkForOneNode(FSNamesystem.java:2743)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.computeInvalidateWork(FSNamesystem.java:2419)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.computeDatanodeWork(FSNamesystem.java:2412)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem$ReplicationMonitor.run(FSNamesystem.java:2357)
>   at java.lang.Thread.run(Thread.java:619)
> 2011-02-12 18:44:46,892 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down NameNode at hadoop5/192.168.1.84
> ************************************************************/
>
>
> Thanks,
> Jameson
>

--
Todd Lipcon
Software Engineer, Cloudera
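
For anyone reading the stack trace in this thread: the exception comes out of Block.compareTo, which is called while a block is being inserted into a sorted set (TreeSet.add), and it then escapes the ReplicationMonitor thread. The following is only a rough sketch of that failure mode, not the Hadoop source. SketchBlock, addThrows and the class name are made up for illustration, and the wildcard value 1 is simply read off the log message "generationStamp (=1) == GenerationStamp.WILDCARD_STAMP".

```java
import java.util.TreeSet;

public class WildcardStampSketch {
    // Assumption: value taken from the log message in this thread.
    static final long WILDCARD_STAMP = 1L;

    // Hypothetical, simplified stand-in for a block; NOT the Hadoop Block class.
    static final class SketchBlock implements Comparable<SketchBlock> {
        final long blockId;
        final long generationStamp;

        SketchBlock(long blockId, long generationStamp) {
            this.blockId = blockId;
            this.generationStamp = generationStamp;
        }

        // Rejects the wildcard stamp, mirroring the message seen in the log.
        static void validateGenerationStamp(long stamp) {
            if (stamp == WILDCARD_STAMP) {
                throw new IllegalStateException(
                    "generationStamp (=" + stamp + ") == GenerationStamp.WILDCARD_STAMP");
            }
        }

        @Override
        public int compareTo(SketchBlock other) {
            // Both sides are validated before any ordering is computed.
            validateGenerationStamp(generationStamp);
            validateGenerationStamp(other.generationStamp);
            int byId = Long.compare(blockId, other.blockId);
            return byId != 0 ? byId : Long.compare(generationStamp, other.generationStamp);
        }
    }

    // Returns true if inserting the block into the sorted set throws.
    static boolean addThrows(TreeSet<SketchBlock> set, SketchBlock block) {
        try {
            set.add(block);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        TreeSet<SketchBlock> toInvalidate = new TreeSet<>();
        toInvalidate.add(new SketchBlock(100L, 1750L));
        // A block with an ordinary generation stamp is inserted normally.
        System.out.println(addThrows(toInvalidate, new SketchBlock(200L, 1800L)));
        // A block carrying the wildcard stamp makes the insert itself throw.
        System.out.println(addThrows(toInvalidate, new SketchBlock(300L, WILDCARD_STAMP)));
    }
}
```

In this sketch, a single block carrying the wildcard stamp is enough to make the set insert fail, which is consistent with the trace above. It says nothing about how such a block ended up in the invalidation set in the patched build.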
