Re: my hadoop cluster namenode crashed after modifying the timestamp in some of the nodes

Todd Lipcon Mon, 14 Feb 2011 07:58:25 -0800

Hi Jameson,

My first instinct is that you have an incomplete patch series for hdfs
append, and that's what caused your problem. There were many bug fixes along
the way for hadoop-0.20-append and maybe you've missed some in your manually
patched build.
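
For what it's worth, the trace you posted shows the exception coming out of
Block.compareTo (via validateGenerationStamp) while a block is being added to
a TreeSet in addBlocksToBeInvalidated. Here is a rough sketch of that
mechanism. SketchBlock and WildcardStampSketch are made-up names, not the real
Hadoop classes, the ordering in compareTo is an assumption, and the only value
taken from your log is WILDCARD_STAMP = 1:

```java
import java.util.TreeSet;

// Hypothetical stand-in for an HDFS block; NOT org.apache.hadoop.hdfs.protocol.Block.
final class SketchBlock implements Comparable<SketchBlock> {
    // Value taken from the log line "generationStamp (=1) == GenerationStamp.WILDCARD_STAMP".
    static final long WILDCARD_STAMP = 1L;

    final long blockId;
    final long generationStamp;

    SketchBlock(long blockId, long generationStamp) {
        this.blockId = blockId;
        this.generationStamp = generationStamp;
    }

    // Refuse to compare a block whose stamp is the wildcard value.
    private static void validate(long stamp) {
        if (stamp == WILDCARD_STAMP) {
            throw new IllegalStateException(
                "generationStamp (=" + stamp + ") == GenerationStamp.WILDCARD_STAMP");
        }
    }

    @Override
    public int compareTo(SketchBlock other) {
        validate(generationStamp);
        validate(other.generationStamp);
        // Assumed ordering: by block id, then by generation stamp.
        int byId = Long.compare(blockId, other.blockId);
        return byId != 0 ? byId : Long.compare(generationStamp, other.generationStamp);
    }
}

final class WildcardStampSketch {
    // Returns true if adding a block with the given stamp to a sorted set throws.
    static boolean addThrows(long stamp) {
        TreeSet<SketchBlock> toInvalidate = new TreeSet<>();
        toInvalidate.add(new SketchBlock(100L, 1750L));
        try {
            // TreeSet.add calls compareTo against the existing element.
            toInvalidate.add(new SketchBlock(200L, stamp));
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(addThrows(1800L));
        System.out.println(addThrows(SketchBlock.WILDCARD_STAMP));
    }
}
```

In this sketch, a single block carrying the wildcard stamp is enough to make
the sorted-set insert throw. The sketch says nothing about how such a block
ended up in your invalidation queue.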


-Todd

On Mon, Feb 14, 2011 at 5:49 AM, Jameson Li <hovlj...@gmail.com> wrote:

> Hi,
>
> My Hadoop version is based on the Hadoop 0.20.2 release, patched with
> HADOOP-4675, 5745, MAPREDUCE-1070, 551, 1089 (to support
> ganglia31, fairscheduler preemption, HDFS append), and with
> HADOOP-6099, HDFS-278, Patches-from-Dhruba-Borthakur, HDFS-200 (to support
> scribe).
>
> Last Friday I found that the clocks on some of my test Hadoop cluster
> nodes were wrong: they were some number of hours ahead of the correct
> time.
> So I ran the following command and added it as a crontab job:
> /usr/bin/rdate -s time-b.nist.gov
>
> After that, when I restarted the namenode, my Hadoop cluster namenode
> crashed.
> I don't know whether this is related to changing the time.
>
> The error log:
> 2011-02-12 18:44:46,603 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Total number of
> blocks = 196
> 2011-02-12 18:44:46,603 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of invalid
> blocks = 0
> 2011-02-12 18:44:46,603 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of
> under-replicated blocks = 29
> 2011-02-12 18:44:46,603 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of
> over-replicated blocks = 41
> 2011-02-12 18:44:46,603 INFO org.apache.hadoop.hdfs.StateChange:
> STATE* Leaving safe mode after 69 secs.
> 2011-02-12 18:44:46,603 INFO org.apache.hadoop.hdfs.StateChange:
> STATE* Safe mode is OFF.
> 2011-02-12 18:44:46,603 INFO org.apache.hadoop.hdfs.StateChange:
> STATE* Network topology has 1 racks and 5 datanodes
> 2011-02-12 18:44:46,603 INFO org.apache.hadoop.hdfs.StateChange:
> STATE* UnderReplicatedBlocks has 29 blocks
> 2011-02-12 18:44:46,886 INFO org.apache.hadoop.hdfs.StateChange:
> BLOCK* ask 192.168.1.14:50010 to replicate
> blk_-8806907658071633346_1750 to datanode(s) 192.168.1.83:50010
> 2011-02-12 18:44:46,887 INFO org.apache.hadoop.hdfs.StateChange:
> BLOCK* ask 192.168.1.83:50010 to replicate
> blk_-7689075547598626554_1800 to datanode(s) 192.168.1.10:50010
> 2011-02-12 18:44:46,887 INFO org.apache.hadoop.hdfs.StateChange:
> BLOCK* ask 192.168.1.84:50010 to replicate
> blk_-7587424527299099175_1717 to datanode(s) 192.168.1.10:50010
> 2011-02-12 18:44:46,887 INFO org.apache.hadoop.hdfs.StateChange:
> BLOCK* ask 192.168.1.84:50010 to replicate
> blk_-6925943363757944243_1909 to datanode(s) 192.168.1.13:50010
> 2011-02-12 18:44:46,888 INFO org.apache.hadoop.hdfs.StateChange:
> BLOCK* ask 192.168.1.14:50010 to replicate
> blk_-6835423500788375545_1928 to datanode(s) 192.168.1.10:50010
> 2011-02-12 18:44:46,888 INFO org.apache.hadoop.hdfs.StateChange:
> BLOCK* ask 192.168.1.83:50010 to replicate
> blk_-6477488774631498652_1742 to datanode(s) 192.168.1.84:50010
> 2011-02-12 18:44:46,889 WARN
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
> ReplicationMonitor thread received Runtime exception.
> java.lang.IllegalStateException: generationStamp (=1) ==
> GenerationStamp.WILDCARD_STAMP java.lang.IllegalStateException:
> generationStamp (=1) == GenerationStamp.WILDCARD_STAMP
>         at
> org.apache.hadoop.hdfs.protocol.Block.validateGenerationStamp(Block.java:148)
>         at org.apache.hadoop.hdfs.protocol.Block.compareTo(Block.java:156)
>         at org.apache.hadoop.hdfs.protocol.Block.compareTo(Block.java:30)
>         at java.util.TreeMap.put(TreeMap.java:545)
>         at java.util.TreeSet.add(TreeSet.java:238)
>         at
> org.apache.hadoop.hdfs.server.namenode.DatanodeDescriptor.addBlocksToBeInvalidated(DatanodeDescriptor.java:284)
>         at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.invalidateWorkForOneNode(FSNamesystem.java:2743)
>         at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.computeInvalidateWork(FSNamesystem.java:2419)
>         at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.computeDatanodeWork(FSNamesystem.java:2412)
>         at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem$ReplicationMonitor.run(FSNamesystem.java:2357)
>         at java.lang.Thread.run(Thread.java:619)
> 2011-02-12 18:44:46,892 INFO
> org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down NameNode at hadoop5/192.168.1.84
> ************************************************************/
>
>
> Thanks,
> Jameson
>



-- 
Todd Lipcon
Software Engineer, Cloudera