Further update on this:

The namenode version had become 0.18.2-dev, while the datanode version is 0.18.1.

I don't think I recompiled the code on the namenode at all; the version number 
just changed mysteriously. Anyway, I copied the Hadoop code from the datanodes 
to the namenode, and the problem seems to have gone away.
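
For anyone who hits the same error, here is a minimal sketch of how the build
versions could be pulled out of a datanode log line to confirm a mismatch. It
is not part of Hadoop; the names BV_PATTERN, parse_build_versions and describe
are made up for illustration, and it assumes the message sits on a single
line, as it does in the log file itself (the mail client wrapped it in the
quoted message below).

```python
import re

# Assumed message format, taken from the FATAL line in the quoted datanode log.
# The namenode value may be empty, as it was in this case.
BV_PATTERN = re.compile(
    r"Incompatible build versions:\s*"
    r"namenode BV = (?P<nn>[^;\s]*)\s*;\s*"
    r"datanode BV = (?P<dn>\S*)"
)


def parse_build_versions(log_line):
    """Return (namenode_bv, datanode_bv), or None if the line does not match."""
    match = BV_PATTERN.search(log_line)
    if match is None:
        return None
    return match.group("nn"), match.group("dn")


def describe(log_line):
    """Summarize the mismatch reported by one log line (hypothetical helper)."""
    parsed = parse_build_versions(log_line)
    if parsed is None:
        return "no build-version mismatch reported"
    namenode_bv, datanode_bv = parsed
    if namenode_bv == "":
        return "namenode reports an empty build version; datanode is at " + datanode_bv
    return "namenode build " + namenode_bv + " differs from datanode build " + datanode_bv
```

An empty namenode value, as in my log, suggests the namenode jar was not the
same build as the one on the datanodes, which matches what I found.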

So basically my problem is fixed. I just hope my experience helps find some 
potential bugs.

Thanks,
-Songting


--- On Mon, 10/27/08, Songting Chen <[EMAIL PROTECTED]> wrote:

> From: Songting Chen <[EMAIL PROTECTED]>
> Subject: namenode failure
> To: core-user@hadoop.apache.org
> Date: Monday, October 27, 2008, 5:36 PM
> Hi,
>   I modified the classpath in hadoop-env.sh on the namenode and
> datanodes before shutting down the cluster. Then a problem
> appeared: I could not stop the Hadoop cluster at all.
> stop-all.sh reported no datanode/namenode, while all the Java
> processes were still running.
>   So I manually killed the Java processes. Now the namenode
> seems to be corrupted and always stays in safe mode, while
> the datanodes report the following weird error:
> 
> 2008-10-27 17:28:44,141 FATAL org.apache.hadoop.dfs.DataNode: Incompatible build versions: namenode BV = ; datanode BV = 694836
> 2008-10-27 17:28:44,244 ERROR org.apache.hadoop.dfs.DataNode: java.io.IOException: Incompatible build versions: namenode BV = ; datanode BV = 694836
>         at org.apache.hadoop.dfs.DataNode.handshake(DataNode.java:403)
>         at org.apache.hadoop.dfs.DataNode.startDataNode(DataNode.java:250)
>         at org.apache.hadoop.dfs.DataNode.<init>(DataNode.java:190)
>         at org.apache.hadoop.dfs.DataNode.makeInstance(DataNode.java:2987)
>         at org.apache.hadoop.dfs.DataNode.instantiateDataNode(DataNode.java:2942)
>         at org.apache.hadoop.dfs.DataNode.createDataNode(DataNode.java:2950)
>         at org.apache.hadoop.dfs.DataNode.main(DataNode.java:3072)
> 
>   My question is how to recover from such a failure. I
> guess the correct practice for changing the CLASSPATH is to
> shut down the cluster, apply the change, and then restart the
> cluster.
> 
> Thanks,
> -Songting
