[ https://issues.apache.org/jira/browse/HDFS-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12767980#action_12767980 ]

Ashutosh Chauhan commented on HDFS-107:
---------------------------------------

I saw this issue on our small 6-node cluster too, and it took a while to 
identify the root cause. The symptoms were the same as described here. In our 
case both 0.18 and 0.20 are installed on the cluster, but we only run 0.20. A 
user saw the HDFS exception for their job, so they stopped 0.20, thought of 
going back to 0.18 and tried to start it, and then switched back to 0.20 
again. In doing all this, the version files of the datanodes and the namenode 
got out of sync, so the DNs and the NN ended up with different sets of 
information in their version files. Apart from this peculiar use case, as 
things currently stand in HDFS, I think even one small misstep while upgrading 
the cluster can result in this bug, as reported in previous comments. I think 
that at cluster startup the namenode and datanodes should also exchange the 
information contained in their version files and, in case of a mismatch, 
reconcile the differences, asking for the user's input when a choice is not 
safe to make automatically.
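
To make that suggestion concrete, here is a minimal sketch in plain Java of 
the kind of check the two nodes could run during the startup handshake. This 
is not actual HDFS code; all class, field, and method names are illustrative, 
as are the namespaceID values. The idea is simply to compare the fields a 
data-node keeps in its VERSION file with the values the name-node reports, 
and to list every mismatch rather than failing on the first one.

import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only, not actual HDFS code.
public class VersionReconciler {

    /** Subset of the fields stored in a storage directory's VERSION file. */
    static final class StorageInfo {
        final int layoutVersion;
        final int namespaceID;
        final long cTime;

        StorageInfo(int layoutVersion, int namespaceID, long cTime) {
            this.layoutVersion = layoutVersion;
            this.namespaceID = namespaceID;
            this.cTime = cTime;
        }
    }

    /** Lists the differences between the data-node's and the name-node's info. */
    static List<String> findMismatches(StorageInfo dn, StorageInfo nn) {
        List<String> problems = new ArrayList<String>();
        if (dn.namespaceID != nn.namespaceID) {
            problems.add("namespaceID: datanode=" + dn.namespaceID
                    + " vs namenode=" + nn.namespaceID);
        }
        if (dn.layoutVersion != nn.layoutVersion) {
            problems.add("layoutVersion: datanode=" + dn.layoutVersion
                    + " vs namenode=" + nn.layoutVersion);
        }
        if (dn.cTime != nn.cTime) {
            problems.add("cTime: datanode=" + dn.cTime
                    + " vs namenode=" + nn.cTime);
        }
        return problems;
    }

    public static void main(String[] args) {
        // A real data-node would read its values from
        // ${dfs.data.dir}/current/VERSION and receive the name-node's values
        // during registration; these numbers are made up for the example.
        StorageInfo dn = new StorageInfo(-18, 463031076, 0L);
        StorageInfo nn = new StorageInfo(-18, 1092721197, 0L);
        for (String problem : findMismatches(dn, nn)) {
            System.out.println("VERSION mismatch -> " + problem);
        }
    }
}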

There are a few workarounds suggested in previous comments. Which one of these 
is the recommended one?


> Data-nodes should be formatted when the name-node is formatted.
> ---------------------------------------------------------------
>
>                 Key: HDFS-107
>                 URL: https://issues.apache.org/jira/browse/HDFS-107
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Konstantin Shvachko
>
> The upgrade feature HADOOP-702 requires data-nodes to persistently store the 
> namespaceID in their version files and to verify during startup that it 
> matches the one stored on the name-node.
> When the name-node is reformatted it generates a new namespaceID.
> Now if the cluster starts with a reformatted name-node and non-reformatted 
> data-nodes, the data-nodes will fail with
> java.io.IOException: Incompatible namespaceIDs ...
> Data-nodes should be reformatted whenever the name-node is. I see 2 
> approaches here:
> 1) In order to reformat the cluster we call "start-dfs -format" or make a 
> special script "format-dfs". This would format the cluster components all 
> together. The question is whether it should then start the cluster after 
> formatting.
> 2) Format the name-node only. When data-nodes connect to the name-node it 
> will tell them to format their storage directories if it sees that the 
> namespace is empty and its cTime=0. The drawback of this approach is that 
> we can lose the blocks of a data-node from another cluster if it connects 
> to the empty name-node by mistake.
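
As a side note on approach 2 in the quoted description: below is a minimal 
sketch of the decision rule it implies, again in plain Java and again purely 
illustrative rather than actual HDFS code. It also makes the stated drawback 
visible: any data-node whose namespaceID differs from that of a freshly 
formatted, empty name-node would be told to wipe its blocks, including a 
data-node from another cluster that connects by mistake.

// Illustrative sketch of approach 2 only, not actual HDFS code.
public class FormatOnConnectSketch {

    /** Decides whether the name-node may ask a connecting data-node to reformat. */
    static boolean shouldAskDatanodeToFormat(int datanodeNamespaceID,
                                             int namenodeNamespaceID,
                                             boolean namenodeNamespaceIsEmpty,
                                             long namenodeCTime) {
        if (datanodeNamespaceID == namenodeNamespaceID) {
            return false; // IDs already agree, nothing to reconcile.
        }
        // Only a freshly formatted, never-upgraded name-node qualifies.
        return namenodeNamespaceIsEmpty && namenodeCTime == 0L;
    }

    public static void main(String[] args) {
        // Made-up namespaceIDs for illustration.
        System.out.println(shouldAskDatanodeToFormat(463031076, 1092721197, true, 0L));  // true
        System.out.println(shouldAskDatanodeToFormat(463031076, 1092721197, false, 5L)); // false
    }
}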

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
