[ 
https://issues.apache.org/jira/browse/HDFS-4521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13687885#comment-13687885
 ] 

Junping Du commented on HDFS-4521:
----------------------------------

I have attached a patch for branch-1. The major differences are:
No changes to TableMapping, since it does not exist in branch-1; the same goes 
for the related tests, such as TestSwitchMapping, TestTableMapping, etc.
The changes previously made to DatanodeManager are now backported to 
FSNamesystem.
Some special changes in MiniDFSCluster, including:
 - handle the exception thrown when registering a DN with the NN fails because 
of a faulty topology
 - adjust the order of dataNodes.add(...) and 
DataNode.runDatanodeDaemon(...) in startDataNodes() so that a node that fails 
to start is still tracked by MiniDFSCluster, which is consistent with the 
later DN restart logic
 - update the shouldWait() method to handle the case where DN registration 
fails.
Colin and ATM, could you help review it? Chuan, it would be great if you could 
verify whether this patch fixes the problem you hit in branch-1. Thanks!
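
The ordering change in startDataNodes() described above can be illustrated 
with a minimal sketch. It is not the code in the patch: Node, Cluster, 
addAndStart() and restart() are hypothetical names invented here, and the 
IOException only stands in for the registration failure caused by a faulty 
topology. The point is that the node is added to the tracked list before it is 
started, so a later restart can still find it.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class TrackBeforeStartSketch {
    /** Hypothetical stand-in for a DataNode whose registration may be rejected. */
    static class Node {
        boolean topologyValid;
        boolean running;

        Node(boolean topologyValid) {
            this.topologyValid = topologyValid;
        }

        void start() throws IOException {
            if (!topologyValid) {
                throw new IOException("registration rejected: invalid topology");
            }
            running = true;
        }
    }

    /** Hypothetical stand-in for the test cluster that tracks its nodes. */
    static class Cluster {
        final List<Node> nodes = new ArrayList<>();

        /** Track the node first, then try to start it. */
        boolean addAndStart(Node node) {
            nodes.add(node); // tracked even if start() fails below
            try {
                node.start();
                return true;
            } catch (IOException e) {
                return false; // node stays in the list, just not running
            }
        }

        /** Restart by index; only possible because failed nodes are tracked. */
        boolean restart(int index) {
            try {
                nodes.get(index).start();
                return true;
            } catch (IOException e) {
                return false;
            }
        }
    }

    public static void main(String[] args) {
        Cluster cluster = new Cluster();
        Node dn = new Node(false);
        System.out.println("first start ok: " + cluster.addAndStart(dn));
        dn.topologyValid = true; // the topology has been corrected
        System.out.println("restart ok: " + cluster.restart(0));
    }
}
```

If the node were added only after a successful start, the restart call would 
have nothing to look up for a node that failed on its first attempt.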
                
> invalid network topologies should not be cached
> -----------------------------------------------
>
>                 Key: HDFS-4521
>                 URL: https://issues.apache.org/jira/browse/HDFS-4521
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 2.1.0-beta, 1.3.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>            Priority: Minor
>             Fix For: 2.1.0-beta
>
>         Attachments: HDFS-4521.001.patch, HDFS-4521.002.patch, 
> HDFS-4521.005.patch, HDFS-4521.006.patch, HDFS-4521.008.patch, 
> HDFS-4521-branch1.patch
>
>
> When the network topology is invalid, the DataNode refuses to start with a 
> message such as this:
> {quote}
> org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.registerDatanode from 
> 172.29.122.23:55886: error:
> org.apache.hadoop.net.NetworkTopology$InvalidTopologyException: Invalid 
> network topology. You cannot have a rack and a non-rack node at the same 
> level of the network topology.
> {quote}
> This is expected if you specify a topology file or script which puts leaf 
> nodes at two different depths.  However, one problem we have now is that this 
> incorrect topology is cached forever.  Once the NameNode sees it, this 
> DataNode can never be added to the cluster, since this exception will be 
> rethrown each time.  The NameNode will not check to see if the topology file 
> or script has changed.  We should clear the topology mappings when there is 
> an InvalidTopologyException, to prevent this problem.
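
The behavior proposed in the description can be sketched roughly as follows. 
This is an illustration under stated assumptions, not Hadoop's implementation: 
CachedMapping, Topology and register() are hypothetical names, the resolver 
function stands in for the topology file or script, and the local 
InvalidTopologyException only mimics NetworkTopology.InvalidTopologyException. 
The sketch clears the cached mappings when the topology rejects a node, so the 
next registration attempt re-reads the (possibly corrected) source.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class TopologyCacheSketch {
    /** Hypothetical stand-in for NetworkTopology.InvalidTopologyException. */
    static class InvalidTopologyException extends RuntimeException {
        InvalidTopologyException(String msg) {
            super(msg);
        }
    }

    /** Caches host -> rack lookups; the resolver stands in for the file or script. */
    static class CachedMapping {
        private final Map<String, String> cache = new HashMap<>();
        private final Function<String, String> resolver;

        CachedMapping(Function<String, String> resolver) {
            this.resolver = resolver;
        }

        String resolve(String host) {
            return cache.computeIfAbsent(host, resolver);
        }

        void clearCache() {
            cache.clear();
        }
    }

    /** Minimal topology that requires all leaf nodes to sit at the same depth. */
    static class Topology {
        private int leafDepth = -1;

        void add(String rackPath) {
            int depth = (int) rackPath.chars().filter(c -> c == '/').count();
            if (leafDepth == -1) {
                leafDepth = depth; // the first node fixes the expected depth
            } else if (depth != leafDepth) {
                throw new InvalidTopologyException("Invalid network topology. "
                    + "You cannot have a rack and a non-rack node at the same "
                    + "level of the network topology.");
            }
        }
    }

    /** Registers a host; on an invalid topology, drops the cached mappings. */
    static void register(CachedMapping mapping, Topology topology, String host) {
        String rack = mapping.resolve(host);
        try {
            topology.add(rack);
        } catch (InvalidTopologyException e) {
            mapping.clearCache(); // do not keep the bad entry around
            throw e;
        }
    }

    public static void main(String[] args) {
        Map<String, String> table = new HashMap<>();
        table.put("dn1", "/rack1");
        table.put("dn2", "/dc1/rack2"); // wrong depth
        CachedMapping mapping = new CachedMapping(table::get);
        Topology topology = new Topology();
        register(mapping, topology, "dn1");
        try {
            register(mapping, topology, "dn2");
        } catch (InvalidTopologyException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        table.put("dn2", "/rack2"); // the administrator fixes the mapping
        register(mapping, topology, "dn2");
        System.out.println("dn2 registered at " + mapping.resolve("dn2"));
    }
}
```

Without the clearCache() call, the second attempt for dn2 would read the stale 
"/dc1/rack2" entry from the cache and fail again, which is the problem the 
description reports.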

