[ https://issues.apache.org/jira/browse/HDFS-10320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ming Ma updated HDFS-10320:
---------------------------
    Resolution: Fixed
 Fix Version/s: 2.8.0
        Status: Resolved  (was: Patch Available)

[~xiaochen], thanks for the contribution. I have committed the patch to trunk, branch-2 and branch-2.8.

> Rack failures may result in NN terminate
> ----------------------------------------
>
>                 Key: HDFS-10320
>                 URL: https://issues.apache.org/jira/browse/HDFS-10320
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: Xiao Chen
>            Assignee: Xiao Chen
>             Fix For: 2.8.0
>
>         Attachments: HDFS-10320.01.patch, HDFS-10320.02.patch, HDFS-10320.03.patch,
>                      HDFS-10320.04.patch, HDFS-10320.05.patch, HDFS-10320.06.patch
>
>
> If rack failures leave only 1 rack available, {{BlockPlacementPolicyDefault#chooseRandom}} may get an {{InvalidTopologyException}} when calling {{NetworkTopology#chooseRandom}}. The exception propagates all the way out to {{BlockManager}}'s {{ReplicationMonitor}} thread and terminates the NN.
> Log:
> {noformat}
> 2016-02-24 09:22:01,514 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 1 to reach 3 (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true) For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
> 2016-02-24 09:22:01,958 ERROR org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: ReplicationMonitor thread received Runtime exception.
> org.apache.hadoop.net.NetworkTopology$InvalidTopologyException: Failed to find datanode (scope="" excludedScope="/rack_a5").
> 	at org.apache.hadoop.net.NetworkTopology.chooseRandom(NetworkTopology.java:729)
> 	at org.apache.hadoop.net.NetworkTopology.chooseRandom(NetworkTopology.java:694)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:635)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRemoteRack(BlockPlacementPolicyDefault.java:580)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:348)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:214)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:111)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationWork.chooseTargets(BlockManager.java:3746)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationWork.access$200(BlockManager.java:3711)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWorkForBlocks(BlockManager.java:1400)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWork(BlockManager.java:1306)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:3682)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:3634)
> 	at java.lang.Thread.run(Thread.java:745)
> {noformat}
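The failure mode in the description can be illustrated with a toy model. Once every surviving datanode sits on the rack being excluded, a "pick a node outside this rack" lookup has no candidates. If that condition surfaces as an unchecked exception, it can escape to a long-running monitor thread. The Java below is a minimal, hypothetical sketch and is not the code from the HDFS-10320 patches. The class `RackPlacementSketch`, its methods, and the two nested exception classes are illustrative stand-ins that are only named after the Hadoop types in the stack trace. The sketch shows one possible guard: translating the runtime exception into a checked "not enough replicas" condition that the caller can log and retry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

/**
 * Hypothetical, simplified model of rack-aware replica placement.
 * Not Hadoop's NetworkTopology or BlockPlacementPolicyDefault.
 */
public class RackPlacementSketch {

    /** Illustrative stand-in for NetworkTopology$InvalidTopologyException (unchecked). */
    static class InvalidTopologyException extends RuntimeException {
        InvalidTopologyException(String msg) {
            super(msg);
        }
    }

    /** Illustrative checked exception meaning "could not place this replica right now". */
    static class NotEnoughReplicasException extends Exception {
        NotEnoughReplicasException(String msg) {
            super(msg);
        }
    }

    // rack name -> live datanodes on that rack
    private final Map<String, List<String>> racks = new TreeMap<>();
    // fixed seed so runs are repeatable
    private final Random random = new Random(42);

    void addNode(String rack, String node) {
        racks.computeIfAbsent(rack, r -> new ArrayList<>()).add(node);
    }

    /** Picks a random node on any rack except excludedRack; throws if none is left. */
    String chooseRandom(String excludedRack) {
        List<String> candidates = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : racks.entrySet()) {
            if (!e.getKey().equals(excludedRack)) {
                candidates.addAll(e.getValue());
            }
        }
        if (candidates.isEmpty()) {
            throw new InvalidTopologyException(
                "Failed to find datanode (scope=\"\" excludedScope=\"" + excludedRack + "\").");
        }
        return candidates.get(random.nextInt(candidates.size()));
    }

    /** Guarded caller: turns the unchecked exception into a checked, recoverable one. */
    String chooseRemoteRack(String localRack) throws NotEnoughReplicasException {
        try {
            return chooseRandom(localRack);
        } catch (InvalidTopologyException e) {
            throw new NotEnoughReplicasException(e.getMessage());
        }
    }

    public static void main(String[] args) {
        RackPlacementSketch topo = new RackPlacementSketch();
        // Hypothetical cluster state: only /rack_a5 survived the rack failures.
        topo.addNode("/rack_a5", "dn1");
        topo.addNode("/rack_a5", "dn2");
        try {
            topo.chooseRemoteRack("/rack_a5");
        } catch (NotEnoughReplicasException e) {
            // A monitor loop can log this and retry on a later pass instead of dying.
            System.out.println("skipping replica for now: " + e.getMessage());
        }
    }
}
```

The design point is where the exception is caught. An unchecked exception thrown from deep inside target selection gives the monitor thread no chance to distinguish "this block cannot be placed yet" from a real bug. A checked exception forces the caller to handle the single-rack case explicitly.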