[ https://issues.apache.org/jira/browse/HADOOP-15317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16419805#comment-16419805 ]
Xiao Chen edited comment on HADOOP-15317 at 3/29/18 9:40 PM:
-------------------------------------------------------------

That was intentional, as that's the number of nodes under the specified scope, a.k.a. {{innerNode.getNumOfLeaves()}}, possibly minus {{excludedScope}}'s {{getNumOfLeaves()}}. In [^HADOOP-15317.06.patch] I renamed it to {{totalInScopeNodes}} to make it more intuitive. The word 'available' is confusing, because it would need a definition and would collide with the last param, {{availableNodes}}. The Javadoc also has the explanation; LMK if you have an idea on how to improve that.

was (Author: xiaochen):
That was intentional, as that's the number of nodes under the specified scope, a.k.a. {{innerNode.getNumOfLeaves()}}. In [^HADOOP-15317.06.patch] I renamed it to {{totalInScopeNodes}} to make it more intuitive. The word 'available' is confusing, because it would need a definition and would collide with the last param, {{availableNodes}}. The Javadoc also has the explanation; LMK if you have an idea on how to improve that.

> Improve NetworkTopology chooseRandom's loop
> -------------------------------------------
>
>                 Key: HADOOP-15317
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15317
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: Xiao Chen
>            Assignee: Xiao Chen
>            Priority: Major
>         Attachments: HADOOP-15317.01.patch, HADOOP-15317.02.patch,
> HADOOP-15317.03.patch, HADOOP-15317.04.patch, HADOOP-15317.05.patch,
> HADOOP-15317.06.patch, Screen Shot 2018-03-28 at 7.23.32 PM.png
>
>
> Recently we found a postmortem case where the ANN seems to be in an infinite
> loop. From the logs it seems it had just gone through a rolling restart, and
> DNs were getting registered.
> Later the NN became unresponsive, and from the stacktrace it's inside a
> do-while loop inside {{NetworkTopology#chooseRandom}} - part of what's done
> in HDFS-10320.
> Going through the code and logs I'm not able to come up with any theory
> (thought about incorrect locking, or the Node object being modified outside
> of NetworkTopology, both seem impossible) why this is happening, but we
> should eliminate this loop.
> stacktrace:
> {noformat}
> Stack:
> java.util.HashMap.hash(HashMap.java:338)
> java.util.HashMap.containsKey(HashMap.java:595)
> java.util.HashSet.contains(HashSet.java:203)
> org.apache.hadoop.net.NetworkTopology.chooseRandom(NetworkTopology.java:786)
> org.apache.hadoop.net.NetworkTopology.chooseRandom(NetworkTopology.java:732)
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseDataNode(BlockPlacementPolicyDefault.java:757)
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:692)
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:666)
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseLocalRack(BlockPlacementPolicyDefault.java:573)
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTargetInOrder(BlockPlacementPolicyDefault.java:461)
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:368)
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:243)
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:115)
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4AdditionalDatanode(BlockManager.java:1596)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalDatanode(FSNamesystem.java:3599)
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getAdditionalDatanode(NameNodeRpcServer.java:717)
> {noformat}
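
For illustration only, the in-scope count described in the comment above, and one way to pick a random node without an open-ended retry loop, can be sketched as follows. This is not the code from the attached patches. The class name BoundedChooseRandom, both method signatures, and the use of plain strings in place of Node objects are assumptions made for the example; only the name totalInScopeNodes and the "leaves in scope minus leaves in excluded scope" arithmetic come from the discussion.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Hypothetical sketch; not the NetworkTopology implementation.
final class BoundedChooseRandom {

    // Count of nodes under the requested scope, minus those under the
    // excluded sub-scope (mirrors getNumOfLeaves() arithmetic in the comment).
    static int totalInScopeNodes(int leavesInScope, int leavesInExcludedScope) {
        return leavesInScope - leavesInExcludedScope;
    }

    // Returns a random in-scope node that is not in excludedNodes, or null
    // if no such node exists. Candidates are filtered first, so a single
    // random draw is enough and the method always terminates.
    static String chooseRandom(List<String> inScope,
                               Set<String> excludedNodes,
                               Random rng) {
        List<String> candidates = new ArrayList<>();
        for (String node : inScope) {
            if (!excludedNodes.contains(node)) {
                candidates.add(node);
            }
        }
        if (candidates.isEmpty()) {
            return null; // nothing left to choose from
        }
        return candidates.get(rng.nextInt(candidates.size()));
    }
}
```

The filter-then-draw approach costs one pass over the in-scope nodes per call, which trades some speed for a guaranteed exit; whether that trade-off suits a NameNode hot path is a separate question that this sketch does not address.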