[ https://issues.apache.org/jira/browse/HDFS-14181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16731583#comment-16731583 ]

Ayush Saxena commented on HDFS-14181:
-------------------------------------

Thanx [~sihai] for the patch.

I too agree with [~elgoiri] that we should implement this the correct way.

I tried digging into the problem.

This is what I understood of the scenario. Let me know if I got it wrong 
somewhere.

Let the setup be like ::

(Cluster Map)
 1 2 3 4 5 6 7 8 9 10

1-8 is the scope

1-4 is the excluded scope

2 and 7 are the excluded nodes
 * *Present Scenario ::*

Available nodes (Present) --> 5,6,8,9,10 (5)

_{Cluster Map - Excluded Scope - Excluded nodes outside Excluded Scope}_

{ 10 - 4 - 1 = 5 }

Available nodes (Expected) ---> 5,6,8 (*3*)

_{Scope - Excluded Scope - Excluded nodes outside Excluded Scope but inside scope}_

{ 8 - 4 - 1 = 3 }
 * *Proposed Solution ::*

Part1 --> 1,3,4,5,6,8 (6)

_{Scope - Excluded Nodes in scope}_

Part2 --> 1,3,4 (3)

_{Excluded Scope - Excluded Nodes in Excluded Scope }_

*Solution Received* =

{ Part1 - Part2 }

6-3 = *3* (*Correct*)
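To make the arithmetic above concrete, here is a tiny standalone sketch (plain java.util sets with hypothetical names, nothing from NetworkTopology itself) that reproduces the counts: the present counting gives 5, while Part1 - Part2 gives the expected 3.
{code:java}
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ChooseRandomCounting {
  // Nodes 1..10 stand in for the ten leaves of the cluster map above.
  static Set<Integer> range(int from, int to) {
    return IntStream.rangeClosed(from, to).boxed()
        .collect(Collectors.toCollection(TreeSet::new));
  }

  public static void main(String[] args) {
    Set<Integer> clusterMap = range(1, 10);    // whole cluster
    Set<Integer> scope = range(1, 8);          // scope
    Set<Integer> excludedScope = range(1, 4);  // excluded scope
    Set<Integer> excludedNodes = Set.of(2, 7); // excluded nodes

    // Present: counts over "~" + excludedScope, i.e. the whole cluster map
    // minus the excluded scope and the remaining excluded node -> 5,6,8,9,10
    long present = clusterMap.stream()
        .filter(n -> !excludedScope.contains(n))
        .filter(n -> !excludedNodes.contains(n))
        .count();

    // Part1: scope minus excluded nodes in scope -> 1,3,4,5,6,8
    long part1 = scope.stream()
        .filter(n -> !excludedNodes.contains(n)).count();
    // Part2: excluded scope minus excluded nodes in it -> 1,3,4
    long part2 = excludedScope.stream()
        .filter(n -> !excludedNodes.contains(n)).count();

    System.out.println("present  = " + present);          // 5 (too many)
    System.out.println("expected = " + (part1 - part2));  // 3 (correct)
  }
}
{code}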

By my understanding the solution seems correct. :)
 Just one thought: if we expect this method to be called from somewhere else 
too, we could add a NULL check for scope in the else part for safety, since 
scope wasn't being used in that block before. Secondly, I am not very sure 
about taking the read lock.
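For illustration only, the shape I have in mind for that branch would be roughly as below. This is just a sketch written against the countNumOfAvailableNodes(String, Collection<Node>) call shown in the quoted source, not the actual patch:
{code:java}
// Sketch only, not the actual patch: count the available nodes inside
// scope rather than inside "~" + excludedScope, and fall back to the old
// counting if scope is unexpectedly null, since scope was not previously
// dereferenced in this branch.
final int availableNodes;
if (excludedScope == null) {
  availableNodes = countNumOfAvailableNodes(scope, excludedNodes);
} else if (scope == null) {
  // defensive NULL check: keep the previous behaviour
  availableNodes =
      countNumOfAvailableNodes("~" + excludedScope, excludedNodes);
} else {
  // Part1 - Part2: nodes under scope minus nodes under excludedScope
  availableNodes = countNumOfAvailableNodes(scope, excludedNodes)
      - countNumOfAvailableNodes(excludedScope, excludedNodes);
}
{code}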

> Suspect there is a bug in NetworkTopology.java chooseRandom function.
> ---------------------------------------------------------------------
>
>                 Key: HDFS-14181
>                 URL: https://issues.apache.org/jira/browse/HDFS-14181
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs, namenode
>    Affects Versions: 2.9.2
>            Reporter: Sihai Ke
>            Priority: Major
>         Attachments: 0001-add-UT-for-NetworkTopology.patch, 
> 0001-fix-NetworkTopology.java-chooseRandom-bug.patch, HDFS-14181.01.patch, 
> HDFS-14181.02.patch, image-2018-12-29-15-02-19-415.png
>
>
> While reading hadoop NetworkTopology.java, I suspect there is a bug in the 
> function chooseRandom (line 498, hadoop version 2.9.2-RC0).
> {color:#f79232}I think there is a bug in the code: counting over "~" + 
> excludedScope does not give the available nodes under the scope node. I also 
> added a unit test for this and got an exception.{color}
> The buggy code is in the else branch:
> {code:java}
>  if (excludedScope == null) {
>     availableNodes = countNumOfAvailableNodes(scope, excludedNodes);
>   } else {
>     availableNodes =
>         countNumOfAvailableNodes("~" + excludedScope, excludedNodes);
>   }{code}
> Source code:
> {code:java}
> protected Node chooseRandom(final String scope, String excludedScope,
>     final Collection<Node> excludedNodes) {
>   if (excludedScope != null) {
>     if (scope.startsWith(excludedScope)) {
>       return null;
>     }
>     if (!excludedScope.startsWith(scope)) {
>       excludedScope = null;
>     }
>   }
>   Node node = getNode(scope);
>   if (!(node instanceof InnerNode)) {
>     return excludedNodes != null && excludedNodes.contains(node) ?
>         null : node;
>   }
>   InnerNode innerNode = (InnerNode)node;
>   int numOfDatanodes = innerNode.getNumOfLeaves();
>   if (excludedScope == null) {
>     node = null;
>   } else {
>     node = getNode(excludedScope);
>     if (!(node instanceof InnerNode)) {
>       numOfDatanodes -= 1;
>     } else {
>       numOfDatanodes -= ((InnerNode)node).getNumOfLeaves();
>     }
>   }
>   if (numOfDatanodes <= 0) {
>     LOG.debug("Failed to find datanode (scope=\"{}\" excludedScope=\"{}\")."
>             + " numOfDatanodes={}",
>         scope, excludedScope, numOfDatanodes);
>     return null;
>   }
>   final int availableNodes;
>   if (excludedScope == null) {
>     availableNodes = countNumOfAvailableNodes(scope, excludedNodes);
>   } else {
>     availableNodes =
>         countNumOfAvailableNodes("~" + excludedScope, excludedNodes);
>   }
>   LOG.debug("Choosing random from {} available nodes on node {},"
>       + " scope={}, excludedScope={}, excludeNodes={}. numOfDatanodes={}.",
>       availableNodes, innerNode, scope, excludedScope, excludedNodes,
>       numOfDatanodes);
>   Node ret = null;
>   if (availableNodes > 0) {
>     ret = chooseRandom(innerNode, node, excludedNodes, numOfDatanodes,
>         availableNodes);
>   }
>   LOG.debug("chooseRandom returning {}", ret);
>   return ret;
> }
> {code}
>  
>  
> I added a unit test in TestClusterTopology.java, but got an exception.
>  
> {code:java}
> @Test
> public void testChooseRandom1() {
>   // create the topology
>   NetworkTopology cluster = NetworkTopology.getInstance(new Configuration());
>   NodeElement node1 = getNewNode("node1", "/a1/b1/c1");
>   cluster.add(node1);
>   NodeElement node2 = getNewNode("node2", "/a1/b1/c1");
>   cluster.add(node2);
>   NodeElement node3 = getNewNode("node3", "/a1/b1/c2");
>   cluster.add(node3);
>   NodeElement node4 = getNewNode("node4", "/a1/b2/c3");
>   cluster.add(node4);
>   Node node = cluster.chooseRandom("/a1/b1", "/a1/b1/c1", null);
>   assertSame(node.getName(), "node3");
> }
> {code}
>  
> Exception:
> {code:java}
> java.lang.IllegalArgumentException: 1 should >= 2, and both should be positive.
> at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
> at org.apache.hadoop.net.NetworkTopology.chooseRandom(NetworkTopology.java:567)
> at org.apache.hadoop.net.NetworkTopology.chooseRandom(NetworkTopology.java:544)
> at org.apache.hadoop.net.TestClusterTopology.testChooseRandom1(TestClusterTopology.java:198)
> {code}
>  
> {color:#f79232}!image-2018-12-29-15-02-19-415.png!{color}
>  
>  
> [~vagarychen] this change was introduced in HDFS-11577, could you help to 
> check whether this is a bug?
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
