sodonnel commented on a change in pull request #2591:
URL: https://github.com/apache/ozone/pull/2591#discussion_r697977474



##########
File path: 
hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/TestSCMContainerPlacementRackAware.java
##########
@@ -524,4 +527,28 @@ public void 
testvalidateContainerPlacementSingleRackCluster() {
     assertTrue(stat.isPolicySatisfied());
     assertEquals(0, stat.misReplicationCount());
   }
+
+  @Test
+  public void testOutOfServiceNodesNotSelected() {
+    // Set all the nodes to out of service
+    for (DatanodeInfo dn : dnInfos) {
+      dn.setNodeStatus(new NodeStatus(DECOMMISSIONED, HEALTHY));
+    }
+
+    for (int i=0; i<10; i++) {
+      // Set a random DN to in_service and ensure it is always picked
+      int index = new Random().nextInt(dnInfos.size());
+      dnInfos.get(index).setNodeStatus(NodeStatus.inServiceHealthy());
+      try {
+        List<DatanodeDetails> datanodeDetails =
+            policy.chooseDatanodes(null, null, 1, 0, 0);
+        Assert.assertEquals(dnInfos.get(index), datanodeDetails.get(0));
+      } catch (SCMException e) {
+        // If we get SCMException: No satisfied datanode to meet the ... this 
is

Review comment:
       The retry count in the provider is hard coded at the moment, and this is 
not a realistic real-world scenario. If there are only a few healthy nodes out 
of 100's in a cluster, then the cluster has much bigger problems. If we want to 
change the behaviour of the provider, then we should discuss that in a new Jira.
   
   The contact of the provider is to provide the number of requested nodes, or 
throw the SCM Exception. So either we get the single IN_SERVICE node, or we get 
the exception. The key is that we never get the out_of_service node, which is 
the defect the original change was fixing. If you removed the fix for the 
original issue, this test would always fail as it is most likely to find the 
out_of_service node each time, and over 10 tries in the loop it would almost 
certainly find an out_of_service node and fail.
   
   I cannot think of a better way to test this without changing the provider 
code (which I don't want to do in this Jira), but if you have a better idea 
that will never be flaky I am happy to use it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to