[ 
https://issues.apache.org/jira/browse/HDDS-1637?focusedWorklogId=253790&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-253790
 ]

ASF GitHub Bot logged work on HDDS-1637:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 04/Jun/19 14:57
            Start Date: 04/Jun/19 14:57
    Worklog Time Spent: 10m 
      Work Description: xiaoyuyao commented on pull request #904: HDDS-1637. 
Fix random test failure TestSCMContainerPlacementRackAware.
URL: https://github.com/apache/hadoop/pull/904#discussion_r290342177
 
 

 ##########
 File path: 
hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/TestSCMContainerPlacementRackAware.java
 ##########
 @@ -82,7 +82,7 @@ public void setup() {
     when(nodeManager.getNodeStat(anyObject()))
         .thenReturn(new SCMNodeMetric(STORAGE_CAPACITY, 0L, 100L));
     when(nodeManager.getNodeStat(datanodes.get(2)))
-        .thenReturn(new SCMNodeMetric(STORAGE_CAPACITY, 90L, 10L));
+        .thenReturn(new SCMNodeMetric(STORAGE_CAPACITY, 90L, 20L));
 
 Review comment:
   Thanks @ChenSammi for the patch. I had thought about a similar test-only fix 
yesterday. 
   
   But this may hide a code bug when we deal with a mixed set of nodes where 
some have enough space while others don't. If chooseNodes() keeps choosing nodes 
without enough capacity 3 times (the default), we end up with failures even 
though other nodes have enough space. Here are two proposals:
   
   1. Filter nodes without enough capacity out of the candidate node list, so 
that the placement algorithm won't need to deal with them.
   
   2. Handle capacity in the placement algorithm by adding any node detected 
without enough capacity to the exclude node list, so that the algorithm won't 
choose it again in the next round. 
   
   I prefer 1 because it seems a cleaner and more efficient approach than 2, 
but I'm OK with 2 as well if you feel it is easier to adjust the placement 
algorithm. A rough sketch of 1 is included below. 
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 253790)
    Time Spent: 0.5h  (was: 20m)

> Fix random test failure TestSCMContainerPlacementRackAware
> ----------------------------------------------------------
>
>                 Key: HDDS-1637
>                 URL: https://issues.apache.org/jira/browse/HDDS-1637
>             Project: Hadoop Distributed Data Store
>          Issue Type: Sub-task
>            Reporter: Xiaoyu Yao
>            Assignee: Xiaoyu Yao
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> This has been seen randomly in latest trunk CI, e.g., 
> [https://ci.anzix.net/job/ozone/16980/testReport/org.apache.hadoop.hdds.scm.container.placement.algorithms/TestSCMContainerPlacementRackAware/testFallback/]
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
