[ 
https://issues.apache.org/jira/browse/HDDS-2214?focusedWorklogId=321157&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-321157
 ]

ASF GitHub Bot logged work on HDDS-2214:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 01/Oct/19 09:56
            Start Date: 01/Oct/19 09:56
    Worklog Time Spent: 10m 
      Work Description: elek commented on pull request #1560: HDDS-2214. 
TestSCMContainerPlacementRackAware has an intermittent failure
URL: https://github.com/apache/hadoop/pull/1560
 
 
   For example from the nightly build:
   {code:java}
     <testcase name="testNoFallback[8]" classname="org.apache.hadoop.hdds.scm.container.placement.algorithms.TestSCMContainerPlacementRackAware" time="0.014">
       <failure type="java.lang.AssertionError">java.lang.AssertionError
         at org.junit.Assert.fail(Assert.java:86)
         at org.junit.Assert.assertTrue(Assert.java:41)
         at org.junit.Assert.assertTrue(Assert.java:52)
         at org.apache.hadoop.hdds.scm.container.placement.algorithms.TestSCMContainerPlacementRackAware.testNoFallback(TestSCMContainerPlacementRackAware.java:276)
         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
         at java.lang.reflect.Method.invoke(Method.java:498)
         at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
    {code}
   The problem is in testNoFallback:
   
   Let's say we have 11 nodes (from the test parameter) and we would like to 
choose 5 nodes (hard-coded in the test).
   
   As the first two replicas are chosen from the same rack and all the others 
from different racks, this is not possible with only three racks, so we expect 
a failure.
   
   But the test also asserts that the success count is at least 3. This is 
true only if the first two replicas are placed on rack1 (5 nodes) or rack2 
(5 nodes). If the first replica is placed on rack3 (one node), placement fails 
immediately:
   
   Lucky case, where the success count is 4 (>= 3):
   {code:java}
    rack1 -- node1 
    rack1 -- node2 -- FIRST replica
    rack1 -- node3 -- SECOND replica
    rack1 -- node4
    rack1 -- node5 
    rack2 -- node6
    rack2 -- node7 -- THIRD replica
    rack2 -- node8
    rack2 -- node9 
    rack2 -- node10
    rack3 -- node11 -- FOURTH replica{code}
    The failing case, where the success count is 1: the first replica lands on 
rack3, which has only one node, so the second replica cannot be placed on the 
same rack (this is when the test fails):
   {code:java}
    rack1 -- node1 
    rack1 -- node2
    rack1 -- node3
    rack1 -- node4
    rack1 -- node5 
    rack2 -- node6
    rack2 -- node7
    rack2 -- node8
    rack2 -- node9 
    rack2 -- node10
    rack3 -- node11 -- FIRST replica{code}
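
   The two layouts above can be reproduced with a small model. This is only a 
sketch under a simplifying assumption: replicas 1 and 2 must share a rack, every 
later replica needs a rack that has not been used yet, and there is no fallback. 
The class NoFallbackSketch and the method successCount are hypothetical names 
for illustration; they are not part of SCMContainerPlacementRackAware or its test.

```java
public class NoFallbackSketch {

    /**
     * Counts how many replicas can be placed under the simplified model:
     * replica 1 goes to firstRack, replica 2 must be on the same rack,
     * and each further replica needs a rack that is still unused.
     * Assumes rackSizes[firstRack] >= 1.
     */
    static int successCount(int[] rackSizes, int firstRack, int requested) {
        if (requested <= 0) {
            return 0;
        }
        int placed = 1; // the first replica always fits on its own rack
        if (requested == 1) {
            return placed;
        }
        if (rackSizes[firstRack] < 2) {
            return placed; // no second node on the same rack, no fallback
        }
        placed++;
        // Racks other than firstRack that still have at least one node.
        int unusedRacks = 0;
        for (int r = 0; r < rackSizes.length; r++) {
            if (r != firstRack && rackSizes[r] > 0) {
                unusedRacks++;
            }
        }
        placed += Math.min(requested - 2, unusedRacks);
        return placed;
    }

    public static void main(String[] args) {
        int[] racks = {5, 5, 1}; // rack1, rack2, rack3 as in the layouts above
        for (int first = 0; first < racks.length; first++) {
            System.out.println("first replica on rack" + (first + 1)
                + " -> success count " + successCount(racks, first, 5));
        }
    }
}
```

   Under this model, starting on rack1 or rack2 gives a success count of 4, 
while starting on rack3 gives 1, which is below the asserted minimum of 3.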


Issue Time Tracking
-------------------

            Worklog Id:     (was: 321157)
    Remaining Estimate: 0h
            Time Spent: 10m

> TestSCMContainerPlacementRackAware has an intermittent failure
> --------------------------------------------------------------
>
>                 Key: HDDS-2214
>                 URL: https://issues.apache.org/jira/browse/HDDS-2214
>             Project: Hadoop Distributed Data Store
>          Issue Type: Improvement
>            Reporter: Marton Elek
>            Assignee: Marton Elek
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>


