[ 
https://issues.apache.org/jira/browse/HDFS-16171?focusedWorklogId=637683&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637683
 ]

ASF GitHub Bot logged work on HDFS-16171:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 13/Aug/21 05:02
            Start Date: 13/Aug/21 05:02
    Worklog Time Spent: 10m 
      Work Description: virajjasani edited a comment on pull request #3280:
URL: https://github.com/apache/hadoop/pull/3280#issuecomment-897418170


   Thanks @ferhui for the review.
   
   > This PR title is different from HDFS-12188
   
   Updated the Jira title because testDecommissionStatus is present in both
   `TestDecommissioningStatus` and `TestDecommissioningStatusWithBackoffMonitor`,
   so naming just testDecommissionStatus covers the failures in both tests.
   
   > Can you explain why the test is flaky and how you fixed it?
   
   The number of under-replicated blocks on DN2 can be either 3 or 4, depending
   on which blocks are actually available in DataNode storage. To make sure that
   we have 4 under-replicated blocks once both DN1 and DN2 are decommissioned, we
   first need to wait for a total of 8 blocks (including replicas) to be reported
   by both DNs together. This is the additional check. With it in place, we no
   longer hit the flaky failure where 1 replica has not yet been reported when
   decommissioning starts, in which case we cannot assert that all 4 blocks are
   under-replicated.
   Hence, I have added this validation before we start decommissioning DN1.
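
   A minimal sketch of the "wait for all replicas to be reported, then start
   decommissioning" idea, not the code from the pull request. The class
   `BlockReportWait`, its `waitFor` helper, and the `AtomicInteger` standing in
   for the reported replica count are all hypothetical and exist only for
   illustration; a real Hadoop test would more likely poll the cluster state
   with `GenericTestUtils.waitFor`.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

// Hypothetical, self-contained illustration; not taken from the Hadoop code base.
class BlockReportWait {

  /**
   * Polls the condition until it holds or the timeout elapses.
   * Returns true if the condition was met, false on timeout or interrupt.
   */
  static boolean waitFor(BooleanSupplier check, long intervalMs, long timeoutMs) {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (true) {
      if (check.getAsBoolean()) {
        return true;
      }
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
  }

  public static void main(String[] args) throws Exception {
    // 8 blocks in total (including replicas) across DN1 and DN2, per the comment above.
    final int expectedReplicas = 8;
    // Stand-in for the cluster state: one replica has not been reported yet.
    AtomicInteger reported = new AtomicInteger(7);

    // Simulate the late block report arriving shortly after the test starts.
    Thread lateReport = new Thread(() -> {
      try {
        Thread.sleep(50);
      } catch (InterruptedException ignored) {
        return;
      }
      reported.incrementAndGet();
    });
    lateReport.start();

    // The additional check: do not proceed until every replica is reported.
    boolean ready = waitFor(() -> reported.get() >= expectedReplicas, 10, 5_000);
    lateReport.join();
    if (!ready) {
      throw new AssertionError("Timed out waiting for " + expectedReplicas + " replicas");
    }
    System.out.println("Replicas reported: " + reported.get()
        + "; safe to start decommissioning");
  }
}
```

   Without the wait, the assertion on the under-replicated count races against
   the last block report, which matches the "expected:<4> but was:<3>" failure.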
   
   > I see you added synchronized to some functions. Does it help to fix the
   > flaky problems?
   
   Good point, it doesn't solve the flaky problem as of now. I only kept it
   while running the 2 tests in parallel so that config setup was synchronized,
   but it is no longer required. I will remove it. Thanks.



Issue Time Tracking
-------------------

            Worklog Id:     (was: 637683)
    Remaining Estimate: 0h
            Time Spent: 10m

> testDecommissionStatus is flaky (for both TestDecommissioningStatus and 
> TestDecommissioningStatusWithBackoffMonitor)
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-16171
>                 URL: https://issues.apache.org/jira/browse/HDFS-16171
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Viraj Jasani
>            Assignee: Viraj Jasani
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> testDecommissionStatus keeps failing intermittently.
> {code:java}
> [ERROR] testDecommissionStatus(org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatusWithBackoffMonitor)  Time elapsed: 3.299 s  <<< FAILURE!
> java.lang.AssertionError: Unexpected num under-replicated blocks expected:<4> but was:<3>
>       at org.junit.Assert.fail(Assert.java:89)
>       at org.junit.Assert.failNotEquals(Assert.java:835)
>       at org.junit.Assert.assertEquals(Assert.java:647)
>       at org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus.checkDecommissionStatus(TestDecommissioningStatus.java:169)
>       at org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatusWithBackoffMonitor.testDecommissionStatus(TestDecommissioningStatusWithBackoffMonitor.java:136)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
