[jira] [Work logged] (HDFS-16213) Flaky test TestFsDatasetImpl#testDnRestartWithHardLink

ASF GitHub Bot (Jira) Sun, 05 Sep 2021 00:30:08 -0700


     [ https://issues.apache.org/jira/browse/HDFS-16213?focusedWorklogId=646682&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-646682 ]


ASF GitHub Bot logged work on HDFS-16213:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 05/Sep/21 07:29
            Start Date: 05/Sep/21 07:29
    Worklog Time Spent: 10m 
      Work Description: virajjasani opened a new pull request #3386:
URL: https://github.com/apache/hadoop/pull/3386


   ### Description of PR
   TestFsDatasetImpl#testDnRestartWithHardLink is flaky:
   ```
   [ERROR] testDnRestartWithHardLink(org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl)  Time elapsed: 7.768 s  <<< FAILURE!
   java.lang.AssertionError
        at org.junit.Assert.fail(Assert.java:87)
        at org.junit.Assert.assertTrue(Assert.java:42)
        at org.junit.Assert.assertTrue(Assert.java:53)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl.testDnRestartWithHardLink(TestFsDatasetImpl.java:1344)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
        at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
        at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
        at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
        at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
        at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.lang.Thread.run(Thread.java:748)
   ```
   
   ### How was this patch tested?
   Unit testing. The current flaky behaviour is easy to reproduce by running the test code twice as part of the same test.
   The resolution is to disable both the detection and the deletion of duplicate finalized replicas by the BlockPoolSlice instance.
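Repeating the test body inside a single test method gives a rare thread interleaving more chances to surface in one run. A minimal sketch of such a repetition helper follows; `RepeatToReproduce` and `countFailures` are hypothetical names, not part of Hadoop or JUnit, and any `Callable<Boolean>` wrapping the original test body could be passed in.

```java
import java.util.concurrent.Callable;

/**
 * Hypothetical helper (not part of Hadoop or JUnit): runs the same check
 * several times inside one test and counts how many runs failed.
 * Repeating the body in one JVM gives a rare interleaving more chances to occur.
 */
final class RepeatToReproduce {
  private RepeatToReproduce() {}

  /**
   * Runs {@code body} {@code runs} times. A run counts as a failure if it
   * returns false or throws an AssertionError (as a failed JUnit assert does).
   */
  static int countFailures(int runs, Callable<Boolean> body) throws Exception {
    int failures = 0;
    for (int i = 0; i < runs; i++) {
      try {
        if (!body.call()) {
          failures++;
        }
      } catch (AssertionError e) {
        failures++;
      }
    }
    return failures;
  }
}
```

For example, `countFailures(2, body)` runs the body twice, matching the "run it twice" reproduction; a non-zero result means the flaky interleaving was hit at least once.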
   
   When the Datanode comes up, the BPServiceActor handshakes with the Namenode and tries to initialize the block pool; in the process, it tries to get the VolumeMap using the BlockPoolSlice instance. While doing so, reading replicas from the cache fails, so the thread adds Finalized and RBW replicas to the addReplicaThreadPool fork-join pool in order to build the map. This process also tries to identify whether any duplicate replica exists. For this particular test, the process can sometimes detect a duplicate replica on /data2 while processing the finalized replica on /data1. Hence, before we can confirm that newReplicaInfo.getBlockURI() exists, the finalized replica on /data2 might get deleted (a rare and flaky case). Although it is unlikely that the thread identifying and deleting the duplicate finalized replica runs faster than the main thread, the race cannot be ruled out. Hence, we disable adding Finalized and RBW replicas to addReplicaThreadPool in BlockPoolSlice here and re-enable it only after we confirm the existence of newReplicaInfo on the "/data2" ARCHIVE storage.
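The ordering this fix relies on can be modelled in isolation. The following is a minimal sketch under stated assumptions, not the code in the PR: `AddReplicaGateSketch`, `setAddReplicasToPool`, `maybeResolveDuplicate` and `checkThenReEnable` are hypothetical names, two plain files stand in for the finalized replicas on /data1 and /data2, and a `ForkJoinPool` stands in for addReplicaThreadPool. While the switch is off, no task is handed to the pool, so nothing can delete the /data2 copy before the existence check; the switch is turned back on only after that check.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;

/**
 * Hypothetical model (names are illustrative, not BlockPoolSlice's real API):
 * a test-only switch stops replicas from being handed to the background pool,
 * so duplicate detection/deletion cannot run until the test has checked
 * that the replica it cares about still exists.
 */
final class AddReplicaGateSketch {
  // Test-only switch; volatile so pool threads observe updates promptly.
  private static volatile boolean addReplicasToPool = true;

  private AddReplicaGateSketch() {}

  static void setAddReplicasToPool(boolean enabled) {
    addReplicasToPool = enabled;
  }

  /**
   * Models the pool task that finds a duplicate finalized replica and deletes
   * one copy. Returns null when submission is disabled.
   */
  static ForkJoinTask<?> maybeResolveDuplicate(ForkJoinPool pool, Path kept, Path duplicate) {
    if (!addReplicasToPool) {
      return null; // nothing submitted, so nothing can delete concurrently
    }
    return pool.submit(() -> {
      try {
        if (Files.exists(kept)) {
          Files.deleteIfExists(duplicate); // the "duplicate" copy is removed
        }
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
    });
  }

  /** Mirrors the test order: disable, check existence, then re-enable. */
  static boolean checkThenReEnable(Path kept, Path duplicate) {
    ForkJoinPool pool = new ForkJoinPool(2);
    try {
      setAddReplicasToPool(false);
      ForkJoinTask<?> skipped = maybeResolveDuplicate(pool, kept, duplicate);
      boolean existed = skipped == null && Files.exists(duplicate);
      setAddReplicasToPool(true); // re-enable only after the check
      return existed;
    } finally {
      pool.shutdown();
    }
  }
}
```

Gating submission, rather than adding a sleep or a retry around the assertion, makes the check in this sketch deterministic: the race window is closed instead of merely narrowed.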


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

            Worklog Id:     (was: 646682)
    Remaining Estimate: 0h
            Time Spent: 10m

> Flaky test TestFsDatasetImpl#testDnRestartWithHardLink
> ------------------------------------------------------
>
>                 Key: HDFS-16213
>                 URL: https://issues.apache.org/jira/browse/HDFS-16213
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Viraj Jasani
>            Assignee: Viraj Jasani
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Failure case: 
> [here|https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3359/4/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt]
> {code:java}
> [ERROR] testDnRestartWithHardLink(org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl)  Time elapsed: 7.768 s  <<< FAILURE!
> java.lang.AssertionError
>     at org.junit.Assert.fail(Assert.java:87)
>     at org.junit.Assert.assertTrue(Assert.java:42)
>     at org.junit.Assert.assertTrue(Assert.java:53)
>     at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl.testDnRestartWithHardLink(TestFsDatasetImpl.java:1344)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>     at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>     at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>     at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>     at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
>     at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

