[ https://issues.apache.org/jira/browse/HDFS-15308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17095309#comment-17095309 ]
Toshihiko Uchida edited comment on HDFS-15308 at 4/29/20, 10:10 AM:
--------------------------------------------------------------------

One possible solution would be to increase dfs.namenode.reconstruction.pending.timeout-sec in the unit test.
{code}
  private void testNNSendsErasureCodingTasks(int deadDN) throws Exception {
    cluster.shutdown();

    final int numDataNodes = dnNum + 1;
    conf.setInt(
        DFSConfigKeys.DFS_NAMENODE_RECONSTRUCTION_PENDING_TIMEOUT_SEC_KEY, 10);
{code}
In addition, assertTrue can be simplified to assertEquals, as [~elgoiri] mentioned in HDFS-14353 :)


was (Author: touchida):
One possible solution would be to increase dfs.namenode.reconstruction.pending.timeout-sec in the unit test.
{code}
  private void testNNSendsErasureCodingTasks(int deadDN) throws Exception {
    cluster.shutdown();

    final int numDataNodes = dnNum + 1;
    conf.setInt(
        DFSConfigKeys.DFS_NAMENODE_RECONSTRUCTION_PENDING_TIMEOUT_SEC_KEY, 10);
{code}
In addition, assertTrue can be simplified to assertEquals, as @Inigo mentioned in HDFS-14353 :)

> TestReconstructStripedFile.testNNSendsErasureCodingTasks is flaky
> -----------------------------------------------------------------
>
>                 Key: HDFS-15308
>                 URL: https://issues.apache.org/jira/browse/HDFS-15308
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: erasure-coding
>    Affects Versions: 3.3.0
>            Reporter: Toshihiko Uchida
>            Priority: Minor
>              Labels: flaky-test
>
> In HDFS-14353, TestReconstructStripedFile.testNNSendsErasureCodingTasks
> failed once due to pending reconstruction timeout as follows.
> {code}
> java.lang.AssertionError: Found 4 timeout pending reconstruction tasks
>       at org.junit.Assert.fail(Assert.java:88)
>       at org.junit.Assert.assertTrue(Assert.java:41)
>       at org.apache.hadoop.hdfs.TestReconstructStripedFile.testNNSendsErasureCodingTasks(TestReconstructStripedFile.java:502)
>       at org.apache.hadoop.hdfs.TestReconstructStripedFile.testNNSendsErasureCodingTasks(TestReconstructStripedFile.java:458)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>       at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>       at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>       at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>       at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>       at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       at java.lang.Thread.run(Thread.java:748)
> {code}
> The error occurred on the following assertion.
> {code}
>     // Make sure that all pending reconstruction tasks can be processed.
>     while (ns.getPendingReconstructionBlocks() > 0) {
>       long timeoutPending = ns.getNumTimedOutPendingReconstructions();
>       assertTrue(String.format("Found %d timeout pending reconstruction tasks",
>           timeoutPending), timeoutPending == 0);
>       Thread.sleep(1000);
>     }
> {code}
> The failure could not be reproduced in the reporter's docker environment
> (start-build-environment.sh).
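
For illustration only, here is a minimal sketch of what the assertEquals simplification suggested in the comment might look like when applied to the polling loop quoted above. It is not the actual patch. To keep it runnable without Hadoop or JUnit on the classpath, the two LongSupplier arguments are hypothetical stand-ins for ns.getPendingReconstructionBlocks() and ns.getNumTimedOutPendingReconstructions(), and the local assertEquals helper is an assumed stand-in for JUnit's Assert.assertEquals(String, long, long). The class and method names are made up for this sketch.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch; class and method names are not from the Hadoop code base.
class PendingReconstructionCheck {

  // Local stand-in for org.junit.Assert.assertEquals(String, long, long),
  // so that the sketch compiles without JUnit.
  static void assertEquals(String message, long expected, long actual) {
    if (expected != actual) {
      throw new AssertionError(
          message + " expected:<" + expected + "> but was:<" + actual + ">");
    }
  }

  // Polls until no pending reconstruction blocks remain and fails as soon
  // as any pending task is reported as timed out. Returns the number of
  // polling iterations that were needed.
  static int waitForPendingReconstruction(LongSupplier pendingBlocks,
      LongSupplier timedOutTasks, long sleepMillis)
      throws InterruptedException {
    int polls = 0;
    while (pendingBlocks.getAsLong() > 0) {
      // The message no longer needs String.format: assertEquals reports
      // the actual count itself when the check fails.
      assertEquals("Found timeout pending reconstruction tasks",
          0, timedOutTasks.getAsLong());
      Thread.sleep(sleepMillis);
      polls++;
    }
    return polls;
  }

  public static void main(String[] args) throws InterruptedException {
    // Simulated namesystem: three pending blocks that drain one per poll,
    // and no timed-out tasks.
    final long[] remaining = {3};
    int polls = waitForPendingReconstruction(
        () -> remaining[0]--, () -> 0L, 1L);
    System.out.println("polls = " + polls);
  }
}
```

In the real test the loop would stay inline and call the JUnit method directly. The timeout increase proposed in the comment (the conf.setInt call) is a separate change that would be made before the cluster is restarted.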