[ https://issues.apache.org/jira/browse/HDDS-1908?focusedWorklogId=293280&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-293280 ]
ASF GitHub Bot logged work on HDDS-1908:
----------------------------------------

Author: ASF GitHub Bot
Created on: 12/Aug/19 17:42
Start Date: 12/Aug/19 17:42
Worklog Time Spent: 10m
Work Description: adoroszlai commented on pull request #1282: HDDS-1908. TestMultiBlockWritesWithDnFailures is failing
URL: https://github.com/apache/hadoop/pull/1282

## What changes were proposed in this pull request?

Multi-block write tests are failing most of the time because the Ratis leader election timeout is about the same length as the client retry timeout (5 times 1 second). This frequently caused an entire pipeline to be excluded (by `KeyOutputStream.handleException`) just because the client gives up before a leader is elected. There are only 6 nodes in the TestMultiBlockWritesWithDnFailures test, 2 of which are shut down as part of the test. Thus, if a pipeline is excluded, the subsequent write fails because a new block cannot be allocated.

This change decreases the leader election timeout and increases client retries. It is basically an extension of [HDDS-1780](https://issues.apache.org/jira/browse/HDDS-1780).

Additional changes:

* move `testMultiBlockWritesWithIntermittentDnFailures` to `TestMultiBlockWritesWithDnFailures`
* remove unused `maxRetries` member
* call cluster `shutdown()` regardless of test success/failure (see also [HDDS-1949](https://issues.apache.org/jira/browse/HDDS-1949))

https://issues.apache.org/jira/browse/HDDS-1908

## How was this patch tested?

Ran both test classes 10+ times, without any intermittent failure.
```
[INFO] Running org.apache.hadoop.ozone.client.rpc.TestFailureHandlingByClient
[INFO] Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 157.086 s - in org.apache.hadoop.ozone.client.rpc.TestFailureHandlingByClient
[INFO] Running org.apache.hadoop.ozone.client.rpc.TestMultiBlockWritesWithDnFailures
[INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 75.308 s - in org.apache.hadoop.ozone.client.rpc.TestMultiBlockWritesWithDnFailures
```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

Worklog Id: (was: 293280)
Time Spent: 10m
Remaining Estimate: 0h

> TestMultiBlockWritesWithDnFailures is failing
> ---------------------------------------------
>
> Key: HDDS-1908
> URL: https://issues.apache.org/jira/browse/HDDS-1908
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: test
> Reporter: Nanda kumar
> Assignee: Doroszlai, Attila
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> TestMultiBlockWritesWithDnFailures is failing with the following exception
> {noformat}
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 30.992 s <<< FAILURE! - in org.apache.hadoop.ozone.client.rpc.TestMultiBlockWritesWithDnFailures
> [ERROR] testMultiBlockWritesWithDnFailures(org.apache.hadoop.ozone.client.rpc.TestMultiBlockWritesWithDnFailures) Time elapsed: 30.941 s <<< ERROR!
> INTERNAL_ERROR org.apache.hadoop.ozone.om.exceptions.OMException: Allocated 0 blocks. Requested 1 blocks
> at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.handleError(OzoneManagerProtocolClientSideTranslatorPB.java:720)
> at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.allocateBlock(OzoneManagerProtocolClientSideTranslatorPB.java:752)
> at org.apache.hadoop.ozone.client.io.BlockOutputStreamEntryPool.allocateNewBlock(BlockOutputStreamEntryPool.java:248)
> at org.apache.hadoop.ozone.client.io.BlockOutputStreamEntryPool.allocateBlockIfNeeded(BlockOutputStreamEntryPool.java:296)
> at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:201)
> at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleRetry(KeyOutputStream.java:376)
> at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleException(KeyOutputStream.java:325)
> at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:231)
> at org.apache.hadoop.ozone.client.io.KeyOutputStream.write(KeyOutputStream.java:193)
> at org.apache.hadoop.ozone.client.io.OzoneOutputStream.write(OzoneOutputStream.java:49)
> at java.io.OutputStream.write(OutputStream.java:75)
> at org.apache.hadoop.ozone.client.rpc.TestMultiBlockWritesWithDnFailures.testMultiBlockWritesWithDnFailures(TestMultiBlockWritesWithDnFailures.java:144)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
> at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
> at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
> at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
> at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
> at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
> at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
> at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
> at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
> at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
> at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
> at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> {noformat}

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org