[ https://issues.apache.org/jira/browse/HADOOP-15426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16559133#comment-16559133 ]
Steve Loughran commented on HADOOP-15426:
-----------------------------------------

Though it means that tests can start timing out. Note also that the AWS SDK does its own internal retrying, configured through the property {{fs.s3a.attempts.maximum}}; that is all opaque to us. Our extra retry layer wraps around it, so it is probably worth working out what is going on underneath, to avoid replicating too much of it. I think it may be best just to crank the value of {{fs.s3a.s3guard.ddb.max.retries}} down to something smaller. Alternatively, shrink the number of retry events in the AWS SDK, so that we, and our metrics, see them all.

{code}
[ERROR] Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 486.41 s <<< FAILURE! - in org.apache.hadoop.fs.contract.s3a.ITestS3AContractMkdir
[ERROR] testMkdirsDoesNotRemoveParentDirectories(org.apache.hadoop.fs.contract.s3a.ITestS3AContractMkdir)  Time elapsed: 180.025 s  <<< ERROR!
java.lang.Exception: test timed out after 180000 milliseconds
	at java.lang.Thread.sleep(Native Method)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doPauseBeforeRetry(AmazonHttpClient.java:1707)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.pauseBeforeRetry(AmazonHttpClient.java:1681)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1189)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1056)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:743)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
	at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.doInvoke(AmazonDynamoDBClient.java:2925)
	at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.invoke(AmazonDynamoDBClient.java:2901)
	at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.executeQuery(AmazonDynamoDBClient.java:2166)
	at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.query(AmazonDynamoDBClient.java:2142)
	at com.amazonaws.services.dynamodbv2.document.internal.QueryCollection.firstPage(QueryCollection.java:53)
	at com.amazonaws.services.dynamodbv2.document.internal.PageIterator.next(PageIterator.java:45)
	at com.amazonaws.services.dynamodbv2.document.internal.IteratorSupport.nextResource(IteratorSupport.java:87)
	at com.amazonaws.services.dynamodbv2.document.internal.IteratorSupport.hasNext(IteratorSupport.java:55)
	at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.lambda$listChildren$4(DynamoDBMetadataStore.java:553)
	at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore$$Lambda$30/289224580.execute(Unknown Source)
	at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:109)
	at org.apache.hadoop.fs.s3a.Invoker.lambda$retry$3(Invoker.java:260)
	at org.apache.hadoop.fs.s3a.Invoker$$Lambda$18/1225582259.execute(Unknown Source)
	at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:317)
	at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:256)
	at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:231)
	at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.listChildren(DynamoDBMetadataStore.java:542)
	at org.apache.hadoop.fs.s3a.s3guard.DescendantsIterator.next(DescendantsIterator.java:132)
	at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.deleteSubtree(DynamoDBMetadataStore.java:438)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.innerDelete(S3AFileSystem.java:1788)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.delete(S3AFileSystem.java:1697)
	at org.apache.hadoop.fs.contract.ContractTestUtils.rm(ContractTestUtils.java:399)
	at org.apache.hadoop.fs.contract.ContractTestUtils.cleanup(ContractTestUtils.java:375)
	at org.apache.hadoop.fs.contract.AbstractFSContractTestBase.deleteTestDirInTeardown(AbstractFSContractTestBase.java:213)
	at org.apache.hadoop.fs.contract.AbstractFSContractTestBase.teardown(AbstractFSContractTestBase.java:204)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33)
	at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
	at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
{code}
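To make that tuning concrete, here is a minimal sketch (the two property names are the real ones discussed above; the class name and the values are purely illustrative, not recommendations):

{code}
import org.apache.hadoop.conf.Configuration;

public class DdbRetryTuning {
  /** Illustrative values only; pick numbers to suit the table's capacity. */
  public static Configuration tunedConf() {
    Configuration conf = new Configuration();
    // Retries performed inside the AWS SDK, opaque to S3A and its metrics.
    conf.setInt("fs.s3a.attempts.maximum", 3);
    // Retries performed by the S3Guard layer, wrapped around the SDK's own.
    conf.setInt("fs.s3a.s3guard.ddb.max.retries", 3);
    return conf;
  }
}
{code}

With both kept small, each retry we see corresponds to fewer hidden SDK-level sleeps, so a throttled test fails fast instead of sitting in {{Thread.sleep()}} until the JUnit timeout fires, as in the trace above.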
> Make S3guard client resilient to DDB throttle events and network failures
> --------------------------------------------------------------------------
>
>                 Key: HADOOP-15426
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15426
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.1.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Major
>         Attachments: HADOOP-15426-001.patch, Screen Shot 2018-07-24 at 15.16.46.png, Screen Shot 2018-07-25 at 16.22.10.png, Screen Shot 2018-07-25 at 16.28.53.png
>
> managed to create on a parallel test run
> {code}
> org.apache.hadoop.fs.s3a.AWSServiceThrottledException: delete on s3a://hwdev-steve-ireland-new/fork-0005/test/existing-dir/existing-file: com.amazonaws.services.dynamodbv2.model.ProvisionedThroughputExceededException: The level of configured provisioned throughput for the table was exceeded. Consider increasing your provisioning level with the UpdateTable API. (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: ProvisionedThroughputExceededException; Request ID: RDM3370REDBBJQ0SLCLOFC8G43VV4KQNSO5AEMVJF66Q9ASUAAJG): The level of configured provisioned throughput for the table was exceeded. Consider increasing your provisioning level with the UpdateTable API. (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: ProvisionedThroughputExceededException; Request ID: RDM3370REDBBJQ0SLCLOFC8G43VV4KQNSO5AEMVJF66Q9ASUAAJG)
> 	at 
> {code}
> We should be able to handle this. It is a 400 "bad things happened" error though, not the 503 from S3.
> h3. We need a retry handler for DDB throttle operations
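As a sketch of the shape that handler could take (a hypothetical standalone helper for illustration only; the real fix belongs in the S3A {{Invoker}} retry policy, and the class, method and parameter names here are invented):

{code}
import java.util.concurrent.Callable;

import com.amazonaws.services.dynamodbv2.model.ProvisionedThroughputExceededException;

public final class DdbThrottleRetrier {

  private DdbThrottleRetrier() {
  }

  /**
   * Invoke a DynamoDB operation, retrying with exponential backoff when
   * the table reports throttling. Any other failure is rethrown as-is.
   */
  public static <T> T retryOnThrottle(Callable<T> operation,
      int maxRetries, long baseDelayMillis) throws Exception {
    for (int attempt = 0; ; attempt++) {
      try {
        return operation.call();
      } catch (ProvisionedThroughputExceededException e) {
        // DDB throttling surfaces as a 400 with this error code, not the
        // 503 that S3 uses, so a status-code-only policy will miss it.
        if (attempt >= maxRetries) {
          throw e;
        }
        // Exponential backoff; production code would add jitter and a cap.
        Thread.sleep(baseDelayMillis << attempt);
      }
    }
  }
}
{code}

The classification is the important part: treat {{ProvisionedThroughputExceededException}} as retriable-with-backoff, and everything else as an ordinary failure.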