steveloughran commented on issue #1814: HADOOP-16823. Manage S3 Throttling exclusively in S3A client.
URL: https://github.com/apache/hadoop/pull/1814#issuecomment-579910544

My "little" fix to turn off retries in the AWS client causes issues in the DDB clients wherever there's a significant mismatch between prepaid IO and load; ITestDynamoDBMetadataStoreScale is the example of this.

Looking at the AWS metrics, part of the fun is the way bursty traffic is handled: you may get your capacity at the time of the initial load, but get blocked afterwards. That is: the throttling may not happen under load, but the next time an API call is made, even a low-load one.

Also, S3GuardTableAccess isn't retrying, and some code in the tests and in the purge/dump table entry points goes on to fail when throttling happens while iterating through scans. Fix: you can ask a DDBMetastore to wrap your scan with one bonded to its retry and metrics...plus use of this where appropriate.

ITestDynamoDBMetadataStoreScale is really slow; either the changes make it worse, or it's always been really slow and we haven't noticed because it was happening during the (slow) parallel test runs. Proposed: we review it, look at what we want to show, and then see if we can make things fail faster.

The latest patch makes the SDK throttling disablement exclusive to S3, fixes up the DDB clients to retry better, and tries to make a better case for that ITestDynamoDBMetadataStoreScale suite. I think I'm going to tune those tests to always downgrade if no throttling is detected.

```
[INFO] -------------------------------------------------------
[INFO] Running org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale
[ERROR] Tests run: 11, Failures: 5, Errors: 1, Skipped: 0, Time elapsed: 190.404 s <<< FAILURE! - in org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale
[ERROR] test_030_BatchedWrite(org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale) Time elapsed: 10.259 s <<< FAILURE!
java.lang.AssertionError: No throttling detected in Tracker with read throttle events = 0; write throttles = 0; batch throttles = 0; scan throttles = 0 against DynamoDBMetadataStore{region=eu-west-1, tableName=s3guard-metadata, tableArn=arn:aws:dynamodb:eu-west-1:980678866538:table/s3guard-metadata}
	at org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale.execute(ITestDynamoDBMetadataStoreScale.java:578)
	at org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale.test_030_BatchedWrite(ITestDynamoDBMetadataStoreScale.java:285)
[ERROR] test_040_get(org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale) Time elapsed: 4.15 s <<< FAILURE!
java.lang.AssertionError: No throttling detected in Tracker with read throttle events = 0; write throttles = 0; batch throttles = 0; scan throttles = 0 against DynamoDBMetadataStore{region=eu-west-1, tableName=s3guard-metadata, tableArn=arn:aws:dynamodb:eu-west-1:980678866538:table/s3guard-metadata}
	at org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale.execute(ITestDynamoDBMetadataStoreScale.java:578)
	at org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale.test_040_get(ITestDynamoDBMetadataStoreScale.java:341)
[ERROR] test_050_getVersionMarkerItem(org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale) Time elapsed: 3.311 s <<< FAILURE!
java.lang.AssertionError: No throttling detected in Tracker with read throttle events = 0; write throttles = 0; batch throttles = 0; scan throttles = 0 against DynamoDBMetadataStore{region=eu-west-1, tableName=s3guard-metadata, tableArn=arn:aws:dynamodb:eu-west-1:980678866538:table/s3guard-metadata}
	at org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale.execute(ITestDynamoDBMetadataStoreScale.java:578)
	at org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale.test_050_getVersionMarkerItem(ITestDynamoDBMetadataStoreScale.java:356)
[ERROR] test_070_putDirMarker(org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale) Time elapsed: 2.486 s <<< ERROR!
org.apache.hadoop.fs.s3a.AWSServiceThrottledException: getVersionMarkerItem on ../VERSION: com.amazonaws.services.dynamodbv2.model.ProvisionedThroughputExceededException: The level of configured provisioned throughput for the table was exceeded. Consider increasing your provisioning level with the UpdateTable API. (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: ProvisionedThroughputExceededException; Request ID: 52JGLGQ7B8SLQD3BDQCI9U6NH3VV4KQNSO5AEMVJF66Q9ASUAAJG): The level of configured provisioned throughput for the table was exceeded. Consider increasing your provisioning level with the UpdateTable API. (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: ProvisionedThroughputExceededException; Request ID: 52JGLGQ7B8SLQD3BDQCI9U6NH3VV4KQNSO5AEMVJF66Q9ASUAAJG)
	at org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale.createMetadataStore(ITestDynamoDBMetadataStoreScale.java:153)
	at org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale.setup(ITestDynamoDBMetadataStoreScale.java:163)
Caused by: com.amazonaws.services.dynamodbv2.model.ProvisionedThroughputExceededException: The level of configured provisioned throughput for the table was exceeded. Consider increasing your provisioning level with the UpdateTable API. (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: ProvisionedThroughputExceededException; Request ID: 52JGLGQ7B8SLQD3BDQCI9U6NH3VV4KQNSO5AEMVJF66Q9ASUAAJG)
	at org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale.createMetadataStore(ITestDynamoDBMetadataStoreScale.java:153)
	at org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale.setup(ITestDynamoDBMetadataStoreScale.java:163)
[ERROR] test_090_delete(org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale) Time elapsed: 2.804 s <<< FAILURE!
java.lang.AssertionError: No throttling detected in Tracker with read throttle events = 0; write throttles = 0; batch throttles = 0; scan throttles = 0 against DynamoDBMetadataStore{region=eu-west-1, tableName=s3guard-metadata, tableArn=arn:aws:dynamodb:eu-west-1:980678866538:table/s3guard-metadata}
	at org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale.execute(ITestDynamoDBMetadataStoreScale.java:578)
	at org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale.test_090_delete(ITestDynamoDBMetadataStoreScale.java:462)
[ERROR] test_100_forgetMetadata(org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale) Time elapsed: 2.278 s <<< FAILURE!
java.lang.AssertionError: No throttling detected in Tracker with read throttle events = 0; write throttles = 0; batch throttles = 0; scan throttles = 0 against DynamoDBMetadataStore{region=eu-west-1, tableName=s3guard-metadata, tableArn=arn:aws:dynamodb:eu-west-1:980678866538:table/s3guard-metadata}
	at org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale.execute(ITestDynamoDBMetadataStoreScale.java:578)
	at org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreScale.test_100_forgetMetadata(ITestDynamoDBMetadataStoreScale.java:478)
```

For the setup failure (here in test_070_putDirMarker): not sure. We either skip the test or retry. It's always surfacing in test_070; test_060, the test before it, tests list scale.
Looking at that code, I think the retry logic is too coarse: it retries the entire list, when we may want to retry only the hasNext()/next() calls. That is: push it down. This will avoid putting so much load on the table on any retry.
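
To illustrate the "push it down" idea, here is a minimal sketch of an iterator wrapper that retries each hasNext()/next() call individually instead of restarting the whole listing. It is not the S3A or S3Guard code: `RetryingIterator`, `ThrottledException`, the attempt limit and the doubling backoff are all hypothetical names and choices made for illustration, and the sketch assumes that a call which fails with a throttling exception has not advanced the wrapped iterator.

```java
import java.util.Iterator;
import java.util.function.Supplier;

/** Hypothetical stand-in for a throttling failure such as AWSServiceThrottledException. */
class ThrottledException extends RuntimeException {
  ThrottledException(String message) {
    super(message);
  }
}

/**
 * Wraps an iterator so that each hasNext()/next() call is retried on its own,
 * rather than restarting the entire listing after a throttle event.
 * Assumption: a call that throws ThrottledException did not advance the inner iterator.
 */
class RetryingIterator<T> implements Iterator<T> {
  private final Iterator<T> inner;
  private final int maxAttempts;
  private final long initialDelayMillis;
  private int retryCount;

  RetryingIterator(Iterator<T> inner, int maxAttempts, long initialDelayMillis) {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    this.inner = inner;
    this.maxAttempts = maxAttempts;
    this.initialDelayMillis = initialDelayMillis;
  }

  @Override
  public boolean hasNext() {
    return retry(inner::hasNext);
  }

  @Override
  public T next() {
    return retry(inner::next);
  }

  /** Number of throttled calls that were retried; a crude stand-in for metrics. */
  int getRetryCount() {
    return retryCount;
  }

  /** Invoke the call, backing off and retrying only on ThrottledException. */
  private <R> R retry(Supplier<R> call) {
    long delay = initialDelayMillis;
    for (int attempt = 1; ; attempt++) {
      try {
        return call.get();
      } catch (ThrottledException e) {
        if (attempt >= maxAttempts) {
          throw e; // attempts exhausted: surface the throttling failure
        }
        retryCount++;
        sleep(delay);
        delay *= 2; // simple exponential backoff
      }
    }
  }

  private static void sleep(long millis) {
    try {
      Thread.sleep(millis);
    } catch (InterruptedException ie) {
      Thread.currentThread().interrupt();
      throw new RuntimeException("Interrupted during retry backoff", ie);
    }
  }
}
```

With this shape, a throttle event in the middle of a large scan costs one repeated page request rather than a re-read of every page already consumed, which is where the reduced load on retry would come from.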