[ https://issues.apache.org/jira/browse/HADOOP-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aaron Fabbri updated HADOOP-13904:
----------------------------------
    Attachment: HADOOP-13904-HADOOP-13345.001.patch

Attaching v1 patch. It adds new scale tests for DynamoDBMetadataStore and LocalMetadataStore. I think we should get HADOOP-13589 in first, and I am happy to rebase this patch once that is committed.

The included change to the docs (s3guard.md) describes a configuration I used to reliably trigger DynamoDB throttling. I was able to observe both a significant slowdown in the test execution and WriteThrottle events in my AWS CloudWatch UI.

I also added some instrumentation around our use of DynamoDB's batched write API, as the docs imply that we need to add our own backoff timers there. The output looks like this:

{quote}
2017-01-18 15:21:15,930 [JUnit-testMoves] INFO s3guard.DynamoDBMetadataStore (DynamoDBMetadataStore.java:processBatchWriteRequest(444)) - Batched write took 10 retries to complete
2017-01-18 15:21:18,447 [JUnit-testMoves] INFO s3guard.DynamoDBMetadataStore (DynamoDBMetadataStore.java:processBatchWriteRequest(444)) - Batched write took 7 retries to complete
2017-01-18 15:21:20,987 [JUnit-testMoves] INFO s3guard.DynamoDBMetadataStore (DynamoDBMetadataStore.java:processBatchWriteRequest(444)) - Batched write took 6 retries to complete
2017-01-18 15:21:23,530 [JUnit-testMoves] INFO s3guard.DynamoDBMetadataStore (DynamoDBMetadataStore.java:processBatchWriteRequest(444)) - Batched write took 6 retries to complete
2017-01-18 15:21:25,975 [JUnit-testMoves] INFO s3guard.DynamoDBMetadataStore (DynamoDBMetadataStore.java:processBatchWriteRequest(444)) - Batched write took 9 retries to complete
2017-01-18 15:21:28,561 [JUnit-testMoves] INFO s3guard.DynamoDBMetadataStore (DynamoDBMetadataStore.java:processBatchWriteRequest(444)) - Batched write took 6 retries to complete
2017-01-18 15:21:31,037 [JUnit-testMoves] INFO s3guard.DynamoDBMetadataStore (DynamoDBMetadataStore.java:processBatchWriteRequest(444)) - Batched write took 8 retries to complete
2017-01-18 15:21:33,407 [JUnit-testMoves] INFO s3guard.DynamoDBMetadataStore (DynamoDBMetadataStore.java:processBatchWriteRequest(444)) - Batched write took 5 retries to complete
2017-01-18 15:21:35,685 [JUnit-testMoves] INFO s3guard.DynamoDBMetadataStore (DynamoDBMetadataStore.java:processBatchWriteRequest(444)) - Batched write took 6 retries to complete
{quote}

Next I will dig into the AWS SDK source and/or put timing around the retry calls to `batchWriteItemUnprocessed()` to see whether (A) the SDK is doing exponential backoff for us, or (B) we need to add a sleep timer in that retry loop.

> DynamoDBMetadataStore to handle DDB throttling failures through retry policy
> ----------------------------------------------------------------------------
>
>                 Key: HADOOP-13904
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13904
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: HADOOP-13345
>            Reporter: Steve Loughran
>            Assignee: Aaron Fabbri
>       Attachments: HADOOP-13904-HADOOP-13345.001.patch
>
>
> When you overload DDB, you get error messages warning of throttling, [as
> documented by
> AWS|http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Programming.Errors.html#Programming.Errors.MessagesAndCodes]
> Reduce load on DDB by doing a table lookup before the create; then, in table
> create/delete operations and in get/put actions, recognise the error codes
> and retry using an appropriate retry policy (exponential backoff + ultimate
> failure).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
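As a postscript on option (B) in the comment above: if it turns out the SDK does not back off for us, the retry loop around the batched write could look roughly like the sketch below. This is a hedged illustration only, not the attached patch: `BatchWriteBackoff`, `BatchWriter`, and the constants are hypothetical names, and a real implementation would resubmit the unprocessed-items map returned by DynamoDB's batch write call rather than the simulated counter used here. The retry cap gives the "exponential backoff + ultimate failure" behavior the issue description asks for.

```java
// Hypothetical sketch of a batched-write retry loop with our own exponential
// backoff. All names here are illustrative, not identifiers from the Hadoop
// or AWS SDK source.
public class BatchWriteBackoff {
    static final int MAX_RETRIES = 9;       // give up ("ultimate failure") after this
    static final long BASE_DELAY_MS = 100;  // first backoff interval
    static final long MAX_DELAY_MS = 5000;  // cap so delays do not grow unbounded

    /** Exponential backoff: BASE_DELAY_MS * 2^attempt, capped at MAX_DELAY_MS. */
    static long delayMillis(int attempt) {
        long delay = BASE_DELAY_MS << Math.min(attempt, 20); // clamp shift to avoid overflow
        return Math.min(delay, MAX_DELAY_MS);
    }

    /** Stands in for one batched write attempt: returns the number of
     *  unprocessed items remaining (0 means everything was accepted). */
    interface BatchWriter {
        int write();
    }

    /** Keeps resubmitting unprocessed items, sleeping with exponential
     *  backoff between attempts; returns the number of retries used. */
    static int retryBatchWrite(BatchWriter writer) {
        int retries = 0;
        while (writer.write() > 0) {
            if (++retries > MAX_RETRIES) {
                throw new IllegalStateException(
                    "batched write still has unprocessed items after "
                    + MAX_RETRIES + " retries");
            }
            try {
                Thread.sleep(delayMillis(retries - 1));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted during backoff", e);
            }
        }
        return retries;
    }

    public static void main(String[] args) {
        // Simulate a throttled table that accepts one item per attempt.
        int[] unprocessed = {3};
        int retries = retryBatchWrite(() -> --unprocessed[0]);
        System.out.println("Batched write took " + retries + " retries to complete");
    }
}
```

With delays of 100 ms, 200 ms, 400 ms, ..., capped at 5 s, a run of 5-10 retries (as seen in the log output above) stays bounded instead of hammering the table in a tight loop.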