[ 
https://issues.apache.org/jira/browse/HADOOP-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Fabbri updated HADOOP-13904:
----------------------------------
    Attachment: HADOOP-13904-HADOOP-13345.001.patch

Attaching v1 patch.  It adds new scale tests for DynamoDBMetadataStore and 
LocalMetadataStore.  I think we should get HADOOP-13589 in first, and I am 
happy to rebase this when that is committed.

The included change to the docs (s3guard.md) describes a configuration I used 
to reliably trigger DynamoDB throttling.  I was able to observe both a 
significant slowdown in test execution and WriteThrottle events in my 
AWS CloudWatch UI.
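For anyone who wants to reproduce this without reading the patch: the exact settings are in the s3guard.md change, but the general idea (my wording, values illustrative) is to create the test table with minimal provisioned capacity via the S3Guard capacity properties, so the scale tests exceed it quickly:

```
<property>
  <name>fs.s3a.s3guard.ddb.table.capacity.read</name>
  <value>1</value>
</property>
<property>
  <name>fs.s3a.s3guard.ddb.table.capacity.write</name>
  <value>1</value>
</property>
```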

I also added some instrumentation around our use of DynamoDB's batched write 
API, as the docs imply that we need to add our own backoff timers there. The 
output looks like this:

{quote}
2017-01-18 15:21:15,930 [JUnit-testMoves] INFO  s3guard.DynamoDBMetadataStore (DynamoDBMetadataStore.java:processBatchWriteRequest(444)) - Batched write took 10 retries to complete
2017-01-18 15:21:18,447 [JUnit-testMoves] INFO  s3guard.DynamoDBMetadataStore (DynamoDBMetadataStore.java:processBatchWriteRequest(444)) - Batched write took 7 retries to complete
2017-01-18 15:21:20,987 [JUnit-testMoves] INFO  s3guard.DynamoDBMetadataStore (DynamoDBMetadataStore.java:processBatchWriteRequest(444)) - Batched write took 6 retries to complete
2017-01-18 15:21:23,530 [JUnit-testMoves] INFO  s3guard.DynamoDBMetadataStore (DynamoDBMetadataStore.java:processBatchWriteRequest(444)) - Batched write took 6 retries to complete
2017-01-18 15:21:25,975 [JUnit-testMoves] INFO  s3guard.DynamoDBMetadataStore (DynamoDBMetadataStore.java:processBatchWriteRequest(444)) - Batched write took 9 retries to complete
2017-01-18 15:21:28,561 [JUnit-testMoves] INFO  s3guard.DynamoDBMetadataStore (DynamoDBMetadataStore.java:processBatchWriteRequest(444)) - Batched write took 6 retries to complete
2017-01-18 15:21:31,037 [JUnit-testMoves] INFO  s3guard.DynamoDBMetadataStore (DynamoDBMetadataStore.java:processBatchWriteRequest(444)) - Batched write took 8 retries to complete
2017-01-18 15:21:33,407 [JUnit-testMoves] INFO  s3guard.DynamoDBMetadataStore (DynamoDBMetadataStore.java:processBatchWriteRequest(444)) - Batched write took 5 retries to complete
2017-01-18 15:21:35,685 [JUnit-testMoves] INFO  s3guard.DynamoDBMetadataStore (DynamoDBMetadataStore.java:processBatchWriteRequest(444)) - Batched write took 6 retries to complete
{quote}

Next I will dig into the AWS SDK source and/or put timing around the retry 
calls to `batchWriteItemUnprocessed()` to see if (A) the SDK is doing 
exponential backoff for us, or (B) we need to add a sleep timer in that retry 
loop.
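If it turns out to be case (B), the retry loop would need something like the following sketch. This is illustrative only: `sendBatch()` is a hypothetical stand-in for the real `batchWriteItemUnprocessed()` call (here it "drains" half the remaining items per attempt so the demo is self-contained), and the delay constants are toy values. The point is the shape of the loop: capped exponential backoff between retries, with a hard retry limit so a hot table eventually fails rather than spinning forever.

```java
import java.util.ArrayList;
import java.util.List;

public class BackoffRetryDemo {
    static final long BASE_DELAY_MS = 10;   // toy values; real code would use larger ones
    static final long MAX_DELAY_MS = 200;
    static final int MAX_RETRIES = 10;

    // Hypothetical stand-in for a batch write: returns the items left unprocessed.
    // In this demo, each attempt processes half of the remaining items.
    static List<String> sendBatch(List<String> items) {
        int processed = Math.max(1, items.size() / 2);
        return new ArrayList<>(items.subList(processed, items.size()));
    }

    // Retries unprocessed items with capped exponential backoff.
    // Returns the number of retries taken; throws if the batch never drains.
    static int writeWithBackoff(List<String> items) throws InterruptedException {
        List<String> pending = sendBatch(new ArrayList<>(items));
        int retries = 0;
        while (!pending.isEmpty()) {
            if (retries >= MAX_RETRIES) {
                throw new RuntimeException(
                    "batch not drained after " + retries + " retries");
            }
            // Delay doubles on each retry, capped at MAX_DELAY_MS.
            long delay = Math.min(MAX_DELAY_MS, BASE_DELAY_MS << retries);
            Thread.sleep(delay);
            pending = sendBatch(pending);
            retries++;
        }
        return retries;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> items = new ArrayList<>();
        for (int i = 0; i < 32; i++) {
            items.add("item-" + i);
        }
        System.out.println(writeWithBackoff(items));
    }
}
```

If the SDK already backs off internally (case A), none of this is needed, which is why timing the retry calls first is the right next step.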



> DynamoDBMetadataStore to handle DDB throttling failures through retry policy
> ----------------------------------------------------------------------------
>
>                 Key: HADOOP-13904
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13904
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: HADOOP-13345
>            Reporter: Steve Loughran
>            Assignee: Aaron Fabbri
>         Attachments: HADOOP-13904-HADOOP-13345.001.patch
>
>
> When you overload DDB, you get error messages warning of throttling, [as 
> documented by 
> AWS|http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Programming.Errors.html#Programming.Errors.MessagesAndCodes]
> Reduce load on DDB by doing a table lookup before the create, then, in table 
> create/delete operations and in get/put actions, recognise the error codes 
> and retry using an appropriate retry policy (exponential backoff + ultimate 
> failure) 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
