[jira] [Commented] (HADOOP-13904) DynamoDBMetadataStore to handle DDB throttling failures through retry policy
[ https://issues.apache.org/jira/browse/HADOOP-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15875126#comment-15875126 ] Hadoop QA commented on HADOOP-13904:

-1 overall

| Vote | Subsystem  | Runtime  | Comment |
| 0    | reexec     | 0m 24s   | Docker mode activated. |
| +1   | @author    | 0m 0s    | The patch does not contain any @author tags. |
| +1   | test4tests | 0m 0s    | The patch appears to include 3 new or modified test files. |
| 0    | mvndep     | 0m 14s   | Maven dependency ordering for branch |
| +1   | mvninstall | 12m 37s  | HADOOP-13345 passed |
| +1   | compile    | 13m 59s  | HADOOP-13345 passed |
| +1   | checkstyle | 1m 32s   | HADOOP-13345 passed |
| +1   | mvnsite    | 1m 33s   | HADOOP-13345 passed |
| +1   | mvneclipse | 0m 36s   | HADOOP-13345 passed |
| +1   | findbugs   | 2m 5s    | HADOOP-13345 passed |
| +1   | javadoc    | 1m 12s   | HADOOP-13345 passed |
| 0    | mvndep     | 0m 16s   | Maven dependency ordering for patch |
| +1   | mvninstall | 1m 0s    | the patch passed |
| +1   | compile    | 10m 48s  | the patch passed |
| +1   | javac      | 10m 48s  | the patch passed |
| +1   | checkstyle | 1m 35s   | the patch passed |
| +1   | mvnsite    | 1m 32s   | the patch passed |
| +1   | mvneclipse | 0m 45s   | the patch passed |
| +1   | whitespace | 0m 1s    | The patch has no whitespace issues. |
| +1   | xml        | 0m 1s    | The patch has no ill-formed XML file. |
| +1   | findbugs   | 2m 22s   | the patch passed |
| +1   | javadoc    | 1m 18s   | the patch passed |
| -1   | unit       | 8m 21s   | hadoop-common in the patch failed. |
| +1   | unit       | 0m 47s   | hadoop-aws in the patch passed. |
| +1   | asflicense | 0m 38s   | The patch does not generate ASF License warnings. |
|      |            | 88m 12s  | |

| Reason             | Tests |
| Failed junit tests | hadoop.security.TestKDiag |

| Subsystem      | Report/Notes |
| Docker         | Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue     | HADOOP-13904 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12853606/HADOOP-13904-HADOOP-13345.004.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit xml findbugs checkstyle |
| uname          | Linux 0b6f1a8132dd 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool     | maven |
| Personality    | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision   | HADOOP-13345 / 8b37b6a |
| Default Java   | 1.8.0_121 |
| findbugs       | v3.0.0 |
| unit           | https://builds.apache.org/job/PreCommit-HADOOP-Build/11661/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common.txt |
| Test Results   | https://builds.apache.org/job/PreCommit-HADOOP-Build/11661/testReport/ |
| modules        | C: hadoop-common-project/hadoop-common |
[jira] [Commented] (HADOOP-13904) DynamoDBMetadataStore to handle DDB throttling failures through retry policy
[ https://issues.apache.org/jira/browse/HADOOP-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15875077#comment-15875077 ] Steve Loughran commented on HADOOP-13904:

bq. Are you concerned that the batchWriteItem() implementation in the Dynamo SDK returns auth failures as retryable entries from BatchWriteOutcome#getUnprocessedItems()? Everything I've read implies auth failures are not retryable and would be propagated as usual.

Sounds like I'm the confused one, aren't I? OK, so this isn't catch + retry; it's all about handling the specific unprocessed-item message that gets sent back. In this case, ignore my comments, I'm wrong.

LGTM, +1, pending Yetus.

> DynamoDBMetadataStore to handle DDB throttling failures through retry policy
> ----------------------------------------------------------------------------
>
>                 Key: HADOOP-13904
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13904
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: HADOOP-13345
>            Reporter: Steve Loughran
>            Assignee: Aaron Fabbri
>         Attachments: HADOOP-13904-HADOOP-13345.001.patch, HADOOP-13904-HADOOP-13345.002.patch, HADOOP-13904-HADOOP-13345.003.patch, HADOOP-13904-HADOOP-13345.004.patch, screenshot-1.png
>
> When you overload DDB, you get error messages warning of throttling, [as documented by AWS|http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Programming.Errors.html#Programming.Errors.MessagesAndCodes]
> Reduce load on DDB by doing a table lookup before the create, then, in table create/delete operations and in get/put actions, recognise the error codes and retry using an appropriate retry policy (exponential backoff + ultimate failure)

--
This message was sent by Atlassian JIRA (v6.3.15#6346)
-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-13904) DynamoDBMetadataStore to handle DDB throttling failures through retry policy
[ https://issues.apache.org/jira/browse/HADOOP-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15874982#comment-15874982 ] Aaron Fabbri commented on HADOOP-13904:

{quote} there's some TODO in the code. {quote}

I'm removing those and noting places we should add metrics in the JIRA instead (HADOOP-13453).

{quote} always good to have the javadoc for a constant to include the {{@value}} marker, {quote}

Will do. I'll roll another patch, but first I think we should come to consensus on the next question:

{quote} I think the retryBackoff logic should look a bit at what failed. {quote}

Here is the change to {{DynamoDBMetadataStore#processBatchWriteRequest()}}:

{noformat}
@@ -418,7 +437,7 @@ public void move(Collection<Path> pathsToDelete,
    * @param itemsToPut new items to be put; can be null
    */
   private void processBatchWriteRequest(PrimaryKey[] keysToDelete,
-      Item[] itemsToPut) {
+      Item[] itemsToPut) throws IOException {
     final int totalToDelete = (keysToDelete == null ? 0 : keysToDelete.length);
     final int totalToPut = (itemsToPut == null ? 0 : itemsToPut.length);
     int count = 0;
@@ -449,13 +468,41 @@ private void processBatchWriteRequest(PrimaryKey[] keysToDelete,
       BatchWriteItemOutcome res = dynamoDB.batchWriteItem(writeItems);
       // Check for unprocessed keys in case of exceeding provisioned throughput
       Map<String, List<WriteRequest>> unprocessed = res.getUnprocessedItems();
+      int retryCount = 0;
       while (unprocessed.size() > 0) {
+        retryBackoff(retryCount++);
         res = dynamoDB.batchWriteItemUnprocessed(unprocessed);
         unprocessed = res.getUnprocessedItems();
       }
     }
   }
{noformat}

{quote} At the very least, auth failures should be recognised and propagated. {quote}

I'm not following this comment. I'm not catching any exceptions. Auth failures would still be propagated, no? Not sure what I'm missing here. Are you concerned that the batchWriteItem() implementation in the Dynamo SDK returns auth failures as retryable entries from {{BatchWriteOutcome#getUnprocessedItems()}}?
Everything I've read implies auth failures are not retryable and would be propagated as usual. Let's assume that I'm wrong, and the SDK returns auth failures as a list of unprocessed items (though [the docs|http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Programming.Errors.html#BatchOperations] say these should be retried). How do I tell if an unprocessed item is due to an auth failure? All I get is a {{com.amazonaws.services.dynamodbv2.model.WriteRequest}}, with no differentiator on why it needs to be retried. My understanding is that auth failures are not retryable and thus cause an exception from batchWriteItem(), *not* unprocessed items.
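The control flow under discussion (resubmit the {{UnprocessedItems}} with exponential backoff, give up after a bounded number of retries) can be sketched without the AWS SDK. In this sketch {{batchWriteItemUnprocessed}} is a hypothetical stand-in that simulates partial throttling, and the constant values are illustrative assumptions, not the patch's actual defaults:

```java
// Minimal sketch of the unprocessed-items retry loop. The real code uses
// DynamoDB's BatchWriteItemOutcome#getUnprocessedItems(); here a fake
// client that "processes half the batch per call" stands in for it.
public class BatchRetrySketch {
    static final int MIN_RETRY_SLEEP_MSEC = 10;  // assumed, not the patch's value
    static final int MAX_RETRIES = 9;            // assumed, not the patch's value

    // Exponential backoff: delay doubles each attempt; ultimate failure
    // after MAX_RETRIES, matching the "backoff + ultimate failure" policy.
    static long retryBackoff(int retryCount) {
        if (retryCount >= MAX_RETRIES) {
            throw new IllegalStateException("Max retries exceeded");
        }
        return (long) MIN_RETRY_SLEEP_MSEC << retryCount;
    }

    // Hypothetical stand-in: each submission leaves half the items unprocessed.
    static int batchWriteItemUnprocessed(int unprocessed) {
        return unprocessed / 2;
    }

    // Drains a batch, backing off before each resubmission; returns the
    // number of retry rounds that were needed.
    static int processBatch(int items) {
        int unprocessed = batchWriteItemUnprocessed(items); // first submission
        int retryCount = 0;
        while (unprocessed > 0) {
            long sleepMs = retryBackoff(retryCount++); // real code would Thread.sleep(sleepMs)
            unprocessed = batchWriteItemUnprocessed(unprocessed);
        }
        return retryCount;
    }
}
```

Note that, as Aaron says, no exceptions are caught here: an auth failure thrown by the write call would propagate straight out of the loop.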
[jira] [Commented] (HADOOP-13904) DynamoDBMetadataStore to handle DDB throttling failures through retry policy
[ https://issues.apache.org/jira/browse/HADOOP-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15874507#comment-15874507 ] Steve Loughran commented on HADOOP-13904:

* Again, there's some TODO in the code. Better to add a comment in the relevant JIRA to mention "and patch {{DynamoDBMetadataStore & AbstractITestS3AMetadataStoreScale}}" as one of the work items. There are 557 TODO entries in branch-2: don't add any more unless you are prepared to go through the old ones and fix a couple. (To be fair, I think some are mine.)
* {{S3GUARD_DDB_MAX_RETRIES_DEFAULT, MIN_RETRY_SLEEP_MSEC, ...}}: always good to have the javadoc for a constant include the {{@value}} marker, so the javadocs show what the value is.

I think the {{retryBackoff}} logic should look a bit at what failed. At the very least, auth failures should be recognised and propagated. It's really annoying when auth problems trigger failure/retry, and again, too much of the Hadoop stack gets this wrong (e.g. ZOOKEEPER-2346). We can handle anything else which happens (assuming all connectivity errors are transient), but if the user can't log on, we should fail fast.
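The failure-aware retry decision suggested above could be sketched as a small policy check: throttling is retryable with backoff, auth failures fail fast, and anything unrecognised propagates. The exception classes here are simplified local stand-ins, not the real AWS SDK types:

```java
// Sketch of a failure-aware retry policy. ThrottlingException and
// AuthFailureException are hypothetical stand-ins for the SDK's
// service-exception hierarchy.
public class RetryDecisionSketch {
    static class ThrottlingException extends RuntimeException {}
    static class AuthFailureException extends RuntimeException {}

    static boolean shouldRetry(RuntimeException e, int retryCount, int maxRetries) {
        if (e instanceof AuthFailureException) {
            return false;                    // fail fast: retrying won't fix credentials
        }
        if (e instanceof ThrottlingException) {
            return retryCount < maxRetries;  // exponential backoff + ultimate failure
        }
        return false;                        // unknown failures propagate to the caller
    }
}
```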
[jira] [Commented] (HADOOP-13904) DynamoDBMetadataStore to handle DDB throttling failures through retry policy
[ https://issues.apache.org/jira/browse/HADOOP-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872401#comment-15872401 ] Aaron Fabbri commented on HADOOP-13904: --- Jenkins above is clean except HADOOP-14030 (TestKDiag failure) which should be unrelated.
[jira] [Commented] (HADOOP-13904) DynamoDBMetadataStore to handle DDB throttling failures through retry policy
[ https://issues.apache.org/jira/browse/HADOOP-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15869063#comment-15869063 ] Hadoop QA commented on HADOOP-13904:

-1 overall

| Vote | Subsystem  | Runtime  | Comment |
| 0    | reexec     | 0m 15s   | Docker mode activated. |
| +1   | @author    | 0m 0s    | The patch does not contain any @author tags. |
| +1   | test4tests | 0m 0s    | The patch appears to include 3 new or modified test files. |
| 0    | mvndep     | 0m 14s   | Maven dependency ordering for branch |
| +1   | mvninstall | 12m 25s  | HADOOP-13345 passed |
| +1   | compile    | 12m 53s  | HADOOP-13345 passed |
| +1   | checkstyle | 1m 32s   | HADOOP-13345 passed |
| +1   | mvnsite    | 1m 28s   | HADOOP-13345 passed |
| +1   | mvneclipse | 0m 38s   | HADOOP-13345 passed |
| +1   | findbugs   | 2m 0s    | HADOOP-13345 passed |
| +1   | javadoc    | 1m 13s   | HADOOP-13345 passed |
| 0    | mvndep     | 0m 17s   | Maven dependency ordering for patch |
| +1   | mvninstall | 0m 59s   | the patch passed |
| +1   | compile    | 12m 12s  | the patch passed |
| +1   | javac      | 12m 12s  | the patch passed |
| +1   | checkstyle | 1m 39s   | the patch passed |
| +1   | mvnsite    | 1m 40s   | the patch passed |
| +1   | mvneclipse | 0m 45s   | the patch passed |
| +1   | whitespace | 0m 0s    | The patch has no whitespace issues. |
| +1   | xml        | 0m 1s    | The patch has no ill-formed XML file. |
| +1   | findbugs   | 2m 24s   | the patch passed |
| +1   | javadoc    | 1m 17s   | the patch passed |
| -1   | unit       | 8m 13s   | hadoop-common in the patch failed. |
| +1   | unit       | 0m 47s   | hadoop-aws in the patch passed. |
| +1   | asflicense | 0m 38s   | The patch does not generate ASF License warnings. |
|      |            | 88m 11s  | |

| Reason             | Tests |
| Failed junit tests | hadoop.security.TestKDiag |

| Subsystem      | Report/Notes |
| Docker         | Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue     | HADOOP-13904 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12852943/HADOOP-13904-HADOOP-13345.003.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit xml findbugs checkstyle |
| uname          | Linux 9eed38575087 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool     | maven |
| Personality    | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision   | HADOOP-13345 / 94287ce |
| Default Java   | 1.8.0_121 |
| findbugs       | v3.0.0 |
| unit           | https://builds.apache.org/job/PreCommit-HADOOP-Build/11640/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common.txt |
| Test Results   | https://builds.apache.org/job/PreCommit-HADOOP-Build/11640/testReport/ |
| modules        | C: hadoop-common-project/hadoop-common |
[jira] [Commented] (HADOOP-13904) DynamoDBMetadataStore to handle DDB throttling failures through retry policy
[ https://issues.apache.org/jira/browse/HADOOP-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15868982#comment-15868982 ] Aaron Fabbri commented on HADOOP-13904: --- Filed HADOOP-14087. [~ste...@apache.org] any remaining concerns I didn't address here? (See my [last reply|https://issues.apache.org/jira/browse/HADOOP-13904?focusedCommentId=15852047=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15852047] to your comment.) I got Sean's +1 above but wanted to make sure you are also happy (assuming Yetus comes back clean)
[jira] [Commented] (HADOOP-13904) DynamoDBMetadataStore to handle DDB throttling failures through retry policy
[ https://issues.apache.org/jira/browse/HADOOP-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15868949#comment-15868949 ] Sean Mackrory commented on HADOOP-13904: Works for me - I think we're safe. +1!
[jira] [Commented] (HADOOP-13904) DynamoDBMetadataStore to handle DDB throttling failures through retry policy
[ https://issues.apache.org/jira/browse/HADOOP-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15868944#comment-15868944 ] Aaron Fabbri commented on HADOOP-13904: --- I did not find the SDK code that implements retries, but I just attached a screenshot showing some of the testing I've done. This is a >30 minute scale test run on a 1/1 capacity DDB table. There were zero exceptions surfaced with the current patch: retry logic is working as expected. I will at least attach a new patch rebased on latest code.
[jira] [Commented] (HADOOP-13904) DynamoDBMetadataStore to handle DDB throttling failures through retry policy
[ https://issues.apache.org/jira/browse/HADOOP-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15868754#comment-15868754 ] Aaron Fabbri commented on HADOOP-13904:

OK, I started on this, again realizing what a crazy API choice it would have been for the DDB SDK to treat this "nothing was processed" throttling behavior as completely separate from the "some things were processed" throttling case. Handling that would have been pretty complex, with the unprocessedItems only being used in the former case and not the latter (i.e. if you get partially throttled, then fully throttled on the first retry, you have to retry the previous unprocessedItems list, etc.). I dug deeper into the AWS SDK docs and found this:

{noformat}
ProvisionedThroughputExceededException - Your request rate is too high. The AWS SDKs for DynamoDB automatically retry requests that receive this exception.
{noformat}

This is consistent with my testing: even with a tiny provisioned throughput and an abusive scale test, I never once saw this exception. I did frequently see throttling via the UnprocessedItems return value. So, I now think we should *not* try to handle that exception here. I'm trying to find the SDK code to confirm this behavior.
[jira] [Commented] (HADOOP-13904) DynamoDBMetadataStore to handle DDB throttling failures through retry policy
[ https://issues.apache.org/jira/browse/HADOOP-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866260#comment-15866260 ] Sean Mackrory commented on HADOOP-13904: {quote}This patch essentially keeps existing exception behavior but just slows down batch work resubmittal. So I think it is an improvement, but we may have to add a higher-level retry loop for the ProvisionedThroughputExceededException case. Why they don't just return all items as unprocessed is beyond me.{quote} I'm of the opinion that we should be catching that one. It seems required to reasonably and correctly handle the behavior as documented, even though we haven't seen that specific edge case. Everything else sounds good to me...
[jira] [Commented] (HADOOP-13904) DynamoDBMetadataStore to handle DDB throttling failures through retry policy
[ https://issues.apache.org/jira/browse/HADOOP-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15852047#comment-15852047 ] Aaron Fabbri commented on HADOOP-13904:

Thank you for the review.

{quote} Yetus is unhappy with it... is it in sync with the branch? {quote}

See the end of my previous comment. This is based on top of HADOOP-13876. That one is higher priority to get in than this one (this one is an efficiency issue AFAICT).

{quote} the retry policy should really detect and reject the auth failures as non-retryable {quote}

I think that already happens here. My retry implementation doesn't catch any exceptions. I'd expect the DDB batch write API to throw an exception if we hit an auth failure, which naturally bypasses the retry logic. I've been digging through docs (including the [SDK|http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/dynamodbv2/AmazonDynamoDB.html#batchWriteItem-com.amazonaws.services.dynamodbv2.model.BatchWriteItemRequest] and [REST|http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Programming.Errors.html#BatchOperations] docs for batch write) and I think this patch is correct, except that I just learned that batch write *may* throw {{ProvisionedThroughputExceededException}} if *none* of the items in the batch could be executed. I could not reproduce this despite abusive testing on a 10 I/O unit provisioned table.

{quote} (a) handle interruptions by interrupting thread again, {quote}

I wondered about this. Currently I do not catch InterruptedException; I let it propagate. Are you saying I should catch it, set the interrupt flag, break out of the retry loop, and continue execution?

{quote} (b) handling any other exception by just returning false to the shouldRetry probe. {quote}

This batch API is a bit of a special case: instead of just throwing exceptions for failures, it appears to propagate non-retryable exceptions, but translates retryable ones into BatchWriteItem "unprocessed items". This patch essentially keeps existing exception behavior but just slows down resubmittal of the retryable batch work. So I think it is an improvement, but we may have to add a higher-level retry loop for the {{ProvisionedThroughputExceededException}} case. Why they don't just return all items as unprocessed is beyond me.
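The higher-level retry loop mentioned for the fully-throttled case could be sketched as follows. {{ProvisionedThroughputExceededException}} is declared locally here as a stand-in for the SDK class, and the backoff constant is illustrative:

```java
// Sketch of a higher-level loop that retries the whole batch when the
// write throws (i.e. when *none* of the items could be executed).
// ProvisionedThroughputExceededException is a local stand-in for the
// com.amazonaws.services.dynamodbv2.model class.
public class FullThrottleRetrySketch {
    static class ProvisionedThroughputExceededException extends RuntimeException {}

    interface BatchWrite {
        void run();
    }

    // Retries op with doubling backoff; the last exception propagates
    // once maxRetries is exhausted ("ultimate failure"). Returns the
    // number of retries that were needed.
    static int writeWithRetry(BatchWrite op, int maxRetries) {
        for (int attempt = 0; ; attempt++) {
            try {
                op.run();
                return attempt;
            } catch (ProvisionedThroughputExceededException e) {
                if (attempt >= maxRetries) {
                    throw e;  // ultimate failure: surface the throttling error
                }
                long sleepMs = 10L << attempt; // real code would Thread.sleep(sleepMs)
            }
        }
    }
}
```

As noted above, the SDK reportedly retries this exception itself, so a loop like this would only be a last-resort safety net.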
[jira] [Commented] (HADOOP-13904) DynamoDBMetadataStore to handle DDB throttling failures through retry policy
[ https://issues.apache.org/jira/browse/HADOOP-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15851644#comment-15851644 ] Steve Loughran commented on HADOOP-13904: - Yetus is unhappy with it... is it in sync with the branch?
* that fix to line 196 of the pom should go into branch-2... submit a separate patch for that and I'll get it in
h2. {{retryBackoff}}
* the retry policy should really detect and reject the auth failures as non-retryable. Looking at the s3a block output stream, we get away with it only because you don't get as far as completing a multipart write without having the credentials, though I should add the check there too, to fail fast on situations like session credential expiry during a multiday streaming app.
* Take a look at {{S3aBlockOutputStream.shouldRetry}} for some things to consider: (a) handle interruptions by interrupting the thread again, and (b) handle any other exception by just returning false to the shouldRetry probe. Why? It means the caller can fail with whatever exception caused the initial problem, which is presumably the most useful.
Other than that, LGTM.
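Points (a) and (b) above could be sketched like this. This is a hedged, self-contained illustration in the spirit of the review comment, not the actual {{S3aBlockOutputStream.shouldRetry}} code; the {{BackoffPolicy}} shape and names are made up for the example.

```java
public class ShouldRetrySketch {
    /** Illustrative policy interface, not a real Hadoop class. */
    interface BackoffPolicy {
        /** Milliseconds to wait before the given attempt, or -1 to give up. */
        long delayMs(int attempt);
    }

    static boolean shouldRetry(Exception cause, int attempt, BackoffPolicy policy) {
        if (cause instanceof InterruptedException) {
            Thread.currentThread().interrupt(); // (a) restore the interrupt flag
            return false;                       // and stop retrying
        }
        long delay = policy.delayMs(attempt);
        if (delay < 0) {
            return false; // (b) policy exhausted: caller fails with the original cause
        }
        try {
            Thread.sleep(delay);
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // interrupted while backing off
            return false;
        }
    }
}
```

Returning false rather than throwing lets the caller rethrow whatever exception triggered the probe, which, as noted above, is presumably the most useful failure for the user to see.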
[ https://issues.apache.org/jira/browse/HADOOP-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15849567#comment-15849567 ] Hadoop QA commented on HADOOP-13904: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 12s{color} | {color:red} HADOOP-13904 does not apply to HADOOP-13345. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HADOOP-13904 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12850597/HADOOP-13904-HADOOP-13345.002.patch | | Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/11558/console | | Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. 
[ https://issues.apache.org/jira/browse/HADOOP-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15848103#comment-15848103 ] Aaron Fabbri commented on HADOOP-13904: --- Following up on determining whether or not the DynamoDB SDK (or service) does exponential backoff for us when batched writes get throttled. TL;DR: it would be good to add a backoff timer here. I created a table with 10/10 read/write units provisioned. I ran this scale test and added a timer around the batch write retries:
{code}
StopWatch sw = new StopWatch();
long retryCount = 0;
while (unprocessed.size() > 0) {
  sw.start();
  res = dynamoDB.batchWriteItemUnprocessed(unprocessed);
  unprocessed = res.getUnprocessedItems();
  sw.stop();
  LOG.info("Retry {} took {} msec", retryCount++, sw.now(TimeUnit.MILLISECONDS));
  sw.reset();
}
{code}
And it looks like we are not getting any backoff from the SDK or service:
{noformat}
2017-01-31 23:55:25,534 [JUn... cessBatchWriteRequest(461)) - Retry 0 took 412 msec
2017-01-31 23:55:25,534 [JUn... cessBatchWriteRequest(461)) - Retry 1 took 374 msec
2017-01-31 23:55:25,586 ... :processBatchWriteRequest(461)) - Retry 2 took 51 msec
2017-01-31 23:55:25,643 ... :processBatchWriteRequest(461)) - Retry 3 took 56 msec
2017-01-31 23:55:26,182 ... :processBatchWriteRequest(461)) - Retry 4 took 538 msec
2017-01-31 23:55:26,626 ... :processBatchWriteRequest(461)) - Retry 5 took 444 msec
2017-01-31 23:55:26,672 ... :processBatchWriteRequest(461)) - Retry 6 took 45 msec
2017-01-31 23:55:27,183 ... :processBatchWriteRequest(461)) - Retry 7 took 511 msec
2017-01-31 23:55:27,745 ... :processBatchWriteRequest(461)) - Retry 0 took 499 msec
2017-01-31 23:55:28,130 ... :processBatchWriteRequest(461)) - Retry 1 took 385 msec
2017-01-31 23:55:28,185 ... :processBatchWriteRequest(461)) - Retry 2 took 54 msec
2017-01-31 23:55:28,239 ... :processBatchWriteRequest(461)) - Retry 3 took 53 msec
2017-01-31 23:55:28,627 ... :processBatchWriteRequest(461)) - Retry 4 took 387 msec
2017-01-31 23:55:28,676 ... :processBatchWriteRequest(461)) - Retry 5 took 49 msec
{noformat}
[ https://issues.apache.org/jira/browse/HADOOP-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15848052#comment-15848052 ] Aaron Fabbri commented on HADOOP-13904: --- The time it takes to run these varies depending on the scale parameters. These runs were on my laptop over a home internet connection:
* scale directory count = 2, operation count = 100: 28 seconds
* scale directory count = 3, operation count = 100: 142 seconds
[ https://issues.apache.org/jira/browse/HADOOP-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15847856#comment-15847856 ] Aaron Fabbri commented on HADOOP-13904: --- Thank you for the helpful review [~ste...@apache.org]. {quote} do these run iff `-Dscale` is set? {quote} Yes; they extend {{S3AScaleTestBase}}. {quote} clearMetadataStore(ms, count); may need to go into a finally clause of testMoves() {quote} Good call. I'll do the same in testPut() as well, retest with those changes, and get you timing details. These could run in parallel if we ensured each had its own DDB table to work with, but I think it is simpler to just mark them as serial for scale tests. I'll add that as well.
[ https://issues.apache.org/jira/browse/HADOOP-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830348 ] Steve Loughran commented on HADOOP-13904: - good to see this, and nice to see test-first dev at work
# do these run iff `-Dscale` is set? I'd prefer that, as I've tried to split the slow stuff from the fast stuff in the s3a tests to date
# If you want a standard `NanoTimer.operationsPerSecond()` value for printing, feel free to add it
[ https://issues.apache.org/jira/browse/HADOOP-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15829212#comment-15829212 ] Hadoop QA commented on HADOOP-13904: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 4m 52s{color} | {color:red} root in HADOOP-13345 failed. {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s{color} | {color:green} HADOOP-13345 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 16s{color} | {color:green} HADOOP-13345 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 27s{color} | {color:green} HADOOP-13345 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 21s{color} | {color:green} HADOOP-13345 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 37s{color} | {color:green} HADOOP-13345 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s{color} | {color:green} HADOOP-13345 passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} 
| {color:green} 0m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 27s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 42s{color} | {color:green} hadoop-aws in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 12m 39s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | HADOOP-13904 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12848205/HADOOP-13904-HADOOP-13345.001.patch | | Optional Tests | asflicense mvnsite compile javac javadoc mvninstall unit findbugs checkstyle | | uname | Linux 4767df8557db 3.13.0-103-generic #150-Ubuntu SMP Thu Nov 24 10:34:17 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | HADOOP-13345 / f10114c | | Default Java | 1.8.0_111 | | mvninstall | https://builds.apache.org/job/PreCommit-HADOOP-Build/11465/artifact/patchprocess/branch-mvninstall-root.txt | | findbugs | v3.0.0 | | Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/11465/testReport/ | | modules | C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws | | Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/11465/console | | Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated.
[ https://issues.apache.org/jira/browse/HADOOP-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15829172 ] Aaron Fabbri commented on HADOOP-13904: --- Also, note that the DynamoDBMetadataStore scale test fails during cleanup. This is because the internal path normalization depends on having an S3AFileSystem instance available. However, I used the new {{MetadataStore#initialize(Configuration)}} interface, which does not set an S3AFileSystem instance in the DynamoDBMetadataStore. This new initialize() logic for DynamoDBMetadataStore seems to be broken, so I will file a JIRA on that.
[ https://issues.apache.org/jira/browse/HADOOP-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15829036 ] Aaron Fabbri commented on HADOOP-13904: --- Taking this JIRA since [~liuml07] is on vacation. I'll post a patch for my MetadataStore scale tests. I'd like to get HADOOP-13589 (maven test profiles) in first.
[ https://issues.apache.org/jira/browse/HADOOP-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15828829 ] Aaron Fabbri commented on HADOOP-13904: --- I'm creating a new scale test on top of MetadataStore so we can drive some load against the DynamoDBMetadataStore and gain some confidence in our retry/backoff stability. I have a basic put() workload working and am adding one around move(). move() makes good use of the DynamoDB batchWriteItem API, which may still need to have exponential backoff added in the retry loop in DynamoDBMetadataStore.
[ https://issues.apache.org/jira/browse/HADOOP-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15762606 ] Mingliang Liu commented on HADOOP-13904: From http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Programming.Errors.html#Programming.Errors.RetryAndBackoff {quote} The AWS SDKs implement automatic retry logic and exponential backoff. Most exponential backoff algorithms use jitter (randomized delay) to prevent successive collisions. Because you aren't trying to avoid such collisions in these cases, you do not need to use this random number. However, if you use concurrent clients, jitter can help your requests succeed faster. For more information, see the blog post for Exponential Backoff and Jitter. {quote} Currently the {{DynamoDBClientFactory}} uses the same max error retry as S3ClientFactory, whose default value is 20.
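For reference, a "full jitter" delay as described in the AWS text quoted above could be computed like this. A minimal illustrative Java sketch: the base and cap constants are made up, not values from the SDK or the patch.

```java
import java.util.Random;

public class JitterSketch {
    static final long BASE_MS = 25;   // illustrative base delay
    static final long CAP_MS = 2000;  // illustrative cap

    /** "Full jitter": a random delay in [0, min(cap, base * 2^attempt)). */
    static long fullJitterDelay(int attempt, Random rnd) {
        long exp = Math.min(CAP_MS, BASE_MS << Math.min(attempt, 16)); // clamp shift
        return (long) (rnd.nextDouble() * exp);
    }
}
```

With concurrent clients hammering the same table, randomizing the whole window this way spreads the retries out instead of having every client wake up on the same exponential schedule.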
[ https://issues.apache.org/jira/browse/HADOOP-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15749778 ] Mingliang Liu commented on HADOOP-13904: In [HADOOP-13899], Steve said that: {quote} Presumably mocking is the simplest way to test this {quote} Agreed. DynamoDB Local does not throttle read or write activity. We have a pre-defined DynamoDBClientFactory interface, so mocking seems the simplest way to go.